Modeling is used to make predictions about future outcomes
First, it is essential to understand that if you have a "data set" - for example, information about the number of covid-19 patients in Stockholm at a given time - then you cannot say anything about how it will look in one or two weeks unless you are willing to set up a statistical model of the spread of infection and make assumptions about how the epidemic will develop. All statistical modeling is based on assumptions. During the covid-19 pandemic, much of the work was about planning hospital care and community measures for the coming weeks, but also about trying to understand which preventive measures were appropriate.
The model can be simple, but we need assumptions to say something about the future. Regarding covid-19, we were interested in keeping track of the situation descriptively: how many people were infected, how many were treated in hospital, how many died, and so on. Above all, however, we were interested in being able to handle the situation going forward. This is precisely why we need statistical modeling: to predict future developments. We were also interested in understanding how various preventive measures, for example social distancing and closing parts of the school system, affect the spread of infection, so that we can choose appropriate steps and judge when restrictions can be lifted.
The reproduction number – how much does an infection spread?
For infectious diseases, without countermeasures, each infected person infects a number of new people. For covid-19, the so-called reproduction number R at the beginning of the epidemic was reported in the media to be between 2 and 3. A prerequisite for an epidemic to establish itself at all is that each infected person, on average, infects more than one person. The reproduction number is an average, so it does not have to be a whole number, even though in reality each person infects precisely zero, one, two, or more people. How many people an individual infects depends on many factors, such as how contagious that particular individual is and what their social contact pattern looks like.
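As a rough illustration of why the threshold of one matters, the sketch below multiplies the expected number of cases by R once per generation of infection. It assumes a constant R and ignores immunity, countermeasures and chance; the function name and the example values are only illustrative and are not taken from any particular epidemic model.

```python
def expected_cases_per_generation(r, generations, initial_cases=1.0):
    """Expected number of new cases in each generation of infection.

    Assumption: every case infects on average `r` others, and `r` stays
    constant (no immunity, no countermeasures, no randomness).
    """
    cases = [float(initial_cases)]
    for _ in range(generations):
        # Each generation is the previous one multiplied by the average R.
        cases.append(cases[-1] * r)
    return cases


if __name__ == "__main__":
    # Illustrative values of R: below, at, and above the threshold of one.
    for r in (0.8, 1.0, 2.5):
        print(r, [round(c, 2) for c in expected_cases_per_generation(r, 5)])
```

With R below one the expected number of cases shrinks from generation to generation, and above one it grows. This is also why an average such as 2.5 is meaningful even though each individual infects a whole number of people.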
The reproduction number also changes over time, partly because of measures to limit infection and partly because a larger part of the population becomes immune. When we have achieved herd immunity, i.e. when a sufficiently large percentage of the population has had the infection and is no longer susceptible, the epidemic subsides because the infection no longer has the same opportunities to spread. Measures such as social distancing and the closure of workplaces and schools also lower the reproduction number.
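One common textbook simplification makes this concrete. If everyone mixes evenly and immunity is complete, the effective reproduction number is the initial one scaled down by the share of people who are still susceptible, and the epidemic subsides once it falls below one. The sketch below rests on those assumptions; the `contact_reduction` parameter is a hypothetical stand-in for the combined effect of measures, not an estimate of any real intervention.

```python
def effective_r(r0, immune_fraction, contact_reduction=0.0):
    """Reproduction number after accounting for immunity and measures.

    Assumptions: homogeneous mixing, complete immunity, and measures that
    remove a fixed fraction `contact_reduction` of infectious contacts.
    """
    if not 0.0 <= immune_fraction <= 1.0:
        raise ValueError("immune_fraction must be between 0 and 1")
    if not 0.0 <= contact_reduction <= 1.0:
        raise ValueError("contact_reduction must be between 0 and 1")
    return r0 * (1.0 - immune_fraction) * (1.0 - contact_reduction)


def herd_immunity_threshold(r0):
    """Immune fraction at which the effective number drops to one."""
    if r0 <= 1.0:
        return 0.0  # the infection cannot sustain itself in the first place
    return 1.0 - 1.0 / r0


if __name__ == "__main__":
    for r0 in (2.0, 2.5, 3.0):
        print(r0, round(herd_immunity_threshold(r0), 3))
```

Under these assumptions, a higher initial reproduction number requires a larger immune share before the epidemic subsides on its own, and measures that reduce contacts lower the effective number in the same multiplicative way.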
Why was the first death chosen as the starting point?
Those who followed news broadcasts during the pandemic might have heard about day zero, and about measurements of how quickly the number of cases doubled in different countries. The rate of doubling is of course linked to how many new people each infected person infects, but also to how fast this happens. A clear starting point is also needed to follow the number of cases over time and to be able to compare countries.
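The link between doubling, the reproduction number and the speed of transmission can be sketched under one strong simplification: cases grow exponentially, and every new infection occurs exactly one fixed "generation interval" after the previous one. Both function names and the generation interval parameter are assumptions made for illustration.

```python
import math


def doubling_time_from_r(r, generation_interval_days):
    """Doubling time in days if cases multiply by `r` every generation.

    Assumption: a fixed generation interval and pure exponential growth.
    """
    if r <= 1.0:
        raise ValueError("cases only double when r is greater than 1")
    return generation_interval_days * math.log(2.0) / math.log(r)


def doubling_time_from_counts(cases_start, cases_end, days_between):
    """Doubling time in days estimated from two case counts."""
    if cases_start <= 0 or cases_end <= cases_start:
        raise ValueError("counts must be positive and increasing")
    growth_factor = cases_end / cases_start
    return days_between * math.log(2.0) / math.log(growth_factor)


if __name__ == "__main__":
    # Hypothetical inputs: R = 2.5 and a generation interval of 5 days.
    print(round(doubling_time_from_r(2.5, 5.0), 2))
```

The first function shows both effects at once: a larger R or a shorter generation interval gives a shorter doubling time.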
There are probably several reasons why the day of the first death was chosen as the starting point, but one thing is certain: it is a day that can be determined with a relatively high degree of certainty. It also suggests that the epidemic has reached a certain degree of maturity in the country. Models of how infections spread can be made as complicated as desired, but these are important basic concepts.
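Aligning countries on a common day zero is simple to express in code. The sketch below shifts each series so that it starts on the day of the first recorded death; the country names and counts are hypothetical and invented purely to show the mechanics.

```python
def align_to_day_zero(daily_deaths):
    """Return the series starting at the day of the first death.

    `daily_deaths` is a list of daily death counts in calendar order.
    Index 0 of the result is "day zero". An empty list is returned if
    no death has been recorded.
    """
    for day, deaths in enumerate(daily_deaths):
        if deaths > 0:
            return daily_deaths[day:]
    return []


if __name__ == "__main__":
    # Hypothetical counts, invented purely to show the alignment.
    series = {
        "country_a": [0, 0, 1, 0, 2, 3],
        "country_b": [0, 0, 0, 0, 1, 1, 4],
    }
    for name, deaths in series.items():
        print(name, align_to_day_zero(deaths))
```

After alignment, the same index in each list refers to the same number of days since the first death, so the series can be compared even though the epidemics started on different calendar dates.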
Statistical modeling in clinical studies
We all understand that statistical models are needed to predict the development of an epidemic and take appropriate measures. The fact is, however, that for almost everything we study where we want to draw conclusions beyond the actual data we observe, some form of model is needed. The simplest statistical model in common use is the assumption that observations (effects) are normally distributed.
For example, suppose we conclude that a study shows that treatment with vitamin C shortens a cold by an average of two days. In that case, we intuitively understand that for an individual, treatment can shorten the course of the disease by a little or by a lot, but that a "typical" or "average" patient can expect to recover two days faster if they receive treatment with vitamin C. We think this way because we know what a mean and a normal distribution are.
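To see what the normal assumption buys us, the sketch below uses Python's standard-library NormalDist to ask how large a share of patients would see a shortening within a given range. The mean of two days comes from the example above; the standard deviation of 1.5 days is an assumed value chosen only for illustration, not a result from any study.

```python
from statistics import NormalDist


def fraction_between(mean_days, sd_days, low, high):
    """Share of patients whose shortening lies between `low` and `high` days.

    Assumption: individual effects are normally distributed.
    """
    effect = NormalDist(mu=mean_days, sigma=sd_days)
    return effect.cdf(high) - effect.cdf(low)


if __name__ == "__main__":
    mean_days = 2.0  # average shortening from the example in the text
    sd_days = 1.5    # assumed spread between individuals (hypothetical)
    print("between 1 and 3 days:", fraction_between(mean_days, sd_days, 1.0, 3.0))
    # Share with no shortening at all (effect of zero days or less).
    print("no benefit:", NormalDist(mean_days, sd_days).cdf(0.0))
```

The mean alone says nothing about how much individuals differ. It is the combination of a mean and an assumed distribution around it that lets us talk about both the typical patient and the spread.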
The clinical study as a model of reality
For understandable reasons, we cannot include an infinite number of patients in clinical trials. Sometimes thousands are needed, but usually significantly fewer, to get approval for a new drug. If we are to be very blunt, a clinical study is about evaluating the effect of a new treatment for the infinite number of patients who did not participate in the clinical study - we want to extrapolate the results from the clinical study in order to be able to say something meaningful about the effect of a new treatment. The clinical study serves as a "proxy" that lets us make a statement about the true effect of a new treatment - if it is given to all conceivable patients with the current diagnosis. In other words, statistical models are not tricks or cheats with which we reshape reality, but simple tools to make reality understandable and to enable us to think beyond the few patients for whom we have data, whether the question is how an epidemic develops or how we should interpret results from a clinical study.
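One standard way of extrapolating from the patients in a study to the true effect is a confidence interval around the observed mean. The sketch below uses a normal approximation, which is an assumption; for small samples a t distribution would usually be more appropriate. The observations in the usage example are hypothetical and not data from any trial.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_confidence_interval(observations, confidence=0.95):
    """Return (estimate, lower, upper) for the mean effect.

    Assumption: the sample mean is approximately normally distributed,
    so the interval is built from the normal quantile.
    """
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    estimate = mean(observations)
    standard_error = stdev(observations) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return estimate, estimate - z * standard_error, estimate + z * standard_error


if __name__ == "__main__":
    # Hypothetical shortening of a cold, in days, for ten patients.
    sample = [1.5, 2.0, 3.5, 0.5, 2.5, 1.0, 4.0, 2.0, 1.5, 2.5]
    print(mean_with_confidence_interval(sample))
```

The interval narrows as more patients are included, which is the quantitative version of the point above: the study is a proxy, and the model tells us how far the proxy can be trusted.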
Statistical modeling is sometimes held in disrepute - the impression is that you can prove anything with statistics. And that is indeed the case: if you make completely wrong assumptions and choose a model that does not describe the data well, the road is open to misinterpretations and crazy conclusions. For statistics, as for all other complex disciplines, expert knowledge is required to do the actual work. Once results and clarifications are available, the interpretation is often straightforward.