A Guide to Understanding Pandemic Predictions
Epidemiologists offer eight tips for reading mathematical models
Co-authored by Ellie Murray
Mathematical modeling of the spread of SARS-CoV-2 and Covid-19 has received a lot of attention, for better and for worse. Modeling is a key tool scientists use to predict outcomes when data are unavailable or difficult to collect, or when the time frame for understanding or taking action is short. Covid-19 is a prime example of a scenario where all of these are true. As a result, much of our preliminary understanding of the pandemic has stemmed from predictions made by mathematical models.
As mathematical modelers, we have been closely following these efforts to understand the pandemic with models. Our academic research involves developing detailed mathematical models of HIV/AIDS transmission and progression, and understanding how these models can be used to learn about cause-and-effect relationships. What we have learned in the course of this work may help you better understand the applications, and limitations, of mathematical models for Covid-19.
Mathematical models are a tool for synthesizing what we know to be true, what we suspect may be true (based on our knowledge of similar diseases), and expert best guesses about key unknowns. As a result, no model is ever expected to be perfectly true. The accuracy of models can also vary greatly, depending on the level of expert knowledge used to build them. For example, early in the Covid-19 pandemic, graphs of unrestricted exponential growth were circulating widely on social media. These captured public attention and, in some cases, reasonably described the trajectory of pandemic spread in the short term, but were clearly wrong about long-term forecasts — unrestricted exponential growth implies cases would rapidly surpass the entire population of the planet.
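To see why, consider a quick back-of-the-envelope calculation. The sketch below assumes 100 initial cases and a doubling time of three days; both numbers are illustrative, not Covid-19 estimates.

```python
# Unrestricted exponential growth quickly exceeds the world's population.
# Illustrative assumptions: 100 initial cases, cases double every 3 days.
WORLD_POPULATION = 7.8e9

cases, day = 100, 0
while cases < WORLD_POPULATION:
    cases *= 2  # one doubling...
    day += 3    # ...every three days

print(f"Cases would pass the world's population around day {day}.")
# Prints roughly day 79, which is why unrestricted growth cannot be a
# sensible long-term forecast, however well it fits the first few weeks.
```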
Don’t rely on models to tell you what you can easily see
In the early months of 2020, we did not need a model to tell us that a Covid-19 outbreak in the United States was going to be devastating. Simply watching what was happening in China and Italy was enough to show that if we didn’t change our behavior and start responding as a nation, our health care systems would be overwhelmed. Instead, models created early in the outbreak were key for determining how much our hospitals would be overburdened, as well as which areas of the country might expect to see an outbreak first. These models helped assess what steps to take to reduce the impact of the outbreak.
Model outputs are only as good as model inputs
When evaluating models to determine whether or not to take an action, it's important to remember that all models rely on assumptions and knowledge about the world. Simplifying reality is really complicated. The ultimate goal of mathematical modeling is to find a way to represent real life with equations. Modelers must find ways to represent the messy, chaotic, and complicated nature of human behavior and the biology of transmission by making assumptions about, for example, how frequently people come into contact with one another and how easily the virus infects a person who is exposed to it.
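As a rough illustration of how such assumptions become equations, here is a minimal sketch of a classic SIR (susceptible-infectious-recovered) model. The contact rate, transmission probability, and recovery rate below are placeholders, not Covid-19 estimates.

```python
# A minimal discrete-time SIR model. Every parameter encodes an assumption
# about behavior or biology; none of these values is a Covid-19 estimate.
def sir_step(s, i, r, contacts_per_day=10, p_transmit=0.05, p_recover=0.1):
    """Advance the epidemic by one day in a population of s + i + r people."""
    n = s + i + r
    new_infections = contacts_per_day * p_transmit * s * i / n
    new_recoveries = p_recover * i
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

s, i, r = 999_990, 10, 0  # hypothetical population with 10 initial cases
for day in range(120):
    s, i, r = sir_step(s, i, r)

print(f"After 120 days: {i + r:,.0f} people ever infected")
```

Change any one of those inputs (say, halve the contact rate to represent physical distancing) and the projected epidemic changes with it.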
Ask yourself “who,” “what,” “where,” and “when”
Even the best mathematical model cannot describe all of reality. When reviewing a model, ask yourself: Who is the model for? Where and when were the model inputs obtained? What is the specific goal of this model? For example, a model that uses data from Iceland to estimate the number of unrecognized infections (based on data obtained from widespread SARS-CoV-2 testing) may be very useful for understanding how much testing should be conducted in Canada or Finland, but less useful for understanding how much testing should be conducted in New York City or South Africa.
There are likely social factors, such as population density and time spent indoors, as well as environmental factors, such as temperature and humidity, that vary between locations. If these factors influence how SARS-CoV-2 transmits between individuals, then they will also influence how useful a model built in one context is when applied to a different context.
Be skeptical of a model that surprises the experts
Mathematical models are a way of summarizing expert knowledge on a topic to make numeric predictions about what might happen in the future if we do, or do not, take certain actions. As a result, model results should generally fall within the range of expert best guesses about the future. Consider what happens when people are asked to guess how many beans are in a jar: it's very unlikely that any one individual will pick exactly the right answer, but the average of all guesses will often be very close to the truth. A model works much like that averaged guess, pooling many pieces of expert knowledge, so its predictions should rarely stray far from the collective expectation.
Model results can also be highly sensitive to model inputs in predictable ways. For example, a recent Covid-19 model from Oxford University predicted that half the U.K. population was already infected with SARS-CoV-2, a result that surprised many. However, upon closer inspection of the model, it became clear that this result followed directly from one particular input value: the proportion of the population vulnerable to severe disease (that is, likely to be hospitalized). The modelers had set this value to 1 in 1,000. Since reported cases are dominated by people sick enough to come to medical attention, each reported case then implied roughly 1,000 infections, and the model estimated the actual number of cases at 1,000 times the number reported.
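The arithmetic behind that result is simple enough to sketch directly. The reported case count here is hypothetical; only the 1-in-1,000 input comes from the model.

```python
# Back-calculating total infections from reported cases under one assumption.
reported_cases = 10_000  # hypothetical count of reported (severe) cases
p_severe = 1 / 1_000     # the model's input: 1 in 1,000 infections is severe

implied_infections = reported_cases / p_severe
print(f"{implied_infections:,.0f} implied infections")  # 10,000,000, i.e. 1,000x reported
```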
If a model returns very surprising results, that is a sign something in the model might be going wrong. But a surprising result is not automatically a wrong one. Like anyone, experts can be swayed by their own preconceptions and fail to recognize the important consequences of their knowledge. For example, the Imperial College model estimated that, in the absence of any control measures or changes to behavior, over 80% of the population in Great Britain and the United States would be infected. The initial response to these predictions was shock and disbelief, but most in the scientific community do not believe they are wrong, and the model encouraged governments to take measures to alter the course of the epidemic.
Being wrong is a part of the process
Mathematical models are never perfect. If we understood everything there was to know about a system, there would be little need to model it. In addition, our knowledge of Covid-19 is evolving on a daily basis. Responsible modelers will recognize this uncertainty and incorporate it into their modeling process by conducting and reporting sensitivity analyses and by regularly updating and revising their model structure and inputs to reflect new knowledge. Sensitivity analyses are a way of understanding how changes to the model inputs change the model outputs. For example, with the Oxford model, an important sensitivity check would have been to vary the proportion of infectious people who display symptoms from, say, 100% to 0.0001% and see how many Covid-19 cases were forecast under each scenario. The most useful models will report a range of results under key input ranges.
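A one-way sensitivity analysis of that kind can be sketched in a few lines. As before, the reported case count is hypothetical, and the proportions are chosen only to span a wide range.

```python
# One-way sensitivity analysis: vary a single uncertain input across a wide
# range and watch how the output moves. All numbers are illustrative.
reported_cases = 10_000  # hypothetical count of reported (symptomatic) cases

for p_symptomatic in [1.0, 0.5, 0.1, 0.01, 0.001, 0.000001]:
    implied_infections = reported_cases / p_symptomatic
    print(f"if {p_symptomatic:.6%} of infections show symptoms: "
          f"{implied_infections:,.0f} implied infections")
```

The output spans six orders of magnitude, which is exactly the point: a reader should be told how much a forecast hinges on inputs like this one.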
Being right is also part of the process
When developing a mathematical model, a crucial step in the process is comparing the results of the model to some known data. If the model cannot replicate what we can observe in the real world, then we should not trust it to predict what we might see in the future or after we take some action. The process of comparing the model to the real world is called model validation. Typically, model outputs don’t match the real world on the first attempt, and modelers proceed to a step called model calibration. This step involves tweaking the model structure or input values until the model outputs do match real-world data. As with the model creation step, it’s important to think about who, where, when, and what were used as validation and calibration targets — that is, what do the real-world data that the model is matching actually tell us, and to whom do those data apply.
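As a rough illustration, calibration can be as simple as adjusting one uncertain input until a model output matches an observed target. Everything in this sketch, including the fatality ratio and the death count, is hypothetical.

```python
# A toy calibration: tune an unknown input (total infections) until the
# model's output (deaths) matches a real-world validation target.
def model_deaths(infections, infection_fatality_ratio):
    return infections * infection_fatality_ratio

observed_deaths = 500  # hypothetical validation target from real-world data
ifr = 0.01             # assumed infection fatality ratio
infections_guess = 10_000

# Rescale the guess until modeled deaths are within one death of the target.
while abs(model_deaths(infections_guess, ifr) - observed_deaths) > 1:
    infections_guess *= observed_deaths / model_deaths(infections_guess, ifr)

print(f"Calibrated input: {infections_guess:,.0f} infections")  # 50,000
```

Real calibration uses many targets and formal fitting methods, but the logic is the same: the model earns trust by reproducing data we can already see.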
Models are warnings, not prophecies
Mathematical models are used for understanding trends in disease transmission and patterns related to prevention and treatment. Be wary of any modeler who claims to predict the exact number of cases or deaths in a particular week or month. Models are generally expected to get the order of magnitude of their predictions right, but not the exact number.
Furthermore, because models tell us what could happen if we do or do not act in a certain way, it’s important to remember that if we take a different set of actions in the real world, we do not expect the model results to actually predict what will happen. Importantly, this does not imply that the model was wrong.
For example, if I tell you that it is going to rain tomorrow and that you will get wet if you do not take an umbrella with you on your walk, this does not become a false statement if you go for a walk with an umbrella and therefore do not get wet in the rain. Similarly, if a Covid-19 model predicts that the absence of any interventions to prevent SARS-CoV-2 spread could result in 1 million to 2 million deaths in the U.S., this does not become a false statement when we intervene with lockdowns to reduce disease transmission and thus observe far fewer deaths.
Remember the tortoise and the hare
Among mathematical modelers, there is always a rush to get the first model out and make predictions quickly. This comes from the competitive nature of academia, the popularity of modeling in science today, and the utility of model predictions early in an epidemic. Models are most useful when we don't have good data but need to take rapid action, so they can be most valuable early in an outbreak. However, because of the lack of good data, early models generally rely on more assumptions and often use more inputs from similar diseases rather than from the disease of interest.
As a result, their predictions carry more uncertainty than models built later in an epidemic. As with the proverbial tortoise and the hare, a rapidly created model may be preferred in the short term, but a model built slowly and steadily, incorporating the best knowledge as it becomes available, will generally be preferable in the long term. We should be careful not to choose a favorite model early in an epidemic and dismiss all others.