A lot of people are trying to prove a lot with very bad numbers for the current outbreak. This is a great time to reconsider statistics and their limits. There are three concepts that people often fail to consider when using statistics.

Intervening variables: When a person is exposed to the virus, we think a certain percent will get it and a certain percent will die from it. An intervening variable changes that direct relationship. The effect of COVID-19 depends on age, health history, and duration of exposure (among other factors). Without considering these factors, the predictions can be wildly unreliable.
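To make the point concrete, here is a minimal sketch of how one intervening variable (age) can change a prediction. Every number in it is invented for illustration and is not an estimate of real COVID-19 risk; the names `risk_by_age`, `expected_death_rate`, and the two populations are hypothetical.

```python
# Hypothetical fatality risk per infected person, by age group.
# These values are made up for illustration, not real estimates.
risk_by_age = {"under 50": 0.002, "50 to 69": 0.02, "70 and over": 0.10}


def expected_death_rate(age_mix, risk_by_age):
    """Weighted average of group risks.

    age_mix maps each age group to its share of cases; shares must sum to 1.
    """
    if abs(sum(age_mix.values()) - 1.0) > 1e-9:
        raise ValueError("age shares must sum to 1")
    return sum(share * risk_by_age[group] for group, share in age_mix.items())


# Two hypothetical populations with the same per-group risk but different age mixes.
younger_population = {"under 50": 0.7, "50 to 69": 0.2, "70 and over": 0.1}
older_population = {"under 50": 0.4, "50 to 69": 0.3, "70 and over": 0.3}

for name, mix in [("younger", younger_population), ("older", older_population)]:
    print(f"{name} population: {expected_death_rate(mix, risk_by_age):.2%}")
```

With these made-up inputs the two populations come out at 1.54% and 3.68%, even though the risk within each age group is identical. A single overall rate that ignores age would be wrong for both.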

Proportion: With COVID-19, people are watching the infection rate and the mortality rate. Both are, implicitly or explicitly, proportions: a numerator (the top number) divided by a denominator (the bottom number), for example deaths divided by cases. A proportion depends on two numbers, and either one can be incorrect.

In an early press conference, President Trump did a poor job of explaining the problem of proportions. At that time, US deaths divided by known cases placed the mortality rate at 6.5%, well above the world average. By arguing that people contract COVID-19 and don’t seek help (and, worse, go to work), he left people with the unsettling idea that the disease was much more widespread. But increasing the number of people infected increases the denominator, so the proportion changes and the apparent mortality rate goes down. COVID-19 looks much less harmful as more people are included in the base.
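The arithmetic behind that argument can be sketched in a few lines. The counts below are hypothetical, chosen only so that the starting rate matches the 6.5% figure above; the function name `apparent_mortality_rate` and the "uncounted infections per known case" ratios are assumptions for illustration.

```python
def apparent_mortality_rate(deaths, cases):
    """Deaths divided by cases, expressed as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100.0 * deaths / cases


# Hypothetical counts, picked only so the first rate equals 6.5%.
deaths = 65
known_cases = 1_000

# Assumption: for every known case, some number of infections go uncounted.
for uncounted_per_known in (0, 1, 4, 9):
    total_cases = known_cases * (1 + uncounted_per_known)
    rate = apparent_mortality_rate(deaths, total_cases)
    print(f"{uncounted_per_known} uncounted per known case: {rate:.2f}%")
```

The number of deaths never changes here, yet the apparent rate falls from 6.50% to 3.25%, 1.30%, and 0.65% as the denominator grows. Only the base moved.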

The United States has had only XXXX cases of the virus, compared to XXXX in Italy or XXXX in China. Part of the problem is that we are counting known cases, which is a measurement error issue. Without enough testing, we have less accurate information about the number of cases. With more testing, we will definitely have more KNOWN cases. With more known cases, we will identify both more attributable deaths (numerator) and more active or cured cases (denominator).

Random error: It is possible that the variance between people within a group is greater than the variance between groups. The fact that an older person may be more susceptible to the virus does not mean that some younger people cannot be profoundly affected. Random error is the unknown in any prediction. It can come from a poorly considered intervening variable, measurement error, or just the mood of the person in the predictive model. It is often ignored but must be feared and respected.
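As a rough illustration of variance between people exceeding variance between groups, the sketch below splits the total variance of some severity scores into a between-group part and a within-group part. The scores are invented purely for illustration, and the function names are hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical severity scores on a 0-10 scale, invented for illustration.
younger = [1, 2, 8, 1, 3]
older = [3, 9, 2, 4, 7]


def between_group_variance(groups):
    """Variance of the group means around the grand mean, weighted by group size."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / len(all_values)


def within_group_variance(groups):
    """Average variance inside each group, weighted by group size."""
    n = sum(len(g) for g in groups)
    return sum(len(g) * pvariance(g) for g in groups) / n


groups = [younger, older]
print("between groups:", between_group_variance(groups))
print("within groups: ", within_group_variance(groups))
```

In this made-up data the older group does score higher on average (5 versus 3), but the spread inside each group is far larger than the gap between them. Knowing someone's group says little about that individual.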

A business looking for accurate, reliable information must actively consider where its numbers do not tell the whole story. Adjusting for error in the model may make the predictions more complex, but also more accurate.