Coronavirus, gaining some perspective on the descriptive statistics reported by news media
Updated: Apr 8, 2020
The tidal wave of descriptive statistics reported by the media fails to clarify the severity of coronavirus. There are two statistical measures we hear all the time: the aggregate number of cases and the aggregate number of deaths, each reported worldwide, by country, and by region or state. These statistics should give rise to other, perhaps more informative, questions about the data. Here's one: do these aggregate, cumulative numbers give us the best picture of the effects of coronavirus? Not necessarily. There are other statistical data that should concern us.
Note that aggregate data accumulate, so from one day to the next the number of confirmed cases and the number of deaths will, of course, increase. The aggregate number stays the same or decreases only when a confirmed case or death is later disconfirmed: for example, when a person believed to have COVID-19 turns out to have the flu, or when a death attributed to COVID-19 turns out to have resulted from complications of diabetes. We should not be surprised, then, when the cumulative numbers of confirmed cases and deaths reported in the news increase.
The aggregate numbers, therefore, are not giving us a good barometric reading on the status of the coronavirus. Where should we be looking? The first data we should be concerned with are the daily numbers of new cases globally and in our own country. Comparing the two gives us a point of comparison between the spread of the virus in our country and its spread in the rest of the world. Moreover, we should compare the number of new cases in our own country or region from one day to the next. If the number of new cases holds steady or drops from one day to the next (which we haven't seen in the US), then the social distancing measures we have instituted are working. If the number of new cases continues to increase even though the population is largely following those measures, then medical practitioners will likely need to return to the question of how the virus is transmitted.
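The step from cumulative totals to daily new cases, and then to the day-to-day change in new cases, can be sketched in a few lines of Python. This is only an illustration: the cumulative counts below are made up for the example and are not real data, and the function names are my own.

```python
def daily_new_cases(cumulative):
    """Turn a list of cumulative totals into a list of daily new cases."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]


def day_over_day_change(new_cases):
    """Difference in new cases from one day to the next (negative means fewer)."""
    return [today - yesterday for yesterday, today in zip(new_cases, new_cases[1:])]


# Hypothetical cumulative confirmed cases over six days (not real data).
cumulative = [100, 150, 220, 300, 370, 430]

new_cases = daily_new_cases(cumulative)
changes = day_over_day_change(new_cases)

print("cumulative:", cumulative)   # always rising
print("new cases: ", new_cases)    # [50, 70, 80, 70, 60]
print("change:    ", changes)      # [20, 10, -10, -10]
```

The cumulative series rises every single day, yet the last two entries of the change series are negative: the number of new cases is falling. That is the signal the aggregate number hides.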
Let's think about the above in the context of the number of coronavirus cases and deaths in the U.S. The total number of confirmed coronavirus cases surpassed 386,000 yesterday, 6 April 2020. That number is difficult to comprehend when we hear it, and our immediate reaction is emotional. We are upset that so many people are infected, and we likely fear being infected ourselves because there are so many cases. The total population of the U.S. is approximately 330,000,000, so the total number of confirmed cases has just passed 0.1% of the country's population. That is 1/10 of 1%. Tragic though that is, the number of cases would need to reach 3,300,000 to equal 1% of the total population.
Next, let's consider the number of deaths from coronavirus. Let me reiterate that any death resulting from coronavirus, which is an entirely preventable virus (this is something that I may discuss later!), is terrible, especially for those who are affected directly by the death of a family member or friend. So, when I talk about thinking through the data rationally, I am merely trying to get some perspective on the aggregate numbers that tend to evoke fear or anger in the population. There have been a little over 11,000 deaths in the U.S. as of 6 April 2020, which is roughly 0.003%, or about 3/1000 of 1%, of the population. Very bad news, yes, but the embellishments of the 24-hour news cycle fail to fit the data. All of this, for the moment, is relatively good news, and the sincere hope is that the number of deaths doesn't increase greatly. The reality, however, is that the number of deaths will continue to rise.
When we drill down to the statistics for my current residence, Connecticut, USA (a relatively good comparison for my residency in New Zealand), we find a stark contrast between the cumulative numbers of the U.S. and this particular region. After about four weeks of soft isolation, we have a good sense of how problematic coronavirus is for the state. In a population of 3,500,000, there have been 6,900 confirmed cases of COVID-19, which is 0.2% of the total population of the state, and 206 deaths, or 0.006% of the total population of the state. Again, you hate to see anyone die from a preventable virus, but when we put things into perspective that number is relatively low. The concern is that we stop social distancing, return to normal business, and see the number of confirmed cases, and thus deaths, increase again.
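For anyone who wants to check the percentages for the U.S. and Connecticut, here is a small Python sketch of the arithmetic. The counts are the ones quoted in this post. The U.S. population of roughly 330 million is an assumption I supply for the example (an approximate 2020 figure), and the function name is my own; substitute whatever population figure you prefer.

```python
def percent_of_population(count, population):
    """Return count expressed as a percentage of population."""
    return 100.0 * count / population


US_POPULATION = 330_000_000  # assumption: approximate 2020 U.S. population
CT_POPULATION = 3_500_000    # Connecticut figure used in this post

# (count, population) pairs; counts are the ones quoted in this post.
figures = {
    "U.S. confirmed cases": (386_000, US_POPULATION),
    "U.S. deaths": (11_000, US_POPULATION),
    "Connecticut confirmed cases": (6_900, CT_POPULATION),
    "Connecticut deaths": (206, CT_POPULATION),
}

for label, (count, population) in figures.items():
    share = percent_of_population(count, population)
    print(f"{label}: {share:.3f}% of the population")
```

Run as written, this gives about 0.117% and 0.003% for the U.S. and about 0.197% and 0.006% for Connecticut. The conclusion is not sensitive to the exact population figure: the shares stay at roughly a tenth of a percent for cases and a few thousandths of a percent for deaths.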
Social distancing has been working in Connecticut, just as it has been in other parts of the world. The data reported by state health officials yesterday seem to indicate that we have plateaued, i.e., “flattened the curve.” Still, at roughly 1,000 new cases per day, that flattening comes at a great price. It would be great if we saw a reduction in the number of new cases, which would suggest the peak has been reached. Models, however, suggest that the peak will occur in Connecticut at the end of April and beginning of May.
Before we permit data to evoke fear and anger, let's pause and think carefully about the data set. Do not let the data set control your emotions.
Following publication of this blog post, I caught Nate Silver's insightful article "Coronavirus Case Counts Are Meaningless" of 4 April 2020. He raises some excellent points. First, the data collected on coronavirus are highly incomplete. For example, different countries collect data in different ways, and because of that the data likely aren't telling us the whole story. FiveThirtyEight reports:
[D]ata on tests and the number of reported cases is highly nonrandom. In many parts of the world today, health authorities are still trying to triage the situation with a limited number of tests available. [...] [I]f you're not accounting for testing patterns, it can throw your conclusions entirely out of whack. You don't just run the risk of being a little bit wrong: Your analysis could be off by an order of magnitude. Or even worse, you might be led in the opposite direction of what is actually happening. A country where the case count is increasing because it's doing more testing, for instance, might actually be getting its epidemic under control.
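To see how a rising case count and an epidemic coming under control can go together, consider a toy example in Python. The numbers are entirely made up, and looking at the share of tests that come back positive is just one simple way I have chosen to account for testing volume; it is not a method taken from Silver's article.

```python
def positivity_rates(positives, tests):
    """Share of tests that came back positive on each day."""
    return [p / t for p, t in zip(positives, tests)]


# Hypothetical figures for three days (not real data): testing doubles each day.
tests_run = [1_000, 2_000, 4_000]
new_cases = [200, 300, 400]

rates = positivity_rates(new_cases, tests_run)

for day, (cases, rate) in enumerate(zip(new_cases, rates), start=1):
    print(f"day {day}: {cases} new cases, {rate:.0%} of tests positive")
```

In this toy data the number of new cases doubles over three days, but only because four times as many people were tested. The share of positive tests falls by half, which points in the opposite direction from the raw count.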