Time to be blunt: the data sucks.

Good information is vital for decision making. With respect to the ongoing pandemic, leaders and internet experts alike in the US are analyzing the numbers, doing projections, and making plans, while news outlets scare the poop out of people with “death counters” and “breaking news” of new numbers. There is just one problem: all of this is based on numbers that are so noisy they are almost useless. And there are powerful political forces in both “tribes” driving the data into places it doesn’t want to go, something much easier to do with bad data.

Yes, often you have to work with data that is noisy. Experts do it all the time. But I can’t recall seeing so many analyses, decisions, and opinions on vital subjects rest on data that is so crappy, often rendered by people with little to no experience in dealing with noisy data.

Take a look at the latest mortality data charts.  First, here is the country level plot:

… and here is the US state level plot:

Notice the sharp breaks and jumps, especially in the last few days? In most cases, those are reporting issues (weekends tend to be lower than weekdays due to recording delays, as one example) and/or changes in the criteria used to classify deaths. And this should be the simplest and most reliable data we have: the whole-population mortality rate. Almost all deaths are recorded eventually in some way. The rate depends only on a fixed denominator (total population) and a numerator (deaths believed to be caused by COVID-19). Of course, that numerator is fuzzy: as we saw in China in January, and now in the US, when the definition of a COVID-19 related death is changed from “died and had tested positive” to “died, tested positive, and/or had symptoms that look like COVID-19”, the numbers jump. Neither number is great, but at least there is some professional opinion or real data (a test) that indicates the death might be COVID-19 related. The problem is that if you change the definition, you have to apply the change retrospectively, and that isn’t always done consistently, if at all.
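To make the retrospective problem concrete, here is a minimal Python sketch. Every number in it is made up purely for illustration, and the names (`confirmed_daily`, `probable_daily`, `CHANGE_DAY`, `reported_series`) are hypothetical, not anything a real agency uses. It compares a series where a broader definition is applied back to day one with a series where the backlog of newly counted deaths lands on the day the definition changes.

```python
# All values below are hypothetical, chosen only to illustrate the effect.
confirmed_daily = [10, 12, 15, 18, 20, 22, 25]  # deaths with a positive test
probable_daily = [3, 4, 4, 5, 6, 6, 7]          # deaths counted only under the broader definition
CHANGE_DAY = 4                                  # assumed day (0-based) the broader definition takes effect


def reported_series(confirmed, probable, change_day, retroactive):
    """Return the daily deaths as they would appear in a published series."""
    if retroactive:
        # Broader definition applied consistently to every day.
        return [c + p for c, p in zip(confirmed, probable)]
    series = []
    for day, (c, p) in enumerate(zip(confirmed, probable)):
        if day < change_day:
            series.append(c)  # old definition only
        elif day == change_day:
            # Backlog of earlier probable deaths is dumped on the change day.
            series.append(c + sum(probable[: day + 1]))
        else:
            series.append(c + p)
    return series


consistent = reported_series(confirmed_daily, probable_daily, CHANGE_DAY, True)
with_jump = reported_series(confirmed_daily, probable_daily, CHANGE_DAY, False)
print("retroactive:    ", consistent)
print("not retroactive:", with_jump)
```

Both series add up to the same total, but the second one shows a one-day spike that has nothing to do with the disease itself.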

That’s bad enough. Trying to calculate the Case Fatality Rate (CFR), a number lots of folks are tossing around, is pointless. The CFR depends on the definition of a “case”, and that depends on either testing or the individual being sick enough to seek care and being classified as a probable COVID-19 patient based on symptoms. But it’s worse than that: who is tested or classified is a state level and even local level decision. New York has taken a very expansive definition; Florida has taken a very restrictive one. In each case, politics are probably a factor (and, from a scientific standpoint, both are probably wrong in different ways, but that is a complex discussion).

Given the well-known problems with testing in the US, not to mention the fact that there are apparently lots of minimally symptomatic or even asymptomatic cases, and different definitions of a “case,” the number of cases is not known to within a factor of 10, and likely even worse than that. In math terms, the numerator (fatal cases) is fuzzy, and the denominator (total cases) is essentially unknown.
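A quick Python sketch shows how much that unknown denominator matters. The death and case counts here are hypothetical round numbers, not real figures, and `case_fatality_rate` is just an illustrative helper. The only thing that changes between lines of output is the assumed undercount of cases.

```python
def case_fatality_rate(deaths, cases):
    """CFR = fatal cases / total cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical round numbers, for illustration only.
reported_deaths = 1_000
reported_cases = 20_000

# Assume the true number of cases is some multiple of the reported count.
for undercount in (1, 2, 5, 10):
    cfr = case_fatality_rate(reported_deaths, reported_cases * undercount)
    print(f"true cases = {undercount:>2}x reported -> CFR = {cfr:.2%}")
```

With the same deaths, a factor of 10 uncertainty in cases moves the computed CFR by a factor of 10 as well, and that is before accounting for any fuzziness in the numerator.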

The heart of the problem is that the rate of new cases, and how those cases are being created, is vital for decisions about when to implement and lift mitigation measures like social distancing, business shutdowns, travel restrictions, etc. Since the data is so bad, those decisions are likely to be bad. And, from what we are seeing, that is exactly what they are.

So in short, the US response to COVID-19 was and remains doomed to be more disruptive, with more deaths, and with potentially catastrophic economic repercussions, because the US system cannot seem to create objective, reliable, trusted data.   There is only one possible response:

