Doubling times, growth rates, and forecasts

A lot of people who are playing with the numbers for COVID19 and coming up with huge death tolls, in the millions or even billions, are missing some key aspects of how infectious diseases and population growth work. Here is a bit more about the dark art of predicting how many people will fall ill from something like this.

The exponential growth phase of any predator (here, the SARS-CoV-2 virus) moving into a new environment is limited by its food source, in terms of both the raw supply and the behavior of that supply (the food supply, in this case, is us). If you want to learn more about that, here is a nice article that describes how this works. The bottom line is that the period of exponential growth for a virus is limited by the total population, by immunity (either existing or developed) in that population, and by changes in the behavior of that population (for humans, things like “social distancing”). So the “curve” always ends up being “S” shaped, following what is called a logistic function or logistic curve. At some point, the thing just runs out of food …
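To make the “S” shape concrete, here is a minimal Python sketch of a three-parameter logistic function. The parameter names (`maximum`, `rate`, `midpoint`) and the numbers plugged in are illustrative assumptions, not values fitted to any real outbreak.

```python
import math

def logistic(t, maximum, rate, midpoint):
    """Cumulative total at time t on a logistic (S-shaped) curve.

    maximum  -- the ceiling the curve levels off at
    rate     -- steepness; early on, growth is roughly exp(rate * t)
    midpoint -- the time of fastest growth, where the curve is at maximum / 2
    """
    return maximum / (1.0 + math.exp(-rate * (t - midpoint)))

# Hypothetical parameters, chosen only to show the shape.
K, r, t0 = 3.0, 0.2, 40.0

for day in (0, 20, 40, 60, 80):
    print(day, round(logistic(day, K, r, t0), 4))

# Far below the midpoint the curve looks exponential, so it has an
# approximate doubling time of ln(2) / rate. That approximation breaks
# down as the curve nears the midpoint and starts to flatten.
early_doubling_time = math.log(2) / r
print("early doubling time (days):", round(early_doubling_time, 2))
```

The point of the sketch is that the same function that looks like runaway exponential growth at the start flattens out on its own once it approaches the ceiling.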

In modeling viral outbreaks, the simplest models just try to figure out the three parameters that describe that curve (the midpoint, the peak rate or shape, and the maximum). In the early days of the outbreak, you can collect data such as mortality, and “fit” the curve to that data to try to estimate the ultimate variable of interest, usually the end total population mortality. More advanced models simulate things like transportation networks, interaction between people, infection rates, development of immunity, etc. These kinds of models are really useful to figure out what is the most effective way of dealing with an outbreak. But the neat thing is that these advanced models usually end up generating a logistic curve. There are theoretical reasons why this works that I won’t bore you with here (sort of like how the central limit theorem and probability bell curves work).
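As a rough illustration of what “fitting” the three parameters can look like, here is a small pure-Python sketch. It is not the method used for the projections in this post: it is one simple approach, assumed here for illustration, that tries a grid of candidate maximums and, for each one, fits the rate and midpoint with a straight-line fit on logit-transformed data. The function names and the synthetic data are hypothetical.

```python
import math

def logistic(t, maximum, rate, midpoint):
    return maximum / (1.0 + math.exp(-rate * (t - midpoint)))

def fit_logistic(days, values, max_candidates):
    """Return (maximum, rate, midpoint) minimizing squared error.

    For a fixed maximum K, ln(v / (K - v)) = rate * (t - midpoint) is a
    straight line in t, so rate and midpoint come from ordinary least
    squares. The best K is then picked from the candidate grid.
    All values must be strictly positive.
    """
    n = len(days)
    mean_t = sum(days) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    best = None
    for K in max_candidates:
        if K <= max(values):
            continue  # the ceiling must sit above every observation
        z = [math.log(v / (K - v)) for v in values]
        mean_z = sum(z) / n
        sxz = sum((t - mean_t) * (zi - mean_z) for t, zi in zip(days, z))
        rate = sxz / sxx
        if rate <= 0:
            continue  # cumulative totals should not be shrinking
        midpoint = mean_t - mean_z / rate
        sse = sum((logistic(t, K, rate, midpoint) - v) ** 2
                  for t, v in zip(days, values))
        if best is None or sse < best[0]:
            best = (sse, K, rate, midpoint)
    if best is None:
        raise ValueError("no candidate maximum fits the data")
    return best[1], best[2], best[3]

# Synthetic, noise-free "early outbreak" data from known parameters,
# observed only before the midpoint (day 40).
days = list(range(10, 36))
values = [logistic(t, 3.0, 0.2, 40.0) for t in days]

candidates = [x / 10 for x in range(5, 101)]  # 0.5 .. 10.0 per 10,000
fitted_max, fitted_rate, fitted_mid = fit_logistic(days, values, candidates)
print(fitted_max, fitted_rate, fitted_mid)
```

With clean synthetic data the known parameters come back out. With real, noisy counts taken early in an outbreak, the maximum is only weakly pinned down, which is one reason these forecasts deserve wide error bars.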

If we look at the data as of this morning (30 March 2020), we can fit the various data sets to logistic functions and see what the future might hold, and how things are progressing in various locations. One of my real pet peeves is when people put raw numbers on a graph that are not “normalized” for population. For example, even though a US state is similar in geography and size to a European country, you can’t directly compare New York (19.5 million people), Italy (over 60 million), and Georgia (10 million) unless you scale for population. The most common way of doing that is to use deaths per 10,000 people. That way you can compare them more directly. I’m not showing the “whole US” numbers, because the US is a very disparate place, with multiple “epicenters”.
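Here is a short Python sketch of that normalization. The populations are the rounded figures quoted above (Italy taken as 60 million); the death count is a made-up placeholder, deliberately the same for all three places, to show how an identical raw number turns into very different rates.

```python
def deaths_per_10k(deaths, population):
    """Scale a raw death count to deaths per 10,000 people."""
    return deaths / population * 10_000

# Populations as quoted in the text (rounded).
populations = {
    "New York": 19_500_000,
    "Italy": 60_000_000,
    "Georgia": 10_000_000,
}

# Hypothetical placeholder, NOT real data: the same raw count everywhere.
raw_deaths = 1_000

rates = {place: deaths_per_10k(raw_deaths, pop)
         for place, pop in populations.items()}

for place, rate in rates.items():
    print(f"{place}: {rate:.3f} per 10,000")
```

The same 1,000 deaths would be six times the rate in Georgia that it is in Italy, which is exactly what a raw-count graph hides.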

Here I’m running a simple model on Italy and Spain, as they are far enough along to see how things are going, and comparing to US states. Here’s today’s plot of the data (points), with several projections (lines):

The solid black line is based on data from around the world as of the first week of March. At that point, we had the China data, but didn’t really trust all of it. We also had limited data from the Diamond Princess. The solid light grey line is derived from the H3N2 outbreak in 2017, but assumes the day of maximum rate occurred five times sooner (in other words, the progression of the outbreak happened five times faster). This line is interesting because it provides context in terms of the final outcome, and because it reinforces the fact that COVID19 is dangerous because it moves so fast. Of course, now we have nearly 20 days more data, and Italy and Spain are much further “down the curve.” If we fit these lines, we end up with two additional estimates of our three parameters. The end point for Italy would seem to be about 2.9 deaths per 10,000 people. For Spain, it is on track to be higher, 3.1 per 10,000. Spain may be a bit high, due to two separate “epicenters” of their outbreak, but let’s stick with what the data says as a boundary. For reference, the end mortality rate for the unvaccinated population of H3N2 was 2.96 per 10,000. The Netherlands and France are along similar tracks. Elsewhere, I think we can say that the reported data from China and Iran are not really reasonable. South Korea is a special case: fast action, prepared health care system.

As you can see from the US state points, we’ve got a variety of things going on.  Washington State, after being the initial epicenter, has done well in limiting the spread.  NOLA scared everyone but is now below the projections – but I suspect that is a reporting artifact and will “jump” back up to the rest of the pack.  NY and NJ are right on track.  I’m having a really hard time believing the Georgia reports.   I suspect munging.

So what does that mean for the US? The US is a big place with weeks separating the exposure times across the country. Some areas will be hit harder than others based on urbanization, how soon and how proactively measures were taken, and how patient folks are in sticking with them. Here are the end values using each of these four estimates, along with an estimate from a complex biological warfare model:

  • Early March COVID Model: 72,820
  • H3N2 Analog: 97,976
  • Italy Curve: 95,990
  • Spain Curve: 102,610
  • TAOS(tm) Eir: 133,215
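The arithmetic behind these totals is just the end rate per 10,000 scaled up to the whole country. The sketch below assumes a US population of 331 million; that figure is not stated in the post, but it reproduces the Italy, Spain, and H3N2 totals from the rates quoted earlier (2.9, 3.1, and 2.96 per 10,000). The back-calculated rate for the early March model is implied by that same assumption rather than quoted anywhere.

```python
# Assumed US population; not stated in the post, chosen because it
# reproduces the listed totals from the quoted per-10,000 rates.
US_POPULATION = 331_000_000

def deaths_from_rate(rate_per_10k, population=US_POPULATION):
    """Projected total deaths for an end mortality rate per 10,000."""
    return round(rate_per_10k * population / 10_000)

def implied_rate(total_deaths, population=US_POPULATION):
    """Back out the per-10,000 rate implied by a projected total."""
    return total_deaths * 10_000 / population

# End rates quoted in the text.
print("Italy curve:", deaths_from_rate(2.9))
print("Spain curve:", deaths_from_rate(3.1))
print("H3N2 analog:", deaths_from_rate(2.96))

# Rate implied by the early March model total, under the same assumption.
print("Early March implied rate:", round(implied_rate(72_820), 2))
```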

Dr. Anthony Fauci on CNN’s “State of the Union” Sunday talked a bit about this and CDC’s  internal models:

Whenever the models come in, they give a worst-case scenario and a best-case scenario. Generally, the reality is somewhere in the middle. I’ve never seen a model of the diseases that I’ve dealt with where the worst case actually came out … They always overshoot. I mean, looking at what we’re seeing now, you know, I would say between 100 and 200,000 (deaths).  But I don’t want to be held to that.

All I can say is I’m with Dr. Fauci: I don’t want to be held to any of this either 😛

What does all this mean to you personally? To repeat: take this seriously, follow the CDC guidelines, limit interactions outside your immediate household (aka social distancing), keep strict hygiene protocols, and otherwise do everything you can to try to slow down the rate of spread. It’s more than likely not about you. It’s about that 1% or so of the population who will get very sick, and may not get enough care because the system will be overloaded. Don’t focus on the numbers, just take care of yourself, your family, and your neighbors, and in three or four weeks the worst should be over.

Don’t be scared by the numbers or media terms like “skyrocketing” and the heartbreaking individual stories. As you can see from the curves, that’s a natural part of the process. I understand the sensitivity around comparing COVID to influenza because it is moving so much faster, but as horrible as this is going to get for our health care professionals, the “good” news is that, in terms of whole-population mortality rate, it’s not all that different. The 2017 flu season probably killed 61,000 (1.87/10,000 whole pop, 2.96/10k unvaccinated). In the late 1990s, several years had rates well above 3/10k (1998 was 3.46, or with today’s population, 115 thousand deaths). As I have said, it’s not that we are taking COVID19 too seriously, it’s that we don’t take influenza seriously enough most of the time.

3 thoughts on “Doubling times, growth rates, and forecasts”

  1. I greatly appreciate your analysis. You may have seen the IHME projections that are making the rounds where they show peak incidence and mortality in April and May in the U.S., with things petering out in July *assuming we maintain a high state of lockdown.* At the same time, their numbers show only 3% of the population will have been exposed by then. Are you anticipating a series of ever-shrinking logistic curves as we go through cycles of lockdown then semi-resumption of normal activity? Are there any comparable examples? It’s unclear to me whether we have any better choices than “let it burn through everyone to build up enough herd immunity” or “stay locked down until we have a vaccine deployed.”

    Thanks again.

    • It’s an interesting question. The statistical approaches (such as logistic curves) for mortality are really independent of questions like percent of population exposed, infected, etc. All they are doing is fitting those observed rates and projecting them. In the more advanced models that are actually simulating the spread of the virus, it’s complicated. There is a fairly large range of possible exposure and infection rates in the literature, and that gets you a huge range of end scenarios. For example, I’m using much higher exposure and infection rates, but lower symptomatic and mortality rates, and getting similar results.
