Friday, May 1, 2009

An Agent-Based Model for Influenza A (H1N1)

“So the final lesson of 1918, a simple one yet the one most difficult to execute, is that those who occupy positions of authority must lessen the panic that can alienate the members of a society. A society that takes as its motto, "every man for himself" is no longer a civilized society. Those in authority must retain the public's trust. The way to do that is to distort nothing, to put the best face on nothing, to try to manipulate no one. Lincoln said that first and best. A leader must make whatever horror exists concrete. Only then will people be able to break it apart.”
--John Barry, The Great Influenza


Update: In this article I go over the basic design of an ABM for flu. In the next article I discuss some experiments I ran on the model. And now you can actually run the model.
  • How does an epidemic like the new H1N1 strain spread?
  • What factors turn an epidemic into a pandemic?
  • Why doesn't everyone get it?
  • Can hand-washing and face-masks really prevent the spread of disease or do they just slow it down?
  • Why are some communities devastated and others emerge almost unscathed?

Over the last weekend, as news about the new Influenza A H1N1 virus spread, I went into personal crisis management mode, i.e. I stressed out. An influenza that primarily kills healthy younger people is troubling, and the early reported death rates seemed to confirm that we were faced with a flu that was both virulent and unfamiliar to our immune systems. I've built agent-based models of pathogen epidemics and so have some sense of the dynamics involved. And a few years back I'd read John Barry's brilliant book and come away from that with a greater sense of the uniquely challenging aspects of influenza and how political failures in the face of pandemic disease can cost countless lives and plunge whole societies into terror and dysfunction. (By the way, it turns out that George Bush actually did read a few worthwhile things in office.) This new strain looked like the one that people have been worrying about. But sitting around feeling vaguely uneasy isn't very satisfying. So, I thought that it might be helpful and interesting to build an agent-based model (ABM) and share it with people. Agent-based modeling is a great tool for helping people to understand complex phenomena -- and influenza is a very complex phenomenon.

An important caveat: This isn't a perfect model, and I'm not an expert in epidemiology. So as in everything, question assumptions, trust your own critical judgement, and find primary sources. On the other hand, people make policy and personal decisions based on far worse models than this. And, as my friend and mentor Joshua Epstein has often pointed out, we all have models, but unfortunately most of the time they are implicit -- full of hidden assumptions and inconsistencies. So while this model isn't perfect, it is better than one based on paranoia, anecdote or broad generalization. Because Agent-Based Models (ABMs) map directly to a mental model of how the various parts fit together but add rigor, transparency and the ability to test hypotheses, people who aren't experts in esoteric math can understand and trust them. And because they work over space and time, we get to see the dynamics: how a disease spreads, not just what a given end-state looks like.

While we're talking about Swine Flu because of current concerns, this is actually a quite general model that applies to many different kinds of pathogens. Joshua Epstein, Derek Cummings, Shubha Chakravarty, Ramesh Singa and Donald Burke developed and researched the first ABMs of pathogens, and this model draws on that effort. You can read about the smallpox model alone in Toward a Containment Strategy for Smallpox Bioterror: An Individual-Based Computational Approach, but I'd recommend Chapter 12 in Josh's Generative Social Science as it goes into great detail about the overall context, and the other chapters will give you a lot more to chew on. In all of the model runs below, we've made no attempt to calibrate the model to an actual disease, so while it fits qualitative influenza dynamics well, don't expect it to match quantitative results.

While I've addressed the prior work that I'm aware of, there may be important pieces I've missed. If I've omitted an attribution please let me know. Similarly, any errors are my own and should not reflect on the work of others that I mention here.

Traditional Model
A classic model for epidemics is the compartment model. The basic idea is to create a set of variables representing the number of people in different disease-related categories. In the simplest "SIR" model we have a variable each for the number of individuals who are Susceptible, Infectious, and Recovered. A set of equations represents the rate at which these numbers change over time. When you make some (not-insignificant) assumptions and work these equations out, you get the basic outcome on the right. The green curve represents the number of people infected at any given time. (Unfortunately, the graphic I found for this doesn't match up with the colors we use later in the model.) Over time, as people move from susceptible to infected and then recover, we get two classic "S-curves" as the population as a whole moves out of the susceptible state and later into the recovered state. This kind of curve is found throughout natural and social systems. For example, sales of a new category of products (VHS machines being the classic example) often look like the red curve. In the case of contagions, the rate of new infection moves quite slowly at first, accelerates steeply as more infected people create even more infections, and then falls off as the number of people infected reaches a kind of saturation level.
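To make the compartment idea concrete, here is a minimal sketch of the SIR equations stepped forward in time with simple Euler integration. The function name, the rate parameters `beta` and `gamma`, and every default value are assumptions chosen for illustration; none of them come from the graphic or from any real disease.

```python
def simulate_sir(population=1000.0, initial_infected=1.0,
                 beta=0.3, gamma=0.1, days=200, steps_per_day=10):
    """Integrate the SIR equations with forward Euler steps.

        dS/dt = -beta * S * I / N
        dI/dt =  beta * S * I / N - gamma * I
        dR/dt =  gamma * I

    beta (transmission rate) and gamma (recovery rate) are illustrative
    values only. Returns a list of (S, I, R) tuples, one per time step.
    """
    dt = 1.0 / steps_per_day
    s = population - initial_infected
    i = initial_infected
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days * steps_per_day):
        # People leaving the susceptible compartment during this step.
        new_infections = beta * s * i / population * dt
        # People leaving the infectious compartment during this step.
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history
```

Calling `simulate_sir()` and plotting the three columns of the result should give the familiar picture: S falls along one S-curve, R rises along another, and I forms a single hump between them. Note the assumption baked into the `s * i / population` term: everyone is equally likely to meet everyone else.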

In this context I can't help but point out a perfect example of how the public's lack of basic science literacy can be further confounded by news accounts that don't take the responsibility to put raw numbers into context. The NY Times generally has excellent science coverage, but reported the following reassurance on Thursday: "Some experts are cautiously optimistic. A computer simulation of this outbreak released Wednesday by a team from Northwestern University projected a worst-case scenario, meaning no measures have been taken to combat the spread. It predicted a mere 1,700 cases in the United States four weeks from now." Take a look at the figure to the right. How reassured would you be if you knew that four weeks was at this point in the disease projection? Actually, we don't know from the story where we are, so the numbers are only meaningful in a narrow context, for example in determining appropriate short-term health care response. They tell us literally nothing about the long-term outlook!

Let's build an ABM and see if we can capture that basic behavior. I designed a working version of this model in an hour or two using MetaABM -- a very high-level modeling tool I've been developing over the last few years -- and have been refining it over the last week. Once you design a model using MetaABM's visual tools, that model can be automatically converted into the Java computer language to run on a number of different modeling platforms.

The ABM Model


Agent Color Key
Our model has categories, but they are a bit more complicated than the model above -- we've added a disease status for exposed; split infected into two categories, one for those who are symptomatic and one for those who are asymptomatic (it isn't obvious that the person is sick and they are still moving around normally); and we'll keep track of dead agents as well as recovered. But the most important thing about our model -- what makes it an agent-based model -- is that instead of keeping track of a single population number for each category, we actually create individual objects, each with its own disease state, other characteristics, and most importantly a location in space. As I explain below, for each agent we also keep track of how long each disease state lasts.


Then we give the agents a set of behaviors: they move around a landscape, they can become infected and move from one disease state to another, and most importantly, they can transfer the disease from one person to another. The basic life cycle of an agent goes like this; the capitalized names are model parameters that we can change. At the beginning of each model run, for every agent:
  1. Move to a random location in our space.
  2. Draw a random number. If that number is less than the Initial Infection Probability, become exposed.

For each period after that, for every agent:
  1. Move to a random neighboring space.
  2. Pick a random neighbor and if one exists, make a random draw. If that draw is lower than the Contact Transmission Probability then expose our neighbor to the virus.
  3. If exposed, the agent determines the number of days that it will be in each of the states, based on:
    • Min and Max Period Exposure
    • Min and Max Period Asymptom
    • Min and Max Period Symptom
  4. Go through our disease process:
    1. If exposed and the period of exposure is over, become infected, either asymptomatically, or if the asymptomatic period is zero, symptomatically.
    2. If asymptomatic and that period is over, become symptomatic.
    3. If symptomatic and that period is over, make a random draw. If that number is greater than the Case Mortality Rate, recover, otherwise die.
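The life cycle above can be sketched in a few dozen lines of Python. This is not the code that MetaABM generates; it is a hedged, hand-written approximation. The assumptions are mine: a wrapping grid where several agents may share a cell, "neighbor" meaning any living agent in the same or an adjacent cell, only asymptomatic and symptomatic agents passing on the virus, and every parameter value except the 0.12 Contact Transmission Probability used in the base run.

```python
import random
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SUSCEPTIBLE = 1
    EXPOSED = 2
    ASYMPTOMATIC = 3
    SYMPTOMATIC = 4
    RECOVERED = 5
    DEAD = 6


INFECTIOUS = (State.ASYMPTOMATIC, State.SYMPTOMATIC)
OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


@dataclass
class Params:
    # Illustrative guesses, except 0.12 (the base run's transmission value).
    width: int = 20
    height: int = 20
    agent_count: int = 200
    initial_infection_probability: float = 0.05
    contact_transmission_probability: float = 0.12
    exposure_period: tuple = (1, 3)      # (min, max) periods
    asymptomatic_period: tuple = (0, 2)
    symptomatic_period: tuple = (3, 7)
    case_mortality_rate: float = 0.05


@dataclass
class Agent:
    x: int
    y: int
    state: State = State.SUSCEPTIBLE
    days_in_state: int = 0
    exposure_days: int = 0
    asymptomatic_days: int = 0
    symptomatic_days: int = 0


def expose(agent, params, rng):
    """Expose a susceptible agent and draw how long each stage will last."""
    if agent.state is not State.SUSCEPTIBLE:
        return
    agent.state, agent.days_in_state = State.EXPOSED, 0
    agent.exposure_days = rng.randint(*params.exposure_period)
    agent.asymptomatic_days = rng.randint(*params.asymptomatic_period)
    agent.symptomatic_days = rng.randint(*params.symptomatic_period)


def adjacent(a, b, params):
    """True if b is in a's cell or one of the 8 surrounding cells (grid wraps)."""
    near_x = (a.x - b.x) % params.width in (0, 1, params.width - 1)
    near_y = (a.y - b.y) % params.height in (0, 1, params.height - 1)
    return near_x and near_y


def progress(agent, params, rng):
    """Advance the disease by one period, changing state when a stage ends."""
    agent.days_in_state += 1
    if agent.state is State.EXPOSED and agent.days_in_state >= agent.exposure_days:
        nxt = State.ASYMPTOMATIC if agent.asymptomatic_days > 0 else State.SYMPTOMATIC
    elif agent.state is State.ASYMPTOMATIC and agent.days_in_state >= agent.asymptomatic_days:
        nxt = State.SYMPTOMATIC
    elif agent.state is State.SYMPTOMATIC and agent.days_in_state >= agent.symptomatic_days:
        survived = rng.random() > params.case_mortality_rate
        nxt = State.RECOVERED if survived else State.DEAD
    else:
        return
    agent.state, agent.days_in_state = nxt, 0


def step(agents, params, rng):
    """Run one period: each living agent moves, may infect, then progresses."""
    for agent in agents:
        if agent.state is State.DEAD:
            continue
        dx, dy = rng.choice(OFFSETS)  # move to a random neighboring cell
        agent.x = (agent.x + dx) % params.width
        agent.y = (agent.y + dy) % params.height
        if agent.state in INFECTIOUS:
            neighbors = [o for o in agents if o is not agent
                         and o.state is not State.DEAD and adjacent(agent, o, params)]
            if neighbors:
                neighbor = rng.choice(neighbors)
                if rng.random() < params.contact_transmission_probability:
                    expose(neighbor, params, rng)
        progress(agent, params, rng)


def run(params, periods, seed=0):
    """Set up the agents, run the model, and count agents per final state."""
    rng = random.Random(seed)
    agents = []
    for _ in range(params.agent_count):
        agent = Agent(rng.randrange(params.width), rng.randrange(params.height))
        if rng.random() < params.initial_infection_probability:
            expose(agent, params, rng)
        agents.append(agent)
    for _ in range(periods):
        step(agents, params, rng)
    return Counter(agent.state for agent in agents)
```

Something like `run(Params(), periods=100)` returns a count of agents in each final state. The neighbor search scans every agent, which is fine for a few hundred agents but would need a spatial index for anything larger.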

Model Runs

“The three goals, Dr. Cetron said, can be plotted on the graph of new infections called the epidemic curve. "You want to shift the curve to the right, blunt the peaks and squash the area under them."” - From the NY Times article quoted above.

How might we accomplish that goal? Why would we even want to? Let's explore different parameterizations of our model and see if we can gain any insights from them. Play the Quicktime Movies below to see running versions of the models.

Run I: No Interventions (Base Parameters)

Flu Model I Chart
As we've kept statistics on our model, we can draw a graph of what has happened over time. Looks familiar! (Again, note that the colors don't match the first chart.) Now, our model is noisier, but that's exactly what we want. "The real world is messier", and though we certainly aren't modeling the real world, we're arguably a lot closer than in the original equation-based model. Now it's my chance to be reassuring and point out that this model has not been calibrated against any kind of real world situation. For example, the death rates are very high -- even at its most virulent, the Spanish Influenza showed "only" half the death rate shown. You might have noticed already something in the Quicktime movie: even in this model of a very rapidly spreading and virulent disease, not everyone got exposed.

Run II: Hygiene (Lower Transmission Probability)

Flu Model II Chart
In the next run, we'll lower the Contact Transmission Probability from .12 to .08, or by 1/3. We could interpret the difference in two ways. Perhaps in this new version of our little agent world, the virus is simply less contagious. Or perhaps our agents have followed everyone's advice and are washing their hands more often. This is an example, folks -- please keep washing your hands, regardless of what this model seems to be saying, ok?

Now, we have to be careful about drawing conclusions by comparing single model runs. In a real experimental effort we'd be doing parametric studies, which just means that we'd be changing the values like transmission probability systematically and doing many (like thousands) of runs to determine what effects these changes have and whether model runs are consistent when given the same initial values. (And actually, as we'll discuss in another post, some of the most interesting models are those that aren't consistent from run to run.)
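A parametric study of the kind just described might look roughly like the sketch below. To keep it self-contained, `run_toy_epidemic` is a hypothetical stand-in for a single model run: a small well-mixed stochastic epidemic with no space at all, which is an assumption made for brevity and not the flu model itself. The part that matters is `sweep`, which repeats runs over many seeds for each parameter value and summarizes the spread.

```python
import random
import statistics


def run_toy_epidemic(transmission_probability, seed, population=200,
                     recovery_probability=0.1, periods=300):
    """Hypothetical stand-in for one model run (well-mixed, no space).

    Returns (peak_infected, total_ever_infected).
    """
    rng = random.Random(seed)
    susceptible, infected, recovered = population - 5, 5, 0
    peak = infected
    for _ in range(periods):
        new_cases = 0
        for _ in range(infected):
            # Each infected person meets one random member of the population.
            meets_susceptible = rng.random() < (susceptible - new_cases) / population
            if meets_susceptible and rng.random() < transmission_probability:
                new_cases += 1
        recoveries = sum(1 for _ in range(infected)
                         if rng.random() < recovery_probability)
        susceptible -= new_cases
        infected += new_cases - recoveries
        recovered += recoveries
        peak = max(peak, infected)
    return peak, recovered + infected


def sweep(values, runs_per_value=30):
    """For each parameter value, run many seeds; report mean and stdev of the peak."""
    results = {}
    for value in values:
        peaks = [run_toy_epidemic(value, seed)[0] for seed in range(runs_per_value)]
        results[value] = (statistics.mean(peaks), statistics.stdev(peaks))
    return results
```

For example, `sweep([0.08, 0.12])` compares the two transmission values used in Runs I and II. A large standard deviation relative to the difference in means is a warning that a single pair of runs says very little.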

But we can make some observations. And.. hmmm.. nothing much seems to have changed. We're certainly not seeing 1/3 fewer cases! Almost the same number of people became sick and the duration of the epidemic isn't any different. Perhaps the overall exposure is a bit lower but that could be due to random variation in the model. But.. one thing that is somewhat different is the peak number of cases. That could actually be very important. If you only have so many ventilators available, lowering the peak number of cases by 80% could mean the difference between life and death for many more people, not to mention saving hospital personnel from having to make some very difficult choices.

Equations or Agents?

The issue of hospital resources points out another challenge of modeling that is subtle but extremely important. What exactly does a case mortality rate even mean? Casually, people take it to mean “how deadly is the disease”, but it really matches closer with “how many people did the disease kill, given their health status, exposure to previous strains, and availability and quality of care”. Well, is the quality of care some fixed thing? No, it is actually affected by the number of people who are sick. But our model takes the death rate as an input parameter, and we don't know how many people will die; that's the point of running the model!

This is a real problem with equation-based models like the classic one we've mentioned above. They don't have a “closed-form” solution, which is a polite way of saying that they can never really provide a solution for a question such as “how many people might die in this epidemic?” Now, before I get a bunch of hate-mail from every researcher in the world who still owns a jacket with elbow patches, I should clarify that. There are ways to get very very close to a good solution by using some sophisticated tricks (I mean “techniques”) and by making just a few minor assumptions. One of the most common assumptions is that there are an infinite number of individuals. Another, a favorite of economists, is that there is an infinite amount of time. The biggest assumption of all is that everyone is the same. OK, I'm still grossly simplifying -- and I'll probably get some very polite hate-mail anyway -- so let's move on.

What is different about an agent-based model? First, and perhaps most importantly, questions like “how can we know the death rate when we don't know how many hospital beds we'll have?” are a lot harder to ignore. The very process of exploring the model causes us to ask more and more questions. The model transparency draws us in, rather than forcing us out. Instead of being faced with complex equations (“you wouldn't understand, but trust me on this”) we have a clear set of attributes and behaviors and we can watch those behaviors in action. Does this mean that our models will never suck? Absolutely not, but we have a much better chance of seeing when they do and making them better. In the case we're referring to here, modelers might choose to add a value for hospital care quality to the overall death rate calculation. This value might at first be some measure of total cases against some global resource level. But over time, we could refine it further and create agents that represent hospitals at particular locations and give them a particular number of ventilators along with some way to get more. And so on.
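As a rough illustration of that first refinement, a care-quality adjustment could be as simple as the function below. Everything here is hypothetical: the function name, the linear penalty, and the `overload_multiplier` parameter are assumptions made for the sketch and are not part of the model described in this article.

```python
def effective_mortality_rate(base_rate, symptomatic_cases, care_capacity,
                             overload_multiplier=2.0):
    """Scale a base Case Mortality Rate by how far cases exceed care capacity.

    Assumption: below capacity the base rate applies unchanged; above it,
    mortality grows linearly with the overload and is capped at 1.0.
    """
    if care_capacity <= 0:
        raise ValueError("care_capacity must be positive")
    # Overload is 0 at or below capacity, 1.0 at twice capacity, and so on.
    overload = max(0.0, symptomatic_cases / care_capacity - 1.0)
    return min(1.0, base_rate * (1.0 + overload_multiplier * overload))
```

With this in place, the death rate stops being a pure input: lowering the peak number of simultaneous cases now lowers the number of deaths as well, which is exactly the feedback the fixed-parameter version can't show.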

The key point is that the general public (whatever that means) seems to overestimate the quality of expert estimates, and sloppy media interpretation of those estimates only compounds the issue. Not only do experts make (quite understandable) guesses, the theoretical models behind those assumptions are typically quite limited. Even with (or perhaps because of) all sorts of mathematical tricks and assumptions we end up with a model that is very hard to understand and analyze. People spend a lot of time in graduate schools learning how to do this and the models still miss important details. And they always will, because in the real world, where we are in time matters, and individuals are different. We need equation-based and statistical models and all the rest, but we shouldn't rely upon them. Agent-based models give us another way to look at the world that can challenge traditional models and reveal dynamics we might not otherwise have seen.

Run III: Restricted Contact (Lower Movement Probability)

Flu Model III Chart
Back to the main point.. we really haven't changed the total number of cases at all. What about adding movement restrictions? Let's add a parameter, called Movement Probability. Then we'll modify the movement rule so that every period, agents make a random draw and only move if that draw is less than that probability. We'll set the value to 0.5 to start off. If we run the model we can see the agents do seem a little less active, and we get a curve that is much lower. But so what? Unless our primary concern is emergency services capacity all that we've done is prolong the misery, right? Whereas the unchecked epidemic is pretty much over by period 500, in this model it drags on for another 100 periods or so. This one could actually get us into the land of unintended consequences. For example, researchers seem to agree that a new flu could be much more dangerous if it appeared at the beginning of the regular flu season instead of the end. In that sense, we're lucky that the flu appeared when it did. But if we prolong the period of outbreak, are we increasing the chances that the disease will linger and spread into the next flu season? The Spanish Flu Pandemic followed exactly that pattern.
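The modified movement rule is small enough to show on its own. This is a sketch under assumptions: the function name, the wrapping grid, and the eight-cell neighborhood are illustrative choices, and only the rule itself (draw a random number, move only if it is below Movement Probability) comes from the description above.

```python
import random

# The eight cells surrounding an agent's current cell.
NEIGHBOR_OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                    if (dx, dy) != (0, 0)]


def maybe_move(x, y, movement_probability, rng, width, height):
    """Return the agent's new (x, y) for this period.

    The agent makes a random draw and moves to a random neighboring cell
    only if the draw is below movement_probability; otherwise it stays put.
    The grid wraps at the edges (an assumption of this sketch).
    """
    if rng.random() < movement_probability:
        dx, dy = rng.choice(NEIGHBOR_OFFSETS)
        return (x + dx) % width, (y + dy) % height
    return x, y
```

With `movement_probability=1.0` this reduces to the original rule where every agent moves every period, and with 0.5 roughly half of the agents stay where they are in any given period.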

Run IV: Hygiene and Restricted Contact

Flu Model IV Chart
There is an obvious next move here, and that is to combine the two strategies. Let's try to get people to wash their hands and be a little less social. In the face of a crisis, this is exactly the kind of thing that makes everyone scream about useless half-measures. “What difference does it really make? Everyone is going to get it anyway”. But let's run the model and look at the results. Not only have we lowered the peak as before, but we've also actually managed to control the epidemic much more quickly. And something else that's really nice. We have a significant number of people who never acquire the flu at all -- three times as many as in the run we did on the previous model. This is especially interesting, because the two methods in isolation didn't change that figure at all. (The death rate is better too, but the rates are so small that we can't really draw a solid conclusion from that.) Do two half-measures make a whole measure? (Outside of the realm of music class, that is?) Well, I don't know about you, but I have a pretty healthy level of skepticism when it comes to government pronouncements about my well-being. (Apparently, eggs are good for us again.) And all of this talk about washing hands and avoiding crowds strikes one as exactly the kind of thing someone would say if they were out of any better ideas. But maybe these guys are on to something.. It certainly merits a closer look.

What's next...

Well, as always seems to be the case, the explorations of the model took a lot more time to write up than they did to do. (And I'd still like to edit this all a bit more as I'm sure I've made a mess of one or two things!) But it's 3 am and time to pack it in.
So far we've looked at a pretty simple model, and found some interesting -- if not actually earth-shattering -- dynamics. There is a lot more to explore. If you want to try the model out yourself, it's available in the Eclipse AMP project examples. And please share your insights and questions here.. we can marry our fears and anxieties with a bit of curiosity and perhaps even enjoy ourselves in the process!
