"Learning Curves" and Genomics: Thoughts on the Future of Sequencing

(Update: 23 March 2009, I fixed various broken links.)

I have been wondering what additional information about future technology and markets can be discerned from trends in genome synthesis and sequencing ("Carlson Curves").  To see if there is anything there, I have been playing around with applying the idea of "learning curves" (also called "experience curves") to data on cost and productivity.

Learning curves generally are used to estimate decreases in costs that result from efficiencies that come from increases in production.  The more you make of something, the more efficient you become.  T.P. Wright famously used this idea in the 1930s to project decreases in cost as a function of increased airplane production.  The effect also shows up in a reduction of the cost of photovoltaic power as a function of cumulative production (see this figure, for example).
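For readers who have not seen a learning curve written down, here is a minimal sketch of the usual form of Wright's relation: unit cost falls by a fixed fraction each time cumulative production doubles.  The function name, the 80% "progress ratio", and the starting cost are illustrative assumptions, not numbers from this post or from Wright's airplane data.

```python
import math

def wright_unit_cost(first_unit_cost, cumulative_units, progress_ratio):
    """Unit cost after `cumulative_units` have been built.

    progress_ratio is the fraction of cost that remains after each
    doubling of cumulative production (0.8 means a 20% drop per doubling).
    """
    exponent = math.log2(progress_ratio)  # negative when progress_ratio < 1
    return first_unit_cost * cumulative_units ** exponent

# Illustrative values only: a first unit costing 100 and an 80% curve.
for n in (1, 2, 4, 8):
    print(n, round(wright_unit_cost(100.0, n, 0.8), 2))
```

On a log-log plot this relation is a straight line, which is why power-law fits show up throughout this kind of analysis.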

To start with, here are some musings about the future of sequencing and the thousand dollar genome:

Figure 1 was generated from data on sequencing cost and productivity for commercially available instruments.  I am not yet sure how seriously to take the plot, but it is interesting to think about the implications.

A few words on methodology: the data is sparse (see inset), in that there are not many points and data is not readily available in each category for each year.  This makes generating the plot of cost vs. productivity subject to estimation and some guesswork.  In particular, fitting a power law to the five productivity points, which are spread over only three logs, makes me uneasy.  The cost data isn't much better.  In the past I have cautioned both the private sector and governments against attempting to use this data to forecast trends.  But, really, everyone else is doing it, so why should I let good sense stop me?

Before going on, I should note that sequencing cost and productivity are related but not strictly correlated.  They are mostly independent variables at this point in time.  Reagents account for a substantial fraction of current sequencing costs, and increasing throughput and automation do not necessarily affect anything other than the number of bases one person can sequence in a day.  It is also important to point out that I am plotting productivity rather than cumulative production, and that the improvements in both productivity and cost include shifts to new technologies.  So the learning curve here is sort of an average over different technologies.  It is not a standard way to look at things, but it allows for a few interesting insights.

The blue line was generated by taking a ratio of fits to both the cost and productivity lines.  In other words, the blue line is basically data, and it suggests that for every order of magnitude improvement in productivity you get roughly one and a half orders of magnitude reduction in cost.  Here is the next point that makes me uneasy: I really have no reason to expect the current trends to maintain their present rates.  New sequencing technologies may well cause both productivity and cost changes to accelerate (though I would not expect them to slow -- see, for example, my previous post "The Thousand Dollar Genome").
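One way to read "taking a ratio of fits" is sketched below, under the assumption that cost and productivity are each fit as straight lines in log10 against time.  Eliminating time then gives a power law whose exponent is the ratio of the two slopes.  The function names and the two rates are hypothetical placeholders, chosen only so that their ratio reproduces the rough factor of 1.5 quoted above; they are not the fitted values behind Figure 1.

```python
def cost_vs_productivity_exponent(cost_decades_per_year, productivity_decades_per_year):
    """Slope of log10(cost) against log10(productivity).

    Assumes cost falls by `cost_decades_per_year` orders of magnitude per year
    while productivity rises by `productivity_decades_per_year` orders per year.
    """
    return -cost_decades_per_year / productivity_decades_per_year

def cost_at_productivity(cost_ref, productivity_ref, productivity, exponent):
    """Extrapolate cost along the power law from one reference point."""
    return cost_ref * (productivity / productivity_ref) ** exponent

# Hypothetical rates, picked only so that the ratio comes out to 1.5.
exponent = cost_vs_productivity_exponent(0.75, 0.5)

# Factor by which cost drops for a tenfold gain in productivity.
reduction = 1.0 / cost_at_productivity(1.0, 1.0, 10.0, exponent)
print(exponent, reduction)
```

With an exponent of -1.5, a tenfold productivity gain corresponds to a cost reduction of 10^1.5, a bit more than thirtyfold.  Any change in either slope changes the exponent, which is why the extrapolation is so sensitive to the fits.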

Forging ahead, extending the trend out to the day when technology provides for the still-mythical Thousand Dollar Genome (TDG) provides an interesting insight.  At present rates, the TDG comes when an instrument allows for a productivity of one human genome per person-day.  It didn't have to be that way; slightly different doubling times (slopes) in the fits to cost and productivity would have produced a different result.  Frankly, I don't know if it means anything at all, but it did make me sit up and look more closely at the plot.  You could even call it a weak prediction about technological change -- weak because any deviation from the present average doubling rates would break the prediction.

But even if the present rates remain steady, that doesn't mean the actual cost of sequencing to the end user falls as quickly as it has.  Let's say somebody commercially produces an instrument that can actually provide a productivity of one genome per person-day.  How many of those instruments might make it onto the market?

Let's estimate that one percent of the US population wants to sign up for sequencing.  Those three million people would then require three million person-days' worth of effort to sequence.  Operating 24/7 for one year (that is, three eight-hour person-days per instrument per day), that would require just over 2700 instruments.  It will take some time before that many sequencers are available, which means that even if the technological capability exists there will be some -- probably substantial -- scarcity (the green circle on Figure 1) keeping prices higher for some period.  Given that demand will certainly extend into Europe and Asia, further elevating prices, there is no reason to think the Thousand Dollar Genome will be a practical reality until there exists competition among providers.  This competition, in turn, will probably only emerge with the development of a diverse set of technologies capable of hitting the appropriate productivity threshold.
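The instrument count above can be checked with a few lines of arithmetic.  This is a sketch under stated assumptions: a US population of 300 million (implied by "three million people" being one percent), and round-the-clock operation counted as three eight-hour person-days per instrument per calendar day, which is the reading that reproduces the figure of just over 2700.  The function and parameter names are mine.

```python
def instruments_needed(genomes, genomes_per_person_day=1.0,
                       shifts_per_day=3, days=365):
    """Instruments required to sequence `genomes` within `days` calendar days.

    Each instrument is assumed to deliver `genomes_per_person_day` per shift,
    with `shifts_per_day` staffed shifts every day (3 shifts = 24/7).
    """
    person_days = genomes / genomes_per_person_day
    return person_days / (shifts_per_day * days)

us_population = 300_000_000      # assumption implied by the text
genomes = us_population // 100   # one percent sign up

print(round(instruments_needed(genomes)))                    # 24/7 operation
print(round(instruments_needed(genomes, shifts_per_day=1)))  # single shift
```

Dropping to a single shift per day roughly triples the number of instruments required, so the scarcity argument only gets stronger if the machines are not run around the clock.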

What does this imply for the sequencing market, and in particular for health care based on full genome sequencing?  First, costs will stay high until there are a large number of instruments in operation, and probably until there are many different technologies available.  Thus, if prices are determined solely by the market, the idea of sequencing newborns to give them a head start on maximizing their state of health will probably be out of reach for many years after the initial instrument is developed.  Market pricing probably means that sequencing will remain a tool of the wealthy for many, many years to come.

So, what other foolish, over-extended observations can I make based on fitting power laws to sparse data?  Just one more for the moment, and it actually doesn't depend so much on the actual data.  At a productivity of one genome per person-day, you really have to start thinking about the cost of that person.  Somebody will be running the machine, and that person draws a salary.  Let's say that this person earns a technician's wage, which amounts, with benefits, to $300/day.  All of a sudden (the trends are power laws, after all) that is 30% of the $1000 spent on sequencing the genome.  If the margin is 10-20% of that $1000, then the actual sequencing, including financial loads such as depreciation of the instrument and interest, can cost only $500 to $600.  We are definitely a long time from seeing that price point.
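Here is the same budget as a small calculation.  It assumes the margin is taken as a fraction of the $1000 price and that the operator spends one full person-day per genome; both are my reading of the paragraph above, and the function name is made up for illustration.

```python
def sequencing_budget(price, labor_per_genome, margin_fraction):
    """Money left for reagents, depreciation, and interest per genome.

    Assumes the margin is a fraction of the price charged to the customer.
    """
    return price - labor_per_genome - margin_fraction * price

price = 1000.0   # the Thousand Dollar Genome
labor = 300.0    # one technician person-day, with benefits

for margin in (0.10, 0.20):
    print(margin, sequencing_budget(price, labor, margin))
```

At the high end of the margin range, only half of the thousand dollars is left to cover the sequencing itself.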

I'll post on the learning curve for genome synthesis after I make more sense of it.