DNA Synthesis "Learning Curve": Thoughts on the Future of Building Genes and Organisms

With experience comes skill and efficiency.  That is the theory behind "learning" or "experience" curves, which I played around with last week for DNA sequencing.  As promised, here are a few thoughts on the future of DNA synthesis.  Playing around with the synthesis curves a bit seems to kick out a couple of quantitative metrics for technological change.

The data come from my papers, the Bio-era "Genome Synthesis and Design Futures" report, and a couple of my blog posts over the last year.

Figure 1.

The simplest application of a learning curve to DNA synthesis is to compare productivity with cost.  Figure 1 shows those curves for both oligo synthesis and gene synthesis.  These lines are generated by taking the ratios of fits to data (shown in the inset).  This is necessary due to the methodological annoyance that productivity and cost data do not overlap -- the fits allow comparison of trends even when data is missing from one set or another.  As before, 1) I am not really thrilled to rely on power law fits to a small number of points, and 2) the projections (dashed lines) are really just for the sake of asking "what if?".
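For anyone who wants to try the "ratio of fits" trick, here is a minimal sketch in Python.  It assumes each data set is fit with a power law y = a * x**b, where x is years since some reference year; that choice of variable, the helper names (fit_power_law, power_law), and every number below are made up for illustration and are not the data behind Figure 1.

```python
import numpy as np

def fit_power_law(x, y):
    """Least-squares fit of y = a * x**b, done as a straight line in log-log space."""
    b, log_a = np.polyfit(np.log10(x), np.log10(y), 1)
    return 10.0 ** log_a, b

def power_law(x, a, b):
    """Evaluate a * x**b on an array of x values."""
    return a * np.asarray(x, dtype=float) ** b

# Placeholder numbers only, NOT the data behind Figure 1.
# x is years since an arbitrary reference year (an assumption).
prod_years = np.array([1.0, 4.0, 9.0, 14.0])
prod_vals = np.array([1e2, 2e3, 3e4, 2e5])    # hypothetical productivity values
cost_years = np.array([10.0, 13.0, 16.0, 18.0])
cost_vals = np.array([10.0, 4.0, 1.5, 0.8])   # hypothetical cost values

a_p, b_p = fit_power_law(prod_years, prod_vals)
a_c, b_c = fit_power_law(cost_years, cost_vals)

# The two data sets cover different years, but the fitted curves can both
# be evaluated on one shared grid and then divided point by point.
grid = np.linspace(1.0, 20.0, 50)
ratio = power_law(grid, a_p, b_p) / power_law(grid, a_c, b_c)
```

Under these assumptions the ratio of two power laws is itself a power law, with exponent b_p - b_c, which is why it shows up as a straight line on log-log axes.  Any part of the grid outside the range of either data set is extrapolation, with all the caveats above.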

What can we learn from the figure?  First, the two lines cover different periods of time.  Thus it isn't completely kosher to compare them directly.  But with that in mind, we come to the second point: even the simple cost data in the inset make clear that the commercial cost of synthetic genes is rapidly approaching the cost of the constituent single-stranded oligos. This is the result of competition, and is almost certainly due to new technologies introduced by those competitors.

Assuming that commercial gene foundries are making money, the "Assembly Cost" is probably falling because of increased automation and other gains in efficiency.  But it can't fall to zero, and there will (probably?) always be some profit margin for genes over oligos.  I am not going to guess at how low the Assembly Cost can fall, and the projections are drawn in by hand just for illustration.


Figure 2.

It isn't clear that a couple of straight lines in Figure 1 teach us much about the future, except in pondering the shrinking margins of gene foundries.  But combining the productivity information with my "Longest Synthetic DNA" plot gives a little more to chew on.  Figure 2 is a ratio of a curve fitted to the longest published synthetic DNA (sDNA) to the productivity curve.

In what follows, remember that the green line is based on data.

First, the caveat: the fit to the longest sDNA is basically a hand hack.  On a semilog plot I fit a curve consisting of a logarithm and a power law (not shown).  That means the actual functional form (on the original data) is a linear term plus a super power law in which the exponent increases with time.  There isn't any rationale for this function other than it fits the crazy data (in the inset), and I would be oh-so-wary of inferring anything deep from it.  Perhaps one could make the somewhat trivial observation that for a long time synthesizing DNA was hard (the linear regime), and then we entered a period when it has become progressively easier (the super power law).  I should probably win a prize for that.  No?  A lollipop?

There are a couple of interesting things about this curve, along which distance represents "progress".  First, so far as I am aware, commercial oligo synthesis started in 1992 and commercial gene foundries started showing up in 1999.  The distance along the curve in those seven years is quite short, while the distance over the next nine years to the Venter Institute's recent synthetic chromosome is substantially larger.

This change in distance/speed represents some sort of quantitative measure of accelerating progress in synthesizing genomes, though frankly I am not yet settled on what the proper metric should be.  That is, how exactly should one measure distance or speed along this curve?  And then, given proper caution about the utility of the underlying fits to data, how seriously should one trust the metric?  Maybe it is just fine as is.  I am still pondering this.
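One candidate, offered purely as a sketch and not as a settled answer: measure distance as arc length along the curve in log10 coordinates, so that a factor of ten on either axis counts as one unit.  The function name log_arc_length and the sample points are hypothetical, and the choice of log-log coordinates is an assumption rather than something read off Figure 2.

```python
import numpy as np

def log_arc_length(x, y):
    """Cumulative arc length along the curve (x, y), measured in log10 space.

    Returns an array the same length as x; entry i is the distance from the
    first point to point i, in decades.
    """
    lx = np.log10(np.asarray(x, dtype=float))
    ly = np.log10(np.asarray(y, dtype=float))
    segments = np.hypot(np.diff(lx), np.diff(ly))
    return np.concatenate([[0.0], np.cumsum(segments)])

# Hypothetical points along a curve, NOT values taken from Figure 2.
x = np.array([1.0, 10.0, 100.0, 1e4])
y = np.array([1.0, 2.0, 50.0, 1e5])
distance = log_arc_length(x, y)

# "Speed" between two dates would then be the distance covered
# divided by the number of years elapsed.
```

This treats both axes symmetrically, which may or may not be what one wants, and it inherits every weakness of the underlying fits.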

Next, while the "learning curve" is presently "concave up", it really ought to turn over and level off sometime soon.  As I argued in the post on the Venter Institute's fine technical achievement, they are already well beyond what will be economically interesting for the foreseeable future, which is probably only 10-50 kilobases (kb).  It isn't at all clear that assembling sDNA larger than 100 kb will be anything more than an academic demonstration.  The red octagon (hint!) is positioned at about 100 Mb, which is in the range of a human chromosome.  Even assembling something that large, and then using it to fabricate an artificial human chromosome, is probably not technologically that useful.  I reserve a bit of judgement here in the event it turns out that actually building functioning human chromosomes from smaller pieces is problematic.  But really, why bother otherwise?

Figure 3.

Next, with the other curves in hand I couldn't help but compare the longest sDNA to gene assembly cost (beware the products of actual free time!).  (Update: Can't recall what I meant by this next sentence, so I struck it out.) Figure 3 may only be interesting because of what it doesn't show.  Note the reversed axis -- cost decreases to the right.

The assembly cost (inset) was generated simply by subtracting the oligo cost curve from the gene cost curve (see Figure 1 above) -- yes, I ignored the fact that those data are over different time periods.  There is no cost information available for any of the longest sDNA data, which all come from academic papers.  But the fact that gene assembly cost has been consistently halving every 18 months or so just serves to emphasize that the "acceleration" in the ratio of sDNA to assembly cost results from real improvements in processes and automation used to fabricate long sDNA.  I don't know that this is that deep an observation, but it does go some way towards providing additional quantitative estimates of progress in developing biological technologies.
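To put "halving every 18 months or so" into more familiar units, here is a small sketch.  The function name cost_after and the starting cost of 1.0 are placeholders, and a perfectly steady 18-month halving time is an idealization of the fitted trend.

```python
def cost_after(months, initial_cost=1.0, halving_months=18.0):
    """Cost after `months`, assuming it halves every `halving_months`."""
    return initial_cost * 0.5 ** (months / halving_months)

per_year = cost_after(12)      # fraction of the starting cost left after one year
per_decade = cost_after(120)   # fraction left after ten years

print(f"after 1 year:   {per_year:.2f} of the starting cost")
print(f"after 10 years: 1/{1 / per_decade:.0f} of the starting cost")
```

Under that assumption the cost drops by roughly 37% per year, or about a hundredfold per decade.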