Results tagged “Carlson Curves”

Late Night, Unedited Musings on Synthesizing Secret Genomes

By now you have probably heard that a meeting took place this past week at Harvard to discuss large scale genome synthesis. The headline large genome to synthesize is, of course, that of humans. All 6 billion (duplex) bases, wrapped up in 23 pairs of chromosomes that display incredible architectural and functional complexity that we really don't understand very well just yet. So no one is going to be running off to the lab to crank out synthetic humans. That 6 billion bases, by the way, just for one genome, exceeds the total present global demand for synthetic DNA. This isn't happening tomorrow. In fact, synthesizing a human genome isn't going to happen for a long time.

But, if you believe the press coverage, nefarious scientists are planning to pull a Frankenstein and "fabricate" a human genome in secret. Oh, shit! Burn some late night oil! Burn some books! Wait, better: burn some scientists! Not so much, actually. There are several important points here. I'll take them in no particular order.

First, it's true, the meeting was held behind closed doors. It wasn't intended to be so, originally. The rationale given by the organizers for the change is that a manuscript on the topic is presently under review, and the editor of the journal considering the manuscript made it clear that it considers the entire topic under embargo until the paper is published. This put the organizers in a bit of a pickle. They decided the easiest way to comply with the editor's wishes (which were communicated to the authors well after the attendees had made travel plans) was to hold the meeting under rules even more strict than Chatham House until the paper is published. At that point, they plan to make a full record of the meeting available. It just isn't a big deal. If it sounds boring and stupid so far, it is. The word "secret" was only introduced into the conversation by a notable critic who, as best I can tell, perhaps misconstrued the language around the editor's requirement to respect the embargo. A requirement that is also boring and stupid. But, still, we are now stuck with "secret", and all the press and bloggers who weren't there are seeing Watergate headlines and fame. Still boring and stupid.

Next, it has been reported that there were no press at the meeting. However, I understand that there were several reporters present. It has also been suggested that the press present were muzzled. This is a ridiculous claim if you know anything about reporters. They've simply been asked to respect the embargo, which so far they are doing, just like they do with every other embargo. (Note to self, and to readers: do not piss off reporters. Do not accuse them of being simpletons or shills. Avoid this at all costs. All reporters are brilliant and write like Hemingway and/or Shakespeare and/or Oliver Morton / Helen Branswell / Philip Ball / Carl Zimmer / Erika Check Hayden. Especially that one over there. You know who I mean. Just sayin'.)

How do I know all this? You can take a guess, but my response is also covered by the embargo. 

Moving on: I was invited to the meeting in question, but could not attend. I've checked the various associated correspondence, and there's nothing about keeping it "secret". In fact, the whole frickin' point of coupling the meeting to a serious, peer-reviewed paper on the topic was to open up the conversation with the public as broadly as possible. (How do you miss that unsubtle point, except by trying?) The paper was supposed to come out before, or, at the latest, at the same time as the meeting. Or, um, maybe just a little bit after? But, whoops. Surprise! Academic publishing can be slow and/or manipulated/politicized. Not that this happened here. Anyway, get over it. (Also: Editors! And, reviewers! And, how many times will I say "this is the last time!")

(Psst: an aside. Science should be open. Biology, in particular, should be done in the public view and should be discussed in the open. I've said and written this in public on many occasions. I won't bore you with the references. [Hint: right here.] But that doesn't mean that every conversation you have should be subject to review by the peanut gallery right now. Think of it like a marriage/domestic partnership. You are part of society; you have a role and a responsibility, especially if you have children. But that doesn't mean you publicize your pillow talk. That would be deeply foolish and would inevitably prevent you from having honest conversations with your spouse. You need privacy to work on your thinking and relationships. Science: same thing. Critics: fuck off back to that sewery rag in — wait, what was I saying about not pissing off reporters?)

Is this really a controversy? Or is it merely a controversy because somebody said it is? Plenty of people are weighing in who weren't there or, undoubtedly worse from their perspective, weren't invited and didn't know it was happening. So I wonder if this is more about drawing attention to those doing the shouting. That is probably unfair, this being an academic discussion, full of academics.

Secondly (am I just on secondly?), the supposed ethical issues. Despite what you may read, there is no rush. No human genome, nor any human chromosome, will be synthesized for some time to come. Make no mistake about how hard a technical challenge this is. While we have some success in hand at synthesizing yeast chromosomes, and while that project certainly serves as some sort of model for other genomes, the chromatin in multicellular organisms has proven more challenging to understand or build. Consequently, any near-term progress made in synthesizing human chromosomes is going to teach us a great deal about biology, about disease, and about what makes humans different from other animals. It is still going to take a long time. There isn't any real pressing ethical issue to be had here, yet. Building the ubermensch comes later. You can be sure, however, that any federally funded project to build the ubermensch will come with a ~2% set aside to pay for plenty of bioethics studies. And that's a good thing. It will happen.

There is, however, an ethical concern here that needs discussing. I care very deeply about getting this right, and about not screwing up the future of biology. As someone who has done multiple tours on bioethics projects in the U.S. and Europe, served as a scientific advisor to various other bioethics projects, and testified before the Presidential Commission on Bioethical Concerns (whew!), I find that many of these conversations are more about the ethicists than the bio. Sure, we need to have public conversations about how we use biology as a technology. It is a very powerful technology. I wrote a book about that. If only we had such involved and thorough ethical conversations about other powerful technologies. Then we would have more conversations about stuff. We would converse and say things, all democratic-like, and it would feel good. And there would be stuff, always more stuff to discuss. We would say the same things about that new stuff. That would be awesome, that stuff, those words. <dreamy sigh> You can quote me on that. <another dreamy sigh>

But on to the technical issues. As I wrote last month, I estimate the global demand for synthetic DNA (sDNA) to be 4.8 billion bases worth of short oligos and ~1 billion worth of longer double-stranded DNA (dsDNA), for not quite 6 Gigabases total. That, obviously, is the equivalent of a single human duplex genome. Most of that demand is from commercial projects that must return value within a few quarters, which biotech is now doing at eye-popping rates. Any synthetic human genome project is going to take many years, if not decades, and any commercial return is way, way off in the future. Even if the annual growth in commercial use of sDNA were 20% (which it isn't), this tells you, dear reader, that the commercial biotech use of synthetic DNA is never, ever, going to provide sufficient demand to scale up production to build many synthetic human genomes. Or possibly even a single human genome. The government might step in to provide a market to drive technology, just as it did for the human genome sequencing project, but my judgement is that the scale mismatch is so large as to be insurmountable. Even while sDNA is already a commodity, it has far more value in reprogramming crops and microbes with relatively small tweaks than it has in building synthetic human genomes. So if this story were only about existing use of biology as technology, you could go back to sleep.
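For anyone who wants to check the scale mismatch, here is a minimal Python sketch of the arithmetic. The demand figures and the 6 Gbase duplex genome are from the text, and the 20% growth rate is the deliberately generous figure used above. The target of 100 genomes per year is purely a hypothetical number chosen for illustration, not a forecast.

```python
import math

# Figures from the text (bases per year).
OLIGO_DEMAND = 4.8e9        # short oligos
DSDNA_DEMAND = 1.0e9        # longer double-stranded DNA (approximate)
HUMAN_DUPLEX_GENOME = 6e9   # bases in one duplex human genome

total_demand = OLIGO_DEMAND + DSDNA_DEMAND
genomes_per_year = total_demand / HUMAN_DUPLEX_GENOME


def years_to_reach(target_genomes, annual_growth):
    """Years of compound growth before annual demand equals target_genomes."""
    return math.log(target_genomes / genomes_per_year) / math.log(1 + annual_growth)


# Hypothetical target (an assumption): demand equal to 100 genomes a year,
# reached at the generous 20% annual growth rate.
years_needed = years_to_reach(100, 0.20)

print(f"Total demand: {total_demand / 1e9:.1f} Gbases/year")
print(f"Equivalent human genomes per year: {genomes_per_year:.2f}")
print(f"Years to reach 100 genomes/year at 20% growth: {years_needed:.0f}")
```

Under those assumptions, today's entire market is a bit less than one genome a year, and even optimistic compounding takes decades to reach a hundred.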

But there is a use of DNA that might change this story, which is why we should be paying attention, even at this late hour on a Friday night.

DNA is, by far, the most sophisticated and densest information storage medium humans have ever come across. DNA can be used to store orders of magnitude more bits per gram than anything else humans have come up with. Moreover, the internet is expanding so rapidly that our need to archive data will soon outstrip existing technologies. If we continue down our current path, in coming decades we would need not only exponentially more magnetic tape, disk drives, or flash memory, but exponentially more factories to produce these storage media, and exponentially more warehouses to store them. Even if this is technically feasible it is economically implausible. But biology can provide a solution. DNA exceeds by many times even the theoretical capacity of magnetic tape or solid state storage.

A massive warehouse full of magnetic tapes might be replaced by an amount of DNA the size of a sugar cube. Moreover, while tape might last decades, and paper might last millennia, we have found intact DNA in animal carcasses that have spent three-quarters of a million years frozen in the Canadian tundra. Consequently, there is a push to combine our ability to read and write DNA with our accelerating need for more long-term information storage. Encoding and retrieval of text, photos, and video in DNA has already been demonstrated. (Yes, I am working on one of these projects, but I can't talk about it just yet. We're not even to the embargo stage.) 

Governments and corporations alike have recognized the opportunity. Both are funding research to support the scaling up of infrastructure to synthesize and sequence DNA at sufficient rates.

For a “DNA drive” to compete with an archival tape drive today, it needs to be able to write ~2Gbits/sec, which is about 2 Gbases/sec. That is the equivalent of ~20 synthetic human genomes/min, or ~10K sHumans/day, if I must coin a unit of DNA synthesis to capture the magnitude of the change. Obviously this is likely to be in the form of either short ssDNA, or possibly medium-length ss- or dsDNA if enzymatic synthesis becomes a factor. If this sDNA were to be used to assemble genomes, it would first have to be assembled into genes, and then into synthetic chromosomes, a nontrivial task. While this would be hard, and would take a great deal of effort and PhD theses, it certainly isn't science fiction.
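The unit conversion above is simple enough to sketch in Python. The write rate and the 6 Gbase duplex genome are from the text, and treating one bit as roughly one base follows the "about" above. The denser encoding of 2 bits per base is an assumption added here for comparison, not something the text states.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_DAY = 24 * 60

HUMAN_DUPLEX_GENOME = 6e9  # bases, from the text


def genomes_per_minute(bits_per_second, bits_per_base=1.0):
    """Convert a storage write rate into synthetic human genomes per minute."""
    bases_per_second = bits_per_second / bits_per_base
    return bases_per_second * SECONDS_PER_MINUTE / HUMAN_DUPLEX_GENOME


rate = genomes_per_minute(2e9)   # ~2 Gbits/sec at ~1 bit per base
per_day = rate * MINUTES_PER_DAY

# Assumed alternative encoding: 2 bits stored per base.
dense_rate = genomes_per_minute(2e9, bits_per_base=2.0)
dense_per_day = dense_rate * MINUTES_PER_DAY

print(f"{rate:.0f} genomes/min, {per_day:,.0f} genomes/day")
print(f"{dense_rate:.0f} genomes/min, {dense_per_day:,.0f} genomes/day at 2 bits/base")
```

Either way the answer lands at tens of genomes per minute and on the order of 10^4 sHumans per day, which is the point: the exact encoding matters much less than the order of magnitude.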

But here, finally, is the interesting bit: the volume of sDNA necessary to make DNA information storage work, and the necessary price point, would make possible any number of synthetic genome projects. That, dear reader, is definitely something that needs careful consideration by publics. And here I do not mean "the public", the 'them' opposed to scientists and engineers in the know and in the do (and in the doo-doo, just now), but rather the Latiny, rootier sense of "the people". There is no them, here, just us, all together. This is important.

The scale of the demand for DNA storage, and the price at which it must operate, will completely alter the economics of reading and writing genetic information, in the process marginalizing the use by existing multibillion-dollar biotech markets while at the same time massively expanding capabilities to reprogram life. This sort of pull on biotechnology from non-traditional applications will only increase with time. That means whatever conversation we think we are having about the calm and ethical development of biological technologies is about to be completely inundated and overwhelmed by the relentless pull of global capitalism, beyond borders, probably beyond any control. Note that all the hullabaloo so far about synthetic human genomes, and even about CRISPR editing of embryos, etc., has been written by Western commentators, in Western press. But not everybody lives in the West, and vast resources are pushing development of biotechnology outside of the West. And that is worth an extended public conversation.

So, to sum up, have fun with all the talk of secret genome synthesis. That's boring. I am going off the grid for the rest of the weekend to pester littoral invertebrates with my daughter. You are on your own for a couple of days. Reporters, you are all awesome, make of the above what you will. Also: you are all awesome. When I get back to the lab on Monday I will get right on with fabricating the ubermensch for fun and profit. But, shhh, that's a secret.

On DNA and Transistors

Here is a short post to clarify some important differences between the economics of markets for DNA and for transistors. I keep getting asked related questions, so I decided to elaborate here.

But first, new cost curves for reading and writing DNA. The occasion is some new data gleaned from a somewhat out of the way source, the Genscript IPO Prospectus. It turns out that, while preparing their IPO docs, Genscript hired Frost & Sullivan to do a market survey across much of life sciences. The Prospectus then puts Genscript's revenues in the context of the global market for synthetic DNA, which together provide some nice anchors for discussing how things are changing (or not).

So, with no further ado, Frost & Sullivan found that the 2014 global market for oligos was $241 million, and the global market for genes was $137 million. (Note that I tweeted out larger estimates a few weeks ago when I had not yet read the whole document.) Genscript reports that they received $35 million in 2014 for gene synthesis, for 25.6% of the market, which they claim puts them in the pole position globally. Genscript further reports that the price for genes in 2014 was $.34 per base pair. This sounds much too high to me, so it must be based on duplex synthesis, which would bring the linear per base cost down to $.17 per base, which sounds much more reasonable to me because it is more consistent with what I hear on the street. (It may be that Gen9 is shipping genes at $.07 per base, but I don't know anyone outside of academia who is paying that low a rate.) If you combine the price per base and the size of the market, you get about 1 billion bases worth of genes shipped in 2014 (so a million genes, give or take). This is consistent with Ginkgo's assertions saying that their 100 million base deal with Twist was the equivalent of 10% of the global gene market in 2015. For oligos, if you combine Genscript's reported average price per base, $.05, with the market size you get about 4.8 billion bases worth of oligos shipped in 2014. Frost & Sullivan thinks that from 2015 to 2019 the oligo market CAGR will be 6.6% and the gene synthesis market will come in at 14.7%.
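The volume estimates above are just market size divided by price per base. Here is a minimal Python sketch of that calculation, using the 2014 figures quoted in the text. The halving of the gene price follows the duplex interpretation given above; the 1,000-base average gene length is an assumption added only to turn bases into a rough gene count.

```python
# Figures reported in the text for 2014.
GENE_MARKET_USD = 137e6
OLIGO_MARKET_USD = 241e6
GENE_PRICE_DUPLEX = 0.34   # dollars per base pair, as reported
OLIGO_PRICE = 0.05         # dollars per base, reported average

# Interpretation from the text: the reported gene price covers both strands.
gene_price_per_base = GENE_PRICE_DUPLEX / 2


def bases_shipped(market_usd, price_per_base):
    """Estimate bases shipped as market revenue divided by price per base."""
    return market_usd / price_per_base


gene_bases = bases_shipped(GENE_MARKET_USD, gene_price_per_base)
oligo_bases = bases_shipped(OLIGO_MARKET_USD, OLIGO_PRICE)

# Assumed average gene length (not from the source), for a rough gene count.
ASSUMED_GENE_LENGTH = 1_000
genes_shipped = gene_bases / ASSUMED_GENE_LENGTH

print(f"Genes:  {gene_bases / 1e9:.2f} billion bases (~{genes_shipped:,.0f} genes)")
print(f"Oligos: {oligo_bases / 1e9:.2f} billion bases")
```

The gene figure comes out nearer 0.8 billion bases than 1 billion, which is well within the slop of a back-of-the-envelope estimate; the oligo figure matches the 4.8 billion quoted above.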

For the sequencing, I have capitulated and put the NextSeq $1000 human genome price point on the plot. This instrument is optimized to sequence human DNA, and I can testify personally that sequencing arbitrary DNA is more expensive because you have to work up your own processes and software. But I am tired of arguing with people. So use the plot with those caveats in mind.

(My blogging service seems to be having an issue embedding the image. Click below and you will get a pop up window.)

What is most remarkable about these numbers is how small they are. The way I usually gather data for these curves is to chat with people in the industry, mine publications, and spot check price lists. All that led me to estimate that the gene synthesis market was about $350 million (and has been for years) and the oligo market was in the neighborhood of $700 million (and has been for years).

If the gene synthesis market is really only $137 million, with four or five companies vying for market share, then that is quite an eye opener. Even if that is off by a factor of two or three, getting closer to my estimate of $350 million, that just isn't a very big market to play in. A ~15% CAGR is nothing to sneeze at, usually, and that is a doubling rate of about 5 years. But the price of genes is now falling by 15% every 3-4 years (or only about 5% annually). So, for the overall dollar size of the market to grow at 15%, the number of genes shipped every year has to grow at close to 20% annually. That's about 200 million additional bases (or ~200,000 more genes) ordered in 2016 compared to 2015. That seems quite large to me. How many users can you think of who are ramping up their ability to design or use synthetic genes by 20% a year? Obviously Ginkgo, for one. As it happens, I do know of a small number of other such users, but added together they do not come close to constituting that 20% overall increase. All this suggests to me that the dollar value of the gene synthesis market will be hard pressed to keep up with Frost & Sullivan's estimate of 14.7% CAGR, at least in the near term. As usual, I will be happy to be wrong about this, and happy to celebrate faster growth in the industry. But bring me data.
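The growth argument above rests on three small calculations: a doubling time, an annualized price decline, and the volume growth needed when revenue equals volume times price. A minimal Python sketch follows, using only the rates quoted in the text; the function names are mine and purely illustrative.

```python
import math


def doubling_time(cagr):
    """Years for a quantity growing at `cagr` per year to double."""
    return math.log(2) / math.log(1 + cagr)


def annualized_decline(total_decline, years):
    """Annual fractional decline equivalent to `total_decline` over `years`."""
    return 1 - (1 - total_decline) ** (1 / years)


def required_volume_growth(revenue_growth, price_decline):
    """Volume growth needed so that revenue = volume * price grows as stated."""
    return (1 + revenue_growth) / (1 - price_decline) - 1


market_doubling = doubling_time(0.147)         # Frost & Sullivan gene CAGR
decline_fast = annualized_decline(0.15, 3)     # 15% over 3 years
decline_slow = annualized_decline(0.15, 4)     # 15% over 4 years
volume_growth = required_volume_growth(0.147, 0.05)

BASES_SHIPPED = 1e9   # ~1 billion bases of genes per year, from the text
extra_bases = BASES_SHIPPED * volume_growth

print(f"Doubling time at 14.7% CAGR: {market_doubling:.1f} years")
print(f"Annual price decline: {decline_slow:.1%} to {decline_fast:.1%}")
print(f"Required volume growth: {volume_growth:.1%}")
print(f"Additional bases per year: {extra_bases / 1e6:.0f} million")
```

With a 5% annual price decline, a 14.7% revenue CAGR needs volume to grow a little over 20% a year, which is where the roughly 200 million additional bases comes from.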

People in the industry keep insisting that once the price of genes falls far enough, the ~$3 billion market for cloning will open up to synthetic DNA. I have been hearing that story for a decade. And then price isn't the only factor. To play in the cloning market, synthesis companies would actually have to be able to deliver genes and plasmids faster than cloning. Given that I'm hearing delivery times for synthetic genes are running at weeks, to months, to "we're working on it", I don't see people switching en masse to synthetic genes until the performance improves. If it costs more to have your staff waiting for genes to show up by FedEx than to have them bash the DNA by hand, they aren't going to order synthetic DNA.

And then what happens if the price of genes starts falling rapidly again? Or, forget rapidly, what about modestly? What if a new technology comes in and outcompetes standard phosphoramidite chemistry? The demand for synthetic DNA could accelerate and the total market size still might be stagnant, or even fall. It doesn't take much to turn this into a race to the bottom. For these and other reasons, I just don't see the gene synthesis market growing very quickly over the next 5 or so years.

Which brings me to transistors. The market for DNA is very unlike the market for transistors, because the role of DNA in product development and manufacturing is very unlike the role of transistors. Analogies are tremendously useful in thinking about the future of technologies, but only to a point; the unwary may miss differences that are just as important as the similarities.

For example, the computer in your pocket fits there because it contains orders of magnitude more transistors than a desktop machine did fifteen years ago. Next year, you will want even more transistors in your pocket, or on your wrist, which will give you access to even greater computational power in the cloud. Those transistors are manufactured in facilities now costing billions of dollars apiece, a trend driven by our evidently insatiable demand for more and more computational power and bandwidth access embedded in every product that we buy. Here is the important bit: the total market value for transistors has grown for decades precisely because the total number of transistors shipped has climbed even faster than the cost per transistor has fallen.

In contrast, biological manufacturing requires only one copy of the correct DNA sequence to produce billions in value. That DNA may code for just one protein used as a pharmaceutical, or it may code for an entire enzymatic pathway that can produce any molecule now derived from a barrel of petroleum. Prototyping that pathway will require many experiments, and therefore many different versions of genes and genetic pathways. Yet once the final sequence is identified and embedded within a production organism, that sequence will be copied as the organism grows and reproduces, terminating the need for synthetic DNA in manufacturing any given product. The industrial scaling of gene synthesis is completely different than that of semiconductors.

[Given the mix-up in the publication date of 2015, I have now deleted the original post. I have appended the comments from the original post to the bottom of this post.]

It's time once again to see how quickly the world of biological technologies is changing. The story is mixed, in part because it is getting harder to find useful data, and in part because it is getting harder to generate appropriate metrics. 

Sequencing and synthesis productivity

I'll start with the productivity plot, as this one isn't new. For a discussion of the substantial performance increase in sequencing compared to Moore's Law, as well as the difficulty of finding this data, please see this post. If nothing else, keep two features of the plot in mind: 1) the consistency of the pace of Moore's Law and 2) the inconsistency and pace of sequencing productivity. Illumina appears to be the primary driver, and beneficiary, of improvements in productivity at the moment, especially if you are looking at share prices. It looks like the recently announced NextSeq and HiSeq instruments will provide substantially higher productivities (hand waving, I would say the next datum will come in another order of magnitude higher), but I think I need a bit more data before officially putting another point on the plot. Based on Erika Check Hayden's coverage at Nature, it seems that the new instruments should also provide substantial price improvements, which I get into below.

As for synthesis productivity, there have been no new commercially available instruments released for many years. sDNA providers are clearly pushing productivity gains in house, but no one outside those companies has access to productivity data.

DNA sequencing and synthesis prices

The most important thing to notice about the plots below is that prices have stopped falling precipitously. If you hear or read anyone asserting that costs are falling exponentially, you can politely refer them to the data (modulo the putative performance of the new Illumina instruments). We might again see exponential price decreases, but that will depend on a combination of technical innovation, demand, and competition, and I refer the reader to previous posts on the subject. Note that prices not falling isn't necessarily bad and doesn't mean the industry is somehow stagnant. Instead, it means that revenues in these sectors are probably not falling, which will certainly be welcomed by the companies involved. As I described a couple of weeks ago, and in a Congressional briefing in November, revenues in biotech continue to climb steeply.

The second important thing to notice about these plots is that I have changed the name of the metric from "cost" to "price". Previously, I had decided that this distinction amounted to no real difference for my purposes. Now, however, the world has changed, and cost and price are very different concepts for anyone thinking about the future of DNA. Previously, there was at times an order of magnitude change in cost from year to year, and keeping track of the difference between cost and price didn't matter. In a period when change is much slower, that difference becomes much more important. Moreover, as the industry becomes larger, more established, and generally more important for the economy, we should all take more care in distinguishing between concepts like cost to whom and price for whom.

In the plot that follows, the price is for finished, not raw, sequencing.

And here is a plot only of oligo and gene-length DNA:

What does all this mean? Illumina's instruments are now responsible for such a high percentage of sequencing output that the company is effectively setting prices for the entire industry. Illumina is being pushed by competition to increase performance, but this does not necessarily translate into lower prices. It doesn't behoove Illumina to drop prices at this point, and we won't see any substantial decrease until a serious competitor shows up and starts threatening Illumina's market share. The absence of real competition is the primary reason sequencing prices have flattened out over the last couple of data points.

I pulled the final datum on the sequencing curve from the NIH; the title on the NIH curve is "cost", but as it includes indirect academic costs I am going to fudge and call it "price". I notice that the NIH is now publishing two sequencing cost curves, and I'll bet that the important differences between them are too subtle for most viewers. One curve shows cost per megabase of raw sequence - that is, data straight off the instruments - and the other curve shows cost per finished human genome (assuming ~30X coverage of 3x10^9 bases). The cost per base of that finished sequencing is a couple orders of magnitude higher than the cost of the raw data. On the HiSeq X data sheet, Illumina has boldly put a point on the cost per human genome curve at $1000. But I have left it off the above plot for the time being; the performance and cost claimed by Illumina are just for human genomes rather than for arbitrary de novo sequencing. Mick Watson dug into this, and his sources inside Illumina claim that this limitation is in the software, rather than the hardware or the wetware, in which case a relatively simple upgrade could dramatically expand the utility of the instrument. Or perhaps the "de novo sequencing level" automatically unlocks after you spend $20 million in reagents. (Mick also has some strong opinions about the role of competition in pushing the development of these instruments, which I got into a few months ago.)
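To see why the raw and finished curves can't be compared directly, here is a rough Python sketch of how coverage alone separates the two. The ~30X coverage and 3x10^9 base genome are from the text, and the $1000 per genome is Illumina's claimed price point, used here only as an example input. This sketch models coverage and nothing else; assembly, finishing, and overhead are not included, so it accounts for only part of the gap described above.

```python
GENOME_SIZE = 3e9   # bases in the human genome, from the text
COVERAGE = 30       # ~30X coverage, from the text


def cost_per_raw_base(cost_per_genome, genome_size=GENOME_SIZE, coverage=COVERAGE):
    """Cost of each base read straight off the instrument."""
    return cost_per_genome / (genome_size * coverage)


def cost_per_finished_base(cost_per_genome, genome_size=GENOME_SIZE):
    """Cost of each base in the finished genome sequence."""
    return cost_per_genome / genome_size


# Example input: the claimed $1000 genome.
raw = cost_per_raw_base(1000.0)
finished = cost_per_finished_base(1000.0)
ratio = finished / raw

print(f"Raw:      ${raw * 1e6:.4f} per megabase")
print(f"Finished: ${finished * 1e6:.4f} per megabase")
print(f"Finished/raw ratio from coverage alone: {ratio:.0f}x")
```

Coverage by itself contributes a factor of about 30 between the two curves, so reading a number off the wrong curve gets you the wrong answer by well over an order of magnitude.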

Synthesis price declines have slowed for entirely different reasons. Again, I have covered this ground in many other posts, so I won't belabor it here.

Note that the oligo prices above are for column-based synthesis, and that oligos synthesized on arrays are much less expensive. However, array synthesis comes with the usual caveat that the quality is generally lower, unless you are getting your DNA from Agilent, which probably means you are getting your dsDNA from Gen9.

Note also that the distinction between the price of oligos and the price of double-stranded sDNA is becoming less useful. Whether you are ordering from Life/Thermo or from your local academic facility, the cost of producing oligos is now, in most cases, independent of their length. That's because the cost of capital (including rent, insurance, labor, etc) is now more significant than the cost of goods. Consequently, the price reflects the cost of capital rather than the cost of goods. Moreover, the cost of the columns, reagents, and shipping tubes is certainly more than the cost of the atoms in the sDNA you are ostensibly paying for. Once you get into longer oligos (substantially larger than 50-mers) this relationship breaks down and the sDNA is more expensive. But, at this point in time, most people aren't going to use longer oligos to assemble genes unless they have a tricky job that doesn't work using short oligos.

Looking forward, I suspect oligos aren't going to get much cheaper unless someone sorts out how to either 1) replace the requisite human labor and thereby reduce the cost of capital, or 2) finally replace the phosphoramidite chemistry that the industry relies upon. I know people have been talking about new synthesis chemistries for many years, but I have not seen anything close to market.

Even the cost of double-stranded sDNA depends less strongly on length than it used to. For example, IDT's gBlocks come at prices that are constant across quite substantial ranges in length. Moreover, part of the decrease in price for these products is embedded in the fact that you are buying smaller chunks of DNA that you then must assemble and integrate into your organism of choice. The longer gBlocks come in as low as ~$0.15/base, but you have to spend time and labor in house in order to do anything with them. Finally, so far as I know, we are still waiting for Gen9 and Cambrian Genomics to ship DNA at the prices they have suggested are possible. 

How much should we care about the price of sDNA?

I recently had a chat with someone who has purchased and assembled an absolutely enormous amount of sDNA over the last decade. He suggested that if prices fell by another order of magnitude, he could switch completely to outsourced assembly. This is an interesting claim, and potentially an interesting "tipping point". However, what this person really needs is not just sDNA, but sDNA integrated in a particular way into a particular genome operating in a particular host. And, of course, the integration and testing of the new genome in the host organism is where most of the cost is. Given the wide variety of emerging applications, and the growing array of hosts/chassis, it isn't clear that any given technology or firm will be able to provide arbitrary synthetic sequences incorporated into arbitrary hosts.

Consequently, as I have described before, I suspect that we aren't going to see a huge uptake in orders for sDNA until cheaper genes and circuits are coupled to reductions in cost for the rest of the build, test, and measurement cycle. Despite all the talk recently about organism fabs and outsourced testing, I suggest that what will really make a difference is providing every lab and innovator with adequate tools and infrastructure to do their own complete test and measurement. We should look to progress in pushing all facets of engineering capacity for biological systems, not just in reducing the cost of reading old instructions and writing new ones.


Comments from original post follow.

George Church:

"the performance and cost claimed by Illumina are just for human genomes rather than for arbitrary de novo sequencing."  --Rob
But most of the curve has been based on human genome sequencing until now. Why exclude human, rather than having a separate curve for "de novo"? Human genomes constitute a huge and compelling market. -- George

"oligos synthesized on arrays are much less expensive. However, array synthesis comes with the usual caveat that the quality is generally lower, unless you are getting your DNA from Agilent"  -- Rob
So why exclude Agilent from the curve? -- George

"we aren't going to see a huge uptake in orders for sDNA until cheaper genes and circuits are coupled to reductions in cost for the rest of the build, test, and measurement cycle." --Rob
Is this the sort of enabling technology needed?:

My response to George:


Thanks for your comments. I am not sure what you might mean by "most of the curve has been based on human genome sequencing". From my first efforts in 2000 (published initially in 2003), I have tried to use data that is more general. It is true that human genomes constitute a large market, but they aren't the only market. By definition, if you are interested in sequencing or building any other organism, then instruments that specialize in sequencing humans are of limited relevance. It may also be true that the development of new sequencing technologies and instruments has been driven by human sequencing, but that is beside the point. It may even be true that the new Illumina systems can be just as easily used to sequence mammoths, but that isn't happening yet. I have been doing my best to track the cost, and now the price, of de novo sequencing.

As I mention in the post, it is time that everyone, including me, started being more careful about the difference between cost and price. This brings me to oligos.

Agilent oligos are a special case. So far as I know, only Gen9 is using Agilent oligos as raw material to build genes. As you know, Cambrian Genomics is using arrays produced using technology developed at Combimatrix, and in any event isn't yet on the market. It is my understanding that, aside from Gen9, Agilent's arrays are primarily used for analysis rather than for building anything. Therefore, the market *price* of Agilent oligos is irrelevant to anyone except Gen9.

If the *cost* of Agilent oligos to Gen9 were reflected in the *price* of the genes sold by Gen9, or if those oligos were more broadly used, then I would be more interested in including them on the price curve. So far as I am aware, the price for the average customer at Gen9 is in the neighborhood of $.15-.18 per base. I've heard Drew Endy speak of a "friends and family" (all academics?) price of ~$.09 from Gen9, but that does not appear to be available to all customers, so I will leave it off the plot for now.

All this comes down to the obvious fact that, as the industry matures and becomes supported more by business-to-business sales rather than being subsidized by government grants and artificially cheap academic labor, the actual cost and actual price start to matter a great deal. Deextinction, in particular, might be an example where an academic/non-profit project might benefit from low cost (primarily cost of goods and cost of labor) that would be unachievable on the broader market where the price would be set by 1) keeping the doors of a business open, 2) return on capital, and 3) competition, not necessarily in that order. The academic cost of developing, demonstrating, and even using technologies is almost always very different from the eventual market price of those technologies.

The bottom line is that, from day one, I have been trying to understand the impact of biological technologies on the economy. This impact is most directly felt, and tracked, via the price that most customers pay for goods and services. I am always looking to improve the metrics I use, and if you have suggestions about how to do this better I am all ears.

Finally, yes, the papers you cite (above and on the Deextinction mailing list) describe the sort of thing that could help reduce engineering costs. Ultimately technologies like those will reduce the market price of products resulting from that engineering process. I look forward to seeing more, and also to seeing this technology utilized in the market.

Thanks again for your thoughtful questions.

How Competition Improves DNA Sequencing

The technology that enables reading DNA is changing very quickly.  I've chronicled how price and productivity are each improving in a previous post; here I want to try to get at how the diversity of companies and technologies is contributing to that improvement.

As I wrote previously, all hell is breaking loose in sequencing, which is great for the user.  Prices are falling and the capabilities of sequencing instruments are skyrocketing.  From an analytical perspective, the diversity of platforms is a blessing and a curse.  There is a great deal more data than just a few years ago, but it has become quite difficult to directly compare instruments that produce different qualities of DNA sequence, produce different read lengths, and have widely different throughputs.

I have worked for many years to come up with intuitive metrics to aid in understanding how technology is changing.  Price and productivity in reading and writing DNA are pretty straightforward.  My original paper on this topic (PDF) also looked at the various components of determining protein structures, which, given the many different quantifiable tasks involved, turned out to be a nice way to encapsulate a higher level look at rates of change.

In 2007, with the publication of bio-era's Genome Synthesis and Design Futures, I tried to get at how improvements in instrumentation were moving us toward sequencing whole genomes. The two axes of the relevant plot were 1) read length -- the length of each contiguous string of bases read by an instrument, critical to accurate assembly of genomes or chromosomes that can be hundreds of millions of bases long -- and 2) the daily throughput per instrument -- how much total DNA each instrument could read.  If you have enough long reads you can use this information as a map to assemble many shorter reads into the contiguous sequence.

Because there weren't very many models of commercially available sequencers in 2007, the original plot didn't have a lot of data on it (the red squares and blue circles below).  But the plot did show something interesting, which was that two general kinds of instruments were emerging at that time: those that produced long reads but had relatively limited throughput, and those that produced short reads but could process enormous amounts of sequence per day.  The blue dots below were data from my original paper, and the red squares were derived from a Science news article in 2006 that looked at instruments said to be emerging over the next year or so.

I have now pulled performance estimates out of several papers assessing instruments currently on the market and added them to the plot (purple triangles).  The two groupings present in 2007 are still roughly extant, though the edges are blurring a bit. (As with the price and productivity figures, I will publish a full bibliography in a paper later this year.  For now, this blog post serves as the primary citation for the figure below.)

I am still trying to sort out the best way to represent the data (I am open to suggestions about how to do it better).  At this point, it is pretty clear that the two major axes are insufficient to truly understand what is going on, so I have attempted to add some information regarding the release schedules of new instruments.  Very roughly, we went from a small number of first generation instruments in 2003 to a few more real instruments in 2006 that performed a little better in some regards, plus a few promised instruments that didn't work out for one reason or another.  However, starting in about 2010, we began to see seriously improved instruments being released on an increasingly rapid schedule.  This improvement is the result of competition not just between firms, but also between technologies.  In addition, some of what we are seeing is the emergence of instruments that have niches; long reads but medium throughput, short reads but extraordinary throughput -- combine these two capabilities and you have the ability to crank out de novo sequences at a pretty remarkable rate.  (For reference, the synthetic chromosome Venter et al published a few years ago was about one million bases; human chromosomes are in the range of 60 to 250 million bases.)

And now something even more interesting is going on.  Because platforms like PacBio and IonTorrent can upgrade internal components used in the actual sequencing, where those components include hardware, software, and wetware, revisions can result in stunning performance improvements.  Below is a plot with all the same data as above, with the addition of one revision from PacBio.  It's true that the throughput per instrument didn't change so much, but such long read lengths mean you can process less DNA and still rapidly produce high resolution sequence, potentially over megabases (modulo error rates, about which there seems to be some vigorous discussion).  This is not to say that PacBio makes the best overall instrument, nor that the company will be commercially viable, but rather that the competitive environment is producing change at an extraordinary rate.

If I now take the same plot as above and add a single (putative) MinION nanopore sequencer from Oxford Nanopore (where I have used their performance claims from public presentations; note the question mark on the date), the world again shifts quite dramatically.  Oxford also claims they will ship GridION instruments that essentially consist of racks of MinIONs, but I have not even tried to guess at the performance of that beast.  The resulting sequencing power will alter the shape of the commercial sequencing landscape.  Illumina and Life are not sitting still, of course, but have their own next generation instruments in development.  Jens Gundlach's (PDF) team at the University of Washington has demonstrated a nanopore that is argued to be better than the one Oxford uses, and I understand commercialization is proceeding rapidly, though of course Oxford won't be sitting still either.

One take home message from this, which is highlighted by taking the time to plot this data, is that over the next few years sequencing will become highly accurate, fast, and commonplace.  With the caveat that it is difficult to predict the future, continued competition will result in continued price decreases.

A more speculative take home emerges if you consider the implications of the MinION.  That device is described as a disposable USB sequencer.  If it -- or anything else like it -- works as promised, then some centralized sequencing operations might soon reach the end of their lives.  There are, of course, different kinds of sequencing operations.  If I read the tea leaves correctly, Illumina just reported that its clinical sequencing operations brought in about as much revenue as their other operations combined, including instrument sales.  That's interesting, because it points to two kinds of revenue: sales of boxes and reagents that enable other people to sequence, and certified service operations that provide clinically relevant sequence data.  At the moment, organizations like BGI appear to be generating revenue by sequencing everything under the sun, but cheaper and cheaper boxes might mean that the BGI operations outside of clinical sequencing aren't cost effective going forward.  Once the razors (electric, disposable, whatever) get cheap enough, you no longer bother going to the barber for a shave.

I will continue to work with the data in an effort to make the plots simpler and therefore hopefully more compelling.
Here are updated cost and productivity curves for DNA sequencing and synthesis.  Reading and writing DNA is becoming ever cheaper and easier.  The Economist and others call these "Carlson Curves", a name I am ambivalent about but have come to accept if only for the good advertising.  I've been meaning to post updates for a few weeks; the appearance today of an opinion piece at Wired about Moore's Law serves as a catalyst to launch them into the world.  In particular, two points need some attention, the  notions that Moore's Law 1) is unplanned and unpredictable, and 2) somehow represents the maximum pace of technological innovation.

DNA Sequencing Productivity is Skyrocketing

First up: the productivity curve.  Readers new to these metrics might want to have a look at my first paper on the subject, "The Pace and Proliferation of Biological Technologies" (PDF) from 2003, which describes why I chose to compare the productivity enabled by commercially available sequencing and synthesis instruments to Moore's Law.  (Briefly, Moore's Law is a proxy for productivity; more transistors putatively means more stuff gets done.)  You have to choose some sort of metric when making comparisons across such widely different technologies, and, however much I hunt around for something better, productivity always emerges at the top.

It's been a few years since I updated this chart.  The primary reason for the delay is that, with the profusion of different sequencing platforms, it became somewhat difficult to compare productivity [bases/person/day] across platforms.  Fortunately, a number of papers have come out recently that either directly make that calculation or provide enough information for me to make an estimate.  (I will publish a full bibliography in a paper later this year.  For now, this blog post serves as the primary citation for the figure below.)

Visual inspection reveals a number of interesting things.  First, the DNA synthesis productivity line stops in about 2008 because there have been no new instruments released publicly since then.  New synthesis and assembly technologies are under development by at least two firms, which have announced they will run centralized foundries and not sell instruments.  More on this later.

Second, it is clear that DNA sequencing platforms are improving very rapidly, now much faster than Moore's Law.  This is interesting in itself, but I point it out here because of the post today at Wired by Pixar co-founder Alvy Ray Smith, "How Pixar Used Moore's Law to Predict the Future".  Smith suggests that "Moore's Law reflects the top rate at which humans can innovate. If we could proceed faster, we would," and that "Hardly anyone can see across even the next crank of the Moore's Law clock."

Moore's Law is a Business Model and is All About Planning -- Theirs and Yours

As I have written previously, early on at Intel it was recognized that Moore's Law is a business model (see the Pace and Proliferation paper, my book, and in a previous post, "The Origin of Moore's Law").  Moore's Law was always about economics and planning in a multi-billion dollar industry.  When I started writing about all this in 2000, a new chip fab cost about $1 billion.  Now, according to The Economist, Intel estimates a new chip fab costs about $10 billion.  (There is probably another Law to be named here, something about exponential increases in cost of semiconductor processing as an inverse function of feature size.  Update: This turns out to be Rock's Law.)  Nobody spends $10 billion without a great deal of planning, and in particular nobody borrows that much from banks or other financial institutions without demonstrating a long-term plan to pay off the loan.   Moreover, Intel has had to coordinate the manufacturing and delivery of very expensive, very complex semiconductor processing instruments made by other companies.  Thus Intel's planning cycle explicitly extends many years into the future; the company sees not just the next crank of the Moore's Law clock, but several cranks.  New technology has certainly been required to achieve these planning goals, but that is just part of the research, development, and design process for Intel.  What is clear from comments by Carver Mead and others is that even if the path was unclear at times, the industry was confident that they could get to the next crank of the clock.

Moore's Law served a second purpose for Intel, and one that is less well recognized but arguably more important; Moore's Law was a pace selected to enable Intel to win.  That is why Andy Grove ran around Intel pushing for financial scale (see "The Origin of Moore's Law").  I have more historical work to do here, but it is pretty clear that Intel successfully organized an entire industry to move at a pace only it could survive.  And only Intel did survive.  Yes, there are competitors in specialty chips and in memory or GPUs, but as far as high volume, general CPUs go, Intel is the last man standing.  Finally, and alas I don't have a source anywhere for this other than hearsay, Intel could have in fact gone faster than Moore's Law.  Here is the hearsay: Gordon Moore told Danny Hillis who told me that Intel could have gone faster.  (If anybody has a better source for that particular point, give me a yell on Twitter.)  The inescapable conclusion from all this is that the management of Intel made a very careful calculation.  They evaluated product roll-outs to consumers, the rate of new product adoption, the rate of semiconductor processing improvements, and the financial requirements for building the next chip fab line, and then set a pace that nobody else could match but that left Intel plenty of headroom for future products.  It was all about planning.

The reason I bother to point all this out is that Pixar was able to use Moore's Law to "predict the future" precisely because Intel meticulously planned that future.  (Calling Alan Kay: "The best way to predict the future is to invent it.")  Which brings us back to biology.  Whereas Moore's Law is all about Intel and photolithography, the reason that productivity in DNA sequencing is going through the roof is competition among not just companies but among technologies.  And we are only just getting started.  As Smith writes in his Wired piece, Moore's Law tells you that "Everything good about computers gets an order of magnitude better every five years."  Which is great: it enabled other industries and companies to plan in the same way Pixar did.  But Moore's Law doesn't tell you anything about any other technology, because Moore's Law was about building a monopoly atop an extremely narrow technology base.  In contrast, there are many different DNA sequencing technologies emerging because many different entrepreneurs and companies are inventing the future.
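Smith's "order of magnitude every five years" can be converted into the more familiar doubling time with one line of arithmetic.  Here is a minimal sketch; the function name is made up for illustration, and the only input taken from the text is the 10x-in-five-years figure.

```python
import math

def doubling_time(factor, years):
    """Years needed to double, given a `factor`-fold improvement over `years`."""
    return years * math.log(2) / math.log(factor)

# Smith's figure: everything gets 10x better every five years.
t = doubling_time(10, 5)
print(f"{t:.2f} years, or about {t * 12:.0f} months")
# prints: 1.51 years, or about 18 months
```

That works out to the roughly 18-month doubling usually associated with Moore's Law, which is the pace the sequencing curves are being compared against.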

The first consequence of all this competition and invention is that it makes my job of predicting the future very difficult.  This emphasizes the difference between Moore's Law and Carlson Curves (it still feels so weird to write my own name like that): whereas Intel and the semiconductor industry were meeting planning goals, I am simply keeping track of data.  There is no real industry-wide planning in DNA synthesis or sequencing, other than a race to get to the "$1000 genome" before the next guy.  (Yes, there is a vague road-mappy thing promoted by the NIH that accompanied some of its grant programs, but there is little if any coordination because there is intense competition.)

Biological Technologies are Hard to Predict in Part Because They Are Cheaper than Chips

Compared to other industries, the barrier to entry in biological technologies is pretty low.  Unlike chip fabs, there is nothing in biology that costs $10 billion commercially, nor even $1 billion.  (I have come to mostly disbelieve pharma industry claims that developing drugs is actually that expensive, but that is another story for another time.)  The Boeing 787 reportedly cost $32 billion to develop as of 2011, and that is on top of a century of multi-billion dollar aviation projects that had to come before the 787.

There are two kinds of costs that are important to distinguish here.  The first is the cost of developing and commercializing a particular product.  Based on the money reportedly raised and spent by Life, Illumina, Ion Torrent (before acquisition), Pacific Biosciences, Complete Genomics (before acquisition), and others, it looks like developing and marketing second-generation sequencing technology can cost upwards of about $100 million.  Even more money gets spent, and lost, in operations before anybody is in the black.  My intuition says that the development costs are probably falling as sequencing starts to rely more on other technology bases, for example semiconductor processing and sensor technology, but I don't know of any real data.  I would also guess that nanopore sequencing, should it actually become a commercial product this year, will have cost less to develop than other technologies, but, again, that is my intuition based on my time in clean rooms and at the wet bench.  I don't think there is great information yet here, so I will suspend discussion for the time being.

The second kind of cost to keep in mind is the use of new technologies to get something done.  Which brings in the cost curve.  Again, the forthcoming paper will contain appropriate references.
[Figure: Carlson cost-per-base curves for DNA sequencing and synthesis, October 2012]

The cost per base of DNA sequencing has clearly plummeted lately.  I don't think there is much to be made of the apparent slow-down in the last couple of years.  The NIH version of this plot has more fine grained data, and it also directly compares the cost of sequencing with the cost per megabyte for memory, another form of Moore's Law.  Both my productivity plot above and the NIH plot show that sequencing has at times improved much faster than Moore's Law, and generally no slower.

If you ponder the various wiggles, it may be true that the fall in sequencing cost is returning to a slower pace after a period in which new technologies dramatically changed the market.  Time will tell.  (The wiggles certainly make prediction difficult.)  One feature of the rapid fall in sequencing costs is that it makes the slow-down in synthesis look smaller; see this earlier post for different scale plots and a discussion of the evaporating maximum profit margin for long, double-stranded synthetic DNA (the difference between the orange and yellow lines above).

Whereas competition among companies and technologies is driving down sequencing costs, the lack of competition among synthesis companies has contributed to a stagnation in price decreases.  I've covered this in previous posts (and in this Nature Biotech article), but it boils down to the fact that synthetic DNA has become a commodity produced using relatively old technology.

Where Are We Headed?

Now, after concluding that the structure of the industry makes it hard to prognosticate, I must of course prognosticate.  In DNA sequencing, all hell is breaking loose, and that is great for the user.  Whether instrument developers thrive is another matter entirely.  As usual with start-ups and disruptive technologies, surviving first contact with the market is all about execution.  I'll have an additional post soon on how DNA sequencing performance has changed over the years, and what the launch of nanopore sequencing might mean.

DNA synthesis may also see some change soon.  The industry as it exists today is based on chemistry that is several decades old.  The common implementation of that chemistry has heretofore set a floor on the cost of short and long synthetic DNA, and in particular the cost of synthetic genes.  However, at least two companies are claiming to have technology that facilitates busting through that cost floor by enabling the use of smaller amounts of poorer quality, and thus less expensive, synthetic DNA to build synthetic genes and chromosomes.

Gen9 is already on the market with synthetic genes selling for something like $.07 per base.  I am not aware of published cost estimates for production, other than the CEO claiming it will soon drop by orders of magnitude.  Cambrian Genomics has a related technology and its CEO suggests costs will immediately fall by 5 orders of magnitude.  Of course, neither company is likely to drop prices so far at the beginning, but rather will set prices to undercut existing companies and grab market share.  Assuming Gen9 and Cambrian don't collude on pricing, and assuming the technologies work as they expect, the existence of competition should lead to substantially lower prices on genes and chromosomes within the year.  We will have to see how things actually work in the market.  Finally, Synthetic Genomics has announced it will collaborate with IDT to sell synthetic genes, but as far as I am aware nothing new is actually shipping yet, nor have they announced pricing.

So, supposedly we are soon going to have lots more, lots cheaper DNA.  But you have to ask yourself who is going to use all this DNA, and for what.  The important business point here is that both Gen9 and Cambrian Genomics are working on the hypothesis that demand will increase markedly (by orders of magnitude) as the price falls.  Yet nobody can design a synthetic genetic circuit with more than a handful of components at the moment, which is something of a bottleneck on demand.  Another option is that customers will do less up-front predictive design and instead do more screening of variants.  This is how Amyris works -- despite their other difficulties, Amyris does have a truly impressive metabolic screening operation -- and there are several start-ups planning to provide similar (or even improved) high-throughput screening services for libraries of metabolic pathways.  I infer this is the strategy at Synthetic Genomics as well.  This all may work out well for both customers and DNA synthesis providers.  Again, I think people are working on an implicit hypothesis of radically increased demand, and it would be better to make the hypothesis explicit in part to identify the risk of getting it wrong.  As Naveen Jain says, successful entrepreneurs are good at eliminating risk, and I worry a bit that the new DNA synthesis companies are not paying enough attention to this point.

There are relatively simple scaling calculations that will determine the health of the industry.  Intel knew that it could grow financially in the context of exponentially falling transistor costs by shipping exponentially more transistors every quarter -- that is the business model of Moore's Law.  Customers and developers could plan product capabilities, just as Pixar did, knowing that Moore's Law was likely to hold for years to come.  But that was in the context of an effective pricing monopoly.  The question for synthetic gene companies is whether the market will grow fast enough to provide adequate revenues when prices fall due to competition.  To keep revenues up, they will then have to ship lots of bases, probably orders of magnitude more bases.  If prices don't fall, then something screwy is happening.  If prices do fall, they are likely to fall quickly as companies battle for market share.  It seems like another inevitable race to the bottom.  Probably good for the consumer; probably bad for the producer.
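The scaling calculation is short enough to write down.  This is only a sketch under the simplest possible assumption, that revenue is price per base times bases shipped; the function name and the example price drops are mine, not numbers from any company.

```python
def volume_multiplier(price_drop_factor, revenue_growth=1.0):
    """How many times more bases must ship when the price per base falls by
    `price_drop_factor`, in order to reach `revenue_growth` times the
    original revenue (revenue = price per base * bases shipped)."""
    return price_drop_factor * revenue_growth

# Illustrative price drops, not forecasts.
for drop in (10, 100, 1000):
    needed = volume_multiplier(drop)
    print(f"price falls {drop}x -> ship {needed:.0f}x the bases for flat revenue")
```

Flat revenue is the easy case.  A company that also wants revenue to double while prices fall a hundredfold has to ship 200 times the bases, which is the demand hypothesis that needs to be made explicit.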

(Updated)  Ultimately, for a new wave of DNA synthesis companies to be successful, they have to provide the customer something of value.  I suspect there will be plenty of academic customers for cheaper genes.  However, I am not so sure about commercial uptake.  Here's why: DNA is always going to be a small cost of developing a product, and it isn't obvious that making that small cost even cheaper helps your average corporate lab.

In general, the R part of R&D only accounts for 1-10% of the cost of the final product.  The vast majority of development costs are in polishing up the product into something customers will actually buy.  If those costs are in the neighborhood of $50-100 million, then reducing the cost of synthetic DNA from $50,000 to $500 is nice, but the corporate scientist-customer is more worried about knocking a factor of two, or an order of magnitude, off the $50 million.  This means that in order to make a big impact (and presumably to increase demand adequately) radically cheaper DNA must be coupled to innovations that reduce the rest of the product development costs.  As suggested above, forward design of complex circuits is not going to be adequate innovation any time soon.  The way out here may be high-throughput screening operations that enable testing many variant pathways simultaneously.  But note that this is not just another hypothesis about how the immediate future of engineering biology will change, but another unacknowledged hypothesis.  It might turn out to be wrong.
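To put numbers on how small that saving is, here is a quick sketch using the figures in the paragraph above ($50-100 million in development costs, synthetic DNA falling from $50,000 to $500).  The function name is made up for illustration.

```python
def savings_fraction(total_cost, old_dna_cost, new_dna_cost):
    """Fraction of the total development cost saved by cheaper DNA."""
    return (old_dna_cost - new_dna_cost) / total_cost

for total in (50e6, 100e6):
    frac = savings_fraction(total, 50_000, 500)
    print(f"${total / 1e6:.0f}M project: cheaper DNA saves {frac:.2%} of the total")

# For comparison, knocking a factor of two off the development cost saves 50%.
```

On a $50 million project the DNA saving comes to about a tenth of one percent, which is why the corporate customer cares far more about the rest of the development bill.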

The upshot, just as I wrote in 2003, is that the market dynamics of biological technologies will  remain difficult to predict precisely because of the diversity of technology and the difficulty of the tasks at hand.  We can plan on prices going down; how much, I wouldn't want to predict.

The Arrival of Nanopore Sequencing

(Update 1 March: Thanks to the anonymous commenter who pointed out the throughput estimates for existing instruments were too low.)

You may have heard a little bit of noise about nanopore sequencing in recent weeks.  After many years of development, Oxford Nanopore promises that by the end of the year we will be able to read DNA sequences by threading them through the eye of a very small needle.

How It Works: Directly Reading DNA

The basic idea is not new: as a long string of DNA passes through a small hole, its components -- the bases A, T, G, and C -- plug that hole to varying degrees.  As they pass through the hole, in this case an engineered pore protein derived from one found in nature, each base has slightly different interactions with the walls of the pore.  As a result, while passing through the pore each base lets different numbers of salt ions through, which allows one to distinguish between the bases by measuring changes in electrical current.  Because this method is a direct physical interrogation of the chemical structure of each base, it is in principle much, much faster than any of the indirect sequencing technologies that have come before.
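As a toy illustration of "distinguish between the bases by measuring changes in electrical current", the sketch below assigns each measured current to the base with the nearest characteristic level.  Everything specific here is an assumption: the current values are invented, the function names are mine, and a real instrument has to contend with noise, variable DNA speed, and signals that are far messier than one clean level per base.

```python
# Hypothetical mean pore currents (picoamps) for each base.
# These values are made up purely for illustration.
LEVELS = {"A": 50.0, "C": 42.0, "G": 35.0, "T": 28.0}

def call_base(current):
    """Return the base whose characteristic level is closest to `current`."""
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - current))

def call_sequence(currents):
    """Call one base per current measurement and join them into a string."""
    return "".join(call_base(c) for c in currents)

# A made-up trace: one averaged current reading per base passing the pore.
trace = [49.2, 27.5, 36.1, 41.0, 51.3]
print(call_sequence(trace))  # prints: ATGCA
```

The point of the sketch is only that the measurement maps directly from a physical signal to a base, with no intervening chemistry or imaging step.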

There have been a variety of hurdles to clear to get nanopore sequencing working.  First you have to use a pore that is small enough to produce measurable changes in current.  Next the speed of the DNA must be carefully controlled so that the signal to noise ratio is high enough.  The pore must also sit in an insulating membrane of some sort, surrounded by the necessary electrical circuitry, and to become a useful product the whole thing must be easily assembled in an industrial manner and be mechanically stable through shipping and use.

Oxford Nanopore claims to have solved all those problems.  They recently showed off a disposable version of their technology -- called the MinION -- containing 512 pores built into a disposable USB stick.  This puts to shame the Lava Amp, my own experiment with building a USB peripheral for molecular biology.  Here is one part I find extremely impressive -- so impressive it is almost hard to believe: Oxford claims they have reduced the sample handling to a single (?) pipetting step.  Clive Brown, Oxford CTO, says "Your fluidics is a Gilson."  (A "Gilson" would be a brand of pipetter.)  That would be quite something.

I've spent a good deal of my career trying to develop simple ways of putting biological samples into microfluidic doo-dads of one kind or another.  It's never trivial, it's usually a pain in the ass, and sometimes it's a showstopper.  Blood, in particular, is very hard to work with.  If Oxford has made this part of the operation simple, then they have a winning technology just based on everyday ease of use -- what sometimes goes by the labels of "user experience" or "human factors".  Compared to the complexity of many other laboratory protocols, it would be like suddenly switching from MS DOS to OS X in one step.

How Well Does it Work?

The challenge for fast sequencing is to combine throughput (bases per hour) with read length (the number of contiguous bases read in one go).  Existing instruments have throughputs in the range of 10-55,000 megabases/day and read lengths from tens of bases to about 800 bases.  (See chart below.)  Nick Loman reports that using the MinION Oxford has already run DNA of 5000 to 100,000 bases (5 kB to 100 kB) at speeds of 120-1000 bases per minute per pore, though accuracy suffers above 500 bases per minute.  So a single USB stick can easily run at 150 megabases (MB) per hour, which basically means you can sequence full-length eukaryotic chromosomes in about an hour.  Over the next year or so, Oxford will release the GridION instrument that will have 4 and then 16 times as many pores.  Presumably that means it will be 16 times as fast.  The long read lengths mean that processing the resulting sequence data, which usually takes longer than the actual sequencing itself, will be much, much faster.
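The throughput arithmetic is just pores times per-pore rate.  The sketch below takes the 512 pores and the 120-1000 bases per minute per pore quoted above at face value and assumes every pore reads continuously (that assumption, like the function names, is mine).  It also inverts the calculation to show what per-pore rate a 150 megabase per hour stick would need.

```python
def stick_throughput(pores, bases_per_min_per_pore):
    """Bases per hour for one device, assuming every pore reads continuously."""
    return pores * bases_per_min_per_pore * 60

def implied_rate(pores, bases_per_hour):
    """Per-pore rate (bases per minute) needed to hit a target throughput."""
    return bases_per_hour / (pores * 60)

PORES = 512  # pore count quoted for a single USB stick

for rate in (120, 500, 1000):
    mb_per_hour = stick_throughput(PORES, rate) / 1e6
    print(f"{rate} bases/min/pore -> {mb_per_hour:.1f} megabases/hour")

needed = implied_rate(PORES, 150e6)
print(f"150 megabases/hour needs about {needed:.0f} bases/min/pore")
```

With these inputs the quoted per-minute rates come out below 150 megabases per hour, so that headline figure presumably rests on a faster per-pore rate than the one quoted here.  Until Oxford releases data, treat all of these numbers as claims rather than measurements.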

This is so far beyond existing commercial instruments that it sounds like magic.  Writing in Forbes, Matthew Herper quotes Jonathan Rothberg, of sequencing competitor Ion Torrent, as saying "With no data release how do you know this is not cold fusion? ... I don't believe it."  Oxford CTO Clive Brown responded to Rothberg in the comments to Herper's post in a very reasonable fashion -- have a look.

Of course I want to see data as much as the next fellow, and I will have to hold one of those USB sequencers in my own hands before I truly believe it.  Rothberg would probably complain that I have already put Oxford on the "performance tradeoffs" chart before they've shipped any instruments.  But given what I know about building instruments, I think immediately putting Oxford in the same bin as cold fusion is unnecessary.

Below is a performance comparison of sequencing instruments originally published by Bio-era in Genome Synthesis and Design Futures in 2007.  (Click on it for a bigger version.)  I've hacked it up to include the approximate performance range of 2nd generation sequencers from Life, Illumina, etc, as well as for a single MinIon.  That's one USB stick, with what we're told is a few minutes worth of sample prep.  How many can you run at once?  Notice the scale on the x-axis, and the units on the y-axis.  If it works as promised, the MinIon is so vastly better than existing machines that the comparison is hard to make.  If I replotted that data with a log axis along the bottom then all the other technologies would be cramped up together way off to the left. (The data comes from my 2003 paper, The Pace and Proliferation of Biological Technologies (PDF), and from Service, 2006, The Race for the $1000 Genome).

Carlson_sequencer_performanc_2012.png

The Broader Impact

Later this week I will try to add the new technologies to the productivity curve published in the 2003 paper.  Here's what it will show: biological technologies are improving at exceptional paces, leaving Moore's Law behind.  This is no surprise, because while biology is getting cheaper and faster, the density of transistors on chips is set by very long term trends in finance and by SEMATECH; designing and fabricating new semiconductors is crazy expensive and requires coordination across an entire industry. (See The Origin of Moore's Law and What it May (Not) Teach Us About Biological Technologies.)  In fact, we should expect biology to move much faster than semiconductors. 

Here are a few graphs from the 2003 paper:

...The long term distribution and development of biological technology is likely to be largely unconstrained by economic considerations. While Moore's Law is a forecast based on understandable large capital costs and projected improvements in existing technologies, which to a great extent determined its remarkably constant behavior, current progress in biology is exemplified by successive shifts to new technologies. These technologies share the common scientific inheritance of molecular biology, but in general their implementations as tools emerge independently and have independent scientific and economic impacts. For example, the advent of gene expression chips spawned a new industrial segment with significant market value. Recombinant DNA, gel and capillary sequencing, and monoclonal antibodies have produced similar results. And while the cost of chip fabs has reached upwards of one billion dollars per facility and is expected to increase [2012 update: it's now north of $6 billion], there is good reason to expect that the cost of biological manufacturing and sequencing will only decrease. [Update 2012: See "New Cost Curves" for DNA synthesis and sequencing.]

These trends--successive shifts to new technologies and increased capability at decreased cost--are likely to continue. In the fifteen years that commercial sequencers have been available, the technology has progressed ... from labor intensive gel slab based instruments, through highly automated capillary electrophoresis based machines, to the partially enzymatic Pyrosequencing process. These techniques are based on chemical analysis of many copies of a given sequence. New technologies under development are aimed at directly reading one copy at a time by directly measuring physical properties of molecules, with a goal of rapidly reading genomes of individual cells.  While physically-based sequencing techniques have historically faced technical difficulties inherent in working with individual molecules, an expanding variety of measurement techniques applied to biological systems will likely yield methods capable of rapid direct sequencing.

Cue nanopore sequencing. 

A few months ago I tweeted that I had seen single strand DNA sequence data generated using a nanopore -- it wasn't from Oxford. (Drat, can't find the tweet now.)  I am certain there are other labs out there making similar progress.  On the commercial front, Illumina is an investor in Oxford, and Life has invested in Genia.  As best I can tell, once you get past the original pore sequencing IP, which it appears is being licensed broadly, there appear to be many measurement approaches, many pores, and many membranes that could be integrated into a device.  In other words, money and time will be the primary barriers to entry.

(For the instrumentation geeks out there, because the pore is larger than a single base, the instrument actually measures the current as three bases pass through the pore.  Thus you need to be able to distinguish 4^3=64 levels of current, which Oxford claims they can do.  The pore set-up I saw in person worked the same way, so I certainly believe this is feasible.  Better pores and better electronics might reduce the physical sampling to 1 or 2 bases eventually, which should result in faster instruments.)
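
For anyone who wants to see where the 64 comes from, here is a minimal sketch in Python that enumerates every possible run of bases sitting in the pore at once.  The names are mine, and this only counts combinations; it says nothing about how well the corresponding current levels can actually be told apart.

```python
from itertools import product

BASES = "ACGT"

def kmers(k):
    """All possible runs of k bases that could occupy the pore at once."""
    return ["".join(p) for p in product(BASES, repeat=k)]

for k in (1, 2, 3):
    # number of distinct current levels the electronics must resolve
    print(k, len(kmers(k)))
```

Dropping from three bases in the pore to one cuts the number of levels to distinguish from 64 to 4, which is why better pores should make the measurement easier.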

It may be that Oxford will have a first mover advantage for nanopore instruments, and it may be that they have amassed sufficient additional IP to make it rough for competitors.  But, given the power of the technology, the size of the market, and the number of academic competitors, I can't see that over the long term this remains a one-company game.

Not every sequencing task has the same technical requirements, so instruments like the Ion Torrent won't be put to the curbside.  And other technologies will undoubtedly come along that perform better in some crucial way than Oxford's nanopores.  We really are just at the beginning of the revolution in biological technologies.  Recombinant DNA isn't even 40 years old, and the electronics necessary for nanopore measurements only became inexpensive and commonplace in the last few years.  However impressive nanopore sequencing seems today, the greatest change is yet to come.

New Cost Curves

Sitting here at Synthetic Biology 5.0, it's time to update the DNA synthesis and sequencing cost curves. (Here is some prior commentary on why these curves are slow, are fast, and have the shape they do.) Here you go:

carlson_cost per_base_june_2011.png

The Economist has just posted my invited comments on their current debate: "This house believes the development of computing was the most significant technological advance of the 20th century."

As with the last time I was invited to be a "guest speaker" (just one of the oddities of horning an Oxford-style debate into an online shoe), I have difficulty coloring between the lines.  Here are the first couple of graphs of today's contribution:

The development of computing--broadly construed--was indeed the most significant technological advance of the 20th century. New technologies, however, never crop up by themselves, but are instead part of the woven web of human endeavour. There is always more to a given technology than meets the eye.

We often oversimplify "computing" and think only of software or algorithms used to manipulate information. That information comes in units of bits, and our ability to store and crunch those bits has certainly changed our economies and societies over the past century. But those bits reside on a disk, or in a memory circuit, and the crunching of bits is done by silicon chips. Those disks, circuits and chips had to improve so that computing could advance.

Progress in building computers during the mid-20th century required first an understanding of materials and how they interact; from this knowledge, which initially lived on paper and in the minds of scientists and engineers, were built the first computer chips. As those chips increased in complexity, so did the computational power they conferred on computer designers. That computational power was used to design more powerful chips, creating a feedback loop. By the end of the century, new chips and software packages could only be designed using computers, and their complex behaviour could only be understood with the aid of computers.

The development of computing, therefore, required not just development of software but also of the ability to build the physical infrastructure that runs software and stores information. In other words, our improving ability to control atoms in the service of building computers was crucial to advancing the technology we call "computing". Advances in controlling atoms have naturally been extended to other areas of human enterprise. Computer-aided design and manufacturing have radically changed our ability to transform ideas into objects. Our manufactured world--which includes cars, aircraft, medicines, food, music, phones and even shoes--now arrives at our doorsteps as a consequence of this increase in computational power.

I go on to observe that computation is already having an effect on food through increased corn yields courtesy of gene sequencing and expression analysis.

Like so:

Biodesic_US_corn_yield.png

Click through to read the rest.

Recent DNA Cost and Productivity Figures from The Economist

Hacking Goes Squishy, September 2009

Genesis Redux, May 2010

What Lies Within, August 2010

Booting Up A Synthetic Genome (Updated for typos)

The press is all abuzz over the Venter Institute's paper last week demonstrating a functioning synthetic genome.  Here is the Gibson et al paper in Science, and here are takes from the NYT and The Economist (lede, story).  The Economist story has a figure with the cost and productivity data for gene and oligo synthesis, respectively.  Here also are Jamais Cascio and Oliver Morton, who points to this collection of opinions in Nature.

The nuts and bolts (or bases and methylases?) of the story are this: Gibson et al ordered a whole mess of pieces of relatively short, synthetic DNA from Blue Heron and stitched that DNA together into a full-length genome for Bug B, which they then transplanted into a related microbial species, Bug A.  The transplanted genome B was shown to be fully functional and to change the species from old to new, from A to B.  Cool.

Yet, my general reaction to this is the same as it was the last time the Venter team claimed they were creating artificial life.  (How many times can one make this claim?)  The assembly and boot-up are really fantastic technical achievements.  (If only we all had the reported $40 million to throw at a project like this.)  But creating life, and even the claim of creating a "synthetic cell"?  Meh.

(See my earlier posts, "Publication of the Venter Institute's synthetic bacterial chromosome", January 2008, and "Updated Longest Synthetic DNA Plot", December 2007.)

I am going to agree with my friends at The Economist (see main story) that the announcement is "not unexpected", and disagree strongly that "The announcement is momentous."  DNA is DNA.  We have known that for, oh, a long time now.  Synthetic DNA that is biologically indistinguishable from "natural DNA" is, well, biologically indistinguishable from natural DNA.  This result is at least thirty years old, when synthetic DNA was first used to cause an organism to do something new.  There are plenty of other people saying this in print, so I won't belabor the point; see, for example, the comments in the NYT article.

One less-than-interesting outcome of this paper is that we are once again going to read all about the death of vitalism (see the Nature opinion pieces).  Here are the first two paragraphs from Chapter 4 of my book:

"I must tell you that I can prepare urea without requiring a kidney of an animal, either man or dog." With these words, in 1828 Friedrich Wöhler claimed he had irreversibly changed the world. In a letter to his former teacher Joens Jacob Berzelius, Wöhler wrote that he had witnessed "the great tragedy of science, the slaying of a beautiful hypothesis by an ugly fact." The beautiful idea to which he referred was vitalism, the notion that organic matter, exemplified in this case by urea, was animated and created by a vital force and that it could not be synthesized from inorganic components. The ugly fact was a dish of urea crystals on his laboratory bench, produced by heating inorganic salts. Thus, many textbooks announce, was born the field of synthetic organic chemistry.

As is often the case, however, events were somewhat more complicated than the textbook story. Wöhler had used salts prepared from tannery wastes, which adherents to vitalism claimed contaminated his reaction with a vital component. Wöhler's achievement took many years to permeate the mind-set of the day, and nearly two decades passed before a student of his, Hermann Kolbe, first used the word "synthesis" in a paper to describe a set of reactions that produced acetic acid from its inorganic elements.
Care to guess where the nucleotides came from that went into the Gibson et al synthetic genome?  Probably purified and reprocessed from sugarcane.  Less probably salmon sperm.  In other words, the nucleotides came from living systems, and are thus tainted for those who care about such things.  So much for another nail in the vital coffin.

Somewhat more intriguing will be the debate around whether it is the atoms in the genome that are interesting or instead the information conveyed by the arrangement of those atoms that we should care about.  Clearly, if nothing else this paper demonstrates that the informational code determines species.  This isn't really news to anyone who has thought about it (except, perhaps, to IP lawyers -- see my recent post on the breast cancer gene lawsuit) but it might get a broader range of people thinking more about life as information.  What then, does "creating life" mean?  Creating information?  Creating sequence?  And what sort of design tools do we need to truly control these creations?  Are we just talking about much better computer simulations, or is there more physics to learn, or is it all just too complicated?  Will we be forever chasing away ghosts of vitalism?

That's all I have for deep meaning at the moment.  I've only just got off one set of airplanes (New York-DC-LA) and have to get on another for Brazil in the morning.

I would, however, point out that the recent paper describes what may be a species-specific processing hack.  From the paper:

...Initial attempts to extract the M. mycoides genome from yeast and transplant it into M. capricolum failed. We discovered that the donor and recipient mycoplasmas share a common restriction system. The donor genome was methylated in the native M. mycoides cells and was therefore protected against restriction during the transplantation from a native donor cell. However, the bacterial genomes grown in yeast are unmethylated and so are not protected from the single restriction system of the recipient cell. We were able to overcome this restriction barrier by methylating the donor DNA with purified methylases or crude M. mycoides or M. capricolum extracts, or by simply disrupting the recipient cell's restriction system.
This methylation trick will probably -- probably -- work just fine for other microbes, but I just want to point out that it isn't necessarily generalizable and that the JCVI team didn't demonstrate any such thing.  The team got this one bug working, and who knows what surprises wait in store for the next team working on the next bug.

Since Gibson et al have in fact built an impressive bit of DNA, here is an updated "Longest Synthetic DNA Plot" (here is the previous version with refs.); alas, the one I published just a few months ago in Nature Biotech is already obsolete (hmph, they have evidently now stuck it behind a pay wall).

carlson_longest_sDNA_2010.png

A couple of thoughts:  As I noted in DNA Synthesis "Learning Curve": Thoughts on the Future of Building Genes and Organisms (July 2008), it isn't really clear to me that this game can go on for much longer.  Once you hit a MegaBase (1,000,000 bases, or 1 MB) in length, you are basically at a medium-long microbial genome.  Another order of magnitude or so gets you to eukaryotic chromosomes, and why would anyone bother building a contiguous chunk of DNA longer than that?  Eventually you get into all the same problems that the artificial chromosome community has been dealing with for decades -- namely that chromatin structure is complex and nobody really knows how to build something like it from scratch.  There is progress, yes, and as soon as we get a real mammalian artificial chromosome all sorts of interesting therapies should become possible (note to self: dig into the state of the art here -- it has been a few years since I looked into artificial chromosomes).  But with the 1 MB milestone I suspect people will begin to look elsewhere and the typical technology development S-curve kicks in.  Maybe the curve has already started to roll over, as I predicted (sketched in) with the Learning Curve.

Finally, I have to point out that the ~1000 genes in the synthetic genome are vastly more than anybody knows how to deal with in a design framework.  I doubt very much that the JCVI team, or the team at Synthetic Genomics, will be using this or any other genome in any economically interesting bug any time soon.  As I note in Chapter 8 of Biology is Technology, Jay Keasling's lab and the folks at Amyris are playing with only about 15 genes.  And getting the isoprenoid pathway working (small by the Gibson et al standard but big by the everyone-else standard) took tens of person years and about as much investment (roughly $50 million in total by the Gates Foundation and investors) as Venter spent on synthetic DNA alone.  And then is Synthetic Genomics going to start doing metabolic engineering in a microbe that they only just sequenced and about which relatively little is known (at least compared with E. coli, yeast, and other favorite lab animals)?  Or are they going to redo this same genome synthesis project in a bug that is better understood and will serve as a platform or chassis?  Either way, really?  The company has hundreds of millions of dollars in the bank to spend on this sort of thing, but I simply don't understand what the present publication has to do with making any money.

So, in summary: very cool big chunk of synthetic DNA being used to run a cell.  Not artificial life, and neither artificial cell nor synthetic cell.  Probably not going to show up in a product, or be used to make a product, for many years.  If ever.  Confusing from the standpoint of project management, profit, and economic viability.

But I rather hope somebody proves me wrong about that and surprises me soon with something large, synthetic, and valuable.  That way lies truly world changing biological technologies.

Data and References for Longest Published sDNA

Various hard drive crashes have several times wiped out my records for the longest published synthetic DNA (sDNA).  I find that I once again need the list of references to finish off the edits for the book.  I will post them in the open here so that I, and everyone else, will always have access to them.

longest sDNA 2008.png

Year     Length (bases)   Ref
1979     207              Khorana (1979)
1990     2100             Mandecki (1990)
1995     2700             Stemmer (1995)
2002     7500             Cello (2002)
2004.4   14600            Tian (2004)
2004.7   32000            Kodumal (2004)
2008     583000           Gibson (2008)
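
For anyone who wants to play with these numbers, here is a minimal Python sketch that turns the table into a rough doubling time.  The function name and the choice of endpoints are mine; a two-point estimate like this is crude and is no substitute for the plot.

```python
import math

# (year, length in bases), copied from the table above
LONGEST_SDNA = [
    (1979, 207), (1990, 2100), (1995, 2700), (2002, 7500),
    (2004.4, 14600), (2004.7, 32000), (2008, 583000),
]

def doubling_time_years(start, end):
    """Average years per doubling of length between two (year, length) points."""
    (y0, n0), (y1, n1) = start, end
    doublings = math.log2(n1 / n0)
    return (y1 - y0) / doublings

overall = doubling_time_years(LONGEST_SDNA[0], LONGEST_SDNA[-1])  # 1979 to 2008
recent = doubling_time_years(LONGEST_SDNA[3], LONGEST_SDNA[-1])   # 2002 to 2008
print(f"overall: {overall:.2f} years per doubling")
print(f"since 2002: {recent:.2f} years per doubling")
```

Comparing the two estimates is a quick way to see how much of the growth is packed into the last few entries.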

Total synthesis of a gene
HG Khorana
Science 16 February 1979:
Vol. 203. no. 4381, pp. 614 - 625

A totally synthetic plasmid for general cloning, gene expression and mutagenesis in Escherichia coli
Wlodek Mandecki, Mark A. Hayden, Mary Ann Shallcross and Elizabeth Stotland
Gene Volume 94, Issue 1, 28 September 1990, Pages 103-107

Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides
Willem P. C. Stemmer, Andreas Crameri, Kim D. Ha, Thomas M. Brennan and Herbert L. Heyneker
Gene Volume 164, Issue 1, 16 October 1995, Pages 49-53

Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in the Absence of Natural Template
Jeronimo Cello, Aniko V. Paul, Eckard Wimmer
Science 9 August 2002: Vol. 297. no. 5583, pp. 1016 - 1018

Accurate multiplex gene synthesis from programmable DNA microchips
Jingdong Tian, Hui Gong, Nijing Sheng, Xiaochuan Zhou, Erdogan Gulari, Xiaolian Gao & George Church
Nature 432, 1050-1054 (23 December 2004)

Total synthesis of long DNA sequences: Synthesis of a contiguous 32-kb polyketide synthase gene cluster
Sarah J. Kodumal, Kedar G. Patel, Ralph Reid, Hugo G. Menzella, Mark Welch, and Daniel V. Santi
PNAS November 2, 2004 vol. 101 no. 44 15573-15578

Complete Chemical Synthesis, Assembly, and Cloning of a Mycoplasma genitalium Genome
Daniel G. Gibson, Gwynedd A. Benders, Cynthia Andrews-Pfannkoch, Evgeniya A. Denisova, Holly Baden-Tillson, Jayshree Zaveri, Timothy B. Stockwell, Anushka Brownley, David W. Thomas, Mikkel A. Algire, Chuck Merryman, Lei Young, Vladimir N. Noskov, John I. Glass, J. Craig Venter, Clyde A. Hutchison, III, Hamilton O. Smith
Science 29 February 2008: Vol. 319. no. 5867, pp. 1215 - 1220

While writing a proposal for a new project, I've had occasion to dig back into Moore's Law and its origins.  I wonder, now, whether I peeled back enough of the layers of the phenomenon in my book.  We so often hear about how more powerful computers are changing everything.  Usually the progress demonstrated by the semiconductor industry (and now, more generally, IT) is described as the result of some sort of technological determinism instead of as the result of a bunch of choices -- by people -- that produce the world we live in.  This is on my mind as I continue to ponder the recent failure of Codon Devices as a commercial enterprise.  In any event, here are a few notes and resources that I found compelling as I went back to reexamine Moore's Law.

What is Moore's Law?

First up is a 2003 article from Ars Technica that does a very nice job of explaining the whys and wherefores: "Understanding Moore's Law".  The crispest statement within the original 1965 paper is "The number of transistors per chip that yields the minimum cost per transistor has increased at a rate of roughly a factor of two per year."  At its very origins, Moore's Law emerged from a statement about cost, and economics, rather than strictly about technology.

I like this summary from the Ars Technica piece quite a lot:

Ultimately, the number of transistors per chip that makes up the low point of any year's curve is a combination of a few major factors (in order of decreasing impact):

  1. The maximum number of transistors per square inch, (or, alternately put, the size of the smallest transistor that our equipment can etch),
  2. The size of the wafer
  3. The average number of defects per square inch,
  4. The costs associated with producing multiple components (i.e. packaging costs, the costs of integrating multiple components onto a PCB, etc.)
In other words, it's complicated.  Notably, the article does not touch on any market-associated factors, such as demand and the financing of new fabs.
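
To make the "minimum cost per transistor" idea concrete, here is a toy model in Python built from the four factors in that list.  Everything in it is an assumption of mine rather than anything from the Ars Technica piece or from Moore's paper: the function names, the Poisson yield formula, and every number are invented for illustration only.

```python
import math

def cost_per_transistor(n, density=1.0e6, defects_per_in2=0.5,
                        wafer_area_in2=100.0, wafer_cost=5000.0,
                        package_cost=1.0):
    """Dollars per working transistor for a chip carrying n transistors.
    All default values are made up for illustration."""
    die_area = n / density                                  # square inches per chip
    dies_per_wafer = wafer_area_in2 / die_area              # ignores edge losses
    good_fraction = math.exp(-defects_per_in2 * die_area)   # Poisson yield assumption
    silicon_cost = wafer_cost / (dies_per_wafer * good_fraction)  # per good chip
    return (silicon_cost + package_cost) / n

def cheapest_chip_size(**kwargs):
    """Scan chip sizes on a log grid (1e3 up to roughly 8e7 transistors)
    and return the size with the lowest cost per transistor."""
    candidates = [10 ** (k / 10) for k in range(30, 80)]
    return min(candidates, key=lambda n: cost_per_transistor(n, **kwargs))

print(cheapest_chip_size())               # toy baseline
print(cheapest_chip_size(density=2.0e6))  # same fab, double the density
```

In this toy version small chips are expensive per transistor because packaging dominates, and big chips are expensive because defects kill too many of them, so there is a sweet spot in between.  Doubling the transistor density moves that sweet spot to a larger transistor count, which is the quantity Moore was tracking.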

The Wiki on Moore's Law has some good information, but isn't very nuanced.

Next, here is an excerpt from an interview Moore did with Charlie Rose in 2005:

Charlie Rose:     ...It is said, and tell me if it's right, that this was part of the assumptions built into the way Intel made its projections. And therefore, because Intel did that, everybody else in the Silicon Valley, everybody else in the business did the same thing. So it achieved a power that was pervasive.

Gordon Moore:   That's true. It happened fairly gradually. It was generally recognized that these things were growing exponentially like that. Even the Semiconductor Industry Association put out a roadmap for the technology for the industry that took into account these exponential growths to see what research had to be done to make sure we could stay on that curve. So it's kind of become a self-fulfilling prophecy.

Semiconductor technology has the peculiar characteristic that the next generation always makes things higher performance and cheaper - both. So if you're a generation behind the leading edge technology, you have both a cost disadvantage and a performance disadvantage. So it's a very non-competitive situation. So the companies all recognize they have to stay on this curve or get a little ahead of it.
Keeping up with 'the Law' is as much about the business model of the semiconductor industry as about anything else.  Growth for the sake of growth is an axiom of western capitalism, but it is actually a fundamental requirement for chipmakers.  Because the cost per transistor is expected to fall exponentially over time, you have to produce exponentially more transistors to maintain your margins and satisfy your investors.  Therefore, Intel set growth as a primary goal early on.  Everyone else had to follow, or be left by the wayside.  The following is from the recent Briefing in The Economist on the semiconductor industry:

...Even the biggest chipmakers must keep expanding. Intel today accounts for 82% of global microprocessor revenue and has annual revenues of $37.6 billion because it understood this long ago. In the early 1980s, when Intel was a $700m company--pretty big for the time--Andy Grove, once Intel's boss, notorious for his paranoia, was not satisfied. "He would run around and tell everybody that we have to get to $1 billion," recalls Andy Bryant, the firm's chief administrative officer. "He knew that you had to have a certain size to stay in business."

Grow, grow, grow

Intel still appears to stick to this mantra, and is using the crisis to outgrow its competitors. In February Paul Otellini, its chief executive, said it would speed up plans to move many of its fabs to a new, 32-nanometre process at a cost of $7 billion over the next two years. This, he said, would preserve about 7,000 high-wage jobs in America. The investment (as well as Nehalem, Intel's new superfast chip for servers, which was released on March 30th) will also make life even harder for AMD, Intel's biggest remaining rival in the market for PC-type processors.

AMD got out of the atoms business earlier this year by selling its fab operations to a sovereign wealth fund run by Abu Dhabi.  We shall see how they fare as a bits-only design firm, having sacrificed their ability to themselves push (and rely on) scale.

Where is Moore's Law Taking Us?

Here are a few other tidbits I found interesting:

Re the oft-forecast end of Moore's Law, here is Michael Kanellos at CNET grinning through his prose: "In a bit of magazine performance art, Red Herring ran a cover story on the death of Moore's Law in February--and subsequently went out of business."

And here is somebody's term paper (no disrespect there -- it is actually quite good, and is archived at Microsoft Research) quoting an interview with Carver Mead:

Carver Mead (now Gordon and Betty Moore Professor of Engineering and Applied Science at Caltech) states that Moore's Law "is really about people's belief system, it's not a law of physics, it's about human belief, and when people believe in something, they'll put energy behind it to make it come to pass." Mead offers a retrospective, yet philosophical explanation of how Moore's Law has been reinforced within the semiconductor community through "living it":

After it's [Moore's Law] happened long enough, people begin to talk about it in retrospect, and in retrospect it's really a curve that goes through some points and so it looks like a physical law and people talk about it that way. But actually if you're living it, which I am, then it doesn't feel like a physical law. It's really a thing about human activity, it's about vision, it's about what you're allowed to believe. Because people are really limited by their beliefs, they limit themselves by what they allow themselves to believe what is possible. So here's an example where Gordon [Moore], when he made this observation early on, he really gave us permission to believe that it would keep going. And so some of us went off and did some calculations about it and said, 'Yes, it can keep going'. And that then gave other people permission to believe it could keep going. And [after believing it] for the last two or three generations, 'maybe I can believe it for a couple more, even though I can't see how to get there'. . . The wonderful thing about [Moore's Law] is that it is not a static law, it forces everyone to live in a dynamic, evolving world.
So the actual pace of Moore's Law is about expectations, human behavior, and, not least, economics, but has relatively little to do with the cutting edge of technology or with technological limits.  Moore's Law as encapsulated by The Economist is about the scale necessary to stay alive in the semiconductor manufacturing business.  To bring this back to biological technologies, what does Moore's Law teach us about playing with DNA and proteins?  Peeling back the veneer of technological determinism enables us (forces us?) to examine how we got where we are today. 

A Few Meandering Thoughts About Biology

Intel makes chips because customers buy chips.  According to The Economist, a new chip fab now costs north of $6 billion.  Similarly, companies make stuff out of, and using, biology because people buy that stuff.  But nothing in biology, and certainly not a manufacturing plant, costs $6 billion.

Even a blockbuster drug, which could bring revenues in the range of $50-100 billion during its commercial lifetime, costs less than $1 billion to develop.  Scale wins in drug manufacturing because drugs require lots of testing, and require verifiable quality control during manufacturing, which costs serious money.

Scale wins in farming because you need...a farm.  Okay, that one is pretty obvious.  Commodities have low margins, and unless you can hitch your wagon to "eat local" or "organic" labels, you need scale (volume) to compete and survive.

But otherwise, it isn't obvious that there are substantial barriers to participating in the bio-economy.  Recalling that this is a hypothesis rather than an assertion, I'll venture back into biofuels to make more progress here.

Scale wins in the oil business because petroleum costs serious money to extract from the ground, because the costs of transporting that oil are reduced by playing a surface-to-volume game, and because thermodynamics dictates that big refineries are more efficient refineries.  It's all about "steel in the ground", as the oil executives say -- and in the deserts of the Middle East, and in the Straits of Malacca, etc.  But here is something interesting to ponder: oil production may have maxed out at about 90 million barrels a day (see this 2007 article in the FT, "Total chief warns on oil output").  There may be lots of oil in the ground around the world, but our ability to move it to market may be limited.  Last year's report from Bio-era, "The Big Squeeze", observed that since about 2006, the petroleum market has in fact relied on biofuels to supply volumes above the ~90 million per day mark.  This leads to an important consequence for distributed biofuel production that only recently penetrated my thick skull.

Below the 90 million barrel threshold, oil prices fall because supply will generally exceed demand (modulo games played by OPEC, Hugo Chavez, and speculators).  In that environment, biofuels have to compete against the scale of the petroleum markets, and margins on biofuels get squeezed as the price of oil falls.  However, above the 90 million per day threshold, prices start to rise rapidly (perhaps contributing to the recent spike, in addition to the actions of speculators).  In that environment, biofuels are competing not with petroleum, but with other biofuels.  What I mean is that large-scale biofuels operations may have an advantage when oil prices are low because large-scale producers -- particularly those making first-generation biofuels, like corn-based ethanol, that require lots of energy input -- can eke out a bit more margin through surface to volume issues and thermodynamics.  But as prices rise, both the energy to make those fuels and the energy to move those fuels to market get more expensive.  When the price of oil is high, smaller scale producers -- particularly those with lower capital requirements, as might come with direct production of fuels in microbes -- gain an advantage because they can be more flexible and have lower transportation costs (being closer to the consumer).  In this price-volume regime, petroleum production is maxed out and small scale biofuels producers are competing against other biofuels producers since they are the only source of additional supply (for materials, as well as fuels).

This is getting a bit far from Moore's Law -- the section heading does contain the phrase "meandering thoughts" -- I'll try to bring it back.  Whatever the origin of the trends, biological technologies appear to be the same sort of exponential driver for the economy as are semiconductors.  Chips, software, DNA sequencing and synthesis: all are infrastructure that contributes to increases in productivity and capability further along the value chain in the economy.  The cost of production for chips (especially the capital required for a fab) is rising.  The cost of production for biology is falling (even if that progress is uneven, as I observed in the post about Codon Devices).  It is generally becoming harder to participate in the chip business, and it is generally becoming easier to participate in the biology business.  Paraphrasing Carver Mead, Moore's Law became an organizing principle of an industry, and a driver of our economy, through human behavior rather than through technological predestination.  Biology, too, will only become a truly powerful and influential technology through human choices to develop and deploy that technology.  But access to both design tools and working systems will be much more distributed in biology than in hardware.  It is another matter whether we can learn to use synthetic biological systems to improve the human condition to the extent we have through relying on Moore's Law.

Gene Synthesis Cost Update

While at iGEM this past weekend, I learned that GeneArt is now charging $.55 per base for ~1 kB synthesis jobs, with delivery within 10 days.

Here is an interesting tidbit: They only charged iGEM teams $.20 per base.  Anybody have any idea whether this represents their internal cost, and how much margin this might include?

Here is an updated plot for synthesis and sequencing cost.  No new data, just a new rendering.

(Update: 12 November, 2008.  There is a news piece in last week's Nature that claims Illumina's Genome Analyzer (GA1) was just used to sequence a whole genome in 8 weeks for $250K.  However, the paper describing that sequencing effort says:

We generated 135 Gb of sequence (4 billion paired 35-base reads) over a period of 8 weeks (December 2007 to January 2008) on six GA1 instruments averaging 3.3 Gb per production run. The approximate consumables cost (based on full list price of reagents) was $250,000.

Thus the price does not include labor, and is not a true commercial cost (labor is only truly free for professors).

I am therefore not sure if/how this price can be compared to the prices in the figure below.

Update 2: I fixed the significant figure issue with the cost axis.  Alas, Open Office does not give great control over the appearance of the digits.)
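
As a rough check on the numbers quoted from the paper in the update above, the arithmetic can be sketched in a few lines of Python.  The input figures are the ones in the quote; the function and variable names are mine.  The result covers consumables only, so it is still not a commercial price.

```python
# Figures as quoted from the paper (consumables only, at list price).
total_bases = 135e9          # 135 Gb of sequence
consumables_usd = 250_000    # approximate reagent cost
gb_per_run = 3.3             # average yield per production run
instruments = 6              # GA1 instruments used


def consumables_cost_per_base(cost_usd, bases):
    """Reagent cost per sequenced base, ignoring labor and capital."""
    return cost_usd / bases


def runs_needed(total_gb, gb_per_run):
    """Number of production runs implied by the average yield per run."""
    return total_gb / gb_per_run


cost = consumables_cost_per_base(consumables_usd, total_bases)
runs = runs_needed(total_bases / 1e9, gb_per_run)
print(f"consumables: ${cost * 1e6:.2f} per megabase")
print(f"about {runs:.0f} runs, or {runs / instruments:.1f} per instrument")
```

That works out to a bit under two dollars of reagents per megabase of raw reads, before anyone gets paid.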


With experience comes skill and efficiency.  That is the theory behind "learning" or "experience curves", which I played around with last week for DNA sequencing.  As promised, here are a few thoughts on the future of DNA synthesis.  Playing around with the synthesis curves a bit seems to kick out a couple of quantitative metrics for technological change.

For everything below, clicking on a Figure launches a pop-up with a full sized .jpg.  The data come from my papers, the Bio-era "Genome Synthesis and Design Futures" report, and a couple of my blog posts over the last year.

Figure 1.

The simplest application of a learning curve to DNA synthesis is to compare productivity with cost.  Figure 1 shows those curves for both oligo synthesis and gene synthesis (click on the figure for a larger pop-up).  These lines are generated by taking the ratios of fits to data (shown in the inset).  This is necessary due to the methodological annoyance that productivity and cost data do not overlap -- the fits allow comparison of trends even when data is missing from one set or another.  As before, 1) I am not really thrilled to rely on power law fits to a small number of points, and 2) the projections (dashed lines) are really just for the sake of asking "what if?".

What can we learn from the figure?  First, the two lines cover different periods of time.  Thus it isn't completely kosher to compare them directly.  But with that in mind, we come to the second point: even the simple cost data in the inset makes clear that the commercial cost of synthetic genes is rapidly approaching the cost of the constituent single-stranded oligos. This is the result of competition, and is almost certainly due to new technologies introduced by those competitors.

Assuming that commercial gene foundries are making money, the "Assembly Cost" is probably falling because of increased automation and other gains in efficiency.  But it can't fall to zero, and there will (probably?) always be some profit margin for genes over oligos.  I am not going to guess at how low the Assembly Cost can fall, and the projections are drawn in by hand just for illustration.


Figure 2.

It isn't clear that a couple of straight lines in Figure 1 teach us much about the future, except in pondering the shrinking margins of gene foundries.  But combining the productivity information with my "Longest Synthetic DNA" plot gives a little more to chew on.  Figure 2 is a ratio of a curve fitted to the longest published synthetic DNA (sDNA) to the productivity curve.

In what follows, remember that the green line is based on data.

First, the caveat: the fit to the longest sDNA is basically a hand hack.  On a semilog plot I fit a curve consisting of a logarithm and a power law (not shown).  That means the actual functional form (on the original data) is a linear term plus a super power law in which the exponent increases with time.  There isn't any rationale for this function other than it fits the crazy data (in the inset), and I would be oh-so-wary of inferring anything deep from it.  Perhaps one could make the somewhat trivial observation that for a long time synthesizing DNA was hard (the linear regime), and then we entered a period when it has become progressively easier (the super power law).  I should probably win a prize for that.  No?  A lollipop?

There are a couple of interesting things about this curve, along which distance represents "progress".  First, so far as I am aware, commercial oligo synthesis started in 1992 and commercial gene foundries started showing up in 1999.  The distance along the curve in those seven years is quite short, while the distance over the next nine years to the Venter Institute's recent synthetic chromosome is substantially larger.

This change in distance/speed represents some sort of quantitative measure of accelerating progress in synthesizing genomes, though frankly I am not yet settled on what the proper metric should be.  That is, how exactly should one measure distance or speed along this curve?  And then, given proper caution about the utility of the underlying fits to data, how seriously should one trust the metric?  Maybe it is just fine as is.  I am still pondering this.

Next, while the "learning curve" is presently "concave up", it really ought to turn over and level off sometime soon.  As I argued in the post on the Venter Institute's fine technical achievement, they are already well beyond what will be economically interesting for the foreseeable future, which is probably only 10-50 kilobases (kB).  It isn't at all clear that assembling sDNA larger than 100 kB will be anything more than an academic demonstration.  The red octagon (hint!) is positioned at about 100 MB, which is in the range of a human chromosome.  Even assembling something that large, and then using it to fabricate an artificial human chromosome, is probably not technologically that useful.  I reserve a bit of judgement here in the event it turns out that actually building functioning human chromosomes from smaller pieces is problematic.  But really, why bother otherwise?

Figure 3.

Next, with the other curves in hand I couldn't help but compare the longest sDNA to gene assembly cost (beware the products of actual free time!).  (Update: Can't recall what I meant by this next sentence, so I struck it out.) Figure 3 may only be interesting because of what it doesn't show.  Note the reversed axis -- cost decreases to the right.

The assembly cost (inset) was generated simply by subtracting the oligo cost curve from the gene cost curve (see Figure 1 above) -- yes, I ignored the fact that those data are over different time periods.  There is no cost information available for any of the longest sDNA data, which all come from academic papers.  But the fact that gene assembly cost has been consistently halving every 18 months or so just serves to emphasize that the "acceleration" in the ratio of sDNA to assembly cost results from real improvements in processes and automation used to fabricate long sDNA.  I don't know that this is that deep an observation, but it does go some way towards providing additional quantitative estimates of progress in developing biological technologies.

(Update: 23 March 2009, I fixed various broken links.)

I have been wondering what additional information about future technology and markets can be discerned from trends in genome synthesis and sequencing ("Carlson Curves").  To see if there is anything there, I have been playing around with applying the idea of "learning curves" (also called "experience curves") to data on cost and productivity.

Learning curves generally are used to estimate decreases in costs that result from efficiencies that come from increases in production.  The more you make of something, the more efficient you become.  T.P. Wright famously used this idea in the 1930s to project decreases in cost as a function of increased airplane production.  The effect also shows up in a reduction of the cost of photovoltaic power as a function of cumulative production (see this figure, for example).
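
For concreteness, the usual form of such a curve (often called Wright's law) has unit cost falling as a power of cumulative production.  Here is a minimal sketch in Python.  The 80% progress ratio and the starting cost of 100 are illustrative assumptions, not numbers taken from any of my plots.

```python
import math


def wright_unit_cost(first_unit_cost, cumulative_units, progress_ratio):
    """Unit cost after `cumulative_units` have been made, under Wright's law.

    progress_ratio is the factor by which cost is multiplied each time
    cumulative production doubles (0.8 means a 20% drop per doubling).
    """
    exponent = math.log2(progress_ratio)  # negative for ratios below 1
    return first_unit_cost * cumulative_units ** exponent


# Illustrative numbers only (assumed, not measured).
for n in (1, 2, 4, 8):
    print(n, round(wright_unit_cost(100.0, n, 0.8), 1))
```

Each doubling of cumulative output multiplies the unit cost by the same factor, which is why these curves come out as straight lines on a log-log plot.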

To start with, here are some musings about the future of sequencing and the thousand dollar genome:

Figure 1 was generated from data on sequencing cost and productivity for commercially available instruments (click on the image for a larger pop-up).  I am not yet sure how seriously to take the plot, but it is interesting to think about the implications.

A few words on methodology: the data is sparse (see inset) in that there are not many points and data is not readily available in each category for each year.  This makes generating the plot of cost vs. productivity subject to estimation and some guesswork.  In particular, fitting a power law to the five productivity points, which are spread over only three logs, makes me uneasy.  The cost data isn't much better.  In the past I have cautioned both the private sector and governments against attempting to use this data to forecast trends.  But, really, everyone else is doing it, so why should I let good sense stop me?

Before going on, I should note that sequencing cost and productivity are related but not strictly correlated.  They are mostly independent variables at this point in time.  Reagents account for a substantial fraction of current sequencing costs, and increasing throughput and automation do not necessarily affect anything other than the number of bases one person can sequence in a day.  It is also important to point out that I am plotting productivity rather than cumulative production, and that both productivity and cost improvements include changes to new technology.  So the learning curve here is sort of an average over different technologies.  It is not a standard way to look at things, but it allows for a few interesting insights.

The blue line was generated by taking a ratio of fits to both the cost and productivity lines.  In other words, the blue line is basically data, and it suggests that for every order of magnitude improvement in productivity you get roughly a one and a half order of magnitude reduction in cost.  Here is the next point that makes me uneasy: I really have no reason to expect the current trends to maintain their present rates.  New sequencing technologies may well cause both productivity and cost changes to accelerate (though I would not expect them to slow -- see, for example, my previous post "The Thousand Dollar Genome").
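
One way to read "a ratio of fits" is sketched below in Python: fit log cost and log productivity against time separately, then eliminate time, which leaves the slope of cost against productivity on a log-log plot.  The points are made up so that the answer comes out at one and a half; they are NOT the data behind Figure 1, and the function names are mine.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = m*x + c."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, my - m * mx


def learning_exponent(cost_pts, prod_pts):
    """Slope of log10(cost) against log10(productivity).

    Each series is fitted against time separately, so the two data sets
    need not share any years; time is then eliminated between the fits.
    """
    cost_slope, _ = fit_line([t for t, _ in cost_pts],
                             [math.log10(c) for _, c in cost_pts])
    prod_slope, _ = fit_line([t for t, _ in prod_pts],
                             [math.log10(p) for _, p in prod_pts])
    return cost_slope / prod_slope


# Made-up points, NOT real data: cost falls 0.3 decades per year,
# productivity rises 0.2 decades per year.
cost_pts = [(1995, 1.0), (2000, 10 ** -1.5), (2006, 10 ** -3.3)]
prod_pts = [(1997, 1e3), (2002, 1e4), (2007, 1e5)]
print(round(learning_exponent(cost_pts, prod_pts), 2))
```

A slope of about -1.5 is just another way of saying that cost drops one and a half orders of magnitude for each order of magnitude gained in productivity.  With only a handful of points in each fit, the uncertainty on that slope is large.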

Forging ahead, extending the trend out to the day when technology provides for the still-mythical Thousand Dollar Genome (TDG) provides an interesting insight.  At present rates, the TDG comes when an instrument allows for a productivity of one human genome per person-day.  It didn't have to be that way; slightly different doubling times (slopes) in the fits to cost and productivity would have produced a different result.  Frankly, I don't know if it means anything at all, but it did make me sit up and look more closely at the plot.  You could even call it a weak prediction about technological change -- weak because any deviation from the present average doubling rates would break the prediction.

But even if the present rates remain steady, that doesn't mean the actual cost of sequencing to the end user falls as quickly as it has.  Let's say somebody commercially produces an instrument that can actually provide a productivity of one genome per person-day.  How many of those instruments might make it onto the market?

Let's estimate that one percent of the US population wants to sign up for sequencing.  Those three million people would then require three million person-days worth of effort to sequence.  Operating 24/7 for one year, that would require just over 2700 instruments.  It will take some time before that many sequencers are available, which means that even if the technological capability exists there will be some -- probably substantial -- scarcity (the green circle on Figure 1) keeping prices higher for some period.  Given that demand will certainly extend into Europe and Asia, further elevating prices, there is no reason to think the TDG will be a practical reality until there exists competition among providers.  This competition, in turn, will probably only emerge with the development of a diverse set of technologies capable of hitting the appropriate productivity threshold.
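
The instrument count above is easy to reproduce.  The sketch below (Python) assumes a US population of 300 million and reads "24/7" as three 8-hour person-days per calendar day on each instrument; both are my assumptions, as are the names.

```python
def instruments_needed(genomes, genomes_per_person_day, shifts_per_day, days):
    """Instruments required to sequence `genomes` within `days`."""
    genomes_per_instrument = genomes_per_person_day * shifts_per_day * days
    return genomes / genomes_per_instrument


us_population = 300e6              # assumed round number
customers = 0.01 * us_population   # one percent sign up
# Assumption: "24/7" means three 8-hour person-days per calendar day.
n = instruments_needed(customers, 1.0, 3, 365)
print(f"{n:.0f} instruments")
```

Change the fraction of the population that signs up, or the number of shifts per day, and the count scales in direct proportion.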

What does this imply for the sequencing market, and in particular for health care based on full genome sequencing?  First, costs will stay high until there are a large number of instruments in operation, and probably until there are many different technologies available.  Thus, if prices are determined solely by the market, the idea of sequencing newborns to give them a head start on maximizing their state of health will probably be out of reach for many years after the initial instrument is developed.  Market pricing probably means that sequencing will remain a tool of the wealthy for many, many years to come.

So, what other foolish, over-extended observations can I make based on fitting power laws to sparse data?  Just one more for the moment, and it actually doesn't depend so much on the actual data.  At a productivity of one genome per person-day, you really have to start thinking about the cost of that person.  Somebody will be running the machine, and that person draws a salary.  Let's say that this person earns a technician's wage, which amounts with benefits to $300/day.  All of a sudden (the trends are power laws, after all) that is 30% of the $1000 spent on sequencing the genome.  If the margin is 10-20% of the cost, then the actual sequencing, including financial loads such as depreciation of the instrument and interest, can cost only $500.  We are definitely a long time from seeing that price point.
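
The budget arithmetic in the paragraph above, as a small Python sketch.  I am assuming the margin is taken as a fraction of the $1000 price; the function name is just for illustration.

```python
def sequencing_budget(price, labor_per_day, genomes_per_person_day, margin_fraction):
    """What is left for the sequencing itself after labor and margin."""
    labor = labor_per_day / genomes_per_person_day  # labor cost per genome
    margin = margin_fraction * price                # assumed: margin on price
    return price - labor - margin


for margin in (0.10, 0.20):
    left = sequencing_budget(1000.0, 300.0, 1.0, margin)
    print(f"margin {margin:.0%}: ${left:.0f} left for sequencing")
```

At the high end of the margin range, reagents, depreciation, and interest together have to fit inside $500 per genome.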

I'll post on the learning curve for genome synthesis after I make more sense of it.

Bedroom Biology in The Economist

I have yet to see the print version, but evidently I make an appearance in tomorrow's Economist in a Special Report on Synthetic Biology.  (Thanks for the heads-up, Bill.)  I wasn't actually interviewed for the piece, but I've no objections to the text.  There is an accompanying piece that forecasts the coming "Bedroom Biotech", a phrase they seem to prefer to "Garage Biology".  Personally, I prefer to keep my DNA bashing to the garage rather than the bedroom.  Well, okay, most but not all of my DNA bashing.

The story contains a figure showing data from 2002 on productivity changes in DNA sequencing and synthesis, redrawn from my 2003 paper, "The Pace and Proliferation of Biological Technologies", labeling them "Carlson Curves" once again.  Oh well.  The original paper was published in the journal Biosecurity and Bioterrorism (PDF from TMSI; an html version is also available).  It isn't so much that I disavow the name "Carlson Curve" as I want to assert that quantitatively predicting the course of biological technologies is a questionable thing to do.  As Moore made clear in his paper, what became his law is driven by the financing of expensive chip fabs -- banks require a certain payment schedule before they will loan another billion dollars for a new fab -- whereas biology is cheap and progress is much more likely to be governed by basic science and the total number of people participating in the endeavor.

Newer versions of figures from the 2003 paper, as well as additional metrics of progress in biological technologies, will be available in December with the release of "Genome Synthesis & Design Futures: Implications for the US Economy", written with my colleagues at Bio Economic Research Associates (bio-era), and funded by bio-era and the Department of Energy.

To close the circle, I should explain that the "Carlson Curves" were an attempt to figure out how fast biology is changing, an effort prompted by an essay I wrote for the inaugural Shell/Economist Writing Prize, "The World in 2050."  (Here is a PDF of the original essay, which was published in 2001 as "Open Source Biology and its Impact on Industry.")  I received a silver prize, rather than gold, and was always slightly miffed that The Economist only published the first place essay, but I suppose I can't complain about the outcome. 

"Carlson Curves" and Synthetic Biology

(UPDATE, 1 September 06: Here is a note about the recent Synthetic Biology story in The Economist.)

(UPDATE, 20 Feb 06: If you came here from Paul Boutin's story "Biowar for Dummies", I've noted a few corrections HERE.)

Oliver Morton's Wired Magazine article about Synthetic Biology is here. If you are looking for the "Carlson Curves", "The Pace and Proliferation of Biological Technologies" is published in the journal Biosecurity and Bioterrorism. The paper is also available in html.

A note on the so-called "Carlson Curves" (Oliver Morton's phrase, not mine): The plots were meant to provide a sense of how changes in technology are bringing about improvements in productivity in the lab, rather than to provide a quantitative prediction of the future. I am not suggesting there will be a "Moore's Law" for biological technologies. Although it may be possible to extract doubling rates for some aspect of this technology, I don't know whether this analysis is very interesting. I prefer to keep it simple. As I explain in the paper, the time scale of changes in transistor density is set by planning and finance considerations for multi-billion dollar integrated circuit fabs. That doubling time has a significant influence on many billions of dollars of investment. Biology, on the other hand, is cheap, and change should come much faster. Money should be less and less of an issue as time goes on, and my guess is those curves provide a lower bound on changes in productivity.

I will try to have something tomorrow about George Church and Co's "unexpected improvement" in DNA synthesis capacity, as well as some comments about Nicholas Wade's New York Times story.