The technology that enables reading DNA is changing very quickly. I've chronicled how price and productivity are each improving in a previous post; here I want to try to get at how the diversity of companies and technologies is contributing to that improvement.

As I wrote previously, all hell is breaking loose in sequencing, which is great for the user. Prices are falling and the capabilities of sequencing instruments are skyrocketing. From an analytical perspective, the diversity of platforms is a blessing and a curse. There is a great deal more data than just a few years ago, but it has become quite difficult to directly compare instruments that produce different qualities of DNA sequence, produce different read lengths, and have widely different throughputs.

I have worked for many years to come up with intuitive metrics to aid in understanding how technology is changing. Price and productivity in reading and writing DNA are pretty straightforward. My original paper on this topic (PDF) also looked at the various components of determining protein structures, which, given the many different quantifiable tasks involved, turned out to be a nice way to encapsulate a higher-level look at rates of change.

In 2007, with the publication of bio-era's Genome Synthesis and Design Futures, I tried to get at how improvements in instrumentation were moving us toward sequencing whole genomes. The two axes of the relevant plot were 1) read length -- the length of each contiguous string of bases read by an instrument, critical to accurate assembly of genomes or chromosomes that can be hundreds of millions of bases long -- and 2) the daily throughput per instrument -- how much total DNA each instrument could read. If you have enough long reads you can use this information as a map to assemble many shorter reads into the contiguous sequence.
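As a rough illustration of how those two axes interact, here is a minimal Python sketch. The instrument numbers are hypothetical placeholders chosen for round arithmetic, not values taken from the plot or from any vendor, and the 30x coverage figure is likewise an assumption. The 250 million base target corresponds to the upper end of human chromosome sizes.

```python
# Sketch: relate read length and daily throughput to reads per day
# and to the time needed to cover a target at a chosen depth.
# All instrument values below are HYPOTHETICAL, for illustration only.

def reads_per_day(throughput_bases_per_day: float, read_length: float) -> float:
    """Number of reads an instrument produces per day."""
    return throughput_bases_per_day / read_length


def days_to_cover(target_bases: float, coverage: float,
                  throughput_bases_per_day: float) -> float:
    """Days of instrument time to read the target `coverage` times over."""
    return target_bases * coverage / throughput_bases_per_day


# Hypothetical instruments representing the two broad groupings.
INSTRUMENTS = {
    "long reads, limited throughput": {"read_length": 1_000, "throughput": 1e8},
    "short reads, high throughput": {"read_length": 100, "throughput": 1e10},
}

TARGET_BASES = 250_000_000  # upper end of human chromosome sizes
COVERAGE = 30               # assumed sequencing depth


def summarize(instruments, target_bases, coverage):
    """Return {name: (reads per day, days to cover the target)}."""
    return {
        name: (
            reads_per_day(spec["throughput"], spec["read_length"]),
            days_to_cover(target_bases, coverage, spec["throughput"]),
        )
        for name, spec in instruments.items()
    }


if __name__ == "__main__":
    for name, (rpd, days) in summarize(INSTRUMENTS, TARGET_BASES, COVERAGE).items():
        print(f"{name}: {rpd:,.0f} reads/day, {days:.2f} days at {COVERAGE}x")
```

This only accounts for raw base output. It says nothing about whether the reads are long enough or accurate enough to assemble, which is exactly why read length gets its own axis.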

Because there weren't very many models of commercially available sequencers in 2007, the original plot didn't have a lot of data on it (the red squares and blue circles below). But the plot did show something interesting, which was that two general kinds of instruments were emerging at that time: those that produced long reads but had relatively limited throughput, and those that produced short reads but could process enormous amounts of sequence per day. The blue circles below were data from my original paper, and the red squares were derived from a Science news article in 2006 that looked at instruments said to be emerging over the next year or so.

I have now pulled performance estimates out of several papers assessing instruments currently on the market and added them to the plot (purple triangles). The two groupings present in 2007 are still roughly extant, though the edges are blurring a bit. (As with the price and productivity figures, I will publish a full bibliography in a paper later this year. For now, this blog post serves as the primary citation for the figure below.)

I am still trying to sort out the best way to represent the data (I am open to suggestions about how to do it better). At this point, it is pretty clear that the two major axes are insufficient to truly understand what is going on, so I have attempted to add some information regarding the release schedules of new instruments. Very roughly, we went from a small number of first generation instruments in 2003 to a few more real instruments in 2006 that performed a little better in some regards, plus a few promised instruments that didn't work out for one reason or another. However, starting in about 2010, we began to see seriously improved instruments being released on an increasingly rapid schedule. This improvement is the result of competition not just between firms, but also between technologies. In addition, some of what we are seeing is the emergence of instruments that have niches: long reads but medium throughput, or short reads but extraordinary throughput. Combine these two capabilities and you have the ability to crank out de novo sequences at a pretty remarkable rate. (For reference, the synthetic chromosome Venter et al published a few years ago was about one million bases; human chromosomes are in the range of 60 to 250 million bases.)

And now something even more interesting is going on. Because platforms like PacBio and IonTorrent can upgrade internal components used in the actual sequencing, where those components include hardware, software, and wetware, revisions can result in stunning performance improvements. Below is a plot with all the same data as above, with the addition of one revision from PacBio. It's true that the throughput per instrument didn't change so much, but such long read lengths mean you can process less DNA and still rapidly produce high resolution sequence, potentially over megabases (modulo error rates, about which there seems to be some vigorous discussion). This is not to say that PacBio makes the best overall instrument, nor that the company will be commercially viable, but rather that the competitive environment is producing change at an extraordinary rate.

If I now take the same plot as above and add a single (putative) MinION nanopore sequencer from Oxford Nanopore (where I have used their performance claims from public presentations; note the question mark on the date), the world again shifts quite dramatically. Oxford also claims they will ship GridION instruments that essentially consist of racks of MinIONs, but I have not even tried to guess at the performance of that beast. The resulting sequencing power will alter the shape of the commercial sequencing landscape. Illumina and Life are not sitting still, of course, but have their own next generation instruments in development. Jens Gundlach's (PDF) team at the University of Washington has demonstrated a nanopore that is argued to be better than the one Oxford uses, and I understand commercialization is proceeding rapidly, though of course Oxford won't be sitting still either.

One take home message from this, which is highlighted by taking the time to plot this data, is that over the next few years sequencing will become highly accurate, fast, and commonplace. With the caveat that it is difficult to predict the future, continued competition will result in continued price decreases.

A more speculative take home emerges if you consider the implications of the MinION. That device is described as a disposable USB sequencer. If it -- or anything else like it -- works as promised, then some centralized sequencing operations might soon reach the end of their lives. There are, of course, different kinds of sequencing operations. If I read the tea leaves correctly, Illumina just reported that its clinical sequencing operations brought in about as much revenue as its other operations combined, including instrument sales. That's interesting, because it points to two kinds of revenue: sales of boxes and reagents that enable other people to sequence, and certified service operations that provide clinically relevant sequence data. At the moment, organizations like BGI appear to be generating revenue by sequencing everything under the sun, but cheaper and cheaper boxes might mean that the BGI operations outside of clinical sequencing aren't cost effective going forward. Once the razors (electric, disposable, whatever) get cheap enough, you no longer bother going to the barber for a shave.

I will continue to work with the data in an effort to make the plots simpler and therefore hopefully more compelling.