DNA Synthesis and Sequencing Costs and Productivity for 2025

May 2, 2025 Rob Carlson

In the run-up to Synbiobeta25, I decided to update the cost and productivity curves.

Here is the prior update, with a description of what they are, and are not, and of my history in developing them. You can follow the thread backwards for comments on comparisons to Moore’s Law.

I was also asked recently to provide an opinion about the feasibility of the Human Genome Project 2 proposal, which led me to dig into the performance of the Ultima UG 100 instrument. I will publish my thoughts on the HGP 2 later.

The UG 100 is a truly impressive instrument, capable of sequencing >30,000 human genomes annually at 30x coverage, with only about an hour of human hands-on time to start a sequencing run. The most recent price and productivity sequencing data are based on the UG 100.
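As a rough sanity check on that throughput figure, the arithmetic can be sketched in Python. The haploid genome length (about 3.1 billion bases) and the 365-day operating year are illustrative assumptions, not figures from the instrument's specifications, and the result is bases per instrument per day rather than the per-person productivity plotted in the curves.

```python
# Assumption (not from the post): a haploid human genome is ~3.1 billion bases.
HUMAN_GENOME_BASES = 3.1e9
# Assumption (not from the post): the instrument runs every day of the year.
DAYS_PER_YEAR = 365


def bases_per_day(genomes_per_year: float, coverage: float,
                  genome_bases: float = HUMAN_GENOME_BASES) -> float:
    """Raw bases read per day for a given annual genome count and coverage."""
    return genomes_per_year * coverage * genome_bases / DAYS_PER_YEAR


# 30,000 genomes per year at 30x coverage, as quoted in the text.
throughput = bases_per_day(30_000, 30)
print(f"{throughput:.2e} bases per instrument per day")
```

Under these assumptions the instrument reads on the order of several trillion bases per day.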

As usual, please remember where you found them.

The price per base of DNA sequencing and synthesis (reading and writing DNA), based on price surveys and industry interviews. Until recently, most synthetic genes (the red line) were assembled from short oligonucleotides (oligos) synthesized in large volumes on columns (pink line). Now genes can be readily assembled from oligos synthesized in very small volumes on arrays, though data on the usage and price of array oligos is difficult to pin down; prices for array oligos are asserted to fall in the range from $0.00001 to $0.001 per base.

The productivity of DNA synthesis and sequencing, measured as bases per person per day, using commercially available instruments, and compared to Moore's Law, which is a proxy for IT productivity. Productivity in sequencing DNA has increased much faster than Moore's Law in recent years. Productivity in synthesizing DNA must certainly have increased substantially for privately developed and assembled synthesizers, but no new synthesis instruments, and no relevant performance figures, have been released since 2008.

Written comments for Artificial Intelligence and Automated Laboratories for Biotechnology: Leveraging Opportunities and Mitigating Risks, 3-4 April, 2024

July 18, 2024 Rob Carlson

Here are my written comments for the recent NASEM workshop “Artificial Intelligence and Automated Laboratories for Biotechnology: Leveraging Opportunities and Mitigating Risks”, convened at the request of the Congressionally-chartered National Security Commission on Emerging Biotechnology (NSCEB), in April 2024.

The document is composed of two parts: 1) remarks delivered during the Workshop in response to prompts from NASEM and the National Security Commission on Emerging Biotechnology and 2) remarks prepared in response to comments arising during the Workshop.

PDF

These comments extend and document my thoughts on the reemergent hallucination that restricting access to DNA synthesis will improve security, and that such regulation will do anything other than constitute perverse incentives that create insecurity. DNA synthesis, and biotechnology more broadly, are examples of a particular kind of distributed and democratized technology. In large markets, served by distributed and accessible production technologies, restrictions on access to those markets and technologies incentivize piracy and create insecurity. There is no data to suggest regulation of such technologies improves security, and here I document numerous examples of counterproductive regulation, including the perverse incentives already created by the 2010 DNA Synthesis Screening Guidelines.

Let’s not repeat this mistake.

Here are a few excerpts:

Biology is a General Purpose Technology. I didn't hear anyone at this meeting use that phrase, but all of our discussions about what we might manufacture using biology, and the range of applications, make clear that we are talking about just such a thing. The Wikipedia entry on GPTs has a pretty good definition: “General-purpose technologies (GPTs) are technologies that can affect an entire economy (usually at a national or global level). GPTs have the potential to drastically alter societies through their impact on pre-existing economic and social structures.” This definitely describes biology. We are already seeing significant economic impacts from biotechnology in the U.S., and we are only just getting started.

My latest estimate is that biotechnology contributed at least $550B to the U.S. economy in 2021, a total that has steadily grown since 1980 at about 10% annually, much faster than the rest of the economy. Moreover, participants in this workshop outlined a future in which various other technologies—hardware, software, and automation, each of which is also recognized as a General Purpose Technology, and each of which contributes significantly to the economy—will be used to enhance our ability to design and manufacture pathways and organisms that will then themselves be used to manufacture other objects.

The U.S. invests in many fields with the recognition that they inform the development of General Purpose Technologies; we expect that photolithography, or control theory, or indeed machine learning, will each have broad impact across the entire economy and social fabric, and so they have. However, in the U.S. investment in biology has been scattershot and application specific, and its output has been poorly monitored. I do have some hope that the recent focus on the bioeconomy, and the creation of various Congressional and Executive Branch bodies, directed to study and secure the bioeconomy, will help. Yet I am on my third White House trying to get the economic impact of biotechnology measured as well as we measure virtually everything else in our economy, and so far the conversation is still about how hard it is to imagine doing this, if only we could first decide how to go about it.

If we in the U.S. were the only ones playing this game, with no outside pressure, perhaps we could take our time and continue fiddling about as we have for the last forty or fifty years. But the global context today is one of multiple stresses from many sources. We must have better biological engineering and manufacturing in order to deal with threats to, and from, nature, whether these are zoonotic pathogens, invasive species, or ecosystems in need of resuscitating, or even rebooting. We face the real threat of engineered organisms or toxins used as weapons by human adversaries. And some of our competitors, countries with a very different perspective on the interaction of the state and political parties with the populace than we have in the U.S., have made very clear that they intend to use biology as a significant, and perhaps the most important, tool in their efforts to dominate the global economy and the politics of the 21st century. So if we want to compete, we need to do better.

…

In summary, before implementing restrictions on access to DNA synthesis, or lab automation, or machine learning, we must ask what perverse incentives we will create for adaptation and innovation to escape those restrictions. And we must evaluate how perverse incentives may increase risks.

The call to action here is not to do nothing, but rather to be thoughtful about proposed regulation and consider carefully the implications of taking action. I am concerned that we all too frequently embrace the hypothetical security and safety improvements promised by regulation or proscription without considering that we might recapitulate the very real, historically validated, costs of regulation and proscription. Moreover, given the overwhelming historical evidence, those proposing and promoting regulation should explain how this time it will be different, how this time regulation will improve security rather than create insecurity.

Here I will throw down the nitrile gauntlet: would-be regulators frequently get their thinking backwards on regulatory policy. I have heard more than one time the proposition “if you don't propose an alternative, we will regulate this”. But, given prior experience, it is the regulators who must explain how their actions will improve the world, and will increase security, rather than achieve the opposite.[1] Put very plainly, it is the regulators' responsibility to not implement policies that make things worse.

[1] In conversations in Washington DC I also frequently hear “But Rob, we must do something”. To which I respond: must we? What if every action we contemplate has a greater chance of worsening security than improving it? Dissatisfaction with the status quo is a poor rationale for taking actions that are reasonably expected to be counterproductive. Engaging in security theater that obscures a problem for which we have yet to identify a path forward is no security at all.

DNA Cost and Productivity Data, aka "Carlson Curves"

October 26, 2022 Rob Carlson

I have received a number of requests in recent days for my early DNA synthesis and productivity data, so I have decided to post it here for all who are interested. Please remember where you found it.

A bit of history: my efforts to quantify the pace of change in biotech started in the summer of 2000 while I was trying to forecast where the industry was headed. At the time, I was a Research Fellow at the Molecular Sciences Institute (MSI) in Berkeley, and I was working on what became the essay “Open Source Biology and Its Impact on Industry”, originally written in the summer of 2000 for the inaugural Shell/Economist World in 2050 Competition and originally titled “Biological Technology in 2050”. I was trying to conceive of where things were going many decades out, and gathering these numbers seemed like a good way to anchor my thinking. I had the first, very rough, data set by about September of 2000. I presented the curves that summer for the first time to an outside audience in the form of a Global Business Network (GBN) Learning Journey that stopped at MSI to see what we were up to. Among the attendees was Stewart Brand, whom I understand soon started referring to the data as “Carlson Curves” in his own presentations. I published the data for the first time in 2003 in a paper with the title “The Pace and Proliferation of Biological Technologies”. Somewhere in there Ray Kurzweil started making reference to the curves, and then a 2006 article in The Economist, “Life 2.0”, brought them to a wider audience and cemented the name. It took me years to get comfortable with “Carlson Curves”, because, even if I did sort it out first, it is just data rather than a law of the universe. But eventually I got it through my thick skull that it is quite good advertising.

The data was very hard to come by when I started. Sequencing was still a labor intensive enterprise, and therefore highly variable in cost, and synthesis was slow, expensive, and relatively rare. I had to call people up to get their rough estimates of how much time and effort they were putting in, and also had to root around in journal articles and technical notes looking for any quantitative data on instrument performance. This was so early in the development of the field that, when I submitted what became the 2003 paper, one of the reviews came back with the criticism that the reviewer – certainly the infamous Reviewer Number 2 – was “unaware of any data suggesting that sequencing is improving exponentially”.

Well, yes, that was the first paper that collected such data.

The review process led to somewhat labored language in the paper asserting the “appearance” of exponential progress when comparing the data to Moore's Law. I also recall showing Freeman Dyson the early data, and he cast a very skeptical eye on the conclusion that there were any exponentials to be written about. The data was, in all fairness, a bit thin at the time. But the trend seemed clear to me, and the paper laid out why I thought the exponential trends would, or would not, continue. Stewart Brand, and Drew Endy at the next lab bench over, grokked it all immediately, which lent some comfort that I wasn’t sticking my neck out so very far.

I've written previously about when the comparison with Moore's Law does, and does not, make sense. (Here, here, and here.) Many people choose to ignore the subtleties. I won't belabor the details here, other than to try to succinctly observe that the role of DNA in constructing new objects is, at least for the time being, fundamentally different than that of transistors. For the last forty years, the improved performance of each new generation of chip and electronic device has depended on those objects containing more transistors, and the demand for greater performance has driven an increase in the number of transistors per object. In contrast, the economic value of synthetic DNA is decoupled from the economic value of the object it codes for; in principle you only need one copy of DNA to produce many billions of objects and many billions of dollars in value.

To be sure, prototyping and screening of new molecular circuits requires quite a bit more than one copy of the DNA in question, but once you have your final sequence in hand, your need for additional synthesis for that object goes to zero. And even while the total demand for synthetic DNA has grown over the years, the price per base has on average fallen about as fast; consequently, as best as I can tell, the total dollar value of the industry hasn't grown much over the last ten years. This makes it very difficult to make money in the DNA synthesis business, and may help explain why so many DNA synthesis companies have gone bankrupt or been folded into other operations. Indeed, most of the companies that provided DNA or gene synthesis as a service no longer exist. Due to similar business model challenges it is difficult to sell stand alone synthesis instruments. Thus the productivity data series for synthesis instruments ends several years ago, because it is too difficult to evaluate the performance of proprietary instruments run solely by the remaining service providers. DNA synthesis is likely to remain a difficult business until there is a business model in which the final value of the product, whatever that product is, depends on the actual number of bases synthesized and sold. As I have written before, I think that business model is likely to be DNA data storage. But we shall see.

The business of sequencing, of course, is another matter. It's booming. But as far as the “Carlson Curves” go, I long ago gave up trying to track this on my own, because a few years after the 2003 paper came out the NHGRI started tracking and publishing sequencing costs. Everyone should just use that data. I do.

Finally, a word on cost versus price. For normal, healthy businesses, you expect the price of something to exceed its cost, and for the business to make at least a little bit of money. But when it comes to DNA, especially synthesis, it has always been difficult to determine the true cost because it has turned out that the price per base has frequently been below the cost, thereby leading those businesses to go bankrupt. There are some service operations that are intentionally run at negative margins in order to attract business; that is, they are loss leaders for other services, or in order to maintain sufficient scale so that the company can have access to that scale for its own internal projects. There are a few operations that appear to be priced so that they are at least revenue neutral and don't lose money. Thus there can be a wide range of prices at this point in time, which further complicates sorting out how the technology may be improving and what impact this has on the economics of biotech. Moreover, we might expect the price of synthetic DNA to *increase* occasionally, either because providers can no longer afford to lose money or because competition is reduced. There is no technological determinism here. Just as Moore's Law is ultimately a function of industrial planning and expectations, there is nothing about Carlson Curves that says prices must continuously fall monotonically.

A note on methods and sources: as described in the 2003 paper, this data was generally gathered by calling people up or by extracting what information I could from what little was written down and published at the time. The same is true for later data. The quality of the data is limited primarily by that availability and by how much time I could spend to develop it. I would be perfectly delighted to have someone with more resources build a better data set.

The primary academic references for this work are:

Robert Carlson, “The Pace and Proliferation of Biological Technologies”. Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science. Sep, 2003, 203-214. http://doi.org/10.1089/153871303769201851.

Robert Carlson, “The changing economics of DNA synthesis”. Nat Biotechnol 27, 1091–1094 (2009). https://doi.org/10.1038/nbt1209-1091.

Robert Carlson, Biology Is Technology: The Promise, Peril, and New Business of Engineering Life, Harvard University Press, 2011. Amazon.

Here are my latest versions of the figures, followed by the data. Updates and commentary are on the Bioeconomy Dashboard.

Creative Commons image licence (Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0)) terms:

Share — copy and redistribute the material in any medium or format for any purpose, even commercially.
Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.

Here is the cost data (units in [USD per base]):

	
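| Year | DNA Sequencing | Short Oligo (Column) | Gene Synthesis |
|------|----------------|----------------------|----------------|
| 1990 | 25 | | |
| 1991 | | | |
| 1992 | | 1 | |
| 1993 | | | |
| 1994 | | | |
| 1995 | 1 | 0.75 | |
| 1996 | | | |
| 1997 | | | |
| 1998 | | | |
| 1999 | | | 25 |
| 2000 | 0.25 | 0.3 | |
| 2001 | | | 12 |
| 2002 | | | 8 |
| 2003 | 0.05 | 0.15 | 4 |
| 2004 | 0.025 | | |
| 2005 | | | |
| 2006 | 0.00075 | 0.1 | 1 |
| 2007 | | | 0.5 |
| 2008 | | | |
| 2009 | 8E-06 | 0.08 | 0.39 |
| 2010 | 3.17E-06 | 0.07 | 0.35 |
| 2011 | 2.3E-06 | 0.07 | 0.29 |
| 2012 | 1.6E-06 | 0.06 | 0.2 |
| 2013 | 1.6E-06 | 0.06 | 0.18 |
| 2014 | 1.6E-06 | 0.06 | 0.15 |
| 2015 | 1.6E-09 | | |
| 2016 | 1.6E-09 | 0.05 | 0.03 |
| 2017 | 1.6E-09 | 0.05 | 0.02 |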
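One way to summarize the cost table above is as a price halving time for each series. The sketch below uses only the first and last entries of each column as read from the table, and it assumes a constant exponential decline between those two points. That assumption is a simplification for illustration, not a method described in the post.

```python
import math

# (year, USD per base) endpoints read from the cost table above.
ENDPOINTS = {
    "DNA sequencing": ((1990, 25.0), (2017, 1.6e-9)),
    "Short oligo (column)": ((1992, 1.0), (2017, 0.05)),
    "Gene synthesis": ((1999, 25.0), (2017, 0.02)),
}


def halving_time_years(start, end):
    """Years for the price to halve, assuming a constant exponential
    decline between two (year, price) points."""
    (y0, p0), (y1, p1) = start, end
    return (y1 - y0) * math.log(2) / math.log(p0 / p1)


halving = {name: halving_time_years(a, b) for name, (a, b) in ENDPOINTS.items()}
for name, years in halving.items():
    print(f"{name}: price halves roughly every {years:.1f} years")
```

An endpoint-to-endpoint figure hides the plateaus and sudden drops visible in the table, so it is a summary of the average pace rather than a forecast.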

Here is the productivity data (units in [bases per person per day] and [number of transistors per chip]) — note that commercially available synthesis instruments were not sold new for the decade following 2011, and I have not sat down to figure out the productivity of any of the new boxes that may be for sale as of today:

	
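| Year | Reading DNA | Writing DNA | Transistors |
|------|-------------|-------------|-------------|
| 1971 | | | 2250 |
| 1972 | | | 2500 |
| 1974 | | | 5000 |
| 1978 | | | 29000 |
| 1982 | | | 1.20E+05 |
| 1985 | | | 2.75E+05 |
| 1986 | 25600 | | |
| 1988 | | | 1.18E+06 |
| 1990 | | 200 | |
| 1993 | | | 3.10E+06 |
| 1994 | 62400 | | |
| 1996 | | | |
| 1997 | 4.22E+05 | 15320 | |
| 1998 | | | 7.50E+06 |
| 1999 | 576000 | | 2.40E+07 |
| 2000 | | 1.38E+05 | 4.20E+07 |
| 2001 | | | |
| 2002 | | | |
| 2003 | | | 2.20E+08 |
| 2004 | | | 5.92E+08 |
| 2005 | | | |
| 2006 | 10000000 | | |
| 2007 | 200000000 | 2500000 | |
| 2008 | | | 2000000000 |
| 2009 | 6000000000 | | |
| 2010 | 17000000000 | | |
| 2011 | | | 2600000000 |
| 2012 | 54000000000 | | |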
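The comparison with Moore's Law can be made concrete by converting the productivity table above into doubling times. This is a minimal sketch that takes pairs of points as read from the table and assumes steady exponential growth between them. The choice of endpoints, including 2007 to 2012 as the "recent" window for sequencing, is an assumption made for illustration.

```python
import math


def doubling_time_years(start, end):
    """Years per doubling, assuming steady exponential growth
    between two (year, value) points."""
    (y0, v0), (y1, v1) = start, end
    return (y1 - y0) / math.log2(v1 / v0)


# Points read from the productivity table above.
reading_all = doubling_time_years((1986, 25600), (2012, 5.4e10))
reading_recent = doubling_time_years((2007, 2.0e8), (2012, 5.4e10))
writing_all = doubling_time_years((1990, 200), (2007, 2.5e6))
transistors_all = doubling_time_years((1971, 2250), (2011, 2.6e9))

print(f"Reading DNA, 1986-2012: {reading_all:.2f} years per doubling")
print(f"Reading DNA, 2007-2012: {reading_recent:.2f} years per doubling")
print(f"Writing DNA, 1990-2007: {writing_all:.2f} years per doubling")
print(f"Transistors, 1971-2011: {transistors_all:.2f} years per doubling")
```

On these endpoints, sequencing productivity doubles considerably faster than transistor counts, and faster still over the 2007 to 2012 window.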

Seeing The End Of Oil at Bioeconomy Capital

October 4, 2019 Rob Carlson

I have a new post on the IDEAS blog at Bioeconomy.Capital that explores the end of the petroleum industry and the rise of renewable energy and biotechnology.

A memorial to Mark Buller, PhD, and our response to the propaganda film "Demon in the Freezer".

November 29, 2017 Rob Carlson

Earlier this year my friend and colleague Mark Buller passed away. Mark was a noted virologist and a professor at Saint Louis University. He was struck by a car while riding his bicycle home from the lab, and died from his injuries. Here is Mark's obituary as published by the university.

In 2014 and 2015, Mark and I served as advisors to a WHO scientific working group on synthetic biology and the variola virus (the causative agent of smallpox). In 2016, we wrote the following, previously unpublished, response to an "Op-Doc" that appeared in the New York Times. In a forthcoming post I will have more to say about both my experience with the WHO and my thoughts on the recent publication of a synthetic horsepox genome. For now, here is the last version (circa May, 2016) of the response Mark and I wrote to the Op-Doc, published here as my own memorial to Professor Buller.

Variola virus is still needed for the development of smallpox medical countermeasures

On May 17, 2016 Errol Morris presented a short movie entitled “Demon in the Freezer” [note: quite different from the book of the same name by Richard Preston] in the Op-Docs section of the on-line New York Times. The piece purported to present both sides of the long-standing argument over what to do with the remaining laboratory stocks of variola virus, the causative agent of smallpox, which no longer circulates in the human population.

Since 1999, the World Health Organization has on numerous occasions postponed the final destruction of the two variola virus research stocks in Russia and the US in order to support public health related research, including the development of smallpox molecular diagnostics, antivirals, and vaccines.

“Demon in the Freezer” clearly advocates for destroying the virus. The Op-Doc impugns the motivation of scientists carrying out smallpox research by asking: “If given a free hand, what might they unleash?” The narrative even suggests that some in the US government would like to pursue a nefarious policy goal of “mutually assured destruction with germs”. This portion of the movie is interlaced with irrelevant, hyperbolic images of mushroom clouds. The reality is that in 1969 the US unilaterally renounced the production, storage or use of biological weapons for any reason whatsoever, including in response to a biological attack from another country. The same cannot be said for ISIS and Al-Qaeda. In 1975 the US ratified the 1925 Geneva Protocol banning chemical and biological agents in warfare and became party to the Biological Weapons Convention that emphatically prohibits the use of biological weapons in warfare.

“Demon in the Freezer” is constructed with undeniable flair, but in the end it is a benighted 21st century video incarnation of a middling 1930's political propaganda mural. It was painted with only black and white pigments, rather than a meaningful palette of colors, and using a brush so broad that it blurred any useful detail. Ultimately, and to its discredit, the piece sought to create fear and outrage based on unsubstantiated accusations.

Maintaining live smallpox virus is necessary for ongoing development and improvement of medical countermeasures. The first-generation US smallpox vaccine was produced in domesticated animals, while the second-generation smallpox vaccine was manufactured in sterile bioreactors; both have the potential to cause serious side effects in 10-20% of the population. The third generation smallpox vaccine has an improved safety profile, and causes minimal side effects. Fourth generation vaccine candidates, based on newer, lower cost, technology, will be even safer and some are in preclinical testing. There remains a need to develop rapid field diagnostics and an additional antiviral therapy for smallpox.

Continued vigilance is necessary because it is widely assumed that numerous undeclared stocks of variola virus exist around the world in clandestine laboratories. Moreover, unsecured variola virus stocks are encountered occasionally in strain collections left behind by long-retired researchers, as demonstrated in 2014 with the discovery of 1950s vintage variola virus in a cold room at the NIH. The certain existence of unofficial stocks makes destroying the official stocks an exercise in declaring “victory” merely for political purposes rather than a substantive step towards increasing security. Unfortunately, the threat does not end with undeclared or forgotten samples.

In 2015 a WHO Scientific Working Group on Synthetic Biology and Variola Virus and Smallpox determined that a “skilled laboratory technician or undergraduate student with experience of working with viruses” would be able to generate variola virus from the widely available genomic sequence in “as little as three months”. Importantly, this Working Group concluded that “there will always be the potential to recreate variola virus and therefore the risk of smallpox happening again can never be eradicated.” Thus, the goal of a variola virus-free future, however laudable, is unattainable. This is sobering guidance on a topic that requires sober consideration.

We welcome increased discussions of the risk of infectious disease and of public health preparedness. In the US these topics have too long languished among second (or third) tier national security conversations. The 2014 West Africa Ebola outbreak and the current Congressional debate over funding to counter the Zika virus exemplify the business-as-usual political approach of throwing half a bucket of water on the nearest burning bush while the surrounding countryside goes up in flames. Lethal infectious diseases are serious public health and global security issues and they deserve serious attention.

The variola virus has killed more humans numerically than any other single cause in history. This pathogen was produced by nature, and it would be the height of arrogance, and very foolish indeed, to assume nothing like it will ever again emerge from the bush to threaten human life and human civilization. Maintenance of variola virus stocks is needed for continued improvement of molecular diagnostics, antivirals, and vaccines. Under no circumstances should we unilaterally cripple those efforts in the face of the most deadly infectious disease ever to plague humans. This is an easy mistake to avoid.

Mark Buller, PhD, was a Professor of Molecular Microbiology & Immunology at Saint Louis University School of Medicine, who passed away on February 24, 2017. Rob Carlson, PhD, is a Principal at the engineering and strategy firm Biodesic and a Managing Director of Bioeconomy Capital.

The authors served as scientific and technical advisors to the 2015 WHO Scientific Working Group on Synthetic Biology and Variola Virus.

Guesstimating the Size of the Global Array Synthesis Market

August 30, 2017 Rob Carlson

(Updated, Aug 31, for clarity.)

After chats with a variety of interested parties over the last couple of months, I decided it would be useful to try to sort out how much DNA is synthesized annually on arrays, in part to get a better handle on what sort of capacity it represents for DNA data storage. The publicly available numbers, as usual, are terrible, which is why the title of the post contains the word "guesstimating". Here goes.

First, why is this important? As the DNA synthesis industry grows, and the number of applications expands, new markets are emerging that use that DNA in different ways. Not all that DNA is produced using the same method, and the different methods are characterized by different costs, error rates, lengths, throughput, etc. (The Wikipedia entry on Oligonucleotide Synthesis is actually fairly reasonable, if you want to read more. See also Kosuri and Church, "Large-scale de novo DNA synthesis: technologies and applications".) If we are going to understand the state of the technology, and the economy built on that technology, then we need to be careful about measuring what the technology can do and how much it costs. Once we pin down what the world looks like today, we can start trying to make sensible projections, or even predictions, about the future.

While there is just one basic chemistry used to synthesize oligonucleotides, there are two physical formats that give you two very different products. Oligos synthesized on individual columns, which might be packed into 384 (or more) well plates, can be manipulated as individual sequences. You can use those individual sequences for any number of purposes, and if you want just one sequence at a time (for PCR or hybridization probes, gene therapy, etc), this is probably how you make it. You can build genes from column oligos by combining them pairwise, or in larger numbers, until you get the size construct you want (typically of order a thousand bases, or a kilobase [kb], at which point you start manipulating the kb fragments). I am not going to dwell on gene assembly and error correction strategies here; you can Google that.

The other physical format is array synthesis, in which synthesis takes place on a solid surface consisting of up to a million different addressable features, where light or charge is used to control which sequence is grown on which feature. Typically, all the oligos are removed from the array at once, which results in a mixed pool. You might insert this pool into a longer backbone sequence to construct a library of different genes that code for slightly different protein sequences, in order to screen those proteins for the characteristics you want. Or, if you are ambitious, you might use the entire pool of array oligos to directly assemble larger constructs such as genes. Again, see Google, Codon Devices, Gen9, Twist, etc. More relevant to my purpose here, a pool of array-synthesized oligos can be used as an extremely dense information storage medium. To get a sense of when that might be a viable commercial product, we need to have an idea of the throughput of the industry, and how far away from practical implementation we might be.

Next, to recap, last year I made a stab at estimating the size of the gene synthesis market. Much of the industry revenue data came from a Frost & Sullivan report, commissioned by Genscript for its IPO prospectus. The report put the 2014 market for synthetic genes at only $137 million, from which I concluded that the total number of bases shipped as genes that year was 4.8 billion, or a bit less than a duplex human genome. Based on my conversations with people in the industry, I conclude that most of those genes were assembled from oligos synthesized on columns, with a modest, but growing, fraction from array oligos. (See "On DNA and Transistors", and preceding posts, for commentary on the gene synthesis industry and its future.)

The Frost & Sullivan report also claims that the 2014 market for single-stranded oligonucleotides was $241 million. The Genscript IPO prospectus does not specify whether this $241 million was from both array- and column-synthesized oligos, or not. But because Genscript only makes and uses column synthesis, I suspect it referred only to that synthesis format. At ~$0.01 per base (give or take), this gives you about 24 billion bases synthesized on columns sold in 2014. You might wind up paying as much as $0.05 to $0.10 per base, depending on your specifications, which if prevalent would pull down the total global production volume. But I will stick with $0.01 per base for now. If you add the total number of bases sold as genes and the bases sold as oligos, you get to just shy of 30 billion bases (leaving aside for the moment the fact that an unknown fraction of the genes came from oligos synthesized on arrays).
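The column-oligo estimate above is a single division, and a short sketch shows how sensitive it is to the assumed price. The revenue, price points, and gene total come from the text. The function name `bases_sold` is a hypothetical helper introduced here for illustration.

```python
GENE_BASES_2014 = 4.8e9      # bases shipped as genes in 2014 (from the text)
OLIGO_REVENUE_2014 = 241e6   # USD, single-stranded oligo market (from the text)


def bases_sold(revenue_usd: float, price_per_base_usd: float) -> float:
    """Bases purchased for a given revenue at a flat price per base."""
    return revenue_usd / price_per_base_usd


# Working assumption from the text: about $0.01 per base.
column_bases = bases_sold(OLIGO_REVENUE_2014, 0.01)
total_bases = column_bases + GENE_BASES_2014
print(f"Column oligos: {column_bases:.2e} bases")
print(f"Oligos plus genes: {total_bases:.2e} bases")

# Higher prices, if prevalent, pull the implied volume down proportionally.
for price in (0.01, 0.05, 0.10):
    print(f"${price:.2f} per base -> {bases_sold(OLIGO_REVENUE_2014, price):.2e} bases")
```

At $0.10 per base the same revenue buys a tenth of the volume, which is why the $0.01 figure matters so much to the total.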

So, now, what about array synthesis? If you search the interwebs for information on the market for array synthesis, you get a mess of consulting and marketing research reports that cost between a few hundred and many thousands of dollars. I find this to be an unhelpful corpus of data and analysis, even when I have the report in hand, because most of the reports are terrible at describing sources and methods. However, as there is no other source of data, I will use a rough average of the market sizes from the abstracts of those reports to get started. Many of the reports claim that in 2016 the global market for oligo synthesis was ~$1.3 billion, and that this market will grow to $2.X billion by 2020 or so. Of the $1.3B 2016 revenues, the abstracts assert that approximately half was split evenly between "equipment and reagents". I will note here that this should already make the reader skeptical of the analyses, because who is selling ~$260M worth of synthesis "equipment"? And who is buying it? Seems fishy. But I can see ~$260M in reagents, in the form of various columns, reagents, and purification kit. This trade, after all, is what keeps outfits like Glenn Research and Trilink in business.

Forging ahead through swampy, uncertain data, that leaves us with ~$650M in raw oligos. Should we say this is inclusive or exclusive of the $241M figure from Frost & Sullivan? I am going to split the difference and call it $500M, since we are already well into hand waving territory by now, anyway. How many bases does this $500M buy?

Array oligos are a lot cheaper than column oligos. Kosuri and Church write that "oligos produced from microarrays are 2–4 orders of magnitude cheaper than column-based oligos, with costs ranging from $0.00001–0.001 per nucleotide, depending on length, scale and platform." Here we stumble a bit, because cost is not the same thing as price. As a consumer, or as someone interested in understanding how actually acquiring a product affects project development, I care about price. Without knowing a lot more about how this cost range is related to price, and the distribution of prices paid to acquire array oligos, it is hard to know what to do with the "cost" range. The simple average cost would be $0.001 per base, but I also happen to know that you can get oligos en masse for less than that. But I do not know what the true average price is. For the sake of expediency, I will call it $0.0001 per base for this exercise.

Combining the revenue estimate and the price gives us about 5E12 bases per year. From there, assuming roughly 100-mer oligos, you get to 5E10 different sequences. And adding in the number of features per array (between 100,000 and 1M), you get as many as 500,000 arrays run per year, or about 1370 per day. (It is not obvious that you should think of this as 1370 instruments running globally, and after seeing the Agilent oligo synthesis operation a few years ago, I suggest that you not do that.) If the true average price is closer to $0.00001 per base, then you can bump up the preceding numbers by an order of magnitude. But, to be conservative, I won't do that here. Also note that the ~30 billion bases synthesized on columns annually are not even a rounding error on the 5E12 synthesized on arrays.
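The chain of estimates above is easy to reproduce. Here is a minimal Python sketch, assuming the rough inputs from the text ($500M in revenue, $0.0001 per base, 100-mer oligos, and the low end of the feature range at 100,000 per array). The function and variable names are made up for illustration, and every input is a guess rather than measured data.

```python
# Rough inputs taken from the text; each one is a guess, not a measurement.
RAW_OLIGO_REVENUE_USD = 500e6   # "call it $500M"
PRICE_PER_BASE_USD = 1e-4       # "$0.0001 per base for this exercise"
OLIGO_LENGTH_BASES = 100        # "roughly 100-mer oligos"
FEATURES_PER_ARRAY = 100_000    # low end of the 100,000 to 1M range


def array_synthesis_estimate(revenue_usd, price_per_base_usd,
                             oligo_length, features_per_array):
    """Chain revenue -> bases -> sequences -> arrays, all per year."""
    bases_per_year = revenue_usd / price_per_base_usd
    sequences_per_year = bases_per_year / oligo_length
    arrays_per_year = sequences_per_year / features_per_array
    return {
        "bases_per_year": bases_per_year,
        "sequences_per_year": sequences_per_year,
        "arrays_per_year": arrays_per_year,
        "arrays_per_day": arrays_per_year / 365,
    }


estimate = array_synthesis_estimate(
    RAW_OLIGO_REVENUE_USD, PRICE_PER_BASE_USD,
    OLIGO_LENGTH_BASES, FEATURES_PER_ARRAY,
)
for name, value in estimate.items():
    print(f"{name}: {value:.3g}")
```

Changing `FEATURES_PER_ARRAY` to 1,000,000 cuts the array count tenfold, and dropping the price to $0.00001 per base raises every output tenfold, which is the sensitivity described in the text.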

Aside: None of these calculations delve into the mass (or the number of copies) per synthesized sequence. In principle, of course, you only need one perfect copy of each sequence, whether synthesized on columns or arrays, to use DNA in just about any application (except where you need to drive the equilibrium or reaction kinetics). Column synthesis gives you many more copies (i.e., more mass per sequence) than array synthesis. In principle — ignoring the efficiency of the chemical reactions — you could dial down the feature size on arrays until you were synthesizing just one copy per sequence. But then it would become exceedingly important to keep track of that one copy through successive fluidic operations, which sounds like a quite difficult prospect. So whatever the final form factor, an instrument needs to produce sufficient copies per sequence to be useful, but not so many that resources are wasted on unnecessary redundancy/degeneracy.

Just for shits and giggles, and because array synthesis could be important for assembling the hypothetical synthetic human genome, this all works out to be enough DNA to assemble 833 human duplex genomes per year, or 3 per day, in the absence of any other competing uses, of which there are obviously many. Also if you don't screw up and waste some of the DNA, which is inevitable. Finally, at a density of ~1 bit/base, this is enough to annually store 5 TB of data, or the equivalent of one very beefy laptop hard drive.

And so, if you have access to the entire global supply of single stranded oligonucleotides, and you have an encoding/decoding and sequencing strategy that can handle significant variations in length and high error rates at scale, you can store enough HD movies and TV to capture most of the new, good stuff that HollyBollyWood churns out every year. Unless, of course, you also need to accommodate the tastes and habits of a tween daughter, in which case your storage budget is blown for now and evermore no matter how much capacity you have at hand. Not to mention your wallet. Hey, put down the screen and practice the clarinet already. Or clean up your room! Or go to the dojo! Yeesh! Kids these days! So many exclamations!

Where was I?

Now that we have some rough numbers in hand, we can try to say something about the future. Based on my experience working on the Microsoft/UW DNA data storage project, I have become convinced that this technology is coming, and it will be based on massive increases in the supply of synthetic DNA. To compete with an existing tape drive (see the last few 'graphs of this post), able to read and write ~2 Gbits a second, a putative DNA drive would need to be able to read and write ~2 GBases per second, or ~173 Tbits/day, or the equivalent of ~10,000 human genomes a day — per instrument/device. Based on the guesstimate above, which gave a global throughput of just 3 human genomes per day, we are waaaay below that goal.

To be sure, there is probably some demand for a DNA storage technology that can work at lower throughputs: long term cold storage, government archives, film archives, etc. I suspect, however, that the many advantages of DNA data storage will attract an increasing share of the broader archival market once the basic technology is demonstrated on the market. I also suspect that developing the necessary instrumentation will require moving away from the existing chemistry to something new and different, perhaps enzymatically controlled synthesis, perhaps even with the aid of the still hypothetical DNA "synthase", which I first wrote about 17 years ago.

In any event, based on the limited numbers available today, it seems likely that the current oligo array industry has a long way to go before it can supply meaningful amounts of DNA for storage. It will be interesting to see how this all evolves.

A Few Thoughts and References Re Conservation and Synthetic Biology

June 14, 2017 Rob Carlson

Yesterday at Synthetic Biology 7.0 in Singapore, we had a good discussion about the intersection of conservation, biodiversity, and synthetic biology. I said I would post a few papers relevant to the discussion, which are below.

These papers are variously: the framing document for the original meeting at the University of Cambridge in 2013 (see also "Harry Potter and the Future of Nature"), sponsored by the Wildlife Conservation Society; follow on discussions from meetings in San Francisco and Bellagio; and my own efforts to try to figure out how to quantify the economic impact of biotechnology (which is not small, especially when compared to much older industries) and the economic damage from invasive species and biodiversity loss (which is also not small, measured as either dollars or jobs lost). The final paper in this list is my first effort to link conservation and biodiversity with economic and physical security, which requires shifting our thinking from the national security of nation states and their political boundaries to the natural security of the systems and resources that those nation states rely on for continued existence.

"Is It Time for Synthetic Biodiversity Conservation?", Antoinette J. Piaggio1, Gernot Segelbacher, Philip J. Seddon, Luke Alphey, Elizabeth L. Bennett, Robert H. Carlson, Robert M. Friedman, Dona Kanavy, Ryan Phelan, Kent H. Redford, Marina Rosales, Lydia Slobodian, Keith WheelerTrends in Ecology & Evolution, Volume 32, Issue 2, February 2017, Pages 97–107

Robert Carlson, "Estimating the biotech sector's contribution to the US economy", Nature Biotechnology, 34, 247–255 (2016), 10 March 2016

Kent H. Redford, William Adams, Rob Carlson, Bertina Ceccarelli, “Synthetic biology and the conservation of biodiversity”, Oryx, 48(3), 330–336, 2014.

"How will synthetic biology and conservation shape the future of nature?", Kent H. Redford, William Adams, Georgina Mace, Rob Carlson, Steve Sanderson, Framing Paper for International Meeting, Wildlife Conservation Society, April 2013.

"From national security to natural security", Robert Carlson, Bulletin of the Atomic Scientists, 11 Dec 2013.

Warning: Construction Ahead

July 1, 2016 Rob Carlson

I am migrating from Movable Type to Squarespace. There was no easy way to do this. Undoubtedly, there are presently all sorts of formatting hiccups, lost media and images, and broken links. If you are looking for something in particular, use the Archive or Search tabs.

If you have a specific link you are trying to follow, and it has dashes between words, try replacing them with underscores. E.g., instead of "www.synthesis.cc/x-y-z", try "www.synthesis.cc/x_y_z". If the URL ends in "/x.html", try replacing that with "/x/".

I will be repairing links, etc., as I find them.

Late Night, Unedited Musings on Synthesizing Secret Genomes

May 13, 2016 Robert

By now you have probably heard that a meeting took place this past week at Harvard to discuss large scale genome synthesis. The headline large genome to synthesize is, of course, that of humans. All 6 billion (duplex) bases, wrapped up in 23 pairs of chromosomes that display incredible architectural and functional complexity that we really don't understand very well just yet. So no one is going to be running off to the lab to crank out synthetic humans. That 6 billion bases, by the way, just for one genome, exceeds the total present global demand for synthetic DNA. This isn't happening tomorrow. In fact, synthesizing a human genome isn't going to happen for a long time.

But, if you believe the press coverage, nefarious scientists are planning to pull a Frankenstein and "fabricate" a human genome in secret. Oh, shit! Burn some late night oil! Burn some books! Wait, better — burn some scientists! Not so much, actually. There are several important points here. I'll take them in no particular order.

First, it's true, the meeting was held behind closed doors. It wasn't intended to be so, originally. The rationale given by the organizers for the change is that a manuscript on the topic is presently under review, and the editor of the journal considering the manuscript made it clear that it considers the entire topic under embargo until the paper is published. This put the organizers in a bit of a pickle. They decided the easiest way to comply with the editor's wishes (which were communicated to the authors well after the attendees had made travel plans) was to hold the meeting under rules even more strict than Chatham House until the paper is published. At that point, they plan to make a full record of the meeting available. It just isn't a big deal. If it sounds boring and stupid so far, it is. The word "secret" was only introduced into the conversation by a notable critic who, as best I can tell, perhaps misconstrued the language around the editor's requirement to respect the embargo. A requirement that is also boring and stupid. But, still, we are now stuck with "secret", and all the press and bloggers who weren't there are seeing Watergate headlines and fame. Still boring and stupid.

Next, it has been reported that there were no press at the meeting. However, I understand that there were several reporters present. It has also been suggested that the press present were muzzled. This is a ridiculous claim if you know anything about reporters. They've simply been asked to respect the embargo, which so far they are doing, just like they do with every other embargo. (Note to self, and to readers: do not piss off reporters. Do not accuse them of being simpletons or shills. Avoid this at all costs. All reporters are brilliant and write like Hemingway and/or Shakespeare and/or Oliver Morton / Helen Branswell / Philip Ball / Carl Zimmer / Erica Check-Hayden. Especially that one over there. You know who I mean. Just sayin'.)

How do I know all this? You can take a guess, but my response is also covered by the embargo.

Moving on: I was invited to the meeting in question, but could not attend. I've checked the various associated correspondence, and there's nothing about keeping it "secret". In fact, the whole frickin' point of coupling the meeting to a serious, peer-reviewed paper on the topic was to open up the conversation with the public as broadly as possible. (How do you miss that unsubtle point, except by trying?) The paper was supposed to come out before, or, at the latest, at the same time as the meeting. Or, um, maybe just a little bit after? But, whoops. Surprise! Academic publishing can be slow and/or manipulated/politicized. Not that this happened here. Anyway, get over it. (Also: Editors! And, reviewers! And, how many times will I say "this is the last time!")

(Psst: an aside. Science should be open. Biology, in particular, should be done in the public view and should be discussed in the open. I've said and written this in public on many occasions. I won't bore you with the references. [Hint: right here.] But that doesn't mean that every conversation you have should be subject to review by the peanut gallery right now. Think of it like a marriage/domestic partnership. You are part of society; you have a role and a responsibility, especially if you have children. But that doesn't mean you publicize your pillow talk. That would be deeply foolish and would inevitably prevent you from having honest conversations with your spouse. You need privacy to work on your thinking and relationships. Science: same thing. Critics: fuck off back to that sewery rag in — wait, what was I saying about not pissing off reporters?)

Is this really a controversy? Or is it merely a controversy because somebody said it is? Plenty of people are weighing in who weren't there or, undoubtedly worse from their perspective, weren't invited and didn't know it was happening. So I wonder if this is more about drawing attention to those doing the shouting. That is probably unfair, this being an academic discussion, full of academics.

Secondly (am I just on secondly?), the supposed ethical issues. Despite what you may read, there is no rush. No human genome, nor any human chromosome, will be synthesized for some time to come. Make no mistake about how hard a technical challenge this is. While we have some success in hand at synthesizing yeast chromosomes, and while that project certainly serves as some sort of model for other genomes, the chromatin in multicellular organisms has proven more challenging to understand or build. Consequently, any near-term progress made in synthesizing human chromosomes is going to teach us a great deal about biology, about disease, and about what makes humans different from other animals. It is still going to take a long time. There isn't any real pressing ethical issue to be had here, yet. Building the ubermensch comes later. You can be sure, however, that any federally funded project to build the ubermensch will come with a ~2% set aside to pay for plenty of bioethics studies. And that's a good thing. It will happen.

There is, however, an ethical concern here that needs discussing. I care very deeply about getting this right, and about not screwing up the future of biology. As someone who has done multiple tours on bioethics projects in the U.S. and Europe, served as a scientific advisor to various other bioethics projects, and testified before the Presidential Commission on Bioethical Concerns (whew!), I find that many of these conversations are more about the ethicists than the bio. Sure, we need to have public conversations about how we use biology as a technology. It is a very powerful technology. I wrote a book about that. If only we had such involved and thorough ethical conversations about other powerful technologies. Then we would have more conversations about stuff. We would converse and say things, all democratic-like, and it would feel good. And there would be stuff, always more stuff to discuss. We would say the same things about that new stuff. That would be awesome, that stuff, those words. <dreamy sigh> You can quote me on that. <another dreamy sigh>

But on to the technical issues. As I wrote last month, I estimate the global demand for synthetic DNA (sDNA) to be 4.8 billion bases worth of short oligos and ~1 billion worth of longer double-stranded DNA (dsDNA), for not quite 6 Gigabases total. That, obviously, is the equivalent of a single human duplex genome. Most of that demand is from commercial projects that must return value within a few quarters, which biotech is now doing at eye-popping rates. Any synthetic human genome project is going to take many years, if not decades, and any commercial return is way, way off in the future. Even if the annual growth in commercial use of sDNA were 20% — which it isn't — this tells you, dear reader, that the commercial biotech use of synthetic DNA is never, ever, going to provide sufficient demand to scale up production to build many synthetic human genomes. Or possibly even a single human genome. The government might step in to provide a market to drive technology, just as it did for the human genome sequencing project, but my judgement is that the scale mismatch is so large as to be insurmountable. Even while sDNA is already a commodity, it has far more value in reprogramming crops and microbes with relatively small tweaks than it has in building synthetic human genomes. So if this story were only about existing use of biology as technology, you could go back to sleep.

But there is a use of DNA that might change this story, which is why we should be paying attention, even at this late hour on a Friday night.

DNA is, by far, the most sophisticated and densest information storage medium humans have ever come across. DNA can be used to store orders of magnitude more bits per gram than anything else humans have come up with. Moreover, the internet is expanding so rapidly that our need to archive data will soon outstrip existing technologies. If we continue down our current path, in coming decades we would need not only exponentially more magnetic tape, disk drives, or flash memory, but exponentially more factories to produce these storage media, and exponentially more warehouses to store them. Even if this is technically feasible it is economically implausible. But biology can provide a solution. DNA exceeds by many times even the theoretical capacity of magnetic tape or solid state storage.

A massive warehouse full of magnetic tapes might be replaced by an amount of DNA the size of a sugar cube. Moreover, while tape might last decades, and paper might last millennia, we have found intact DNA in animal carcasses that have spent three-quarters of a million years frozen in the Canadian tundra. Consequently, there is a push to combine our ability to read and write DNA with our accelerating need for more long-term information storage. Encoding and retrieval of text, photos, and video in DNA has already been demonstrated. (Yes, I am working on one of these projects, but I can't talk about it just yet. We're not even to the embargo stage.)

Governments and corporations alike have recognized the opportunity. Both are funding research to support the scaling up of infrastructure to synthesize and sequence DNA at sufficient rates.

For a “DNA drive” to compete with an archival tape drive today, it needs to be able to write ~2Gbits/sec, which is about 2 Gbases/sec. That is the equivalent of ~20 synthetic human genomes/min, or ~10K sHumans/day, if I must coin a unit of DNA synthesis to capture the magnitude of the change. Obviously this is likely to be in the form of either short ssDNA, or possibly medium-length ss- or dsDNA if enzymatic synthesis becomes a factor. If this sDNA were to be used to assemble genomes, it would first have to be assembled into genes, and then into synthetic chromosomes, a nontrivial task. While this would be hard, and would take a great deal of effort and PhD theses, it certainly isn't science fiction.
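To see where the genomes-per-minute figure comes from, here is a minimal Python sketch of the conversion. It assumes roughly 1 bit per base (which is what equating 2 Gbits/sec with 2 Gbases/sec implies) and a 6 billion base duplex human genome; the names are hypothetical. Carried through a full day, the same arithmetic gives a few tens of thousands of genome equivalents, the same order of magnitude as the ~10K quoted above.

```python
TAPE_WRITE_RATE_BITS_PER_SEC = 2e9   # ~2 Gbits/sec archival tape drive
BITS_PER_BASE = 1.0                  # assumption: ~1 bit stored per base
HUMAN_DUPLEX_GENOME_BASES = 6e9      # ~6 billion bases per duplex genome


def genome_equivalents(write_rate_bits_per_sec, bits_per_base, genome_bases):
    """Convert a storage write rate into human-genome equivalents of DNA."""
    bases_per_sec = write_rate_bits_per_sec / bits_per_base
    genomes_per_min = bases_per_sec * 60 / genome_bases
    return {
        "bases_per_sec": bases_per_sec,
        "genomes_per_min": genomes_per_min,
        "genomes_per_day": genomes_per_min * 60 * 24,
    }


drive = genome_equivalents(
    TAPE_WRITE_RATE_BITS_PER_SEC, BITS_PER_BASE, HUMAN_DUPLEX_GENOME_BASES
)
print(f"{drive['genomes_per_min']:.0f} genomes/min")
print(f"{drive['genomes_per_day']:,.0f} genomes/day")
```

A denser encoding (for example, closer to 2 bits per base) would halve the number of bases needed, so `BITS_PER_BASE` is the parameter to vary if you want to test that assumption.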

But here, finally, is the interesting bit: the volume of sDNA necessary to make DNA information storage work, and the necessary price point, would make possible any number of synthetic genome projects. That, dear reader, is definitely something that needs careful consideration by publics. And here I do not mean "the public", the 'them' opposed to scientists and engineers in the know and in the do (and in the doo-doo, just now), but rather the Latiny, rootier sense of "the people". There is no them, here, just us, all together. This is important.

The scale of the demand for DNA storage, and the price at which it must operate, will completely alter the economics of reading and writing genetic information, in the process marginalizing the use by existing multibillion-dollar biotech markets while at the same time massively expanding capabilities to reprogram life. This sort of pull on biotechnology from non-traditional applications will only increase with time. That means whatever conversation we think we are having about the calm and ethical development of biological technologies is about to be completely inundated and overwhelmed by the relentless pull of global capitalism, beyond borders, probably beyond any control. Note that all the hullabaloo so far about synthetic human genomes, and even about CRISPR editing of embryos, etc., has been written by Western commentators, in Western press. But not everybody lives in the West, and vast resources are pushing development of biotechnology outside of the West. And that is worth an extended public conversation.

So, to sum up, have fun with all the talk of secret genome synthesis. That's boring. I am going off the grid for the rest of the weekend to pester littoral invertebrates with my daughter. You are on your own for a couple of days. Reporters, you are all awesome, make of the above what you will. Also: you are all awesome. When I get back to the lab on Monday I will get right on with fabricating the ubermensch for fun and profit. But — shhh — that's a secret.

On DNA and Transistors

March 9, 2016 Robert

Here is a short post to clarify some important differences between the economics of markets for DNA and for transistors. I keep getting asked related questions, so I decided to elaborate here.

But first, new cost curves for reading and writing DNA. The occasion is some new data gleaned from a somewhat out of the way source, the Genscript IPO Prospectus. It turns out that, while preparing their IPO docs, Genscript hired Frost & Sullivan to do a market survey across much of life sciences. The Prospectus then puts Genscript's revenues in the context of the global market for synthetic DNA, which together provide some nice anchors for discussing how things are changing (or not).

So, with no further ado, Frost & Sullivan found that the 2014 global market for oligos was $241 million, and the global market for genes was $137 million. (Note that I tweeted out larger estimates a few weeks ago when I had not yet read the whole document.) Genscript reports that they received $35 million in 2014 for gene synthesis, for 25.6% of the market, which they claim puts them in the pole position globally. Genscript further reports that the price for genes in 2014 was $.34 per base pair. This sounds much too high to me, so it must be based on duplex synthesis, which would bring the linear per base cost down to $.17 per base, which sounds much more reasonable to me because it is more consistent with what I hear on the street. (It may be that Gen9 is shipping genes at $.07 per base, but I don't know anyone outside of academia who is paying that low a rate.) If you combine the price per base and the size of the market, you get about 1 billion bases worth of genes shipped in 2014 (so a million genes, give or take). This is consistent with Ginkgo's assertions saying that their 100 million base deal with Twist was the equivalent of 10% of the global gene market in 2015. For oligos, if you combine Genscript's reported average price per base, $.05, with the market size you get about 4.8 billion bases worth of oligos shipped in 2014. Frost & Sullivan thinks that from 2015 to 2019 the oligo market CAGR will be 6.6% and the gene synthesis market will come in at 14.7%.
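The volume estimates above are just market size divided by price per base. A minimal Python sketch, using the Frost & Sullivan market figures and the per-base prices discussed in the text (the function name is made up for illustration):

```python
GENE_MARKET_USD = 137e6          # 2014 global gene synthesis market
GENE_PRICE_PER_BASE_USD = 0.17   # linear per-base price inferred in the text
OLIGO_MARKET_USD = 241e6         # 2014 global oligo market
OLIGO_PRICE_PER_BASE_USD = 0.05  # Genscript's reported average oligo price


def bases_shipped(market_usd, price_per_base_usd):
    """Estimate bases sold per year from market revenue and average price."""
    return market_usd / price_per_base_usd


gene_bases = bases_shipped(GENE_MARKET_USD, GENE_PRICE_PER_BASE_USD)
oligo_bases = bases_shipped(OLIGO_MARKET_USD, OLIGO_PRICE_PER_BASE_USD)

print(f"genes:  {gene_bases / 1e9:.2f} billion bases")
print(f"oligos: {oligo_bases / 1e9:.2f} billion bases")
```

The gene figure comes out a bit under a billion bases, which is what "about 1 billion" rounds up from. The result is only as good as the average price, so a different price assumption shifts the volume proportionally.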

For the sequencing, I have capitulated and put the NextSeq $1000 human genome price point on the plot. This instrument is optimized to sequence human DNA, and I can testify personally that sequencing arbitrary DNA is more expensive because you have to work up your own processes and software. But I am tired of arguing with people. So use the plot with those caveats in mind.

NOTE: Replaces prior plot with an error in sequencing price.

What is most remarkable about these numbers is how small they are. The way I usually gather data for these curves is to chat with people in the industry, mine publications, and spot check price lists. All that led me to estimate that the gene synthesis market was about $350 million (and has been for years) and the oligo market was in the neighborhood of $700 million (and has been for years).

If the gene synthesis market is really only $137 million, with four or five companies vying for market share, then that is quite an eye opener. Even if that is off by a factor of two or three, getting closer to my estimate of $350 million, that just isn't a very big market to play in. A ~15% CAGR is nothing to sneeze at, usually, and that is a doubling rate of about 5 years. But the price of genes is now falling by 15% every 3-4 years (or only about 5% annually). So, for the overall dollar size of the market to grow at 15%, the number of genes shipped every year has to grow at close to 20% annually. That's about 200 million additional bases (or ~200,000 more genes) ordered in 2016 compared to 2015. That seems quite large to me. How many users can you think of who are ramping up their ability to design or use synthetic genes by 20% a year? Obviously Ginkgo, for one. As it happens, I do know of a small number of other such users, but added together they do not come close to constituting that 20% overall increase. All this suggests to me that the dollar value of the gene synthesis market will be hard pressed to keep up with the Frost & Sullivan estimate of 14.7% CAGR, at least in the near term. As usual, I will be happy to be wrong about this, and happy to celebrate faster growth in the industry. But bring me data.
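The growth argument above rests on three small calculations: the doubling time implied by a CAGR, the annualized rate of a multi-year price decline, and the volume growth needed when revenue equals price times volume. Here is a minimal Python sketch, assuming revenue = price × volume and a gene market of roughly 1 billion bases per year; the function names are hypothetical.

```python
import math


def doubling_time_years(cagr):
    """Years for a quantity growing at `cagr` per year to double."""
    return math.log(2) / math.log(1 + cagr)


def annualized_change(total_change, years):
    """Convert a total fractional change over `years` into a yearly rate."""
    return (1 + total_change) ** (1 / years) - 1


def required_volume_growth(revenue_growth, price_change):
    """Since revenue = price * volume, solve for the volume growth rate."""
    return (1 + revenue_growth) / (1 + price_change) - 1


REVENUE_CAGR = 0.147          # Frost & Sullivan gene synthesis estimate
ANNUAL_PRICE_CHANGE = -0.05   # "about 5% annually" decline
GENE_BASES_PER_YEAR = 1e9     # rough gene volume from the text

volume_growth = required_volume_growth(REVENUE_CAGR, ANNUAL_PRICE_CHANGE)
print(f"doubling time: {doubling_time_years(REVENUE_CAGR):.1f} years")
print(f"15% over 3 years is {annualized_change(-0.15, 3):.1%} per year")
print(f"required volume growth: {volume_growth:.1%}")
print(f"additional bases: {GENE_BASES_PER_YEAR * volume_growth:.2e}")
```

If prices start falling faster, `required_volume_growth` climbs quickly, which is the "race to the bottom" concern: volume can accelerate while the dollar size of the market stays flat.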

People in the industry keep insisting that once the price of genes falls far enough, the ~$3 billion market for cloning will open up to synthetic DNA. I have been hearing that story for a decade. And then price isn't the only factor. To play in the cloning market, synthesis companies would actually have to be able to deliver genes and plasmids faster than cloning. Given that I'm hearing delivery times for synthetic genes are running at weeks, to months, to "we're working on it", I don't see people switching en masse to synthetic genes until the performance improves. If it costs more to have your staff waiting for genes to show up by FedEx than to have them bash the DNA by hand, they aren't going to order synthetic DNA.

And then what happens if the price of genes starts falling rapidly again? Or, forget rapidly, what about modestly? What if a new technology comes in and outcompetes standard phosphoramidite chemistry? The demand for synthetic DNA could accelerate and the total market size still might be stagnant, or even fall. It doesn't take much to turn this into a race to the bottom. For these and other reasons, I just don't see the gene synthesis market growing very quickly over the next 5 or so years.

Which brings me to transistors. The market for DNA is very unlike the market for transistors, because the role of DNA in product development and manufacturing is very unlike the role of transistors. Analogies are tremendously useful in thinking about the future of technologies, but only to a point; the unwary may miss differences that are just as important as the similarities.

For example, the computer in your pocket fits there because it contains orders of magnitude more transistors than a desktop machine did fifteen years ago. Next year, you will want even more transistors in your pocket, or on your wrist, which will give you access to even greater computational power in the cloud. Those transistors are manufactured in facilities now costing billions of dollars apiece, a trend driven by our evidently insatiable demand for more and more computational power and bandwidth access embedded in every product that we buy. Here is the important bit: the total market value for transistors has grown for decades precisely because the total number of transistors shipped has climbed even faster than the cost per transistor has fallen.

In contrast, biological manufacturing requires only one copy of the correct DNA sequence to produce billions in value. That DNA may code for just one protein used as a pharmaceutical, or it may code for an entire enzymatic pathway that can produce any molecule now derived from a barrel of petroleum. Prototyping that pathway will require many experiments, and therefore many different versions of genes and genetic pathways. Yet once the final sequence is identified and embedded within a production organism, that sequence will be copied as the organism grows and reproduces, terminating the need for synthetic DNA in manufacturing any given product. The industrial scaling of gene synthesis is completely different than that of semiconductors.