#VeryStableGenius

Ladies, and gentlemen, for your reading pleasure (I will add new verses as I find them, still working out quirks in embedding Tweets):

A memorial to Mark Buller, PhD, and our response to the propaganda film "Demon in the Freezer".

Earlier this year my friend and colleague Mark Buller passed away. Mark was a noted virologist and a professor at Saint Louis University. He was struck by a car while riding his bicycle home from the lab, and died from his injuries. Here is Mark's obituary as published by the university.

In 2014 and 2015, Mark and I served as advisors to a WHO scientific working group on synthetic biology and the variola virus (the causative agent of smallpox). In 2016, we wrote the following, previously un-published, response to an "Op-Doc" that appeared in the New York Times. In a forthcoming post I will have more to say about both my experience with the WHO and my thoughts on the recent publication of a synthetic horsepox genome. For now, here is the last version (circa May, 2016) of the response Mark I and wrote to the Op-Doc, published here as my own memorial to Professor Buller.


Variola virus is still needed for the development of smallpox medical countermeasures

On May 17, 2016 Errol Morris presented a short movie entitled “Demon in the Freezer” [note: quite different from the book of the same name by Richard Preston] in the Op-Docs section of the on-line New York Times. The piece purported to present both sides of the long-standing argument over what to do with the remaining laboratory stocks of variola virus, the causative agent of smallpox, which no longer circulates in the human population.

Since 1999, the World Health Organization has on numerous occasions postponed the final destruction of the two variola virus research stocks in Russia and the US in order to support public health related research, including the development of smallpox molecular diagnostics, antivirals, and vaccines.  

“Demon in the Freezer” clearly advocates for destroying the virus. The Op-Doc impugns the motivation of scientists carrying out smallpox research by asking: “If given a free hand, what might they unleash?” The narrative even suggests that some in the US government would like to pursue a nefarious policy goal of “mutually assured destruction with germs”. This portion of the movie is interlaced with irrelevant, hyperbolic images of mushroom clouds. The reality is that in 1969 the US unilaterally renounced the production, storage or use biological weapons for any reason whatsoever, including in response to a biologic attack from another country. The same cannot be said for ISIS and Al-Qaeda. In 1975 the US ratified the 1925 Geneva Protocol banning chemical and biological agents in warfare and became party to the Biological Weapons Convention that emphatically prohibits the use of biological weapons in warfare.

“Demon in the Freezer” is constructed with undeniable flair, but in the end it is a benighted 21st century video incarnation of a middling 1930's political propaganda mural. It was painted with only black and white pigments, rather than a meaningful palette of colors, and using a brush so broad that it blurred any useful detail. Ultimately, and to its discredit, the piece sought to create fear and outrage based on unsubstantiated accusations.

Maintaining live smallpox virus is necessary for ongoing development and improvement of medical countermeasures. The first-generation US smallpox vaccine was produced in domesticated animals, while the second-generation smallpox vaccine was manufactured in sterile bioreactors; both have the potential to cause serious side effects in 10-20% of the population. The third generation smallpox vaccine has an improved safety profile, and causes minimal side effects. Fourth generation vaccine candidates, based on newer, lower cost, technology, will be even safer and some are in preclinical testing. There remains a need to develop rapid field diagnostics and an additional antiviral therapy for smallpox.

Continued vigilance is necessary because it is widely assumed that numerous undeclared stocks of variola virus exist around the world in clandestine laboratories. Moreover, unsecured variola virus stocks are encountered occasionally in strain collections left behind by long-retired researchers, as demonstrated in 2014 with the discovery of 1950s vintage variola virus in a cold room at the NIH. The certain existence of unofficial stocks makes destroying the official stocks an exercise in declaring “victory” merely for political purposes rather than a substantive step towards increasing security. Unfortunately, the threat does not end with undeclared or forgotten samples.

In 2015 a WHO Scientific Working Group on Synthetic Biology and Variola Virus and Smallpox determined that a “skilled laboratory technician or undergraduate student with experience of working with viruses” would be able to generate variola virus from the widely available genomic sequence in “as little as three months”. Importantly, this Working Group concluded that “there will always be the potential to recreate variola virus and therefore the risk of smallpox happening again can never be eradicated.” Thus, the goal of a variola virus-free future, however laudable, is unattainable. This is sobering guidance on a topic that requires sober consideration.

We welcome increased discussions of the risk of infectious disease and of public health preparedness. In the US these topics have too long languished among second (or third) tier national security conversations. The 2014 West Africa Ebola outbreak and the current Congressional debate over funding to counter the Zika virus exemplifies the business-as-usual political approach of throwing half a bucket of water on the nearest burning bush while the surrounding countryside goes up in flames. Lethal infectious diseases are serious public health and global security issues and they deserve serious attention.

The variola virus has killed more humans numerically than any other single cause in history. This pathogen was produced by nature, and it would be the height of arrogance, and very foolish indeed, to assume nothing like it will ever again emerge from the bush to threaten human life and human civilization. Maintenance of variola virus stocks is needed for continued improvement of molecular diagnostics, antivirals, and vaccines. Under no circumstances should we unilaterally cripple those efforts in the face of the most deadly infectious disease ever to plague humans. This is an easy mistake to avoid.

Mark Buller, PhD, was a Professor of Molecular Microbiology & Immunology at Saint Louis University School of Medicine, who passed away on February 24, 2017. Rob Carlson, PhD, is a Principal at the engineering and strategy firm Biodesic and a Managing Director of Bioeconomy Capital.

The authors served as scientific and technical advisors to the 2015 WHO Scientific Working Group on Synthetic Biology and Variola Virus.

Guesstimating the Size of the Global Array Synthesis Market

(Updated, Aug 31, for clarity.)

After chats with a variety of interested parties over the last couple of months, I decided it would be useful to try to sort out how much DNA is synthesized annually on arrays, in part to get a better handle on what sort of capacity it represents for DNA data storage. The publicly available numbers, as usual, are terrible, which is why the title of the post contains the word "guesstimating". Here goes.

First, why is this important? As the DNA synthesis industry grows, and the number of applications expands, new markets are emerging that use that DNA in different ways. Not all that DNA is produced using the same method, and the different methods are characterized by different costs, error rates, lengths, throughput, etc. (The Wikipedia entry on Oligonucleotide Synthesis is actually fairly reasonable, if you want to read more. See also Kosuri and Church, "Large-scale de novo DNA synthesis: technologies and applications".) If we are going to understand the state of the technology, and the economy built on that technology, then we need to be careful about measuring what the technology can do and how much it costs. Once we pin down what the world looks like today, we can start trying to make sensible projections, or even predictions, about the future.

While there is just one basic chemistry used to synthesize oligonucleotides, there are two physical formats that give you two very different products. Oligos synthesized on individual columns, which might be packed into 384 (or more) well plates, can be manipulated as individual sequences. You can use those individual sequences for any number of purposes, and if you want just one sequence at a time (for PCR or hybridization probes, gene therapy, etc), this is probably how you make it. You can build genes from column oligos by combining them pairwise, or in larger numbers, until you get the size construct you want (typically of order a thousand bases, or a kilobase [kB], at which point you start manipulating the kB fragments). I am not going to dwell on gene assembly and error correction strategies here; you can Google that.

The other physical format is array synthesis, in which synthesis takes place on a solid surface consisting of up to a million different addressable features, where light or charge is used to control which sequence is grown on which feature. Typically, all the oligos are removed from the array at once, which results in a mixed pool. You might insert this pool into a longer backbone sequence to construct a library of different genes that code for slightly different protein sequences, in order to screen those proteins for the characteristics you want. Or, if you are ambitious, you might use the entire pool of array oligos to directly assemble larger constructs such as genes. Again, see Google, Codon Devices, Gen9, Twist, etc. More relevant to my purpose here, a pool of array-synthesized oligos can be used as an extremely dense information storage medium. To get a sense of when that might be a viable commercial product, we need to have an idea of the throughput of the industry, and how far away from practical implementation we might be. 

Next, to recap, last year I made a stab at estimating the size of the gene synthesis market. Much of the industry revenue data came from a Frost & Sullivan report, commissioned by Genscript for its IPO prospectus. The report put the 2014 market for synthetic genes at only $137 million, from which I concluded that the total number of bases shipped as genes that year was 4.8 billion, or a bit less than a duplex human genome. Based on my conversations with people in the industry, I conclude that most of those genes were assembled from oligos synthesized on columns, with a modest, but growing, fraction from array oligos. (See "On DNA and Transistors", and preceding posts, for commentary on the gene synthesis industry and its future.)

The Frost & Sullivan report also claims that the 2014 market for single-stranded oligonucleotides was $241 million. The Genscript IPO prospectus does not specify whether this $241 million was from both array- and column-synthesized oligos, or not. But because Genscript only makes and uses column synthesis, I suspect it referred only to that synthesis format.  At ~$0.01 per base (give or take), this gives you about 24 billion bases synthesized on columns sold in 2014. You might wind up paying as much as $0.05 to $0.10 per base, depending on your specifications, which if prevalent would pull down the total global production volume. But I will stick with $0.01 per base for now. If you add the total number of bases sold as genes and the bases sold as oligos, you get to just shy of 30 billion bases (leaving aside for the moment the fact that an unknown fraction of the genes came from oligos synthesized on arrays).

So, now, what about array synthesis? If you search the interwebs for information on the market for array synthesis, you get a mess of consulting and marketing research reports that cost between a few hundred and many thousands of dollars. I find this to be an unhelpful corpus of data and analysis, even when I have the report in hand, because most of the reports are terrible at describing sources and methods. However, as there is no other source of data, I will use a rough average of the market sizes from the abstracts of those reports to get started. Many of the reports claim that in 2016 the global market for oligo synthesis was ~$1.3 billion, and that this market will grow to $2.X billion by 2020 or so. Of the $1.3B 2016 revenues, the abstracts assert that approximately half was split evenly between "equipment and reagents". I will note here that this should already make the reader skeptical of the analyses, because who is selling ~$260M worth of synthesis "equipment"? And who is buying it? Seems fishy. But I can see ~$260M in reagents, in the form of various columns, reagents, and purification kit. This trade, after all, is what keeps outfits like Glenn Research and Trilink in business.

Forging ahead through swampy, uncertain data, that leaves us with ~$650M in raw oligos. Should we say this is inclusive or exclusive of the $241M figure from Frost & Sullivan? I am going to split the difference and call it $500M, since we are already well into hand waving territory by now, anyway. How many bases does this $500M buy?

Array oligos are a lot cheaper than column oligos. Kosuri and Church write that "oligos produced from microarrays are 2–4 orders of magnitude cheaper than column-based oligos, with costs ranging from $0.00001–0.001 per nucleotide, depending on length, scale and platform." Here we stumble a bit, because cost is not the same thing as price. As a consumer, or as someone interested in understanding how actually acquiring a product affects project development, I care about price. Without knowing a lot more about how this cost range is related to price, and the distribution of prices paid to acquire array oligos, it is hard to know what to do with the "cost" range. The simple average cost would be $0.001 per base, but I also happen to know that you can get oligos en masse for less than that. But I do not know what the true average price is. For the sake of expediency, I will call it $0.0001 per base for this exercise.

Combining the revenue estimate and the price gives us about 5E12 bases per year. From there, assuming roughly 100-mer oligos, you get to 5E10 difference sequences. And adding in the number of features per array (between 100,000 and 1M), you get as many as 500,000 arrays run per year, or about 1370 per day. (It is not obvious that you should think of this as 1370 instruments running globally, and after seeing the Agilent oligo synthesis operation a few years ago, I suggest that you not do that.) If the true average price is closer to $0.00001 per base, then you can bump up the preceding numbers by an order of magnitude. But, to be conservative, I won't do that here. Also note that the ~30 billion bases synthesized on columns annually are not even a rounding error on the 5E12 synthesized on arrays.

Aside: None of these calculations delve into the mass (or the number of copies) per synthesized sequence. In principle, of course, you only need one perfect copy of each sequence, whether synthesized on columns or arrays, to use DNA in any just about application (except where you need to drive the equilibrium or reaction kinetics). Column synthesis gives you many more copies (i.e., more mass per sequence) than array synthesis. In principle — ignoring the efficiency of the chemical reactions — you could dial down the feature size on arrays until you were synthesizing just one copy per sequence. But then it would become exceedingly important to keep track of that one copy through successive fluidic operations, which sounds like a quite difficult prospect. So whatever the final form factor, an instrument needs to produce sufficient copies per sequence to be useful, but not so many that resources are wasted on unnecessary redundancy/degeneracy.

Just for shits and giggles, and because array synthesis could be important for assembling the hypothetical synthetic human genome, this all works out to be enough DNA to assemble 833 human duplex genomes per year, or 3 per day, in the absence of any other competing uses, of which there are obviously many. Also if you don't screw up and waste some of the DNA, which is inevitable. Finally, at a density of ~1 bit/base, this is enough to annually store 5 TB of data, or the equivalent of one very beefy laptop hard drive.

And so, if you have access to the entire global supply of single stranded oligonucleotides, and you have an encoding/decoding and sequencing strategy that can handle significant variations in length and high error rates at scale, you can store enough HD movies and TV to capture most of the new, good stuff that HollyBollyWood churns out every year. Unless, of course, you also need to accommodate the tastes and habits of a tween daughter, in which case your storage budget is blown for now and evermore no matter how much capacity you have at hand. Not to mention your wallet. Hey, put down the screen and practice the clarinet already. Or clean up your room! Or go to the dojo! Yeesh! Kids these days! So many exclamations!

Where was I?

Now that we have some rough numbers in hand, we can try to say something about the future. Based on my experience working on the Microsoft/UW DNA data storage project, I have become convinced that this technology is coming, and it will be based on massive increases in the supply of synthetic DNA. To compete with an existing tape drive (see the last few 'graphs of this post), able to read and write ~2 Gbits a second, a putative DNA drive would need to be able to read and write ~2 GBases per second, or ~183 Pbits/day, or the equivalent of ~10,000 human genomes a day — per instrument/device. Based on the guesstimate above, which gave a global throughput of just 3 human genomes per day, we are waaaay below that goal.

To be sure, there is probably some demand for a DNA storage technology that can work at lower throughputs: long term cold storage, government archives, film archives, etc. I suspect, however, that the many advantages of DNA data storage will attract an increasing share of the broader archival market once the basic technology is demonstrated on the market. I also suspect that developing the necessary instrumentation will require moving away from the existing chemistry to something new and different, perhaps enzymatically controlled synthesis, perhaps even with the aid of the still hypothetical DNA "synthase", which I first wrote about 17 years ago.

In any event, based on the limited numbers available today, it seems likely that the current oligo array industry has a long way to go before it can supply meaningful amounts of DNA for storage. It will be interesting to see how this all evolves.

A Few Thoughts and References Re Conservation and Synthetic Biology

Yesterday at Synthetic Biology 7.0 in Singapore, we had a good discussion about the intersection of conservation, biodiversity, and synthetic biology. I said I would post a few papers relevant to the discussion, which are below.

These papers are variously: the framing document for the original meeting at the University of Cambridge in 2013 (see also "Harry Potter and the Future of Nature"), sponsored by the Wildlife Conservation Society; follow on discussions from meetings in San Francisco and Bellagio; and my own efforts to try to figure out how quantify the economic impact of biotechnology (which is not small, especially when compared to much older industries) and the economic damage from invasive species and biodiversity loss (which is also not small, measured as either dollars or jobs lost). The final paper in this list is my first effort to link conservation and biodiversity with economic and physical security, which requires shifting our thinking from the national security of nation states and their political boundaries to the natural security of the systems and resources that those nation states rely on for continued existence.

"Is It Time for Synthetic Biodiversity Conservation?", Antoinette J. Piaggio1, Gernot Segelbacher, Philip J. Seddon, Luke Alphey, Elizabeth L. Bennett, Robert H. Carlson, Robert M. Friedman, Dona Kanavy, Ryan Phelan, Kent H. Redford, Marina Rosales, Lydia Slobodian, Keith WheelerTrends in Ecology & Evolution, Volume 32, Issue 2, February 2017, Pages 97–107

Robert Carlson, "Estimating the biotech sector's contribution to the US economy", Nature Biotechnology, 34, 247–255 (2016), 10 March 2016

Kent H. Redford, William Adams, Rob Carlson, Bertina Ceccarelli, “Synthetic biology and the conservation of biodiversity”, Oryx, 48(3), 330–336, 2014.

"How will synthetic biology and conservation shape the future of nature?", Kent H. Redford, William Adams, Georgina Mace, Rob Carlson, Steve Sanderson, Framing Paper for International Meeting, Wildlife Conservation Society, April 2013.

"From national security to natural security", Robert Carlson, Bulletin of the Atomic Scientists, 11 Dec 2013.

Warning: Construction Ahead

I am migrating from Movable Type to Squarespace. There was no easy way to do this. Undoubtedly, there are presently all sorts of formatting hiccups, lost media and images, and broken links. If you are looking for something in particular, use the Archive or Search tabs.

If you have a specific link you are trying to follow, and it has dashes between words, try replacing them with underscores. E.g., instead of "www.synthesis.cc/x-y-z", try "www.synthesis.cc/x_y_z". If the URL ends in "/x.html", try replacing that with "/x/".

I will be repairing links, etc., as I find them.