Thursday, December 17, 2020

Autonomy in Synthetic Biology

A lot of work is going into laboratory automation and design tools, but how far is it actually getting and what are the real roadblocks? We examine these questions in a new article, "Levels of autonomy in synthetic biology," out today in Molecular Systems Biology.

I'm a big believer that a better toolkit is eventually going to radically transform the way that we engineer biological organisms. And I've certainly preached the gospel of data integration, design tools, reproducibility, etc. But somehow, two decades after the field began, we still find ourselves with lots of barriers to effective deployment of standards and automation to actually increase routine productivity. Way too much is still slow, manual, artisanal. So what's going wrong and what do we need to fix in order to get that transformation into a world of rapid, routine, and reliable engineering?

To understand this, we first developed a six-level framework for analyzing efficacy of automation, analogous to the one used for discussing autonomous vehicles. We don't necessarily need to go crazy-high in autonomy in order to get a lot of benefit. What I really want is at least some good Level 2 scientific "driver assistance" features to help me with the lab equivalents of lane-keeping, checking my blind spot, and parallel parking.

State of the art in autonomy for the design-build-test-learn cycle, as shown in our article.

The problem is that right now, there's a bunch of good work being done on specific challenges in the prototypical "design", "build", "test", and "learn" stages of the engineering cycle, but not enough investment in the "glue" of standards that will allow things to connect together between stages or in curation tools that decrease the burden in setting up tools. We know from a number of demonstrations that we can do much better, but the marketplace is still too fragmented and just not enough work has been done on stringing these pieces together yet.

At the same time, I have a lot of hope.  The work we've been doing in partnership with lots of others on the DARPA SD2 program is producing a lot of interesting tools for lowering barriers to curation and making it easier to use automation. I'm also seeing a big push in iGEM, where we've been spreading the word on measurement and engineering methods. And a lot of folks I talk to in government, industry, and around the world all seem to be seeing similar needs and trends, so I hope we're building momentum toward a phase change, and I'm going to see if I can do my part to help.

Check out the full details of our discussion of autonomy in our open access article.

Thursday, September 17, 2020

Robust estimation of bacterial cell count from optical density

The iGEM 2018 interlab article is published today in Nature Communications Biology! This article, which I wrote about last October when we posted on bioRxiv, presents a cheap and easy protocol for estimating cell count and per-cell fluorescence on plate readers. This is so cheap (reagents <$1) and easy, as validated by hundreds of iGEM teams around the world, that I believe no paper should ever be accepted again if it has plate reader data in uncalibrated units, any more than we would accept a paper that measured length in cubits.

The core idea is pretty simple: basically, you calibrate against dilutions of silica microspheres with similar size and optical properties to cells. As long as your cells are in liquid culture that's not too opaque and the cells aren't doing anything really odd optically themselves, this should give a good relationship between optical density and cell count.
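
As a rough illustration of the calibration idea, here is a minimal Python sketch. It is not the protocol from the article: it assumes a simple linear relationship between blank-subtracted OD and particle count, and every number and function name in it is hypothetical, chosen only to make the arithmetic easy to follow.

```python
from statistics import mean


def particles_per_od(particle_counts, od_readings, blank_od):
    """Estimate particles per unit of blank-subtracted OD from a dilution series.

    Simplifying assumption: OD is proportional to particle count over the
    dilutions supplied, so each well gives an independent estimate of the ratio.
    """
    ratios = []
    for count, od in zip(particle_counts, od_readings):
        net_od = od - blank_od
        if net_od <= 0:
            continue  # well is indistinguishable from the blank, so skip it
        ratios.append(count / net_od)
    if not ratios:
        raise ValueError("no wells with OD above the blank")
    return mean(ratios)


def estimate_cell_count(sample_od, blank_od, scale):
    """Convert a sample OD reading into an estimated cell count."""
    return max(sample_od - blank_od, 0.0) * scale


def per_cell_fluorescence(total_fluorescence, cell_count):
    """Divide calibrated total fluorescence by the estimated cell count."""
    if cell_count <= 0:
        raise ValueError("cell count must be positive")
    return total_fluorescence / cell_count


# Hypothetical two-fold dilution series of microspheres (illustrative values only).
counts = [3.0e8, 1.5e8, 7.5e7, 3.75e7]   # known particles per well
ods = [0.64, 0.34, 0.19, 0.115]          # raw OD readings for those wells
blank = 0.04                             # OD of a well with no particles

scale = particles_per_od(counts, ods, blank)
cells = estimate_cell_count(0.24, blank, scale)
print(f"particles per OD unit: {scale:.3g}")
print(f"estimated cells in sample: {cells:.3g}")
```

In practice the linear assumption only holds over part of the instrument's range, which is one reason a dilution series is used rather than a single reference well.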

We validated this by combining our microsphere protocol and previously published fluorescence estimation protocol to get a close match between flow cytometer and plate reader estimates of per-cell fluorescence, which wouldn't work if either of the protocols was problematic.
 
Per-cell fluorescence from flow cytometer and plate reader (article Figure 5)

Ironically, the most challenging part of the whole publication was the author list. All the data was supplied by a lot of iGEMers: our consortium author list included approximately 1400 people from about 250 teams all around the world.  I had to write scripts to manage the author list and to format and reformat it as we went back and forth with the journal to figure out how to match their formatting requirements. All said and done, however, I wouldn't have it any other way: I am proud to have been able to work with so many capable collaborators and with so many eager young contributors to synthetic biology.

Wednesday, July 22, 2020

Plate reader & flow cytometry tutorials

Over the last two weeks, I've given two measurement tutorials in the iGEM Summer Webinar Series, one focused on plate readers and the other focused on flow cytometry. Both are posted in a repository on GitHub, along with example data and code to help people get started with effective calibration and interpretation of these instruments.
  • The first tutorial, "Quantifying fluorescence and cell count with plate readers," starts with a general introduction to fluorescence and OD, including a comparison of plate readers and other types of instruments, factors affecting fluorescence, and how to pick colors based on excitation and emission spectra. The second block focuses on calibration of measurements for fluorescence and OD, and on debugging such measurements. Finally, the session ends with a discussion of how to interpret and debug calibrated plate reader data.
  • The second tutorial, "Quantifying fluorescence and cell phenotypes with flow cytometry," starts with an introduction to flow cytometry, including how these instruments operate and the types of data that they produce. The second block focuses on calibration of measurements for fluorescence and cell size, and on debugging such measurements. Finally, the session ends with a discussion of how to interpret and debug calibrated flow cytometry data.
Under the hood of a flow cytometer, showing its optical path.
I hope that you will find these useful and redistribute them to others who may find the same!

Friday, June 26, 2020

Closing in on fast, cheap, point-of-care testing for COVID-19

In mid-March, as the COVID-19 pandemic slammed down on America, it just so happened that our group at BBN had just finished sending DARPA a proposal for fast, cheap point-of-care testing for emerging diseases.  So rather than wait for a response on the proposal, we just organized things ourselves and started working on testing.

The test plan has evolved a bit as we've worked through details, as more information about the virus has emerged and as we've made sure manufacturing will be able to roll these things out at scale. In the end, as has just been announced, it looks like we'll be able to just have people spit out a bit of saliva for the test (no more nasal swabs!) and give accurate answers in less than an hour.

My own role in the project has been on the bioinformatics: the FAST-NA software I've written about here a few times before has been critical for fast and effective design of our detection targets, both ensuring that we will be able to detect all known variants of the virus and that we won't get false positives from other organisms.  And I still love that FAST-NA's core is technology that has been repurposed from hunting for computer viruses to hunting for real ones.

It's not in the field yet, but we're on a good track, and I hope we'll be able to make a real contribution to helping manage the pandemic...

Friday, April 10, 2020

Making biosecurity more agile

Just out in Science, a new article on making biosecurity more agile: "Embrace experimentation in biosecurity governance" is a perspective piece summarizing the position developed at a workshop I attended last summer.  In essence, right now the processes by which our nations and communities deal with biosecurity are slow, political, and isolated. We argue that we all need much better connected and flexible ways to deal with emerging threats in our more connected world, and to manage these processes in a way that makes it easier to study and learn from successes and failures.

We didn't realize this would be so apropos at the time that we were writing this piece, but we're in the thick of this problem right now, and I think this is a good piece to read for anybody who wants to help prevent the next biological disaster.

Monday, March 02, 2020

Guest Post: "Shining Winter"

On a peaceful and relaxing note, I offer this guest post from my daughter Harriet, who has asked me to share this poem on her behalf:

Snow falls from the sky outside the window like if the clouds were dancing to the ground.
Hot cocoa sits by the warm fire place in the quiet, loving room.
In the silent, quiet room lays a sleeping cat.
Next to the cat is a pale brown couch
Inside the fire place lighted, quiet, brown room everything is relaxed.
Next to me lays little, brown, leather book.
Glimmering in the light of the fire lies a shelf of elegant, glittering, glass cups.

When the cat wakes up it climbs on to the couch and purrs satisfyingly.
Innocent silence fills the room again.
Near the fire, I sleep in peace, with a beautiful aroma of candles.
Tenderly, the cat purrs again.
Enjoying the hot cocoa, I pet the cat.
Resting my feet on a pillow, I fall into a deep, relaxing sleep.

Wednesday, February 26, 2020

Looks like we found something significant in the coronavirus...

It looks like the unique sequences we found in the 2019-nCoV coronavirus were indeed significant!

In this article in last week's Science, the authors found key differences between this virus and SARS, focused most strongly on the N-terminal domain (NTD) and receptor binding domain (RBD) regions of the viruses' spike glycoprotein. This is important to understand, because this protein is what the virus uses to actually infect cells, and is also a primary target for antibodies to identify or neutralize the virus.

These regions are also right where we pointed our spotlight in our bioRxiv paper, with the surface glycoprotein region of interest that we identified! In particular, we identified the region from amino acids 9 to 275 as the largest unique sequence, and found it was part of a cluster spanning from amino acids 9 to 883. In the Science paper, the key NTD sequence goes from amino acids 17 to 305, nearly a perfect match to our largest unique sequence, and the RBD sequence goes from amino acids 330 to 521, meaning that together the two cover the majority of our identified cluster!
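
For anyone who wants to check the overlap arithmetic, here is a small Python sketch. It assumes the residue ranges quoted above are inclusive and that both papers number the protein the same way; the helper names are just for illustration.

```python
def span_length(start, end):
    """Number of residues in an inclusive range."""
    return end - start + 1


def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


# Ranges as quoted in this post (amino acid positions, assumed inclusive).
our_largest = (9, 275)   # our largest unique sequence
our_cluster = (9, 883)   # the cluster it belongs to
ntd = (17, 305)          # N-terminal domain from the Science paper
rbd = (330, 521)         # receptor binding domain from the Science paper

# How much of our largest unique sequence falls inside the NTD?
shared_with_ntd = overlap(our_largest, ntd)
fraction_of_ours = shared_with_ntd / span_length(*our_largest)

# How much of our cluster do the NTD and RBD cover together?
# The two domains do not overlap each other, so their overlaps can be summed.
covered = overlap(our_cluster, ntd) + overlap(our_cluster, rbd)
fraction_of_cluster = covered / span_length(*our_cluster)

print(f"{shared_with_ntd} residues shared with NTD ({fraction_of_ours:.0%} of our largest sequence)")
print(f"{covered} residues of the cluster covered ({fraction_of_cluster:.0%})")
```

Under those assumptions, roughly 97% of our largest unique sequence lies inside the NTD, and the NTD and RBD together cover about 55% of the cluster.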

Now, these folks went a lot deeper than we could (not being protein modelers ourselves), and I'm sure they didn't use our research, given they were likely starting their investigation at the same time we started ours. That said, it's a nice confirmation of our methods and their potential significance to have rapidly and independently identified these regions with our FAST-NA method.

My next question for other researchers, however, is this: what about the other two domains we found?

Sunday, February 09, 2020

Congratulations to Cassandra Overney!

Congratulations to my former intern Cassandra Overney, who is a finalist for the National Center for Women & Information Technology (NCWIT) Collegiate Award!

Cassandra is an undergrad at Olin College who first began working for me at BBN in the summer of 2018, contributing to the NSF Expeditions “Living Computing Project” by improving our TASBE Flow Analytics software package for calibrated flow cytometry (which you may remember from a post last year). Flow cytometry is a method for measuring the fluorescence of large numbers of cells, often used as a “logic probe” for genetic engineering projects, and TASBE Flow Analytics allows precise and replicable interpretation of the results of complex experiments, and is being used in a number of laboratories and large-scale projects.

Cassandra's recognition by NCWIT is based on the critical contributions that she made for this project, most notably developing an Excel-based user interface that has proven to be much simpler and more intuitive for most of its biologist users. In developing this software, Cassandra worked closely with the biologists who would become her users, prototyping, testing, and adjusting in multiple rounds in order to provide a workflow that has significantly increased the adoption of TASBE Flow Analytics by bench scientists. Better, though, why not learn about it from the video that Cassandra made for her NCWIT award entry?



Although her internship is long over, Cassandra has continued to work part-time on this project, further improving the user interface she designed and addressing other issues as raised by users. Wearing my selfish principal investigator hat, I'd hire her full time if I could, but wearing my mentor hat, I expect that both she and science will be better served by instead continuing to explore her interests in different areas of potential research and going off to graduate school.  This is the bittersweet joy of a mentor: the better the student you work with, the faster they are likely to leave the nest!

So congratulations again, Cassandra!

Tuesday, February 04, 2020

Organizing genome engineering for the gigabase scale

Just out in Nature Communications, our new paper on "Organizing Genome Engineering for the Gigabase Scale"!

This perspective piece, a companion to the technology perspective last fall, analyzes the trends in the growing size of organisms getting their genomes re-engineered, and concludes that, while impressive, it's growing more slowly than one might think: big, complex organisms like mammals and plants are only likely to become tractable around 2050. Moreover, the complexity of the projects has been growing exponentially as well, as measured by the number of authors per paper.

The largest engineered genomes have grown exponentially, doubling approximately every 3 years (a), but the number of authors credited on projects has been growing exponentially as well (b).
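
To get a feel for what a three-year doubling time implies, here is a back-of-the-envelope Python sketch. The doubling time comes from the figure caption above; the starting genome size and year are hypothetical round numbers chosen for illustration, not values taken from the paper.

```python
import math


def years_to_grow(factor, doubling_years=3.0):
    """Years needed to grow by `factor` at a fixed doubling time."""
    return doubling_years * math.log2(factor)


def projected_size(size_now, years_ahead, doubling_years=3.0):
    """Size after `years_ahead` years of steady exponential growth."""
    return size_now * 2 ** (years_ahead / doubling_years)


# Hypothetical starting point: a 10-megabase engineered genome in 2020.
start_year = 2020
start_size_bp = 10e6

# Growing 1000-fold (megabase scale to gigabase scale) takes about ten doublings.
print(f"years for a 1000x increase: {years_to_grow(1000):.1f}")

# Projection to 2050 under the same assumptions.
size_2050 = projected_size(start_size_bp, 2050 - start_year)
print(f"projected size in 2050: {size_2050:.3g} bp")
```

With these assumed inputs, a thousand-fold increase takes about 30 years, which is why gigabase-scale genomes land around mid-century on this kind of trend line.
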
We look at this problem and see not just a genome technology issue, but a massive organizational challenge as well: these projects are going to be big, and in order to manage them effectively we're going to need a lot of friction-reducing software tooling and automation.  The bulk of the piece is then dedicated to looking at the design/build/test cycle and analyzing the sticking points and how to address them.

Bottom line: it's not going to be simple, but it looks quite tractable, and there are things that can be done right now that will likely have a significant impact on our ability to engineer ever-larger genomes.

Sunday, February 02, 2020

Unique sequences found in Wuhan coronavirus

Like many people, I have some concerns about the emerging virus in Wuhan. I am also fortunate enough to have some tools that might turn out to be helpful. For the past two years, I've been leading a project on improving pathogen screening in DNA orders by applying cybersecurity tools, and was, in fact, in the midst of writing up a paper on our improved ability to detect small virus fragments with high precision.  

So it just so happens that I've got software to hand that's very good at detecting the unique aspects of a viral pathogen, and a pre-existing collection of organized coronavirus data, and it looks like we may have found something interesting---some chunks of the virus that look unlike any of its known relatives. We've written this up in a quick manuscript that's now under review and up on bioRxiv:
Highly Distinguished Amino Acid Sequences of 2019-nCoV (Wuhan Coronavirus)
Using a method for pathogen screening in DNA synthesis orders, we have identified a number of amino acid sequences that distinguish 2019-nCoV (Wuhan Coronavirus) from all other known viruses in Coronaviridae. We find three main regions of unique sequence: two in the 1ab polyprotein QHO60603.1, one in surface glycoprotein QHO60594.1.
Summary statistics of distinguishing amino acid sequences identified for 2019-nCoV (Wuhan coronavirus), organized by the identifiers of protein sequences in which we found unique content. The blue is the fraction of sequence that's judged unique and the red is the total amount: the left-most and right-most sequences look particularly interesting. 
It's also been a fascinatingly fast project: we noticed the sequence and decided to evaluate it on Tuesday morning and got our first results that afternoon. On Wednesday, we refined and confirmed the results. Thursday, we checked with others that it might be interesting, and I wrote up the quick report. Friday was polishing and submission as a research letter to CDC Emerging Infectious Diseases and a bioRxiv preprint, and then it took 48 hours for bioRxiv to post it. At just under a week from project conception to submitted preprint with DOI, this is definitely my fastest experience with scientific publication, and it's been a strange experience.

I don't know just how important this might or might not be---I am definitely not a viral pathology specialist. And maybe the journal will just laugh at us and reject it all as naive.  But I'm still happy that this is out there, no matter what, in case it may indeed be useful. More than anything else, I really hope that this gets in front of people who are, in fact, the right type of expert, so that they can evaluate it and see if they can put this information to effective use in helping diagnose, prevent, and mitigate this new disease.