Tuesday, November 14, 2023

How do you describe genetic construction plans?

ACS Synthetic Biology has just published "Standardized Representation of Parts and Assembly for Build Planning", our new article on how to better communicate about building genetic constructs. The paper is basically a more friendly user manual for the best practice that we wrote up last year.

Fundamentally, this is all just about trying to reduce the confusion that commonly occurs when we're talking about build plans. If somebody shares a sequence, is it for the bit they want synthesized, what a vector will look like after the synthesized bit gets stuck in, what gets digested out of the vector, or what it looks like as part of the final construct after it gets ligated together with other constructs? 

When we were collaborating on building the new iGEM distribution, we ran into a lot of confusion amongst the many different participants along these lines, so we worked out a standard vocabulary for describing what we were talking about, with intuitive names for different stages in typical digestion/ligation assembly processes.


And once we humans were clear on what we wanted to say to one another, it was easy enough to take the next step and use SBOL3 to describe it to the machines as well, including the exact reactions one would want to run to actually execute the plan. This is one of the nice things about SBOL, which you can't do with formats like GenBank, FASTA, or GFF: describe not just a construct, but its relationship with other constructs and your whole plan for how to use it.

We're still using this vocabulary quite extensively in the iGEM Engineering Committee, as well as using the representations in our software, and we hope that others will find it useful for clarifying their discussions as well.

Monday, April 03, 2023

BLAST vs. custom tools for pathogen identification

Our analysis of issues with using BLAST against NCBI for pathogen identification is out today: “Studying pathogens degrades BLAST-based pathogen identification.” This paper is the full published version of the preprint I posted about a few months ago, investigating an emergent dynamic in which biological research and development ends up contaminating public databases with chimeric material that can confound biosecurity systems that trust those databases.

The most important addition between the preprint and this final version was to make direct head-to-head comparisons between BLAST against NCBI and two tools specifically designed for biosecurity analysis, our own FAST-NA Scanner and a free tool called SeqScreen (there are other tools we'd like to have compared with as well, but they were not available for comparison).

As predicted, the actual biosecurity tools completely dominated over BLAST against NCBI, making more than an order of magnitude fewer mistakes. That's not a surprise, but it's nice to see experimentally validated. In fact, each biosecurity tool only made one mistake in judgement, and in both cases it was the same mistake that BLAST against NCBI made, which is an important lesson: the big NCBI databases aren't bad, they're just dirty, and so they need a lot of care and refinement when they're being put to a use (like biosecurity determinations) where mistakes can be costly and dangerous.

This is important for biosecurity, but I also think people need to be aware of this in the larger scientific world as well. In biology, curation quality really matters, and many people are far too blasé about the potential impact of dirty data on their applications. If you want to do biosecurity right, you need to use an actual biosecurity tool and not just trust the databases. I'm sure the same applies for many aspects of medicine, diagnostics, etc., and I fear that not enough people are taking these issues seriously.

Wednesday, July 27, 2022

Multicolor Plate Reader Fluorescence Calibration

Just out in OUP Synthetic Biology, "Multicolor Plate Reader Fluorescence Calibration" extends our prior work on calibrating green fluorescence and cell count to calibrate red and blue fluorescence as well. The results are no surprise (if we can use a green dye, we ought to be able to use other dyes too), but it's valuable to have specific recommendations for dyes to use and to have an interlab study validate that yes, they really do perform as well as the others. 

So everybody out there listening, please start using sulforhodamine-101 to calibrate your red fluorescence and Cascade Blue to calibrate your blue fluorescence! Everybody who uses your data will thank you for providing equivalent molecule/cell estimates rather than irreproducible arbitrary or relative units.

Red and blue fluorescence calibrants were just as precise as the prior green and cell-count calibrants 
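
To make the "equivalent molecules per cell" idea concrete, here is a minimal sketch of the arithmetic, assuming a dye dilution series with known molecule counts and blank-subtracted readings. The function names and all of the numbers are made up for illustration, and the fit (least squares through the origin) is just one simple choice, not necessarily the procedure used in the paper.

```python
def molecules_per_au(calibrant_molecules, calibrant_fluorescence):
    """Least-squares slope through the origin: calibrant molecules per arbitrary unit."""
    numerator = sum(m * f for m, f in zip(calibrant_molecules, calibrant_fluorescence))
    denominator = sum(f * f for f in calibrant_fluorescence)
    return numerator / denominator


def molecules_per_cell(sample_fluorescence, cell_count, factor):
    """Convert a blank-subtracted reading into equivalent calibrant molecules per cell."""
    return sample_fluorescence * factor / cell_count


# Hypothetical dilution series: molecules of dye per well and the readings they gave.
dye_molecules = [1e9, 2e9, 4e9]
dye_fluorescence = [100.0, 200.0, 400.0]

factor = molecules_per_au(dye_molecules, dye_fluorescence)

# Hypothetical sample: 250 arbitrary units from a well holding 5 million cells.
estimate = molecules_per_cell(250.0, 5e6, factor)
print(f"{factor:.3g} molecules per a.u., {estimate:.3g} equivalent molecules per cell")
```

The point of the conversion is that the final number no longer depends on the gain settings of a particular instrument, only on the calibrant.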

The paper also reports on some of the travails we ran into making the study work: some of the fluorescent proteins we wanted to try out didn't work in our hands, and there were miscellaneous other problems: a promoter sequence got messed up,  some things wouldn't synthesize, one of the plasmids seemed problematic, and timing problems meant not all labs could run all constructs.

Problems like that are frustrating, but ultimately I'm happier reporting them than burying them. Remember: if you read a synthetic biology study with lab work and it doesn't talk about failures, it just means they either aren't aware of them or else they've pruned them from the narrative!  Calibration methods like these help us see better when things go wrong and understand what's happened.


Thursday, July 14, 2022

Studying Pathogens Degrades BLAST-based Pathogen Identification

Using the BLAST algorithm to search the NCBI databases is the typical way one goes about identifying a DNA sequence, so it's been the typical way biosecurity systems decide if something is potentially a dangerous pathogen or toxin too. Problem is, that's not what BLAST and those databases were designed for, and we've observed that they aren't working as well for that purpose as they used to, as we report in our new preprint: "Studying Pathogens Degrades BLAST-based Pathogen Identification"

Specifically, we've found an inherent problem that is growing in seriousness due to a non-obvious emergent dynamic. Now that sequencing and bioengineering tools are getting much more accessible, lots of sequences are being studied by modifying them with "tool" sequences like purification tags, fluorescent proteins, stabilizing sequences, etc. Those sequences get (appropriately) classified based on what's being studied, and now you've got chimeric material that includes both the subject of study and the bioengineering tool. Then when you run BLAST on a sequence with that tool, you start finding that tools are classified as what they're used to study.

Example of BLAST classification failure: using a purification tag to study an Ebola protein means that now a fluorescent protein plus a purification tag gets mis-identified as Ebola.

This doesn't seem to be much of a problem for most uses of BLAST against NCBI, but it's poisonous for making biosecurity decisions, since it can cause benign sequences to be classified as dangerous or vice versa. Moreover, the effect gets stronger the more problematic a pathogen is (since more sequences are recorded) and the more useful a tool is (since more chimeric material is produced), meaning that the problem is most likely to occur in the most important cases. For example, over the last two years, quite a lot of stuff has started coming back as COVID-19, since everybody in the world is studying COVID-19 with all of the tools that they can get their hands on.
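
The dynamic is easy to reproduce in miniature. The sketch below is a toy, not BLAST: it labels a query with the label of whichever database record shares the most 8-mers with it, as a crude stand-in for "take the top hit." Every sequence, label, and function name here is made up for illustration.

```python
def kmers(seq, k=8):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query, database, k=8):
    """Label of the record sharing the most k-mers with query, or None if nothing overlaps."""
    query_kmers = kmers(query, k)
    best_label, best_score = None, 0
    for label, seq in database:
        score = len(query_kmers & kmers(seq, k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy sequences (hypothetical, chosen only so the example is easy to follow).
TAG = "CATCACCATCACCATCAC"                   # stand-in for a purification tag
REPORTER = "ATGCGTAAAGGAGAAGAACTTTTCACTGGA"  # stand-in for a benign reporter
PATHOGEN_GENE = "ATGGATTCTCGTCCTCAGAAAGTCTGGATG"
HOUSEKEEPING = "ATGACCGAACGTCTGTTTGCAGGCATTAAA"

# Before anyone studies the pathogen with the tag, the database is "clean."
clean_db = [("pathogen X", PATHOGEN_GENE), ("benign", HOUSEKEEPING)]

# Someone deposits tag + pathogen gene, (appropriately) labeled by the subject of study.
contaminated_db = clean_db + [("pathogen X", TAG + PATHOGEN_GENE)]

benign_construct = TAG + REPORTER
before = classify(benign_construct, clean_db)
after = classify(benign_construct, contaminated_db)
print("before:", before)
print("after:", after)
```

In this toy, the benign construct has no hit at all against the clean database, but once the chimeric record is added, the tag alone is enough to pull the "pathogen X" label onto it.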

This is a serious problem, and it's not likely to get better, since NCBI and BLAST aren't doing the wrong thing: they're just getting less suitable to use as a short-cut for doing something that they were never designed to do. 

So how do we fix it? Switch to tools that are actually designed for pathogen identification. We've got one (FAST-NA Scanner), and a whole bunch of other folks worked on the same problem in the FunGCAT program. The solutions are there; we just have to help folks switch to them.

Wednesday, July 13, 2022

pySBOL3: SBOL3 for Python Programmers

Our Python library for the SBOL3 standard now has an official citable publication in ACS Synthetic Biology, called "pySBOL3: SBOL3 for Python Programmers." 

The article is a good short read, but for any Python programmers out there, I recommend just jumping straight in with the tutorial instead. Happy hacking, everyone!



Tuesday, July 05, 2022

Functional Synthetic Biology

Synthetic biology isn't about sequences. Don't agree? Tell me what this is without looking it up: atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttga

Tell you what, I'll give you a hint, make it easy. It's a coding sequence translating to MRKGEELFTGVVPILV. Everybody knows this one, right?

How about this instead?


That's right. That mystery sequence up top is the first 50 bases of BBa_E0040, the widely used iGEM part with a coding sequence for GFPmut3. Now that one, a great many folks working in synthetic biology know, have used in their work, and maybe even have strong opinions about.

Notice that this is a description of biological function: the important thing is that the coding sequence makes a protein that emits a lot of green light when you hit it with a blue laser. There's a sequence in there somewhere but that's not what gets put on the whiteboard or what gets discussed.
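
For anyone who wants to check the translation hint from earlier in this post by machine, here is a minimal sketch using the standard genetic code. The `translate` helper is written just for this illustration (it is not from any particular library), and it simply ignores any trailing bases that don't fill a whole codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, dropping any incomplete final codon."""
    dna = dna.upper()
    usable_length = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable_length, 3))


mystery = "atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttga"
protein = translate(mystery)
print(protein)  # prints MRKGEELFTGVVPILV
```

Which rather makes the point: the machine is perfectly happy with the string, and it still tells a human nothing about green light.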

Don't get me wrong, sequences are important. But right now we're living with a mis-match in synthetic biology, where most of our discussions about design are about function, but nearly all of our tooling is heavily focused on sequences (e.g., GenBank format), with any information about function tacked on as an afterthought or else confined to specialized databases that each pose their own sui generis integration problem. 

We need a new focus on functional synthetic biology, and that's one of the things we've been working on in the iGEM Engineering Committee. We're trying to change how we do synthetic biology, so that we can pull together the work that lots of people have been doing on calibration, insulation, characterization, context effects, modeling, assembly, etc., in one place and make at least a small class of synthetic biology engineering really simple and predictable.

We aren't there yet, but we've gotten to the point where we think we've figured out some of the important shifts in thinking, representation, and tooling that need to happen in order to make functional synthetic biology possible. If you're interested in this too, I encourage you to read more in our newly available pre-print on Functional Synthetic Biology.

Thursday, May 05, 2022

AI for Synthetic Biology

Several of my colleagues have been organizing a series of "AI for SynBio" workshops over the last few years. I've been to some and they have been both stimulating and enjoyable. Now they have an article out in Communications of the ACM, along with a nice short video in which Aaron Adler introduces this increasingly important cross-disciplinary interaction for folks who aren't familiar with one or both of the subjects.

Friday, April 22, 2022

Talking measurement and standards with "The Living Revolution"

Yesterday I had an enjoyable conversation with Luke Roche and Sara Knurowska, who do a podcast called "The Living Revolution." They'd read some of my work on measurement, which led inevitably to a wide-ranging discussion including fundamental principles in engineering and science, when to standardize (or not), SBOL, etc.

Check out the podcast here (if it works in your browser), or on Spotify or Apple Podcasts.


Friday, January 07, 2022

Two years of soap

Back in pre-pandemic times, I used to travel quite a lot, and like many other frequent travelers, I slowly accumulated a pile of little bars of complimentary soap from hotel rooms.  As a result, I hadn't actually purchased soap for myself for years. Today, however, I opened my last little leftover travel soap. A curious milestone and statistic: it appears that I'd had just under two years of soap in my little pile.

One of my daughter's stuffed animals traveling with me on my last pre-pandemic trip.