Wednesday, July 27, 2022

Multicolor Plate Reader Fluorescence Calibration

Just out in OUP Synthetic Biology, "Multicolor Plate Reader Fluorescence Calibration" extends our prior work on calibrating green fluorescence and cell count to calibrate red and blue fluorescence as well. The results are no surprise (if we can use a green dye, we ought to be able to use other dyes too), but it's valuable to have specific recommendations for dyes to use and to have an interlab study validate that yes, they really do perform as well as the others. 

So everybody out there listening, please start using sulforhodamine-101 to calibrate your red fluorescence and Cascade Blue to calibrate your blue fluorescence! Everybody who uses your data will thank you for providing equivalent molecule/cell estimates rather than irreproducible arbitrary or relative units.
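
For anyone wondering what "equivalent molecule/cell estimates" means in practice, here is a minimal sketch of the arithmetic. It is not the exact procedure from the paper: it assumes a dye dilution series with a known number of molecules per well, a linear plate reader response, and a simple least-squares fit through the origin. All of the numbers, function names, and variable names are made up for illustration.

```python
# Sketch: turn arbitrary plate reader units (a.u.) into equivalent
# molecules per cell, using a dye dilution series as the calibrant.
# ASSUMPTIONS: linear reader response, hypothetical data, and a
# through-origin least-squares fit chosen for simplicity.

def fit_molecules_per_au(molecules, readings, blank):
    """Estimate the conversion factor (molecules per a.u.) from a dilution series."""
    net = [r - blank for r in readings]           # blank-subtracted fluorescence
    num = sum(n * m for n, m in zip(net, molecules))
    den = sum(n * n for n in net)
    return num / den                              # slope of molecules vs. net a.u.

def equivalent_molecules_per_cell(reading, blank, factor, cell_count):
    """Convert one sample reading into equivalent dye molecules per cell."""
    return (reading - blank) * factor / cell_count

# Hypothetical 2-fold dilution series of a calibrant dye
dye_molecules = [6.0e12, 3.0e12, 1.5e12, 7.5e11]    # molecules per well (made up)
dye_readings = [40100.0, 20100.0, 10100.0, 5100.0]  # raw a.u. (made up)
blank = 100.0                                       # buffer-only well (made up)

factor = fit_molecules_per_au(dye_molecules, dye_readings, blank)

# Hypothetical sample: raw reading plus a calibrated cell count for the well
per_cell = equivalent_molecules_per_cell(2100.0, blank, factor, cell_count=1.0e8)
print(f"{factor:.3g} molecules per a.u., {per_cell:.3g} equivalent molecules per cell")
```

The point of the exercise is that the final number carries units somebody else can compare against, which a raw a.u. value never does.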

Red and blue fluorescence calibrants were just as precise as the prior green and cell-count calibrants 

The paper also reports on some of the travails we ran into making the study work: some of the fluorescent proteins we wanted to try out didn't work in our hands, and there were miscellaneous other problems: a promoter sequence got messed up, some things wouldn't synthesize, one of the plasmids seemed problematic, and timing problems meant not all labs could run all constructs.

Problems like that are frustrating, but ultimately I'm happier reporting them than burying them. Remember: if you read a synthetic biology study with lab work and it doesn't talk about failures, it just means they either aren't aware of them or else they've pruned them from the narrative!  Calibration methods like these help us see better when things go wrong and understand what's happened.


Thursday, July 14, 2022

Studying Pathogens Degrades BLAST-based Pathogen Identification

Using the BLAST algorithm to search the NCBI databases is the typical way one goes about identifying a DNA sequence, so it's been the typical way biosecurity systems decide if something is potentially a dangerous pathogen or toxin too. Problem is, that's not what BLAST and those databases were designed for, and we've observed that they aren't working as well for that purpose as they used to, as we report in our new preprint: "Studying Pathogens Degrades BLAST-based Pathogen Identification"

Specifically, we've found an inherent problem that is growing in seriousness due to a non-obvious emergent dynamic. Now that sequencing and bioengineering tools are getting much more accessible, lots of sequences are being studied by modifying them with "tool" sequences like purification tags, fluorescent proteins, stabilizing sequences, etc. Those sequences get (appropriately) classified based on what's being studied, and now you've got chimeric material that includes both the subject of study and the bioengineering tool. Then when you run BLAST on a sequence with that tool, you start finding that tools are classified as what they're used to study.

Example of BLAST classification failure: using a purification tag to study an Ebola protein means that now a fluorescent protein plus a purification tag gets mis-identified as Ebola.
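
To make the dynamic concrete, here is a toy sketch of the failure. It is not BLAST and it does not touch NCBI: it stands in for similarity search with exact shared k-mers, and every sequence in it is a made-up placeholder string rather than real biological data. The names `classify`, `toy_pathogen`, and `benign_protein` are all hypothetical.

```python
# Toy illustration of chimeric records contaminating classification.
# ASSUMPTIONS: exact k-mer sharing stands in for a real similarity search,
# and all "sequences" are arbitrary placeholder strings, not real proteins.

def kmers(seq, k=6):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify(query, database, k=6):
    """Return the labels of every database record sharing a k-mer with the query."""
    query_kmers = kmers(query, k)
    return {label for label, seq in database if query_kmers & kmers(seq, k)}

tag = "HHHHHH"                     # stand-in for a purification tag
toy_pathogen = "QWERTYIPASDFGK"    # placeholder for a pathogen protein
benign_protein = "MNVCLKHGFDSAPR"  # placeholder for an unrelated benign protein

# Someone studies the pathogen protein with a tag attached, and the chimeric
# record is (appropriately) labeled by the subject of study.
database = [("toy pathogen", toy_pathogen + tag)]

print(classify(benign_protein, database))        # no match without the tag
print(classify(benign_protein + tag, database))  # the tag alone drags in the label
```

The second query contains nothing from the placeholder pathogen, yet it comes back carrying the pathogen label, purely because of the shared tool sequence.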

This doesn't seem to be much of a problem for most uses of BLAST against NCBI, but it's poisonous for making biosecurity decisions, since it can cause benign sequences to be classified as dangerous or vice versa. Moreover, the effect gets stronger the more problematic a pathogen is (since more sequences are recorded) and the more useful a tool is (since more chimeric material is produced), meaning that the problem is most likely to occur in the most important cases. For example, over the last two years, quite a lot of stuff has started coming back as COVID-19, since everybody in the world is studying COVID-19 with all of the tools that they can get their hands on.

This is a serious problem, and it's not likely to get better, since NCBI and BLAST aren't doing the wrong thing: they're just getting less suitable to use as a short-cut for doing something that they were never designed to do. 

So how do we fix it? Switch to tools that are actually designed for pathogen identification. We've got one (FAST-NA Scanner), and a whole bunch of other folks worked on the same problem in the FunGCAT program. The solutions are there, we just have to help folks switch to them.

Wednesday, July 13, 2022

pySBOL3: SBOL3 for Python Programmers

Our Python library for the SBOL3 standard now has an official citable publication in ACS Synthetic Biology, called "pySBOL3: SBOL3 for Python Programmers." 

The article is a good short read, but for any Python programmers out there, I recommend just jumping straight in with the tutorial instead. Happy hacking, everyone!



Tuesday, July 05, 2022

Functional Synthetic Biology

Synthetic biology isn't about sequences. Don't agree? Tell me what this is without looking it up: atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttga

Tell you what, I'll give you a hint, make it easy. It's a coding sequence translating to MRKGEELFTGVVPILV. Everybody knows this one, right?

How about this instead?


That's right. That mystery sequence up top is the first 50 bases of BBa_E0040, the widely used iGEM part with a coding sequence for GFPmut3. Now that one, a great many folks working in synthetic biology know, have used in their work, and maybe even have strong opinions about.
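
If you'd rather check the hint than take my word for it, the translation is easy to reproduce. This is a minimal sketch using the standard genetic code and nothing beyond the Python standard library; the function name `translate` is just an illustrative choice, and it simply drops any trailing bases that don't make up a full codon.

```python
# Sketch: translate the 50-base snippet with the standard genetic code.
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons, ordered by first, second, third base over TCAG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate DNA in frame from its first base, ignoring any partial final codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

snippet = "atgcgtaaaggagaagaacttttcactggagttgtcccaattcttgttga"
protein = translate(snippet)
print(len(snippet), protein)
assert protein == "MRKGEELFTGVVPILV"
```

Which rather makes the point: getting from bases to amino acids is trivial, and it still tells you nothing about what the thing does.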

Notice that this is a description of biological function: the important thing is that the coding sequence makes a protein that emits a lot of green light when you hit it with a blue laser. There's a sequence in there somewhere but that's not what gets put on the whiteboard or what gets discussed.

Don't get me wrong, sequences are important. But right now we're living with a mismatch in synthetic biology, where most of our discussions about design are about function, but nearly all of our tooling is heavily focused on sequences (e.g., GenBank format), with any information about function tacked on as an afterthought or else confined to specialized databases that each pose their own sui generis integration problem.

We need a new focus on functional synthetic biology, and that's one of the things we've been working on in the iGEM Engineering Committee. We're trying to change how we do synthetic biology, so that we can pull together the work that lots of people have been doing on calibration, insulation, characterization, context effects, modeling, assembly, etc., in one place and make at least a small class of synthetic biology engineering really simple and predictable.

We aren't there yet, but we've gotten to the point where we think we've figured out some of the important shifts in thinking, representation, and tooling that need to happen in order to make functional synthetic biology possible. If you're interested in this too, I encourage you to read more in our newly available pre-print on Functional Synthetic Biology.