Saturday, May 04, 2019

Down in the weeds with flow cytometry

As a computer scientist and an engineer, I love flow cytometry, and today I'm excited to tell you about the new paper that we've just published about the subject.

I love flow cytometry because it's the closest that I currently get to being able to stick logic probes into cells (though we're trying to do better). I also get measurements from hundreds of thousands of individual cells, so there's more than enough to get really deeply into their statistics and learn a lot. Plus, it uses frickin' laser beams: the cells' fluorescence actually gets interrogated by sputtering them in a stream past several lasers of different colors, which blast the cells so that we can see the light that gets thrown off in response.

All well and good, but actually using the information is not all joy and laser beams. There's a whole bunch of complexities to deal with in order to turn the raw numbers into reliable measurements of the biology I'm interested in, rather than just the physics of blasting cells with lasers. And not just cells either: the first thing we have to do is sort out the single cells from the bits of debris and from the pairs and clumps of cells. Then we have to trim off the background fluorescence of the cells, sort out spectral overlap between the different proteins and lasers, and then somehow relate all of those numbers to molecules and make them comparable even though the different colors come from different molecules.
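
To make those steps concrete, here is a minimal Python sketch of the sequence described above: gate out non-cell events, subtract background, undo spectral overlap, and convert to calibrated units. It is a toy illustration, not how TASBE Flow Analytics itself is implemented. Every name in it (the `fsc`, `green`, and `red` keys, the function names, the fixed scatter thresholds, the bleed coefficients, and the `units_per_au` factors) is a hypothetical stand-in, and real gating in particular is considerably more sophisticated than a simple range check.

```python
from statistics import mean

CHANNELS = ("green", "red")  # hypothetical fluorescence channel names


def gate_events(events, fsc_min, fsc_max):
    """Keep events whose forward scatter is in [fsc_min, fsc_max].

    Assumption: a plain range check stands in for real single-cell gating.
    """
    return [e for e in events if fsc_min <= e["fsc"] <= fsc_max]


def subtract_background(events, blank_events):
    """Subtract the mean signal of a non-fluorescent control, per channel."""
    background = {c: mean(e[c] for e in blank_events) for c in CHANNELS}
    return [{**e, **{c: e[c] - background[c] for c in CHANNELS}} for e in events]


def compensate(events, green_into_red, red_into_green):
    """Undo linear spectral overlap between two channels.

    Assumed model:
        observed_green = true_green + red_into_green * true_red
        observed_red   = true_red   + green_into_red * true_green
    """
    det = 1.0 - green_into_red * red_into_green
    out = []
    for e in events:
        true_green = (e["green"] - red_into_green * e["red"]) / det
        true_red = (e["red"] - green_into_red * e["green"]) / det
        out.append({**e, "green": true_green, "red": true_red})
    return out


def to_calibrated_units(events, units_per_au):
    """Scale each channel from arbitrary units by a calibration factor."""
    return [{**e, **{c: e[c] * units_per_au[c] for c in CHANNELS}} for e in events]


def process(events, blank_events, fsc_range, green_into_red, red_into_green,
            units_per_au):
    """Run the four steps in order and return the calibrated events."""
    cells = gate_events(events, *fsc_range)
    blanks = gate_events(blank_events, *fsc_range)
    cells = subtract_background(cells, blanks)
    cells = compensate(cells, green_into_red, red_into_green)
    return to_calibrated_units(cells, units_per_au)


# Made-up data: one cell-like event and one low-scatter debris event.
sample = [
    {"fsc": 500.0, "green": 111.0, "red": 70.0},
    {"fsc": 5.0, "green": 3.0, "red": 2.0},
]
blank = [
    {"fsc": 480.0, "green": 9.0, "red": 9.0},
    {"fsc": 520.0, "green": 11.0, "red": 11.0},
]
result = process(
    sample,
    blank,
    fsc_range=(100.0, 1000.0),
    green_into_red=0.1,
    red_into_green=0.02,
    units_per_au={"green": 2.0, "red": 3.0},
)
```

The order matters in this sketch: background comes off before compensation because the assumed overlap model applies only to the fluorescence that the proteins themselves produce, and unit conversion comes last so that the calibration factors act on already-corrected values.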

Figure: A typical workflow for processing raw flow cytometry data into comparable biological units, implemented by our TASBE Flow Analytics tool.

Getting all of this right is surprisingly difficult. It's not that any one thing is all that difficult, but there are a lot of them. All of them have failure modes and hidden gotchas to deal with too, and it's easy to stumble over one thing or another, especially when you're dealing with datasets with hundreds of samples. As a good (lazy) computer scientist, my response to challenges like this is to make the computer do it for me instead, and so my colleagues and I have done so.

In a way, it's surprising that we've needed to do this, given that flow cytometry has been established for decades and is widely used in both medicine and research. Most people, however, still aren't trying to use the data for precision quantitative modeling with big data sets in the way that a few folks (including me) have been doing in synthetic biology. As such, the prior tools that were out there weren't up to the job. It's not that there is anything wrong with these tools; they just aren't designed to be good at the particular types of automation and assistance that I've found are needed for characterizing systems in synthetic biology. Thus, we've ended up needing to build our own tools, and have been incrementally developing and refining them for years on project after project after project that has made use of them.

Two years ago, we began to share with others by releasing our TASBE Flow Analytics package as free and open source software, and as of yesterday it's also been officially published in a scientific paper in ACS Synthetic Biology. I'm pretty happy with where our tools have gotten and the number of projects and collaborators that they are supporting, with different modes and interfaces for different folks:

  • The basic Matlab/Octave/Python interactive interface, which is what I most often use myself as a programmer-analyst.
  • An Excel-sheet interface, which our laboratory-based collaborators have found much more intuitive and user friendly, since it's a lot like the spreadsheets they use to design their experiments in the first place.
  • A scripting interface for use in high-throughput automated systems, which is how it's being used in the DARPA Synergistic Discovery and Design (SD2) program.

Infrastructure projects like this aren't particularly flashy or cool, but they're the sort of thing that greatly changes what can be accomplished. We're back to third grade science once again, and those foundations of the scientific method: units and reproducibility. TASBE Flow Analytics is one more piece of that puzzle, and I hope that it will continue to expand the number of people and projects that are benefiting from high-quality measurements of cells.