Monday, December 26, 2016

Cooler than Magic

Our four-year-old daughter doesn't believe in Santa Claus. We didn't push the Santa story, nor did we discourage it: we just let Santa be one of the many enjoyable cultural figures whose stories she knows, along with such other luminaries as Elsa and Anna, Durga, Peppa Pig, Jesus, Cinderella, Shiva, and Team Umizoomi. We've got a "no lies" policy in our parenting, though, so when some time ago she asked straight up if Santa Claus was real, the answer was simple: Santa is not real, but is a game that grownups and kids play, and once she knows the truth, she's playing on the grownup side and it would be very rude to spoil the game for other kids.

I'm certain that some people would say that we have done a terrible thing, stealing one of the pieces of magic from her life and shortening her childhood's period of innocence and wonder. I believe, however, that such concerns stem from a misunderstanding of what is so special about innocence and wonder. The innocence and wonder of children is tied up in many ways with their ignorance: you cannot experience wonder in exploring the geometry of a cardboard tube, for example, unless you are largely ignorant of geometry and cardboard tubes. I find that many people seem to confuse the ignorance, which is anything but valuable, with the innocence and wonder that happen to be associated with it in time.

As a parent, observing my child and other children, my current best understanding is that the importance of innocence is thinking and acting without feeling hemmed in and constrained by fears, and that the importance of wonder is the experience of how remarkable and marvelous our lives can be. It is much easier to achieve these states when we are ignorant, but we need not give them up merely because we become experienced. Rather, I think they are a critically important part of what makes life worth living.

Nor does knowing a truth prevent you from enjoying fiction. In my own experience, holding a very clear understanding of the distinction between fact and fiction can let you enjoy the fiction all the more because there is less need to be worried about whether it is true or how it may be affecting things. I even find it to be a good way of disempowering things that I am concerned about as a parent, like the Disney princess culture. If we try to hide from it or focus on criticism, that just gives it power as something that grownups are worried about. But if we embrace it, and acknowledge how fun the story is, but also help remind ourselves that they're just stories, and that stories are different from the real world, then it can draw the power out and turn literary criticism into a game that we can play together. Elsa is still an awesome queen with seriously cool ice magic, even if she'd be really sick if she were a real person with such a tiny waist.

The biggest difference, I find, in talking about fiction versus reality is that the "Why?" questions for fiction quickly ground out in questions of narrative structure. "In 'Moana', why did Te Fiti have her heart on the outside where Maui could steal it?" "They didn't give a reason. We can try to make up some reasons, but mostly it needed to be somewhere that he could steal it so that the story could happen." With reality, however, there is always something more, something outside, and ultimately things always ground out in an answer, an opportunity to look things up and learn, or, eventually, "That's a really good question, and we don't know the answer yet!" And then maybe we can talk about how we might try to find out.

The reality of the world is so intricate, and its systematicity so remarkable, that I find wonder is never very far from the surface. Driving through the fields a couple of days ago, I noticed a microwave relay tower and pointed it out, and we talked about communication. I love the great big scoops, poised on top of their rectangular spike of concrete; I love that microwaves interact with metal in such a way that their circuitry often ends up looking more like plumbing than wiring. I love that the waves themselves are inches wide, while an FM radio wave is about three times my daughter's height, and the light that I see is a bit less than a thousandth of a millimeter. I love that our civilization has erected concrete towers standing high across the plains like a high-tech semaphore network, relaying messages in a network little different in principle from moving goods in trucks or messengers riding horses in centuries past. There are thousands and thousands of such edifices out there, and many people have been involved in building them, bringing together on each of those thousands of sites a whole assemblage of heavy machinery, cranes, truckloads of electronic equipment, and so on. The more I think about it, the more wondrous that grey square lump of concrete with the funny metal scoops becomes.

Some of my wonder and joy, I think, comes from the training that I have received as a scientist. Most, however, does not need much specialization or knowledge to appreciate, just an open mind and a willingness to experience one's ignorance and look for answers. I would not say to embrace or celebrate ignorance, but to simply recognize it as a state of knowledge that one can choose what one wants to do with. A lot of people write themselves off from knowledge, saying that they just don't have the right head or the right training. I think that often this may come from a discomfort at being ignorant, and I am saddened by it. I think that we will all be better off, both as individuals and as citizens of our civilization, the more that we are able to embrace our innocence as seekers of knowledge and connoisseurs of the wonder of the world.

Later on Christmas day, as we looked through the window at the rain coming down outside, my daughter mentioned that her teacher had said that it would rain on Christmas. I asked her how she thought her teacher had known, and she told me her teacher must have read a weather forecast. And then we all talked as a family about how weather forecasts use satellites to look down from way above the Earth so that they can see what's going on over a very big distance, and see what weather is coming towards us over the next few days.

My daughter's eyes grew wide with wonder, and she said:  "That's even cooler than magic!"

I agree with her most strongly.

Monday, November 28, 2016

Reflections on iGEM 2016

This year was my third year to go to iGEM, the genetic engineering jamboree, and while my big news was the interlab study, in fact we had done most of the work on that in advance and as a result that occupied a relatively small portion of my weekend. Much more of my time was spent fulfilling my responsibilities as a judge, as well as parenting: this year I brought my daughter Harriet along, which was both awesome and challenging.

Harriet at the iGEM jamboree opening session

I also couldn't stay until the end, which I did on both of the previous occasions: the last day of the jamboree was Halloween, and at this age that's very important for me to be at home for. So I left Boston before dawn, and was lucky that my flight got in early enough for me to catch the awards ceremony on a cell-phone broadcast from one of the teams.

Reflecting back, this year was very different for me from my previous two experiences. The first time I went, in 2014, was the first interlab study, my first experience as an iGEM judge, and my first time speaking in front of an audience of thousands. Last year, in 2015, I spent much of my time in an electrifying deep partnership of data analysis with my colleague Markus, as we sorted out bit by bit what the interlab study was telling us, along with the drama of having one of the teams that I had judged becoming a winner of the contest as a whole.

This year was... unexceptional, and I should not complain.

iGEM is a beautiful and unique event, and I'm glad to support and be a part of it.  But I'm also a little bit sad to have experienced the expected regression to the mean.

Tuesday, November 22, 2016

Maturity, availability, pacing: the standards release process

The SBOL 2.1 update that I discussed recently had been in the works for nearly a year, practically since we released version 2.0.  Today I want to talk about why we released SBOL 2.0 at that time, when we already knew we'd be updating a bunch of things in SBOL 2.1.

One of the things that I've observed about standards is that, like any big project, there are always a lot of different changes in progress at different levels of maturity. This isn't a problem, but rather a sign of a healthy and growing area of work, where there are a lot of participants with different and evolving needs. This puts a three-way tension into development, between maturity, availability, and pacing:

  • Maturity: the more polished and well-featured a standard, the better that it will serve its users.
  • Availability: something that isn't approved and released doesn't actually serve anybody.
  • Pacing: unlike a phone app or a web service, which can resolve maturity vs. availability with constant updates, a standard's users are tool developers, and every time we make a new release, we make new work for them.

Given this three-way dilemma, the best thing for us to do seems to be to set some sort of relatively regular schedule on which we plan our releases. We've got something of a natural pacing for this as a community, since there are SBOL workshops twice a year. There are always discussions going on in the community, on its mailing lists, between individual developers, amongst the leadership, etc. At the workshops, the issues raised and the details of proposed solutions get hammered out in more detail, and if there's clear progression toward consensus, these new steps forward can be formally adopted and written into the standard.

About once every six months, then, we're in a good position to assess whether there has been significant enough progress to plan for a new release. Then, when we've decided we need to make a release, every one of those changes-in-progress ends up going through a triage process, where we try to figure out if it's mature enough to actually get put into this particular release or if it needs to get deferred to the next one. Again, maturity versus availability: we end up with a push and pull back and forth between the desire to get more of these corrections and improvements in versus the need to actually make a release so that people can start officially using the things that are already finished.

So when we released SBOL 2.0, back in the summer of 2015, we weren't saying "the standard is done," but "the standard is good enough that we think it will be valuable for a lot of people." And part of making it valuable is actually deciding what to exclude, because including something that's not well developed is inviting problems, incompatibilities, and frustration in developers and users.

Instead, all of the stuff that we know we need to finish, or argue about, or even just plain contemplate, has ended up as issues in our online tracker. Over the next year, members of the community worked on some, argued about some, and ignored others, and over the course of a couple of workshops, we got to the point where SBOL 2.1 made sense---and found a nice external deadline to serve as a forcing function, which was also useful.

Now SBOL 2.1 is out, and after a short refractory period, the process has continued. People need to talk about permutations and combinations of DNA, circular pieces of DNA raise new and different ambiguities to resolve, nobody's really quite comfortable with certain things about proteins, and so on and on and on and on. Step by grinding little step, we'll work our way to SBOL 2.2, helping to work out these problems of description and communication that people from all around the synthetic biology community are dealing with. Maturity, availability, pacing, and an open invitation for more to join us in the work.

Friday, November 18, 2016

The Living Computing Project

Yesterday, I spent the whole day immersed in the engineering challenges of computing with living organisms. The occasion was an internal team workshop on the Living Computing Project, a really cool NSF-funded project involving a whole bunch of synthetic biologists at MIT and BU.

What makes this particular project so unique is that it's not tied to particular applications, but instead lets us really focus on the foundational questions of how to store, process, and communicate information inside of living cells. In this project, we get to build the engineering models and tools that will enable all sorts of different applications, and I get to play a sort of "metrics czar" role, integrating lots of different elements across the project as a whole, as well as connecting our work on the project with work being performed in other organizations.

We've been working on this project for nearly a year now, and in our meeting yesterday, I saw a number of places where things are really starting to gel intellectually. I'm excited to have people interested in getting their units right, and about getting precise and predictive models that can let us know right away whether something that we want to build is likely to be feasible and to go right to the designs we want. I'm also very excited about the potential in some of the biological devices that we are working on and with.

I like to have my research be applicable and have a clear use story. But sometimes, it's also nice to just be an engineering scientist and work on the tools we get to use in building those applications too. I'm happy that we're working with folks at NSF who share that vision and understanding, and have been willing to give us some rope to go and work on the foundations of the field.

Tuesday, November 15, 2016

iGEM 2016 Interlab Presentation Posted

For those who may be interested in more details about our highly successful fluorescence measurement experiment at this year's iGEM: I have now posted the slides that we presented at iGEM in the interlab workshop. These slides contain:
  • details on the setup of the experiment
  • a discussion of the problems that we encountered and lessons learned
  • a sampler of the data and key results
The slides can be accessed at the top of the 2016 iGEM interlab page or via this direct link.

Complete details and raw data will be presented in the paper, once that is available. As of now, it is still only just beginning to be written, but I will post news as it becomes available.

Friday, November 11, 2016

A few words on the election, as an American scientist

Although I usually stay strictly away from politics in my online (and thus professional) presence, there is an elephant in the room, and I need to address it before this blog goes back to my regular programming.

On the morning after the election, I had email from my European colleagues expressing sympathy and concern. Others wondered whether the declared plans of our incoming administration meant that organizing and attending conferences in the US was going to become problematic. To them I sent the reply that I feel best describes our situation in general at the moment:

Donald Trump has made a lot of promises, many of them frightening to many people, many of them contradictory, and some of them almost certainly impossible. The presumptive Republican leadership in Congress has done the same. What happens when they actually attempt to govern could go in a lot of different directions, some of them reasonable (even if likely to be things that I disagree with), some of them extremely dangerous and damaging to a large number of people, and some of them potentially existentially threatening to our civilization and possibly even our species. Whether you agreed or disagreed with Hillary Clinton's promises and plans, they certainly offered a much more predictable course than I see playing out before us now, and high degrees of uncertainty have serious dangers of their own, given our species' current technological capabilities.

As a citizen, I intend to take actions to try to ensure that my government effectively supports myself, my family, and every other person in our nation and on our planet, to grow and thrive and live long and stable lives in an environment of sympathy and positively chosen peace.  I also intend to make reasonable contingency plans in case some of the bad scenarios that I now see as more likely come to pass.

As a scientist and a professional, my intention is to continue to do my best to contribute in the areas where I have unique or rare skills, abilities, and insights. I will do my best to advise my government and will continue to choose subjects for research and development that I think are likely to have more benefit than not for humanity. Even as people were voting, I was submitting a proposal addressing a subject of potential great concern, and I hope that I will have a chance to execute on that and help to reduce some of the risks that I see out there.

The future is highly uncertain to me right now, and one of the truths that my scientific honesty impels me to acknowledge is that my personal choices and actions are unlikely to make a significant difference to our future, just the same as it was a week ago. Yesterday afternoon, my father and I walked together on a bed of fossils from an ancient sea floor, laid down 375 million years ago, talking about our lives and finding ancient corals, brachiopods, and sponges. Experiences like that are a valuable reminder to me that I am very small and have been here only a very short time. At the same time, as a student of complex and self-organizing systems, I know that individual choices do matter, and that the actions of individual people in networks can indeed have a major impact on the world in which we live.

My dear readers, I urge you to think carefully about your choices and actions, and to act with as much empathy and care for your fellow humans as you can, no matter who you supported, where you live, or what your political inclinations. I urge you this now, as I would have urged this before as well and will continue to do so no matter what may happen. It is only that now, in this time of great change and uncertainty, that I feel impelled to do so publicly as well as personally.

Thursday, November 03, 2016

SBOL version 2.1

Version 2.1 of the Synthetic Biology Open Language (SBOL version 2.1) has now been officially released, improving our means of representing biological designs. In addition to a bunch of little technical changes useful for implementation, the key changes with this release are:

  • You can describe how the nature of a biological component changes when it's used as part of a larger design.
  • It's simpler to mark up features of a design.
  • We filled in a bunch of missing terms for describing interactions between components.
  • You can describe the topology of components (e.g., circular vs. linear DNA).

I have some musings on the standards release process that I'd like to share later, but for now I just want to cheer and encourage people to update if they're using SBOL and consider adopting if they aren't yet.

Sunday, October 30, 2016

Ladies and Gentlemen, we have our ruler

Dear readers, as promised earlier this week, it is my distinct pleasure to share with you today the headline results of this year's iGEM interlab study. These are preliminary results that we are still writing up for publication, but I feel that it is too exciting to keep quiet, and so I am shouting it from the rooftops today.

In synthetic biology, one of the best tools for studying the behavior of cells is fluorescence. Unfortunately we haven't had a good, accessible way of quantifying fluorescence, so you couldn't compare results from one lab to another, or even necessarily from one experiment to another within the same lab.  In short: we've needed a good ruler for measuring fluorescence.

The goal of the iGEM interlab studies has been to understand where the problems in measurement are coming from and then to use that knowledge to produce a good ruler. In the 2014 and 2015 interlab studies, we figured out that the big source of the problems didn't seem to be the biology, but how people were using their instruments and comparing their data. That was actually good news, because we had some ideas for how we might be able to fix that, and this year we tried them out. We gave every team two simple non-living calibration samples to compare their biological samples to, and hoped that this would tighten up the numbers some.

The results we got were beyond my wildest dreams.

Here's what we saw for the precision of measuring fluorescence with plate readers. We compared the standard deviation of the arbitrary unit measurements from 2015 with the calibrated measurements from 2016 before and after using the positive and negative controls for quantitative filtering of problematic tests, and got these standard deviations:

Smaller numbers are better, so you can see that we got a big improvement in accuracy from calibrating, and another big improvement from using the fact that we have real numbers to quantitatively exclude obvious protocol failures (e.g., 1000 uM FITC/OD fluorescence from a negative control).
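The filtering idea can be sketched in a few lines of code. This is a hypothetical illustration, not the actual iGEM analysis pipeline; the data, the team names, and the cutoff value are all invented for the example. The point is that calibrated units make an absolute sanity check possible: a negative control simply should not read anywhere near 1000 uM FITC/OD, so any run where it does can be excluded.

```python
# Hypothetical sketch of control-based filtering (assumed data and cutoff,
# not the real iGEM analysis): with calibrated units, a negative control
# reading can be checked against an absolute threshold, and implausible
# runs dropped before computing cross-lab statistics.

# Each entry: (team_id, negative control in uM FITC/OD, sample measurement)
measurements = [
    ("team_A",    0.02, 3.1),
    ("team_B", 1000.0,  7.4),  # negative control far too bright: protocol failure
    ("team_C",    0.05, 2.9),
]

NEGATIVE_CONTROL_LIMIT = 1.0  # assumed cutoff, in uM FITC/OD

passed = [m for m in measurements if m[1] < NEGATIVE_CONTROL_LIMIT]
failed = [m for m in measurements if m[1] >= NEGATIVE_CONTROL_LIMIT]

print([t for t, _, _ in passed])  # team_A and team_C survive
print([t for t, _, _ in failed])  # team_B is excluded
```

With arbitrary units, no such absolute threshold exists, which is exactly why this kind of failure could previously hide in the data.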

But to really wrap your head around how big an improvement this is, you have to think about the fact that the thing that we are measuring is geometric standard deviation, for which the units are in times multiplied or divided.  In general, we would consider the normal range of values to expect from a measurement to be within two standard deviations up or down from whatever the real number is---i.e., 95% of the time, a measurement should lie within that range. With a geometric standard deviation of 35, two standard deviations up is 35*35 = 1,225.  That's more than a thousand.  Going down is another multiple of more than a thousand, meaning that all told, we would expect measurements to be accurate within a factor of a million or so.  Obviously, that's rubbish.  It's hard to do anything if you expect your measurements to be wobbling around by a factor of a million.
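The arithmetic above is easy to check for yourself. This little snippet just reproduces the numbers from the text: with a geometric standard deviation of 35, two standard deviations in each direction is a multiplicative factor of 35 squared, and the full expected range spans that factor both up and down.

```python
# Reproducing the arithmetic in the text: a geometric standard deviation
# (GSD) multiplies or divides, so the "normal" 95% range is two GSDs up
# and two GSDs down from the true value.
gsd = 35
two_gsds = gsd ** 2        # factor for two standard deviations in one direction
total_spread = two_gsds ** 2  # full width of the range, top over bottom

print(two_gsds)      # 1225: "more than a thousand"
print(total_spread)  # 1500625: around a factor of a million overall
```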

This year's measurements were more than 100,000 times more precise.

OK, you say, that's all well and good, but since the units were arbitrary before, nobody ever claimed that these numbers should be the same in the first place.  Many people who measure fluorescence and don't calibrate their measurements try to deal with this problem by normalizing to a positive control. The idea is then that you can say: "This is 2.7 times the control" and somebody else can hopefully measure the same control and find the same ratio.  Indeed, in last year's interlab we got pretty good results for comparing ratios of strong promoters, but we got terrible results for comparing the weak ones.

Here's the thing, though: remember that only about half of our improvement came from using the same units, while the rest came from being able to identify failures by the strange behavior of their controls.  We would thus expect to see significant improvement in precision this year versus last year, and indeed that is what we saw:

On average, normalized measurements were 70 times more precise.

The flow cytometry results are not quite as dramatic, and not as statistically strong since many fewer teams took flow cytometry data, but the basic result is the same: orders of magnitude improvement in precision of both individual measurements and of normalized measurement.

There's a lot more to do, to go from initial results to routine and effective usage, but I believe the core results are clear:
  1. We've got a workable ruler for fluorescence.  It's not perfect, but it's orders of magnitude better than current practices.
  2. If undergraduates and high school students from all around the world can use these methods, there's no reason they can't be adopted in every biology laboratory that measures fluorescence.

Sunday, October 23, 2016

Never work without a net: why units matter

I've been working on measurement and units in synthetic biology for more than five years now, so it would seem that I should have a pretty clear understanding of the landscape. As you, dear reader, may recall, for a long time I've been arguing in favor of getting independently calibrated units into our work in synthetic biology, and working on ways to make this generally accessible. Over the past 24 hours, however, I've come across something that has blown my mind.

The arguments for having units that you can compare across different experiments, devices, and laboratories have been pretty strong and clear, since yes, of course, we want to be able to compare our work.  Many people, however, believe that it's good enough to have relative units, where you measure in arbitrary units and then normalize your data by a known genetic construct control.  I have not been comfortable with this, because you have no way to know if something goes wrong that affects your control as well.

My arguments there, however, have always felt relatively weak, because: a) how do I know if this actually happens often enough to be a real concern? and b) it sounds like I'm accusing people of doing sloppy lab work, which would be doubly unfair since most scientists I know are quite careful and since I don't do any lab work at all. So while I've had a clear argument against relative units that is persuasive to those who are already basically in agreement, I haven't really had a leg to stand on, scientifically speaking, in my concerns about relative units, and had sort of relegated them to the level of a secondary concern.

But in the data from this year's iGEM interlab, we have hard evidence that relative units are not enough, because having a way to catch the results of those little mistakes really matters. I can't give you any more details for a week, not until after we officially unveil the results next week at this year's iGEM Jamboree, but it's a big deal.  Like, orders of magnitude big deal.

My world is rocked.  It's obvious in retrospect, and I've even made the argument before. There's a difference, however, between making an argument and having data staring you in the face that says that the argument is far more important than you ever had actually realized.

Basically, if you're not using units, then you're working without a net. With units, you have a chance to apply a bit of experience and common sense and realize that something is going wrong with your numbers. You might not know how or why, but usually that's not actually important, because usually it's something small and stupid (dropped a minus sign, got left and right mixed up, grabbed the wrong bottle, etc.), and the best way to fix your mistake is just to do it again, because you probably won't make the same stupid mistake twice in a row. This applies to pretty much everything in life involving numbers and measurements, not just to biological research.  Using properly calibrated units gives you a second chance to notice your mistake, and makes all the difference between embarrassment and disaster.

How big a difference, exactly? Well, I'll just go grab a cake to celebrate and I'll see you in about a week.

Friday, October 21, 2016

Why do we measure biological computations?

If you're making a biological computer, what do you need to know about its parts? That's one of the questions I'm working on as I lead the effort to organize measurement in the Living Computing Project.  One of the things that's coolest for me about this project, funded by the National Science Foundation, is that it's being funded by the computer science folks there, and so we get to really focus on questions about the fundamentals of computing with biology.  And so I've been asking this question: what is it that I actually need to know, if I want to build a computer using the DNA of a living cell?

It is very easy, in every science, not just biology, to get seduced into performing the experiments that are easy to perform and gathering the data that is easy to record from your instruments. Unfortunately, however, the numbers that you obtain this way often turn out to not be the numbers that you really need, the ones that can actually give you insight into your system and let you build upon it.

Fundamentally, measurement is a matter not of numbers but of communication. Measurements are only meaningful when they are consumed by something that makes use of those measurements.  In many scientific experiments, the only thing that you are trying to communicate is your judgement that a particular hypothesis appears to be reasonably sound (look at all those modifying adjectives!), and there's lots of different ways to do that.  When we want to build something, however, we need the parts that we are building it out of to communicate signals and numbers that enable us to understand what will happen when we use them together.  Like the way that labelling something an 8mm nut communicates that it will mesh with the threads of an 8mm screw, even though that is only one of its many dimensions, and the way the shape of a USB plug tells you everything you need to know about whether its electrical and computational characteristics will be compatible with a given socket. This communication doesn't need to be perfect, just good enough to let us decide whether to use them this way or that way, not to mention whether our project is even reasonable to consider.

So I've been looking at how we are building things, the way people talk about them, how they sketch them and what they struggle with, and I've been writing down my ideas, bit by bit, of a general workflow for building biological computations.  Not just analog or digital, not just about chemical messages or memory, but about specifying how we want to manipulate information, in whatever form and with whatever tools.  What I've got right now is very simple, but also I find that it is causing me to ask apparently simple questions to which we do not know the answer, like "How do you compare the complexity of a computation to the capabilities of a library of biological regulatory devices?" and when that sort of thing happens, my experience is that the answers may be scientifically exciting.

Monday, October 17, 2016

Bringing more AI into synthetic biology

Much of my work in synthetic biology has been founded on the importation of knowledge and methods from artificial intelligence. I want to encourage others from that background to get into the act too, since I think it will be beneficial to all involved---as long as there is sufficient listening.

People often think about artificial intelligence as being about stuff like robots and foul-mouthed chat-bots, but it's much wider and deeper than that.  For example, much early work on programming languages was considered an AI problem of "automatic programming."  In fact, one of the common complaints of AI researchers is that as soon as AI has solved a problem, it gets classified as "not really artificial intelligence" simply because the solution is now understood. 

So what are the skills and capabilities of artificial intelligence, that it can bring to other fields? My colleagues Fusun Yaman and Aaron Adler started this discussion in earnest with a talk at AAAI a couple of years ago: "How can AI help Synthetic Biology?", following this up with a paper on "Managing Bioengineering Complexity with AI Techniques" and a workshop last year on "AI for Synthetic Biology" at IJCAI, one of the main conferences in the field. It turns out that, building on core areas like knowledge representation, machine learning, planning and reasoning, robotics, etc., there is, in fact, a great wealth of possibilities for AI applications in synthetic biology, from data integration to protocol automation, from laboratory management to modeling, and many more.

The main challenges are, more than anything else, friction at the interface between fields and getting people to listen well enough to understand which problems are useful to solve (so the AI practitioners aren't too naive about biological realities) and what AI can realistically contribute (so the biologists don't view it as either magic or "just data processing").  My experience has been that it's heavy going to get connected (as is generally the case for interdisciplinary research), but the opportunities are great, and I encourage my fellow practitioners to come and get involved.

Tuesday, October 11, 2016

How to Shoot Good Pictures from a Plane

Today's topic, dear readers, is how to shoot interesting and decent pictures from airplanes. As those of you who read this blog regularly may know, I travel fairly frequently for work (although I am trying to cut down).  I also enjoy dabbling in photography, and one of my frequent subjects is travel, particularly views from up in the air while in an airplane.

One of Harriet's stuffed animals contemplates the view while traveling with me.

I have long held that if I ever stop enjoying the view from the air, then I'll know my soul is truly dead. So far, not dead (though I've had a couple of close scrapes, still).  Part of what keeps me enjoying these views is the fact that you can see so many strange and unexpected things below, if you look carefully.  Complex stories and geometry form in the ordinary landscape, even something as "flat" as the cornfields of Iowa, and there are many beautiful and mysterious things hidden in the interstices of the world.

Seeing something interesting, however, is a long way from being able to effectively capture it in your camera and to convey that same feeling of interest to others. Our eyes and brains are very good at compensating for distortions and patching around obscurations that pop out and destroy the view when captured in a photograph.  Here, then, are my tips for capturing interesting images from an airplane:
  • Sit forward of the airplane wing: Obviously, you don't want to be sitting over the airplane wing, as it will blot out most of your view.  You also don't want to sit behind the wings, however, because the hot exhaust from the engine creates large areas of rippling visual distortion.  On big planes, sitting in front of the wing may cost you a little extra if you aren't a frequent flyer, but on small regional flights you can generally pick any seat.
  • Be mindful of the window: Shooting through a window makes your life much more difficult: you need to deal with reflections of yourself, the camera, and the cabin; the rounded window frame clashes with rectangular images; and the window often distorts the image significantly near its edges.  I find that these can often be remedied by moving myself and the camera with respect to the window.  For example, with reflections, sometimes I can get out of the way, while other times I move to uniformly shadow the area that I am shooting through.  The contortions needed, however, are sometimes quite significant.
  • Takeoff and landing are key: Cameras are allowed during takeoff and landing, as they fall into the same category of "personal electronics" as music players and phones.  These are some of the best times to shoot, since you are closer to the ground and have more interesting angles on the infrastructure that you pass.
  • Haze can be helped with post-processing: When you are high up, there is an inherent haze from the amount of atmosphere between you and subjects on the ground.  This can be helped, to some degree, by post-processing; programs like Adobe Lightroom have specific mechanisms to help with haze.  They're no panacea, but they can certainly bring the image you get from your camera closer to what your eye was feeling.
  • Always, always, always have your camera out: Wonderful images appear without warning and vanish in a heartbeat, especially at takeoff and landing, and you can't exactly ask to stop or go back to find the angle that you want again.
  • Keep track of your location: It helps to know where you are, in order to be able to better interpret what you are seeing. When there is a seat-back entertainment system, this is pretty easy, since they generally have a map built in as one of the functions; without it, have a map (even the crappy one from the airplane magazine can help a lot) and keep track of the time since takeoff in order to be able to at least roughly estimate where you are.
  • Don't forget to look close to home: Despite flying into and out of it many dozens of times, I'm still finding interesting things within just a few minutes of the Eastern Iowa Airport.
  • Be prepared for possible disappointment: Despite all preparations, sometimes it's just hopeless.  Your window may be heavily scratched, smeared, fogged, or iced. There may be fog nearly down to the ground and nothing but utterly bland and boring clouds from above.  When this happens, there's nothing you can do, any more than you can about a bad sky when you're on the ground, so simply cultivate what tranquility you can.
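The location-tracking tip above reduces to simple arithmetic: ground speed times elapsed time gives a rough distance along the route. Here is a toy sketch, assuming made-up, round-number speeds rather than anything measured:

```python
# Rough distance flown along the route, from minutes since takeoff.
# Speeds are assumed round numbers, not measurements; winds and
# routing will change the real answer considerably.
CLIMB_SPEED_KMH = 500   # rough average over the first 20 minutes
CRUISE_SPEED_KMH = 850  # typical jet cruise speed

def distance_along_route_km(minutes_since_takeoff):
    """Estimate km flown, to pin a position on the airline-magazine map."""
    if minutes_since_takeoff <= 20:
        return CLIMB_SPEED_KMH * minutes_since_takeoff / 60
    climb_km = CLIMB_SPEED_KMH * 20 / 60
    cruise_km = CRUISE_SPEED_KMH * (minutes_since_takeoff - 20) / 60
    return climb_km + cruise_km

# Fifty minutes after takeoff: roughly 590 km along the route.
```

It's crude, but combined with the route line on even a magazine map, it's usually enough to identify what river or city you're looking at.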
And so, my dear readers: go out, and share your visions!  Here are a few of my own personal favorites (all also already posted on my photo blog):
Snow shadows, Eastern Iowa
Peace-sign neighborhood, outside of Chicago
Boston cargo docks

Thursday, October 06, 2016

To PC or Not To PC

As I'm sure a lot of scientists do, I get a lot of requests to join program committees and review papers.  It's always a mixed blessing: on the one hand, I like to help out and it exposes me to a lot of interesting ideas. On the other hand, there are never enough hours in the day. So: how does one decide when to triage?

My reviewing commitments (not counting as an organizer) show a clear need for careful triage.

For myself, I tend to come down on the side of saying yes.  Maybe it means I'm stretched a bit thinner, but I frequently find the investment of time to be worth my while, for some combination of the following reasons:
  • If I want some event in my field to exist, I'd better be willing to contribute to it.  Even if it's going to exist anyway, being willing to serve helps make sure that the things I'm interested in have a fair and interested evaluation.
  • Reviewing papers exposes me to a wider swath of the literature.  I always need to read more, but there are always so many other things competing for my time that it often slips. Reviewing a paper, I have to pay real attention, too, which forces me to engage with material that I might otherwise have just skimmed over.
  • Reviewing papers introduces me to new communities.  There are venues and people that I now keep track of regularly, whom I first encountered when they invited me to review or when I was invited to review their work.
  • Reviewing papers challenges me.  If I don't like a talk at a conference, I can just blow it off and tune into my email.  If I don't like a paper, I'd better be prepared to explain myself in a way I'm comfortable defending.  This forces me to understand my own views much more deeply than when I'm living in the echo chamber of my own research group and collaborators.
  • Reviewing papers makes me a better writer. Reviewing exposes me to both a lot of good writing and a lot of bad writing.  From the good papers, I can learn other ways to present that are effective but that are different from my own style.  As for the bad papers: it's always easier to see the flaws in another's work than in one's own.  From the really bad ones, I learn nothing, but I also see a lot of good work presented badly, and a lot of good ideas that have been badly developed, and I can learn from these mistakes.
So if it's a conference I actually go to, I'll definitely say yes; if it's a place I'm generally aware of and interested in, I'll almost certainly say yes; and if it's got people I recognize and an interesting subject, then probably.  Though I must say, sometimes I feel I desperately need a way to express myself as fully and vehemently about papers as my daughter did when she was an infant...

Baby Harriet helps with reviewing, back in early 2013

Thursday, September 29, 2016

Standards are a Slow and Grinding Process

Open standards development is true democracy: a terribly slow, painful and grinding process that is nonetheless better than all of the alternatives.

Last week was the most recent SBOL workshop (actually, just part of the larger COMBINE workshop on biological standards).  I attended remotely, so I was only there for a portion of the event, but I was still able to participate in many of the key discussions on the issues that we are trying to resolve with SBOL. Modern video conferencing is amazing, and we also just adopted new moderation rules (aimed at making discussions more inclusive and constructive) that also have the side effect of making it much easier for remote participants to get a word in edgewise.

Sitting in these discussions, the main emotions that I felt were frustration and despair, driven by my perceptions that nobody seemed to be able to agree on anything, that things I'd thought were settled were all coming apart again, and that we were just not making progress. Often, in meetings such as these, I am put in mind of the famous quote often attributed to Oscar Wilde: "The problem with Socialism is that it takes up too many evenings."  Open standards development has much the same failure mode of seemingly never-ending meetings in which everybody needs to have their say and the pettiest disagreements can seem the most impossible to overcome.

Today, however, at our SBOL editors meeting, as we were going over the state of work in progress, I was pleasantly surprised to discover just how nearsighted those perceptions were.  In fact, coming out of those apparently frustrating discussions, we have:

  • A tidy solution to a nasty technical problem (colons vs. underscores in URIs) that had threatened half the software tools that make use of SBOL.
  • Agreement with the curators of the Systems Biology Ontology that they will add a collection of terms that we need for representing genetic interactions.
  • A pretty good draft for how to represent combinatorial designs in SBOL, which lots of people want because they are using them to do "mix and match" tests of lots of different variants of their genetic designs.
  • An elegant means of representing topological information about genetic designs (e.g., is this a linear fragment of DNA or a circular plasmid?).
  • A solid draft for how to include information on the provenance and derivation of genetic designs, which will be critical for tracking how systems are built and exchanged, particularly once they are deployed outside of the lab.
  • Agreement on a simplification of how we link information about genetic components with information about how they are combined to build a larger system---an improvement so clear that it feels incredibly obvious in retrospect.
  • Rough consensus on most of a major new version of SBOL visual, for diagrams that mix genetic information with information about chemical and genetic interactions.

Example genetic system diagram from current draft of SBOL Visual 2.0

That's actually quite a lot of progress! So, why the disconnect between my perception in the meeting and the reality discovered afterward? I think that much of it is due to the fact that discussions spend very little time on the things that people do agree about.  Instead, we naturally focus on the points of disagreement.  There is also often a process of "chewing over" an idea, even if one ultimately agrees with it, in order to really understand it and its implications.  With good moderation that keeps things civil, inclusive, and impersonal, you may actually hear a lot more potential disagreements (since people aren't feeling silenced), but if the fundamentals are good, you can make a lot of progress even through the billowing conversational smoke of apparent disagreement.

I don't enjoy the process, but I think I may be good at it; I do like what we can achieve, and I believe that those results will be very important for a lot of people, including me, and so I stay involved.  Just sometimes, the grinding sound you hear may be both the progress of the standard and my own teeth.

Monday, September 26, 2016

Combining Self-Organisation and Autonomic Computing with Aggregate-MAPE

One last paper from SASO 2016: I was a minor author on a paper about bringing aggregate programming to the world of self-managing systems, presented in the same workshop as my paper on MTIP. This work is in the area of autonomic computing, a concept introduced by IBM in the early 2000s, which basically increases the flexibility of a computer’s operating system by incorporating explicit self-representation and a planning system.  The typical architecture for this is the “MAPE loop,” an acronym that expands to Monitor, Analyze, Plan, and Execute.  This is a nice idea, but it becomes particularly challenging to implement on networked systems, when one needs to figure out how to make a bunch of MAPE loops on different machines play nicely with one another.
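For readers unfamiliar with the pattern, a single-node MAPE loop can be sketched in a few lines; the aggregate version distributes each of these four stages across the whole network of devices. All names and thresholds below are illustrative placeholders, not drawn from the paper:

```python
# Minimal single-node sketch of a MAPE (Monitor, Analyze, Plan, Execute)
# control loop.  The "managed system" is a toy server pool.

def monitor(system):
    """Gather raw observations about the managed system."""
    return {"load": system["load"]}

def analyze(observations):
    """Reduce observations to a symptom needing attention."""
    return "overloaded" if observations["load"] > 0.8 else "ok"

def plan(symptom):
    """Choose an adaptation action for the diagnosed symptom."""
    return "add_server" if symptom == "overloaded" else "no_op"

def execute(action, system):
    """Apply the planned action to the managed system."""
    if action == "add_server":
        old = system["servers"]
        system["servers"] = old + 1
        system["load"] *= old / (old + 1)  # load spreads over more servers
    return system

def mape_step(system):
    return execute(plan(analyze(monitor(system))), system)

system = mape_step({"load": 0.9, "servers": 2})
# Overloaded, so the loop adds a server and the load drops to ~0.6.
```

The coordination problem the paper addresses arises precisely when many copies of a loop like this run on different machines and must not fight each other over the same adaptations.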

Aggregate MAPE is implemented with four aggregate processes: Monitor, Analyze, Plan, and Execute, each spread across all of the participating devices in the network.

This paper turns that around by spreading each component (M, A, P, E) across the network as an aggregate program, thereby separating the networking and coordination aspects from the autonomic control aspects and significantly simplifying the system.  It’s early work yet, but we’ve shown at least a proof of principle of the value of this synthesis, and the next step is elaboration into bigger and more real systems.

Thursday, September 22, 2016

Surviving Life as a Researcher

Last week at SASO/FAS*, I was offered a chance to speak about my thoughts on the scientific life to students attending the doctoral consortium organized there.  I really appreciated this opportunity, because I feel that there are many cultural myths in science that cause lots of pain and difficulty until one learns to overcome them.  Certainly, that has been my experience.

I have struggled, many times, with my relationship to the scientific life and research.  I would say that approximately once every year or two, I have hit such a deep point of frustration or despair that I have seriously contemplated simply giving up and going into some other “simpler” profession (the grass most surely is greener on the other side).  Every time, that has ended up forcing me to look more deeply into what was going wrong and why I had been frustrated, and as a consequence I have frequently ended up discovering something that I did not understand about the realities of the scientific endeavor and my relations to it, a mistake that was an important cause of the pain or failures that were hurting me.  Understanding these things lets me address them, whether by changing what I’m doing or just by changing what my goals and expectations are.

I hope, by writing some of these things down, that I can help others to be able to learn the same things that I wish that somebody had been able to educate me on.  There’s a lot of details on the slides I gave, and a lot of things that I said that are not on the slides, but here are the key takeaways that I wanted most to share:

  • The research world is much larger and has many more types of organizations than one is generally aware of during grad school.
  • Nobody can be a Renaissance man any more: find a niche that matches your strengths.
  • Make sure to work on important problems, even if that’s hard to justify and changes your direction of research over time.
  • Imposter syndrome is always with us, but some things can help to manage it.
  • In a research career, it’s never a good time to have a life.  Do it anyway.

For those interested in the full presentation, the slides are available in both PDF and PowerPoint.

Monday, September 19, 2016

The Best Paper of SASO 2016

I have the distinct pleasure to announce that our paper at this year’s IEEE SASO conference, entitled “Self-adaptation to Device Distribution Changes,” has been awarded the Best Paper award for the conference.

In this paper, we tackle one of the quandaries of computer networks that live dispersed through the environment of the real world, like all of our phones, laptops, and other personal devices: how the distribution of those devices in space can affect the systems that we run on them.  When building a distributed system, it is disconcertingly easy to accidentally build in assumptions about the density or position of devices in space, so that a system can have its behavior change radically or even fail entirely, just because a couple of devices moved or because people bunch up in some places and spread out in others.  It has been very hard to predict when these situations may arise, or to find them through empirical testing, since these problems may happen only with large numbers or very specific conditions of positioning.  Worse yet, these issues may not even be apparent in any single component of the system, but only emerge nastily when you put the pieces together.

Our work makes a start on these problems by first making a theoretical analysis of the ways in which these problems arise and identifying a new mathematical property called “eventual consistency,” which describes the behavior of systems that do not break down when devices move or their density increases.  In essence, a networked system is eventually consistent if, the more devices there are, the smoother its behavior is and the less each individual device matters.

The intuition of eventual consistency: more devices = smoother behavior

In many ways, this notion is an extension of the concept of self-stabilization, which says that a distributed system always converges to a predictable set of values.  Eventual consistency enhances that by saying that the values also shouldn’t depend very much on how many devices there are or where exactly they are positioned in space.  Accordingly, we have taken our previous work establishing a language of efficient self-stabilizing programs and restricted it to get a language of eventually consistent programs, which we demonstrated with some examples in simulation, showing that they became smoother and more similar the more devices we had in the simulation, just like the theory said should happen.
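The “more devices = smoother behavior” intuition can be seen in a toy statistical sketch (an illustration of the intuition only, not the paper’s formal definition): if each device contributes a noisy local reading to a network-wide estimate, the estimate’s run-to-run variability shrinks as the number of devices grows.

```python
import random

def network_estimate(n_devices, true_value=10.0, noise=1.0, seed=0):
    """Each device takes a noisy local reading; the network's aggregate
    estimate is the mean across all devices."""
    rng = random.Random(seed)
    readings = [true_value + rng.gauss(0, noise) for _ in range(n_devices)]
    return sum(readings) / n_devices

def spread(n_devices, trials=200):
    """Empirical standard deviation of the aggregate estimate across many
    independent runs: a proxy for how 'jumpy' the system's behavior is."""
    estimates = [network_estimate(n_devices, seed=s) for s in range(trials)]
    mean = sum(estimates) / trials
    return (sum((e - mean) ** 2 for e in estimates) / trials) ** 0.5

# More devices -> smoother aggregate behavior (spread shrinks ~ 1/sqrt(n)):
assert spread(100) < spread(10)
assert spread(1000) < spread(100)
```

The hard part, of course, is not averaging sensor readings but guaranteeing this kind of device-independence for arbitrary programs, which is what the restricted language is for.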

This is cool because it takes us another step toward being able to make a lot of the nasty difficulties of building distributed systems go away.  The recognition is nice too, particularly given this paper’s rather long and bumpy prior path towards publication—we first submitted a version of it more than two years ago, and it spent a long, long time in limbo and rejections before reaching its current happy state.  I also think this work may be important, as we cope with our increasingly computer-saturated environment, and look forward to continuing along these lines.

Friday, September 16, 2016

Aggregate Programming in the GIS world

One of the projects I’ve been working on for the last year or so is an interesting application of aggregate programming to airborne sensor planning.  The motivation behind this project is that lots of people want to use data from airborne sensors, whether on drones or manned platforms, for disaster response, traffic management, law enforcement, wildfire management, etc.  Those sensors, however, spend an awful lot of their time idle because there’s a lot of time spent in transit, not near one of the things the sensor's owner is trying to get information about.  If you could share those sensors opportunistically, however, they could be put to other uses in that down-time, and that’s what the system we’ve been working on does.

It turns out that a particularly easy way to implement this sensor sharing is using Protelis, our aggregate programming framework: with the rest of the project in Java, it’s simplicity itself to set up an agent-based representation of the scenario using Protelis, with agents for platforms and for sensor goals.  Then we let the agents talk to one another with an aggregate program to set up the plan, and let the plan evolve incrementally as more information comes in, updating the agents.
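Setting Protelis itself aside, the core of the opportunistic tasking can be sketched as a greedy matching of idle platforms to sensing goals; this is a hypothetical toy stand-in, not the project’s actual planner:

```python
import math

def greedy_assign(platforms, goals):
    """Assign each sensing goal to the nearest still-idle platform.

    platforms, goals: dicts mapping name -> (x, y) position.
    Returns a dict mapping goal name -> platform name.  A toy stand-in
    for the agent-based planner, not the actual system.
    """
    idle = dict(platforms)
    plan = {}
    for goal, goal_pos in goals.items():
        if not idle:
            break  # more goals than platforms; leftover goals wait
        nearest = min(idle, key=lambda p: math.dist(idle[p], goal_pos))
        plan[goal] = nearest
        del idle[nearest]
    return plan

plan = greedy_assign(
    {"uav1": (0, 0), "uav2": (10, 10)},
    {"dam": (1, 1), "bridge": (9, 9)},
)
# -> {'dam': 'uav1', 'bridge': 'uav2'}
```

In the real system the agents negotiate by exchanging state with their neighbors and the plan is revised incrementally as goals and platform positions change, rather than being computed once in a central loop like this.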

One of the things that has been most fun for me in working on this project has been playing with real GIS data. To make a realistic scenario, we wanted a reasonable example of how stuff is actually distributed in the real world, and so we found a whole bunch of publicly available GIS information about infrastructure in the San Francisco area.  It’s amazing to me how much is just sitting out there available online, like lists of all the cell phone towers, dams, and bridges.  There are a lot more dams than you might think, too: maybe only a few really big ones, but lots of little ones.  We’ve been working with this information using an awesome open framework made by NASA called WorldWind, which has all sorts of pre-existing resources and computational support built in, as well as making beautiful visualizations.  Sometimes I just like to sit and explore our scenarios as they play.

WorldWind visualizing a fragment of a plan for a UAV (blue) to survey critical infrastructure south of San Francisco (red)
Anyway, long story short, our implementation works nicely in the real world.  Back in May, we published the first paper describing the framework, using our San Francisco earthquake scenario to show that there really is a lot of potential for sensor sharing with realistic goals.  Our second paper, which I’ve just presented at the eCAS workshop at IEEE SASO, shows that the agent-based planning can adapt quickly and effectively to highly dynamic situations, and can also provide good resilience by having more than one platform cover the same sensing goal. Next step: testing it out in the real world, we hope, where it can move towards really being able to help people.

Wednesday, September 14, 2016

Low-Flying Trains

One of the things that I love about Europe is being in a land with a highly functional rail infrastructure.  Going from the middle of Belgium to the southeastern corner of Germany, the train flashes along through the countryside at 250 kilometers per hour, smooth and comfortable.  If I had flown, it would have taken nearly the same amount of time, once waiting times and awkward connections are counted in.

More importantly, however, the flashing speed gives me much the same feel as air travel.  I may be going only 1/3 the speed of a regional jet, but I’m flying along right down at ground level, and so it feels at least as rapid.  Silken smooth along the rails (at least in Northern Europe), usually near-perfectly on time and so frequent that you hardly even need to check the schedules: you just show up, book your flight, and ride, ride, ride right into the heart of your destination, finally docking below some station’s high-vaulted roof full of intricate artistic trusses.

Bonn Train Station, Germany

Monday, September 12, 2016

ANTS 2016: The 10th International Conference on Swarm Intelligence

In my last post, I talked about my day off in Brussels, in between two meetings.  What brought me to Brussels in the first place was a plenary talk surveying my work on aggregate programming at the ANTS Swarm Intelligence conference. I enjoyed giving this talk greatly, and, so far as I could perceive, it appeared to be quite interesting and useful to my audience as well.  It was also interesting for me, as I was putting things together, to reflect on the remarkable degree to which my work in this area has gelled and reorganized itself in the past few years.  It is only a short time since we discovered field calculus, created the “self-organizing building blocks” approach to developing complex distributed systems, and elaborated the entirety of the aggregate programming stack.  As I look back, these developments seem so obvious, and yet they took so long to see, as perhaps is often the case with any significant scientific step.

This was also the first time I’d had a chance to actually attend ANTS since 2010, although I’ve been continually on the PC and following the work presented there.  It was nice to be there in person again, catching up with some folks I knew and meeting a couple of very cool new folks, and I also learned some things that I think will turn out relevant to problems that I am pursuing right now.  All told, I’d say, an excellent experience.

Sunday, September 11, 2016

Giving Directions in Foreign Lands

One of the things that happens to me sometimes when I am traveling abroad, and both amuses and delights me, is when I am taken for a local by other travelers and asked by them for directions.  It further amuses me (and also makes me supremely satisfied) when I am not only asked for directions while in somebody else’s city but also am capable of answering correctly.

This happened to me yesterday in Brussels, and it’s just one of those little things that can really make my whole day.  I have always had a good head for directions, and one of the things that I enjoy most when I am in a foreign city is to get my bearings and simply walk around to experience the environment and try to get a bit of a feel for what it would be like to live there as a local.  So with a day to spend in Brussels, I eschewed the museums and the Manneken Pis, and instead went for a long walk down through the Bois de la Cambre on the south side of the city, then walked back in near the center and took the metro out to wander through the Parc de Laeken on the other side, near the beautiful madness of the Atomium.

Chalet Robinson in Bois de la Cambre, Brussels, Belgium
Bois de la Cambre, Brussels, Belgium
Atomium, Brussels, Belgium
Leopold I Monument, Brussels, Belgium

Being asked for directions further affirms the feelings of enjoyment that I get from such wanderings, because it means that somebody sees me as not being out of place.  Being able to answer further strokes my ego for reasons that I suppose are rather obvious.

I’d like to think this vanity is not a problem, and indeed as such things go, I believe that it is relatively harmless.  Still, my joy is real, and so is my self-aware concern that perhaps I should be more concerned with how I choose to feed my ego.

Saturday, September 10, 2016

Relative Fluorescent Units Considered Harmful

My working title for the paper that will come from this year’s iGEM interlab study is “Relative Fluorescent Units Considered Harmful.”  It’s a bit of a playful title, invoking a computer science tradition started by the notorious Edsger Dijkstra. I think, however, that this statement is deserved and also that we can now back it up with some hard experimental evidence.

Most of the data is in—I’m just waiting for a few more teams with extensions—and it looks like we’ve got amazing results.  The big news is, in a more positive reformulation of my title, that calibrating fluorescence measurements works, and that it makes a big enough difference to be worth it.  Let me present the key conclusions that I believe we can now support in the form of responses to the most common arguments that I hear against calibrating fluorescent measurements.
Q: Wouldn’t it be difficult and costly to add fluorescence calibration to experiments?
A: The materials needed are quite inexpensive. As for difficulty: it seems to be pretty easy for undergraduates and high-school students all around the world, so professional researchers should be fine.
Q: Aren’t calibrated measurements pointless, because cell behavior varies so much in different people’s hands?
A: Not according to our results: pretty much everybody who got the protocols right, as indicated by reasonable control values, had a tight correspondence in the rest of their values as well.

Q: Aren’t arbitrary or relative units good enough, if we just want to compare fluorescence?
A: Absolutely not! You know what I said about getting reasonable control values? In our study, anybody whose controls were wonky appears to have been a lot more likely to have wonky results elsewhere too, probably indicating some sort of protocol failure.  With relative values, however, a lot of those apparent protocol failures slip through undetected, polluting the data and potentially making all sorts of trouble down the road.
Q: Why can’t we just compare to a known system in a cell?
A: This is the idea behind Relative Promoter Units (RPU) and the like, which are pretty clever.  Just as with purely arbitrary units, however, if something goes wrong in the protocol, it’s likely to affect the controls as well: RPU also appears unlikely to have caught a lot of the problems that absolute units identify in our study, again leading to pollution of data with all sort of strange failure modes.
In short, calibrated fluorescent units make a big and quantifiable difference and they’re easy to use.  Moreover, given what we’ve seen, I suspect that a lot of the “cells are so touchy and behave so differently” laboratory folk-wisdom out there is really not about the cells, but about problems with culturing and measurement protocols that go unnoticed when you’re using relative or arbitrary units.
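As an illustration of how such calibration works in general (a sketch of the idea only, not the study’s exact protocol, and with invented numbers): you measure a dilution series of a known standard, such as fluorescein, on your own instrument, fit a conversion slope, and then divide subsequent readings by that slope to express them in standard-equivalent units.

```python
# Sketch of converting arbitrary fluorescence units into units of a
# known calibration standard.  All numbers are invented for illustration.

def calibration_slope(concentrations, readings):
    """Least-squares slope through the origin: instrument units per
    unit concentration of the calibration standard."""
    num = sum(c * r for c, r in zip(concentrations, readings))
    den = sum(c * c for c in concentrations)
    return num / den

def to_calibrated(reading, slope):
    """Express an arbitrary-unit reading in standard-equivalent units."""
    return reading / slope

conc = [0.5, 1.0, 2.0, 4.0]      # uM of standard (e.g., fluorescein)
raw = [1010, 1990, 4020, 7980]   # arbitrary units, hypothetical reader
slope = calibration_slope(conc, raw)   # ~2000 units per uM
sample = to_calibrated(5000, slope)    # ~2.5 uM standard-equivalent
```

The payoff, as described above, is that readings from different instruments and labs land on a common absolute scale, and wonky calibration values also flag protocol failures that relative units would hide.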

I wish that I could say more now about what we’ve learned, but we’re planning to announce the full results at the iGEM Jamboree at the end of October, and I’m embargoing all of the details until that time.

Wednesday, September 07, 2016

Hybrid Semiconductor-Biological Systems

Another in this year’s series of studies for the Semiconductor Synthetic Biology roadmap was a meeting in Atlanta that focused on hybrid systems, in which a silicon / electronic device is physically integrated with a biological system.

This is an area that’s largely out of my area of expertise, and so I did a lot more listening than talking.  More than anything else, I was struck by how the defining feature of any semiconductor/biological hybrid system is the surface interface between the two different chemistries.  There were a lot of different interface technologies discussed, each providing a different valuable modality of connection, such as imaging and capacitance sensing, direct chemical sensing, physical stimulus increasing cell viability, receiving electrical signals from cells, etc.  Most of these wonderful proof of concept capabilities, however, are currently mutually incompatible, for the simple reason that you can’t make a surface be all of these things at once—and even where you could in theory, we don’t necessarily have the manufacturing technology to do so yet.

Still, the potential for high impact is there, even if we can only integrate and exploit some of the different modalities of interaction between cells and silicon.  Personally, I am most excited about the possibility of large-scale non-invasive single-cell assays and control.  I described this in a brief talk that I was invited to give in a session on motivating applications. Running some back-of-the-envelope numbers, I think one of the applications that would be both high-value and high-impact is improved assay technology, combining the high resolution, temporal tracking, and low invasiveness of microscopy with the high throughput and large numbers of cells that can be obtained with flow cytometry.  At least some of the investigators and investors in the field seem to be convinced of that value as well, and so I think it’s reasonable to hope to see improved assay devices of this sort on the market within the next five years.

Monday, September 05, 2016

Decelerating Travel

Back in June, I posted a cri de coeur on the travel schedule that I have been enduring this year.  That sequence of trips was my breaking point on travel for the year: amidst using 17 airplanes to visit five widely separated locations over the course of two weeks, I was ready to be done with travel and to just stop moving for a while.

Tired Moose, traveling with Daddy and sending pictures back to Harriet. 

And so, like a good scientist, I sat down with my last five years of travel and crunched some data to find out what has been happening differently.  To my surprise, it turned out that I have actually not been going to more scientific events (at least not by a statistically significant amount).  Instead, what has increased my travel markedly is something that should have been obvious to me from the start: I moved to Iowa.

Moving to Iowa impacts my travel in three different ways:
  1. Attending events in Boston now requires travel.
  2. Attending events in Washington DC now requires an overnight (or precise scheduling and luck), rather than being an easy down-and-back day trip on the shuttle from Boston.
  3. Events back home in Maine now require serious travel rather than just being a day trip.
So what has actually made this year so intense travel-wise is that I needed to go to Boston and DC a few extra times, and I had a few extra life events to attend: seven extra trips compared to last year turned a survivable schedule into a crisis of too much travel.

Fortunately, I was also able to calculate what I need to do in order for my life to be sustainable in the way that I want it to be: I need to reduce my travel by 40%.  That’s a big enough number that it’s definitely not simple and will require compromises, but it looks like I can do it by shaving pieces here and there, by delegating certain things, and by doing more things remotely when I might otherwise prefer to be there in person.  It’s a bit scary, to be saying “no” to things that otherwise might turn into good opportunities, but I have to remind myself that by doing so, I am allowing myself to say “yes” to other things I value.

Check back in a year or two, and maybe I’ll know how well it’s working out.

Friday, September 02, 2016

Where we stand in biological design automation

Another of the events that I attended in Newcastle was an SRC Workshop on the interaction between electronic design automation and biological design automation. This takes a little unpacking to explain. For the past year or so, I have been part of a road-mapping project organized by the Semiconductor Research Corporation, which is in essence a research foundation run by the United States semiconductor industry. As we approach the end of what physics allows in improving ordinary computers, this organization is investigating possible new directions for the industry to expand, and one of those directions is towards biology. There are a number of different aspects of this investigation, some of which I have talked about before, and more of which I will talk about in the future, but the one that we were discussing in Newcastle is how the methods used to design electronic systems might help in designing biological systems and vice versa.

Now those of you who have been following me and my work and my writings may know that I am a big advocate of biological design automation. I think that computational tools and models are the only way to really achieve the potential of engineering biology. So it might be surprising then, dear reader, for you to hear that one of the big themes that I heard developing at this workshop was that the lack of biological design automation software is not the bottleneck in the area, and probably will not be for some time.

The issue is this: when I say "the lack of software is not the issue," I do not mean that we do not need the software. We desperately need good biological design automation software. But if I had two years and 500 programmers, I could not produce that software right now. And that, in my opinion, is what the challenges of adapting EDA techniques to the BDA environment are really about. The bottleneck is not a software problem that can be solved by a sufficient application of industry resources and know-how. Instead, as I have argued before, the bottleneck is and currently remains the lack of good devices, characterization data, and models of composition. Until we have a better understanding of what we are looking to automate, we cannot solve the problem by throwing software resources at it.

BDA software, however, does have a critical role to play in solving the problems of biological design and engineering. This is because automation tools expose the requirements of engineering in an especially clear and difficult-to-evade fashion. Unfortunately, much of the work that has been done on the characterization of biological systems and devices is simply not usable in biological design beyond the most simple and qualitative level. Please understand that this is not a criticism of that work, which is in many cases very good indeed, but a necessary recognition that its purposes have typically been more explanatory and exploratory, and that the knowledge produced from such an investigation is simply not sufficient for the requirements of establishing routine engineering control over the phenomena in question. The degree of precision, curation, and completeness necessary for an excellent scientific publication is simply much lower than what is required for a design automation tool, because computer algorithms are unforgivingly stupid, while the people who read scientific papers are very intelligent. That means that if there is any ambiguity or gap in knowledge pertinent to the engineering of the system, an automation tool will almost certainly run afoul of it, forcing us to confront directly issues that we might otherwise overlook until they came back to bite us, costing millions of dollars in wasted effort and disappointing failures.

And so, I believe that high-level design automation and the supporting knowledge necessary to enable it need more time to develop and mature before they become an industrially viable business for more than certain niche applications. Right now, the places where automation tools are likely to be of high value are at a low level, such as we see in protein design, CRISPR nuclease design, codon optimization, etc. Likewise, there is a great deal of potential opportunity in the automation of laboratory equipment, and I expect a high potential for disruptive innovation in this area given the high premiums and extreme vertical integration common among laboratory suppliers at present. Any business that can begin shifting biological engineering instruments from the current "car sales" model towards something more like office equipment services might be able to radically affect the area. Similarly, I foresee great disruptive potential for anyone who can bring microfluidics from specialty investigation to a set of compact and user-friendly tools.

In the meantime, more basic research funding is still needed to enable the development of characterization and devices that will be able to support the more complex biological engineering targets of the future. This is particularly true since I do not see this being a profitable area for large companies to invest in for the near future, given that there are still so many pieces of low-hanging bioengineering fruit that can be collected for significant financial return.

Thus, at present, I see biological design automation as another good example of a technological area where government investment is likely to spur the foundation of multi-billion dollar industries, if only we can start it going.

Wednesday, August 31, 2016

Biological Foundries

While in the UK, I had a gap in my schedule at IWBDA, and so I went to Edinburgh.  It was a lovely train ride up the coast from Newcastle, and for the first time in my life I had an opportunity to visit Scotland, birthplace of my McDonald and Houston ancestors.  Getting off the train at Edinburgh station, I was immediately struck by the remarkable degree to which the city is a center of culture, from the omnipresent Robert Burns quotes in the station to the theaters all about, not to mention the burgeoning festivals just beginning their month of explosion through the streets.  It was a fine warm, sunny day, and I immediately set off walking across the town toward my main business: visiting colleagues.

One of my stops that day was the Edinburgh Genome Foundry, one of only a few such centers in the world. At present, I am aware of only eight: five in the US (Ginkgo Bioworks, Amyris, Zymergen, MIT/Broad, and Urbana-Champaign), two in the UK (Imperial and Edinburgh), and one at the National University of Singapore.  I may well be missing some, and others may be getting founded even as I write (the UK and Singapore foundries are just getting off the ground), but the point is there's a small but growing number of such centers, both in industry and academia.

All of these foundries are aimed at much the same basic goal: to greatly increase the rate at which complex genetic materials can be engineered and tested. The focus differs somewhat from foundry to foundry: some are more focused on assembly, while others are more focused on information processing circuits, and yet others on chemical synthesis.  And of course, as each is its own unique cutting-edge experiment, the particulars of how each is set up are quite different. At their hearts, however, every foundry is the same, being essentially a robot-assisted machine shop for genes, formed of a number of stations of automated lab equipment, fluid-handling robots, and some combination of industrial manipulator arms and lab technicians to move materials from station to station.

Biological foundries differ from other laboratories incorporating automation in that they have much more flexibility, and with that flexibility comes a higher order of challenge in organizing the informational side of the foundries—and thus my interest.  In order to make a foundry work well, you need to have some sort of explicit representation of the genetic constructs that you are aiming to manipulate, the biological processes that you hope to create and affect by means of those constructs, and the various protocols and assays that you intend to perform in order to manufacture and evaluate them.  Much of that is well represented by SBOL, and if the foundries don't choose SBOL they will likely end up having to recapitulate its development, so I am hoping we can ensure that all of them adopt it (it is already being used within at least some). Beyond that, I hold that it will also be important for the foundries to adopt good unit calibration in their assays, and to consume that data in model-driven software design tools, in order to avoid some of the past tragedies of large biological characterization projects that produced largely non-reusable data.
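To make the "explicit representation" requirement concrete, here is a minimal sketch of the kind of structured information a foundry has to track, using plain Python dataclasses. These class and field names are hypothetical, chosen for illustration; in practice this is precisely the role that SBOL (and associated protocol representations) are designed to fill, with far more rigor.

```python
# A toy model of foundry job data: what to build, and how to build and test it.
from dataclasses import dataclass, field


@dataclass
class GeneticConstruct:
    name: str
    parts: list  # ordered part names, e.g. promoter, RBS, CDS, terminator


@dataclass
class ProtocolStep:
    station: str    # which automated station performs the step
    operation: str  # e.g. "assemble", "transform", "assay"
    inputs: list = field(default_factory=list)


@dataclass
class FoundryJob:
    construct: GeneticConstruct
    steps: list  # ordered ProtocolSteps routed from station to station


job = FoundryJob(
    construct=GeneticConstruct("reporter-1", ["pTet", "B0034", "GFP", "B0015"]),
    steps=[
        ProtocolStep("assembly-robot", "assemble",
                     inputs=["pTet", "B0034", "GFP", "B0015"]),
        ProtocolStep("plate-reader", "assay"),
    ],
)
print(job.construct.name, len(job.steps))  # reporter-1 2
```

The point is not the code itself but the discipline it enforces: once constructs, processes, and protocols are machine-readable objects rather than lab-notebook prose, they can be routed between stations, shared between foundries, and checked by software.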

Moreover, in a world with many biological foundries, I suspect that ultimately the ones that will have the largest impact will be those that open their processes and data and ensure that their works are recorded in good interchangeable standards.  Some are commercial concerns, of course, and that will limit their ability to share results out, but if those at least are able to take standardized information in, it will no doubt help them in the marketplace as well.  For biological foundries, like everything else, we no longer live in a world where isolated “moon shot” projects are a particularly competitive way to pursue either science or commerce, and I hope that their operators are well able to come to grips with this reality.

Monday, August 29, 2016

Waiting for data is the scariest part

Two years ago, in partnership with the iGEM foundation and some excellent colleagues, I ran the largest inter-laboratory study ever conducted in synthetic biology, in which forty-five teams participated to create an international baseline on fluorescence measurement, synthetic biology’s favorite debugging and reporting tool. Last year, we ran the world’s largest synthetic biology inter-laboratory study, in which eighty-five teams contributed to help figure out that the calls were coming from inside the house: the most critical source of error in measuring fluorescence is problems using instruments and handling data.

DIY fluorimeter tested by Aix-Marseille 2015 interlab team

This year we’re once again running the world’s largest synthetic biology inter-laboratory study, this time with ninety-one teams, who are helping us test experimental protocols aimed at fixing the problem: we distributed calibration materials and calculation spreadsheets that should let everybody measure their systems in the same, directly comparable units.  This might not sound like a big deal, but imagine how confusing life would be if you measured with a ruler marked in centimeters while other people were using mils, rods, chains, furlongs, and leagues, but you didn’t know your markings were different.  The world of fluorescence is that bad and worse right now: last year, the numbers we received from different teams measuring the same genetic system varied by a factor of more than one trillion.  Given all that, it’s frankly remarkable how much precision the teams were able to achieve in the ratios between units.  If we can get everybody using centimeters (metaphorically), it should let things become much better still, since people will be able to compare experiments directly and figure out much earlier when something’s gone wrong in their experiment and needs to be debugged and redone.

Right now, though, the project is in the middle of its most scary and exciting phase.  The teams have got the protocol and the materials, and we’ve debugged as much as we could of problems in its design (next year: corrected spreadsheets, better tube stoppers, and a giant red warning sticker that says “Don’t freeze the LUDOX!”).  At this point, there is nothing much that I can do any more to positively affect the results of our experiment: just take the data from the teams as it arrives, process it, and stare in nervous excitement and concern at the evolving numbers.

Running the iGEM interlab is awesome and scary, a big responsibility: these folks are investing their time, resources, and trust in us as organizers of the study, and we have a responsibility not to waste their time and to ensure that what we do is both good science and good education. So far it has gone well, and the preliminary results are looking good, but there’s a lot of data still out there being gathered and anything can happen.

I’m excited and scared, and I love these young people for making such a grand effort possible, for seizing the opportunity and understanding what an important thing this is and how much of a difference their work can make.  Some have run into obstacles and had to withdraw, or have turned in broken or patchy data, and I tell them how much their contribution matters too, because it tells us how things go wrong and what needs to be improved in order to get everybody good rulers for their work. Young men and women, in every corner of the world, all doing their part to contribute one more brick, small but significant, to the foundations of science and society.

I am honored to have the privilege to lead an effort like this, and I'll be on eggshells until we know how well it has succeeded.