Thursday, September 29, 2016

Standards are a Slow and Grinding Process

Open standards development is true democracy: a terribly slow, painful and grinding process that is nonetheless better than all of the alternatives.

Last week was the most recent SBOL workshop (actually, just part of the larger COMBINE workshop on biological standards).  I attended remotely, so I was only there for a portion of the event, but I was still able to participate in many of the key discussions on the issues that we are trying to resolve with SBOL. Modern video conferencing is amazing, and we also just adopted new moderation rules (aimed at making discussions more inclusive and constructive) that also have the side effect of making it much easier for remote participants to get a word in edgewise.

Sitting in these discussions, the main emotions that I felt were frustration and despair, driven by my perceptions that nobody seemed to be able to agree on anything, that things I'd thought were settled were all coming apart again, and that we were just not making progress. Often, in meetings such as these, I am put in mind of the famous quote often attributed to Oscar Wilde: "The problem with Socialism is that it takes up too many evenings."  Open standards development has much the same failure mode of seemingly never-ending meetings in which everybody needs to have their say and the pettiest disagreements can seem the most impossible to overcome.

Today, however, at our SBOL editors meeting, as we were going over the state of work in progress, I was pleasantly surprised to discover just how nearsighted those perceptions were.  In fact, coming out of those apparently frustrating discussions, we have:

  • A tidy solution to a nasty technical problem (colons vs. underscores in URIs) that had threatened half the software tools that make use of SBOL.
  • Agreement with the curators of the Systems Biology Ontology that they will add a collection of terms that we need for representing genetic interactions.
  • A pretty good draft for how to represent combinatorial designs in SBOL, which lots of people want because they are using them to do "mix and match" tests of lots of different variants of their genetic designs.
  • An elegant means of representing topological information about genetic designs (e.g., is this a linear fragment of DNA or a circular plasmid?).
  • A solid draft for how to include information on the provenance and derivation of genetic designs, which will be critical for tracking how systems are built and exchanged, particularly once they are deployed outside of the lab.
  • Agreement on a simplification of how we link information about genetic components with information about how they are combined to build a larger system---an improvement so clear that it feels incredibly obvious in retrospect.
  • Rough consensus on most of a major new version of SBOL Visual, for diagrams that mix genetic information with information about chemical and genetic interactions.

Example genetic system diagram from current draft of SBOL Visual 2.0

That's actually quite a lot of progress! So, why the disconnect between my perception in the meeting and the reality discovered afterward? I think that much of it is due to the fact that discussions spend very little time on the things that people do agree about.  Instead, we naturally focus on the points of disagreement.  There is also often a process of "chewing over" an idea, even if one ultimately agrees with it, in order to really understand it and to understand its implications.  With good moderation that keeps things civil, inclusive, and impersonal, you may actually hear a lot more potential disagreements (since people aren't feeling silenced), but if the fundaments are good, you can make a lot of progress even through the billowing conversational smoke of apparent disagreement.

I don't enjoy the process, but I think I may be good at it, I do like what we can achieve, and I believe that those results will be very important for a lot of people, including me, and so I stay involved.  Just sometimes, the grinding sound you hear may be both the progress of the standard and also my own teeth.

Monday, September 26, 2016

Combining Self-Organisation and Autonomic Computing with Aggregate-MAPE

One last paper from SASO 2016: I was a minor author on a paper about bringing aggregate programming to the world of self-managing systems, presented in the same workshop as my paper on MTIP. This work is in the area of autonomic computing, a concept introduced by IBM about fifteen years back, which basically increases the flexibility of a computer’s operating system by incorporating explicit self-representation and a planning system.  The typical architecture for this is the “MAPE loop,” an acronym that expands to Monitor, Analyze, Plan, and Execute.  This is a nice idea, but becomes particularly challenging to implement on networked systems, when one needs to figure out how to make a bunch of MAPE loops on different machines play nicely with one another.

Aggregate MAPE is implemented with four aggregate processes: Monitor, Analyze, Plan, and Execute, each spread across all of the participating devices in the network.

This paper turns that around by spreading each component (M, A, P, E) across the network as an aggregate program, thereby separating the networking and coordination aspects from the autonomic control aspects and significantly simplifying the system.  It’s early work yet, but we’ve shown at least a proof of principle of the value of this synthesis, and the next step is elaboration into bigger and more real systems.
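
To make the shape of that idea concrete, here is a toy Python sketch, not the paper's implementation. Each of the four phases is written as a single function over the whole collection of devices, which is the programmer's view that an aggregate approach aims for; how each phase would actually be computed through device-to-device communication is exactly what the aggregate layer would hide. The load-balancing task, the `LIMIT` threshold, and all of the function names are assumptions made up for this illustration.

```python
from statistics import mean

LIMIT = 0.8  # hypothetical threshold above which a device counts as overloaded


def monitor(devices):
    """Monitor: read the current load of every device in the network."""
    return dict(devices)


def analyze(readings):
    """Analyze: compute the network-wide average and find overloaded devices."""
    overloaded = sorted(name for name, load in readings.items() if load > LIMIT)
    return {"average": mean(readings.values()), "overloaded": overloaded}


def plan(readings, analysis):
    """Plan: each overloaded device sheds its excess over the network average."""
    return {name: readings[name] - analysis["average"]
            for name in analysis["overloaded"]}


def execute(readings, shed):
    """Execute: move the shed load evenly onto devices that are not overloaded."""
    receivers = [name for name in readings if name not in shed]
    if not shed or not receivers:
        return dict(readings)
    share = sum(shed.values()) / len(receivers)
    return {name: load - shed[name] if name in shed else load + share
            for name, load in readings.items()}


def mape_round(devices):
    """One pass of the loop, with each phase applied to the network as a whole."""
    readings = monitor(devices)
    analysis = analyze(readings)
    return execute(readings, plan(readings, analysis))
```

Calling `mape_round({"a": 0.95, "b": 0.3, "c": 0.4})` moves the excess load from device `a` onto `b` and `c` while keeping the total load unchanged. Note that nothing in the four phase functions mentions messages or neighbors, which is the separation of concerns described above.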

Thursday, September 22, 2016

Surviving Life as a Researcher

Last week at SASO/FAS*, I was offered a chance to speak about my thoughts on the scientific life to students attending the doctoral consortium organized there.  I really appreciated this opportunity, because I feel that there are many cultural myths in science that cause lots of pain and difficulty until one learns to overcome them.  Certainly, that has been my experience.

I have struggled, many times, with my relationship to the scientific life and research.  I would say that approximately once every year or two, I have hit such a deep point of frustration or despair that I have seriously contemplated simply giving up and going into some other “simpler” profession (the grass most surely is greener on the other side).  Every time, that has ended up forcing me to look more deeply into what was going wrong and why I had been frustrated, and as a consequence I have frequently ended up discovering something that I did not understand about the realities of the scientific endeavor and my relations to it, a mistake that was an important cause of the pain or failures that were hurting me.  Understanding these things lets me address them, whether by changing what I’m doing or just by changing what my goals and expectations are.

I hope, by writing some of these things down, that I can help others to learn the things that I wish somebody had been able to teach me.  There are a lot of details on the slides I gave, and a lot of things that I said that are not on the slides, but here are the key takeaways that I wanted most to share:

  • The research world is much larger and has many more types of organizations than one is generally aware of during grad school.
  • Nobody can be a Renaissance man any more: find a niche that matches your strengths.
  • Make sure to work on important problems, even if that’s hard to justify and changes your direction of research over time.
  • Imposter syndrome is always with us, but some things can help to manage it.
  • In a research career, it’s never a good time to have a life.  Do it anyway.

For those interested in the full presentation, the slides are available in both PDF and PowerPoint.

Monday, September 19, 2016

The Best Paper of SASO 2016

I have the distinct pleasure to announce that our paper at this year’s IEEE SASO conference, entitled “Self-adaptation to Device Distribution Changes,” has been awarded the Best Paper award for the conference.

In this paper, we tackle one of the quandaries of computer networks that live dispersed through the environment of the real world, like all of our phones, laptops, and other personal devices: how the distribution of those devices in space can affect the systems that we run on them.  When building a distributed system, it is disconcertingly easy to accidentally build in assumptions about the density or position of devices in space, so that a system can have its behavior change radically or even fail entirely, just because a couple of devices moved or because people bunch up in some places and spread out in others.  It has been very hard to predict when these situations may arise, or to find them through empirical testing, since these problems may happen only with large numbers or very specific conditions of positioning.  Worse yet, these issues may not even be apparent in any single component of the system, but only emerge nastily when you put the pieces together.

Our work makes a start on these problems by first making a theoretical analysis of ways in which these problems arise and identifying a new mathematical property called “eventual consistency,” that describes the behavior of systems where things do not break down when devices move or their density increases.  In essence, a networked system is eventually consistent if, the more devices there are, the smoother its behavior is and the less that each individual device matters.

The intuition of eventual consistency: more devices = smoother behavior
In many ways, this notion is an extension of the concept of self-stabilization, which says that a distributed system always converges to a predictable set of values.  Eventual consistency enhances that by saying that the values also shouldn’t depend very much on how many devices there are or where exactly they are positioned in space.  Accordingly, we have taken our previous work establishing a language of efficient self-stabilizing programs and restricted it to get a language of eventually consistent programs, which we demonstrated with some examples in simulation, showing that they became smoother and more similar the more devices we had in the simulation, just like the theory said should happen.
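
The "more devices = smoother behavior" intuition can be seen in a small Python sketch. This is only a toy illustration under assumptions of my own choosing, not the formal definition or the language from the paper: devices sit on a regular n-by-n grid in the unit square, each can talk to any device within a fixed radius, and every device estimates its distance to the corner device by the shortest path through the network. The function names, the grid layout, and the radius of 0.25 are all hypothetical.

```python
import heapq
import math


def distance_estimates(n, radius=0.25):
    """Shortest-path distance from the corner device on an n x n grid of devices."""
    spacing = 1.0 / (n - 1)
    reach = int(radius / spacing + 1e-9)  # largest grid offset that can be in range
    dist = {(0, 0): 0.0}
    frontier = [(0.0, 0, 0)]
    while frontier:
        d, i, j = heapq.heappop(frontier)
        if d > dist.get((i, j), math.inf):
            continue  # stale queue entry
        for di in range(-reach, reach + 1):
            for dj in range(-reach, reach + 1):
                step = math.hypot(di, dj) * spacing
                ni, nj = i + di, j + dj
                in_range = (di or dj) and step <= radius + 1e-9
                if in_range and 0 <= ni < n and 0 <= nj < n:
                    nd = d + step
                    if nd < dist.get((ni, nj), math.inf):
                        dist[(ni, nj)] = nd
                        heapq.heappush(frontier, (nd, ni, nj))
    return dist, spacing


def worst_error(n, radius=0.25):
    """Largest gap between a device's network estimate and the true distance."""
    dist, spacing = distance_estimates(n, radius)
    return max(d - math.hypot(i, j) * spacing for (i, j), d in dist.items())
```

Comparing `worst_error(5)`, `worst_error(9)`, and `worst_error(17)` shows the worst-case error shrinking as the same square is filled with more devices: the denser the network, the closer the collective estimate comes to the underlying continuous quantity, and the less any one device's exact position matters.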

This is cool because it takes us another step toward being able to make a lot of the nasty difficulties of building distributed systems go away.  The recognition is nice too, particularly given this paper’s rather long and bumpy prior path towards publication—we first submitted a version of it more than two years ago, and it spent a long, long time in limbo and rejections before reaching its current happy state.  I also think this work may be important, as we cope with our increasingly computer-saturated environment, and look forward to continuing along these lines.

Friday, September 16, 2016

Aggregate Programming in the GIS world

One of the projects I’ve been working on for the last year or so is an interesting application of aggregate programming to airborne sensor planning.  The motivation behind this project is that lots of people want to use data from airborne sensors, whether on drones or manned platforms, for disaster response, traffic management, law enforcement, wildfire management, etc.  Those sensors, however, spend an awful lot of their time idle because there’s a lot of time spent in transit, not near one of the things the sensor's owner is trying to get information about.  If you could share those sensors opportunistically, however, they could be put to other uses in that down-time, and that’s what the system we’ve been working on does.

It turns out that a particularly easy way to implement this sensor sharing is using Protelis, our aggregate programming framework: with the rest of the project in Java, it’s simplicity itself to set up an agent-based representation of the scenario using Protelis, with agents for platforms and for sensor goals.  Then we let the agents talk to one another with an aggregate program to set up the plan, and let the plan evolve incrementally as more information comes in, updating the agents.

One of the things that has been most fun for me in working on this project has been playing with real GIS data. To make a realistic scenario, we wanted to have a reasonable example of how stuff is actually distributed in the real world, and so we found a whole bunch of publicly available GIS information about infrastructure in the San Francisco area.  It’s amazing to me how much is just sitting out there available online, like lists of all the cell phone towers, dams, and bridges.  There are a lot more dams than you might think, too: maybe only a few really big ones, but lots of little ones.  We’ve been working with this information using an awesome open framework made by NASA called WorldWind, which has all sorts of pre-existing resources and computational support built in, as well as making beautiful visualizations.  Sometimes I just like to sit and explore our scenarios as they play.

WorldWind visualizing a fragment of a plan for a UAV (blue) to survey critical infrastructure south of San Francisco (red)
Anyway, long story short, our implementation works nicely in simulation.  Back in May, we published the first paper describing the framework, using our San Francisco earthquake scenario to show that there really is a lot of potential for sensor sharing with realistic goals.  Our second paper, which I’ve just presented at the eCAS workshop at IEEE SASO, shows that the agent-based planning can adapt quickly and effectively to highly dynamic situations, and can also provide good resilience by having more than one platform cover the same sensing goal. Next step: testing it out in the real world, we hope, where it can move towards really being able to help people.

Wednesday, September 14, 2016

Low-Flying Trains

One of the things that I love about Europe is being in a land with a highly functional rail infrastructure.  Going from the middle of Belgium to the Southeastern corner of Germany, the train flashes along through the countryside at 250 kilometers per hour, smooth and comfortable.  If I had flown, it would have taken nearly the same amount of time, once waiting times and awkward connections are counted in.

More importantly, however, the flashing speed to me gives much the same feel of speed as air travel.  I may be going only 1/3 of the speed of a regional jet, but I’m flying along right down at ground level, and so it feels at least as rapid.  Silken smooth along rails (at least in Northern Europe), usually near-perfectly on time and so frequent that you hardly even need to check the schedules: you just show up, book your flight, and ride, ride, ride right into the heart of your destination, finally docking below some station’s high-vaulted roof full of intricate artistic trusses.

Bonn Train Station, Germany

Monday, September 12, 2016

ANTS 2016: The 10th International Conference on Swarm Intelligence

In my last post, I talked about my day off in Brussels, in between two meetings.  What brought me to Brussels in the first place was a plenary talk surveying my work on aggregate programming at the ANTS Swarm Intelligence conference. I enjoyed giving this talk greatly, and, so far as I could perceive, it appeared to be quite interesting and useful to my audience as well.  It was also interesting for me, as I was putting things together, to reflect on the remarkable degree to which my work in this area has gelled and reorganized itself in the past few years.  It has been only a short time since we discovered field calculus, created the “self-organizing building blocks” approach to developing complex distributed systems, and elaborated the entirety of the aggregate programming stack.  As I look back, these developments seem so obvious, and yet they took so long to see, as is perhaps often the case with any significant scientific step.

This was also the first time I’d had a chance to actually attend ANTS since 2010, although I’ve been continually on the PC and following the work presented there.  It was nice to be there in person again, catching up with some folks I knew and meeting a couple of very cool new folks, and I also learned some things that I think will turn out relevant to problems that I am pursuing right now.  All told, I’d say, an excellent experience.

Sunday, September 11, 2016

Giving Directions in Foreign Lands

One of the things that happens to me sometimes when I am traveling abroad, and both amuses and delights me, is when I am taken for a local by other travelers and asked by them for directions.  It further amuses me (and also makes me supremely satisfied) when I am not only asked for directions while in somebody else’s city but also am capable of answering correctly.

This happened to me in Brussels, where I was yesterday, and it’s just one of those little things that can really make my whole day.  I have always had a good head for directions, and one of the things that I enjoy most when I am in a foreign city is to try to get my bearings and simply walk around to experience the environment and try to get a bit of a feel for what it would be like to live there as a local.  So with a day to spend in Brussels, I eschewed the museums and the Manneken Pis, and instead went for a long walk down through the Bois de la Cambre on the South side of the city, then walked back in near the center and took the metro out to wander through the Parc de Laeken on the other side, near the beautiful madness of the Atomium.

Chalet Robinson in Bois de la Cambre, Brussels, Belgium
Bois de la Cambre, Brussels, Belgium
Atomium, Brussels, Belgium
Leopold I Monument, Brussels, Belgium

Being asked for directions further affirms the feelings of enjoyment that I get from such wanderings, because it means that somebody sees me as not being out of place.  Being able to answer further strokes my ego for reasons that I suppose are rather obvious.

I’d like to think this vanity is not a problem, and indeed as such things go, I believe that it is relatively harmless.  Still, my joy is real, and so is my self-aware concern that perhaps I should be more concerned with how I choose to feed my ego.

Saturday, September 10, 2016

Relative Fluorescent Units Considered Harmful

My working title for the paper that will come from this year’s iGEM interlab study is “Relative Fluorescent Units Considered Harmful.”  It’s a bit of a playful title, invoking a computer science tradition started by the notorious Edsger Dijkstra. I think, however, that this statement is deserved and also that we can now back it up with some hard experimental evidence.

Most of the data is in—I’m just waiting for a few more teams with extensions—and it looks like we’ve got amazing results.  The big news is, in a more positive reformulation of my title, that calibrating fluorescence measurements works, and that it makes a big enough difference to be worth it.  Let me present the key conclusions that I believe we can now support in the form of responses to the most common arguments that I hear against calibrating fluorescent measurements.

Q: Wouldn’t it be difficult and costly to add fluorescence calibration to experiments?
A: The materials needed are quite inexpensive. As for difficulty: it seems to be pretty easy for undergraduates and high-school students all around the world, so professional researchers should be fine.

Q: Aren’t calibrated measurements pointless, because cell behavior varies so much in different people’s hands?
A: Not according to our results: pretty much everybody who got the protocols right, as indicated by reasonable control values, had a tight correspondence in the rest of their values as well.

Q: Aren’t arbitrary or relative units good enough, if we just want to compare fluorescence?
A: Absolutely not! You know what I said about getting reasonable control values? In our study, anybody whose controls were wonky appears to have been a lot more likely to have wonky results elsewhere too, probably indicating some sort of protocol failure.  With relative values, however, a lot of those apparent protocol failures go through, polluting the data and potentially making all sorts of trouble down the road.

Q: Why can’t we just compare to a known system in a cell?
A: This is the idea behind Relative Promoter Units (RPU) and the like, which are pretty clever.  Just as with purely arbitrary units, however, if something goes wrong in the protocol, it’s likely to affect the controls as well: RPU also appears unlikely to have caught a lot of the problems that absolute units identify in our study, again leading to pollution of data with all sorts of strange failure modes.

In short, calibrated fluorescent units make a big and quantifiable difference and they’re easy to use.  Moreover, given what we’ve seen, I suspect that a lot of the “cells are so touchy and behave so differently” laboratory folk-wisdom out there is really not about the cells, but about problems with culturing and measurement protocols that go unnoticed when you’re using relative or arbitrary units.
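
As a rough illustration of why absolute units help, here is a minimal Python sketch. It assumes a calibration scheme that this post does not actually describe: each lab measures a dilution series of some reference fluorophore at known concentrations, fits a single scale factor for its instrument, and then reports samples in reference-equivalent units. The function names, the example numbers, and the "expected range" for the control are all hypothetical.

```python
def fit_scale(concentrations, readings):
    """Least-squares slope through the origin: reading ~ slope * concentration."""
    numerator = sum(c * r for c, r in zip(concentrations, readings))
    denominator = sum(c * c for c in concentrations)
    return numerator / denominator  # arbitrary units per unit of reference


def calibrate(raw_reading, slope):
    """Convert an instrument's arbitrary units into reference-equivalent units."""
    return raw_reading / slope


def controls_ok(control_value, low, high):
    """Check a calibrated control against the range a working protocol should give."""
    return low <= control_value <= high


# Two hypothetical instruments with different gain, reading the same standards.
standards = [0.0, 1.0, 2.0, 4.0]
slope_a = fit_scale(standards, [0.0, 100.0, 200.0, 400.0])
slope_b = fit_scale(standards, [0.0, 250.0, 500.0, 1000.0])
```

With these made-up numbers, a raw reading of 150 on the first instrument and 375 on the second calibrate to the same value, so results from the two labs become directly comparable. Just as importantly, `controls_ok` has something meaningful to check: once a control is expressed in absolute units, a value far outside the expected range flags a likely protocol failure, whereas dividing everything by that same control would silently hide it.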

I wish that I could say more now about what we’ve learned, but we’re planning to announce the full results at the iGEM Jamboree at the end of October, and I’m embargoing all of the details until that time.

Wednesday, September 07, 2016

Hybrid Semiconductor-Biological Systems

Another in this year’s series of studies for the Semiconductor Synthetic Biology roadmap was a meeting in Atlanta that focused on hybrid systems, in which a silicon / electronic device is physically integrated with a biological system.

This topic is largely outside my area of expertise, and so I did a lot more listening than talking.  More than anything else, I was struck by how the defining feature of any semiconductor/biological hybrid system is the surface interface between the two different chemistries.  There were a lot of different interface technologies discussed, each providing a different valuable modality of connection, such as imaging and capacitance sensing, direct chemical sensing, physical stimulus increasing cell viability, receiving electrical signals from cells, etc.  Most of these wonderful proof-of-concept capabilities, however, are currently mutually incompatible, for the simple reason that you can’t make a surface be all of these things at once, and even where you could in theory, we don’t necessarily have the manufacturing technology to do so yet.

Still, the potential for high impact is there, even if we can only integrate and exploit some of the different modalities of interaction between cells and silicon.  Personally, I am most excited about the possibility of large-scale non-invasive single-cell assays and control.  I described this in a brief talk that I was invited to give in a session on motivating applications. Running some back-of-the-envelope numbers, I think one of the applications that would be both high-value and high-impact is improved assay technology, combining the high resolution, temporal tracking, and low invasiveness of microscopy with the high throughput and large numbers of cells that can be obtained with flow cytometry.  At least some of the investigators and investors in the field seem to be convinced that’s of value also, and so I think it’s reasonable to hope to see improved assay devices of this sort on the market within the next five years.

Monday, September 05, 2016

Decelerating Travel

Back in June, I posted a cri de coeur on the travel schedule that I have been enduring this year.  That sequence of trips was my breaking point on travel for the year: amidst using 17 airplanes to visit five widely separated locations over the course of two weeks, I was ready to be done with travel and to just stop moving for a while.

Tired Moose, traveling with Daddy and sending pictures back to Harriet. 

And so, like a good scientist, I sat down with my last five years of travel and crunched some data to find out what has been happening differently.  To my surprise, it turned out that I have actually not been going to more scientific events (at least not a statistically significant number).  Instead, what has increased my travel markedly is something that should have been obvious to me from the start: I moved to Iowa.

Moving to Iowa impacts my travel in three different ways:
  1. Attending events in Boston now requires travel.
  2. Attending events in Washington DC now requires an overnight (or precise scheduling and luck), rather than being an easy down-and-back day trip on the shuttle from Boston.
  3. Events back home in Maine now require serious travel rather than just being a day trip.
So what actually has made this year so intense travel-wise is that I needed to go to Boston and DC a few extra times and I had a few extra life-events to go attend: seven extra trips compared to last year turned survivable into a crisis of too much travel.

Fortunately, I was also able to calculate what I need to do in order for my life to be sustainable in the way that I want it to be: I need to reduce my travel by 40%.  That’s a big enough number that it’s definitely not simple and will require compromises, but it looks like I can do it by shaving pieces here and there, by delegating certain things, and by doing more things remotely when I might otherwise prefer to be there in person.  It’s a bit scary, to be saying “no” to things that otherwise might turn into good opportunities, but I have to remind myself that by doing so, I am allowing myself to say “yes” to other things I value.

Check back in a year or two, and maybe I’ll know how well it’s working out.

Friday, September 02, 2016

Where we stand in biological design automation

Another of the events that I attended in Newcastle was an SRC Workshop on the interaction between electronic design automation and biological design automation. This takes a little unpacking to explain. For the past year or so, I have been part of a road-mapping project organized by the Semiconductor Research Corporation, which is in essence a research foundation run by the United States semiconductor industry. As we are approaching the end of what physics allows in improving ordinary computers, this organization is investigating possible new directions for the industry to expand, and one of those directions is towards biology. There are a number of different aspects of this investigation, some of which I have talked about before, and more of which I will talk about in the future, but the one that we were discussing in Newcastle is how the methods used to design electronic systems might help in designing biological systems and vice versa.

Now those of you who have been following me and my work and my writings may know that I am a big advocate of biological design automation. I think that computational tools and models are the only way to really achieve the potential of engineering biology. So it might be surprising then, dear reader, for you to hear that one of the big themes that I heard developing at this workshop was that lack of biological design automation software is not the bottleneck in the area and probably will not be for some time.

The issue is this: what I mean when I say "the lack of software is not the issue" is not that we do not need the software. We desperately need good biological design automation software. But if I had two years and 500 programmers, I could not produce that software right now. And that is what the challenge of adapting EDA techniques to the BDA environment is really about, in my opinion. The bottleneck is not a software problem that can be solved by a sufficient application of industry resources and know-how. Instead, as I have argued before, the bottleneck is and currently remains the lack of good devices, characterization data, and models of composition. Until we have a better understanding of what we are looking to automate, we cannot solve the problem by throwing software resources at it.

BDA software, however, does have a critical role to play in solving the problems of biological design and engineering. This is because automation tools expose the requirements of engineering in an especially clear and difficult-to-evade fashion. Unfortunately, much of the work that has been done on the characterization of biological systems and devices is simply not usable in biological design beyond the most simple and qualitative level. Please understand that this is not a criticism of that work, which is in many cases very good indeed, but a necessary recognition that its purposes have typically been more explanatory and exploratory, and that the knowledge produced from such an investigation is simply not sufficient for the requirements of establishing routine engineering control over the phenomena in question. The degree of precision, curation, and completeness necessary for an excellent scientific publication is simply much lower than what is required for a design automation tool, because computer algorithms are unforgivingly stupid, while the people who read scientific papers are very intelligent. That means that if there is any ambiguity or gap in knowledge that is pertinent to the engineering of the system, then an automation tool will almost certainly run afoul of it, forcing us to confront directly the issues that we might otherwise overlook until they came back to bite us and cost millions of dollars of wasted effort and disappointing failures.

And so, I believe that high-level design automation and the supporting knowledge necessary to enable it need more time to develop and mature before they become an industrially viable business for more than certain niche applications. Right now, the places where automation tools are likely to be of high value are at a low level, such as we see in protein design, CRISPR nuclease design, codon optimization, etc. Likewise, there is a great deal of potential opportunity in the automation of laboratory equipment and I expect a high potential for disruptive innovation in this area given the high premiums and extreme vertical integration common in laboratory suppliers at present. Any business that can begin shifting biological engineering instruments from the current "car sales" model towards something more like office equipment services might be able to radically affect the area. Similarly, I foresee great disruptive potential for anyone who can bring microfluidics from specialty investigation to a set of compact and user-friendly tools.

In the meantime, more basic research funding is still needed to enable the development of characterization and devices that will be able to support the more complex biological engineering targets of the future. This is particularly true since I do not see this being a profitable area for large companies to invest in for the near future, given that there are still so many pieces of low-hanging bioengineering fruit that can be collected for significant financial return.

Thus, at present, I see biological design automation as another good example of a technological area where government investment is likely to spur the foundation of multi-billion dollar industries, if only we can start it going.