Monday, September 17, 2018

Ten years at (Raytheon) BBN

Ten years ago, I started my job at BBN Technologies, my first professional step outside of my graduate school environment.

Much has changed since then: the focus of my research has shifted; the networks of colleagues I work with have grown and changed; BBN was swallowed up by Raytheon; I got married, moved to Iowa, and had two kids. I am more comfortable and confident in my skills and my abilities, I've learned to manage both my arrogance and my imposter syndrome better---and also learned that those are just two sides of the same problematic coin for me, and are likely chronic challenges that will not go away. I've learned how to more effectively say "yes" and I'm still learning to say "no."

Tiniest Moose, helping me with my work on a business trip to California.

At its core, however, my world of research and professional life is much the same. I wake up every morning entangled in the delicate balance of work that might have a profound impact on our society and work that will be completely irrelevant before it is completed (sometimes with little way to tell the two apart). I take joy in my collaborators and the artifacts we produce, the satisfaction of programs working and data-points that form a beautiful line, the hope and anguish of proposals and papers submitted, rejected, and accepted. My days are filled with the craft-work of science, in all of its prosaic glory, and I fully expect to find it no less engaging years from now, even if someday my career takes me somewhere else.

For its part, Raytheon, in its infinite wisdom, has informed me that in honor of my ten years of service, I am to be awarded a gift picked from a menu of some intriguingly "safe" and mediocre choices, like a fancy dart board, designer shades, a bike rack, a glowing bluetooth speaker, or a package of Omaha steaks. We picked the carpet cleaner. Happy anniversary!

Monday, September 10, 2018

What's your bus factor?

My wife and I had our second child just under three weeks ago, and as I'm slowly beginning to find my equilibrium in the midst of my parental leave, I find myself contemplating my personal bus factor.

The bus factor is a tongue-in-cheek name for a measurement of a project or organization's level of robustness. The proposition is this: let's say that some critical people in your organization are out to lunch one day, and while crossing the street, they get run over by a bus. The bus factor is the minimum number of people who, if they go under that bus, will result in the project or organization being badly disrupted. It's a wonderful and horrible thought experiment that asks us to face the question: life happens---whether it be a bus or a baby, marriage or divorce, cancer or a long overdue vacation---and how does that affect the world of work?
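To make the thought experiment a little more concrete, here is a minimal Python sketch built on one simplifying assumption that is not part of the usual definition: a project counts as "badly disrupted" as soon as any single critical task is left with nobody who can do it. Under that assumption, the bus factor is just the smallest number of capable people for any critical task, since the cheapest way to break the project is to lose everyone who can handle its most fragile task. The function name, tasks, and people are all hypothetical.

```python
def bus_factor(capabilities):
    """Return (bus factor, bottleneck tasks) for a mapping of critical task -> capable people.

    Assumes (hypothetically) that a project is "badly disrupted" once any critical
    task has nobody left who can do it, so the bus factor is the size of the
    smallest group of capable people for any task.
    """
    if not capabilities:
        raise ValueError("need at least one critical task")
    factor = min(len(people) for people in capabilities.values())
    bottlenecks = sorted(task for task, people in capabilities.items()
                         if len(people) == factor)
    return factor, bottlenecks


# Hypothetical roster: who could step in for each critical task.
roster = {
    "run the analysis pipeline": {"Ana", "Ben"},
    "maintain the lab protocols": {"Ana", "Chen", "Dev"},
    "write the progress reports": {"Ben"},
}

print(bus_factor(roster))  # (1, ['write the progress reports'])
```

In this toy model, cross-training one more person on the bottleneck task raises the bus factor to 2, which is exactly the organization-level resilience argument that follows.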

Obviously, as an organization you don't want a low bus factor. If your project has a magic guru without whom all is lost, then your bus factor is 1, and sooner or later all will be lost. So from the perspective of resilience, a higher bus factor is always better.

As an employee, however, a high bus factor is actually a very bad sign for your value to the organization. If your organization has (or thinks it has) lots of people just like you, then it probably won't value you as much as you would like. As a researcher, my value is ultimately in my expertise and my ability to deploy it in ways few others can, so my bus factor had better be fairly low. So it would appear that from the perspective of the individual employee, a lower bus factor is always better.

When your bus factor is too low, however, that's bad for work-life balance. If you're on the critical path for everything you do, then the ups and downs of projects can't be shared with others. Every crisis is your crisis, and if you take time out, things break and you let everybody down. At its worst, your bus factor is effectively less than one, and you're always working overtime just to stay afloat.

So, what's your bus factor?

In my own personal transition over the past year and a half, as we've grown the synthetic biology team at BBN while still continuing to execute on projects that started well before, my bus factor has definitely been under one at times. Preparing for paternity leave, however, became a very interesting exercise in evaluating how things had changed and where I needed to rethink how I had things organized at work. I started well ahead of time, listing out all of my different responsibilities, identifying which were truly critical and needed me to handle them before my leave, then working with colleagues to plan how to cover my absence. Since you never know when a baby will come, I started warning people about my likely disappearance weeks ahead of time (and there are wonderful labor prediction tools now available online to make it quantitative!). My predicted bus factor seems to be about 1.5, meaning some things will break if I am gone too long, but my team can probably run for quite a while without me, given our preparations and the competence of the people I collaborate with.

And now I'm gone. I've got a lovely baby girl, an older daughter who seems to be adapting well, and our sleep schedules are as unpredictable and in flux as any parents of a newborn might expect. I've been quite solidly off my email and not taking calls. I'm grateful for the privilege to have this time off and spend these early days at home, and I think it's a tragedy that here in America so many working people are unable to take such time. Babies and parents both deserve better, and our economy could most certainly afford it if we had the will as a society.


When I come back, I will find out how right or wrong we were about my bus factor and our preparations. Maybe at this very moment a project is going down in flames, and I will have to deal with terrible things as soon as I reconnect and read what I've been missing. But I hope not, and I am grateful for the professional community that has let me keep walking this work-life tightrope.

Thursday, June 21, 2018

Big paper out today: units matter in biology!

This is a big one: our paper out today, "Quantification of bacterial fluorescence using independent calibrants," is the official peer-reviewed presentation of the results from the 2016 iGEM interlaboratory study.  After spending a year or so digesting all of the data for publication, the bottom line is this: everybody can and should calibrate their fluorescence measurements.

Here's the key figure from the paper, showing just how much error is reduced by using an independent calibrant to put units on your measurements. And notice those orange bars in the middle: that's how well you can do with relative units based on a control strain of cells. It's better than nothing, but still far worse than with an actual independent calibrant, because there are so many ways your control strain can get messed up in the same way that your experimental strains are getting messed up.
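The core arithmetic of an independent calibrant is simple, which is part of why it is so cheap. As a minimal sketch, assuming a single calibrant reading with a known value, a linear instrument response, and one shared blank for background subtraction (real protocols typically use more careful procedures, such as a dilution series of the calibrant), converting arbitrary units into calibrated units looks like this. The function names, readings, and calibrant value are all hypothetical.

```python
def conversion_factor(calibrant_au, blank_au, calibrant_known):
    """Calibrated units per arbitrary unit, from one background-subtracted calibrant reading."""
    signal = calibrant_au - blank_au
    if signal <= 0:
        raise ValueError("calibrant reading must exceed the blank")
    return calibrant_known / signal


def calibrate(readings_au, blank_au, factor):
    """Convert background-subtracted arbitrary-unit readings into calibrated units."""
    return [(reading - blank_au) * factor for reading in readings_au]


# Hypothetical plate-reader numbers, all in the instrument's arbitrary units (a.u.).
# Simplification: the same blank is assumed to apply to both calibrant and cells.
blank = 200.0       # blank well
calibrant = 1200.0  # calibrant well with an assumed known value of 50 units
factor = conversion_factor(calibrant, blank, calibrant_known=50.0)

cells = [1200.0, 3200.0]  # two hypothetical cell cultures
print(calibrate(cells, blank, factor))  # [50.0, 150.0]
```

The point of an independent calibrant is that, unlike a control strain, nothing that goes wrong with your cultures can change the calibrant's known value.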


It's cheap. It's easy. High school students and undergraduates can do it. Every other biological researcher or engineer measuring cellular fluorescence should do it too, especially those working in synthetic biology.




Sunday, April 08, 2018

Linking biological designs and experimental data

One of the biggest points of friction in my professional life is the disconnect between the design of an experiment and the data that comes out of it. Not in any deep or scientific sense, but in a boringly practical sense of "How do I know what's in file MyRun_F05_039_pXK405.fcs?"

When I'm analyzing data that experimentalists have produced, the information I need to make this connection arrives as spreadsheets with colored cells and personal shorthands, or unintentionally cryptic emails, or scans of tables with hand-written notes. Then I make my best guess as to what's being encoded there and start organizing file names into scripts to run my analysis. The actual analysis is often very fast, only a few minutes, but for a good-sized experiment it can take hours just to set up.
A fairly typical example of how biological data is currently integrated with experimental design.
Even then, our pain isn't over, because comparing across data sets is a major challenge, especially when working with multiple people on a project or on a project spanning many months or even years. Is the control the same as it was two months ago? What does "same" even mean, exactly? I had a data set go completely wonky once because the experimentalist working with me had run out of one plasmid and substituted another that they thought should be equivalent but that had an extra "unimportant" gene on it. The descriptions I received used the same name for the new plasmid as for the old one, because of course they were only describing the "important" parts of the construct. We lost at least a month of time on the project.

All of this can be simplified if we get automated software tooling involved, so that with minimal human involvement we can link data to laboratory samples, samples to descriptions of what they are supposed to contain, and DNA designs to the biological functions and interactions they are intended to produce. For that to work, we need to agree on how we are going to describe those relationships, and I believe that is the most critical thing our newest release of the Synthetic Biology Open Language (SBOL), version 2.2, gives us, along with some tools for describing combinatorial designs. Version 2.2 has just been officially published as a free journal article, and we're well into putting these new linkages to use in several programs, as well as organizing a workshop to teach people how to link these and other tools together.
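As a rough illustration of why explicit links help, here is a minimal Python sketch. It does not use SBOL or any SBOL library; it just models the chain from data file to sample to design with plain data classes, and every class name, identifier, and file name other than the one quoted earlier is hypothetical. Because each sample points at a specific design identifier rather than a human shorthand, a simple check can flag the plasmid-substitution problem described above: the same label attached to two different designs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    design_id: str    # hypothetical unique identifier for one exact DNA construct
    description: str  # human-readable summary, which may omit "unimportant" parts


@dataclass(frozen=True)
class Sample:
    sample_id: str
    label: str        # the shorthand the experimentalist uses
    design: Design


@dataclass(frozen=True)
class DataFile:
    filename: str
    sample: Sample


def ambiguous_labels(files):
    """Return labels that point to more than one distinct design across the given files."""
    designs_by_label = {}
    for f in files:
        designs_by_label.setdefault(f.sample.label, set()).add(f.sample.design.design_id)
    return {label: sorted(ids) for label, ids in designs_by_label.items() if len(ids) > 1}


old = Design("design:control_v1", "constitutive reporter")
new = Design("design:control_v2", "constitutive reporter (plus an extra gene)")
files = [
    DataFile("MyRun_F05_039_pXK405.fcs", Sample("S1", "control", old)),
    DataFile("NextRun_F05_007.fcs", Sample("S2", "control", new)),  # hypothetical later run
]
print(ambiguous_labels(files))  # {'control': ['design:control_v1', 'design:control_v2']}
```

The human label stays useful for conversation, but the analysis keys on the design identifier, so a substitution shows up as a visible mismatch instead of a month of confusion.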

Step by step, we are getting closer to removing this persistent source of friction and error in our biological studies.

Sunday, March 18, 2018

Diagrams showing structure and function in biological organism engineering

We've just had official publication of another major step forward in turning synthetic biology into a well-organized field of engineering: the SBOL Visual 2.0 standard. This is a big one, because it means we have a clear way not only of summarizing genetic structure (as we have had since SBOL Visual 1.0), but also of showing the interactions of genes with proteins and other molecules in order to actually affect cellular functions.
Example of an SBOL Visual 2.0 diagram, showing a system with two functional units: one producing the regulatory protein TetR, which in turn represses the other's production of green fluorescent protein (GFP).
Everybody's been drawing diagrams sort of like this already in the papers they publish, but there hasn't been any agreement on how to do so. As a result, every diagram is a little (or a lot) different, with no good way to be sure what somebody's diagram means besides reading the whole text in detail, and sometimes not even then. Now, with this standard, we have such an agreement, and we just need to keep working with folks to spread the word, so that people are aware of it and understand how following its guidelines will help them by making what they have written easier for others to read.

Friday, March 16, 2018

Good Measurement Practices

As we work to promote awareness and use of good scientific measurement practices in iGEM (the International Genetically Engineered Machines competition), we've just posted an educational video with me giving a (hopefully accessible) introduction to four simple principles of good measurement practices.

Tuesday, February 06, 2018

The LOLCAT Method

You probably think the title of this post is a joke. Well, it is, but probably not in the way that you think it is.
LOLCAT helping me with SCIENCE!
You see, back in the waning days of my grad student career, I started working with an ambitious and enthusiastic young undergrad named Sagar Indurkhya who wanted to work on better ways to design synthetic biology circuits. I was just getting into the area myself, and our efforts quickly wandered sideways, from work on circuit design to work on simulators. Sagar was using stochastic simulators and found (as many people do) that they were way too slow for his taste. So he went to town on the optimization problem, finding all sorts of crazy ways to improve the speed, from highly general (factoring reactions to improve scaling properties) to super-specialized (building his own custom virtual machine). Happy with the remarkable speedups we'd gotten, we decided to write it up, and since we liked publications without paywalls and had no particular reason to send it anywhere else, we sent it to PLOS ONE.

In the process of writing things up, however, we needed to give the algorithm a name, and one fateful day Sagar asked me: "Can I name it anything?" I said sure, and he continued, "Even something silly, like LOLCAT?" I hesitated, but couldn't really find any particularly good argument against it besides the fact that it was silly, which at the time didn't seem to me to be a sufficient argument against. And if it was a problem, the reviewers would ask us to change it, right?

Not a peep. I just looked back through and found that the reviewers were perfectly happy with our absurd name, engaged seriously with the paper to provide a sound and sober analysis of the LOLCAT method that significantly improved the manuscript's presentation, and then the paper went through for publication. And then I mostly just forgot about it. I don't use stochastic simulations very often, and when I have, it's typically been on much smaller systems, so I just haven't ever had reason to use the work myself.

But others have. I was reminded of the paper this morning, in fact, by a citation alert. After a long period of dormancy, the LOLCAT method is gathering citations as reaction network simulations grow and people are apparently finding it to be of significance in their work. As of this writing, it has received 18 citations---not huge, but definitely showing a significant impact.  I am profoundly ambivalent about this fact: happy that it's a useful piece of work, cringingly embarrassed at my early career naiveté, yet also defiantly proud of our little joke. We didn't even have the good grace to try to make the name an acronym.

It's out there still, and will be in the scientific record forever after, for good or ill: "Reaction Factoring and Bipartite Update Graphs Accelerate the Gillespie Algorithm for Large-Scale Biochemical Systems."  The LOLCAT method.
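For readers who haven't met it, the Gillespie algorithm named in that title simulates a chemical reaction network one random reaction event at a time. Below is a minimal Python sketch of the plain, unaccelerated direct method, assuming mass-action propensities computed with binomial coefficients (one common convention); the function names and example network are hypothetical. It is not the LOLCAT method: it recomputes every propensity after every event, which is the sort of per-step cost that becomes painful for large networks and that techniques such as reaction factoring and update graphs are, broadly, meant to reduce.

```python
import math
import random


def propensity(rate, reactants, state):
    """Mass-action propensity: rate times the number of distinct reactant combinations."""
    a = rate
    for species, count in reactants.items():
        a *= math.comb(state.get(species, 0), count)
    return a


def gillespie_direct(state, reactions, t_end, rng=None):
    """Simulate a reaction network with Gillespie's direct method.

    state: dict mapping species name -> molecule count
    reactions: list of (rate, reactants, products), where reactants and products
        are dicts mapping species name -> stoichiometric count
    Returns a list of (time, state) snapshots: the initial state plus one per event.
    """
    rng = rng or random.Random()
    state = dict(state)
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        # Recompute every propensity each step (the naive, unaccelerated approach).
        props = [propensity(rate, reac, state) for rate, reac, _ in reactions]
        total = sum(props)
        if total == 0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Pick which reaction fires, with probability proportional to its propensity.
        _, reactants, products = rng.choices(reactions, weights=props)[0]
        for species, count in reactants.items():
            state[species] -= count
        for species, count in products.items():
            state[species] = state.get(species, 0) + count
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical example: A + B -> C with rate 0.01, starting from 100 A and 50 B.
reactions = [(0.01, {"A": 1, "B": 1}, {"C": 1})]
traj = gillespie_direct({"A": 100, "B": 50, "C": 0}, reactions, t_end=10.0,
                        rng=random.Random(42))
t_last, final = traj[-1]
print(f"{len(traj) - 1} events by t={t_last:.3f}: {final}")
```

For a network with thousands of reactions, that full propensity recomputation on every event is where much of the time goes, which gives a sense of why accelerating this loop mattered.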

Wednesday, January 31, 2018

The Mark of Dubstep

I did a very joyful and stupid thing today, but they can't say they weren't warned.

Some of my fellow committee members submitted their pictures and bio blurbs on time. Some did not. The ones who did not were asked again. It's just a sentence or two. The blurbs went unwritten. Last week they were jokingly warned: send in your bios, or else Jake will write something unusual and ridiculous for you. At today's meeting, I was given free rein.

They all got bios and DJ names. The first sentence was serious; the second exposed the free-associated and unusual fictional lives of my colleagues.
The one who hadn't submitted a headshot yet got Wikipedia's current illustration of a kitten. He responded with a correction very quickly indeed. The rest are still up there, as of this writing.


I am disproportionately tickled by my own jokes, and it has been making me smile all day.
I am clearly a bad, bad man and an unreliable and dangerous troublemaker.

I wonder how long the Mark of Dubstep will remain.

Tuesday, January 02, 2018

The Physics of Time Management

As my professional life grows increasingly complex, I have found a need to organize it with the aid of physics-style laws. The three basic principles that I use are:

  1. Conservation of Time: Time can neither be created nor destroyed (though it can be wasted).
  2. No Free Lunch: Accomplishing goals requires time.
  3. Burnout Limit: The (sustainable) amount of time available for work in each week is limited.

Considering these three principles forces me to make difficult decisions about triage. No Free Lunch means that I cannot hope to accomplish things that I do not make real time for in my schedule: at best, I can hope for my accomplishments to be proportional to the time that I invest. So my (average) week needs to have time set aside for all of the major ingredients I need in order to be the scientist I want to be: delivering on my current projects, securing funding for new projects, nurturing my collaborations, pursuing strategic technical goals, and service to my professional community. Each of these requires a certain number of hours to make reasonable progress (and my billing and timesheet goals are subsumed within these too), and at this point in my career, I am not too bad at making estimates.

The burnout limit, on the other hand, is about the relationship of my professional life to my marriage, parenting, sleep, friendships, and self-care. Here, I estimate both the number of hours per week that are sustainable without pain, by looking at my "normal" work times, and the "surge capacity" that can be obtained if necessary by neglecting the other aspects of my life and calling in favors from my wife. I know that I most certainly will face surges during the year (e.g., paper and proposal deadlines, technical review meetings, parts of travel that aren't dual-use), and this capacity is also where I can try to catch up following surges in other parts of my life (e.g., a sick child, doctor's appointments, etc.). So I'd better make sure my "normal week" planning is restricted to the sustainable level, or else every surge will be not just a strain, but a serious crisis.

These two collide painfully in the principle of conservation of time. If I want more time to write papers, that means less time for something else. As my responsibilities for management and advising grow, that means my time for doing my own work on programming decreases. I can allocate my time around in many ways, but somewhere, somehow, I will have to say no to things, and conservation of time enforces that dismal fact upon me, forcing me to limit my wishful thinking to something that is more likely to be actually doable.
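The bookkeeping behind these principles fits in a few lines. Here is a minimal Python sketch with entirely hypothetical hour counts (no real numbers are given here): a normal-week plan must fit under the sustainable limit, and moving hours between categories keeps the total fixed, the way conservation of time does. Surge capacity is deliberately left out of the normal-week plan. The function names and the limit are illustrative assumptions.

```python
SUSTAINABLE_HOURS = 45  # hypothetical burnout limit for a normal week

# Hypothetical weekly plan, in hours, using the categories listed above.
plan = {
    "delivering on current projects": 22,
    "securing funding for new projects": 8,
    "nurturing collaborations": 5,
    "pursuing strategic technical goals": 5,
    "service to the professional community": 3,
}


def slack(plan, limit=SUSTAINABLE_HOURS):
    """Unplanned hours left in a normal week; raises if the plan breaks the burnout limit."""
    planned = sum(plan.values())
    if planned > limit:
        raise ValueError(f"planned {planned} h exceeds the sustainable {limit} h")
    return limit - planned


def reallocate(plan, source, target, hours):
    """Move hours between categories; the total is conserved, so gaining one means losing another."""
    if plan[source] < hours:
        raise ValueError(f"only {plan[source]} h available in {source!r}")
    new_plan = dict(plan)
    new_plan[source] -= hours
    new_plan[target] += hours
    return new_plan


print(slack(plan))  # 2
more_strategy = reallocate(plan, "delivering on current projects",
                           "pursuing strategic technical goals", 3)
print(slack(more_strategy))  # 2: the extra hours had to come from somewhere
```

Adding a new category without taking hours from an existing one would eventually trip the `ValueError`, which is the code's version of having to say no.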

I have just finished going through this exercise for planning 2018, and it took 2.5 hours (budgeted in my schema to "self-organization" and "group organization"). The spreadsheet is very complicated, and I know the numbers I have balanced will not entirely match reality, but at least they get me started in a way that does not have predictable failures built in. I don't enjoy doing this, but doing it once a year has turned out to be important for me so far, and it's better than closing my eyes and wishing for something that I know deep down cannot be true.

Physics is painful, eppur si muove.

Thursday, November 16, 2017

Pre-Publication Review: Validity vs. Significance

A fellow researcher was recently telling me about their frustrating experience with a journal, in which their paper was rejected when reviewers said it wasn't "significant," but didn't actually bother to explain why they thought so.

This struck a chord with me, and made me think about the two fundamentally different ways that I see peer reviewers approaching scientific papers, which I think of as "validity" and "significance."
  • "Validity" reviewers focus primarily on the question of whether a paper's conclusions are justified by the evidence presented, and whether its citations relate it appropriately to prior work.
  • "Significance" reviewers, in addition to validity, also evaluate whether a paper's conclusions are important, interesting, and newsworthy.

I strongly favor the "validity" approach, for the simple reason that you really can't tell in advance which results are actually going to turn out to be scientifically important. You can only really know by looking back later and seeing what has been built on top of them and how they have moved out into the larger world.

Science is full of examples like this:
  • Abstract mathematical properties of prime numbers and modular arithmetic turned out to be the foundations of modern electronic commerce.
  • Samples contaminated by sloppy lab work led directly to penicillin and antibiotics.
  • Difficulties in dating ancient specimens exposed the massive public health crisis of airborne lead contamination.

The significance of these pieces of work is only obvious in retrospect, often many years or even decades later. Moreover, for every example like these, there are myriad things that people thought would be important that didn't turn out that way after all. Validity is thus a much more objective and data-driven standard, while significance is much more relative and a matter of personal opinion.

There are, of course, some reasonable minimum thresholds for significance, but to my mind those come down to the question of how a paper relates to prior work. Likewise, a handful of journals are, in fact, intended to be "magazines," where the editors' job includes picking and choosing a small selection of pieces to be featured.

Every scientific community, however, needs its solid bread-and-butter journals (and conferences): the ones that don't try to do significance fortune telling to select a magic few, but focus on validity, expect their reviewers to do likewise, and are flexible in the amount of work they publish. Otherwise, the community is likely to be starving itself of the unexpected things that will become important in the future, five or ten years down the road, as well as becoming vulnerable to parochialism and cliquishness as researchers jockey and network for position in "significance" judgements.


Those bread-and-butter venues are the ones I prefer to publish in, since I am fortunate enough that my career does not depend on shooting for the "high-impact" magazines that try to guess at importance. I'm happy to take a swing at high-impact publications, and I'm happy to support the needs of my colleagues in more traditional academic positions, for whom those articles are more important. My experience with these journals, however, has mostly consisted of being judged as "not what we're looking for right now." So, for the most part, I am quite content to simply stay in the realm of validity and to publish in those solid venues that form the backbone of every field.