Monday, October 26, 2015

An unknowing inheritance: BBN's stop and go history in genetic engineering

When I joined BBN back in 2008, one of the new things I brought with me to the company was my research in synthetic biology.  I was a starry-eyed and naive recent Ph.D. graduate, and it was one of my little exploratory sidelines, which would not expand into a full-scale line of research for another two years, blooming as my AI work slowly withered away into neglect.  Nobody at BBN was even thinking about synthetic biology at the time, and nobody I was working with had any institutional memory of such research being done at BBN before, so I assumed there was none, and for most intents and purposes that was true.

In fact, however, BBN has been a significant player in the work of genetic engineering at least twice before that I now know about.  One of those times I learned about several years ago, and is not the brightest of episodes for the community.  The other, however, I only learned about a short time ago, as I prepared to give a talk on the new SBOL 2.0 standard for encoding genetic designs, and it makes me both proud of my institution's history and amazed that it has somehow dropped out of its memory as an institution.

The nearer and less proud episode was BioSPICE, and it haunts my every step as a non-lab-centric researcher.  As best I understand the project (I was still a grad student chasing strong AI), in the early and heady days of the phrase "synthetic biology," a group of the leading researchers in the field took a run at the big goal of predictable simulation and engineering of organisms, believing they already had a sufficient critical mass of good tools and knowledge to take something like a straight-up electrical-engineering-style approach to the problem.  Thus BioSPICE, a big DARPA-funded project to build the biological equivalent of SPICE, the standard tool for simulating electronic circuits, which started with much fanfare in the early 2000s and much more quietly folded up shop a few years later.  I had known about some of the academic side (quite distantly at the time), but only later came to learn that BBN had been significantly involved in some way---I'm still not quite sure how, since the few stories I've gathered don't seem to correspond to what searching online turns up.  Years later, when I would open my mouth to talk about the promise of model-driven design, it was often BioSPICE that dogged my heels and fueled cries of, "We know that doesn't work, just remember BioSPICE!"  I have fought that history hard, all the way to the last few years, when we've finally been able to start producing evidence that we really can predict biological circuits from their component parts.

The other episode of BBN's involvement with genetic engineering is much older, from long before my time.  Back in 1982, when I was only four years old and the genetic engineering revolution not much older, BBN began development of GenBank, probably the most important repository of biological information in the world.  What is it, and why is it so important?  GenBank stores genetic sequence information: it's where pretty much all the important scientific information about genes and genomes gets stored, one way or another.  BBN, with subcontracting help from another apparently unlikely collaborator, Los Alamos National Laboratory, put it together and ran it for its first few years of existence, as it became established and started gathering information.  Eventually, as it became less a research project in and of itself and its contents became more and more important, it moved to curation by the NIH, who still manage it to this day, as an exponentially growing resource made publicly available to all of humanity.  Perhaps it's not quite as big a deal to have worked on as the internet or email, but pretty close, in my books.

Somehow, though, we seem to have almost entirely forgotten this history as a company.  It's not trumpeted on the list of accomplishments on our front page, nor bragged about in the "history of BBN" materials that people pass around.  The only person from BBN I've been able to find so far who was associated with the project is Howard Bilofsky, who apparently spent 17 years at BBN before leaving in 1990 for a long, distinguished, and apparently ongoing career in the biotech industry.  Someday, I would love to look him up and learn a bit more about the hidden corners of our corporate history.

GenBank and BioSPICE, triumph and failure.  And now a third wave of biology at BBN, with me, trying to navigate these waters once again, as best my limited scientific sight can guide me.

Friday, October 23, 2015

Is academia really just a huge competition?

Another question that really made me think was posed last night on the Academia site of StackExchange, and once again I'd like to share my answer with you, my dear readers.

The question was simple in its essence, yet deep and rather challenging:
Is academia really just a huge competition?
I started writing an answer several times before I finally found a direction where I could really believe in what I was saying.  The result was this statement, which I think reflects some difficult passages of my own over the years, back and forth along the tension between cooperation and competition:
You've asked a question that is both very important and very difficult, as well as one that is likely to draw different answers from different people depending on their own experiences in academia. 
This is because there are both competitive and cooperative aspects to academia. Different people take different strategies with respect to the balance between these two, and that affects their communities as well, so that the mixture of competition and cooperation that you encounter will also radically differ between different academic communities. 
Some of the key factors for inducing cooperation are: 
  • Science is hard.
  • Working together, people can accomplish things that they cannot possibly accomplish alone.
  • Cooperation in a team gives you an advantage when competing with other teams.
  • Many people enjoy working together in teams, and this is just as true for science as it is for any other human endeavor.
  • Scientific discovery feels awesome and it can be really fun to share that feeling with other people.
Some of the key factors for inducing competition are: 
  • Inherent conflict of ideas: when theories compete, people often become polarized and begin competing based on the "team" they support intellectually.
  • Limited resources: you've got a good idea, but a lot of other people have good ideas too, and there is not enough funding to support all of them fully: some people will not get what they want. Likewise, the Hubble Space Telescope can only point at one thing at a time, and there are a lot more things people want to point at than time to point at them.
  • Explicit competition set up by external agencies. For example, DARPA will sometimes make scientists in the same program compete with one another, and the loser gets their funding cut off.
  • Many people are just plain competitive, and want to "win" over other people in various different ways, and this is just as true for science as it is for any other human endeavor.
Bottom line: just like everything else, academia can be a competition, and everyone faces some aspects of a competition. But it's not just a competition, and I feel sad for anyone who experiences it in that manner. 

Sunday, October 18, 2015

Racism, fond memories, and toddler education

As I was reading Harriet her bedtime stories tonight, I was struck once again by a thing that greatly pains me.  Many of my fondest childhood memories are laced with rather awful racism that I simply failed to be aware of.  Case in point: tonight one of the books we read was And to Think That I Saw It on Mulberry Street.  This book is a simple and delightful Dr. Seuss tale of a child's fantasies of what he saw while walking home, building from a simple horse and wagon to a fantastical parade.  And there, on the second-to-last page, is this:
"A Chinese man who eats with sticks" --Dr. Seuss
Apparently, Dr. Seuss thought that Chinese-Americans were just as unusual a freak-show as a man with a 10-foot beard, a magician pulling piles of rabbits from a hat, and two giraffes and an elephant towing a brass band down the street.  And so we get this image, on which I can count at least six blatant pieces of racism.  Worse yet, this is apparently the post-1978 revised edition in which the racism is toned way down: he's a "Chinese man" rather than "Chinaman" and he's no longer wearing a pigtail and painted bright yellow.

OK, I know that Dr. Seuss is well known to have done some awfully racist things over the years (e.g., this cartoon condemning Japanese-Americans during World War II).  I know this.  But it burns me up that I had no idea that this monstrosity was living inside a favorite childhood book.  In other words, what shocks me is not that Dr. Seuss made racist drawings, but that I didn't remember the racism at all.  We bought this book (well, I bought this book) for Harriet quite early on, on the strength of my fond memories, and I was shocked when I got to this point.  I also noticed that the police were Irish and was a little bit dubious about the Rajah riding the elephant.  Not being familiar enough with the subject matter, I wasn't sure if the Rajah was racist or just archaic (like a knight in shining armor or a lady in a wimple), so I asked my wife, who is South Asian.  Her answer?  "Totally racist."

This leaves me with two dilemmas that I struggle with.  First, what does this say about me, to not have known I had such racism in my education?  Clearly there's at least a bit of "fish don't have a word for water" going on.  I did not have this racism called out to me, and thus I didn't realize that it was anything to notice.  It's there in many other things I loved as well, like If I Ran the Zoo (another Seuss), The Jungle Book, and Tintin (oh my goodness, Tintin).  I loved these things and, if I am honest with myself, still do.  My favorite Jungle Book story of all time is "Kaa's hunting," and now I cannot read its descriptions of the Bandar-Log monkeys without wondering if they are allegorical for Kipling's views of India.  Tintin in America is practically hallucinogenic in its kaleidoscope of stereotypes and disrespect for, well, everything, and I still would read it again if I had a copy here in front of me.

And that leads me to the second struggle: do I share these things with Harriet or do I censor them? Mostly, there's an obvious third path that avoids the issue: there are so many good things out there, that I can simply choose to select the ones that I find less problematic.  But what about the ones I find out afterward, like in Mulberry Street? Tonight, I didn't read the line.  I broke the rhyme and went straight to the big magician doing tricks.  Other times, I read it through.  Sometimes, I point things out to her and critique them ("this picture is being mean"), and sometimes I do not.  Mostly, I am uncomfortable and simply shift my strategies back and forth.  I find some of the advice out there about liking problematic media to be useful, but it's not the end of the story and I still have not found peace.

Saturday, October 17, 2015

SBOL 2.0, governance, and Jake's self-perception

This past summer, one of the most significant scientific milestones I've been involved with was the publication of the SBOL 2.0 standard for representation of biological designs.  It's all about being able to better describe and exchange information about the genetic constructs and similar systems that people are trying to build.  Perhaps the best way to describe it is with this diagram I prepared for a talk, comparing SBOL 2.0 to previous standards:

[Diagram: comparing what FASTA, GenBank, SBOL 1.0, and SBOL 2.0 can each represent]

FASTA is about as bare-bones as it comes: pretty much just listing out the DNA sequence that you want.  GenBank lets you annotate that sequence with descriptive information about what the different parts mean, and SBOL 1.0 lets you describe the structure of a design hierarchically in terms of annotated sequences that get combined together as "parts" to make bigger designs.  SBOL 2.0 lets you talk about function as well, describing the way that these parts interact with one another to create the overall behavior of a design.
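
To make that progression concrete, here is a minimal Python sketch of what each level can express about the same hypothetical design (a promoter and RBS driving a repressor gene).  The part names, coordinates, and dictionary layout are purely illustrative assumptions of mine, not the actual file formats or APIs of any of these standards:

```python
# Illustrative sketch only: the same design at each standard's level of
# abstraction, using plain Python data structures as stand-ins.

# FASTA: just the raw sequence, nothing more.
fasta_record = ">my_design\nTTGACAATTAATCATCCGGCTCG..."  # hypothetical sequence

# GenBank-style: the sequence plus flat annotations saying what each region is.
genbank_features = [
    {"type": "promoter", "start": 1,  "end": 35},
    {"type": "RBS",      "start": 36, "end": 54},
    {"type": "CDS",      "start": 55, "end": 700, "label": "lacI"},
]

# SBOL 1.0-style: hierarchical structure, parts composed into larger parts.
sbol1_design = {
    "name": "lacI_expression_unit",
    "subparts": ["promoter_part", "RBS_part", "lacI_CDS"],
}

# SBOL 2.0-style: structure plus function, i.e., how the parts interact.
sbol2_design = {
    "structure": sbol1_design,
    "interactions": [
        {"type": "genetic_production", "template": "lacI_CDS", "product": "LacI"},
        {"type": "inhibition", "inhibitor": "LacI", "target": "some_other_promoter"},
    ],
}
```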

Conceptually, it's fairly simple, but in practice it took several years to work out and the arguments are not yet over.  The document that we produced is more than 80 pages long, and we're still tinkering with bits and pieces as we try to understand all of the consequences of what we've built.

SBOL is heavy on my brain right now because for the past week, I've been at the COMBINE meeting, where the communities for SBOL and a number of other biological standards meet up to try to improve their systems, work on interoperability, etc.

This is still not something I ever thought I would be doing with my life.  Even now, in my prejudiced mind, standards design is still something done by grey little people who care passionately about trivial and boring things.  I struggle with this, because I look at my work in this area and simultaneously feel that it is highly important and mind-numbingly stultifying to anybody who isn't actually in the room arguing passionately about the potential long-term consequences of adding a single arrow to a diagram.

A case in point: one of the things that I'm most proud of this week was the updated governance document I drafted, and my mediation of discussion on this document, which helped tune it to become widely accepted; the updated version now appears well on its way to official approval by a formal community vote.  So, apparently I am proud of work I've done on adjusting the methods for making decisions regarding an experimental standard for interchange of information about biological designs that will allow faster prototyping of improved systems for biomedicine, biomanufacturing, etc.  That's at least five levels of separation from anything that really affects the larger world. Looked at in that light, this is clearly the very definition of obscurity.  And yet, let me spell it out in another way...
  • Good governance, which gets openness, power, and decision-making right, is critically important for the health of a community, and a number of little warning signs have indicated that the SBOL community needed to adjust its governance to match the way the group has developed and grown.
  • If the SBOL community governs itself effectively, then it will make better decisions that are more likely to lead to a useful and effective standard.
  • If the SBOL standard works well, it will make it a lot easier for people to develop good biological engineering tools.
  • Those biological engineering tools will make it a lot easier to safely and predictably engineer with and for living organisms.
  • Used responsibly, those capabilities can help make all of humanity healthier and safer, as well as improving our ability to manage our environmental impact on a global scale.


This nail I've driven in is very small and unimportant, almost certainly, and yet it matters.  It matters a lot, and not at all, all at the same time.  And I suppose that's just the way the world works, on a planet with seven billion interconnected and increasingly technologically powerful individuals.  Our civilization is remarkably strange and obscure in its operation, and I'm glad when I find satisfaction in the parts I play.

Thursday, October 08, 2015

Publication delays ARE aimed at manipulating impact factor!

A few months ago, I wrote a post with a question: Are publication delays aimed at manipulating impact factor?

Today, I have an answer to that question: yes.

A recently published article, "Editors’ JIF-boosting stratagems – Which are appropriate and which not?" (h/t RetractionWatch), investigates strategies that journals have been using to boost their impact factor and explicitly calls out what it calls the "online queue stratagem."  The article is paywalled, so let me summarize here.  In addition to reviewing some of the better-known and clearly unethical practices used by some journals (e.g., forcing citations on authors, citation cartels), the paper carefully dissects the effects of having a long "online early" period of publication, finding four main effects:

  • Papers accumulate citations before "official" publication (multiplying citations by ~1.5 to 2)
  • Citation rates typically peak 3-4 years after publication, so delaying the official date shifts the impact-factor counting window toward that peak (adding another ~50%)
  • Queue order can be manipulated to publish the papers picking up the most citations earlier (adding another ~30%)
  • Calendar-year boundaries mean that papers published in early months count more than papers in later months, so strategic organization of early-month issues can further boost citations (adding another ~30%).

All of this adds up to around a 5-fold potential distortion in impact factor.  Since the dynamic range of most journals is only around 0.5 to 10 anyway, and even the very highest impact factor journals top out at ~50, this renders that most precious number completely useless.
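
For concreteness, here is the compounding arithmetic as a back-of-the-envelope sketch (the midpoint values are my own illustrative assumptions drawn from the ranges above, not figures from the paper):

```python
# Compound the four queue effects described above, using rough midpoints.
pre_publication_citations = 1.75  # ~1.5-2x from citations accrued before "official" publication
citation_peak_shift       = 1.5   # ~+50% from shifting the counting window toward the citation peak
queue_reordering          = 1.3   # ~+30% from publishing the citation magnets first
early_month_placement     = 1.3   # ~+30% from front-loading the calendar year

total = (pre_publication_citations * citation_peak_shift
         * queue_reordering * early_month_placement)
print(f"Potential impact factor distortion: ~{total:.1f}x")
# Prints ~4.4x with these midpoints; taking the top of each range gives just over 5x.
```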

Now, it's possible that many journals aren't deliberately and strategically manipulating their queues, meaning they'll only get about a 2x boost in impact factor from queuing.  So what?  It still means that impact factor is going to be highly distorted and basically good only for sorting journals into three coarse categories: "glamour journal," "normal journal," and "ignored journal" (impact factor below about 0.3).

Ironically, the article itself is dated February 2016: it is sitting in exactly the kind of online-early queue that it describes.

That's it: it's clearly time to adopt the wise strategy of my favorite satire journal, the Proceedings of the Natural Institute of Science.  Their current impact factor?  "Leadership."

Wednesday, October 07, 2015

Scale-free distribution of payoffs in science

One of the things I've been enjoying these days has been answering questions on the Academia site on StackExchange.  This question-and-answer site is part of the vast network of Q&A sites that have flowered out of the wildly successful StackOverflow, which is pretty much the best source for coding help on the internet.  The model is that people ask questions about the topic, e.g., academia, and other folks turn up and provide answers, and then you get or lose Fake Internet Points depending on whether the crowd thinks your answer is a good one.  It's surprisingly effective and also, for me at least, pretty enjoyable and kinda addictive.

Anyway, I answered one this morning that made me think a lot, and I thought that I might share my thoughts here as well.  The question was simple, fundamental, and ill-posed: "What is the distribution of payoffs in research?"  Basically, the person is wondering whether every experiment is a roughly equivalent step forward, or whether some are much more valuable than others, and if so whether there's some sort of power-law relationship between topic, funding, and value of result.

This is ill-posed, because the whole notion of "payoff" is extremely vague and probably the wrong question to ask, but it really made me think.  My response, which I'd like to share with you, was this:

There's a vast amount of ill-definition and uncertainty wrapped up in your question... and yet despite that, the answer is almost certainly yes, there is a power-law distribution.
I'm going out on a limb a bit here, because I'm not building on any published analysis that I'm aware of. However, a little analysis of limit cases and fundamental principles can take us a long way here. Let us start with two simple and relatively uncontroversial statements:
  1. Better experimental design leads to better results. It seems self-evident that if you make a bad choice in designing an experiment, it's not going to get you the interesting results you want. At the micro-scale, some choices are clearly better than others, and some are clearly worse.
  2. Sub-fields appear, expand, shrink, and die. As I write this, CRISPR research is hot, and a lot of people are finding interesting results there, and accordingly that field is rapidly expanding. Nobody is doing research on the luminiferous aether because it's been discredited as an idea. Nobody is trying to prove that it's possible to generate machine code from high-level specifications because Grace Hopper did that in the 1950s, when she invented the compiler, thereby initiating what is now a fairly mature and stable research area.
So clearly, no matter how one defines "payoff," any sane definition will see a highly uneven distribution of payoffs, both at the micro-scale of individual experiments and at the fairly macro level of sub-fields.
Finally, we need to recognize that "significance" is a matter not only of objective value, but also of communication through human social networks. This means that the same result may have wildly different impacts depending on the methods and circumstances of its communication. The history of multiple discoveries in science is ample evidence of this fact; one nice illustrative example is the way in which Barbara McClintock's work on gene regulation was largely ignored until its later rediscovery by Jacob & Monod.
So, we have variation and we have interaction with human social networks, which tend to be rife with heavy-tailed distributions. All of this says to me that it would be remarkable if there were not some sort of power-law distribution regarding pretty much any plausible definition of impact, significance, and investment. For these same reasons, I think it would also be surprising if one can make any more than weak predictions using this information (e.g., "luminiferous aether research is unlikely to be productive," "CRISPR is pretty hot right now").
And the devil, of course, is in the details...
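
Stepping outside the quoted answer for a moment: a quick way to see why citation-style dynamics produce heavy tails is a toy "rich get richer" simulation in the spirit of Price's preferential-attachment model of citation networks.  Everything here (paper counts, reference counts, the 30% uniform-citation mixing) is an arbitrary assumption, for illustration only:

```python
import random

random.seed(42)
N_PAPERS, REFS_PER_PAPER, UNIFORM_PROB = 20_000, 10, 0.3

citations = [0] * N_PAPERS
cited_pool = []  # one entry per citation received, so a uniform draw
                 # from this list is proportional to citation count

for paper in range(1, N_PAPERS):
    for _ in range(REFS_PER_PAPER):
        if cited_pool and random.random() > UNIFORM_PROB:
            target = random.choice(cited_pool)  # preferential: rich get richer
        else:
            target = random.randrange(paper)    # uniform over earlier papers
        citations[target] += 1
        cited_pool.append(target)

top_1_percent = sorted(citations, reverse=True)[: N_PAPERS // 100]
share = sum(top_1_percent) / sum(citations)
# Far above the ~1-2% share that a purely uniform citing model would give.
print(f"Share of all citations held by the top 1% of papers: {share:.0%}")
```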

Saturday, October 03, 2015

Tribute to driving in Germany

I'm sitting in the Frankfurt airport right now, having just finished driving two hours on the Autobahn up from Schloss Dagstuhl, in Wadern near the French border.  As an American, I'm used to fairly titanic road networks, but driving on the Autobahn feels different to me: my impression is that while Americans use our roads, Germans really love their roads.

Out in the gently winding hills and valleys of Western Germany, forests and fields flash past traffic freely flowing at 100 miles per hour.  The gentle curves are well designed to encourage speed, and many people take good advantage of it.  Yes, it's true, on much of the Autobahn system there is simply no official speed limit (though you'll still get pulled over if the police think you're driving dangerously), and even where there is a limit it usually restricts you only down to 130 kph (a little over 80 mph).

In my little bitty economy rental car, I cruised along comfortably at 110 mph or so in 6th gear (don't even try asking for an automatic in Germany), its happy German engineering not making the least complaint about the speed.  At home, my faithful Toyota starts to get very loud and quite unhappy by the time I reach 85.  Even so, happy-looking people in bigger cars rocketed smoothly past me, bound for who knows where at the highest speed available.  And all I know is that my head has got this song on repeat, and I invite you to sing along with me:

Why I love iGEM

Last weekend was the annual iGEM jamboree---that is, the International Genetically Engineered Machine (iGEM) Competition.  I poured about 60 hours into the contest over the course of 3.5 days, and by the end I was exhausted, both physically and mentally, but feeling absolutely elated and on top of the world, raring to go for another one in 2016.

What had I seen, and why was I so excited?  Well, iGEM is a magnificent and unique event, a gathering of students from every continent, from high school on up, all driven by a passion for biological engineering and simply overflowing with creativity.  Each team spends the summer working together on a project that they create, and in the fall they come together to have a big party, where everybody gives talks on what they've done and the best few are recognized in front of everybody for their superlative accomplishments.  There are lots of silly things, lots of over-ambitious ideas that don't get too far, and lots of nice little steps and learning by the students.

And in the middle of it all, some damned good science gets done as well.

Last year, I co-founded a new track at iGEM focused on measurement.  Yes, we're back to that again: my obsession with terribly unsexy rulers.  We had some very good teams last year, and this year again there were a bunch of excellent projects in the measurement track.  And this year, one of those projects stood out head and shoulders above all the rest.

The team from William & Mary, a small but long-standing and excellent public college in Virginia, chose to focus on an important but subtle problem: quantification of noise in gene expression.  Building on recent work in the area, they dug into the problem and ended up with a simple and easy-to-use kit for measuring this noise, then applied it to quantify noise for a few of the most widely used biological components in the iGEM parts registry.  Very deep and very geeky, but it matters a lot.  If we want to have safe and reliable genetic engineering, we need to be able to predict what will happen when we modify an organism, and this strikes right at the heart of that problem by measuring predictability.
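
For readers wondering what "quantifying noise" actually involves, here is a minimal sketch of two standard noise statistics computed from per-cell expression measurements.  The numbers are hypothetical, and this is only my illustration of the general idea, not the William & Mary team's actual protocol (which also has to handle things like autofluorescence and extrinsic cell-to-cell variation):

```python
import statistics

def expression_noise(per_cell_fluorescence):
    """Return two common noise measures for single-cell expression data."""
    mean = statistics.mean(per_cell_fluorescence)
    var = statistics.variance(per_cell_fluorescence)
    return {
        "CV2": var / mean**2,  # squared coefficient of variation
        "Fano": var / mean,    # variance relative to Poisson expectation
    }

# Two hypothetical promoters with the same mean expression but very
# different cell-to-cell variability:
quiet = [980, 1010, 1005, 995, 1012, 998]
noisy = [400, 1800, 950, 1500, 300, 1050]
print(expression_noise(quiet))  # low CV2 and Fano
print(expression_noise(noisy))  # orders of magnitude higher
```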

But that wasn't all: they also worked with their county school system to develop a curriculum for synthetic biology.  It's magnificent, and you can get a copy for free online.  Inside this 80-page document, you can find 24 age-appropriate activities, from "DNA Twizzlers" for 1st graders to "Monster Genetics" a couple of years later (fire-breath is a dominant trait, but cyclopses are recessive), building all the way to adult-level work in high school, like PCR amplification of DNA and bioethics analysis.  The interactions with teachers really show, as the lessons are not only pretty but also give clear goals, a materials list, and an expected cost per student (usually a whole class can be supplied with just a few dollars of groceries or arts & crafts supplies).  Even more remarkably, teachers have already begun enthusiastically adopting it, both throughout their county and in other states and nations.

The William & Mary team gave clear, understated presentations that simply let their work shine through, and the whole community recognized it, first giving them the chance to present as finalists in front of the thousands at the convention center, and finally awarding them the competition's top prize (along with a bunch of others as well).  This simple yet deep set of work comes from a team whose school doesn't even break the top 100 in US News' ranking for biology, and shows the power of careful and thoughtful work in science.  The Washington Post may have been too confused to even mention them, but their university is quite elated, and its staff took the time to understand and write a clear and accessible article about their project.

To me, all of this is a vindication not just of the work I've put into organizing and promoting measurement at iGEM, but of the entire scientific process.  Good things can come from unexpected places, and sharp minds thinking careful thoughts can receive the recognition they deserve.  Yes, there are problems in the scientific world---a great many, in fact---but this is why we must fight to preserve and promote scientific ideals, and to keep making that world more diverse, more inclusive, and more able to recognize and promote the potential to improve our world and make a difference.  This is why I love iGEM, why I'm proud to be involved and of the part I've played in helping to enable this, and why I'll be back again for more in 2016.

William & Mary, iGEM 2015 winners, with the Measurement Track committee