Jake Beal's Next Step: 2017

Thursday, November 16, 2017

Pre-Publication Review: Validity vs. Significance

A fellow researcher was recently telling me about their frustrating experience with a journal, in which their paper was rejected when reviewers said it wasn't "significant," but didn't actually bother to explain why they thought so.

This struck a chord with me, and made me think about the two fundamentally different ways that I see peer reviewers approaching scientific papers, which I think of as "validity" and "significance."

"Validity" reviewers focus primarily on the question of whether a paper's conclusions are justified by the evidence presented, and whether its citations relate it appropriately to prior work.
"Significance" reviewers, in addition to validity, also evaluate whether a paper's conclusions are important, interesting, and newsworthy.

I strongly favor the "validity" approach, for the simple reason that you really can't tell in advance which results are actually going to turn out to be scientifically important. You can only really know by looking back later and seeing what has been built on top of them and how they have moved out into the larger world.

Science is full of examples like this:

Abstract mathematical properties of arithmetic groups turned out to be the foundations of modern electronic commerce.
Samples contaminated by sloppy lab work led directly to penicillin and antibiotics.
Difficulties in dating ancient specimens exposed the massive public health crisis of airborne lead contamination.

The significance of these pieces of work is only obvious in retrospect, often many years or even decades later. Moreover, for every example like these, there are myriad things that people thought would be important and that didn't turn out that way after all. Validity is thus a much more objective and data-driven standard, while significance is much more relative and a matter of personal opinion.

There are, of course, some reasonable minimum thresholds, but to my mind that's all about the question of relating to prior work. Likewise, a handful of journals are, in fact, intended to be "magazines" where the editors' job includes picking and choosing a small selection of pieces to be featured.

Every scientific community, however, needs its solid bread-and-butter journals (and conferences): the ones that don't try to do significance fortune telling to select a magic few, but focus on validity, expect their reviewers to do likewise, and are flexible in the amount of work they publish. Otherwise, the community is likely to be starving itself of the unexpected things that will become important in the future, five or ten years down the road, as well as becoming vulnerable to parochialism and cliquishness as researchers jockey and network for position in "significance" judgements.

Those bread-and-butter venues are the ones that I prefer to publish in, being fortunate enough that my career is not dependent on having to shoot for the "high-impact" magazines that try to guess at importance. I'm happy to take a swing at high-impact publications, and I'm happy to support the needs of my colleagues in more traditional academic positions, for whom those articles are more important. My experience with these journals, however, has mostly just been about being judged as "not what we're looking for right now." So, for the most part, I am quite content to simply stay in the realm of validity and to publish in those solid venues that form the backbone of every field.

Wednesday, October 18, 2017

Professional Life Transition

I haven't posted anything for a while, and I'd like to talk about the reasons. I'm going through an interesting professional life transition right now, and as I've been working on coping and adapting, one of the things that has fallen through the cracks is my online writing. As I am starting to stabilize again, however, I'm feeling inspired to write and would like to share some of my thoughts and experiences with you, dear readers.

I find that a useful way for understanding how my professional life has recently been evolving is Latour's cycle of scientific credibility. I explored this in more detail in a prior post, but it may be simplified to relations between three primary "currencies" of credibility: data can be invested to develop publications, publications invested to develop funding, and funding invested to develop data.

A researcher always needs to be tending to all parts of the cycle at least to some degree. At different points in a project or in one's professional life in general, however, the emphasis and available resources may shift around. For the past few years, I had been very heavily invested in the data and publications portions of the cycle, getting stuff done as part of a number of delightful collaborations and as a byproduct demonstrating that the ideas and approaches I've been advocating are capable of providing some real value.

Across the course of this year, that has resulted in several really fun new projects kicking off (which I intend to share with you as I come back to writing once again), and me needing to spend more time coordinating with the folks I'm working with. So these days, in addition to my existing external collaborations, I'm working in partnership with an amazing super-experienced program manager (one of the big benefits of my niche in the scientific world), growing my group, and ramping up a number of other folks on these projects.

This is all good, but it's a significant transition, and I've needed to shift around a bunch of my personal heuristics in how I organize my work life. For example, I have to be less of a perfectionist and control freak when I need to be delegating a larger fraction of the work on a project. I have also had to accept that I can't write most proposals in LaTeX any more.

Going through a transition is always intense for me, but I feel fortunate that this has been a good and joyful one so far.

Sunday, July 30, 2017

Mantra: a trip down memory lane

I woke up this morning with bright plans to be productive and focused and accomplish various things. Instead, I have spent the morning on a delightful trip down memory lane.

Way back in high school, more than 20(!) years ago, my friends and I made a video game called Mantra. It was a short, fun freeware adventure with a Zelda-like feel and a bunch of obscure jokes (my favorite was a villager who said: "Godot is coming, please wait"---we got so much tech support mail asking us how long you needed to wait before Godot showed up). It was a lot of fun, actually got kinda popular, and probably helped to get me into my college of choice, and then I would forget all about it for years at a time.

This morning, I was reminded again when I found a link shared by my friend Ben to a person who'd done a wonderful play-through on YouTube with commentary. There's a whole six-episode series, and very well done, and I totally blew all my early morning time-to-myself watching it and indulging in a couple of bucketloads of nostalgia.

Even more amazing to me, Mantra apparently got a page on TVTropes too! OMG, my fanboy self totally squees! There is something incredibly amazing to me about seeing the Internet dissect my work and identify the tropes, just as they do with my favorite pieces of media.

It's been a nice, if unproductive, morning.

Wednesday, July 12, 2017

Why gene expression has a log-normal distribution

In a new paper just out, Biochemical Complexity Drives Log-Normal Variation in Genetic Expression, I explain a biological mystery: why do log-normal distributions keep showing up in gene expression data?

Anybody who's spent much time looking at gene expression data has probably noticed this: lots of distributions tend to have nice bell-curve shapes when plotted on a log scale. Consider, for example, a few samples of a gene being repressed by various levels of LmrA:

Some typical distributions taken from the Cello LmrA repressor transfer curve, all approximately log-normal

In short, these distributions are approximately log-normal, though they might also be described by one of a number of similar heavy-tailed distributions like the Gamma or Weibull distributions. Indeed, the typical explanation for gene expression variation has been that it's a Gamma distribution, based on the underlying randomness of chemical reactions causing stochastic bursts of gene expression.

What kept bugging me about that explanation, though, is that it just doesn't fit what we know about how gene expression actually works. If it's basically about randomness in chemical reactions, then as expression gets stronger, the law of large numbers should take over and the distributions should get tighter. Think about it like flipping coins: when you flip a few coins there's a lot of variation in how many come up heads and how many come up tails, but when you flip lots of coins it always comes out pretty even. But in most cases we deal with in synthetic biology, that just doesn't happen. Consider, for example, the distributions of LmrA above: the high and low levels of expression are just about as wide, even though one's nearly 100 times higher than the other.
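The coin-flip intuition can be made concrete with a small sketch. It assumes fair, independent flips and uses the standard binomial formulas; the function name `relative_spread` is just an illustrative label. The point is that the spread relative to the mean shrinks as 1/sqrt(n), which is the tightening that the gene expression data fails to show.

```python
import math

def relative_spread(n_flips, p_heads=0.5):
    """Standard deviation divided by mean for the number of heads in n_flips.

    Assumes independent flips, so the count of heads is binomial.
    """
    mean = n_flips * p_heads
    std = math.sqrt(n_flips * p_heads * (1 - p_heads))
    return std / mean

# More flips means a tighter distribution relative to its mean.
for n in (10, 1000, 100000):
    print(n, relative_spread(n))
```

Going from 10 flips to 100,000 flips cuts the relative spread by a factor of 100, so a hundredfold change in expression driven purely by this kind of noise should visibly narrow the distribution.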

Instead, the answer turns out to be a beautifully simple emergent phenomenon. Gene expression is a really, really complicated chemical process. Most of the time, we don't pay attention to most of that complexity because we're not attempting to affect it, just use it as a given. But that complexity means we can describe gene expression as a catalytic chemical reaction whose rate is the product of a lot of different factors. And the same Central Limit Theorem that tells us that coin flips should make a nice bell-shaped normal distribution also says that when we multiply a lot of independent positive random factors, the product should tend to a log-normal distribution, because the logarithm of a product is a sum of logarithms.
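Here is a minimal simulation sketch of that multiplicative argument. Everything in it is an assumption chosen for illustration, not the model from the paper: 30 factors, each drawn uniformly between 0.5 and 1.5, with hypothetical names `simulate_rates` and `skewness`. It compares how lopsided the simulated rates are on a linear scale versus a log scale.

```python
import math
import random
import statistics

def simulate_rates(n_samples=5000, n_factors=30, seed=0):
    """Each simulated rate is a product of independent positive random factors."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_samples):
        rate = 1.0
        for _ in range(n_factors):
            rate *= rng.uniform(0.5, 1.5)  # illustrative factor distribution
        rates.append(rate)
    return rates

def skewness(values):
    """Population skewness: near 0 for a symmetric bell curve."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

rates = simulate_rates()
log_rates = [math.log(r) for r in rates]
print("skewness on a linear scale:", skewness(rates))
print("skewness on a log scale:", skewness(log_rates))
```

On a linear scale the simulated rates have a long right tail, while their logarithms come out close to symmetric, which is the signature of an approximately log-normal distribution.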

This has a few different implications, but the most important ones are these:

When you are analyzing gene expression data, you should use geometric mean and geometric standard deviation, not ordinary mean and standard deviation.
When you plot gene expression data, you should use logarithmic axes, not linear axes.

Any discussion of gene expression data that does otherwise, without good reason, will end up with distorted data and misleading graphs. In short: welcome to a brave new world of geometric statistics!
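As a rough sketch of what geometric statistics look like in practice, the snippet below computes a geometric mean and geometric standard deviation by taking ordinary statistics of the logarithms and exponentiating. The sample values are made up for illustration, and `geometric_stats` is a hypothetical helper name.

```python
import math
import statistics

def geometric_stats(values):
    """Return (geometric mean, geometric standard deviation) of positive values."""
    logs = [math.log(v) for v in values]
    geo_mean = math.exp(statistics.fmean(logs))
    geo_std = math.exp(statistics.stdev(logs))  # sample std of the logs
    return geo_mean, geo_std

samples = [10.0, 100.0, 1000.0]  # made-up values spanning two orders of magnitude
geo_mean, geo_std = geometric_stats(samples)
print("arithmetic mean:", statistics.fmean(samples))
print("geometric mean:", geo_mean)
print("geometric std (a fold-change factor):", geo_std)
```

For these values the arithmetic mean is 370, pulled toward the largest sample, while the geometric mean is about 100 with a geometric standard deviation of about 10-fold, which matches how the data looks on a logarithmic axis.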

Friday, May 26, 2017

Communication

Last night, on my way back home from a scientific meeting, I received my first ever coherent email from my nearly-five-year-old daughter, written all by herself from her own email account as she was getting ready for sleep. Just three short sentences, complete with misspellings and in her own inimitable style, but it was the defining moment of my night, and struck me much harder than I expected.

I have saved her email in a permanent location. The content is unimportant: what matters to me is the vista of communication it opens up. I am overjoyed and frightened as my little one begins to dip her toe into the great river of human knowledge and communication. From this moment, she begins to tie herself into the much larger world beyond our home and family, her friends and her school. Now I can start to write to her directly when I travel, to send her the pictures I take and stories I write for her while I am away.

And it's also time to start talking about information safety and privacy. Knowledge, consent, boundaries. Notice, for instance, that I have not actually shared the content of her email, because I feel those are not my words to share. Just like with other big issues, like sex and relationships, my belief is that these conversations need to start happening, at the age-appropriate level, long before they are likely to start becoming critical.

I am excited and scared, and it is wonderful and terrifying. Just like so many other parts of parenting.

From the email Harriet was responding to: her stuffed animal representative on the trip and me, all sweaty from a long terminal-to-terminal run to catch a plane.

Monday, May 01, 2017

Explaining CAR T-cell therapy with marshmallows

Last week, I had fun giving a guest lecture at my daughter's preschool on some cutting edge synthetic biology research. Part of what made it so fun was figuring out how to communicate the essence of the subject on an appropriately comprehensible level.

My daughter's class has been learning about the body, things like muscles and bones and the heart and blood. One day a few weeks ago, she came home bubbling with excitement about having made blood out of candy that day: into some diluted corn syrup (plasma), they put mini-marshmallows to be white blood cells, red cinnamon candies to be red blood cells, and sprinkles to be platelets. I thought this sounded awesome, and it inspired me to build on that for a lesson about CAR T-cell therapy.

For this lesson, you will need white, pink, green, and orange mini-marshmallows, food coloring, and toothpicks. The white marshmallows are white blood cells, the pink ones are healthy cells, the green ones are germs, and the orange ones are cancer cells.

Dip the toothpicks into the food coloring, then poke them into marshmallows to make patterns of three colored dots on the marshmallows. Put patterns on the marshmallows as follows:

Give all of the pink healthy cells the same pattern.
Give the orange cancer cells a pattern that's almost the same as the healthy cells---but with one difference.
Give the germs patterns that are quite different from the pink healthy cells.
Give the white blood cells patterns that match germs, but not the healthy cells or cancer cells.

Remember that marshmallows can get flipped around, so "red-red-blue" is effectively the same as "blue-red-red"!

You should now have a bunch of marshmallows with patterns on them. The lesson goes like this:

All cells have patterns of chemicals on their outsides (Show some patterns).
White blood cells tell which cells are diseases by matching patterns (Show some white blood cell patterns).
A white blood cell leaves your healthy cells alone because they don't match (Show a white blood cell not matching a healthy cell).
The white blood cells learn the patterns of diseases and when they match the germs (Show a white blood cell matching a germ), they kill the germ (Eat the germ marshmallow).
But cancer cells are tricky, and sometimes their patterns are too close to healthy cells for the white blood cells to learn their patterns (Show how the cancer cell and healthy cell patterns are similar).
But now there is a new type of medicine people are trying to make work, where we can take some white blood cells out and teach them a new pattern to recognize (Take a white blood cell and mark it with the cancer pattern).
Now we put the white blood cells back in, and they recognize the cancer (Show how the pattern matches now) and kill it! (Eat the cancer marshmallow).

That's CAR T-cell therapy in a nutshell in 5-7 minutes, minus all the details and the cautions and concerns. I had a great time teaching this class, and these 3-5 year old kids asked really good questions, like "Does everybody have white blood cells?" and "How do you teach the cells the patterns?" so I think they learned.

And as I was writing this, my daughter arrived home, bringing a heartmeltingly lovely thank you card her classmates had made.

I think that her class got it. Science communication win!

Monday, April 24, 2017

The edge of science is never far away

My preschooler daughter started her bedtime routine rather late tonight, because I am always a sucker for certain types of questions. Tonight, she asked about how our lungs move air in and out. That led to how the heart works, then how muscles work, which zoomed in and in through fiber bundles to individual cells, fibers within the cells, and actin and myosin.

Each picture or diagram came with a "And what is inside that part? [point]" until we were looking at an actin protein, then a molecule of ATP, an oxygen atom, and finally the stark and simple table of the standard model itself. What's inside an electron? Nobody knows, or even if that question really makes sense. We know somebody who's working with CERN on the Higgs. Quarks have really funny names. We're out at the edge of science and I'm grinning and telling her that when she grows up, she could be a scientist and help try to find out the answers to all of these questions.

We know so much about our world, remarkably much, but the nearness of the edge of science continues to exhilarate me. It doesn't take many questions to get you out there, and the path is simpler than many realize. Our children can walk it easily, if we do not discourage them and if we smile and appreciate the "I don't knows."

After Harriet gets out of her bath, we're going to omit the usual bedtime story and watch "Powers of Ten" instead. I'm looking forward to it.

What it takes to do an interlaboratory study

In another step of my ongoing quest to make synthetic biology engineering simpler and more reliable, my collaborators and I are starting another big interlaboratory study focusing on precise measurement of fluorescence. We're now in the very nervous part, where all of the samples of material that everybody helping out with the study is going to measure have just been shipped out, and I'm hoping that the numbers that come back will be nice and tight, just like the preliminary study showed.

It takes a lot of work to put a study like this together---much more than I would have anticipated before I started doing this sort of thing. We've spent several months figuring out how exactly we want to run the experiment, and documenting it all as precisely as possible in order to make sure everybody does it the same way. Then my colleague Nicholas at MIT spent quite a bit of time over the past 24 hours preparing 875 sample tubes and packing them into boxes. As Nicholas put it: "On a completely unrelated note my lab is currently low on Eppie tubes."

Nicholas DeLateur preparing samples for shipment.

One step at a time, of such careful and unglamorous work, does science and engineering move forward, and I am grateful for all of the people I have found who understand its value and join in working together on such steps.

Thursday, April 20, 2017

Reducing DNA context dependence in bacterial promoters

Swati Carr's work on insulating promoters is now out as an article in PLOS ONE, with me and Doug Densmore. I've talked about this work before, and I'm very happy to have been involved in it as something that I consider a very solid piece of engineering.

Basically, promoters are the "control switches" that determine how much a gene is expressed, and which other chemicals in the cell can regulate that expression. The problem is, at least in bacteria, that the promoters we usually use are extremely sensitive to what you put in front of them---even to the point that the tiny "scars" left in the DNA sequence from stitching genes together can have a radical effect on their operation. With Swati's method, Degenerate Insulation Screening (DIS), we now have a simple "shake and bake" engineering method for insulating these promoters, which works very well to make a promoter behave consistently, despite changes in what is placed in front of it.

Let me illustrate it very simply, with two images that I suspect will be clear to even a non-synthetic-biologist. In these pictures, the green and red bars show the behavior of two genes in a bunch of different variations of a small circuit. The more similar the bars are, the better, because it means the genes are behaving more reliably.

In short, this is your circuit:

This is your circuit without DIS:

Any questions?

Thursday, April 06, 2017

Grace on Biology

I just made this image for a talk introducing ideas in synthetic biology to programmers, and cannot resist sharing. Bonus points if you recognize one of my scientific heroes without help. Images courtesy of Wikipedia.

Thursday, March 09, 2017

Predatory publishers "cloaking" themselves with Editorial Manager

It appears that low-quality "predatory publishers" are now using the well-known and trusted Editorial Manager software to try to make themselves appear more like legitimate scientific publications. So far, the company that runs Editorial Manager is effectively supporting this practice by declining to exercise due diligence in their business partnerships. I have a guest post up on "Retraction Watch" that gives all the details.

Sunday, February 12, 2017

Zen Walks

We're having unusually beautiful weather for February right now, so it's a good time for going out on zen walks. That's my name for a way that I walk when I want to go get in touch with myself, or when I really need to think something over.

In its essence, a zen walk is quite simple: put on your shoes, go outside, and begin walking. As you go, make no decisions about which way to go, but simply follow where your feet are taking you. At every intersection (and sometimes between them also), take the motions that feel the most natural in that moment. Let go of your image of where you might be going to, so that you do not just follow a habitual path, yet also let go of the image of avoiding your habitual paths, which also constrains your movements. Go, let go, and find out one step at a time where you are going.

Zen walking is without a goal, except for being there, and it brings me space to find out what I care about and what is on my mind. I am not a person to whom contentment and satisfaction come naturally, though there is surely enough in my life that I "should" be happy about. It is far too easy for me, though, to become caught up in "should" and "ought" and lists that I make for myself that generate unhappiness in their lack of completion. A zen walk is one of my tactics for letting myself re-discover that fact and finding my way back to which things I truly care about and why.

Friday, February 10, 2017

Protein Engineering Diagrams

We've got a new paper that's just been accepted, working toward extending the SBOL visual diagram language to be able to describe the engineering of proteins as well as DNA and RNA. The core driving force behind this effort has been Sid Cox, who's done a good bit of work in the area and has had the courage to make this first surely-imperfect proposal, with a number of others of us helping critique, refine, and bend things towards compatibility and integration.

The idea behind the language is surprisingly simple: despite the ferocious complexity of how proteins fold and interact, when we engineer with proteins our actions can often be described much more simply. Proteins, particularly in complex eukaryotic organisms, are often quite modular, with specific domains controlling things like where they go in a cell, what they interact with, and how they decay. These are, in turn, laid out along an initial single line of amino acids (and encoded in DNA or RNA), and can often be recombined by mixing and matching these components. Doing that isn't simple, but explaining what you have done and why often can be fairly simple.

That's what our new diagram language aims for. Each protein in a system is represented by a line decorated with glyphs representing structured (oval) and unstructured (line) regions, membrane domains (zigzags), binding domains (open boxes), etc. With a brief glance, you can get a pretty good idea of what the protein or protein system is supposed to do and how it's supposed to do it.

Diagram for a two-protein design that provides light-inducible programmed localization to the cell membrane.

This is by no means a finished product, but it's a good solid start. Now that we've got a proposal, people can start critiquing it, and we can start working on various tweaks and philosophical debates necessary to get it integrated with the other diagram standards already in place, like SBOLv. This won't be fast, but it should hopefully produce a reasonable consensus on how to describe what's currently typically just shown as all sorts of random ad-hoc blobs.

If your institution permits, you can see the paper where it's been accepted at ACS Synthetic Biology, or you can read a preprint, and you can also play with the associated online diagram software.

Monday, January 09, 2017

Now witness the true power of measurement!

Today, one of my friends and colleagues sent me the following image. I am flattered, and notice that I am apparently picking up some kind of reputation, but I am entirely unsure just how to interpret it. Dear readers, would you be so kind as to provide your own commentary?