Tuesday, March 26, 2013

A typographical nuisance


If you don't care about grammar or typesetting, you should just stop reading this post right now.  Lord knows that I should probably stop writing it.

I'm not much of a grammar nazi.  But I am somebody who, once I've gotten my nose rubbed in a grammatical issue, can never unsee it again.  For example, during grad school, Hal Abelson taught me once and for all how to tell whether to use "which" or "that": "that" begins a clause with information that is necessary to identify the subject, while "which" begins a clause with "bonus" information, which adds to your knowledge about an already identified subject (see what I did there?).  I have never since been able to use the wrong one or see the wrong one used without it whacking me in the face.  I even mark it on papers I'm reviewing---not that I am such a quibbler that I would ever put that on a review, but I can't help but notice it and mark it.  Similarly, I can't put a comma or period after a closing quotation mark, since it should be enclosed within the quotes.

And here's where Blogger bugs me on a totally trivial matter.  I reflexively type two spaces after a sentence.  Apparently I shouldn't, really, at least according to the god Wikipedia, which declares that this standard has gone by the wayside.  Still, I do.  It got drilled into me back in the monospace era, I'm really not sure how, and that's what I always do.  Absolutely reflexively.  I also tend to end my own lines with a return rather than letting them wrap, which is simply stupid for a typeset world, though that should really be blamed on my reliance on LaTeX as my choice for professional document production---you should see the madness of line lengths and comment structure there in my document source, all the better to maintain organization and keep version control as informative as possible.  LaTeX is also why I use these "---" separators, as they would produce an appropriate-length dash in LaTeX.

But anyway, back to the "two space" thing: when I type two spaces in a row, if it's at the end of a line, Blogger will put the second space onto the next line.  Or maybe it's not Blogger but my web browser, I don't know.  I think it's Blogger, because I know of no other text layout software or WYSIWYG editor in the world that will push whitespace onto another line unless absolutely forced.  It's doubly infuriating that the spaces are handled correctly in the article editor, then render incorrectly once the post is published.  But whoever is wrapping my text: can you please finish parsing the whitespace before you start the next line?

And now, that's way more than ever should be said on a subject this trivial, particularly given that I can't illustrate my gripe as beautifully as Matthew Inman.

Thursday, March 21, 2013

Reviewing with help from Harriet

Paper-writing season is apparently closely followed by paper-reviewing season---not a surprise really, given that my professional service to the communities I care about often includes serving on the program committees of their conferences.  Over the last few weeks, I have reviewed nearly a dozen papers, which can take up a startling amount of time.

I had help, fortunately.  Harriet's not very happy to have her father's attention focused on a piece of paper rather than her, but the paper itself is of great interest.  As I was writing up one batch of reviews, Harriet was sitting near me on the bed, flapping her arms and playing away by herself with great gusto.  I had my pile of papers to finish writing about on the left, and as I finished the first of these, I set it over to my right, to start a pile of completed papers.  A few moments later, I realized that my "done" pile was apparently within grabbing range of our barely-still-sessile baby:

Babies appreciate the kinesthetic properties of the scientific literature.

After that, my course was clear.  As I finished each paper, I turned it over to the local biological shredder for careful destruction.  She pounced on each one with great delight, examining it, crinkling it, and in her joy giving me space enough to finish my task.

For me, though, the situation is a little bit more complicated.  I put in a lot of effort when I review, mostly from the Golden Rule perspective: I want to give the people I review the same sort of depth and fairness that I myself would want in feedback for a paper.  I try to be constructive, too, saying "This specific thing would help the paper in that way" rather than just "The paper is lacking in substance."  But sometimes a paper really tries my patience.

Good papers are a delight to review.  Really bad papers aren't very enjoyable, but at least they're fairly easy, because they are so, so terrible.  The worst paper I have ever reviewed came quite some time ago: a manuscript that had been produced by a clearly mentally deranged person.  In addition to its flaws from a scientific perspective, the text constantly changed color and font, and the "figures" were clip art.  But honestly, I didn't mind reviewing it that much, because the flaws were right there in front of my eyes.

No, the papers that are a true trial for me to review are those that are on the borderline in substance and are also heavily dependent on mathematical formalism.  A wonderful heuristic for mathematical papers that my advisor, Gerry Sussman, once told me is to compare the length of the definitions section to the length of the theorems and proofs.  The higher the ratio of definitions to proofs, the more likely it is that you are dealing with shallow over-formality rather than any sort of significant result.  It's a failure mode that I totally understand: it just feels more "sciencey" to say, "Let B be a purely inertial spherical object whose state S^B(T) at time T is described by a tuple (x,y,x',y'), where S^B(0) = (0,2,10,0), and where y'' = -g." rather than "Consider a ball thrown at 10 meters per second, beginning 2 meters above the ground in normal Earth gravity."  And it's a lot harder to write prose that is both lucidly transparent and scientifically precise.

But I definitely hate reviewing that sort of over-formalized material, because the flaws are never obvious, but are buried in a sea of mathematical notation, from which they must be carefully extracted.  The text is tedious to read, and I'm always worried that I'll be wrong because maybe I didn't understand or remember some turn of notation relevant to what I'm complaining about.

So I do it, and I curse and I sweat, and I simply pray that my own papers are not causing the same reaction in another reviewer at the same time, somewhere on the other side of the world.


Monday, March 11, 2013

The Measurement of Babies Redux

A follow-up to my earlier post, and also apropos of some other recent discussion regarding null hypotheses on paper distributions and the general scientific method: in my earlier post on measuring babies, I stated that we had observed Harriet to be taller than a 30-inch carpentry level, and then wandered off into a discussion of height and weight distributions with respect to infant age.  At her six-month checkup appointment, not all that long after that post, her pediatrician found a much less startling height, somewhere around 28 inches (I can't remember exactly).

I have no doubt that the doctor got the right number---their measurement system is ingeniously simple.  You simply lay the baby down on the disposable paper that gets pulled out to cover the examination table, mark a line tangent to the feet, and then another at the top of the head.  With good hands and a compliant baby, getting those two marks right is easy.  Then measure between the marks: the length of any normally growing baby is large enough relative to any likely sideways displacement that the distortion from lying at an angle should be quite small (my quick-and-dirty estimate is that at Harriet's height, a 1-inch sideways displacement should give less than 1% error).  It's imperfect, but pretty damned good.
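(For the curious, here is my own back-of-the-envelope reconstruction of that estimate, so take it as a sketch rather than gospel: if the head ends up displaced sideways by a distance d relative to the feet, the distance between the marks along the table is sqrt(L^2 - d^2), which is approximately L(1 - d^2/(2L^2)), so the relative error is roughly d^2/(2L^2).  With L = 28 inches and d = 1 inch, that works out to about 1/1600, or 0.06%, comfortably under 1%.)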

So, what about our earlier measurement of 30 inches?  As with most surprising experimental results, it boils down to simple experimental error.  Not quite so dramatic as accidentally finding particles moving faster than the speed of light, but then we're dealing with a much smaller-scale and much more poorly controlled experiment.

What could have caused it?  Remember, Harriet was playing with the level, so we weren't exactly dealing with a stable instrument.  I certainly didn't look to see whether the level was actually level, so it may have been leaning somewhat.  I may also have suffered from some degree of an optical illusion since I was looking downward, with first Harriet and then the level further from me.  I may have counted some of her fluffy hair without realizing it.  She was being partially supported by me, as she worked on her great (and slightly premature) ambition of standing, so she may well have been stretching upward in some way.  At the end of the day, though, if an error doesn't persist, it's probably not worth trying to investigate its causes, since they are likely to be transient and, frankly, boring.

One of the most important lessons of science, I think, is embedded in this experience: most things that appear extremely unusual actually are not.  Instead, most compellingly unusual things are the result of some combination of happenstance and circumstance, and our cognitive bias for noticing unusual things plucks them out of the background noise and throws them into stark salience.  For example, I can remember quite clearly the circumstances of Harriet playing with the level, but can't remember just what the doctor actually measured.  It's not a mistake to pay attention to unusual-seeming things: certainly, it has been evolutionarily adaptive for our species, and still is.  But it's equally important to remember that our unusualness detectors are tuned up so high that they give us constant false positives, and that is because those few circumstances where there really is something there make it all worthwhile.  Sometimes it saves us from a stalking leopard or drunk driver. Other times, it is as in the quote attributed to Asimov: "The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I found it!) but 'That's funny ...'"  It's just that finding one "That's funny..." requires going through a rather large number of places where it turns out there's nothing there after all.

So, an interesting lesson in the banality of experimental error and the importance of proper metrology (which appears to be one of my current favorite scientific concepts, thanks in no little part to my ongoing work in synthetic biology).  With regard to the measurement of babies, however, in the end the same judgment applies: we have a long, trim baby (though she's coming more into standardized proportion, which of course doesn't mean a damned thing).

Monday, March 04, 2013

SemiSynBio

As I mentioned recently, just about immediately upon going remote for a month, I had to fly right back to Boston. That was the Thursday before last, when I narrowly escaped the Doom of O'Hare on my way to a curious workshop. This event, SemiSynBio, was an invitation-only gathering organized by the SRC, an organization that essentially acts as a research arm of the semiconductor industry. They had brought together a group of people to discuss semiconductors, the engineering of biology, and how the two might fit together in the future.

From my perspective, there were three main strands of discussion:

  • DNA used as a nanotechnological material for fabrication, memory, or computation.
  • Integrated systems of semiconductors and biological organisms, where the biology does the chemistry (sensing, actuation, power, etc.) and the semiconductors do the information processing and decision making.
  • Engineering of biological organisms (generally single-celled), where both the computation and the chemistry are self-contained within the organism.

Could biology provide the new substrate to keep Moore's law alive for another decade or so? Could the design techniques the semiconductor industry uses to manage incredibly massive systems be adapted to cope with the tangled complexity of evolved organisms?  I don't think there were any answers yet, but it made for a good conversation, and something interesting may come of it...

My own talk was squarely in the third area, presenting the work we've done on biological design automation.  If you look at those slides, you'll get the first public peek at some truly large circuits produced by the Proto BioCompiler.  Not that we can even plausibly build those any time in the next few years, but they're well within the possibilities of eukaryotic cells... and the structures produced by the optimizing compiler are intriguingly difficult to interpret and reminiscent of some of the tangles in naturally occurring gene regulatory networks... I look forward to digging in and seeing if there's anything there...

Saturday, March 02, 2013

Is Paper-Writing Season Real?

As I mentioned in my last post, one of the things I just struggled my way through was a fierce batch of paper deadlines.  All told, there were eight paper deadlines in less than a month, meaning that even with excellent and responsible co-authors, and after triaging two papers, I still had a rather intense several weeks.

I feel like this sort of "paper-writing season" happens to me on a regular basis.  Certainly, every year around January/February feels like a time of madness, and there are other similar pockets of crunch time that show up at other times, though perhaps less consistently.  But is this phenomenon real, or just an artifact of my own time management and retrospective view on the matter?  Being a sucker for an occasional graph, and certainly for the ability to procrastinate on reviewing papers a little bit more, I made a list of the past year's worth of deadlines, both for conferences and workshops (which are generally regular in when they occur) and for journals and book chapters (which are generally irregular and presumably independent).  It looks like this:

Jake's Paper Deadlines, Mar. 2012 - Feb. 2013
This includes both the deadlines where I actually submitted something and the deadlines for conferences that I persistently care about and track but did not actually submit to (e.g., those triaged submissions from last month).  The journal and book chapter deadlines include all of the revision deadlines as well, so each of those publications contributes 1-3 deadlines to the collection, depending on how many iterations happened and how many fell within the sample period, given the months- to years-long time scale for journal review and revision.  I didn't include the camera-ready deadlines from conferences and workshops, since the level of revision required for those is generally much more lightweight, and a few are even abstract-based, requiring no revision at all.

The verdict?  Well, let's see... I don't usually look for statistical significance in data sets this small or this poorly controlled, so it's going to take a little bit of work to figure out.  Usually I'm dealing with excessively large numbers of data points or nice tight distributions, and if I even have to ask whether a difference is significant, then the result is probably of too poor a quality for me to use in any case.  But the search for low p-values is practically a rite of passage in most disciplines, so I guess it's about time that I went on a p-value fishing expedition of my own.

Matlab's built-in easy-bake significance-testing functions all seem to assume Gaussians, rather than the case we should be considering here, which is a uniform random distribution over the months of the year.  So it's off to spend a little quality time with Wikipedia, which has an excellent article giving pretty much exactly what I need.  After futzing around with the numbers for a while, I think I've managed to calculate things correctly... and it's surprising just how primitive and easy to screw up these tests are.  Bottom line, though, I think I've got my numbers correct, and they are giving me the following: conference deadlines are distributed randomly throughout the year (p=0.41) and journal deadlines are significantly non-random (p=0.018).
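(For anyone who wants to futz along at home, here is a rough sketch of the sort of calculation involved, written in Python rather than Matlab, and with made-up monthly counts rather than my actual data.  I'm assuming a plain Pearson chi-squared goodness-of-fit test against a uniform distribution over months, which may or may not be exactly the recipe the Wikipedia article describes.)

    # Sketch: are deadlines uniformly distributed across months?
    # The counts below are hypothetical placeholders, not the actual data.
    from scipy.stats import chisquare

    # Deadlines per month, Mar 2012 - Feb 2013 (made-up numbers)
    monthly_counts = [1, 3, 1, 0, 1, 1, 0, 1, 1, 0, 3, 3]

    # chisquare defaults to a uniform expected distribution over the categories
    stat, p_value = chisquare(monthly_counts)
    print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")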

That first result is somewhat surprising, but I believe it.  Despite the occasional hell that is January/February, with six deadlines in two months, the actual month-to-month variation just isn't that high.  The second is a good example of why you should never believe a p-value without interrogating it fiercely.  You see, it happens that last year we submitted two papers to the same special journal issue in April.  If I drop just that single duplication, knocking the journal count for April from five down to four, the result becomes p=0.19, an order of magnitude worse on the magical significance scale.  If you wanted, you could say that the significance test was doing exactly its job, detecting a genuinely non-random correlation; I would say, however, that just a little bit of noise (a single doubled deadline) was enough to completely mess with our ability to ask the real question (are paper deadlines randomly distributed?), and I persist in my stance that any effect that requires a significance test to see is a pretty weak effect, scientifically.
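(To make that fragility concrete with the same sketch as above, still using placeholder counts rather than my real numbers: dropping a single deadline from one month and re-running the test is all it takes to watch the p-value lurch around.)

    # Continuing the sketch above: drop one deadline from April
    # (April is index 1 in the Mar-Feb ordering; counts are still hypothetical)
    monthly_counts[1] -= 1
    stat2, p_value2 = chisquare(monthly_counts)
    print(f"after dropping one deadline: chi-squared = {stat2:.2f}, p = {p_value2:.3f}")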

The bottom line for this investigation, then, is that it appears that a) there's no point in speculating about interesting external effects that might cause my deadlines to bunch up, and b) just because the clumps come randomly doesn't mean they aren't likely to be seriously intense if I don't prepare for them well in advance, and that's hard to do when journal revisions are part of the mix.  There is a conference paper-writing season for me, it comes at the beginning of the year, and just a few randomly occurring journal interactions are likely to be enough to tip it over the edge from intense to excruciatingly stressful.

Do you have a paper-writing season, or a similar deadline-fest, in your own lives, dear readers?