Monday, December 31, 2012

The Year in Review

Good afternoon, dear reader, and welcome to the end of 2012.  Unless, of course, you live somewhere quite a ways to my East, and you've already entered 2013.  And, of course, this will be up on the Internet for all eternity, so a priori any reader of this is unlikely to still be in 2012.  But brushing all that foolishness aside, I am certainly still in 2012 right now, and I think I'll take this opportunity to look back over the year from a scientific perspective.

Let's be organized about it, and look at things in terms of different aspects of life as a scientist:
  • Research Projects: the core of it all, carving truth from the substance of the world
  • Publications: the primary product of research, and place to draw research threads together
  • Funding: powerful amplifier of research, yet generally a trailing indicator of one's impact
  • Position: one's institution and position within that institution affect opportunity greatly
  • Impact: what difference one's research makes to others in the world
  • Professional Service: organizing, reviewing, supervising students, etc.
  • Work/Life Balance: that portion of life as a scientist which is not being a scientist
Not all of these need to advance every year, but in a healthy career, at least something significant should be happening in most categories.

Looking back over my own past year, the biggest change by far is in the area of work/life balance.  I've been running pretty hard for some years now, and since my wife is a scientist as well, "work/life balance" sometimes meant things like "let's sit all snuggled up on the couch while we work on our laptops."  In July, that changed irrevocably, with the birth of my daughter.  Now I live by an ironclad rule: from the time I get home to the time she goes to bed, I do not work, but spend time just being a parent to my child.  More than anything else, this means that I am having to give up perfectionism, and the notion that I can do it all and have it all.  My lack of effective triage has been slowly grinding me into dust, and with Harriet's arrival it has accelerated to the point where I can no longer pretend.  My goal now is to be only 80% of perfection.  This is extremely difficult, but feels doable---I suppose it is my New Year's Resolution.  Ask me at the end of 2013 how it has gone.

The other big news for me this year is in scientific publications, with four major journal articles and two book chapters, besides the usual collection of conference and workshop publications.  Those journal articles and book chapters loom larger than usual in my view, because of their contents: this is the year when we reported major results from my first funded project in synthetic biology, and in spatial computing we published two key formalizations of space/time computation (one for continuous space/time and the other for discrete), and a massive review of spatial computing programming languages.  Overall, it's been a very good year, and there's more in the pipeline from my ongoing research, so I feel very secure about my scientific base.

Funding's been much more of a mixed bag, but I'm still alive, and I'll just keep my fingers crossed on the proposals that are outstanding.  Position is a no-op (as one usually expects), and impact is hard to evaluate (will my energy work escape the lab?  Only time will tell), though Google Scholar indicates a significant uptick in my citations, which is always nice.

In the world of service, I am graduating a co-supervised PhD student, as I reported in this post.  The rest is pretty standard: we put out another special issue on spatial computing, and I'm continuing to act as an associate editor for ACM TAAS, plus running my seminar series at BBN and reviewing innumerable papers of highly variable quality.  I have also taken a big step by not being an organizer for the 2013 Spatial Computing Workshop (the sixth in the series---I'm happy that we've been going long enough that I didn't know that number off the top of my head).  The 2012 edition was the best yet, and I have confidence that the other organizers will do at least as well without me.

Putting it all together... I think I'm happy: strong on the scientific core and surviving OK everywhere else---not ideal, but a very good base to continue building on.  Next year will see big changes as well, both professionally and personally, and from where I sit right now, I think it will go OK.  There's my perfectionism talking again, refusing to be all superlative, especially in a public forum like this.  Honestly, though, I think I prefer a quieter confidence that I can simply stand upon as a firm foundation for the year to come.

Tuesday, December 18, 2012

Better Living Through Manifold Geometry

"Better Living Through Manifold Geometry" was my cheeky title for our editorial introduction to the Computer Journal special issue on Spatial Computing that is just about to come out.  Alas, the final article appears to be receiving only the rather more boring simple appellation of "Editorial."

Regardless of title, though, the thing that I found quite striking as I actually read through all of the articles in the special issue, looking for the common threads that drew them together, is that spatial computing really has been getting much more coherent in intellectual approach, and that manifold geometry is one of the key concepts that keeps popping up.

My take on this is that it is not enough simply to recognize the locality and spatial embedding of a distributed system.  You also need representations that will let you take advantage of that insight, and normal Euclidean geometry, like we all learned in grade school, is just not sufficient.  We need our geometry to align with the structure of how information can actually flow, and the tool for that is a manifold.  The nice thing about manifolds is that they can give you the "stretchiness" of topology, warping around whatever constraints exist in the real world, yet they can still provide most of the nice geometric properties we want, like distances, angles, paths, volumes, etc.
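
To make that concrete on the simplest manifold we all live on, here's a minimal sketch in Python (the cities and rounded coordinates are purely illustrative): the Euclidean distance between two points on the globe tunnels straight through the planet, while the manifold (great-circle) distance follows the surface, which is the only way information can actually travel.

    from math import radians, sin, cos, sqrt, asin

    R = 6371.0  # Earth radius, km

    def to_xyz(lat, lon):
        # Embed a (lat, lon) point in ordinary 3D Euclidean space.
        lat, lon = radians(lat), radians(lon)
        return (R * cos(lat) * cos(lon), R * cos(lat) * sin(lon), R * sin(lat))

    def euclidean(a, b):
        # Straight-line distance: the chord through the Earth.
        ax, ay, az = to_xyz(*a)
        bx, by, bz = to_xyz(*b)
        return sqrt((ax - bx)**2 + (ay - by)**2 + (az - bz)**2)

    def geodesic(a, b):
        # Haversine formula: great-circle distance along the surface.
        lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
        h = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
        return 2 * R * asin(sqrt(h))

    boston, sydney = (42.4, -71.1), (-33.9, 151.2)
    print(euclidean(boston, sydney))  # chord through the Earth: roughly 12,200 km
    print(geodesic(boston, sydney))   # path along the surface: roughly 16,200 km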

The only problem is that we don't grow up learning about them, so most people find manifolds to be a difficult and non-intuitive notion.  Even our maps get flattened into Euclidean projections, when the surface of the Earth is really a sphere.  And of course the formal mathematical notation typically just makes things worse.  But that's one of the things that I think we're nibbling away at, bit by bit, as we work on Proto and other spatial computing languages: how to capture the power and ideas of manifolds, but wrap them up in a way that makes it easy for any programmer to take advantage of them.

Monday, December 10, 2012

The International Journal of Mystery

Hi folks... I had a lovely vacation away from the internet last week, and now I'm back with another batch of scientific philosophizing.  Lots of discussions of papers queued up, but that will keep a little longer...

Recently, a junior colleague of mine was telling me about a journal publication he's working on, and mentioned that he was a bit concerned because he wasn't sure whether the journal was actually any good or not.  To my great shame, the first words out of my mouth were "What's the impact factor?"  To my astonishment, his immediate reply was: "What's an impact factor?"

I've been thinking more about this since.  Could I not have done something even slightly more worthy than immediately falling back on the common bugaboo of science?  After all, I don't generally pay all that much attention to impact factor either, and certainly can't quote numbers for most of the places I've published.  Is it so odd for my colleague not to have known about impact factors?  Moreover, I receive pseudo-personalized invitations to publish in various international journals every day, and I ignore most of them as academic spam without even bothering to look up their impact factors.  How do I actually judge the quality of a journal when I'm deciding whether to submit there?

First, for those of you so fortunate as to join my colleague in his innocence, let me explain.  Impact factor is a number used as a way of measuring how important a scientific journal is to a field of research---and therefore as a proxy for measuring how important a piece of research is by the company it keeps.  It is typically calculated using three years of journal articles indexed by Thomson Reuters, as the mean number of citations in a given year to articles that a journal has published in the prior two years.  You're probably already thinking of objections: Why count only citations from journals?  Who the hell is Thomson Reuters, and how do they decide what's indexed?  Why two years---don't we care if things stand the test of time?  Can't people manipulate the system?  These, dear reader, are only the tip of the iceberg, and there's a long tradition of scientists deriding impact factor as a metric, making up new alternative metrics that address some of the problems while creating other new ones, and generally adding to the chaos of standards.  Nevertheless, impact factor, like Microsoft Word, is the lowest common denominator that many are forced to bow to: by their institutions, by their funders, by their tenure committees...
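
For concreteness, here's that arithmetic as a minimal Python sketch (the journal's numbers are invented purely for illustration):

    def impact_factor(citations_this_year, articles_prior_two_years):
        # Citations received this year by articles the journal published
        # in the prior two years, divided by the number of such articles.
        return citations_this_year / articles_prior_two_years

    # Hypothetical journal: 120 articles published in 2010-2011,
    # drawing 150 citations from indexed venues during 2012.
    print(impact_factor(150, 120))  # 1.25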

Let's avoid going any further down that tempting rathole of a discussion.

Instead, let's return to the question at the root of the whole discussion:
Is this journal any damned good?
First off, what do we mean by "good" when we're talking about journals?  In my view, this basically boils down to three things.  In order, from most to least important:
  1. Will my reputation be enhanced or tarnished by publishing here?  Some journals will add lustre to your work without anybody even reading it.  Rightly or wrongly, we primates love argument from authority.  Conversely, if you publish in a journal that's a total joke, people will wonder what's wrong with your work that you couldn't put it somewhere meaningful.
  2. Will my work be read by lots of people?  I believe that most articles will only ever be noticed, let alone read, by people who found them by Googling for keywords in a literature search.  And your close colleagues should know about your work because you talk about it together.  Each community, though, typically has one or two publications that people just read because they feel it represents the pulse of their scientific community.  Get into one of those and you'll be seen by orders of magnitude more readers.
  3. Will I be competently reviewed and professionally published? Amongst the great herd of middling journals, some are a pleasure to work with and some are a total train wreck.  In the end, though, if you get reviewers who give good feedback and the actual mechanics of publication are handled professionally, that's a nice bonus.
Ideally, impact factor ought to tell you about #1 and #2, but in practice I find it really only tells me about extreme highs.

So, what is it that I actually do in order to tell if a never-before-heard-of journal is any good?  Well, first I check the editorial board: Do I know them?  Do I know their institutions?  Of course, the really big names in a field are often not on boards, or on boards only ceremonially, since they're too busy.  I tend to look for the presence of solid mid-rank contributors and decent institutions---the sort of folks who I find form the strongest backbone of professional service.  But if nobody I've heard of in the field and nobody at reasonable institutions cares enough to help run the journal, then why should I think that publishing there will make any impact?

If the editorial board hasn't convinced me one way or another, then maybe I'll check the impact factor, but really that's just a +/- test: if it has an impact factor of at least 1.0, that's a good sign, but a hazy one and not necessary, since many good venues have no impact factor and impact factor can be gamed.  More important is how long something has been around: anything that has survived at least a decade is likely to be solid (though again, not necessarily).

As for black marks: if a never-heard-of-it journal seems to have an extremely random or broad scope, then what could its community possibly be?  Those I always find suspicious, since it feels like they are just trolling for submissions.  Much worse than that, though, is if the publisher is a known bad actor, especially somebody who spams me repeatedly.  I'm sorry, Tamil Nadu, but your academic community will be forever tarred in my eyes by the people who fill my inbox with poorly targeted spam.
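
If I were forced to compress all of that into code, it might come out something like this tongue-in-cheek sketch (the record fields and thresholds are my own inventions, not any real API):

    def looks_worthy(journal):
        # Black marks end the conversation immediately.
        if journal["publisher_is_known_spammer"]:
            return False
        if journal["scope_is_a_random_grab_bag"]:
            return False
        # Strongest positive signal: a credible editorial board of
        # solid mid-rank contributors at decent institutions.
        if journal["recognizable_board_members"] > 0:
            return True
        # Hazier signals: impact factor and longevity.
        if journal.get("impact_factor", 0.0) >= 1.0:
            return True
        if journal["years_published"] >= 10:
            return True
        return None  # no verdict---time to go ask colleagues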

Sufficient? Hardly. But those, at least, are my own heuristics for dividing the worthy and the dubious when approaching yet another new journal. I suspect that this isn't a problem for people who don't do as much interdisciplinary work as I do, and that it was a lot easier a few decades ago when the number of journals was much lower. But think: if it's this hard to decide where to write, how much worse is the problem of finding what to read?  And that is a discussion for another time...

Saturday, December 01, 2012

Author Order Semantics are Broken

OK, folks: rant time again.  This one's been in the back of my mind for a while, and after dealing with several different papers recently that all handled their author ordering differently, the cognitive dissonance is high enough that I think it's time to get it out of my system.

Author ordering on scientific papers is totally broken.

Here's the thing: the order of authors on a paper matters a lot.  It's exposure, since the first author is the one who gets associated with the paper, and you'll always see it cited as "[Busybody et al., '01]," and not any of the other permutations.  And there's a lot of tea-leaf reading that goes on as people try to interpret the ordering to figure out who really deserves credit for the work in an article.  The only problem is, there are several conflicting theories of how to interpret author ordering.

Here are the main theories:

  1. The first author is the most important, the second author less so, etc.
  2. The first author is the most important, the last author is the senior author, the authors in between don't really matter.
  3. Authors are listed alphabetically, with no author assumed to have significantly more credit.

Then there are lots of different sub-theories as well, having to do with who did the laboratory or coding work versus who did most of the writing, whether you include only really important contributors or anybody who ever commented on the project, how supervision plays into the decision, etc.

Within any given community, there's usually some conventions, often driven by typical author list size. For example, my roots are largely in the more theoretical and software-driven side of computer science, where it's not unusual to see single-author papers, and most are probably 2-3 authors.  In that community, Theory #1 tends to dominate, and the bar for authorship is pretty high.  On the other hand, my work overlaps a lot with biology now as well, where there tend to be lots of authors, and Theory #2 is more typical.  I've also seen Theory #3 pop up in special circumstances, in which I am often unfairly privileged because my name begins with "B."

But these theories conflict, and no paper ever comes with a note saying which theory it belongs to.  Oh, there are journals where they have you put in a little assignment of responsibility saying "R.F. wrote the paper, J.X. performed the experiments, K.O. did the data analysis, and P.Q. killed mice until we begged him to stop."  But those are typically telegraphic at best, deliberately obscure at worst, and potentially subject to all sorts of odd internal group politics.

The real trouble comes at the boundary cases.  When there are ten authors, you can safely assume there's some tiebreaker policy in effect, and most of the ones in the middle aren't terribly important.  But what about 3 or 4 or 5?  Is the last author the unimportant tag-along, or the all-important thought-leader / supervisor?  If three authors are in alphabetical order, is that a deliberate choice, or just a 1-in-6 coincidence (three names have 3! = 6 possible orderings)?  How quickly does significance decay going down the list of authors?

Ultimately, I think the trouble comes from the fact that our language is linear, and we're trying to express a team structure that often is not.  If we had a symbology of authorship, that would perhaps help, so that one could draw the authorship as a graph with circles and boxes around names and arrows between them.  But that will never fly, and probably wouldn't make a difference anyway, since we still have to pick somebody to come first when we're talking about the paper with other people.
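
Just to indulge the fantasy for a moment, such a record might look something like this whimsical sketch (the initials reuse the responsibility note above; all roles and relations are invented):

    # A whimsical sketch of an "authorship graph"; nothing like this
    # exists in any real publication system.
    authorship = {
        "nodes": {
            "R.F.": {"role": "writing"},
            "J.X.": {"role": "experiments"},
            "K.O.": {"role": "data analysis"},
            "P.Q.": {"role": "mouse work"},
        },
        # Directed edges: who supervised or directed whom.
        "edges": [("R.F.", "J.X."), ("R.F.", "K.O.")],
    }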

So, in the end, what do I think we should do about it?  I guess I'm feeling Churchillian tonight, because at the end of the day my feeling is this: author ordering is the worst possible way to indicate responsibility for a paper, but it's better than all the alternatives.