Saturday, December 01, 2012

Author Order Semantics are Broken

OK, folks: rant time again.  This one's been in the back of my mind for a while. I've recently dealt with several papers that each handled their authors differently, and the cognitive dissonance is now high enough that I think it's time to get it out of my system.

Author ordering on scientific papers is totally broken.

Here's the thing: the order of authors on a paper matters a lot. Part of it is exposure. The first author is the one who gets associated with the paper, and you'll always see it cited as "[Busybody et al., '01]," and not as any of the other permutations.  There's also a lot of tea-leaf reading as people try to work out who really deserves credit for the work in an article.  The only problem is that there are several conflicting theories of how to interpret author ordering.

Here are the main theories:

  1. The first author is the most important, the second author less so, etc.
  2. The first author is the most important, the last author is the senior author, the authors in between don't really matter.
  3. Authors are listed alphabetically, with no author assumed to have significantly more credit.

Then there are lots of sub-theories as well. They have to do with who did the laboratory or coding work versus who did most of the writing, whether you include only the really important contributors or anybody who ever commented on the project, how supervision plays into the decision, etc.

Within any given community, there are usually some conventions, often driven by typical author list size. For example, my roots are largely in the more theoretical and software-driven side of computer science. There, single-author papers aren't unusual, and most papers probably have 2-3 authors.  In that community, Theory #1 tends to dominate, and the bar for authorship is pretty high.  On the other hand, my work now overlaps a lot with biology as well. Papers there tend to have lots of authors, and Theory #2 is more typical.  I've also seen Theory #3 pop up in special circumstances, in which I am often unfairly privileged because my name begins with "B."

But these theories conflict, and no paper ever comes with a note saying which theory it follows.  Oh, there are journals that have you include a little assignment of responsibility saying "R.F. wrote the paper, J.X. performed the experiments, K.O. did the data analysis, and P.Q. killed mice until we begged him to stop."  But those statements are typically telegraphic at best and deliberately obscure at worst. They are also potentially subject to all sorts of odd internal group politics.

The real trouble comes at the boundary cases.  When there are ten authors, you can safely assume some tiebreaker policy is in effect, and that most of the ones in the middle aren't terribly important.  But what about 3 or 4 or 5?  Is the last author the unimportant tag-along, or the all-important thought-leader / supervisor?  If three authors are in alphabetical order, is that a deliberate choice, or just a 1-in-6 coincidence?  How quickly does significance decay going down the list of authors?

Ultimately, I think the trouble comes from the fact that our language is linear, and we're trying to express a team structure that often is not.  A symbology of authorship might help, so that one could draw the authorship as a graph, with circles and boxes around names and arrows between them.  But that will never fly. It probably wouldn't make a difference anyway, since we'd still have to pick somebody to come first when talking about the paper with other people.

So, in the end, what do I think we should do about it?  I guess I'm feeling Churchillian tonight, because at the end of the day my feeling is this: author ordering is the worst possible way to indicate responsibility for a paper, but it's better than all the alternatives.
