Wednesday, October 07, 2015

Scale-free distribution of payoffs in science

One of the things I've been enjoying these days has been answering questions on the Academia site on StackExchange.  This question-and-answer site is part of the vast network of Q&A sites that have flowered out of the wildly successful StackOverflow, which is pretty much the best source for coding help on the internet.  The model is that people ask questions about the topic, e.g., academia, and other folks turn up and provide answers, and then you gain or lose Fake Internet Points depending on whether the crowd thinks it's a good answer.  It's surprisingly effective and also, for me at least, pretty enjoyable and kinda addictive.

Anyway, I answered one this morning that made me think a lot, and I thought that I might share my thoughts here as well.  The question was simple, fundamental, and ill-posed: "What is the distribution of payoffs in research?"  Basically, the person is wondering whether every experiment is a roughly equivalent step forward, or whether some are much more valuable than others, and if so whether there's some sort of power-law relationship between topic, funding, and value of result.

This is ill-posed, because the whole notion of "payoff" is extremely vague and probably the wrong question to ask, but it really made me think.  My response, which I'd like to share with you, was this:

There's a vast amount of ill-definition and uncertainty wrapped up in your question... and yet despite that, the answer is almost certainly yes, there is a power-law distribution.
I'm going out on a limb a bit here, because I'm not building on any published analysis that I'm aware of. However, a little analysis of limit cases and fundamental principles can take us a long way here. Let us start with two simple and relatively uncontroversial statements:
  1. Better experimental design leads to better results. It seems self-evident that if you make a bad choice in designing an experiment, it's not going to get you the interesting results you want. At the micro-scale, some choices are clearly better than others, and some are clearly worse.
  2. Sub-fields appear, expand, shrink, and die. As I write this, CRISPR research is hot, and a lot of people are finding interesting results there, and accordingly that field is rapidly expanding. Nobody is doing research on the luminiferous aether because it's been discredited as an idea. Nobody is trying to prove that it's possible to generate machine code from high-level specifications because Grace Hopper did that in the 1950s, when she invented the compiler, thereby initiating what is now a fairly mature and stable research area.
So clearly, no matter how one defines "payoff," any sane definition will see a highly uneven distribution of payoffs, both at the micro-scale of individual experiments and at the fairly macro scale of sub-fields.
Finally, we need to recognize that "significance" is a matter not only of objective value, but also of communication through human social networks. This means that the same result may have wildly different impacts depending on the methods and circumstances of its communication. The history of multiple discoveries in science is ample evidence of this fact; one nice illustrative example is the way in which Barbara McClintock's work on gene regulation was largely ignored until its later rediscovery by Jacob & Monod.
So, we have variation and we have interaction with human social networks, which tend to be rife with heavy-tailed distributions. All of this says to me that it would be remarkable if there were not some sort of power-law distribution for pretty much any plausible definition of impact, significance, and investment. For these same reasons, I think it would also be surprising if one could make any more than weak predictions using this information (e.g., "luminiferous aether research is unlikely to be productive," "CRISPR is pretty hot right now").
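To get an intuition for just how lopsided a heavy-tailed distribution is, here's a small illustrative simulation (my own sketch, not from the original discussion): it draws hypothetical "payoffs" from a Pareto distribution, the textbook power law, and checks what share of the total value the top 1% of results captures. The shape parameter `ALPHA` is an arbitrary choice for illustration.

```python
import random

# Draw N hypothetical "payoffs" from a Pareto (power-law) distribution.
# A smaller ALPHA means a heavier tail, i.e., more extreme outliers.
random.seed(42)
N = 100_000
ALPHA = 1.5  # arbitrary illustrative shape parameter

payoffs = sorted((random.paretovariate(ALPHA) for _ in range(N)), reverse=True)

# Share of the total "payoff" captured by the top 1% of results.
# Under a uniform distribution this would be only a little above 1%;
# under a power law it is dramatically larger.
share = sum(payoffs[: N // 100]) / sum(payoffs)
print(f"Top 1% of results capture {share:.0%} of the total payoff")
```

The point of the exercise is not the exact number, which wobbles from run to run precisely because the tail is so heavy, but that a tiny fraction of draws reliably dominates the total, which is what a power-law distribution of scientific payoffs would look like in practice.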
And the devil, of course, is in the details...
