Today, while dealing with citation queries from a journal's editing staff on a pre-publication proof, I was confronted once again with the recurrent annoyance of delayed "formal" publication. Back in November, we published a nice paper on high-precision prediction of genetic circuits. Well, I say we published it in November, but technically it was only just published today, seven months later.
This is due to the curious phenomenon where many journals will publish "online early" shortly after a paper is accepted (an excellent idea!), yet still wait, sometimes for many months, to bundle papers together into an "issue," as if the journal were still all about printing on dead trees and shipping to libraries, rather than having most people simply access it directly online. This phenomenon has always struck me as odd, and it's a pain in the butt, because it means citations have to change over time and different citations to the same document end up with different years in them.
Confronted again with this today, I had an insight. I wonder if this phenomenon is no mistake, but perhaps in fact intentional on the part of some journals, in order to manipulate their Impact Factors. The "Impact Factor" of a journal is a horrible, broken statistic that is used to make or break people's careers, particularly in the biomedical fields. It is calculated as the average number of citations that papers in a journal receive during the two years following their publication. For example, if Journal X published three papers in 2015, and two of them are never cited, but one gets cited 5 times in 2016 and 7 times in 2017, then Journal X would get a nice high Impact Factor of 4.0 (i.e., (5+7)/3). Yes, it's kind of a dumb statistic, but it's heavily used and thus frequently gamed.
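To make the arithmetic concrete, here is a minimal sketch of the calculation as described in this post: total citations over the two-year window, divided by the number of papers. The function name and data layout are my own illustration, and this follows the simplified description above, not necessarily the exact procedure a citation index uses.

```python
def impact_factor(citations_per_paper):
    """Average citations per paper over the two-year window.

    citations_per_paper: one list per paper, holding that paper's
    citation counts for each year of the window (hypothetical layout).
    """
    if not citations_per_paper:
        raise ValueError("need at least one paper")
    total = sum(sum(years) for years in citations_per_paper)
    return total / len(citations_per_paper)


# Journal X: three papers from 2015, with citations in 2016 and 2017.
journal_x = [
    [0, 0],  # never cited
    [0, 0],  # never cited
    [5, 7],  # cited 5 times in 2016 and 7 times in 2017
]

print(impact_factor(journal_x))  # 4.0
```

Note that a single well-cited paper carries the whole average here, which is part of why the statistic is so easy to distort.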
Here's the thing, though: because it was "online early," my paper collected several citations before it was ever officially published. So when it gets included in the computation of the journal's Impact Factor, it's effectively going to get 24+7 = 31 months of citations, rather than the usual 24 months of citations. That's going to increase its expected number of citations, and thus the all-important Impact Factor of the journal. This is further compounded by the fact that getting noticed takes time and publishing a citing paper takes time, so the more time that has passed since a significant paper first appeared, the higher its citation rate is likely to be.
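As a rough back-of-the-envelope sketch of how big this effect could be, assume (purely for illustration) that a paper is cited at a flat rate every month and that every citation from the online-early period ends up counted. Both assumptions are mine, and the function name is hypothetical. The flat rate is, if anything, conservative, given that citation rates tend to climb over time.

```python
def window_inflation(online_early_months, official_window_months=24):
    """Ratio of citations collected with the online-early head start
    to citations collected in the official window alone, assuming a
    constant monthly citation rate (an illustrative simplification)."""
    if official_window_months <= 0:
        raise ValueError("official window must be positive")
    effective_months = official_window_months + online_early_months
    return effective_months / official_window_months


# Seven months online early on top of the usual 24-month window.
print(f"{window_inflation(7):.2f}")  # 1.29
```

Under those assumptions, a seven-month delay is worth roughly a 29% boost in expected citations for that paper.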
So from a journal's perspective, it seems like it would make sense to drag out the time between online publication (when a paper starts being noticed and collecting citations) and official publication for as long as possible. It's also possible that some enterprising editors or publishing houses have noticed this and may thus set their publication delays intentionally to manipulate their Impact Factor. Even if the reasons are benign, however (e.g., smoothing out the publication pipeline), the distortion in statistics is still there.
Maybe the citation indices that compute the magic Impact Factor numbers have noticed this and accounted for it... and maybe they haven't. I would not be surprised in either case, but I'd be very interested to know the answer. The real answer, though, is not to be more precise about Impact Factor computations, but to discard the damned thing and obtain a more sane and reasonable metric for discussing the significance of papers and journals.