Sunday, February 02, 2020

Unique sequences found in Wuhan coronavirus

Like many people, I have some concerns about the emerging virus in Wuhan. I am also fortunate enough to have some tools that might turn out to be helpful. For the past two years, I've been leading a project on improving pathogen screening in DNA synthesis orders by applying cybersecurity tools, and I was, in fact, in the midst of writing up a paper on our improved ability to detect small virus fragments with high precision.

As it happens, I have software to hand that's very good at detecting the unique aspects of a viral pathogen, along with a pre-existing collection of organized coronavirus data. It looks like we may have found something interesting: some chunks of the virus that look unlike any of its known relatives. We've written this up in a quick manuscript that's now under review and up on bioRxiv:

Highly Distinguished Amino Acid Sequences of 2019-nCoV (Wuhan Coronavirus)

Using a method for pathogen screening in DNA synthesis orders, we have identified a number of amino acid sequences that distinguish 2019-nCoV (Wuhan Coronavirus) from all other known viruses in Coronaviridae. We find three main regions of unique sequence: two in the 1ab polyprotein QHO60603.1, one in surface glycoprotein QHO60594.1.

Figure: Summary statistics of distinguishing amino acid sequences identified for 2019-nCoV (Wuhan coronavirus), organized by the identifiers of protein sequences in which we found unique content. The blue is the fraction of sequence that's judged unique and the red is the total amount: the left-most and right-most sequences look particularly interesting.

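
To give a rough sense of what "judged unique" could mean computationally, here is a minimal sketch. It is not the method used in the manuscript, which the post does not describe in detail. It assumes a simple k-mer comparison: any length-k window of the target protein that appears in none of the relatives counts as unique. The function names, the window size `k=5`, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_regions(target: str, relatives: Iterable[str], k: int = 5) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of target that are covered by
    k-mers found in none of the relatives. Overlapping or touching windows
    are merged into one interval."""
    known: Set[str] = set()
    for rel in relatives:
        known |= kmers(rel, k)

    regions: List[Tuple[int, int]] = []
    for i in range(len(target) - k + 1):
        if target[i:i + k] in known:
            continue
        start, end = i, i + k
        if regions and start <= regions[-1][1]:
            # Extend the previous interval instead of opening a new one.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def unique_fraction(target: str, relatives: Iterable[str], k: int = 5) -> float:
    """Fraction of target residues that fall inside a unique region."""
    if not target:
        return 0.0
    covered = sum(end - start for start, end in unique_regions(target, relatives, k))
    return covered / len(target)


# Toy data (hypothetical, not real viral sequences): the target carries a
# five-residue insertion "GGGGG" that the single relative lacks.
toy_target = "MKTAYIAKQRGGGGGLLVDE"
toy_relatives = ["MKTAYIAKQRLLVDE"]

toy_regions = unique_regions(toy_target, toy_relatives, k=5)  # [(6, 19)]
toy_fraction = unique_fraction(toy_target, toy_relatives, k=5)
```

In this toy case the flagged region is wider than the insertion itself, because every window that overlaps the inserted residues is new. A real screening tool would presumably also need to handle near-matches and choose the window size carefully, which this sketch does not attempt.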
It's also been a fascinatingly fast project: we noticed the sequence and decided to evaluate it on Tuesday morning, and got our first results that afternoon. On Wednesday, we refined and confirmed the results. On Thursday, we checked with others whether it might be interesting, and I wrote up the quick report. Friday was polishing and submission as a research letter to CDC Emerging Infectious Diseases and as a bioRxiv preprint, and then it took 48 hours for bioRxiv to post it. At just under a week from project conception to submitted preprint with DOI, this is definitely my fastest experience with scientific publication, and a strange one.

I don't know just how important this might or might not be; I am definitely not a viral pathology specialist. And maybe the journal will just laugh at us and reject it all as naive. But I'm still happy that this is out there, no matter what, in case it may indeed be useful. More than anything else, I really hope that this gets in front of people who are, in fact, the right type of expert, so that they can evaluate it and see if they can put this information to effective use in helping diagnose, prevent, and mitigate this new disease.
