Friday, February 10, 2017

Protein Engineering Diagrams

We've got a new paper that's just been accepted, working toward extending the SBOL visual diagram language so that it can describe the engineering of proteins as well as DNA and RNA. The driving force behind this effort has been Sid Cox, who's done a good bit of work in the area and has had the courage to make this first, surely imperfect, proposal, while a number of the rest of us helped critique, refine, and bend things toward compatibility and integration.

The idea behind the language is surprisingly simple: despite the ferocious complexity of how proteins fold and interact, when we engineer with proteins our actions can often be described much more simply. Proteins, particularly in complex eukaryotic organisms, are often quite modular, with specific domains controlling things like where they go in a cell, what they interact with, and how they decay. These domains are, in turn, laid out along a single linear chain of amino acids (itself encoded in DNA or RNA), and can often be recombined by mixing and matching. Doing that isn't simple, but explaining what you have done and why can often be fairly simple.

That's what our new diagram language aims for. Each protein in a system is represented by a line decorated with glyphs representing structured (oval) and unstructured (line) regions, membrane domains (zigzags), binding domains (open boxes), etc. With a brief glance, you can get a pretty good idea of what the protein or protein system is supposed to do and how it's supposed to do it.

Diagram for a two-protein design that provides light-inducible programmed localization to the cell membrane.

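
To make the "line decorated with glyphs" idea a bit more concrete, here is a minimal sketch in Python of how one might model it as data. This is not the paper's software or any SBOL data model: the class names (`Glyph`, `Region`, `Protein`), the N-to-C ordering, and the example protein are all made up for illustration. Only the four glyph kinds and their shapes come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Glyph(Enum):
    """Region kinds named in the post, with the shape used to draw each."""
    STRUCTURED = "oval"
    UNSTRUCTURED = "line"
    MEMBRANE = "zigzag"
    BINDING = "open box"


@dataclass(frozen=True)
class Region:
    label: str    # hypothetical free-text name for the region
    glyph: Glyph  # which shape decorates the protein's line here


@dataclass
class Protein:
    name: str
    regions: list[Region]  # assumed ordered from N-terminus to C-terminus

    def describe(self) -> str:
        """One-line text rendering of the decorated line."""
        parts = [f"{r.label} ({r.glyph.value})" for r in self.regions]
        return f"{self.name}: " + " - ".join(parts)


# A made-up protein, not the design shown in the figure.
anchor = Protein("anchor", [
    Region("membrane anchor", Glyph.MEMBRANE),
    Region("linker", Glyph.UNSTRUCTURED),
    Region("light-sensitive binder", Glyph.BINDING),
])
print(anchor.describe())
```

The point of the sketch is just that a protein design reduces to an ordered list of a few region types, which is what lets a reader take it in at a glance.
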
This is by no means a finished product, but it's a good solid start. Now that we've got a proposal, people can start critiquing it, and we can start working through the various tweaks and philosophical debates needed to get it integrated with the other diagram standards already in place, like SBOLv. This won't be fast, but it should hopefully produce a reasonable consensus on how to describe what's currently just shown as all sorts of ad hoc blobs.

If your institution permits, you can read the accepted paper at ACS Synthetic Biology; otherwise you can read a preprint. You can also play with the associated online diagram software.
