A few months ago, I wrote a post on how, according to some research in genetics, DNA could be used as long-term data storage (and it was Discovered, of all things!). As I mentioned in that post, it makes a lot of sense: what is DNA but chemical data that your cells use as instructions for functioning, after all? What is text but digital data that your word processing software (like WordPress – *$*PLUG*$*) uses as instructions on what to display on your screen, after all? If one can translate digital data into chemical data, there’s your long-term storage. I also talked about translating back from chemical into digital data – that’s useful for analysis. However, is text the only way to represent biomolecular (this is the scienciest word I plan on using, I promise) data? According to a study published in Heliyon by Robert P. Bywater and Jonathan N. Middleton, you can represent biomolecular data, particularly protein structures, as music.
Why is this a big deal? Well, if you were to visualize a protein, it would look like this.
If you’re interested and/or a biochemist, this protein is Ste5, which has to do with yeast mating. If you’re not, then I hope you’re not eating bread (lucky me, I’m eating a peanut butter cookie). Also, if you’re not a biochemist, the above just looks like a mess. Using some modeling software, you can rotate it, zoom in, color-code, and so on. The trouble is following the shape, with its ribbons and helices and sheets. The shape of a protein is important, since this could affect how it functions. Hemoglobin is a good example of that – its shape allows for red blood cells to transport oxygen. If the shape goes wrong, then oxygen isn’t being moved around properly; if you’re thinking sickle-cell anemia, you’ve got the idea (I’m just hoping you don’t actually have it – that’s rough).
So anyway, you’re staring at a protein structure, trying to get a feel for how it twists and turns so you can get an idea of what happens if it’s exposed to some chemical, heat, and so on. How would it contract, expand, unravel even…now imagine it’s hour 6 of an 8 (if you’re lucky) hour day. How accurate do you think you’re going to be? And that is, of course, if you know the sequence of the amino acids that make up that protein as well as the structure – what if you don’t know the structure? Why is that a problem? Because an amino acid sequence, translated into text (each letter corresponds to a certain amino acid), looks like this for the Ste5 protein above, and you need to propose a structure based on this:
SLIESGNNNCPLHMDYI (source: Uniprot)
As a knowledgeable person who doesn’t directly work with biochemical molecules every day, I know how you feel. There are people who do this every day and feel the same way, I’m sure. But imagine if there were another way to represent all this data – that’s all this is, after all – that is more intuitive and may even aid in more accurate analysis. After all, seeing the same information in more than one medium may help with learning, according to the Cognitive Theory of Multimedia Learning. According to the authors, you may even catch a pattern that you would not notice by only reading the sequence (I’m not going to try, I literally can’t even). So why not turn these letters (each an amino acid that has its own properties) into something a lot of people can pick up on, like pitch? What if instead of letters (you were trying to spell out words in the Ste5 sequence, weren’t you?), they’re notes on, say, a piano?
For the musically-inclined reading this (a science blog!?), you’re thinking this isn’t going to sound good, more like randomly smashing piano keys with your fingers, face, and nearby furniture all while screaming yourself cross-eyed (if so, you’ve got a weird imagination). The authors had to borrow some musical motifs to make the result easier for people to listen to, so it’s a good thing that one of them (Middleton) is not only a data scientist, but also works in a music department. It’s all data, I’m telling you.
What they get is a rather simple sequence with no tempo or key changes, which they’ve mapped onto piano or marimba. Each note represents an amino acid that makes up the protein, while protein structure hints are mapped onto some kind of musical cue. For example, the protein in Figure 7 (I don’t know what the rights for sourcing the article are going to be, so you can look at the journal yourself – it’s open-access anyway) sounds like “Supplementary Audio 4” (search for this at the journal’s page). Thanks to some musical know-how, it doesn’t sound completely like a random sequence of notes. That could be a really important concept in figuring out protein structures. Further testing revealed that people are able to detect similarities between how a protein looks and how its sequence sounds.
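If you want a feel for the "each note represents an amino acid" idea, here is a minimal sketch in Python. To be loud about it: this is my own made-up mapping, not the one Bywater and Middleton used. I'm assuming the 20 standard one-letter amino acid codes in alphabetical order, laid out on a C major pentatonic scale starting at MIDI note 48, and I'm ignoring the structure cues entirely. The names AMINO_ACIDS, residue_to_midi, and sonify are all hypothetical.

```python
# A toy "sonification": one note per amino acid.
# ASSUMPTION: this mapping is invented for illustration only; it is NOT
# the mapping from the Bywater and Middleton paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard one-letter codes
PENTATONIC_STEPS = [0, 2, 4, 7, 9]     # C major pentatonic, in semitones
BASE_MIDI_NOTE = 48                    # C3 when middle C (MIDI 60) is called C4
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]


def residue_to_midi(residue):
    """Map a one-letter amino acid code to a MIDI note number."""
    index = AMINO_ACIDS.index(residue)  # raises ValueError for unknown letters
    octave, degree = divmod(index, len(PENTATONIC_STEPS))
    return BASE_MIDI_NOTE + 12 * octave + PENTATONIC_STEPS[degree]


def midi_to_name(note):
    """Turn a MIDI note number into a name like 'C4'."""
    return f"{NOTE_NAMES[note % 12]}{note // 12 - 1}"


def sonify(sequence):
    """Return one note name per residue in the sequence."""
    return [midi_to_name(residue_to_midi(r)) for r in sequence.upper()]


if __name__ == "__main__":
    # The Ste5 fragment quoted earlier in this post.
    print(" ".join(sonify("SLIESGNNNCPLHMDYI")))
```

I went with a pentatonic scale because it's hard to make one sound truly awful, which is roughly the same reason the authors leaned on musical know-how. Swap in your own scale or instrument and see what your favorite protein sounds like.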
By the way, those of you with access to a musical instrument can try the protein ‘scores’ yourself – the authors have included them in the supplementary material. If someone puts this into some guitar tablature with the right effects, you might end up with a new form of speed metal. I think I’d call it ‘proteincore’ music. I should tell That Djenty Fool about this…
OK, it’s going to be a while before you can analyze protein structures in such a way that you see this and hear this. If nothing else, it could help our protein nerds with their research: a better idea of what a protein is gives more insight into what it does and what it can do. What could cause a protein to change shape? What could change it back? How does it interact with other proteins? Medications? Radiation? The list goes on, and while this research into ‘sonification’ isn’t all new (the lead author has studied this phenomenon before), anything that helps people view data in different ways can give a deeper understanding of the data and what it entails – just ask Hans Rosling.
I wonder what my post-workout protein shake sounds like…
Featured Article: Bywater RP, Middleton JN. (2016) Melody discrimination and protein fold classification. Heliyon 2. doi: 10.1016/j.heliyon.2016.e00175.