The World as Language: NLP Learns to Read Viral Genomes

With NLP (natural language processing) dominating headlines in 2021, many may already be losing their capacity for astonishment at the ways AI works with language. Last week, we asked whether linguistic data, one of the more valuable products of machine translation, could help offset the risks posed by new NLP models like GPT-2 and GPT-3, which challenge existing fraud detection capabilities. Just what counts as a language-related application, though, is already a more complex question than it was a week ago, when we focused on the emergence of bots that imitate real internet users in potentially sensitive areas like government.

Now, one very striking use for NLP is coming into focus with news that an MIT-based research team has successfully applied such a model to pick apart the genetic sequences of viruses the same way it would pick apart a complex sentence, ultimately enabling it to evaluate the expected impact of a given observed mutation. As Technology Review reports, just as a word-level change in a sentence can leave its syntax intact while significantly altering its meaning, an isolated mutation can leave a virus structurally viable while changing how it behaves, to a greater or lesser degree. To quote the original research, “Viral mutations that evade neutralizing antibodies, an occurrence known as viral escape, can occur and may impede the development of vaccines. To predict which mutations may lead to viral escape, Hie et al. used a machine learning technique for natural language processing with two components: grammar (or syntax) and meaning (or semantics).”
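To make the two-component idea concrete, here is a minimal toy sketch in Python. It is not the researchers' implementation: the function names, the rank-based way of combining the two scores, and the `beta` weight are all assumptions made for illustration, and the two scoring functions are placeholders that a trained language model would supply in practice. The sketch enumerates every single-residue substitution in a protein sequence and ranks the candidates so that mutations scoring high on both "grammar" (the mutated sequence still looks plausible) and "meaning" (the mutated sequence differs semantically from the original) rise to the top.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_mutants(seq):
    """Yield (position, new_residue, mutated_sequence) for every single substitution."""
    for i, old in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != old:
                yield i, aa, seq[:i] + aa + seq[i + 1:]


def rank(values):
    """Return the rank of each value (0 = smallest); ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0] * len(values)
    for r, k in enumerate(order):
        ranks[k] = r
    return ranks


def rank_escape_candidates(seq, grammaticality, semantic_change, beta=1.0):
    """Rank single mutants by combined 'meaning' and 'grammar' ranks (hypothetical scheme).

    grammaticality(seq, position, new_residue) -> float, higher = more plausible
    semantic_change(seq, mutated_seq)          -> float, higher = more different
    Both callables are assumptions standing in for a trained model.
    """
    mutants = list(single_mutants(seq))
    g = [grammaticality(seq, i, aa) for i, aa, _ in mutants]
    s = [semantic_change(seq, m) for _, _, m in mutants]
    scores = [rs + beta * rg for rs, rg in zip(rank(s), rank(g))]
    order = sorted(range(len(mutants)), key=lambda k: scores[k], reverse=True)
    return [(mutants[k][0], mutants[k][1], scores[k]) for k in order]
```

With real scoring functions plugged in, the head of the returned list would hold the mutations that are both plausible and meaning-altering, which is the intuition behind flagging candidates for viral escape. A mutation that is "grammatical" but changes nothing, or one that changes everything but yields an implausible sequence, ranks lower.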

The question of why linguistic syntax and amino acid sequences can be modelled by the same neural network is seemingly philosophical, suggesting either that the logic of communication reflects something more universal, or that molecular science has its own language that machines can learn to “speak”. However, one could also argue that the NLP model’s grasp of human language is scientifically functional but less nuanced than we might imagine, treating our words as just one more type of variable in a universal game of analysis. Many have pointed out that even the chattiest bots don’t really know what they are saying. Ultimately, that may be to their credit as tools: being so agnostic, NLP and other types of experimental AI may be better placed to derive insights than people working through a known conceptual vocabulary.

Trained MT, by contrast, can serve the purposes we know so well in localization: a message, well translated, delivered with consistency into new languages.

Learn more about CSOFT’s technology-driven translation services at csoftintl.com!

CSOFT International in Life Sciences, Technology | January 22, 2021

Published

January 22, 2021

Updated

February 11, 2021
