If you’ve ever pasted a string of foreign text into Google Translate, you’re surely familiar with its Detect language feature. It is one of the conveniences of neural machine translation (NMT), the form of language AI that delivers everything from a quick lookup in another language to the raw MT output in a machine translation post-editing (MTPE) localization project. Even if you don’t know what language you’re looking at, an NMT system most likely does, provided it has been trained on the troves of real linguistic data that machine learning thrives on. For those with grander aspirations, ancient languages like Sanskrit are also candidates for this kind of recognition. People who know Sanskrit’s rare but well-preserved workings can even help Google improve it by contributing terms and expressions. Now, what if you wanted help reading ancient carvings in a language almost no one today can read even a fragment of, one that is known but decipherable only painstakingly, from sparse relics? Put another way, could any language AI model be powerful enough to quickly parse the bare meaning of a Rosetta Stone of sorts, one from which two of the three languages had been scrubbed before it was smashed to pieces and scattered?

In short, the answer seems to be yes, with the right kind of legwork. On February 8th, the tech news publication Rest of World reported that MIT researcher Jiaming Luo had taken an AI model capable of translating between modern language pairs and successfully applied it to a daunting task. That task was automating the mammoth 20th-century undertaking that first made Linear B legible. Linear B is a syllabic script used to write Mycenaean Greek, the earliest attested form of Greek. It was more code-broken than translated when it was first deciphered sign by sign in the early 1950s. The puzzle took decades of painstaking work testing candidate readings, much of it recorded in the vast card index kept by prime contributor Alice Kober, who did not live to see it solved. Building on those accumulated cataloguing efforts, an English architect named Michael Ventris finally extracted semantic meaning from the script. By contrast, once Luo and his team had configured their algorithm to map Linear B onto Greek, it ran for just hours and achieved roughly two-thirds accuracy. In other words, with the baseline knowledge that Kober and Ventris had established to validate against, the algorithm produced a largely usable draft that specialists could post-edit toward scholarly standards of accuracy.

When drawing analogies to mainstream language AI applications, it’s important to note that much of what enabled the MIT researchers to leverage machine learning was not purely linguistic in nature. Like the original effort to crack Linear B, doing so with a computer relied on testing and ruling out potential logical relations between words. Those words had been documented discretely, as isolated terms without clear relations to one another. If Google Translate is something like a term-recall mechanism, the scenario at hand involved a very different task. It meant testing hypotheses against scraps of certainty about whether given combinations of characters did or did not, as a whole, form meaningful expressions. Even so, for all its customizations, the researchers first established the algorithm’s capabilities purely as a translation tool. It was linguistic AI, plus some logical computation over all the good sense that language comes encoded with.

Where are neural machine translation and language AI heading in 2022? If nowhere else, the distant past, in search of more interesting places.

