When we looked at language AI sensation ChatGPT just a few days ago, many in localization were asking a question that few claimed to be able to answer: can ChatGPT do the work of a machine translation engine? By most accounts, the novel text generator seemed essentially monolingual, focused on retrieving relevant answers to prompts in the world’s most common language rather than producing equivalents in others. Meanwhile, its public release (on the premise that letting people interact with the model would be good training) presented an English-language interface for prompting its responses, with nothing resembling a translation function in view. Leaving aside the many users for whom it already serves as a tool for navigating a second language, for instance by asking it to write an email in grammatically correct English, the model did not seem to pose any sort of challenge to tools like Google Translate or custom-trained translation engines. What few people likely realized, though, is that they could simply ask ChatGPT itself, just as they might ask Google Search or the broader internet, whether it could help them with a translation. In something of an aha moment, one group of researchers did just that, uncovering a surprisingly multilingual capability in this new presentation of the GPT-3 model, which was originally developed on 90% English-language data.

As Slator reports, researchers from Tencent who set out to evaluate ChatGPT as a translator made a breakthrough when they asked the interface to generate several prompt formats that they could in turn pose back to it to elicit a demonstration of its machine translation skills. With the options the system produced, the team used a trial-and-error approach to test the efficacy of each format in specific translation requests between Chinese and English, then selected the best of these formats to apply in translation requests for target languages including German and Romanian. In other words, rather than simply entering a line of text to translate, the team asked ChatGPT in plain language to attempt a translation, following its own recommendations for how to phrase the request. It then did exactly that, with varying but nonetheless astonishing degrees of success across the handful of languages tested.
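
For readers who want to picture the trial-and-error step, here is a minimal Python sketch of how one might pick the best of several prompt formats. It is an illustration only, not the researchers' code. The template wording, the `ask_model` callable (a stand-in for whatever sends a prompt to a chat model and returns its reply), and the crude word-overlap score are all assumptions made for this example; a real evaluation would use an established MT quality metric.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical prompt formats of the kind a chat model might suggest.
# These are illustrative and not the wording used in the study.
CANDIDATE_TEMPLATES = [
    "Translate these sentences from {src} to {tgt}: {text}",
    "Please provide the {tgt} translation for these sentences: {text}",
    "What is the {tgt} translation of this {src} text? {text}",
]


def overlap_score(candidate: str, reference: str) -> float:
    """Crude stand-in metric: share of reference words found in the candidate."""
    ref_words = reference.lower().split()
    if not ref_words:
        return 0.0
    cand_words = set(candidate.lower().split())
    return sum(word in cand_words for word in ref_words) / len(ref_words)


def pick_best_template(
    templates: Sequence[str],
    ask_model: Callable[[str], str],  # assumed: prompt in, model reply out
    test_pairs: Sequence[Tuple[str, str]],  # (source text, reference translation)
    src: str,
    tgt: str,
    score: Callable[[str, str], float] = overlap_score,
) -> str:
    """Return the template whose translations score highest on average."""

    def average_score(template: str) -> float:
        total = 0.0
        for source_text, reference in test_pairs:
            prompt = template.format(src=src, tgt=tgt, text=source_text)
            total += score(ask_model(prompt), reference)
        return total / len(test_pairs)

    return max(templates, key=average_score)
```

In use, `ask_model` would wrap a call to the chat interface, `test_pairs` would hold a small set of Chinese sentences with trusted English references, and the winning template could then be reused with other target languages such as German or Romanian.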

How well did ChatGPT do as a translator, and to what extent can it provide a valid form of machine translation? As reported, Tencent’s researchers compared the model against Google Translate and DeepL, broadly finding that although it fell short of the MT-dedicated engines on accuracy, it showed great potential with prompts that target natural, everyday language, seemingly avoiding some of the awkwardness that can undermine MT results in informal contexts. Despite having the soul of a chatbot, ChatGPT holds a vast enough knowledge of human data to venture at expressing itself fluently and truthfully in various world languages, suggesting that additional training could add to its potential as a translator. Other commentary notes that it outright bests traditional machine translation engines at translating jokes. At a technical level, though, it is worth noting that what the researchers asked ChatGPT to do is probably not advisable when accuracy is a serious concern: namely, zero-shot translation, in which the model translates without any task-specific training or examples. By contrast, trained translation engines that begin with exposure to key subject matter offer far greater assurance of validity and consistency in translation outputs, which makes all the difference in real-world localization scenarios that demand large volumes of content free of costly or harmful errors. As even the best MT outputs require human proofing, there is simply no practical way to entrust large volumes of content to ChatGPT as of yet.

As ChatGPT continues to wow us and even to replace human workers for the first time, it is tempting to ask whether machines are any closer to eliminating language service providers and linguists in the localization world. For the time being, though, the time-tested practice of pairing human linguists with machine translation engines and methodologies like translation memory leverage remains an absolutely crucial guard against errors and lost time when communicating products and services across languages. To learn how CSOFT helps companies grow their multilingual communications in 250+ languages, visit us at