Friday, April 22nd, 2011

Machine Translation: A Statistical MT and Rule-based MT Comparison

Earlier this month at the CMS/DITA conference, CSOFT’s Uwe Muegge presented on machine translation. Machine translation, or automated translation, has been a topic of interest for some time now in the localization industry. With the arrival of machine translation tools and their increasing rate of use around the world (especially Google Translate), we thought we’d take a moment to give you a very high-level rundown on this topic.

To begin, machine translation (MT) is the process of using a software application that takes content in a source language and translates it into the target language without human intervention. There are two main types of machine translation to consider: statistical machine translation, and rule-based machine translation.

An image of different types of machine translation: statistical machine translation and rule-based machine translation.

Statistical machine translation (SMT) employs a statistical model, based on the analysis of a corpus, to generate text in the target language. Rule-based machine translation (RBMT), on the other hand, uses linguistic rules to analyze the input content in its source language and generate text in the target language.
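To make the contrast concrete, here is a deliberately tiny Python sketch. It is not how any production system works: the aligned word pairs, the lexicon, and the single reordering rule are all made-up assumptions for illustration. The statistical function picks whichever target word it has seen most often in its data, while the rule-based function applies a hand-written lexicon and one grammar rule.

```python
from collections import Counter, defaultdict

# Hypothetical toy "corpus" of aligned English -> Spanish word pairs.
ALIGNED_PAIRS = [
    ("bank", "banco"), ("bank", "banco"), ("bank", "orilla"),
    ("house", "casa"), ("house", "casa"),
]

def train_word_model(pairs):
    """Count how often each target word was aligned with each source word."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return counts

def statistical_translate(word, model):
    """Statistical approach: pick the target word seen most often in the data."""
    if word not in model:
        return word  # unseen words pass through unchanged
    return model[word].most_common(1)[0][0]

# Hypothetical hand-written resources for the rule-based approach.
LEXICON = {"the": "la", "red": "roja", "house": "casa"}
ADJECTIVES = {"red"}

def rule_based_translate(words):
    """Rule-based approach: apply one grammar rule, then look words up.

    The rule: an adjective directly before a non-adjective moves after it,
    as is typical for Spanish noun phrases.
    """
    reordered = list(words)
    i = 0
    while i < len(reordered) - 1:
        if reordered[i] in ADJECTIVES and reordered[i + 1] not in ADJECTIVES:
            reordered[i], reordered[i + 1] = reordered[i + 1], reordered[i]
            i += 2
        else:
            i += 1
    return [LEXICON.get(word, word) for word in reordered]

model = train_word_model(ALIGNED_PAIRS)
print(statistical_translate("bank", model))           # most frequent pairing wins
print(rule_based_translate(["the", "red", "house"]))  # adjective moved after noun
```

The statistical result changes whenever the data changes, whereas the rule-based result changes only when someone edits the lexicon or the rules.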

One example of SMT is Google Translate. With a wide and extensive corpus (which in 2006 had already reached one trillion words), Google is able to apply different algorithms based on probability and statistics to generate translations. Google also cleverly connected its machine translation with translation memory, which refines the generated text with previously human-translated phrases. Combining machine translation with translation memory increases the consistency and quality of its translations.
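The general idea of pairing translation memory with machine translation can be sketched as a simple lookup order: reuse a human translation when one exists, and only fall back to the machine otherwise. This is an illustrative assumption about the workflow, not a description of Google's implementation. The memory contents, the `placeholder_mt` stand-in, and the 0.9 similarity cutoff are all hypothetical.

```python
import difflib

# Hypothetical translation memory: normalized English source -> human German translation.
TRANSLATION_MEMORY = {
    "press the power button.": "Drücken Sie die Ein/Aus-Taste.",
    "close the cover.": "Schließen Sie die Abdeckung.",
}

def placeholder_mt(segment):
    """Stand-in for a real MT engine; it only tags the text so the path is visible."""
    return "[MT] " + segment

def translate_segment(segment, memory, machine_translate, cutoff=0.9):
    """Return (translation, origin), preferring human translations from memory."""
    key = segment.strip().lower()
    if key in memory:
        return memory[key], "tm-exact"
    # Near matches reuse the stored translation; a human should still review them.
    close = difflib.get_close_matches(key, list(memory), n=1, cutoff=cutoff)
    if close:
        return memory[close[0]], "tm-fuzzy"
    return machine_translate(segment), "mt"

for text in ["Press the power button.", "Press the power button", "Open the tray."]:
    print(translate_segment(text, TRANSLATION_MEMORY, placeholder_mt))
```

Returning the origin alongside the text lets a reviewer see at a glance which segments came from human translators and which were machine-generated.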

Rule-based machine translation, by contrast, uses linguistic information to mathematically break down the source and target languages, so its output is more predictable and grammatically superior to that of SMT. RBMT can also be customized with a terminology management system to fine-tune the generated text by specifying the terminology that should be used.
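One way to picture terminology customization is as a user dictionary that takes priority over the engine's general lexicon. The sketch below assumes that model. Both dictionaries and the function name are hypothetical, and real RBMT products expose this through their own tools rather than through code like this.

```python
# Hypothetical general lexicon shipped with an engine (English -> German).
GENERAL_LEXICON = {"driver": "Fahrer", "printer": "Drucker"}

# Hypothetical user termbase for software documentation.
USER_TERMS = {"driver": "Treiber"}

def lookup_term(word, user_terms, general_lexicon):
    """Prefer the user's approved term, then the general lexicon, then the word itself."""
    return user_terms.get(word, general_lexicon.get(word, word))

print(lookup_term("driver", USER_TERMS, GENERAL_LEXICON))   # approved term wins
print(lookup_term("printer", USER_TERMS, GENERAL_LEXICON))  # falls back to general lexicon
```

Here the termbase steers "driver" toward the software sense ("Treiber") instead of the person who drives ("Fahrer"), which is the kind of choice a terminology list is meant to settle in advance.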

The benefits of using machine translation in general are plentiful. It requires less time, is cheaper than employing human translators when dealing with low-volume and quick-turnaround requests (especially if you are using a free application), results in faster time-to-market, and is relatively easy to use.

The weaknesses, though, are also apparent. Machine translation is content-dependent: it works best on technical material, while marketing content and creative writing are generally considered unsuitable. Machine translation is also beyond the control of the user. We input the data and receive the generated text, but everything in between is out of our hands.

Within the localization industry, machine translation may seem threatening at first glance. Yet while most can understand its value and potential, and even with the vast advancements in this technology in recent years, at the end of the day machine translation is still machine-generated translation.

One approach to machine translation is to use it as a type of “pre-translation”: generating an initial draft through machine translation, then handing it over to human translators for post-editing. This will provide translators with something concrete to work with. Even though the resulting text is not perfect, it is usually understandable. With the implementation of certain best practices for content development in the source language, such as making sure sentences are short, concise and written in the active voice, post-editing work can be minimal, saving time while maintaining overall quality. With this system in place, more content can be translated at a faster pace.
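The source-language best practices mentioned above (short sentences, active voice) can be checked automatically before content goes to machine translation. The following is a rough sketch only: the 20-word limit is an assumed threshold, and the passive-voice pattern is a crude heuristic that will miss some passives and flag some false positives.

```python
import re

MAX_WORDS = 20  # assumed limit; pick whatever your style guide specifies

# Crude heuristic: a form of "to be" followed by a word ending in -ed or -en.
PASSIVE_HINT = re.compile(
    r"\b(is|are|was|were|be|been|being)\s+\w+(ed|en)\b", re.IGNORECASE
)

def check_sentence(sentence, max_words=MAX_WORDS):
    """Return a list of issues that may make a sentence harder to machine-translate."""
    issues = []
    if len(sentence.split()) > max_words:
        issues.append("too long")
    if PASSIVE_HINT.search(sentence):
        issues.append("possible passive voice")
    return issues

for sentence in ["Close the cover.", "The cover is closed by the user."]:
    print(sentence, "->", check_sentence(sentence))
```

A check like this could run while authors write, so that problem sentences are fixed once in the source rather than repeatedly during post-editing in every target language.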

If you’re interested in translation, localization, or even if you just like to read the occasional blog, make sure to subscribe to our RSS feed for automatic updates from T for Translation!
