Statistical vs. Rule-based Machine Translation Comparison

Earlier this month at the CMS/DITA conference, CSOFT presented on machine translation. Machine translation, or automated translation, has been a topic of interest for some time now in the localization industry. With the arrival of machine translation tools and their increasing rate of use around the world (especially Google Translate), we thought we’d take a moment to give you a very high-level rundown on this topic.

To begin, machine translation (MT) is the process of using a software application to take content in a source language and translate it into a target language without human intervention. There are two main types of MT to consider: statistical machine translation and rule-based machine translation.

Statistical machine translation (SMT) employs a statistical model, based on the analysis of a corpus, to generate text in the target language. Rule-based machine translation (RBMT), on the other hand, uses linguistic rules to analyze the input content in its source language and generate text in the target language.
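To make the statistical idea a little more concrete, here is a minimal sketch that learns word translation probabilities from a tiny parallel corpus. It uses the textbook IBM Model 1 procedure (expectation-maximization over word co-occurrences), which is only one classic statistical technique and not necessarily what any production system uses. The three-sentence English-German corpus and the function names `train_ibm1` and `translate` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (source tokens, target tokens).
CORPUS = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]

def train_ibm1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] with IBM Model 1 EM."""
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference at all
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each target word's "credit" over the source words.
        for src, tgt in pairs:
            for w_t in tgt:
                norm = sum(t[(w_t, w_s)] for w_s in src)
                for w_s in src:
                    delta = t[(w_t, w_s)] / norm
                    count[(w_t, w_s)] += delta
                    total[w_s] += delta
        # M-step: turn the collected counts back into probabilities.
        for (w_t, w_s), c in count.items():
            t[(w_t, w_s)] = c / total[w_s]
    return t

def translate(words, table):
    """Word-by-word lookup of the most probable target word (no reordering)."""
    out = []
    for w_s in words:
        candidates = {w_t: p for (w_t, src), p in table.items() if src == w_s}
        out.append(max(candidates, key=candidates.get) if candidates else w_s)
    return out

table = train_ibm1(CORPUS)
print(translate("a house".split(), table))
```

Notice that "a house" never appears in the corpus; the model pieces it together from statistics gathered across the other sentences. Real systems add phrase tables, reordering, and a language model on top of this, all trained on vastly more data.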

One example of SMT is Google Translate. With a wide and extensive corpus (which in 2006 had already reached one trillion words), Google is able to apply different algorithms based on probability and statistics to generate translations. Google also cleverly connected its MT with translation memory, which refines the generated text with previously human-translated phrases. Combining MT with translation memory increases the consistency and quality of its translations.
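As a rough illustration of how a translation memory can sit alongside an MT engine, the sketch below checks a store of human-translated segments first and only falls back to machine translation when nothing similar is found. This is one common arrangement, not a description of Google's internals. The function `translate_with_memory`, the sample memory, the `fake_mt` stand-in, and the 0.9 similarity cutoff are all assumptions made for the example.

```python
import difflib

def translate_with_memory(segment, memory, machine_translate, cutoff=0.9):
    """Return (translation, origin), where origin is 'exact', 'fuzzy' or 'mt'."""
    # 1. Reuse a human translation when the segment was seen before.
    if segment in memory:
        return memory[segment], "exact"
    # 2. Otherwise look for a near-identical stored segment.
    close = difflib.get_close_matches(segment, list(memory), n=1, cutoff=cutoff)
    if close:
        return memory[close[0]], "fuzzy"
    # 3. Nothing similar in memory: hand the segment to the MT engine.
    return machine_translate(segment), "mt"

# Hypothetical translation memory with one human-translated segment.
MEMORY = {"Press the power button.": "Pulse el botón de encendido."}

def fake_mt(segment):
    # Stand-in for a real MT engine, which is outside the scope of this sketch.
    return "<MT: " + segment + ">"

print(translate_with_memory("Press the power button", MEMORY, fake_mt))
```

In practice a fuzzy match would normally be flagged for human review rather than reused blindly, since even a small difference in the source can change the meaning.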

Because rule-based machine translation uses linguistic information to systematically break down the source and target languages, it is more predictable than SMT and tends to produce more grammatical output. RBMT can also be customized with a terminology management system to fine-tune the generated text by specifying the terminology that should be used.
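A toy example may help show what "linguistic rules" and terminology customization look like in practice. The sketch below translates a few English words into Spanish with a hand-written lexicon, applies a single reordering rule (adjectives move after the noun), and lets a client glossary override the default term. The lexicon, the rule, and the names `LEXICON` and `translate_rule_based` are invented for illustration; a real RBMT engine has thousands of rules and also handles agreement, morphology, and ambiguity.

```python
# Hypothetical bilingual lexicon: source word -> (default target, part of speech).
LEXICON = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "fast": ("rápido", "ADJ"),
    "car": ("coche", "NOUN"),
}

def translate_rule_based(sentence, glossary=None):
    """Dictionary lookup, glossary override, then one reordering rule."""
    glossary = glossary or {}
    tagged = []
    for word in sentence.lower().split():
        target, pos = LEXICON.get(word, (word, "UNK"))  # unknown words pass through
        target = glossary.get(word, target)  # approved terminology wins
        tagged.append((target, pos))
    # Rule: English ADJ + NOUN becomes Spanish NOUN + ADJ.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1
    return " ".join(target for target, _ in tagged)

print(translate_rule_based("the red car"))
print(translate_rule_based("the red car", glossary={"car": "automóvil"}))
```

The same input always yields the same output, which is where the predictability comes from, and swapping the glossary changes the terminology without touching the rules.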


The benefits of using MT in general are plentiful. It requires less time, is cheaper than employing human translators when dealing with low-volume and quick-turnaround requests (especially if you are using a free application), results in faster time-to-market, and is relatively easy to use.

The weaknesses, though, are also apparent. MT is content-dependent: it works best on technical material, while marketing content and creative writing are generally considered unsuitable. MT is also largely beyond the control of the user. We input the data and receive the generated text, but everything in between is out of our hands.

Within the localization industry, machine translation may seem threatening at first glance. Yet while most can understand the value and potential of MT, and despite vast advancements in the technology in recent years, at the end of the day machine translation is still machine-generated translation.

One approach to machine translation is to use it as a type of “pre-translation”: generating an initial draft through machine translation, then handing it over to human translators for post-editing. This will provide translators with something concrete to work with. Even though the resulting text is not perfect, it is usually understandable. With the implementation of certain best practices for content development in the source language, such as making sure sentences are short, concise and written in the active voice, post-editing work can be minimal, saving time while maintaining overall quality. With this system in place, more content can be translated at a faster pace.
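The source-content best practices mentioned above can be partly automated before anything is sent to the MT engine. As a minimal sketch, the snippet below flags sentences that exceed a word limit so that authors can shorten them first. The function name `flag_long_sentences` and the 20-word threshold are assumptions for the example, not an industry standard, and the sentence splitting is deliberately naive.

```python
import re

def flag_long_sentences(text, max_words=20):
    """Return the sentences in `text` that contain more than `max_words` words."""
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]

sample = (
    "Press the button. "
    "Before you press the button that is located on the left side of the device, "
    "make sure that the cable that came in the box is connected to the port."
)
for sentence in flag_long_sentences(sample):
    print("Consider shortening:", sentence)
```

A check for passive voice would need a part-of-speech tagger or a style-checking tool, so it is left out of this sketch.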

If you’re interested in translation, localization, or even if you just like to read the occasional blog, make sure to subscribe to our RSS feed for automatic updates from Simply CSOFT!


James naton

November 1, 2012

Actually, the post is very helpful, but I have a question on rule-based machine translation:
1. What steps do I have to follow to develop a machine translation system that will translate English text into another local language that is written using Latin characters, similar to English?
2. When you compare SMT with RBMT, which one is better in terms of translation accuracy for a system translating English into other languages?

Language Interpretation Services

September 15, 2011

Great post, very informative. I think a lot of people will find this very useful.


Machine Translation: A Statistical MT and Rule-based MT Comparison

CSOFT International in Language Technology | April 22, 2011

Published: April 22, 2011

Updated: March 16, 2023
