
It’s a somewhat difficult number to pin down, but experts believe that somewhere between 6,000 and 7,000 languages are spoken on Earth today. Yet nearly 95% of the world’s population communicates in just 100 of those languages, and more than 50% uses one of only five: Chinese, English, Hindi, Spanish, and Russian. In other words, while the Earth is rich in language diversity, the vast majority of people rely on only a handful of languages to communicate with each other.

At this point, there is no real question as to whether Earth’s language diversity is dwindling; it clearly is. The open question is how much of this decline is driven by increasingly powerful machine translation (MT) tools, such as neural machine translation (NMT), and how much is due to natural processes.

Neural MT can’t help language diversity

First, a little background on Neural MT (for a more complete picture of this technology, read our blog here).

Neural MT is made possible by large sets of data: the bigger the data set, the better the machine can learn. These data sets are compiled from vast libraries of manually translated text, such as books, journal articles, and newspaper stories. While collecting a wealth of such material is not much of a challenge for the world’s most commonly used languages, the vast majority of the 6,000+ languages mentioned above simply don’t have many translations to pull from. As a result, they can’t be machine translated. This is why Google Translate, for example, only supports 90 languages.
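To make the data-volume point concrete, here is a minimal, hypothetical sketch in Python. The corpus sizes and the one-million-pair threshold are illustrative assumptions, not real figures from Google or any other MT provider; the sketch simply shows how the amount of aligned, human-translated text determines whether an NMT model is even feasible for a given language.

```python
# Hypothetical sketch: how parallel-corpus size decides which languages
# are candidates for NMT. All figures below are illustrative assumptions.

# Assumed counts of human-translated sentence pairs aligned with English
parallel_corpus_sizes = {
    "Spanish": 50_000_000,
    "Russian": 30_000_000,
    "Hindi": 2_000_000,
    "Yoruba": 150_000,
    "Ainu": 900,
}

# Assumed rough minimum of aligned sentence pairs for a usable NMT model
MIN_PAIRS_FOR_NMT = 1_000_000


def nmt_feasibility(corpora, threshold=MIN_PAIRS_FOR_NMT):
    """Flag which languages have enough aligned data to train an NMT system."""
    return {lang: size >= threshold for lang, size in corpora.items()}


for lang, feasible in nmt_feasibility(parallel_corpus_sizes).items():
    status = "enough data for NMT" if feasible else "too little data (low-resource)"
    print(f"{lang:>8}: {status}")
```

Under these assumptions, only the first three languages clear the bar, which mirrors why MT providers support a short list of high-resource languages and leave the rest out.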

For many lesser-known languages, not only is there too little data, there aren’t enough speakers left to “transform speech into computer data” either. Because only a handful of languages can be translated via MT and NMT with just minor amounts of post-editing, the popularity of these few languages is expected to grow at the expense of other, lesser-used ones, a phenomenon already observed with satellite TV, which normally broadcasts in only a single language.


Language diversity and the Internet

Currently, only 500 languages are used online, according to a UN report, and as of the end of 2015, Google supported just 348 languages, Wikipedia 290, and Facebook 80. Furthermore, the number of languages supported on the Internet appears to have plateaued in the past couple of years, leading experts to believe that it is reaching a saturation point in the number of languages it can support.

This has led people to question both the reasons behind this saturation and the fate of the languages left ‘offline’.

In response to the first question, Facebook estimated in 2015 that the Internet needed to support only 92 languages for 80% of the world’s population to find it useful. In other words, even if the Internet did support thousands more languages, there simply wouldn’t be enough users of most of them to make the supply of content worthwhile.

As for the second question, by 2115, 90% of the 6,000+ languages spoken today are expected to die out, leaving just 600 or so. This is due to a number of factors, and being unsupported on the Internet is not so much a catalyst for a language’s death as a reflection of what is already happening in the real world as a result of globalization and increasing migration toward urban centers.

The natural process of language simplification

Nearly 90% of the world’s languages have fewer than 100,000 speakers. As these languages become less and less important in daily life, the momentum behind teaching them to younger generations is lost.


It’s important to note, though, that the majority of the world’s population, and most speakers of small languages, are multilingual. Most people do not read or write in these small languages; rather, their speakers most often use the official languages of their countries to communicate.

While many have bemoaned the exclusion of these small languages from the Internet, there is good reason for their absence. Most notably, they aren’t widely written down, so their inclusion wouldn’t be particularly necessary for speakers who can simply switch to consuming information in one of their other tongues.

Conclusions

While the excitement around machine translation, and now Neural MT, has led many to question the technology’s impact on society, one thing is clear: NMT will not be the root cause of the massive loss of language diversity expected over the next 100 years. Rather, it will be just one of many drivers behind it, alongside migration and globalization.

While the loss of so many languages can seem unsettling, take heart in knowing that the same experts who predicted this decline also maintain that a world speaking just a single language is nowhere near possible in the coming centuries. So while it may soon be possible to carry around a device that can translate between any language pair, the majority of the world’s population will continue to speak their native tongues, and cultures and communities will remain intact.
