UM develops Cantonese-Mandarin machine translator to further GBA integration

The University of Macau’s research team

A University of Macau (UM) research team has developed a new online machine-translation system, the university announced recently. The system is intended, the university explained, to facilitate the integration of cities within the Greater Bay Area (GBA).
The system focuses on the translation of texts between Cantonese and Mandarin. “The system can efficiently and accurately translate text between the two [languages],” the university said in a statement.
As one of the most culturally diverse regions in the People’s Republic of China, Guangdong Province has some of the greatest linguistic diversity in the country, with its varieties falling into three major categories: Cantonese, Hakka and Hokkienese (Fujianese).
A recent report compiled by the Guangdong Provincial People’s Government found that about 40 million people within the province use Cantonese, or its sub-categories, regularly. Hakka, including its sub-categories, is used regularly by approximately 15 million people in Guangdong, and Hokkienese by about 17 million.
The UM statement pointed out that Cantonese-Mandarin translation belongs to the category of dialect translation. Although the two dialects bear some similarities, there are differences in grammar and many other aspects.
To build the system, the UM team proposed a novel Unsupervised Neural Machine Translation (UNMT) model. It introduces Pivot-Private embeddings and coordinates the learning of word representations from both the encoder and decoder in a layer-wise approach, so as to model the commonalities and diversities of Cantonese and Mandarin at different levels.
According to the statement, this approach can greatly enhance translation quality, and both automatic and human evaluation have confirmed its relatively high degree of accuracy.
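The statement does not spell out how the Pivot-Private embeddings are built. As a rough illustration only, the idea of modelling commonalities and differences separately can be sketched by giving every word one shared ("pivot") vector plus one language-specific ("private") vector. Everything in the sketch below is an assumption made for illustration rather than the UM team's implementation: the class name, the vector size, the random initial values and the choice to combine the two vectors by simple addition.

```python
import random


def make_table(vocab, dim, rng):
    """Build a word -> vector table filled with small random values."""
    return {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}


class PivotPrivateEmbedding:
    """Hypothetical sketch: a shared 'pivot' table plus one 'private' table per language."""

    def __init__(self, vocab, languages, dim=4, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Pivot vectors are shared by both languages (the commonalities).
        self.pivot = make_table(vocab, dim, rng)
        # Private vectors are kept separately per language (the differences).
        self.private = {lang: make_table(vocab, dim, rng) for lang in languages}

    def embed(self, word, lang):
        """Return the shared vector plus the language-specific vector (assumed sum)."""
        shared = self.pivot[word]
        specific = self.private[lang][word]
        return [s + p for s, p in zip(shared, specific)]


# "yue" and "cmn" are the ISO 639-3 codes for Cantonese and Mandarin.
vocab = ["我", "你"]
emb = PivotPrivateEmbedding(vocab, languages=["yue", "cmn"])
yue_vec = emb.embed("我", "yue")
cmn_vec = emb.embed("我", "cmn")
```

In this toy version the same written word gets a slightly different vector in each language, and the gap between the two comes entirely from the private tables. In a trained model those values would be learned from text rather than drawn at random.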
There are everyday cases in which Cantonese and Mandarin speakers misunderstand each other. The differences between the two languages can be confusing or even humorous, as a completely normal pronunciation in one can constitute foul language in the other.
The Cantonese-Mandarin translation system was developed by UM’s Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory (NLP2CT).
The multidisciplinary team at the lab has won numerous awards for its achievements. These include a second prize in the Science and Technology Progress Award category at the Macao Science and Technology Awards, for a project that studied the technologies behind Chinese-Portuguese machine translation systems and their applications. The system is now available online.

Staff reporter
