By Tim YoungHoon Jung, CEO, XL8 Inc.
Did he really just say that? She couldn’t have meant what she wrote.
For all the benefits of the current content explosion, with more of the world’s population gaining access to content (creating it, viewing it, sharing it), one of the major potential downsides continues to be the difficulty of conveying accurate meaning.
The tone of an email can easily be misread. The same ad or commercial running in different countries, or even in different sections of the same city, can take on vastly different meanings that aren’t always intentional, much less positive.
What it all comes down to is accuracy, tone, intent – in a word, context.
In a previous article, we outlined the vision of XL8, the growing trend of combining translation with “hyper-localization” capabilities, and the need for and benefits of context awareness to achieve accurate content translation.
To recap, hyper-localization means more than considering only the languages spoken in one country. Localization strategies must also consider the various dialects and local vernaculars of a country’s regions. That’s often easier said than done, which further illustrates the potential of context-aware Machine Translation (MT) technologies to speed the localization process and deliver higher accuracy. This is especially important as the timeframes for turning around localized content become increasingly compressed.
As we’ve previously discussed, getting content out to the market faster means more consumers can enjoy the titles they want and content owners can efficiently use and monetize the vast amounts of titles in their libraries.
Translating text ‘word for word’ is now the baseline: the minimum expected of AI-based machine translation. As audiences become more demanding, the stakes are raised even higher for localization providers to go above and beyond, delivering highly accurate meanings according to context while taking into account cultural, historical, and regional variations.
More information is outlined in this article by one of our product experts about XL8’s use of machine translation technology and Context Awareness models to better translate colloquial phrases. Ultimately, XL8’s MT engine enables providers to fully understand the circumstances behind the original text and accurately translate with those circumstances in mind. In other words, actually “localizing” content instead of simply translating it “word for word.”
Let’s explore this premise more deeply.
Most MT models translate sentences one by one, losing critical context that is outside of the primary sentence. Instead, context-aware translation models, like the ones employed by XL8, use information “surrounding” the source sentence.
Context awareness also looks at sentences as a whole to determine if words should be feminine or masculine. Consider these two sentences: “I like this flower. Put it in the bag.”
The pronoun “it” in the second sentence refers to the flower. In English, “it” doesn’t vary based on the gender of the noun it refers to. In a language like French, for example, pronouns change according to gender.
Our CA models convert the English sentences to: J’aime cette fleur. Mets-la dans le sac.
Because “fleur” is a feminine noun, the correctly translated pronoun for “it” is the feminine “la”. A model that translates the second sentence in isolation cannot tell what “it” refers to and may default to the masculine “le” (“Mets-le dans le sac”).
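As a rough illustration of what using information “surrounding” the source sentence can look like, the sketch below pairs each sentence with the sentences that precede it, so that a translation step could see “I like this flower.” when handling “Put it in the bag.” This is a minimal sketch under stated assumptions, not a description of XL8’s engine: the function name with_context and the window parameter are hypothetical, and no translation is actually performed.

```python
from typing import Iterator, List, Tuple


def with_context(sentences: List[str], window: int = 1) -> Iterator[Tuple[List[str], str]]:
    """Yield (context, source) pairs.

    `context` holds up to `window` sentences that come before `source`.
    Hypothetical helper for illustration only; it prepares inputs and
    does not translate anything.
    """
    for i, source in enumerate(sentences):
        start = max(0, i - window)
        yield sentences[start:i], source


sentences = ["I like this flower.", "Put it in the bag."]
pairs = list(with_context(sentences, window=1))

for context, source in pairs:
    print(f"context={context!r} source={source!r}")
```

With a window of 0, every sentence would arrive with an empty context, which corresponds to the sentence-by-sentence behavior described earlier.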
Considering the context of a conversation, in other words, “reading between the lines,” was until very recently considered a solely human capability. Now, Context Awareness models are rapidly gaining that same ability. These CA models provide stunning results in terms of accuracy, often exceeding the expectations of translators while providing extraordinary consistency across a single document as well as across an entire series.
It’s up to individual content owners or licensees to decide on the level of accuracy they need for each application. For example, live event audiences are more tolerant of mistakes created by AI-generated captioning, but mistakes detract from “offline” experiences such as viewing pre-recorded, broadcast, or OTT content.
With different language pairs, we’re able to achieve different levels of accuracy; and in fact, a growing number of language pairs exceed the mid-to-high 90 percent range. But don’t just take our word for it. A committee of localization partners recently tested our new model for translating English to LATAM Spanish using several categories of programming (e.g., sci-fi, comedy, food, travel, drama). The tests were conducted with and without the XL8 Context Awareness model applied.
Although both sets performed well, the accuracy of XL8’s Context Awareness model averaged 95.5% while the normal model averaged 91.2% (a difference of 4.3 percentage points).
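The two averages can be compared in two ways: as an absolute gap in percentage points, or as a relative improvement over the normal model. A quick check of the arithmetic, using only the two figures quoted above (the variable names are illustrative):

```python
ca_accuracy = 95.5        # average with the Context Awareness model, in percent
baseline_accuracy = 91.2  # average with the normal model, in percent

# Absolute gap between the two averages, in percentage points.
point_gain = ca_accuracy - baseline_accuracy

# The same gap expressed relative to the normal model's average.
relative_gain = point_gain / baseline_accuracy * 100

print(f"{point_gain:.1f} percentage points")
print(f"{relative_gain:.1f}% relative improvement")
```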
Overall, the CA model was more accurate regarding gender and formality consistency among multiple subtitles. While both performed well at providing coherent sentences, even when faced with misspelled words or odd phrasings, the CA model was more accurate with certain categories like food, where dishes were described in extreme detail with long lists of ingredients.
[Figure: English to LATAM Spanish, as of July 20, 2022. Engine 1: XL8 (context awareness model). Engine 2: XL8 (normal model).]
The food category is a good example of how accuracy levels can fluctuate. With certain advanced language pairs, accuracy may reach only 75%, and for those exceptions we recommend augmenting the translation with a Post-Edit review and quality control (QC) process. We are also constantly training and updating language pairs to make them context-aware, and through those efforts, we’ve seen increases in accuracy exceeding 15% as compared to the previous model.
Linguists who have worked with our CA-based models agree that these are the tools they need to work more efficiently and complete projects with a higher level of customer satisfaction.
“XL8’s new Context Awareness engines make the translation and QC process much easier and faster,” said June K., a linguist with 4 years of localization experience and a specialist in Korean and English-to-Korean translations. “The accuracy of the results got higher from 70% to 90% in the best cases, and the time we spent on revising the translation was shortened by nearly 20%. The main characters’ names and the key phrases show much more consistency compared to the original engines, which helps us maintain the best quality of our industry.”
Even linguists who were at first hesitant due to their previous machine translation experiences have realized the workflow improvements possible with the XL8 model, leading to higher-quality work and faster customer responsiveness.
“I remember the first time I heard that we would implement the NMT [Neural Machine Translation] engine,” said Francesco R., a linguist with 15 years of localization experience and a specialist in French, Italian and English. “I was skeptical because, in my previous experience, machine translation meant lower quality. But in time, working with and on the engine, I have seen it improve in understanding the context and provide better translations, thanks to our database made of millions of lines, thanks to the programmers, and thanks to the feedback of translators. Now the NMT is a good instrument to help us provide quality translations for our clients and reduced turnaround times.”
The global demand for content is only going to increase, and customizable tools like our Context Awareness models will become increasingly critical to achieving effective hyper-localization. Our vision is to remove language barriers, allowing everybody to communicate and enjoy the content they want to watch in their own language, while also giving content owners more control over the entire localization process and helping them grow their businesses efficiently.