Would You Invest $1 for Enhanced Machine Translation to Save $198?
Savings can hide in the most curious places, particularly when machine translation (MT) is used to translate vast quantities of foreign-language content.
Machine translation is usually the starting point in legal practice, corporate law, and other high-volume settings where large amounts of foreign-language documents must be processed quickly. But as powerful as MT engines can be, there are countless words they aren’t trained to know.
These “outlier” words can take many forms:
• No machine translation engine contains all the words of a language (e.g., English is thought to have more than 1 million words);
• Words may have multiple uses, and machine translation can mistakenly apply the wrong context (e.g., “cranes by the riverbank,” referring to birds and a body of water, is not the same as “construction cranes used to build the bank by the river”);
• Some words may be rarely used (e.g., in English where grammatical use has changed, words like “thee,” “thy,” and “whom” may be unknown);
• Words may be newly created (e.g., in German and Russian, it is common to see new words pop up or combinations of words with new concepts);
• Company names and product names should generally not be translated, although there are certainly exceptions in some languages;
• Personal names should not be translated (e.g., “Mr. Grey” is a person and not a “man of a certain color”);
• Some chemical compound names may not be present in a machine translation word repository;
• Industry-specific terms may not be widely known outside the industry.
What do you do with these “outlier” words?
Linguistic Systems’ translation analytics capability allows us to extract all words that the machine translation engine “did not translate” (DNT). We then create an output file listing those words and the number of times each occurred. This file is used to create a new proprietary client glossary (or to update existing glossaries) with DNT words. The glossary then serves as a reference for the machine translation engine for this client’s specific project or case going forward.
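As a rough illustration of what such a DNT report might look like, here is a minimal Python sketch. It is not Linguistic Systems’ proprietary analytics; it assumes a simple heuristic in which any source word that reappears unchanged in the MT output is counted as “did not translate.” The function name `count_dnt_words` and the sample sentences are hypothetical.

```python
import re
from collections import Counter

# Word tokenizer; \w matches Unicode letters in Python 3 string patterns.
TOKEN = re.compile(r"\w+")

def count_dnt_words(pairs):
    """Count source words that appear unchanged in the MT output.

    pairs: iterable of (source_text, mt_output) strings.
    Assumption (not from the article): a word passed through unchanged
    is treated as a "did not translate" (DNT) candidate.
    """
    counts = Counter()
    for source, output in pairs:
        output_tokens = set(TOKEN.findall(output.lower()))
        for token in TOKEN.findall(source.lower()):
            if token in output_tokens:
                counts[token] += 1
    return counts

# Hypothetical German source sentences with made-up MT output.
pairs = [
    ("Der Kranführer prüft das Xylocain", "The Kranführer checks the Xylocain"),
    ("Xylocain wird geliefert", "Xylocain is delivered"),
]

dnt_counts = count_dnt_words(pairs)

# Most frequent candidates first, ready for review with the client.
for word, occurrences in dnt_counts.most_common():
    print(word, occurrences)
```

A heuristic like this would also flag names that are correctly left alone, which is one reason the candidates still need to be reviewed before they go into a glossary.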
The next step is most important. We collaborate offline with our clients to prioritize and define the outliers so they can be added to a custom glossary with “DNT” words for that client and project. The job can then be rerun with the custom glossary. The result: Significant savings of both time and money for a portion of the files. Here’s how it works.
Here are the numbers for a project that contains 2,000 files to be translated. Starting with machine translation, we might run 400 files (one fifth) as straight MT with no updated glossary. (We would leverage glossaries if they exist from previous jobs.)
Our machine translation engine would isolate the outlier “DNT” words and their number of instances. After collaborating with the client, the project is rerun (all 2,000 files) with the updated glossary in place.
To use round numbers, let’s assume that straight machine translation is charged out at $1 per document. The cost might go to $2 per document for enhanced machine translation (using the custom glossary) on the 400 documents to be examined more deeply. However, this $1 increase per file may save far more on this segment of identified files, where human post-editing would otherwise be recommended to significantly improve clarity.
Post-editing could run $200 per document for the files that require it. Using $2 enhanced machine translation instead saves $198 per file; across 400 files, that comes to $79,200 in savings by not having to post-edit that segment of machine translation work.
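For readers who want to plug in their own rates, the arithmetic can be sketched in a few lines of Python. The figures are the round numbers used in this example; the function name `post_edit_savings` is hypothetical, and real per-document pricing will vary.

```python
def post_edit_savings(files, enhanced_mt_cost, post_edit_cost):
    """Return (savings per file, total savings) when enhanced MT
    replaces human post-editing on a segment of files."""
    per_file = post_edit_cost - enhanced_mt_cost
    return per_file, per_file * files

# Round numbers from the example: 400 files, $2 enhanced MT, $200 post-editing.
per_file, total = post_edit_savings(files=400, enhanced_mt_cost=2, post_edit_cost=200)
print(f"${per_file} per file, ${total:,} total")  # $198 per file, $79,200 total
```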
There should also be significant time savings if machine translation plus an augmented glossary can be used. The next time a project is run for that same client, a more robust glossary is already in place, and collaboration time on new “outliers” should be less. The process keeps getting quicker, more efficient, and less expensive with each additional job.
Machine translation is still limited in its quality. Even files that have gone through an enhanced glossary may require additional human translation if their end purpose demands it.
Machine translation, even with enhanced glossaries and post-editing, is nowhere near “certification grade.” An attorney could not take such content into court unless it has gone through a full and careful human edit. But augmenting the machine translation process with custom glossaries can be a very cost-effective option for segments of large projects.
EDITOR’S NOTE: Linguistic Systems uses a combination of advanced proprietary technology and 7,500 skilled, certified translators to deliver high-quality translations in 120+ languages. Trust us with your next translation project.