Are Your Translations Exposing You To Risk?

How Linguistics Cracked The Ransomware Code

Cyberattacks, bugs, viruses, cybertheft, malware, or ransomware: by any name, a breach of data security is formidable. But linguistic analysis is proving to be a valuable tool in cracking a hacker’s code.

As technology advances, the sophistication and intricacy of cyberterrorism add new complexity to data and risk management. However, each attack leaves identifiers in its code that can help lead authorities to the perpetrator.

Law enforcement officials around the world search the malware for those identifiers to trace the source of the attack. By analyzing language patterns within the code, authorities can draw inferences about where the attack originated.

In the WannaCry ransomware attack, for example, ransom notes were sent out in different languages. But linguistic nuances showed up as errors typical of generic translations produced by free machine translation engines.

Experts saw that the hacker’s use of certain Chinese characters hinted at fluency, while the failure to recognize grammatical and contextual cues in other languages supported the forensic conclusion that those versions had been machine translated.

Be cautious about relying on the accuracy of machine translation alone, especially from free translation sites. (Note: Linguistic Systems uses advanced, proprietary statistical and neural engines for its machine translation. We then add human translation as needed to reach the desired quality level.)

According to Flashpoint authors Jon Condra, John Costello, and Sherman Chu, in an article published May 25, 2017, “A number of unique characteristics in the note indicate it was written by a fluent Chinese speaker. A typo in the note, “帮组” (bang zu) instead of “帮助” (bang zhu) meaning “help,” strongly indicates the note was written using a Chinese-language input system rather than being translated from a different version. More generally, the note makes use of proper grammar, punctuation, syntax, and character choice, indicating the writer was likely native or at least fluent.”

Data security starts with a commitment to confidentiality. Although free translation sites may seem like a quick and cost-effective choice to translate your documents, they can expose you to risk.

Even Google Translate’s FAQs confirm this possibility: “The stored text is typically deleted in a few hours, although occasionally we will retain it for longer while we perform debugging and other testing. Google also temporarily logs some metadata about translation requests (such as the time the request was received and the size of the request) to improve our service.”

The lack of accountability of free translation sites may contribute to lower-quality translations. By forgoing human expertise, the WannaCry hackers probably gave authorities valuable clues to their location. The case also highlights the flaws of machine translation software in general, particularly on free sites.

Using a free online translation tool may seem cost-effective, but it invites a third party to engage with your content — one that cannot be held accountable in the event of a security breach. This exposes you to risk.

To be sure that you have the most secure and accurate translation, put your trust in a translation service provider who can offer you the cost- and time-effective methods of machine translation complemented with the expertise of human translation as needed. Choose a service provider with a strong history of excellence in translation and confidentiality supported by multiple security certifications.

We’ve got you covered in all those areas.

EDITOR’S NOTE: Linguistic Systems maintains an information security management system certified to the requirements of the ISO 27001 information security standard.

Machine Translation: The Hidden Value of Outliers

Would You Invest $1 for Enhanced Machine Translation to Save $198?

Savings can hide in the most curious places, particularly when you use machine translation (MT) to process vast quantities of foreign-language content.

Machine translation is usually the starting point in the legal profession, corporate law, and many other high-volume settings where large amounts of foreign-language documents must be processed quickly. But as powerful as machine translation engines can be, there are countless words that MT engines aren’t trained to know.

These “outlier” words can take many forms:

• No machine translation engine contains all the words of a language (e.g., English is thought to have more than 1 million words);

• Words may have multiple meanings, and machine translation can mistakenly apply the wrong one (e.g., “cranes by the riverbank,” referring to birds and a body of water, is not the same as “construction cranes used to build the bank by the river”);

• Some words may be rarely used (e.g., in English where grammatical use has changed, words like “thee,” “thy,” and “whom” may be unknown);

• Words may be newly created (e.g., in German and Russian, it is common to see new words pop up, or new combinations of words that express new concepts);

• Company names and product names should generally not be translated, although there are certainly exceptions in some languages;

• Personal names should not be translated (e.g., “Mr. Grey” is a person and not a “man of a certain color”);

• Some chemical compound names may not be present in a machine translation word repository;

• Industry-specific terms may not be widely known outside the industry.

What do you do with these “outlier” words?

Linguistic Systems’ translation analytics capability allows us to extract all words that the machine translation engine “did not translate” (DNT). An output file of those words is then created which includes the number of times they occurred. This file is used to create a new proprietary client glossary (or to update existing glossaries) with DNT words. The glossary then serves as a reference for the machine translation engine, for this client’s specific project or case going forward.
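As a rough illustration of the idea, and not a description of Linguistic Systems’ actual tooling, the Python sketch below counts words that pass through machine translation unchanged. It assumes that an untranslated word appears in the output exactly as it did in the source; the function names and the sample sentences are hypothetical and invented for this example.

```python
from collections import Counter
import re

WORD = re.compile(r"\w+")  # runs of letters, digits, and underscores


def untranslated_words(source_text, mt_output):
    """Count output words that also appear in the source text.

    Assumption: a word the engine could not translate is copied
    into the output unchanged (case is ignored).
    """
    source_vocab = {w.lower() for w in WORD.findall(source_text)}
    return Counter(
        w.lower() for w in WORD.findall(mt_output) if w.lower() in source_vocab
    )


def dnt_report(pairs):
    """Combine counts over (source, output) pairs, most frequent first."""
    total = Counter()
    for source_text, mt_output in pairs:
        total.update(untranslated_words(source_text, mt_output))
    return total.most_common()


# Invented sample data: German source text and hypothetical MT output.
sample = [
    ("Der Gabelstapler nutzt das Blockchainprotokoll",
     "The Gabelstapler uses the Blockchainprotokoll"),
    ("Das Blockchainprotokoll ist neu",
     "The Blockchainprotokoll is new"),
]

for word, count in dnt_report(sample):
    print(word, count)
```

A list like this is only a set of glossary candidates. Names, numbers, and words spelled the same in both languages also pass through unchanged, which is one reason the review with the client matters.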

The next step is most important. We collaborate offline with our clients to prioritize and define the outliers so they can be added to a custom glossary with “DNT” words for that client and project. The job can then be rerun with the custom glossary. The result: Significant savings of both time and money for a portion of the files. Here’s how it works.

Here are the numbers for a project that contains 2,000 files to be translated. Starting with machine translation, we might run 400 files (one fifth) as straight MT with no updated glossary. (We would leverage any glossaries that exist from previous jobs.)

Our machine translation engine would isolate the outlier “DNT” words and count their instances. After we collaborate with the client, the whole project, all 2,000 files, is rerun with the updated glossary in place.

To use round numbers, let’s assume that straight machine translation is billed at $1 per document. The cost might go to $2 per document for enhanced machine translation (using the custom glossary) on the 400 documents to be examined more deeply. However, this $1 increase per file can yield far larger savings on this segment of identified files, where human post-editing would otherwise be recommended to significantly improve clarity.

Post-editing could run $200 per document for the files that require it. Compared with $2 per document for enhanced machine translation, that is $198 saved per file. Across 400 files, it comes to $79,200 in savings by not having to post-edit that segment of machine translation work.
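The arithmetic can be checked with a short Python sketch. The dollar figures are the round numbers used in this article; the function and constant names are hypothetical, and real pricing will vary by project.

```python
FILES_IN_SEGMENT = 400   # files that would otherwise need post-editing
STRAIGHT_MT_COST = 1     # dollars per document, straight machine translation
ENHANCED_MT_COST = 2     # dollars per document, MT with the custom glossary
POST_EDIT_COST = 200     # dollars per document, human post-editing


def extra_spend(files, straight_cost, enhanced_cost):
    """Added cost of enhanced MT over straight MT for the segment."""
    return files * (enhanced_cost - straight_cost)


def segment_savings(files, enhanced_cost, post_edit_cost):
    """Savings from enhanced MT in place of human post-editing."""
    return files * (post_edit_cost - enhanced_cost)


print(extra_spend(FILES_IN_SEGMENT, STRAIGHT_MT_COST, ENHANCED_MT_COST))    # 400
print(segment_savings(FILES_IN_SEGMENT, ENHANCED_MT_COST, POST_EDIT_COST))  # 79200
```

Under these assumptions, an extra $400 spent on enhanced machine translation avoids $79,200 of post-editing on the same 400 files.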

There should also be significant time savings if machine translation plus an augmented glossary can be used. The next time a project is run for that same client, a more robust glossary is already in place, and collaboration on new “outliers” should take less time. The process keeps getting quicker, more efficient, and less expensive with each additional job.

Machine translation is still limited in its quality. Even files that have gone through an enhanced glossary may require additional human translation if their end purpose demands it.

Machine translation, even with enhanced glossaries and post-editing, is nowhere near “certification grade.” An attorney could not take the content into court unless it has gone through a full and careful human edit. But augmenting the machine translation process with custom glossaries can be a very cost-effective option for segments of large projects.

EDITOR’S NOTE: Linguistic Systems uses a combination of advanced proprietary technology and 7,500 skilled, certified translators to deliver high-quality translations in 120+ languages. Trust us with your next translation project.