Google Translate is only as dumb as you allow it
Statistical machine translation, brought to the mainstream by Google Translate, is a modern technological miracle. But for all its value, the software is easily fooled. Limit its stupidity by writing specifically and naturally, and the machine will work harder for your content.
Statistical machine translation — fueled by millions of reference documents and millions of crowdsourced corrections — is shockingly accurate. The logic is brilliantly simple and bewilderingly sophisticated. From Google Translate, the most visible brand consuming this technology:
When Google Translate generates a translation, it looks for patterns in hundreds of millions of documents to help decide on the best translation for you. By detecting patterns in documents that have already been translated by human translators, Google Translate can make intelligent guesses as to what an appropriate translation should be.
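As a rough illustration of the pattern-matching idea in that description, the sketch below counts how often human translators rendered a phrase a given way, then picks the most frequent rendering. The tiny `parallel_corpus` and both function names are invented for this example. A production system works from vastly more data and far more elaborate models, so treat this as a toy, not as how Google Translate is implemented.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made "already translated" phrase pairs (English -> Spanish).
parallel_corpus = [
    ("good morning", "buenos días"),
    ("good morning", "buenos días"),
    ("good morning", "buen día"),
    ("thank you", "gracias"),
    ("thank you", "gracias"),
    ("thank you", "muchas gracias"),
]


def build_phrase_table(pairs):
    """Count how often each source phrase was translated each way."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def best_guess(table, phrase):
    """Return the most frequent translation and its share of all sightings."""
    counts = table.get(phrase)
    if not counts:
        return None, 0.0
    target, hits = counts.most_common(1)[0]
    return target, hits / sum(counts.values())


phrase_table = build_phrase_table(parallel_corpus)
guess, share = best_guess(phrase_table, "good morning")
print(guess, round(share, 2))
```

The "intelligent guess" here is nothing more than the majority vote of past human translators, which is why the quality and volume of the reference documents matter so much.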
In 2012, translate.google.com was translating roughly as much text as you’d find in one million books — every day. By any definition, this is a modern miracle.
However, an infinitely fast omni-lingual supercomputer hive mind in the cloud can’t capture the abstractions that keep language so elusive: cultural inflection, assumptions, context, tone, intent. Not just the stuff between the lines, but the stuff beyond the page. With a bit of ambiguous writing, Google Translate is easily foiled. Example:
The clothing was worn.
For Spanish, Google Translate returns la ropa estaba gastada — “the clothing was worn-out”. For French, Google Translate returns le vêtement a été porté — “the garment has been worn”.
Which is correct? Context would dictate, but context is like ultraviolet light to statistical machine translation — affecting everything but beyond its purview.
Translation is not about words, it’s about meaning
What are we actually saying here? Is “worn” a verb or an adjective? Can we clarify? Technically, le vêtement a été porté is correct, but it’s unspecific until we reintroduce context. Why not elle portait la robe — “she wore the dress”?
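To make the point about reintroducing context concrete, here is a deliberately naive sketch. It decides which sense of “worn” a sentence supports by looking for cue words nearby. The cue lists and the `guess_sense` helper are assumptions made up for this illustration; a real system would learn such associations from data instead of a hand-written table.

```python
# Hypothetical cue words for each sense of "worn".
SENSE_CUES = {
    "was put on (verb)": {"by", "she", "he", "yesterday", "wedding"},
    "is worn-out (adjective)": {"frayed", "threadbare", "old", "faded"},
}


def guess_sense(sentence):
    """Return the senses of 'worn' that the surrounding words support."""
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    supported = [sense for sense, cues in SENSE_CUES.items() if words & cues]
    # No cue words at all: every sense stays possible.
    return supported or list(SENSE_CUES)


print(guess_sense("The clothing was worn."))
print(guess_sense("The clothing was worn by the bride."))
print(guess_sense("The clothing was worn and frayed."))
```

The bare four-word sentence leaves both senses open, while each of the more specific sentences narrows the choice to one. The extra words are the context that the writer, not the machine, has to supply.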
At some point, we’ll achieve perfect AI and render ourselves obsolete for all critical thinking. Before Skynet goes live, however, the need for humans in the translation workflow remains. People ask questions, machines respond to prompts. People seek meaning, machines seek accuracy.
And yet: always write for machines
Humans don’t scale. There are hundreds of languages, thousands of dialects, infinite contexts. The linear model of manually translating your text into every language would implode under its own weight before you even started on Chichewa.
Which brings us back to why machine translation exists in the first place. The only way for the world to communicate at the speed of now is through a digital intermediary. Google Translate, for all its flaws, is the best babel fish most users have, and it’s everywhere: the web interface, dedicated apps, website widgets for webmasters, Google+ authoring, the Chrome browser itself. People use it, constantly, in myriad contexts. One million books of text, every day.
But if four measly words like “the clothing was worn” can fool a machine, how can our lengthy, persuasive prose survive? Simply: write well.
- Keep sentences tight. Avoid compounding clauses and semicolonitis. There are few reasons a sentence should run past 25 words. Simplify.
- Use active verbs. Passive voice is not unique to English, and it’s not always a bad thing. But translation is simpler when moving between direct, active, specific verbs. “The boy bounces the ball” is much tighter than “the ball was bounced by the boy”. An easy trick is to remove instances of “make”. (This article doesn’t make me angry, this article angers me.)
- Reduce hyperbole. Overuse of flowery adjectives, jumbling adverbs and crooked turns of phrase invites trouble. Keep it clean, and it will keep your costs down.
- Avoid colloquialisms, clichés and idioms. Every culture has its “sayings” and special words that are nearly meaningless outside a cultural circle. These are rarely appropriate to translate because the literal interpretation is often idiotic (“You picked his brain?”) or grossly miscast in a different context (“a drop in the bucket” means what to someone in Sudan?).
- Never verb nouns. It’s fashionable to take a perfectly good noun and recast it as a verb. Don’t do this. Verbing nouns breaks known language rules and wreaks havoc for translators. Don’t solution problems. Don’t grass your yard. Don’t stair your way to the top floor.
- Remove newspeak. Cut away complex, teflon-coated languagisms designed to passively divert responsibility, distract with shiny multi-syllabication, and obfuscate meaning. Governments don’t “provide revenue enhancement”, they raise taxes.
- Write persistent ideas consistently. If you have a product name, brand tagline, complex theme, or other sophisticated group of words that requires careful translation, ensure your source material phrases it precisely the same way every time.
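Several of the guidelines above are mechanical enough to check automatically before text goes to a translator. The following is a minimal sketch of such a check, not a real style tool: the `lint` function, the idiom list, and the regular expressions are all assumptions for illustration, and the passive-voice and “make” tests are crude heuristics that will miss some cases and flag some innocent ones.

```python
import re

# Illustrative list only; extend it for your own content.
IDIOMS = ["pick his brain", "picked his brain", "a drop in the bucket"]
MAX_WORDS = 25  # the sentence-length ceiling suggested in the list above


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def lint(text, key_terms=()):
    """Return a list of (sentence_number, warning) pairs."""
    warnings = []
    for number, sentence in enumerate(split_sentences(text), start=1):
        words = sentence.split()
        lowered = sentence.lower()
        if len(words) > MAX_WORDS:
            warnings.append((number, f"{len(words)} words; consider splitting"))
        # Rough passive-voice hint: a form of "to be" followed by an -ed word.
        if re.search(r"\b(is|are|was|were|been|being|be)\s+\w+ed\b", lowered):
            warnings.append((number, "possible passive voice"))
        if re.search(r"\bmak(e|es|ing)\b|\bmade\b", lowered):
            warnings.append((number, 'uses "make"; try a direct verb'))
        for idiom in IDIOMS:
            if idiom in lowered:
                warnings.append((number, f'idiom: "{idiom}"'))
    # Consistency: flag key terms that appear with more than one capitalization.
    # Sentence number 0 marks a document-level warning.
    for term in key_terms:
        variants = set(re.findall(re.escape(term), text, flags=re.IGNORECASE))
        if len(variants) > 1:
            warnings.append((0, f"inconsistent capitalization: {sorted(variants)}"))
    return warnings


sample = (
    "The ball was bounced by the boy. "
    "This article doesn't make me angry. "
    "Our budget is a drop in the bucket."
)
for number, warning in lint(sample):
    print(number, warning)
```

Run against the sample, the check flags one issue per sentence: a likely passive, a use of “make”, and a known idiom. Passing `key_terms=["Google Translate"]` would additionally report any capitalization variants of that name, which is only a narrow proxy for the broader advice to phrase persistent ideas identically every time.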
Above all, write for specificity and clarity. Sharp, pointy words help the translator, human or robot, find its perfect match, and ultimately better snag the reader’s attention.