Sunday, 14 July 2013

Our chances against the machine: Will translators stop having a profession soon?

We decided to quickly test the machine, thinking perhaps of the Turing test, and got the following results for the language pair (Portuguese, English):

  • Simple sentences seem to do well as long as they do not escape machine reasoning, that is, as long as they are not, for instance, localisms. We try eu te amo and GT[1] (from now on we will call Google Translate GT) responds with I love you;
  • Localisms are ignored. We try é fogo, hein?, for which the most common carioca translation is it is quite unbearable, is it not? (a not-so-elegant version of that sentence in the local language), and we get and fire, huh?
  • Technical expressions such as Rede Mundial de Computadores get a very good result, for GT comes up with World Wide Web. The truth, however, is that Rede Mundial de Computadores is the translation of the word Internet too, and, if we try Internet, we get Internet. According to the Brazilian linguists, it is not acceptable to leave untranslated terms that can be translated into Portuguese. We agree: This protects and preserves the language. Besides, if we think like the purists, World Wide Web is actually Rede Mundial only, and Internet is Entre Redes (literal translation) or Rede Mundial de Computadores (recommendation of the Brazilian linguists);
  • Simple, but technical, words like Internet are therefore not finding a good translation in GT, which is unexpected, since our work on Translation[2] points to everything technical being amenable to automation. Another easy example is município, which is municipality. If we use GT, we get município again for some reason. Along the same lines, cartório becomes registry in the GT system, but it should become registry office or office of births, deaths, and marriages, for instance (see our post on the topic);
  • GT seems to ignore subtleties of the Portuguese language: Things like the gender of the person, which is frequently conveyed through the endings of the words or the articles that come before them in Portuguese, seem to be completely ignored by the machine. For example, we try juíza, which should mean female magistrate or female judge in English, and we get judge. We try professora and we get teacher, not female teacher, and so on and so forth;
  • GT seems to actually ignore anything that is specific to the Portuguese language, that is, anything that does not mimic the English language. As another example, we have diminutives and augmentatives. For instance, we try cãozinho and we get doggy. Well, cãozinho is not doggy. Cãozinho is little dog. Doggy could perhaps be translated into cachorrada or cachorrinho (if a child is saying that) in Portuguese, but only very rarely into cãozinho (we believe that doggy style would definitely be better translated into estilo cachorrada than into estilo cãozinho). We try peninha and we get little feather, but peninha might mean little pity as well, and this sense, little pity, is also a localism. Everyone would agree that the distance between one and the other is almost infinite. We try homenzarrão and, apparently because there is only one sense for it in Portuguese, things go well: Big man. However, big man is usually homem grande and homenzarrão should be super big man instead;
  • GT translates tu into thou, but tu is not thou: tu is you. Vós is thou. Vós is translated into ye by GT. Because ye is the plural of thou and they translate tu into thou, however, everything seems to make sense; and
  • The best item ever found in GT is, we believe, amorzinho in the most common of its senses. GT translates this one into chickabiddy, which is chicken or child in English (see chickabiddy, Merriam-Webster).
Well, amorzinho actually means sweetheart, considering cultural equivalences (it could be little love, if one goes literally, but that would be an extraordinary mistake in translation: this is not a word that diminishes the size of the love, but one that shows a huge amount of affection and consideration).


Perhaps we all know that, with machines, we will always need the human hand over the final product in order to provide a good translated version of any document, regardless of how simple the document is.

If we stick to the paradigms that we have created for the science of translation, then it is impossible to go without the human hand, since we need to adapt everything considering the time of production of the original, its location, and all other aspects that we have classified as important.

According to the theory we have developed, we need to worry even about the style of the document. If the author said tu, we want the reader of the translated version to understand that this is you, and that, if we talk about Brazil, only a few special regions use tu instead of você to refer to the person next to them at the moment they speak.

The problem is that we really need to pass that information on to the reader of the translated version of the document. We therefore have to translate that tu into you, but we also have to add a note explaining that the original document used tu, and that this was because of the place where it was produced, because of the person who wrote it, or even because of the intentions of the writer (say that it is a play in which the characters are from the South of Brazil).

It is possible to insert all this information into a machine (we get the best translated versions of documents that we know of, scan them, and then record them in entries that also contain the dates and places where they were produced), and to take it into account as we translate texts with GT (say that we also enter into the system the date and the location of the document that we have in our hands).

Nevertheless, it is not possible to get an equivalent for a particular term that works for all documents.


It is not possible to get an equivalent that works for all documents for an expression either.
The computer will always have to come up with a set of choices if things are done in a serious manner.
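
The idea of entries carrying dates and places, and of a machine that offers a set of choices instead of a single answer, can be sketched in a few lines of Python. This is only a rough illustration under our own assumptions, not a description of how GT works: the names Entry, MEMORY, and candidates are hypothetical, the three entries reuse examples from this post, and the years and places attached to them are invented for the sake of the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    source: str     # term or expression in the original language
    target: str     # equivalent chosen by a human translator
    year: int       # year in which the original document was produced
    place: str      # place where the original document was produced
    note: str = ""  # optional translator's note for the reader


# Hypothetical entries: the equivalences come from this post,
# the years and places are assumptions made for illustration.
MEMORY = [
    Entry("peninha", "little feather", 2013, "Brazil"),
    Entry("peninha", "little pity", 2013, "Rio de Janeiro",
          "Localism: expresses mild pity, not a small feather."),
    Entry("amorzinho", "sweetheart", 2013, "Brazil"),
]


def candidates(term: str, year: int, place: str,
               memory: List[Entry] = MEMORY) -> List[Entry]:
    """Return every recorded equivalent for term, best match first.

    Entries from the same place come first; ties are broken by how
    close the entry's year is to the year of the document at hand.
    Nothing is discarded: the human translator makes the final choice.
    """
    matches = [e for e in memory if e.source == term]
    return sorted(matches, key=lambda e: (e.place != place, abs(e.year - year)))


if __name__ == "__main__":
    for entry in candidates("peninha", 2013, "Rio de Janeiro"):
        print(entry.target, "|", entry.place, "|", entry.note)
```

The function ranks the choices and returns all of them, which matches the point made above: the system can narrow things down with the date and the location, but a person still has to pick the equivalent and decide whether a note to the reader is needed.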

GT will never be a perfect tool, regardless. One of the reasons for that is that language is always being created.

Even if all the translators of this world were entering their translated versions of documents, together with the original documents and all the data we have talked about, into a system, twenty-four hours a day, seven days a week, the system, regardless of its nature, would still not be complete.

However, it would be a much better system, since it would allow for the translation of more pieces of text in a reliable manner.
If that has already happened, or if that is what is happening now somehow, and GT itself says that this is more or less what is currently happening[3], we would think that we should start charging royalties for every expression coined by a translator (in translation/equivalences).

The date of the creation would have to be irrelevant, since we would have to start from today.

We would then obviously be paying royalties to all the translators whose data has been inserted into the system each and every time someone uses their work.

It would not be fair to worry about these issues in music and writing and not to worry about them in translation...

Note: All sources mentioned in this text were consulted on the fifteenth of July of two thousand and thirteen.







[1] https://translate.google.com/
[2] http://philpapers.org/rec/PINAOT
[3] http://translate.google.com.au/about/










1 comment:

  1. Hi Marcia - nice rundown of how GT works in Portuguese. Translators (and interpreters) will continue to have a profession. But the kinds of work we do will continue to shift with the sea change in how humans communicate with each other that is being brought on by the Internet and social media. What we translate, using which tools, will be a moving target for a while...
