Strategy to remove Machine translated pages from Google’s Search Index


Two weeks ago, Technospot.Net had a major drop in traffic. After a lot of analysis I could not figure out the cause, because nothing about the site’s configuration had changed. So I asked about it in the Google Webmaster Forum, and within a day or two another webmaster pointed to a problem I had never thought of: automated (machine) translated pages.

For about a year we have been using a plugin that generates machine-translated pages. Our intention was purely to help readers whose language is not English use our content, but the terms of the Google Language API do not allow you to store translations for more than 15 days:

You may not copy, store, archive, republish or create a database of results returned from the service, in whole or in part, directly or indirectly, except that you may store results in a temporary cache for a period not to exceed Fifteen (15) days solely for the purpose of using those results to carry out a specific user-requested action.

Coincidentally, a day before the traffic drop, Google Webmaster Central posted “Working with Multilingual Sites”, where they stated clearly:

We recommend that you do not allow automated translations to get indexed. Automated translations don’t always make sense and they could potentially be viewed as spam. More importantly, the point of making a multilingual website is to reach a larger audience by providing valuable content in several languages. If your users can’t understand an automated translation or if it feels artificial to them, you should ask yourself whether you really want to present this kind of content to them.

And with our translated pages, we were ranking in non-English search engines.

Later I figured out that the traffic drop was actually caused by a script that was blocking the stats tracking scripts, but I was fairly sure that the automated translation pages were going to cause trouble sooner or later.

What did I do to find a solution to the translated pages problem?

I was in a state of panic: I didn’t want to wait even a day and fall into trouble, because when Google Search starts clearing spam you aren’t warned.

  • First, I started asking and emailing a couple of people about the best strategy to get the pages removed.
  • Second, I saw that Google had a thread open for the same post, which John Mu (a Google employee) was following. I took that opportunity and asked the question:

You Said

We recommend that you do not allow automated translations to get indexed. Automated translations don’t always make sense and they could potentially be viewed as spam.

So if somebody has lot of machine translated page and he wants to drop it off what he should do ? Apply a 301 redirect to original content and drop off the translated pages ?

There were a couple of options I figured out myself:

  1. Drop the translated pages from my database and let Googlebot see a 404. This should slowly remove all the translated pages from Google’s index, but current users would have a problem with that.
  2. Add a noindex to all the translated pages, but that would be tough for me.
  3. Block all the language pages with robots.txt, which was affordable.
  4. Add a 301 redirect to the original post, which again was tough.
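Option 3 is easy to sanity-check before deploying. Here is a minimal sketch in Python, assuming the translated pages live under language directories such as /fr/ or /de/ (the prefixes and example.com URLs are hypothetical; use whatever your plugin generates). It builds the robots.txt rules and uses the standard library's robots.txt parser to confirm which URLs would be blocked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical language prefixes; replace with the ones your plugin creates.
LANGUAGE_PREFIXES = ["fr", "de", "es"]


def build_robots_txt(prefixes):
    """Return robots.txt text that disallows every language directory."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: /{prefix}/" for prefix in prefixes]
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """Check whether the given rules stop user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(LANGUAGE_PREFIXES)
print(robots_txt)
print(is_blocked(robots_txt, "https://example.com/fr/some-post/"))
print(is_blocked(robots_txt, "https://example.com/some-post/"))
```

Keep in mind that robots.txt only stops crawling. A blocked crawler cannot see a noindex tag or a 301 on those URLs, so this option probably should not be combined with options 2 or 4.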

I was lucky in two ways.

First, John Mu replied to my question (search for “ashishmohta”) and said:

@ashishmohta – Yes, using a 301 redirect to the original is a good idea, you could also use a noindex robots meta tag or use the rel=canonical link element, if you prefer. Depending on the number of pages involved, this can take quite some time, so it’s good to get this right from the start (or to start fixing it early, if you can :-)).
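To make the three approaches in that reply concrete, here is a rough sketch in Python. It is not what my plugin does internally. It assumes translated URLs are simply the original path with a language directory in front (for example /fr/some-post/ for /some-post/), and the prefixes and function names are made up for illustration.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical language prefixes; replace with the ones your plugin creates.
LANGUAGE_PREFIXES = {"fr", "de", "es"}

# Robots meta tag that asks search engines not to index a page.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def original_url(url):
    """Return the English URL for a translated URL, or None if not translated."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # "/fr/some-post/" splits into ["", "fr", "some-post", ""].
    if len(segments) > 1 and segments[1] in LANGUAGE_PREFIXES:
        new_path = "/" + "/".join(segments[2:])
        return urlunsplit(
            (parts.scheme, parts.netloc, new_path, parts.query, parts.fragment)
        )
    return None


def redirect_for(url):
    """Return (status, headers) for a translated URL, or None to serve normally."""
    target = original_url(url)
    if target is None:
        return None
    return 301, {"Location": target}


def canonical_tag(url):
    """Return a rel=canonical link element pointing at the original post."""
    target = original_url(url)
    if target is None:
        return None
    return f'<link rel="canonical" href="{escape(target, quote=True)}">'


print(redirect_for("https://example.com/fr/some-post/"))
print(canonical_tag("https://example.com/fr/some-post/"))
```

The 301 removes the translated page for visitors as well, while the noindex tag and rel=canonical leave it readable and only change what search engines do with it.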

Second, the plugin I used for translation has an option to 301 redirect all translated pages if I switch off all the languages. So I just switched off all the languages and I was done.

Currently I see the pages slowly dropping out of Google’s index. I hope they will all be gone eventually, but it’s going to take time.

How did I give users an option to translate after the 301 redirect?

After deciding to go with the 301 redirect, there was a problem I saw from the user’s point of view. Users came from a search engine having seen text in their own language, and since I had dropped the translated page and was showing them the English version, they might be disappointed.

Google Translate beside articles

So I placed a Google translation widget drop-down right beside the post. Even if the user sees that the post is in English, he can translate it into his language. Since I knew which languages the plugin was translating to, I picked only those languages, but if you still need to find out, you can use Google Analytics to find the top 20 languages for your site. You can read Amit’s article on Google Translation Widgets for more ways to implement this.
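If you export the language report from your analytics tool, picking the top languages takes only a few lines. The sketch below is in Python and assumes rows of (locale, sessions) with locale codes like "pt-br". The row format, the sample numbers and the function name are all invented for illustration, not taken from my own stats.

```python
from collections import Counter


def top_languages(rows, limit=20):
    """Return the most common base languages from (locale, sessions) rows."""
    totals = Counter()
    for locale, sessions in rows:
        # Fold regional variants together: "pt-br" and "pt-pt" both count as "pt".
        base = locale.lower().split("-")[0]
        totals[base] += sessions
    return [language for language, _ in totals.most_common(limit)]


# Made-up sample rows, only to show the shape of the input.
sample_rows = [("en-us", 500), ("fr", 80), ("pt-br", 120), ("pt-pt", 30), ("fr-ca", 10)]
print(top_languages(sample_rows, limit=3))
```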

As of now I see in the analytics that users are translating those pages into their language, so this strategy is working.

Summarizing the complete strategy:

Translation Removal Strategy
  • Avoid serving automated translation pages to users: translators are not perfect and sometimes distort the meaning of what you wrote, which is bad for the user experience.
  • If you still decide to create automated translated pages, don’t let Googlebot index them, and if you store them in your database, don’t keep them for more than 15 days.
  • To remove automated translation pages from Google’s search index, you can use a 301 redirect, a noindex robots meta tag, or the rel=canonical link element.
  • Use translation widgets, preferably Ajax-based like Google’s, which do not store anything and translate in real time.
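For the 15-day limit in the second point, a stored translation needs a timestamp so it can be thrown away in time. Here is a minimal sketch in Python, assuming the cache is a dictionary that maps a (url, language) key to the translated text and the time it was stored. That layout and the function name are my own assumptions, not how any particular plugin stores its data.

```python
from datetime import datetime, timedelta, timezone

# The API terms quoted earlier allow a temporary cache of at most 15 days.
MAX_CACHE_AGE = timedelta(days=15)


def purge_expired(cache, now=None):
    """Return a copy of cache without entries older than MAX_CACHE_AGE.

    cache maps a (url, language) key to a (translated_text, stored_at) tuple,
    where stored_at is a timezone-aware datetime.
    """
    now = now or datetime.now(timezone.utc)
    return {
        key: (text, stored_at)
        for key, (text, stored_at) in cache.items()
        if now - stored_at <= MAX_CACHE_AGE
    }
```

Running something like this from a daily scheduled job would keep the stored translations inside the limit.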

You might see a drop in traffic, but that is better than risking trouble in the long term. You never know when sites will start getting penalized because of this.
