What is Language Spam?

While referrer spam mainly targets search engines, language spam is typically used by spammers to push an agenda or to promote their own sites or products. They manipulate the language value reported for visits to real sites, replacing it with a message that often contains a URL. The idea is that once you see the URL in your reports, you might be tempted to trace it back to its source. That in turn generates real visits to the spammer’s website, pushing it up the rankings.

Language spam shows up in Google Analytics on your dashboard or under the “Audience > Geo > Language” section. Here are a few examples of recent language spam attacks you might have seen in your reports:

  • Secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!
  • Congratulations to Trump and all americans
  • Vitaly rules google ☆*:。゜゚・*ヽ(^ᴗ^)ノ*・゜゚。:*☆ ¯\_(ツ)_/¯(ಠ益ಠ)(ಥ‿ಥ)(ʘ‿ʘ)ლ(ಠ_ಠლ)( ͡° ͜ʖ ͡°)ヽ(゚Д゚)ノʕ•̫͡•ʔᶘ ᵒᴥᵒᶅ(=^ ^=)oO
  • o-o-8-o-o.com search shell is much better than google!
  • Google officially recommends o-o-8-o-o.com search shell!

Google is apparently working on fixing this issue, but new spam keeps emerging. Once one campaign stops, another seems to begin.

Why Should You Block Language Spam?

The first reason to block language spam is so that it doesn’t skew your analytics data, as in the examples above. If you ever want to use your visitors’ language data, say in a multilingual WordPress setup, then you want the data to be accurate.

Another important reason, which a lot of people don’t realize, is that Google Analytics filters don’t apply retroactively. Filters only apply to data gathered from the day they are created, so historical data cannot be fixed with them. That is why it is important to tackle the spam problem right away. The flip side is that if you implement a filter incorrectly, you could lose valuable data forever. Advanced segments can help you with your historical data, and we will go into those in more detail below.

Block Language Spam with a Filter

The easiest way to block language spam in Google Analytics is to use a filter. Filters allow you to modify and limit data. For example, you can exclude certain subdirectories, whitelist traffic from specific IP addresses or IP ranges, etc. We recommend setting up a new view whenever you create filters, so that if anything goes wrong you still have access to your original, untouched data. You then apply all your custom filters to the new view.

Step 1

The first step is to copy your current view so that you filter the data only on a separate view. This step is optional, but highly recommended. If you already have a separate view, you can skip to Step 2. Otherwise, click into the Admin section in Google Analytics and into your current “View Settings.” Then click on “Copy view.” Copying the view, rather than creating a new one, carries over any other filters and goals that you already have in place on your website.

Name your new view. In our example we chose “filtered domain.com.” Then click on “Copy view.”

Step 2

Click into your new filtered view (or original view) and click into “Filters.” Then click on “+ Add Filter.”

Note: You will need “Edit” access at the “Account” level in Google Analytics in order to set up new filters, or you won’t have the ability to follow through on these next steps.

Step 3

Give your filter a name (e.g., Filter Language Spam). Then choose “Custom” as the filter type and select “Exclude,” since the pattern below matches the spam you want to drop. Select “Language Settings” as the filter field and enter the following into the Filter Pattern field:

.{15,}|\s[^\s]*\s|\.|,|\!|\/

You can then click on the “verify” button to see an example of what the filter found in the last 7 days. Then click “Save” to apply the filter.
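
If you would like to sanity-check the pattern outside Google Analytics first, the following is a minimal Python sketch. It assumes that an unanchored search in Python’s re module behaves closely enough to the partial matching Google Analytics filters perform; the two engines are not identical, so treat it as a rough check and not a replacement for the “verify” button. The function name is_language_spam and the short sample values (“en-us,” “pt-br,” “(not set)”) are hypothetical and chosen for illustration. The two long samples come from the spam examples earlier in this post.

```python
import re

# The filter pattern from Step 3, copied verbatim.
# Alternatives: 15+ characters | two whitespace characters with only
# non-whitespace between them | a dot | a comma | "!" | "/"
LANGUAGE_SPAM_PATTERN = re.compile(r".{15,}|\s[^\s]*\s|\.|,|\!|\/")


def is_language_spam(value: str) -> bool:
    """Return True if the pattern matches anywhere in the language value."""
    return LANGUAGE_SPAM_PATTERN.search(value) is not None


# Short values are hypothetical; the long ones are quoted from this post.
samples = [
    "en-us",
    "pt-br",
    "(not set)",
    "Congratulations to Trump and all americans",
    "o-o-8-o-o.com search shell is much better than google!",
]

for value in samples:
    label = "exclude" if is_language_spam(value) else "keep"
    print(f"{label:8} {value}")
```

Real language codes are short and contain no dots, commas, slashes, or multiple spaces, so they fall through every alternative in the pattern, while spam messages tend to trip at least one of them.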

That’s it! From now on, only valid language values will pass through to your Google Analytics reports.
