Removing Spam and Robot Traffic from Google Analytics

Analysts, I bet there’s been a time when you were gathering data for your weekly or monthly report and saw a jump in traffic. At first you’re excited, but then a bit skeptical. “Where did all this traffic come from?” you ask. As you start researching, you might find nothing off. Or maybe you discover one page whose traffic has skyrocketed.

Either way, you go in to investigate further. You hold your breath, hoping for the best but preparing for the worst. As you search for where the traffic came from, you notice names like semalt and ranksonic. And you notice characteristics like a high bounce rate and an average time on page of one second or less. Uh-oh. It looks like bad traffic.

So what can you do?

Your approach to cleaning up your traffic depends on the type of bad traffic you see (referral spam, robots/spiders, ghost spam, crawler spam, etc.).

Use Google Analytics’ Robot/Spider Tool

A good starting point for keeping useless traffic out of your reports is Google’s pre-made filter, “Exclude all hits from known bots and spiders.” Head over to Admin > View Settings and check the box under Bot Filtering.

[Image: Bot Filters Admin]

If you’re unfamiliar with robots and spiders in terms of web traffic, they are programs that automatically crawl websites. Some robots are good, in that they crawl websites without activating your analytics tags (like a Google Analytics pageview). Others crawl sites for malicious purposes. For the most part, the robots you’ll see in Google Analytics are the bad ones that purposely inflate your web analytics traffic.

Check the Bot Filtering box in your Google Analytics admin section to keep this known bad bot traffic from showing up in your Google Analytics data. For some sites, checking this box might not make much of a dent in bot and spider traffic. But it’s a place to start. Besides, as Google Analytics identifies more known bots and spiders, your data stays protected from the bad traffic that Google recognizes.

Create a Referral Spam Segment

The next thing you need to do is create a referral spam segment.

Referral spam, including ghost spam, uses a fake referrer to flood your reports with visits. It is not visiting your site to learn anything from it. Instead, the spammer sends visits from a fake referrer URL in hopes that their advertisement/website/etc. gains a free link and moves up in search engine rankings. None of this benefits your website.

You might have read elsewhere on the web that you should create a referral exclusion list, but that is not the way to go. If you create one, your referral spam will simply show up as direct traffic instead of referral traffic. And this fake direct traffic still inflates your website’s numbers.

[Image: Referral Exclusion List]

Another option you might consider is creating a filter for your website’s production view. However, to filter out even the most common spam traffic, you’ll end up making multiple filters. And if you want the same filters applied to other views for other websites, you’d better have enough time to copy every single filter for every website. Filters are a pain to maintain because you can’t share them the way you can share reports and segments.

[Image: Spam Filters]

The only filters you would probably want to create are customized ones that don’t need to be copied to all of your websites.

So that leads me to the best way to cut out referral spam: creating a referral spam segment. All you have to do is create one segment, share it, and apply it to the views for your other websites. That’s a lot easier, more efficient, and more effective.

To create a referral spam segment, you’ll have to do a little research first. Go to Audience > Technology > Network and choose “Hostname” as the primary dimension.

[Image: Hostname]

When the results pop up, you’ll see a long list of hostnames, some of which you’ll recognize (your website) and others, not so much. Export this list into Excel. Mark the hostnames that serve your content with a “y” and those that you know do not carry even a single page of your content with an “n.” Some of these hostnames might be recognizable as big-name companies, but if you don’t have content on their site, the hostname listed is actually “ghost” spam. It appears as one host but is really another.
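If you already know which hostnames serve your content, the y/n marking can be scripted instead of done by hand. Here is a minimal sketch in Python, assuming the export has been reduced to a plain list of hostname strings. The name MY_HOSTNAMES and every hostname shown are hypothetical placeholders, not values from a real account.

```python
# Hypothetical set of hostnames that actually serve your content.
MY_HOSTNAMES = {"example.com", "www.example.com", "blog.example.com"}


def mark_hostnames(hostnames, mine=MY_HOSTNAMES):
    """Return (hostname, 'y' or 'n') pairs, mirroring the spreadsheet column."""
    marked = []
    for host in hostnames:
        cleaned = host.strip().lower()  # normalize before comparing
        marked.append((cleaned, "y" if cleaned in mine else "n"))
    return marked


# Invented sample of exported hostnames.
exported = ["www.example.com", "Example.com", "(not set)", "some-big-brand.test"]
for host, flag in mark_hostnames(exported):
    print(f"{flag}  {host}")
```

Anything marked “n” still deserves a quick look before you exclude it, since a legitimate host (a payment page or a translation proxy, for example) may be missing from your list.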

[Image: Content on this site]

Taking your list of “y”s, create a custom segment that includes only the hostnames you marked as yours. Then add a second condition to the segment that excludes the most popular spam sources (as well as any less common ones you find on your own site). Once you’re done, save the segment and apply it to your data.
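The two segment conditions are easier to maintain as regular expressions than as long lists of separate rules. The sketch below builds one pattern for the hostnames to include and one for the sources to exclude, then checks them locally. It assumes you enter the patterns in the segment builder with a regex match type. The hostnames are hypothetical, and the function only approximates how the segment behaves.

```python
import re


def build_pattern(values, exact=True):
    """Join literal values into one alternation for a regex condition."""
    body = "|".join(re.escape(v) for v in sorted(values))
    return f"^({body})$" if exact else f"({body})"


valid_hostnames = {"example.com", "www.example.com"}  # hypothetical "y" rows
spam_sources = {"semalt", "ranksonic"}                # names mentioned earlier

include_hostname = build_pattern(valid_hostnames)          # include condition
exclude_source = build_pattern(spam_sources, exact=False)  # exclude condition


def passes_segment(hostname, source):
    """Local approximation: hostname is valid AND source is not spam."""
    valid = re.search(include_hostname, hostname) is not None
    spam = re.search(exclude_source, source) is not None
    return valid and not spam


print(include_hostname)
print(exclude_source)
```

The hostname pattern is anchored so that only exact matches pass, while the source pattern is left unanchored so that any source containing a spam name is caught.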

[Image: Spam removal segment]

Eliminate Crawler Spam Using Your .htaccess File

The next thing you could do to eliminate spam is edit your .htaccess file. This step isn’t recommended if you aren’t sure what you’re doing. One mistyped character and your whole website could go down. So proceed with caution if you’re going to edit your .htaccess file.

To remove crawler spam, first identify the domains the bots originate from. You can use a method similar to the one in the previous section, find a list of domains to avoid online, or look through your access logs for IP addresses or domains. Once you’ve got your list of crawler spam, insert the following code into your .htaccess file:

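# mod_rewrite needs FollowSymLinks in .htaccess context. Omit this line if your server already enables it or your host does not allow overriding Options.
Options +FollowSymLinks

RewriteEngine on

# One RewriteCond per spam domain. Use [NC,OR] on every condition except the last, which gets [NC] only; a trailing OR makes the rule match every request.
RewriteCond %{HTTP_REFERER} ^https?://.*domainnamehere\.gtldhere [NC]

RewriteRule ^(.*)$ - [F,L]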
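If your list of crawler spam domains is long, the conditions can be generated rather than typed by hand, which also avoids flag mistakes. The following is a sketch only, using the hypothetical domains spam-one.example and spam-two.example. It puts OR on every condition except the last, because a trailing OR would make the rule apply to every request.

```python
import re


def referer_block_rules(domains):
    """Build mod_rewrite lines that answer 403 to requests referred by the domains."""
    if not domains:
        return ""
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(domains):
        is_last = i == len(domains) - 1
        flags = "[NC]" if is_last else "[NC,OR]"  # no OR on the final condition
        # Match http or https, with or without a subdomain such as www.
        pattern = r"^https?://([^/]+\.)?" + re.escape(domain)
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    lines.append("RewriteRule ^ - [F]")
    return "\n".join(lines)


# Hypothetical spam domains, for illustration only.
print(referer_block_rules(["spam-one.example", "spam-two.example"]))
```

Paste the output into a copy of your .htaccess file and test it on a staging site before touching production.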

Get Google Tag Manager to Help Out

If you have relied on capturing and filtering spam traffic by IP address, I have both good and bad news for you. The bad news: spam and robot traffic doesn’t keep to one IP address, so your IP address list might be out of date tomorrow. The good news: there’s a solution for you, a custom dimension populated through Google Tag Manager.

To create a custom dimension, head over to Google Analytics first. Go to Admin > Property > Custom Definitions > Custom Dimensions. Then add a new dimension called User Agent. Set the scope to Session and make sure the Active box is checked. Press Create, and note the index that Google Analytics assigns to the new dimension.

[Image: Custom Dimension User Agent]

Now head over to Google Tag Manager. Go to Container > Variables > User-Defined Variables > New. Name the variable User Agent, choose JavaScript Variable as the type, and configure it with navigator.userAgent.

[Image: GTM Variable Set]

Still within the container, go to Tags > New. Choose Google Analytics as the product and Universal Analytics as the tag type. When you’re at Configure Tag, go all the way down to Custom Dimensions, then enter 1 as the Index (assuming User Agent is the first custom dimension in your property) and {{User Agent}} as the Dimension Value.

[Image: GTM Custom Dimension]

Now that you’ve set up everything within Google Analytics and Tag Manager, it’s time to do some research again. Once you have a couple of days’ worth of data, head over to Google Analytics.

Custom dimensions don’t get a standard report of their own, so add User Agent as a secondary dimension to a standard report (or build a custom report around it), and use the time frame in which the custom dimension has been active.

Export this data into Excel for further research. Look for user agents with multiple visits and high bounce rates, both over the whole time period and within a single day. Write down every user agent that exhibits this behavior. Once you’ve identified them all, go back into Google Analytics and create a filter that excludes those user agents.
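The spreadsheet review can be roughed out in code once the export is saved as CSV. This is a minimal sketch under stated assumptions: the column names (User Agent, Sessions, Bounce Rate), the sample rows, and both thresholds are invented for illustration, so adjust them to match your own export and traffic levels.

```python
import csv
import io

# Hypothetical export; column names and values are invented for illustration.
SAMPLE_EXPORT = """User Agent,Sessions,Bounce Rate
Mozilla/5.0 (Windows NT 10.0) ExampleBrowser/1.0,120,42.5%
ExampleCrawler/2.1,310,99.7%
Mozilla/5.0 (Macintosh) ExampleBrowser/1.0,3,100.0%
"""


def suspicious_user_agents(csv_text, min_sessions=50, min_bounce=95.0):
    """Return user agents with many sessions and a very high bounce rate."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sessions = int(row["Sessions"].replace(",", ""))  # tolerate "1,234"
        bounce = float(row["Bounce Rate"].rstrip("%"))    # "99.7%" becomes 99.7
        if sessions >= min_sessions and bounce >= min_bounce:
            flagged.append(row["User Agent"])
    return flagged


for agent in suspicious_user_agents(SAMPLE_EXPORT):
    print(agent)
```

Requiring a minimum session count keeps a real visitor who bounced once or twice from being flagged alongside a bot that bounces hundreds of times.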

While this method using Google Tag Manager works most of the time, it won’t work if the bots don’t identify themselves in the user agent string.

Summary

Now that you’ve gone through all four steps, you are finally ready to use and explore your website’s accurate web analytics data.

What are your thoughts on this process? Did we miss anything that you believe is necessary to eliminate spam from Google Analytics? Let us know in the comments below!

Looking for Google Analytics consulting help? Call us at 877-694-2495 or email us at info@bayleafdigital.com