Are you swimming or drowning in the ocean of SEO keyword data? A big issue facing SEO analysts today is getting their arms around the extremely large data set of search terms for which their website could rank. More often than not, analysts end up picking a list of the top <put your number here> keywords and tracking it into eternity. It is time for all of us to outgrow that and adopt a smarter, more comprehensive approach to understanding the SEO opportunity before us.
A Quick Recap
Before we get into demand analytics, here’s a quick recap of the previous post. In my last post, I outlined the three pillars of SEO analytics that are required for a comprehensive understanding of SEO performance. The need for these pillars has always been there, but with Google’s algorithm getting more and more sophisticated, these pillars are a necessity in order to understand organic search performance. As a recap, the three pillars of SEO analytics are
What is SEO Demand Analytics?
Today, I am going to cover the first pillar (SEO Demand Analytics) in detail. By demand analytics, I mean understanding how users search, working with the volume of these searches, and measuring the presence of a website in these searches. One might say this is nothing new and that it has been done for several years now. The reason we need a much more sophisticated approach to SEO Demand Analytics is that Google’s algorithm changes over the last couple of years have been detrimental to analysts’ ability to measure search demand and search presence. While I covered these Google changes in detail in my previous article, here they are in summary:
- Keyword data is now hidden in Google’s (not provided) category
- Google’s auto suggest has made user searches more long tail (low volume, 3+ word phrases)
- Semantic search displays search results where the keywords may not even be present on the landing pages
- Search results vary by geographic location of users
- Search results are personalized based on users’ search history
The large data approach to SEO Demand Analytics
In this constantly changing world of Google’s algorithm, the analytics for tracking search demand and search presence have lagged severely. In fact, many marketers we have spoken to use a variation of the same old technique of tracking the top 10, 20, 100, or 1000 high volume keywords. Here’s a screenshot of a typical SEO performance report using Stanley Steemer as an example.
Full Disclosure – Stanley Steemer is not a client, and we did not work with anyone at Stanley Steemer to get this information. All of this information is available in the public domain.
The table above shows the Search Engine Results Page position changes for the top 30 keywords over the last three months. It is definitely useful information should an analyst want to track performance for a small set of keywords. The problem is that the number of keywords for which a website ranks has increased dramatically. To put this into perspective, SEMrush tracks 5,000 keywords per month for Stanley Steemer, and that is still one of the smaller keyword sets. Even so, wrapping our heads around 120 head terms and 4,800 mid and long tail terms is pretty daunting. Think about a table 5,000 rows long with no context other than keyword and volume to analyze the data with. Even the table above, with just 30 keywords, gives me pause: the tracking is at a very tactical level and doesn’t support big picture decisions.
There has to be a better way to analyze large data sets of keyword data! Here’s how we go about it:
Data Sources for Keywords Searched
The first problem is where to get the keyword search data from. Thankfully, there are a few great third party services available to pull the basic searched keywords and SERP positions for a website. Our two favorites are SEMrush and SERPs.com, and they each have their own advantages. SEMrush is great for keyword discovery by domain while SERPs.com is good for pulling keyword ranks by location. Think about “carpet cleaners” searched from Dallas versus Chicago.
For this exercise, we limited our data to the keyword set from SEMrush. However, we will look at how to use keyword data by location in a subsequent post. Just to put things into perspective: the data set we used is 5,000 keywords strong for each month. To analyze three months of data, the data set increases to 15,000 rows. And to analyze the same data for the top 100 cities in the U.S., we’re looking at 1.5 million rows of data. So this can easily get into big data territory for a larger website.
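To make the growth in row counts concrete, here is the arithmetic behind those figures as a tiny Python sketch. The variable names are ours; the numbers are the ones quoted above.

```python
# Figures quoted in the post; variable names are illustrative.
keywords_per_month = 5_000
months = 3
cities = 100

# One row per keyword per month for a single (national) view.
rows_single_location = keywords_per_month * months

# Repeating the same tracking for each city multiplies the row count.
rows_all_cities = rows_single_location * cities

print(f"{rows_single_location:,} rows for {months} months")
print(f"{rows_all_cities:,} rows across {cities} cities")
```

Swapping in your own keyword count, reporting window, and number of locations gives a quick estimate of how large your data set will get.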
The Big Picture Result in a Single Page
The struggle with large data sets often is that they are not looked at in a manner that is digestible and easy to understand. Making sense of large data sets is at the core of our approach to SEO Demand Analytics. After running through our 15 step program, here is what the raw data set of 15,000 keywords looks like (not all 15K keywords are shown because of various filters applied to the result).
In a single page, we are now able to see performance of all tracked keywords. There are so many interesting things one can now glean from just this view, but here are five insights we can get in about five minutes of looking at this big picture view:
- Stanley Steemer is very well positioned for “carpet cleaning” and “upholstery cleaning” keywords
- Over 60 percent of Stanley Steemer’s non-brand organic search traffic comes from the “carpet & upholstery” category of keywords
- The “floor cleaning” category represents a nice chunk of opportunity for Stanley Steemer, but it hasn’t been leveraged well yet (the average SERP rank for these keywords is 5)
- “Damage cleanup” is another underleveraged category
- “Oriental rug care” and “car upholstery” subcategories suggest that there are other niche subcategories that Stanley Steemer could pursue as a long tail play
Find Actionable Insights using SEO Demand Analytics
Now that we have a feel for how the Stanley Steemer website performs across a broad set of keywords, we can dive into the specifics of a given category, URL, location, or whatever other dimension makes sense to find actionable insights.
The secret sauce in our approach is the grouping of keywords into various levels of buckets. For this analysis, here are the buckets we used:
This helps us look at the forest at the category level and at the trees at the keyword level. So we can see that the “floor cleaning” category has potential to drive more traffic to StanleySteemer.com. Digging into this category, we can identify “tile flooring” as a subcategory with a good amount of traffic potential and “floor clean” as the lexical phrase containing high volume keywords that could drive more traffic.
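As a rough illustration of the bucketing idea, the sketch below tags each keyword with a category and subcategory and then rolls volume and rank up to the category level. It is a minimal sketch, not our production process: the lexicon entries, function names, and sample rows (including the volumes and ranks) are all made up for the example.

```python
from collections import defaultdict

# Hypothetical lexicon: root phrase -> (category, subcategory).
# These mappings are illustrative, not the actual buckets used in the analysis.
LEXICON = {
    "carpet clean": ("carpet & upholstery", "carpet cleaning"),
    "upholstery clean": ("carpet & upholstery", "upholstery cleaning"),
    "tile": ("floor cleaning", "tile flooring"),
    "floor clean": ("floor cleaning", "general floor cleaning"),
}

def tag_keyword(keyword):
    """Return (category, subcategory) for the first root phrase found in the keyword."""
    text = keyword.lower()
    for root, bucket in LEXICON.items():
        if root in text:
            return bucket
    return ("uncategorized", "uncategorized")

def summarize(rows):
    """Roll (keyword, volume, rank) rows up to total volume and mean rank per category."""
    totals = defaultdict(lambda: {"volume": 0, "rank_sum": 0, "count": 0})
    for keyword, volume, rank in rows:
        category, _subcategory = tag_keyword(keyword)
        bucket = totals[category]
        bucket["volume"] += volume
        bucket["rank_sum"] += rank
        bucket["count"] += 1
    return {
        category: {"volume": b["volume"], "avg_rank": b["rank_sum"] / b["count"]}
        for category, b in totals.items()
    }

# Made-up sample rows: (keyword, monthly volume, SERP rank).
sample_rows = [
    ("carpet cleaning", 1000, 1),
    ("upholstery cleaning", 400, 2),
    ("tile floor cleaning", 500, 6),
    ("floor cleaner service", 300, 4),
]

summary = summarize(sample_rows)
for category, stats in summary.items():
    print(category, stats)
```

The category view is the forest; keeping the keyword-level rows alongside the tags is what lets you drill back down to the trees.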
Other Ways to Slice the Data
There are many other ways to look at this data. Once each keyword is “tagged” for filters such as URL, type of keyword, and location, we are able to identify many other actionable insights. Here are some examples:
Looking at the Stanley Steemer tile and grout page, we are able to see that the page does not rank in the top three for tile floor cleaning or for grout cleaning. This provides an indication that they likely need two pages to compete in these two related yet separate subcategories.
Another way to find opportunities is to look at the same data by locations searched, such as “carpet cleaning San Diego.” This provides actionable insight into how keywords with location modifiers perform.
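One simple way to tag keywords with a location is to look for a known city name inside the phrase and split it off from the base keyword. The sketch below assumes a small hard-coded city list and a helper name of our own choosing; a real run would need a much fuller list of places and handling for ambiguous names.

```python
# Hypothetical city list; a real analysis would load a much larger gazetteer.
CITIES = ["san diego", "dallas", "chicago"]

def split_location(keyword):
    """Return (base_keyword, city); city is None when no location modifier is found."""
    text = " ".join(keyword.lower().split())
    padded = f" {text} "
    # Check longer names first so multi-word cities are matched whole.
    for city in sorted(CITIES, key=len, reverse=True):
        marker = f" {city} "
        if marker in padded:
            base = padded.replace(marker, " ")
            return " ".join(base.split()), city
    return text, None

print(split_location("carpet cleaning San Diego"))
print(split_location("Dallas carpet cleaners"))
print(split_location("carpet cleaning"))
```

With the city split off, the same base keyword can be compared across locations, and keywords with no modifier stay in the national view.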
Hopefully, this gives you an idea of the level of SEO Demand Analytics that is possible with the data that is available to you through third party providers. There are enough insights here to keep your analysts busy for several months.
The Process for Sophisticated SEO Demand Analytics
There are many ways to get this sort of insight for your website. We have a 15 step automated process that goes through various lexical and numeric comparisons to arrive at the categorization you see above.
Some of the most important steps in this process are identifying location level data, detecting root phrases to group keywords into categories, and creating a website specific lexicon. And of course, there’s also identifying exceptions to processing.
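To give a flavor of what root phrase detection can look like, the sketch below strips a few common suffixes from each word and counts the two-word phrases that recur across keywords. This is a simplified stand-in under our own assumptions, not the actual 15 step process: the suffix list, function names, and sample keywords are illustrative only.

```python
from collections import Counter

# Crude suffix list for illustration; a real lexicon would be website specific.
SUFFIXES = ("ing", "ers", "er", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def root_phrases(keywords, min_count=2):
    """Return stemmed two-word phrases that appear in at least min_count keywords."""
    counts = Counter()
    for keyword in keywords:
        stems = [crude_stem(word) for word in keyword.lower().split()]
        # Use a set so each phrase is counted once per keyword.
        bigrams = {" ".join(pair) for pair in zip(stems, stems[1:])}
        counts.update(bigrams)
    return [(phrase, n) for phrase, n in counts.most_common() if n >= min_count]

# Made-up sample keywords.
sample_keywords = [
    "carpet cleaning",
    "carpet cleaners near me",
    "best carpet cleaner",
    "tile floor cleaning",
    "floor cleaners",
]

print(root_phrases(sample_keywords))
```

Phrases that clear the threshold become candidates for the lexicon, and anything the crude stemmer gets wrong is exactly the kind of exception that needs separate handling.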
The technology stack used for this process includes Python, Gawk, Awk, Linux shell scripts, MySQL, TextPad, and of course, good old Excel. You could technically use any programming language to create a similar process. We found that this stack works best for processing millions of records within a few hours.
Builder beware: the actual process took us 200 hours to create. However, now that we have it ready, it takes us only a trivial number of hours to adapt the solution to any website whose SEO demand we want to analyze.
A Last Word
In summary, large scale SEO analytics is not just a fancy pants term. Today, it is necessary for analyzing SEO keyword rank data. Using a simple list of the top 30, 50, or 100 keywords to track SEO performance is just not enough anymore. It isn’t by accident that Google has made SEO demand analytics hard. But you don’t have to settle for something primitive to track it. The data is all there for you to use. So you really owe it to yourself to look at the forest and not just a handful of trees to make good, strong SEO strategy decisions.