How Google Collects Search Quality Data

As a full-service SEM agency, we often field questions about how and why Google makes changes to its Search Engine Results Pages. The only honest answer to that question is "we don't know." It's very difficult to predict Google's next move, which is especially true for our SEO clients: the company makes changes all the time, both visible and invisible. The only promise we can make is that we will quickly pivot and create solutions for our clients to ensure they can navigate this ever-changing landscape. Here's an interesting piece from Search Engine Journal giving a behind-the-scenes glimpse into some of the tools Google uses to improve the quality of its SERPs. While it doesn't answer the how/why questions, it certainly gives good insight into the process.

How Google Collects Search Quality Data on Your Site

Posted by Johann Beishline
The views of contributors are their own, and not necessarily those of SEJ

It’s little known that Google was never meant to be a company. In fact, in 1998 Vinod Khosla (one of the first investors in Google) managed to convince Larry Page and Sergey Brin to sell their technology to Lycos, Excite, or Yahoo, the leading search engines of the time, for a paltry million dollars so that they could go back to focusing on their research at Stanford University.

While each company took the time to review Larry Page’s and Sergey Brin’s work, they passed on the opportunity of purchasing it. The companies were simply not that concerned with search quality; they figured their results were good enough and that they needed to differentiate themselves from the competition by other means.

When the two friends decided to pursue their search engine anyway, they made focusing on search quality their top priority. They even put it in their mission statement: “Organize the world’s information and make it universally accessible and useful.”

Google has stayed true to this commitment more than a decade later. It still dedicates tremendous resources to figuring out how to deliver the best possible results.

However, when a search engine is dealing with billions of searches each day (15 percent of which, or roughly 500 million queries per day, it has never encountered before), how does it make sure that it is delivering the best possible experience? The answer is that it uses a combination of automatic and manual review processes.

Because automatic review processes (such as PageRank, Panda, and Penguin) have been discussed so often, this piece will focus on the more manual ways that Google gathers data to judge search result quality.

Millions of Everyday Users

In the interest of improving its algorithm, Google performed more than 20,000 experiments on its search results in 2010. Much of the data from these experiments comes from placing a percentage of users into specific buckets.

If you use Google, you are likely unwittingly helping the company improve its results as part of one of these experiments right now. Based on the outcomes of these experiments, Google makes more than 500 changes to its algorithms each year, with each one typically impacting a small percentage of search results.

Experiments might be as simple as split testing its website layout or as complicated as altering millions of search engine rankings in an attempt to improve their quality.
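To make the bucketing idea concrete, here is a minimal sketch of how a search engine might deterministically assign users to experiment buckets. This illustrates the general hash-based technique, not Google's actual implementation; the experiment name, bucket count, and traffic split are made up.

```python
import hashlib

def assign_bucket(user_id: str, experiment: str, num_buckets: int = 100) -> int:
    """Deterministically map a user to one of num_buckets for an experiment.

    Hashing the user ID together with the experiment name keeps a user's
    assignment stable within one experiment while staying independent
    across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

# Hypothetical 5% experiment: users in buckets 0-4 see the new ranking.
if assign_bucket("user-12345", "serp-layout-test") < 5:
    print("serve experimental results")
else:
    print("serve control results")
```

Because the assignment is a pure function of the user and experiment identifiers, no per-user state needs to be stored, and the same user always sees a consistent experience for the duration of a test.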

Website Statistics

The primary way that Google gains data about its search results is through basic usage statistics.


For instance, if you happen to look for a dentist using Google and you click on the first result, Google can indirectly measure your reaction to the website by seeing whether you return back to the search engine results page (SERP) and click on another link.

It is clear you didn’t find what you were looking for if you have to return to the search engine to click on another link. Google’s goal is to provide you with what you were looking for in the least amount of time possible.

Google knows what types of usage patterns to expect based on query intent (navigational, informational, or transactional) and aggregates data on each link’s performance.

Statistics they collect include bounce rate and time spent on site. Even though these statistics are indirect, they tend to be remarkably accurate at predicting whether a user liked the content they clicked on.
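As an illustration of how such indirect signals can be computed, here is a small sketch that estimates a "pogo-stick" rate (quick returns to the SERP) from simplified click logs. The event format and the ten-second threshold are assumptions for the example, not anything Google has published.

```python
from dataclasses import dataclass

@dataclass
class Click:
    url: str
    timestamp: float  # seconds since the search was issued

def pogo_stick_rate(sessions: list[list[Click]], short_dwell: float = 10.0) -> float:
    """Fraction of result clicks quickly followed by a click on another result.

    A click counts as a pogo-stick when the searcher clicks the next result
    within `short_dwell` seconds, suggesting the first page fell short.
    """
    total = pogos = 0
    for session in sessions:
        total += len(session)
        for first, nxt in zip(session, session[1:]):
            if nxt.timestamp - first.timestamp < short_dwell:
                pogos += 1
    return pogos / total if total else 0.0

sessions = [
    [Click("dentist-a.com", 0.0), Click("dentist-b.com", 6.0)],  # quick return
    [Click("dentist-c.com", 0.0)],                               # satisfied
]
print(pogo_stick_rate(sessions))  # 1 pogo out of 3 clicks -> 0.333...
```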

Another way that Google can gather search result quality data is by looking at page load speeds from people who have installed its Toolbar. You can see the data Google has collected from users about your site's speed in Google Analytics & Webmaster Tools. While Google also tests website speeds when its bot is crawling the web, the Toolbar provides the company with more information about real-world conditions.
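By way of illustration, real-user timings like these are usually summarized as percentiles rather than averages, since a few very slow loads would otherwise dominate the picture. The sample data below is invented:

```python
import math

def percentile(samples_ms: list[float], p: float) -> float:
    """Nearest-rank percentile of a list of page-load timings."""
    ranked = sorted(samples_ms)
    k = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[k - 1]

# Hypothetical load times (ms) reported by real users' browsers.
timings = [640, 820, 950, 980, 1100, 1210, 1430, 2700]
print("median:", percentile(timings, 50), "ms")  # 980 ms
print("p90:", percentile(timings, 90), "ms")     # 2700 ms
```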

Social Media Metrics

Google started gathering social media data quite a few years ago. Since Google does not have firehose access to Twitter and Facebook, it has been hesitant to trust social signals, but they are becoming an ever-growing part of its algorithm.

In the past, Google relied almost exclusively on backlinks to determine how to rank sites. The problem with this approach is that it only lets webmasters decide whether your site is high quality.

Social media signals are the democratization of search engine quality measurements. Basically, social media has the potential to let everyone vote on whether a page should be ranking in Google. If you share a website on your social media profiles, it likely means that you viewed the site favorably.
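To illustrate the "votes" idea, here is one naive way share counts could be folded into a single score. The weights and the logarithmic damping are illustrative assumptions; no public formula exists for how (or whether) Google weights social signals.

```python
import math

def social_score(shares: dict[str, int], weights: dict[str, float]) -> float:
    """Combine per-network share counts into one damped score.

    log1p damping keeps a page with 100,000 shares from completely
    drowning out one with 1,000: each extra vote matters a bit less.
    """
    return sum(weights.get(network, 0.0) * math.log1p(count)
               for network, count in shares.items())

# Hypothetical counts and weights.
print(social_score({"twitter": 1200, "facebook": 300, "google+": 90},
                   {"twitter": 1.0, "facebook": 0.8, "google+": 1.2}))
```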

Social media data will play a huge role in the future of search and will be heavily drawn upon by search engines to determine how people are reacting to content online.

Surveys & Questionnaires

While indirect data from its users is great, Google still has a difficult time understanding why users don’t like certain pages. It has plenty of data, but it needs to transform it into information through analysis and evaluation to make sense of it all. As a result, it has started to take more direct feedback from its users as well.

As a search engine optimizer, I conduct a fair number of searches each day (so many that I often have to type in a CAPTCHA just to use the service). The following screenshots are from my real-world experience.

[Screenshot: Google survey box]

After searching Google for "local search engine optimization", I was presented with the box in the right-hand corner. The box asked me to rate two different pages for relevance.

[Screenshot: first page Google asked me to rate]

One of the pages that Google was looking for information on was a Search Engine Watch article.

[Screenshot: second page Google asked me to rate]

The other page Google wanted input on was for a local search engine optimization company.

After visiting the two pages, it's fairly obvious what Google was looking for with its questionnaire. Since the query I typed into Google ("local search engine optimization") is ambiguous, Google wasn't sure what type of content to show.

They were checking to see if people were looking for articles on local search engine optimization (an informational query), or if they were looking for a company to hire (a transactional query).
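Conceptually, votes from a questionnaire like this could be tallied to decide which intent dominates an ambiguous query. The sketch below is a guess at the general shape of such a tally, not Google's method; the labels and threshold are invented.

```python
from collections import Counter

def dominant_intent(votes: list[str], threshold: float = 0.6) -> str:
    """Return the majority intent for an ambiguous query, if one clearly wins.

    Each vote is a label such as "informational" or "transactional".
    Below the threshold, the query keeps a blended results page.
    """
    if not votes:
        return "mixed"
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= threshold else "mixed"

votes = ["informational"] * 7 + ["transactional"] * 3
print(dominant_intent(votes))  # informational
```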

I voted for the Search Engine Watch article, as the majority of people must have, because Google has recently begun integrating News results into the results for this query. Now, rather than showing local SEO companies near the top of the SERP, Google is showing more articles instead.

A few weeks later I performed a Google search for “great science fiction novels” to check on the types of queries for which it was using its new slider.

[Screenshot: "great science fiction novels" results slider]

After performing the search, I noticed a feedback button that I hadn’t seen on any of the other sliders. When I clicked on it, the following buttons showed up:

[Screenshot: crowd-sourced feedback buttons]

Google was looking for input on the types of books that should be in the slider. If enough people feel that a certain book should not be there, Google likely removes it.

In fact, if you look at the results right now, you will notice that several of the books seen above no longer appear on the first slide.

Another survey tool that I have seen Google use is the one below:

[Screenshot: search quality survey chat box]

The survey came in the form of a chat box on the right-hand side of the screen and asked me to judge the page of results as a whole.

Google Search Quality Raters

Since around 2005, Google has used an army of hired search quality raters to ensure that its results are up to par.

Because Google still relies mostly on indirect measurements of quality, it uses search quality raters to look at poor performing pages and determine why users didn’t like them.

This feedback mostly makes its way back to Google's engineering department, which uses it to develop more experiments; it doesn't usually have a direct impact on your site's rankings. You can see the exact guidelines Google gives its search quality raters here.
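Google hasn't published how it aggregates rater judgments, but conceptually the summary for a page might look something like the sketch below, where disagreement between raters can be as informative as the average score:

```python
import statistics

def summarize_ratings(ratings: list[float]) -> dict[str, float]:
    """Aggregate quality ratings (say, 1-5) from several raters for one page.

    The mean estimates quality; the standard deviation flags pages where
    raters disagree and the guidelines may need refinement.
    """
    return {
        "mean": round(statistics.mean(ratings), 2),
        "disagreement": round(statistics.pstdev(ratings), 2),
        "raters": len(ratings),
    }

# Hypothetical ratings for a single landing page.
print(summarize_ratings([4, 5, 4, 2, 4]))
# {'mean': 3.8, 'disagreement': 0.98, 'raters': 5}
```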

Webmaster Tools Spam Reports

[Screenshot: SERP spam report form]

Another manual method that Google relies upon to ensure quality search results is its Webmaster Tools spam-reporting interface.

Spam reports come from a variety of sources including victims of scams, disgruntled SEOs, and copyright owners.

Google employees go through the reports and can manually penalize offending sites that they come across.

Conclusion

As you can see, Google is serious about the quality of its search results. Google is a living, breathing organism fed by enormous amounts of data.

The company is in constant flux, exploring ways to improve the quality of its search results. As long as search engine optimizers continue to outsmart Google's algorithms (which will happen for the foreseeable future), Google will keep using manual data-gathering techniques to enhance its algorithms and penalize manipulators.

What other tactics have you seen Google use to gather more manual forms of search quality data?
