What’s in Google’s Search Data Leak?

Despite the rapid rise of AI-powered alternatives like ChatGPT and Claude, the inner workings of Google’s search algorithm remain the most coveted and closely guarded information in search marketing. Recently, a massive data leak gave the world an unprecedented look into Google’s black box. The documents show how Google allegedly organizes, ranks, and serves search results, and they give industry insiders new opportunities to improve results.

A Google API Doc Leak Broke the Internet

The leak itself was relatively straightforward. According to Search Engine Land, a man named Erfan Azimi acquired the document via a Google API. Azimi handed over his findings to SparkToro’s Rand Fishkin, one of the most well-known personalities in the SEO industry. Fishkin enlisted the help of iPullRank’s Michael King to analyze the material and distribute the story.

What’s Inside Google’s Leaked Documents?

The documentation runs to more than 2,500 pages and includes information about Google’s 14,000+ ranking evaluation factors, as well as the search engine results page (SERP). Much of it focuses on “attributes,” Google’s term for ranking factors based on websites and individual web pages. The document gets technical, but there are a few key takeaways the Oneupweb team has settled on so far. We expect additional insights to bubble up once our team and the larger industry have more time to unpack (and test!) the data’s potential.

Google Data Leak 2024: Oneupweb’s Interpretation

Context is everything. This isn’t the first time a Google leak has poured information into cyberspace, and while data on Google searches is certainly valuable, it’s important to know what it means – and what it doesn’t.

The context:

  • It’s contemporary. The attributes mentioned in the document were in use as recently as March 2024. We think it’s highly unlikely that Google has made substantial changes in the few months since then.
  • It’s ambiguous. While we now know there are more than 14,000 ranking factors, we don’t know how each factor is weighted. Certain factors likely have a larger influence, while some may only apply to certain types of websites based on industry, size, and/or geographic location. We don’t know how all the pieces fit together.

With that in mind, these are a few key takeaways marketers should know.

1. Clicks Matter.

At Oneupweb, one of our SEO department’s mantras is “quality always wins.” Providing users with useful, interesting and up-to-date information delivers results. And it appears Google agrees even more than they’d previously publicized, as they use click quality to gauge positive user experiences. The document mentions using click-based attributes to evaluate what it calls a “successful click”:

  • badClicks, which are typically accidental, spam, or uninterested visitors.
  • goodClicks, which are clicks that result in on-site engagement.
  • lastLongestClick, which appears to mean the most recent goodClick, but Google doesn’t clearly define it in the document.
  • unsquashedClick, which is a canceled or interrupted click that is likely a user navigating back to the SERP before the page loads. Again, Google doesn’t specifically define the term, but most experts believe this is an accurate definition.

2. Authority Matters.

Google stores authorship information to evaluate the quality and authenticity of content created by specific individuals and brands. Interestingly, Google ended the rel=author attribute a decade ago, but variations of authorship live on in various “entities” attributes found in the leaked documentation.

In a similar vein, Google also apparently uses an attribute called “siteAuthority” as a ranking factor. Google has repeatedly denied using a specific domain authority ranking factor, yet the leaked documentation suggests otherwise.

3. Size Matters.

This one was a little surprising. Google maintains an attribute, dubbed “smallPersonalSite,” to categorize small or personal sites, though it’s unclear how this impacts keyword rankings or what qualifies a site for this category. The classification could be based on the number of URLs, organic traffic volume or some other characteristic.

4. Freshness Matters.

SEOs have long understood that creating new content and refreshing existing pages sends positive signals to Google. Based on Google’s search document, we see that three date-based attributes are likely used to measure freshness:

  • bylineDate (the date written on the page)
  • syntacticDate (the URL’s publish date)
  • semanticDate (the date on-page content was last updated)
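
One practical response is to make sure a page’s own date signals don’t contradict each other. The Python sketch below is only an illustration of that kind of audit, not anything taken from the leaked documentation. It assumes dates appear in the URL path as YYYY/MM/DD (or YYYY-MM-DD), and the helper names date_from_url and find_date_conflicts are hypothetical.

```python
import re
from datetime import date
from typing import List, Optional

# Assumption: dates appear in the URL path as /YYYY/MM/DD/ or /YYYY-MM-DD.
URL_DATE = re.compile(r"/(\d{4})[/-](\d{1,2})[/-](\d{1,2})(?:[/-]|$)")


def date_from_url(url: str) -> Optional[date]:
    """Return the date embedded in the URL path, or None if there isn't one."""
    match = URL_DATE.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. /2024/13/45/ is not a real calendar date
        return None


def find_date_conflicts(url: str, byline: date, last_updated: date) -> List[str]:
    """List human-readable mismatches between a page's date signals."""
    conflicts = []
    url_date = date_from_url(url)
    if url_date is not None and url_date != byline:
        conflicts.append(f"URL date {url_date} differs from byline date {byline}")
    if last_updated < byline:
        conflicts.append(
            f"Last update {last_updated} is earlier than byline date {byline}"
        )
    return conflicts
```

Running find_date_conflicts over a content inventory would surface pages whose URL, byline and last-updated dates tell different stories, which are reasonable candidates for a refresh or a cleanup.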

What Does the Data Leak Mean for Marketers?

Google acknowledged the leak but declined to comment on it in detail, which means it’s unlikely we’ll get additional information or context about what this all means. Instead, industry insiders (including our spirited SEO department) will spend the next several months poking, prodding and testing Google’s algorithm to see what can be gamed and what can’t.

The leaked information raises important questions and theories on what attributes make a difference for websites. But crucially, it doesn’t conflict with what we already knew:

  • Creating informative, accurate pages and following Google’s best practices will deliver positive results.
  • It’s a good idea to share clear authorship information that’s highly visible to search engines. (Non-orphaned author pages and schema markup could both help with this!)
  • We shouldn’t neglect older content, and we should continually create new content to fill topic gaps. 
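
For the authorship and freshness points above, schema markup is one concrete place to start. The following is a minimal sketch in Python that builds a schema.org Article snippet in JSON-LD with an author and publish/modified dates. The function name article_json_ld and the example values are hypothetical, and nothing in the leaked documentation confirms how (or whether) Google weighs this markup.

```python
import json
from datetime import date


def article_json_ld(headline: str, author_name: str, author_url: str,
                    published: date, modified: date) -> str:
    """Build a <script> tag holding schema.org Article markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # Linking to a real, crawlable author page keeps it from being orphaned.
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">' + body + "</script>"


# Example usage with placeholder values:
snippet = article_json_ld(
    headline="Example Post Title",
    author_name="Jane Doe",
    author_url="https://example.com/authors/jane-doe/",
    published=date(2024, 3, 1),
    modified=date(2024, 6, 1),
)
print(snippet)
```

The resulting tag would go in the page’s HTML, and updating dateModified whenever the content is genuinely refreshed keeps the markup in step with the visible byline.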

The document also damages Google’s trustworthiness. Search ranking is likely impacted by dozens of attributes that Google has specifically said it either doesn’t measure or doesn’t use. As Google wades through the aftermath of a massive antitrust lawsuit and faces more privacy challenges ahead, this leak is not exactly a good look.

Ready for Whatever Google Does Next

The revelations of Google’s leaked document are just one of many roiling uncertainties ahead. Oneupweb is committed to tracking new challenges and opportunities in search, including landmark shifts in artificial intelligence, smart assistants and more. Stay ahead of the curve; join our newsletter for weekly updates on all things marketing, and get in touch to learn more about what our marketing agency can do for you.
