Why Every SEO Audit Should Start With a Crawl Budget Analysis

With complex websites, there are many possible starting points for search engine optimization. So where should an SEO audit begin? As is so often the case, optimization is a problem of scarce resources. The scarce resource in our case is the time budget of the Googlebot, the so-called "crawl budget". In our opinion, every SEO audit should therefore begin with an analysis of the Googlebot's behavior over the last weeks and months.

How do I start the crawl budget analysis?

The server's log files are required for this; they can typically be downloaded via FTP. Depending on the number of daily visitors and URLs, these log files can easily exceed 1 GB. As a rule, the data from the last month is downloaded and can then be processed fairly quickly. A log file analyser (e.g. Screaming Frog Log File Analyser) is needed to evaluate the data: the log files are imported into the software and analysed there.
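
For a first impression, the filtering of Googlebot requests can also be scripted by hand. The following is only a minimal sketch in Python, assuming the widespread "combined" log format and a hypothetical file name access.log; the actual format depends on the server configuration, and in a real analysis the bot should additionally be verified via reverse DNS, since the user agent string can be spoofed.

    import re
    from collections import Counter

    # One line per request in the "combined" log format:
    # IP - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
    LINE = re.compile(
        r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
        r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
    )

    googlebot_hits = []
    with open("access.log", encoding="utf-8", errors="replace") as f:
        for raw in f:
            m = LINE.match(raw)
            if m and "Googlebot" in m.group("agent"):
                googlebot_hits.append(m.groupdict())

    # Which URLs does the Googlebot request most often?
    for path, count in Counter(h["path"] for h in googlebot_hits).most_common(20):
        print(count, path)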

What insights does the crawl budget analysis provide?

In general, one wants Google to crawl the important pages, i.e. those that should rank high in the search results, as often as possible. It is therefore necessary to find the places where the Googlebot "wastes" its crawl budget.

  • If the frequency of Googlebot visits declines steadily over time, this is an alarm signal: Google tends to classify the site as less important. The trend can be read directly from the log data, see the sketch after this list.
  • If, for example, product URLs are crawled more often than the higher-level overview pages, the Googlebot has not (yet) understood the intended prioritization. A suitable internal linking concept can help here.
  • Dead URLs that return 404 errors (page can no longer be found) can be located and redirected.
  • You get an overview of all existing redirects and of possibly incorrect redirect codes (e.g. 302 instead of 301).
  • Pages that are crawled very often but are irrelevant for indexing in the search results can, under certain circumstances, be excluded from crawling.
  • Empty directories and URL levels that should be excluded from crawling can be uncovered.
  • Are important landing pages that have been optimized for top rankings crawled often enough?
  • Are all scripts and files that are crawled in addition to the actual content pages really indispensable, or can the code be thinned out accordingly to save crawl budget?
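
Several of these checks boil down to simple aggregations over the filtered Googlebot hits. The following sketch builds on the hypothetical googlebot_hits list from the snippet further up: the daily crawl volume shows the trend, the 404 targets point to dead URLs, and the redirect codes reveal whether 301 is used consistently.

    from collections import Counter
    from datetime import datetime

    def crawl_report(googlebot_hits):
        """Aggregate crawl volume per day, 404 targets and redirect codes."""
        per_day, not_found, redirects = Counter(), Counter(), Counter()
        for hit in googlebot_hits:
            # e.g. "10/Oct/2023:13:55:36 +0200" -> keep the date only
            day = datetime.strptime(hit["time"].split()[0], "%d/%b/%Y:%H:%M:%S").date()
            per_day[day] += 1
            if hit["status"] == "404":
                not_found[hit["path"]] += 1
            elif hit["status"] in ("301", "302", "307", "308"):
                redirects[(hit["status"], hit["path"])] += 1

        print("Googlebot hits per day (trend):")
        for day in sorted(per_day):
            print(f"  {day}: {per_day[day]}")
        print("Most frequent 404 targets:", not_found.most_common(10))
        print("Redirects hit by the bot:", redirects.most_common(10))

Calling crawl_report(googlebot_hits) with the list from the first sketch prints a simple day-by-day trend plus the most frequently hit 404 and redirect URLs; a dedicated log file analyser provides the same views with far more comfort.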

What tools do we have to control crawling behavior?

1  Exclude certain directories and paths via robots.txt. The effect of the robots.txt can also be checked in Google Search Console (see the example after this list).

2  Exclude individual pages with the meta robots "noindex" tag. For larger page structures, a structured index/noindex concept is required.

3  Internal linking concept: pages that are to be treated as more important must be linked to more frequently internally.

4  Serve the "Last-Modified" header and answer with status code 304 (Not Modified) when nothing has changed. The page is still crawled briefly, but the bot spends far less time on it because it does not download the whole page and moves on after reading the headers (see the sketch after this list).

5  Clean up the CMS backend and delete outdated pages.

6  Set redirects correctly (301 Moved Permanently).

7  Clean up and minimize the code.
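
To illustrate point 1: a robots.txt in the document root could exclude crawler traps such as internal search results or faceted filter URLs. The paths below are purely hypothetical and must of course be adapted to the actual site structure.

    # Keep the bot away from internal search results and filter combinations
    User-agent: *
    Disallow: /search/
    Disallow: /filter/

For individual pages (point 2), the corresponding instruction is the meta robots tag in the HTML head, e.g. <meta name="robots" content="noindex, follow">.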
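
How point 4 saves crawl budget becomes clearer in code. The following is a minimal sketch using Python's built-in http.server, not a production setup: the handler sends a Last-Modified header with every full response and answers repeat requests whose If-Modified-Since date is not older than that timestamp with an empty 304 response, so the bot does not download the page body again.

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from email.utils import formatdate, parsedate_to_datetime

    # Timestamp of the last content change (here simply: server start time)
    LAST_MODIFIED = formatdate(usegmt=True)

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            since = self.headers.get("If-Modified-Since")
            if since and parsedate_to_datetime(since) >= parsedate_to_datetime(LAST_MODIFIED):
                # Nothing has changed: 304 without a body, the bot moves on quickly
                self.send_response(304)
                self.end_headers()
                return
            body = b"<html><body>Example page</body></html>"
            self.send_response(200)
            self.send_header("Last-Modified", LAST_MODIFIED)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), Handler).serve_forever()

This can be checked with curl by sending the returned Last-Modified value back as an If-Modified-Since header; the second response should then consist of the 304 status line only.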

All in all, this analysis is indeed a very suitable starting point for an SEO audit, as it uncovers numerous issues and provides input for almost all of the familiar technical and structural topics in search engine optimization.

Author

Thomas Kaußner