Everything you need to know about Crawl Budget

Understanding your crawl budget can help with SEO efforts on your website. A crawl budget is the number of pages that Googlebot will crawl and index on a website within a certain amount of time. Without these crawl bots, Google can’t index your website, which means that you won’t rank for anything.

Most sites don’t have to worry about their crawl budget, but there are some cases that may require you to learn how your crawl budget works. If you have an eCommerce website, chances are it has over 10,000 pages. If you have lots of redirects, your crawl budget could be eaten up by redirect chains. Another thing to be aware of is newly added pages: you’ll want to ensure you have enough crawl budget to get them indexed swiftly.

What is a Crawl Budget? 

A crawl budget is the number of pages that a bot is able to crawl and index in a certain time frame. Once your crawl budget runs out, the crawler will stop accessing your site’s content and move on to other sites.

A crawl budget is established by Google and the allocation depends on a number of factors including the size of the site. Larger sites will require bigger crawl budgets as they have more pages to cover. 

If you update content often, Google takes this into consideration and prioritises content that is being updated regularly. A site’s performance and load times can affect your crawl budget as well. Google also takes into account your linking structure and how many dead links are on your site.

How Crawl Budget Works

Google is extremely good at crawling sites, so there isn’t much to worry about if you have a normal-sized site. If you have fewer than a thousand URLs, Google is very unlikely to have any trouble crawling your site.

To ensure that your site can cope with crawlers, Google also limits how often it visits a site, depending on what the server can physically sustain. The bot will test a site’s server to see how it responds and then lower or raise the crawl limit depending on the response.

Crawl bots visit your site on their own, but they may also visit it based on the instructions provided within your sitemap.

Once you create your sitemap, you can suggest how often bots should crawl certain sections or individual pages within your site. A common convention is shown below.

  • Core content like your home page, service pages and contact pages can be crawled once per month.
  • Blog and news content can be crawled once per week.

This convention makes sense because your core content is not likely to change very often and therefore does not need to be crawled that often either. Your blog and news content is what you create more regularly, so you can ask bots to crawl it more regularly and get it indexed sooner.
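As a rough illustration of the convention above, here is a minimal sketch that writes a sitemap with a `changefreq` value per URL. The URLs are hypothetical and the helper name `build_sitemap` is our own. Bear in mind that search engines treat `changefreq` as a hint rather than a command, so this is a suggestion to crawlers and not a guarantee.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of (url, changefreq) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, for illustration only.
entries = [
    ("https://www.example.com/", "monthly"),            # core content
    ("https://www.example.com/contact-us", "monthly"),  # core content
    ("https://www.example.com/blog/", "weekly"),        # blog/news content
]

sitemap_xml = build_sitemap(entries)
print(sitemap_xml)
```

In practice most CMS platforms and SEO plugins generate this file for you, so a script like this is mainly useful for understanding what the file contains.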

Once the bots visit your site, they will crawl your content as normal and then follow the links to other pages on your website. This is how a search engine develops a deeper understanding of your site as a whole, and it is also why internal links are so important.

To see how often Googlebot, or any bot for that matter, crawls your site, you can view your server’s log file. This shows you similar data to an analytics platform, but it is less user friendly and can sometimes be hard to understand.
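Since raw log files are hard to read, a small script can do the counting for you. The sketch below assumes your server writes the common "combined" access log format and simply looks for "Googlebot" in each line. The sample lines and the function name `googlebot_hits_per_day` are made up for illustration, and because a user agent string can be faked, treat the numbers as an estimate.

```python
import re
from collections import Counter

# Captures the date part of a timestamp such as [10/Oct/2023:13:55:36 +0000].
DATE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def googlebot_hits_per_day(lines):
    """Count log lines per day whose text mentions Googlebot."""
    hits = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue
        match = DATE_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Made-up sample lines in combined log format.
sample_log = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /blog/post-1 HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:14:01:02 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '66.249.66.1 - - [10/Oct/2023:15:20:11 +0000] "GET /services HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [11/Oct/2023:09:05:44 +0000] "GET /blog/ HTTP/1.1" 200 7000 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

hits = googlebot_hits_per_day(sample_log)
for day, count in hits.items():
    print(day, count)
```

To run it against a real file you would pass an open file object instead of `sample_log`, for example `googlebot_hits_per_day(open("access.log"))`, where the file name depends on your server setup.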

Google crawl bots crawl pages and index them for users to access on SERPs. (Source: Haywood Beasley)

How to Optimise for Crawl Budget

Optimising your crawl budget ensures that Googlebot spends its time crawling and indexing your most valuable content, which are the pages you most want to appear on SERPs.

Optimise Pages

As mentioned before, your sitemap directly influences how a bot crawls your site. This gives you some control over the crawl process and helps you direct bots towards the valuable content that you wish to optimise as part of an SEO strategy.

Updating Content

Search engines like sites that provide relevant content that is fresh and up to date for their users. You may be able to increase the crawl budget allocated to your site by updating your content regularly and keeping it relevant.

Doing this means search engines will need to crawl your site more often to keep the results in their index relevant, and they may also match your site to more valuable search queries as a result.

Fix Internal Link Problems 

As mentioned before, internal linking is very important for on-page SEO. When it comes to the crawl budget we need to make sure our internal links work properly to get the most out of our crawl budget once a bot arrives.

  • Broken Links – When links are broken, they stop bots from crawling your website further and can hinder a search engine’s ability to properly understand your site. This can also affect the keywords your site can become visible for.
  • Redirect Chains – This is a series of redirections a bot needs to go through before it reaches the end URL. As a bot goes from link to link, it is using up its crawl budget. This means there may not be much left to crawl the final URL once it has arrived. This can occur if you are constantly deleting content and redirecting it to other pages.
  • Link Loops – This is a similar issue to redirect chains, but in this case the internal links within your content point back and forth between the same articles.

This results in Googlebot getting stuck in the loop with nowhere to go, which means it cannot crawl further into your site. This can happen accidentally, but in some cases it was used deliberately as a black hat SEO tactic many years ago.
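To make redirect chains and loops easier to picture, here is a small sketch that walks a redirect map and reports how many hops it takes to reach the final URL, or whether the path loops back on itself. The redirect map, the `max_hops` limit of 10 and the function name `follow_redirects` are all assumptions for the example. A real audit would read your redirects from server config or a crawl export.

```python
def follow_redirects(start, redirects, max_hops=10):
    """Follow a redirect map from `start`.

    Returns (path, status), where status is "ok", "loop" or "too_long".
    """
    path = [start]
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        path.append(current)
        if current in seen:
            return path, "loop"      # we came back to a URL already visited
        seen.add(current)
        if len(path) - 1 > max_hops:
            return path, "too_long"  # more hops than we are willing to follow
    return path, "ok"


# Hypothetical redirect map: source URL -> destination URL.
redirects = {
    "/old-page": "/newer-page",
    "/newer-page": "/newest-page",
    "/a": "/b",
    "/b": "/a",
}

chain_path, chain_status = follow_redirects("/old-page", redirects)
loop_path, loop_status = follow_redirects("/a", redirects)

print(chain_path, chain_status)  # two hops before the final URL
print(loop_path, loop_status)    # /a and /b redirect to each other
```

Any path with more than one hop is a candidate for tidying up: point the first URL straight at the final destination so the bot spends one request instead of several.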

Avoid Having Orphan Pages

Orphan pages are pages on your site that are not listed in your sitemap or linked to in any way. They are therefore very difficult for search engines to index, because there is no path or access route for crawlers to follow.
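One way to spot orphan pages is to compare the full list of pages you have published against the pages that appear in your sitemap or receive an internal link. The sketch below assumes you can export those three lists (for example from your CMS and a site crawl). The data and the function name `find_orphans` are hypothetical.

```python
def find_orphans(all_pages, sitemap_urls, internal_links):
    """Return pages that are neither in the sitemap nor linked from another page.

    internal_links maps a source page to the list of pages it links to.
    """
    linked = set()
    for source, targets in internal_links.items():
        # A page linking to itself does not give crawlers a way in.
        linked.update(t for t in targets if t != source)
    return sorted(set(all_pages) - set(sitemap_urls) - linked)


# Hypothetical site data, for illustration only.
all_pages = ["/", "/services", "/blog/post-1", "/promo-2019"]
sitemap_urls = ["/", "/services"]
internal_links = {
    "/": ["/services", "/blog/post-1"],
    "/promo-2019": ["/promo-2019"],
}

orphans = find_orphans(all_pages, sitemap_urls, internal_links)
print(orphans)  # ['/promo-2019']
```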

In some cases orphan pages are OK to have, depending on the circumstances, but more often than not they are the result of a small mistake while publishing content on your site.

A good example of when it’s OK to have an orphan page is when you are holding a very brief one-off promotion. In this case the URL is likely being shared through various marketing channels, and a landing page is required for potential customers to sign up to your service or buy your product. It’s important to remember, though, that once the promotion is finished the page should be deleted.

Keep Duplicate Content to a Minimum 

It’s simple. Search engines want to crawl the most unique and relevant content available. If your site has a large amount of duplicated content, the search engine may not consider your site worthy of its crawl budget in future. This risks your valuable content not getting crawled often enough, which can have a negative impact on your rankings and keyword visibility.
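A quick way to find exact duplicates is to fingerprint the text of each page and group the URLs that share a fingerprint. The sketch below is a simplified example with made-up page text and our own helper names. It only catches pages whose text is identical after ignoring case and extra whitespace, so near-duplicates would need a fuzzier comparison.

```python
import hashlib
from collections import defaultdict


def content_fingerprint(text):
    """Hash the text after lowercasing it and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Group URLs whose page text has the same fingerprint."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical page text keyed by URL, for illustration only.
pages = {
    "/shoes": "Our full range of running shoes.",
    "/shoes?sort=price": "Our  full range of RUNNING shoes.",
    "/about": "We have been selling shoes since 2005.",
}

duplicates = find_duplicates(pages)
print(duplicates)  # [['/shoes', '/shoes?sort=price']]
```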

Creating valuable and unique content can ensure that Google continues to index your pages. 

Maintaining your site and keeping an eye out for spam content can also help Google bots index your content more quickly and efficiently, as they avoid sites that appear spammy.

Indexing pages is done through Google crawl bots that scan pages for valuable content.

Key Takeaways for your Crawl Budget

Understanding what a crawl budget is and how it can affect your website’s SEO is very important. It allows you to make sure your content is working as well as it can to provide your site with as much organic traffic as possible.

  • Improve your site’s speed to allow Googlebot to crawl more of your site’s URLs. Not only are you improving user experience, but you’re also allowing the bot to crawl faster.
  • Use internal links, as Google prioritises pages that have external and internal links pointing to them. Internal links send bots to all the different pages on your site that you want them to crawl.
  • Ensure your website has a flat architecture, as this sets up all your site’s pages to have link authority flowing to them.
  • Google doesn’t like to waste resources by indexing multiple pages with the same content, so make sure that you limit duplicate content on your site. 
  • Lastly, Google struggles to find orphan pages, so make sure there is an internal or external link pointing to each page on your site. 
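
To check how flat your site architecture is, you can measure how many clicks each page sits from the home page. The sketch below runs a breadth-first search over a hypothetical internal link map. The function name `click_depths` and the sample links are our own, and the data would normally come from a site crawl.

```python
from collections import deque


def click_depths(links, home="/"):
    """Return the fewest clicks needed to reach each page from the home page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link map: page -> pages it links to.
links = {
    "/": ["/blog", "/services"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-2"],
}

depths = click_depths(links)
for page, depth in depths.items():
    print(depth, page)
```

Pages with a high depth are the ones to link to from higher up the site, and any page missing from the result cannot be reached from the home page at all.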