The number of pages that search engines will crawl on a website in a certain amount of time is called its “crawl budget.”
Search engines determine a site's crawl budget from two things: how often they can crawl the site without causing problems (the crawl limit) and how often they want to crawl it (the crawl demand).
If you waste crawl budget, search engines can't crawl your site efficiently, which hurts your SEO.
What is the meaning of crawl budget?
The number of pages that search engines will crawl on a website within a given period of time is referred to as the crawl budget.
Why do search engines assign websites crawl budget?
Because their resources are finite and they have to divide their attention among millions of websites. They need a way to prioritize their crawling effort, and assigning a crawl budget to each website lets them do that.
How is crawl budget allocated to websites?
This is determined by the crawl limit and crawl demand:
- Crawl limit / host load: how much crawling can a website handle, and what are its owner's preferences?
- Crawl demand / crawl scheduling: which URLs are most worth (re)crawling based on their popularity and update frequency.
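To make the interplay of the two factors concrete, here is a minimal sketch. It is a toy model and not how any search engine actually works: the `Candidate` class, the `demand` score, and the `plan_crawl` function are all hypothetical names introduced for illustration. The idea it shows is that the crawl limit caps how many URLs get fetched, while crawl demand decides which ones.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    demand: float  # hypothetical score: higher means more worth (re)crawling


def plan_crawl(candidates, crawl_limit):
    """Pick at most `crawl_limit` URLs, highest demand first (toy model)."""
    ranked = sorted(candidates, key=lambda c: c.demand, reverse=True)
    return [c.url for c in ranked[:max(crawl_limit, 0)]]


# Made-up example data
candidates = [
    Candidate("/category/shoes", 0.9),
    Candidate("/terms", 0.1),
    Candidate("/product/123", 0.6),
    Candidate("/filter?color=red&size=9", 0.2),
]

print(plan_crawl(candidates, crawl_limit=2))
# ['/category/shoes', '/product/123']
```

In this model, every low-value URL that scores well enough to make the cut takes a slot that a more important page could have used, which is what wasting crawl budget means.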
In SEO, crawl budget is a common term. Sometimes crawl budget is also known as crawl space or crawl time.
Is crawl budget limited to pages?
No. Crawl budget applies to any document that search engines crawl, but for the sake of simplicity we refer to these as “pages.” Other examples include JavaScript and CSS files, mobile page variants, and PDF documents.
How exactly does crawl limit / host load function in practice?
The crawl limit, or host load, is a key part of the crawl budget. Search engine crawlers are designed to avoid overloading a web server with requests, so they are careful about this. How do search engines determine a website's crawl limit? Many factors influence it, including:
- Signs of a platform in bad shape: how often requested URLs time out or return server errors.
- The number of websites running on the host: the crawl limit is set at the host level, so if your website is on a shared hosting platform with hundreds of other websites, you share the host's crawl limit with all of them. For a fairly large website, that leaves a very low crawl limit. In this case you would be much better off on a dedicated server, which will most likely also cut load times for your visitors considerably.
Another thing to consider is running separate mobile and desktop sites on the same host. They share the host's crawl limit too, so keep this in mind.
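Since timeouts and server errors are one of the signals mentioned above, it can be useful to track how often crawler requests fail on your own server. The following is a minimal sketch under stated assumptions: the input is a list of `(user_agent, status)` pairs that you have already extracted from your logs, a timeout is recorded as `None`, and a crawler is recognized by a simple substring match on the user agent (a simplification, since user agents can be spoofed). The function name `crawler_error_rate` is hypothetical.

```python
def crawler_error_rate(requests, crawler_token="Googlebot"):
    """Share of crawler requests that timed out or returned a 5xx status.

    `requests` is an iterable of (user_agent, status) pairs, where status is
    an HTTP status code, or None for a request that timed out.
    """
    total = 0
    failed = 0
    for user_agent, status in requests:
        if crawler_token not in user_agent:
            continue  # ignore regular visitors and other bots
        total += 1
        if status is None or 500 <= status <= 599:
            failed += 1
    return failed / total if total else 0.0


# Made-up sample: four crawler requests, two of which failed
sample = [
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", 200),
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", 503),
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", None),
    ("Mozilla/5.0 (Windows NT 10.0)", 500),
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", 200),
]

print(crawler_error_rate(sample))  # 0.5
```

A rising value over time would suggest the platform is struggling. The point at which a search engine actually lowers the crawl limit is not something this sketch can tell you.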
How does crawl demand/scheduling function in practice?
Crawl demand, also known as crawl scheduling, is about determining whether a URL is worth (re)crawling. Again, many factors affect crawl demand, such as:
- Popularity: how many internal and external links point to a URL, and how many search terms it ranks for.
- Freshness: how often the URL is updated.
- Type of page: how likely is this type of page to change? Take a product category page and a terms and conditions page as two examples. Which do you think changes more often and deserves to be crawled more frequently?
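The three factors above can be combined into a rough priority score. The sketch below is purely illustrative: the formula, the `PAGE_TYPE_WEIGHT` values, and the `Page` and `demand_score` names are assumptions made up for this example, not a description of how search engines weigh these signals.

```python
from dataclasses import dataclass

# Hypothetical weights: page types that tend to change often get a higher weight
PAGE_TYPE_WEIGHT = {"category": 1.0, "product": 0.7, "legal": 0.1}


@dataclass
class Page:
    url: str
    links: int                # internal + external links pointing to the URL
    ranking_terms: int        # number of search terms the URL ranks for
    changes_per_month: float  # how often the content is updated
    page_type: str


def demand_score(page):
    """Toy score: (popularity + freshness) scaled by a page-type weight."""
    popularity = page.links + page.ranking_terms
    freshness = page.changes_per_month
    weight = PAGE_TYPE_WEIGHT.get(page.page_type, 0.5)  # 0.5 for unknown types
    return (popularity + freshness) * weight


# Made-up example pages
category = Page("/category/shoes", links=40, ranking_terms=10,
                changes_per_month=8, page_type="category")
terms = Page("/terms-and-conditions", links=40, ranking_terms=0,
             changes_per_month=0, page_type="legal")

print(demand_score(category) > demand_score(terms))  # True
```

Even with the same number of links, the category page scores far higher than the terms and conditions page, which matches the intuition that it deserves more frequent recrawls.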