Googlebot and the 15 MB file size limit

June 28, 2022

On Tuesday, June 28, 2022, Google released an update to its Googlebot documentation, clarifying that Googlebot can only "see" the first 15 megabytes (MB) of certain file types when fetching them. This limit has been in place for many years; it was only recently added to the documentation to help with debugging. Note that the limit applies only to the initial request made by Googlebot, not to the resources referenced within the page: if an HTML page references a JavaScript file, for example, Googlebot will still be able to see and fetch that JavaScript file.

Most likely, the 15 MB limit for Googlebot won't have much of an impact, since very few pages on the internet are larger than that. However, if you do happen to own an HTML page that's over 15 MB, you could try moving some inline scripts and CSS to external files.

What happens to the content after the first 15 MB?

The content after the first 15 MB is dropped by Googlebot, and only the first 15 MB is forwarded to indexing. This applies to fetches made by Googlebot (Googlebot Smartphone and Googlebot Desktop) for file types supported by Google Search.
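This prefix-truncation behavior can be illustrated locally with head, which keeps only the first N bytes of its input the way Googlebot keeps the first 15 MB. The file names and the tiny 10-byte "limit" below are purely for illustration:

```shell
# Googlebot keeps the first 15 MB (15 * 1024 * 1024 = 15728640 bytes) of a
# supported file; everything after that prefix never reaches indexing.
# Demonstrate the same idea locally with a tiny 10-byte "limit":
printf 'AAAAAAAAAA-this-part-is-dropped' > /tmp/page.html

# Keep only the first 10 bytes, the way Googlebot keeps the first 15 MB.
head -c 10 /tmp/page.html > /tmp/indexed-prefix.html

cat /tmp/indexed-prefix.html   # prints "AAAAAAAAAA"; the rest was cut off
```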

Does the 15 MB limit apply to images and videos referenced from the HTML?

No. Googlebot fetches videos and images that are referenced in the HTML with a URL (for example, <img src="https://example.com/images/puppy.jpg" alt="cute puppy looking very disappointed" />) separately, with consecutive fetches.

Do data URIs contribute to the size of the HTML file?

Yes. Data URIs contribute to the HTML file size because they are embedded in the HTML file itself.
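One way to see why is to compare a file's raw byte count with its length once base64-encoded into a data URI. This is a small shell sketch; the stand-in file and MIME type are illustrative:

```shell
# A stand-in for an image; any local file shows the same effect.
printf 'hello, this stands in for image bytes' > /tmp/puppy.jpg

raw_bytes=$(wc -c < /tmp/puppy.jpg | tr -d ' ')

# Base64 inflates the payload by roughly 33%, and the "data:...;base64,"
# prefix adds a little more -- all of it counted inside the HTML file.
data_uri="data:image/jpeg;base64,$(base64 < /tmp/puppy.jpg | tr -d '\n')"
uri_bytes=${#data_uri}

echo "raw: $raw_bytes bytes, embedded as data URI: $uri_bytes bytes"
```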

How can I check the size of my page?

There are a number of ways, but the easiest is probably to use your own browser and its developer tools. Load the page as you normally would, then open the developer tools and switch to the Network tab. Reload the page, and you should see all the requests your browser had to make to render the page. The top request is the one you're looking for, with the byte size of the page in the Size column.

In Chrome Developer Tools, for example, the page might show up with 150 kB in the Size column.

If you want to check how much data Googlebot is downloading when it crawls your site, you can use the Network tab in Chrome Developer Tools or cURL from the command line. With cURL, run the following command:

curl \
  -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" \
  -so /dev/null https://example.com/puppies.html -w '%{size_download}'
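The %{size_download} value cURL prints is a plain byte count, so it can be compared against the 15 MB ceiling in a small script. A sketch, with hypothetical byte counts standing in for the cURL output:

```shell
# 15 MB expressed in bytes -- the prefix Googlebot forwards to indexing.
LIMIT=$((15 * 1024 * 1024))   # 15728640

# $1 would normally come from the cURL command above, e.g.:
#   bytes=$(curl -A "Mozilla/5.0 ..." -so /dev/null \
#     https://example.com/puppies.html -w '%{size_download}')
check_size() {
  if [ "$1" -gt "$LIMIT" ]; then
    echo "over the 15 MB limit: bytes past $LIMIT will not be indexed"
  else
    echo "within the 15 MB limit"
  fi
}

check_size 153600      # a typical ~150 kB page
check_size 16000000    # an unusually large page
```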

Replace "https://example.com/puppies.html" with the URL of the page you want to check. If you have more questions about this process, you can find more information on Twitter and in the Search Central Forums. You can also leave feedback on the documentation pages themselves if you need more clarification about something.
