New Open Source Robots.txt Projects

September 21, 2020

Last year, Google released its robots.txt parser and matcher to the open source world. Since then, people have built new tools with it, contributed to the open source library, and released ports to other languages (such as Go and Rust).

With intern season ending at Google, the company is highlighting two new robots.txt releases made possible by two interns on the Search Open Sourcing team: Andreea Dutulescu and Ian Dolzhanskii.

First, Google is releasing a testing framework for robots.txt parser developers, created by Andreea. The project provides a testing tool that checks whether, and to what extent, a robots.txt parser follows the Robots Exclusion Protocol. There is currently no official, thorough way to assess a parser's correctness, so Andreea built a tool that developers can use to create parsers that follow the protocol.
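To give a rough idea of what checking a parser against expected behavior can look like, here is a minimal sketch. It is not the released framework and does not use its API: it uses Python's standard-library `urllib.robotparser` as a stand-in parser, and the robots.txt content, the test cases, the `ExampleBot` user agent, and the `check_parser` helper are all hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical test cases: (URL path, whether crawling should be allowed).
CASES = [
    ("/index.html", True),
    ("/private/data.html", False),
]


def check_parser(robots_txt, cases, user_agent="ExampleBot"):
    """Return a list of (path, expected, actual) for every mismatching case."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    failures = []
    for path, expected in cases:
        actual = parser.can_fetch(user_agent, path)
        if actual != expected:
            failures.append((path, expected, actual))
    return failures


failures = check_parser(ROBOTS_TXT, CASES)
print("all cases passed" if not failures else failures)
```

A real conformance suite would cover far more of the protocol (rule precedence, wildcards, user-agent grouping, malformed input); the two cases here only illustrate the expected-versus-actual comparison.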

Second, Google has released a Java port of its C++ robots.txt parser, created by Ian. The parser is a 1-to-1 translation of the C++ parser in terms of functions and behavior, and it has been thoroughly tested for parity against a large corpus of robots.txt rules. Teams are already planning to use the Java robots.txt parser in Google production systems, and the company welcomes contributions to both projects.

Google said it was a genuine pleasure to host Andreea and Ian and that the team is sad to see their internship end. Their contributions help make the Internet a better place, and the company hopes to welcome them back in the future.
