The Why, How And What The Heck?! Of robots.txt Files For SEO
It’s an understatement to say that SEO is a multi-discipline area of digital where creativity meets technical knowledge. While many SEOs love the human element, there’s no getting away from how those clever technical tricks can be the secret to better results for your clients.
Robots.txt files are just one example of something that’s easily overlooked, but can be really worthwhile in boosting your search performance. But why? Sam Gipson [samgipson.com] clued us in during this year’s BrightonSEO talks. Here’s what we learned.
What is a Robots.txt file?
Simply put, this file tells search engines what pages to crawl and what not to crawl. First introduced in 1994, it’s remained relevant to search while other technologies have fallen out of favour.
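If you’re curious what that looks like in practice, Python’s standard library includes a robots.txt parser (urllib.robotparser) that reads the file much like a crawler would. Here’s a minimal sketch - the domain and paths are placeholders, so swap in your own site:

from urllib.robotparser import RobotFileParser

# robots.txt always lives at the root of the domain.
# "www.example.co.uk" is a placeholder - point this at your own site.
parser = RobotFileParser()
parser.set_url("https://www.example.co.uk/robots.txt")
parser.read()  # fetch and parse the live file

# Ask whether a given crawler is allowed to fetch a given URL
print(parser.can_fetch("googlebot", "https://www.example.co.uk/checkout/"))
print(parser.can_fetch("*", "https://www.example.co.uk/blog/"))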
Why is it important for you?
We could talk about the wizardry of it, but the why is much more important. First and foremost, robots.txt files help stop crawlers wasting time on duplicate or low-value pages. That means better search results for your site, making sure the pages you want to be seen can be prioritised. Not only that, cutting out unnecessary crawler traffic eases the load on your server, which can improve your site speed. Research shows good site speed can be the difference between a user staying or leaving your site.
While it’s not actually an official internet standard to have a robots.txt file, in July 2019 Google announced that they are working towards making it so. It’s always worth getting ahead of those must-haves, ensuring you’re not scrambling to update at the last minute.
After revising the REP (Robots Exclusion Protocol), Google announced that unsupported or incorrectly written rules in robots.txt are no longer honoured, and that unsuccessful requests or incomplete data are treated as a server error. The REP is ‘a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users.’ [https://moz.com/learn/seo/robotstxt]
The Technical Stuff
So what does it look like to implement one of these files? Let’s go through the tech spec below. Or just fire this blog in the direction of your SEO expert! (Not got one? We’re here to talk)
Robots.txt files are made up of four elements:
- Field (user-agent)
- Value (*)
- Directive/Rule (allow/disallow)
- Path (/checkout/)
Together, these elements make up something called a group. You can also group several user agents together, which looks like this (there’s a quick way to test how a group behaves in the sketch just after the example):
user-agent: googlebot
user-agent: bingbot
Disallow: /checkout/
Disallow: /*?delivery_type
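If you want to sanity-check what a group like that actually blocks, Python’s built-in urllib.robotparser will give you a rough answer. Here’s a quick sketch using the exact rules above - note this parser doesn’t understand mid-path wildcards like /*?delivery_type the way Googlebot does, so treat it as an approximation rather than gospel:

from urllib.robotparser import RobotFileParser

rules = """
user-agent: googlebot
user-agent: bingbot
Disallow: /checkout/
Disallow: /*?delivery_type
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Both crawlers named in the group are blocked from /checkout/
print(parser.can_fetch("googlebot", "/checkout/"))    # False
print(parser.can_fetch("bingbot", "/checkout/"))      # False

# A crawler not named in the group (with no * group to fall back on) is allowed
print(parser.can_fetch("duckduckbot", "/checkout/"))  # True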
The ‘user-agent’ field specifies which crawler the rules apply to, and the field name itself is case insensitive. For those who are unfamiliar, a crawler is a software program that search engines use to scan content uploaded to the web. It’s the clever little bot that allows your website to appear in Google searches (and Bing and the rest too).
Example:
user-agent: googlebot
The ‘value’ (*) is a wildcard that represents any sequence of characters - used as a user-agent value, it simply means ‘any crawler’. Again, the user-agent value is case insensitive.
Example:
User-agent: *
Disallow:
This would allow all web crawlers access to all content.
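It can look odd that a Disallow line ends up allowing everything, but an empty Disallow value genuinely means ‘block nothing’. You can convince yourself with the same Python parser (the paths below are made up purely for illustration):

from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse("""
User-agent: *
Disallow:
""".splitlines())

# An empty Disallow blocks nothing, so every path comes back as crawlable
print(parser.can_fetch("googlebot", "/checkout/"))      # True
print(parser.can_fetch("bingbot", "/anything/at/all"))  # True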
The ‘directive/rule’ tells the crawler what it can do with the path that follows: ‘disallow’ blocks crawling of those URLs, while ‘allow’ carves out exceptions to a disallow rule. (Instructions like ‘noindex, nofollow’ belong in robots meta tags on the page itself rather than in robots.txt.) Directives are case insensitive, but a misspelled directive simply won’t be recognised.
Example:
user-agent: *
Disallow: /checkout/
Allow: /checkout/delivery/
The ‘path’ specifies which part of the site the rule applies to. Don’t worry, the order of directives doesn’t matter for most bots. Unlike the fields and directives, though, the URL path is case sensitive, and it must begin with a ‘/’ (or a ‘*’) - anything else is an invalid path format and the rule will be ignored.
Example:
user-agent: bingbot
Disallow: /checkout/
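That case sensitivity catches people out, so it’s worth testing. Here’s a rough check with urllib.robotparser using the bingbot example above (again, this is only an approximation of how a real search engine’s crawler behaves):

from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse("""
user-agent: bingbot
Disallow: /checkout/
""".splitlines())

# The path is case sensitive: /checkout/ is blocked, /Checkout/ is not
print(parser.can_fetch("bingbot", "/checkout/"))   # False
print(parser.can_fetch("bingbot", "/Checkout/"))   # True

# The user-agent value, on the other hand, isn't case sensitive
print(parser.can_fetch("BingBot", "/checkout/"))   # False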
So should you or shouldn’t you? We think it’s a definite yes - there’s everything to gain from implementing this relatively simple file as standard. There’s no reason not to do something that Google’s about to expect from you, after all.
Not sure where to start with improving your technical SEO? We want to help. Get in touch today and find out how we can help maximise your whole digital marketing approach.