What does your Robots.txt look like?

Since I was facing a suspiciously high CPU load on one of my sites, I gave it a try and blocked everything except Googlebot. CPU load dropped by almost 60% right away.

I was wondering what the best practice is for a WordPress robots.txt. What are you guys blocking from crawling your sites?
 
@F5K7, did you notice this thread pinned to the top of DevOps?
Block Unwanted Bots on Apache & Nginx (Constantly Updated)

An alternative idea to what you've done would be to rate limit all bots except Google. You can add a delay like this: crawl-delay: 5 where 5 would be 5 full seconds.
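To make the crawl-delay idea concrete, here is a minimal sketch of what such a robots.txt might look like, checked with Python's standard-library `urllib.robotparser`. The exact rules are an assumption for illustration, not something taken from the pinned thread: Googlebot gets its own group with an empty `Disallow:` (meaning nothing is blocked), and every other bot falls into the `*` group with a 5-second delay. `AhrefsBot` is just used as an example of a non-Google crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot is unrestricted, everyone else is asked to wait.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay for the group that matches the bot,
# or None if that group sets no delay.
googlebot_delay = parser.crawl_delay("Googlebot")
ahrefs_delay = parser.crawl_delay("AhrefsBot")

print("Googlebot delay:", googlebot_delay)
print("AhrefsBot delay:", ahrefs_delay)
```

Keep in mind that Crawl-delay is only a request. Google does not support it at all, so giving Googlebot a separate group mostly documents intent, and other crawlers are free to ignore it.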

I'm not sure all the bots are going to respect that initiative, but it's something to consider if you still want data from Ahrefs, Majestic, Alexa, or other services like that.

I had a lot of crap in my robots.txt that I ended up trimming way back. Nowadays I only disallow crawling of /cgi-bin/ and, in the case of WordPress, /wp-admin.

Otherwise I have a Sitemap: https://... line. Google, Bing, Yahoo, etc. all use that. I use it even though I also submit my sitemap to Search Console, but I don't mess with Bing or Yahoo's consoles (if Yahoo even has one) so I want them to be able to find the sitemap.
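A trimmed-down file along those lines could look roughly like the sketch below. The domain `example.com` and the sitemap path are placeholders, since the real URL isn't given above. The Python part just uses the standard-library `urllib.robotparser` to confirm the rules do what you'd expect; `SomeBot` is a made-up crawler name, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical minimal robots.txt; example.com and sitemap.xml are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules are prefix matches on the URL path.
admin_allowed = parser.can_fetch("SomeBot", "https://example.com/wp-admin/options.php")
post_allowed = parser.can_fetch("SomeBot", "https://example.com/hello-world/")
sitemaps = parser.site_maps()

print("wp-admin crawlable:", admin_allowed)
print("regular post crawlable:", post_allowed)
print("sitemaps:", sitemaps)
```

The Sitemap line is independent of any User-agent group, so it can sit anywhere in the file.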
 
Actually no, I didn't see that thread and it didn't show up when I searched either. But that helps, I copied big parts of it.

An alternative idea to what you've done would be to rate limit all bots except Google. You can add a delay like this: crawl-delay: 5 where 5 would be 5 full seconds.

That's pretty interesting. Am I right to assume that, even though the list in the thread you linked gets updated, it still only contains the big stuff, and there are lots of bots that will still be allowed to crawl, which probably still makes up a big part of the bot traffic a website would receive? In that case, just whitelisting the big search engines and delaying every other bot would be more effective, right?
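As a rough sketch of that whitelist-plus-delay idea: the helper below (`build_robots_txt` is a made-up name, not part of any library) generates a robots.txt where the listed bots share one unrestricted group and everything else gets a crawl delay. Which bots count as "the big search engines" is an assumption here; Googlebot and Bingbot are just examples, and `MJ12bot` stands in for any other crawler.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(allowed_bots, delay_seconds):
    """Return robots.txt text: listed bots unrestricted, all others delayed."""
    # Several User-agent lines in a row form a single group.
    lines = [f"User-agent: {bot}" for bot in allowed_bots]
    # An empty Disallow means "nothing is blocked" for that group.
    lines += ["Disallow:", "", "User-agent: *", f"Crawl-delay: {delay_seconds}"]
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(["Googlebot", "Bingbot"], 5)
print(robots_txt)

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
whitelisted_delay = parser.crawl_delay("Bingbot")
other_delay = parser.crawl_delay("MJ12bot")
```

This only helps with bots that actually read and honor robots.txt; anything that ignores it would have to be handled at the server level.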

I'm not sure all the bots are going to respect that initiative

Any way to block those other bots and is it worth it for the average website?
 