Block Link and Rank Checkers

The Kloser · Mar 30, 2016

Is there a quick way to use robots.txt file or anything else to block ahrefs, moz, majestic and bunch of KW rank checker tools from crawling a site?

mrodriguez27 · Mar 30, 2016

Why would you want to block KW rank checker tools? They won't even visit your site, only Google.

rockzz · Mar 30, 2016

Hey one of my friend shared about this how to block spy bots...Lets try this...

Use Robots.txt file with this - http://pastebin.com/dnkEeeEk

and

edit your htaccess file with this - http://pastebin.com/wGwHLUcZ

That's it and one more thing hide your tier1 links with 301 redirection.

The Kloser · Mar 30, 2016

Perfect!

Saiten · Mar 30, 2016

Disclaimer: I'm not very knowledgeable about server / .htaccess configuration so the following info might be wrong.

Some servers have mod_rewrite (the solution above) disabled, to make sure you can check with botsimulator.com if the user agent gets blocked (I found this out the hard way...).

If it's disabled you can try it with other modules (depending on if they are enabled) in your .htaccess file:

Code:

BrowserMatchNoCase BOTNAMEHERE bad_bot
BrowserMatchNoCase BOTNAMEHERE bad_bot

Order Deny,Allow
Deny from env=bad_bot

or

Code:

BrowserMatchNoCase BOTNAMEHERE bad_bot
BrowserMatchNoCase BOTNAMEHERE bad_bot

Order Deny,Allow
Deny from env=bad_bot

CCarter · Mar 30, 2016

rockzz said:
Use Robots.txt file with this - http://pastebin.com/dnkEeeEk

and

edit your htaccess file with this - http://pastebin.com/wGwHLUcZ

LOL, OMG, Anything redirect to webmd.com is from my ancient ancient thread at WF about this. This is an OLD version of this file. There are a ton of new bots on the scene too.

I will say this, you won't get past crawlers that don't play fair and use human user-agents instead of telling you they are "BLEXBOT". Ahrefs, SEMRush, and the big guys all respect the robots.txt (SOMEWHAT), and you can even block the wget guys with htaccess, but if someone is smart enough they'll just spoof who they say they are and pretend to be an updated browser.

The Kloser · Mar 30, 2016

I'm only concerned about the big guys right now - the bulk of the users.

@CCarter Will you get us an updated list of bots when you have a chance? Gracias!

Saiten · Mar 30, 2016

Saiten said:

Sorry, the second code was supposed to be

Code:

SetEnvIfNoCase User-Agent "BOTNAMEHERE" block_bot
SetEnvIfNoCase User-Agent "BOTNAMEHERE" block_bot

Order Allow,Deny
Allow from all
Deny from env=block_bot

Here are a few of the newer backlink crawler bots you might want to block:

JamesBOT (http://cognitiveseo.com)
SEOkicks-Robot (https://www.seokicks.de)
SearchmetricsBot (http://searchmetrics.com)
LinkpadBot (http://www.linkpad.ru/)
spbot (http://www.openlinkprofiler.org)

SMoKeR · Mar 31, 2016

@The Kloser As someone noted here above me, some rewrite rules will not work with certain hosts due to different hosting setup, so it's extremely important that you test each and every site individually.

You could do that with the User Agent Extension:
for firefox: https://addons.mozilla.org/en-US/firefox/addon/user-agent-switcher/
for chrome: https://chrome.google.com/webstore/detail/user-agent-switcher-for-c/djflhoibgkdhkhhcedjiklpkjnoahfmg

If you don't want to mess with .htaccess and codes you could use a plugin to do the job.

There are free plugins out there and then there are paid stuff like Spyder Spanker.
For a free plugin you could use Link Privacy.

@CCarter Not all link crawlers obey robots.txt, I've had cases where ahrefs crawled a site even though there was a specific rule that disallowed it.

CCarter · Mar 31, 2016

SMoKeR said:
@CCarter Not all link crawlers obey robots.txt, I've had cases where ahrefs crawled a site even though there was a specific rule that disallowed it.

That's why I stated "SOMEWHAT"

CCarter said:
Ahrefs, SEMRush, and the big guys all respect the robots.txt (SOMEWHAT)

I was in a slack group where I confronted the Ahrefs CEO about the fact they were using a "East European" country's ISP to piggy back into a Private Blog Network which was specifically disabling Ahrefs in the robots.txt AND within .htaccess file. They were cloaking their user-agent and location, and indexing these sites which were specifically hidden FROM THEM. The CEO was denying it, but evidence was evidence, hard to disprove when you are seeing the logs, the blocking files, and then Ahrefs indexing these pages which clearly was against the robots.txt rules.

turbin3 · Mar 31, 2016

Creating specific exclusions in robots.txt, .htaccess, nginx.conf, etc. is a fun exercise, and still a decent best practice. That being said, people will probably get better long term results as well as more scalable results in developing systems/methods to monitor traffic behavioral characteristics, and simply blocking traffic that exhibits certain characteristics. Of course, once you reach that point and begin to think about, "How would I scrape my site???", you might then come to realize there is NEVER a foolproof method. There are a lot of creative ways to build bots, that most would never be able to detect.

SMoKeR · Mar 31, 2016

CCarter said:
That's why I stated "SOMEWHAT"

I know, I just wanted to sharpen the fact that there are cases where they don't.
TBH I don't even use robots.txt directives as I think it leaves a footprint and it makes it pretty obvious that you're trying to hide.

CCarter said:
I was in a slack group where I confronted the Ahrefs CEO about the fact they were using a "East European" country's ISP to piggy back into a Private Blog Network which was specifically disabling Ahrefs in the robots.txt AND within .htaccess file. They were cloaking their user-agent and location, and indexing these sites which were specifically hidden FROM THEM. The CEO was denying it, but evidence was evidence, hard to disprove when you are seeing the logs, the blocking files, and then Ahrefs indexing these pages which clearly was against the robots.txt rules.

Glad you brought that up, it just shows how ineffective it would be to rely on just the robots file as these crawlers often disregard it.

Darth · Jul 31, 2017

Anyone got an up to date list of these crawlers/bots?

Found it here: https://www.buildersociety.com/threads/block-unwanted-bots-on-apache-nginx-constantly-updated.1898/

648 x 260

Block Link and Rank Checkers

The Kloser

mrodriguez27

rockzz

The Kloser

Saiten

CCarter

Final Boss ®

The Kloser

Saiten

SMoKeR

CCarter

Final Boss ®

turbin3

SMoKeR

Darth