The No-Footprint Way to Protect Your Assets

Since our first posts have to be made in the orientation room and I'm not trying to resurrect a thread from the past, I figured I'd go ahead and share something with you guys that I whipped up last week.

If you work with PBNs, you probably know the value of blocking backlink crawlers from your websites. Why hand that information to the enemy when you can track it yourself using proprietary or free tools?

There are tools that can help you do this, but I prefer to use a combination of the robots.txt file and the .htaccess file to get the job done. The problem, though, is that if you're using the same plugin everywhere, or if the bots you block appear in the same order in every file, you're leaving a pretty decently sized footprint.

So what I whipped up for my assistants is a spreadsheet that will allow you to randomize the order of the user agents or bots that you want to block.
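
If you'd rather script the shuffle than use a spreadsheet, the core idea fits in a few lines of Python. This is just a sketch of the concept, not what's in my sheet; the bot names below are placeholders, so swap in whatever you actually block.

Code:
import random

# Hypothetical starter list -- swap in the user agents you actually block.
BOTS = ["AhrefsBot", "SemrushBot", "MJ12bot", "DotBot", "BLEXBot"]

def randomized_robots(bots):
    """Emit robots.txt lines with the bots in random order."""
    shuffled = random.sample(bots, len(bots))  # shuffle without mutating the input
    lines = [f"User-agent: {bot}" for bot in shuffled]
    lines.append("Disallow: /")
    return "\n".join(lines)

def randomized_htaccess(bots):
    """Emit mod_rewrite conditions with the bots in random order."""
    shuffled = random.sample(bots, len(bots))
    # Every condition except the last needs [OR] so the conditions chain.
    conds = [f"RewriteCond %{{HTTP_USER_AGENT}} {bot} [NC,OR]" for bot in shuffled[:-1]]
    conds.append(f"RewriteCond %{{HTTP_USER_AGENT}} {shuffled[-1]} [NC]")
    return "\n".join(conds)

print(randomized_robots(BOTS))
print()
print(randomized_htaccess(BOTS))

Run it once per site and paste the output into the templates further down.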

Download Your Bot / UA Randomizers Here

How to Use the Google Sheet Version
  1. Visit the link above and make a copy in your own account.
    1. (If you would prefer to keep a local copy instead of working in Google Sheets, skip to the next section.)
  2. Highlight cell B2 to work with the Robots.txt blocking.
    1. You should see that there's actually a formula in this cell. It re-randomizes the list in column B every time the sheet recalculates, i.e., whenever you modify it (there's a sketch of how to rebuild it after this list).
  3. Copy this cell's contents (B2).
  4. Paste the cell contents back into the same cell (B2).
  5. The list in column B should reshuffle into a new random order.
  6. Copy all of column B and it should paste out into any other document just fine.
  7. Repeat with column D if you'd also like to block the bots at the server level via .htaccess.
  8. Upload that shit to your site!
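
I can't paste the sheet's exact formula here, but if you ever need to rebuild the randomizer from step 2 yourself, something along these lines does the job in Google Sheets (assuming your master bot list lives in column A):

Code:
=SORT(FILTER(A2:A, A2:A<>""), RANDARRAY(COUNTA(A2:A)), TRUE)

FILTER drops the blank cells, RANDARRAY pairs each bot with a random number, and SORT orders the list by those numbers, so every recalculation spits out a fresh shuffle.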

How to Use the Local Excel Version
  1. Download the spreadsheet above.
    1. Sheet 1 is the Robots.txt list, and Sheet 2 is the .htaccess list.
  2. Select any of the numbers in column two (Randomizer).
  3. Sort that column, A -> Z or Z -> A; the direction doesn't matter.
  4. The first column should reshuffle the bots into a new random order.
  5. Copy all of column A and it should paste out into any other document just fine.
  6. Flip to Sheet 2 and repeat for the .htaccess list.
  7. Upload that shit to your site!
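
If you ever need to rebuild the Randomizer column itself, my assumption is it's just volatile random numbers: fill each cell of column two with the formula below, and every sort shuffles the bots while the numbers regenerate for the next pass.

Code:
=RAND()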

Here is the standard robots.txt for WordPress sites in template form:
Code:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

[Paste in randomized robots list]
Disallow: /
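
Purely for illustration (these three are common backlink crawlers, not the sheet's full list), a filled-in file might come out like this. Note that the whole group of User-agent lines shares the single Disallow: / beneath it:

Code:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/

User-agent: SemrushBot
User-agent: AhrefsBot
User-agent: MJ12bot
Disallow: /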

Here is my standard .htaccess for WordPress sites in template form:
Code:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

# END WordPress

RewriteEngine on
# Abuse Agent Blocking
[Insert Random htaccess bots]
RewriteRule ^.* - [F,L]
# Abuse bot blocking rule end
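
For the .htaccess side, the inserted lines are RewriteCond checks against the User-Agent header: every condition except the last takes [NC,OR] so they chain into one rule, and the [F,L] rule that follows serves a 403 to anything that matches. With the same three example bots plugged in, the block reads:

Code:
RewriteEngine on
# Abuse Agent Blocking
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC]
RewriteRule ^.* - [F,L]
# Abuse bot blocking rule end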

I hope this helps some of you protect your assets better. Happy grinding, everyone.

Cheers!

 
Nice. Blocking crawlers has been on my to-do list for quite some time and I really need to set aside some time to get it done... this has served as a timely reminder :wink:
 
Randomizing the order of the listing in robots.txt seems like it could just as easily be sorted alphabetically and cross-referenced across sites. It's another layer of protection, but one that would take very little computational time to see through.
 