Block Unwanted Bots on Apache & Nginx (Constantly Updated)

CCarter

Last update: Feb 22, 2019 @ 11:23 PM EST

Instead of people going to my MoneyOverEthics.com site for the latest robots files, I decided to just create a running, regularly updated version here. I recommend using these browser add-ons to check that you are indeed blocking the bots you want to block: Browser User-Agent Changer Add-ons.

Side note: Don't just blindly add these files without going through them and double-checking that you aren't blocking bots you want crawling your site. For example, these files block 'curl', 'python', 'perl', and even 'SEMRush' - if you use bots or services that these files block, you won't just be blocking your competition from using those services to query your domains, you'll be blocking yourself too. I block EVERYTHING!
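If you'd rather check from the command line than with a browser add-on, here's a quick sketch (assuming example.com is your domain and the Apache rules further down are live; note that curl's own default user-agent is on the blocklist too, so even a plain request should get bounced):

Code:
curl -I -A "AhrefsBot" https://example.com/
curl -I -A "SemrushBot" https://example.com/
# a blocked agent should come back as a 302 with "Location: http://www.google.com/"

If you see your normal pages instead, the rules aren't being picked up.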

--

Robots.txt version
First is the top-level robots.txt; the "good" bots "should" respect these directives:

Code:
User-agent: AhrefsBot
User-agent: adbeat_bot
User-agent: Alexibot
User-agent: AppEngine
User-agent: Aqua_Products
User-agent: archive.org_bot
User-agent: archive
User-agent: asterias
User-agent: b2w/0.1
User-agent: BackDoorBot/1.0
User-agent: BecomeBot
User-agent: BlekkoBot
User-agent: Blexbot
User-agent: BlowFish/1.0
User-agent: Bookmark search tool
User-agent: BotALot
User-agent: BuiltBotTough
User-agent: Bullseye/1.0
User-agent: BunnySlippers
User-agent: CCBot
User-agent: CheeseBot
User-agent: CherryPicker
User-agent: CherryPickerElite/1.0
User-agent: CherryPickerSE/1.0
User-agent: chroot
User-agent: Copernic
User-agent: CopyRightCheck
User-agent: cosmos
User-agent: Crescent
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
User-agent: DittoSpyder
User-agent: dotbot
User-agent: dumbot
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: EmailWolf
User-agent: Enterprise_Search
User-agent: Enterprise_Search/1.0
User-agent: EroCrawler
User-agent: es
User-agent: exabot
User-agent: ExtractorPro
User-agent: FairAd Client
User-agent: Flaming AttackBot
User-agent: Foobot
User-agent: Gaisbot
User-agent: GetRight/4.2
User-agent: gigabot
User-agent: grub
User-agent: grub-client
User-agent: Go-http-client
User-agent: Harvest/1.5
User-agent: Hatena Antenna
User-agent: hloader
User-agent: http://www.SearchEngineWorld.com bot
User-agent: http://www.WebmasterWorld.com bot
User-agent: httplib
User-agent: humanlinks
User-agent: ia_archiver
User-agent: ia_archiver/1.6
User-agent: InfoNaviRobot
User-agent: Iron33/1.0.2
User-agent: JamesBOT
User-agent: JennyBot
User-agent: Jetbot
User-agent: Jetbot/1.0
User-agent: Jorgee
User-agent: Kenjin Spider
User-agent: Keyword Density/0.9
User-agent: larbin
User-agent: LexiBot
User-agent: libWeb/clsHTTP
User-agent: LinkextractorPro
User-agent: LinkpadBot
User-agent: LinkScan/8.1a Unix
User-agent: LinkWalker
User-agent: LNSpiderguy
User-agent: looksmart
User-agent: lwp-trivial
User-agent: lwp-trivial/1.34
User-agent: Mata Hari
User-agent: Megalodon
User-agent: Microsoft URL Control
User-agent: Microsoft URL Control - 5.01.4511
User-agent: Microsoft URL Control - 6.00.8169
User-agent: MIIxpc
User-agent: MIIxpc/4.2
User-agent: Mister PiX
User-agent: MJ12bot
User-agent: moget
User-agent: moget/2.1
User-agent: mozilla
User-agent: Mozilla
User-agent: mozilla/3
User-agent: mozilla/4
User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)
User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP)
User-agent: mozilla/5
User-agent: MSIECrawler
User-agent: naver
User-agent: NerdyBot
User-agent: NetAnts
User-agent: NetMechanic
User-agent: NICErsPRO
User-agent: Nutch
User-agent: Offline Explorer
User-agent: Openbot
User-agent: Openfind
User-agent: Openfind data gathere
User-agent: Oracle Ultra Search
User-agent: PerMan
User-agent: ProPowerBot/2.14
User-agent: ProWebWalker
User-agent: psbot
User-agent: Python-urllib
User-agent: QueryN Metasearch
User-agent: Radiation Retriever 1.1
User-agent: RepoMonkey
User-agent: RepoMonkey Bait & Tackle/v1.01
User-agent: RMA
User-agent: rogerbot
User-agent: scooter
User-agent: Screaming Frog SEO Spider
User-agent: searchpreview
User-agent: SEMrushBot
User-agent: SemrushBot
User-agent: SemrushBot-SA
User-agent: SEOkicks-Robot
User-agent: SiteSnagger
User-agent: sootle
User-agent: SpankBot
User-agent: spanner
User-agent: spbot
User-agent: Stanford
User-agent: Stanford Comp Sci
User-agent: Stanford CompClub
User-agent: Stanford CompSciClub
User-agent: Stanford Spiderboys
User-agent: SurveyBot
User-agent: SurveyBot_IgnoreIP
User-agent: suzuran
User-agent: Szukacz/1.4
User-agent: Teleport
User-agent: TeleportPro
User-agent: Telesoft
User-agent: Teoma
User-agent: The Intraformant
User-agent: TheNomad
User-agent: toCrawl/UrlDispatcher
User-agent: True_Robot
User-agent: True_Robot/1.0
User-agent: turingos
User-agent: Typhoeus
User-agent: URL Control
User-agent: URL_Spider_Pro
User-agent: URLy Warning
User-agent: VCI
User-agent: VCI WebViewer VCI WebViewer Win32
User-agent: Web Image Collector
User-agent: WebAuto
User-agent: WebBandit
User-agent: WebBandit/3.50
User-agent: WebCopier
User-agent: WebEnhancer
User-agent: WebmasterWorld Extractor
User-agent: WebmasterWorldForumBot
User-agent: WebSauger
User-agent: Website Quester
User-agent: Webster Pro
User-agent: WebStripper
User-agent: WebVac
User-agent: WebZip
User-agent: WebZip/4.0
User-agent: Wget
User-agent: Wget/1.5.3
User-agent: Wget/1.6
User-agent: WWW-Collector-E
User-agent: Xenu's
User-agent: Xenu's Link Sleuth 1.1c
User-agent: Zeus
User-agent: Zeus 32297 Webster Pro V2.9 Win32
User-agent: Zeus Link Scout
Disallow: /
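If you already have a standard robots.txt in place, the block list above just goes in as its own group alongside your existing rules. A minimal sketch, assuming you otherwise allow everything:

Code:
User-agent: *
Disallow:

User-agent: AhrefsBot
User-agent: SemrushBot
User-agent: MJ12bot
Disallow: /

Crawlers pick the most specific group that matches their user-agent, so the catch-all stanza doesn't cancel out the blocks - the blocked bots still get their own Disallow: /.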
Note: Some people have a notion that this will create footprints, though for whom I'm still unaware. I can never get a straight answer on who is actually checking and comparing these robots.txt files across billions of domains, and to this date I haven't found any software that bothers to do it, so... ¯\_(ツ)_/¯

I'd be more worried about Google being able to read your Gmail when you email back and forth with your PBN providers, or when some customer of your PBN provider uses Gmail and Google can connect the dots. Don't believe me? How else are their A.I. email bots able to understand that this Reddit user's parents died several years ago while he was scrolling through old photos? [Sauce: Google's AI offered this man sympathy for his parent's death]. If you can't put 1 and 1 together... ¯\_(ツ)_/¯

--

.htaccess version (Apache)
Classic Apache version:

Code:
RewriteEngine On
RewriteCond %{REQUEST_URI} !/robots.txt$
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EventMachine.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NerdyBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Typhoeus.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*archive.org_bot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*archive.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*adbeat_bot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*github.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*chroot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Jorgee.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go\ 1.1\ package.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go-http-client.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Copyscape.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*semrushbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*semrushbot-sa.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JamesBOT.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SEOkicks-Robot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LinkpadBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*getty.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*picscout.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*AppEngine.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Zend_Http_Client.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BlackWidow.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*openlink.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*spbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Nutch.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Jetbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebVac.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Stanford.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*scooter.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*naver.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*dumbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Hatena\ Antenna.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*grub.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*looksmart.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebZip.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*larbin.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*b2w/0.1.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Copernic.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*psbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Python-urllib.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NetMechanic.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*URL_Spider_Pro.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*CherryPicker.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ExtractorPro.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*CopyRightCheck.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Crescent.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*CCBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SiteSnagger.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ProWebWalker.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*CheeseBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LNSpiderguy.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EmailCollector.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EmailSiphon.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebBandit.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EmailWolf.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ia_archiver.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Alexibot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Teleport.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*MIIxpc.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Telesoft.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Website\ Quester.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*moget.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebStripper.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebSauger.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebCopier.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NetAnts.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mister\ PiX.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebAuto.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*TheNomad.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WWW-Collector-E.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*RMA.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*libWeb/clsHTTP.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*asterias.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*httplib.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*turingos.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*spanner.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Harvest.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*InfoNaviRobot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Bullseye.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebBandit.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NICErsPRO.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Microsoft\ URL\ Control.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*DittoSpyder.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Foobot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebmasterWorldForumBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SpankBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BotALot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*lwp-trivial.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebmasterWorld.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BunnySlippers.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*URLy\ Warning.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LinkWalker.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*cosmos.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*hloader.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*humanlinks.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LinkextractorPro.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Offline\ Explorer.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mata\ Hari.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LexiBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Web\ Image\ Collector.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*woobot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*The\ Intraformant.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*True_Robot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BlowFish.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SearchEngineWorld.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JennyBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*MIIxpc.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BuiltBotTough.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ProPowerBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BackDoorBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*toCrawl/UrlDispatcher.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebEnhancer.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*suzuran.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebViewer.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*VCI.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Szukacz.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*QueryN.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Openfind.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Openbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Webster.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EroCrawler.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LinkScan.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Keyword.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Kenjin.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Iron33.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Bookmark\ search\ tool.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*GetRight.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*FairAd\ Client.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Gaisbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Aqua_Products.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Radiation\ Retriever\ 1.1.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Flaming\ AttackBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Oracle\ Ultra\ Search.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*MSIECrawler.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*PerMan.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*searchpreview.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*sootle.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Enterprise_Search.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Bot\ mailto:craftbot@yahoo.com.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ChinaClaw.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Custo.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*DISCo.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Download\ Demon.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*eCatch.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EirGrabber.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EmailSiphon.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EmailWolf.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Express\ WebPictures.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ExtractorPro.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*EyeNetIE.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*FlashGet.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*GetRight.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*GetWeb!.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go!Zilla.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Go-Ahead-Got-It.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*GrabNet.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Grafula.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HMView.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*HTTrack.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Image\ Stripper.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Image\ Sucker.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Indy\ Library.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*InterGET.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Internet\ Ninja.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JetCar.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*JOC\ Web\ Spider.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*larbin.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*LeechFTP.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mass\ Downloader.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*MIDown\ tool.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Mister\ PiX.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Navroad.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NearSite.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NetAnts.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NetSpider.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Net\ Vampire.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*NetZIP.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Octopus.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Offline\ Explorer.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Offline\ Navigator.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*PageGrabber.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Papa\ Foto.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*pavuk.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*pcBrowser.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*RealDownload.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*ReGet.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SiteSnagger.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SmartDownload.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SuperBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SuperHTTP.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Surfbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*tAkeOut.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Teleport\ Pro.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*VoidEYE.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Web\ Image\ Collector.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Web\ Sucker.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebAuto.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebCopier.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebFetch.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebGo\ IS.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebLeacher.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebReaper.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebSauger.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*wesee.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Website\ eXtractor.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Website\ Quester.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebStripper.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebWhacker.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebZIP.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Wget.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Widow.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WWWOFFLE.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Xaldon\ WebSpider.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Zeus.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Semrush.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BecomeBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Screaming.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Screaming\ FrogSEO.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SEO.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*AhrefsBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*MJ12bot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*rogerbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*exabot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Xenu.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*dotbot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*gigabot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Twengabot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*htmlparser.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*libwww.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Python.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*perl.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*urllib.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*scan.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Curl.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*email.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*PycURL.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*Pyth.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*PyQ.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebCollector.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*WebCopy.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*webcraw.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SurveyBot.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*SurveyBot_IgnoreIP.*$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*BlekkoBot.*$ [NC]
RewriteRule .* http://www.google.com/ [R=302,L]
Note: My code redirects to google.com just so the bot gets a proper HTML page back; what the bad bot does from there ¯\_(ツ)_/¯ LOL. In very old versions I had it redirecting to webmd.com (so anything ending up at webmd.com originated from me), but that's because I was in the health industry at the time.
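If you'd rather hand the bad bots an error instead of bouncing them to Google (just a sketch of the alternative, not what I run), swap the final rule for a plain 403:

Code:
RewriteRule .* - [F,L]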

--

nginx.conf version (NGINX)

Since NGINX is taking over, it's time to upgrade the bad-bot file and convert it to NGINX. I go into detail on how to do this in the Mastering NGINX guide. Here is the comprehensive version I talk about there:

Inside the http block (nginx.conf):
Code:
http {
map $http_user_agent $limit_bots {
     default 0;
     ~*(adbeat_bot|ahrefsbot|alexibot|appengine|aqua_products|archive.org_bot|archive|asterias|attackbot|b2w|backdoorbot|becomebot|blackwidow|blekkobot) 1;
     ~*(blowfish|botalot|builtbottough|bullseye|bunnyslippers|ccbot|cheesebot|cherrypicker|chinaclaw|chroot|clshttp|collector) 1;
     ~*(control|copernic|copyrightcheck|copyscape|cosmos|craftbot|crescent|curl|custo|demon) 1;
     ~*(disco|dittospyder|dotbot|download|downloader|dumbot|ecatch|eirgrabber|email|emailcollector) 1;
     ~*(emailsiphon|emailwolf|enterprise_search|erocrawler|eventmachine|exabot|express|extractor|extractorpro|eyenetie) 1;
     ~*(fairad|flaming|flashget|foobot|foto|gaisbot|getright|getty|getweb!|gigabot) 1;
     ~*(github|go!zilla|go-ahead-got-it|go-http-client|grabnet|grafula|grub|hari|harvest|hatena|antenna|hloader) 1;
     ~*(hmview|htmlparser|httplib|httrack|humanlinks|ia_archiver|indy|infonavirobot|interget|intraformant) 1;
     ~*(iron33|jamesbot|jennybot|jetbot|jetcar|joc|jorgee|kenjin|keyword|larbin|leechftp) 1;
     ~*(lexibot|library|libweb|libwww|linkextractorpro|linkpadbot|linkscan|linkwalker|lnspiderguy|looksmart) 1;
     ~*(lwp-trivial|mass|mata|midown|miixpc|mister|mj12bot|moget|msiecrawler|naver) 1;
     ~*(navroad|nearsite|nerdybot|netants|netmechanic|netspider|netzip|nicerspro|ninja|nutch) 1;
     ~*(octopus|offline|openbot|openfind|openlink|pagegrabber|papa|pavuk|pcbrowser|perl) 1;
     ~*(perman|picscout|propowerbot|prowebwalker|psbot|pycurl|pyq|pyth|python) 1;
     ~*(python-urllib|queryn|quester|radiation|realdownload|reget|retriever|rma|rogerbot|scan|screaming|frog|seo) 1;
     ~*(scooter|searchengineworld|searchpreview|semrush|semrushbot|semrushbot-sa|seokicks-robot|sitesnagger|smartdownload|sootle) 1;
     ~*(spankbot|spanner|spbot|spider|stanford|stripper|sucker|superbot|superhttp|surfbot|surveybot) 1;
     ~*(suzuran|szukacz|takeout|teleport|telesoft|thenomad|tocrawl|tool|true_robot|turingos) 1;
     ~*(twengabot|typhoeus|url_spider_pro|urldispatcher|urllib|urly|vampire|vci|voideye|warning) 1;
     ~*(webauto|webbandit|webcollector|webcopier|webcopy|webcraw|webenhancer|webfetch|webgo|webleacher) 1;
     ~*(webmasterworld|webmasterworldforumbot|webpictures|webreaper|websauger|webspider|webster|webstripper|webvac|webviewer) 1;
     ~*(webwhacker|webzip|wesee|wget|widow|woobot|www-collector-e|wwwoffle|xenu) 1;
}
}
Inside each server block (within your example.com file inside the sites-available folder):
Code:
server {
location / {
#blocks blank user_agents
if ($http_user_agent = "") { return  301 $scheme://www.google.com/; }

  if ($limit_bots = 1) {
  return  301 $scheme://www.google.com/;
  }
}
}
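Before reloading, it's worth confirming the config parses, since the map block is only valid in the http context and a misplaced brace will stop NGINX from coming back up. A quick sketch (the exact service command depends on your distro):

Code:
sudo nginx -t                 # syntax check only, nothing restarts
sudo service nginx reload     # or: sudo systemctl reload nginx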
--

I'll continue updating this post with the latest whenever I find new stuff to block.
 

patrick

much appreciated! do you prefer the robots.txt version or the server config method?
 

CCarter

much appreciated! do you prefer the robots.txt version or the server config method?
Use both. Sometimes 3rd party tools like Ahrefs use different user-agents (*gasp* - yes they cloak) and if you simply block them in the server configuration they will technically still allow themselves to index your data since you didn't bother blocking them in the robots.txt file (which is the official way). If you block them in the robots.txt file and they STILL index your site there would be mass outrage.

So those people who skip the robots.txt file have found their PBNs exposed because these "SEO gurus" stated it's a footprint. Then the PBN owners complain and can't figure out the problem. It was because Ahrefs cloaks their user-agent at times, but they still respect robots.txt as gospel - they sort of have to, or there would be backlash. If you don't do both you'll defeat the whole purpose and soon find all your sites exposed.
 

Michael

I know ccarter doesn't from what I've seen around, but for the people disavowing links and blocking bots... do you just keep an eye on the search console incoming links instead of using ahrefs etc?
 

CCarter

I know ccarter doesn't from what I've seen around, but for the people disavowing links and blocking bots... do you just keep an eye on the search console incoming links instead of using ahrefs etc?
You're confused about the technique. Blocking Ahrefs with these scripts would only block YOUR outbound links. It won't remove you from Ahrefs or the 3rd party tools. You would have to place the blocking scripts on the sites you don't want to show are outbound linking to you (PBNs come to mind, if you place the scripts on PBNs, their outbound links to you will not be reported since Ahrefs cannot crawl their website).

If some spammer links to your website in a negative SEO attempt, you would still see them linking to your website (unless they blocked bad bots on their site as well).
 

Michael

You're confused about the technique. Blocking Ahrefs with these scripts would only block YOUR outbound links. It won't remove you from Ahrefs or the 3rd party tools. You would have to place the blocking scripts on the sites you don't want to show are outbound linking to you (PBNs come to mind, if you place the scripts on PBNs, their outbound links to you will not be reported since Ahrefs cannot crawl their website).

If some spammer links to your website in a negative SEO attempt, you would still see them linking to your website (unless they blocked bad bots on their site as well).
Ah right, I was thinking the other way around. Cheers
 

contract

Wanted to add to this thread: I've noticed a MASSIVE uptick in AWS (Amazon Web Services) bot spam lately. Apparently it goes back a few years:

http://www.seo-theory.com/2012/03/14/amazon-web-services-is-slowly-crushing-the-independent-web/

http://serverfault.com/questions/653987/how-do-i-block-incoming-traffic-from-amazon-aws-ips

https://news.ycombinator.com/item?id=2712423

It makes all the other bots in my logs look like peanuts. It's something I completely missed. Even the server admin at my hosting company missed it.

Each IP is using anywhere from ~1-3 GB per DAY, over lots of IPs. Over a month they chewed through ~4 TB of bandwidth, all of it wasted. And they seem to be doubling that number every month. :surprised:

I'm still in the process of dealing with it, but had to share for now...
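For anyone hitting the same thing, one blunt option is denying the AWS ranges outright in NGINX (these lines go inside your http or server block). Just a sketch - the CIDR below is a placeholder, pull the real EC2 ranges from Amazon's published ip-ranges.json file:

Code:
# placeholder range only - swap in the EC2 blocks from ip-ranges.json
deny 203.0.113.0/24;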
 

Nat

I have absolutely zero interest in website traffic from India. I'm not going to earn any money from it and there is no way they are going to engage/share/etc. Recently I've seen some weird stuff in Analytics from India/Australia/Middle East. Are there any potential downsides to blocking entire countries, and is there a "best way" to do this?
 
Joined
Nov 5, 2015
Messages
20
Likes
20
Degree
0
@doublethinker thanks, that's a really handy tool to keep bookmarked. I've got a client that wants out of these auditing bots and price scrapers. Do they remove you when they try to crawl you again and get blocked, or do they keep showing the data they've already got?
 
Joined
Sep 3, 2015
Messages
305
Likes
122
Degree
1
No matter what I do I can't get nginx to restart correctly after inserting the second part of the code

Code:
server {
location / {
#blocks blank user_agents
if ($http_user_agent = "") { return  301 $scheme://www.google.com/; }

  if ($limit_bots = 1) {
  return  301 $scheme://www.google.com/;
  }
}
}
@CCarter although server { is included in the code above, you don't mean to paste that line and the final } within each server block, do you? i.e. creating a new server within each server block?
 

CCarter

creating a new server within each server block?
No, it's supposed to be in the main server block, so take out the pieces from the "location" line to the closing bracket of location (the 2nd-to-last bracket) and put that into your server block.
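So the end result looks something like this (a stripped-down sketch; your real server block keeps its own listen, root, SSL, etc. lines):

Code:
server {
    server_name example.com;
    # ... your existing listen / root / ssl lines ...

    location / {
        # blocks blank user_agents
        if ($http_user_agent = "") { return 301 $scheme://www.google.com/; }

        if ($limit_bots = 1) { return 301 $scheme://www.google.com/; }

        # ... your existing try_files / proxy_pass / etc. ...
    }
}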
 
No, it's supposed to be in the main server block, so take out the pieces from the "location" line to the closing bracket of location (the 2nd-to-last bracket) and put that into your server block.
Thanks, I was doing that already. Still no go. Weird. Robots.txt version it is

_____

lol, another question :smile:

Do the robots.txt rules above go before or after my existing:

Code:
User-agent: *
Disallow:
Is nothing simple these days!?
 

Frequencies

If you use an auto-renewing Let's Encrypt SSL certificate, make sure the curl user agent is not blocked.

If it is blocked, the renewal process can't check whether the certificate should be renewed, and you end up with an expired SSL certificate.
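One way to keep renewals safe no matter which user agents you block (a sketch for the Apache rules above, assuming the standard webroot challenge path) is to exempt the ACME challenge directory before the user-agent conditions:

Code:
RewriteEngine On
RewriteCond %{REQUEST_URI} !/robots.txt$
RewriteCond %{REQUEST_URI} !^/\.well-known/acme-challenge/
# ... the user-agent conditions and final RewriteRule follow exactly as posted earlier ...

That way validation requests to /.well-known/acme-challenge/ always get through, whatever agent they send.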