Wasted Resources: What Crawling Means for Us Now & in the Future

GarrettGraff

As I'm sure some of you noticed last month, Yoast put out some research that was quite interesting: they waste a TON of server resources (i.e., money) by allowing bots such as Googlebot, Ahrefs, Majestic, Bing, etc. to crawl their sites.

In more recent news, WPX has said they are (as I read it) going to stop allowing bots other than Googlebot to crawl sites hosted by them: https://wpxhosting.com/knowledgebas...ing-block-bots-from-ahrefs-majestic-moz-etc-/

For hosting companies, blocking these crawlers could mean huge savings. I see a couple of options:

1. They raise hosting prices to offset the cost of crawlers eating resources (i.e., electricity).
2. They leave hosting/server pricing where it is and impose tariffs (lol) on these tools in exchange for letting them crawl.


Thoughts?
 
Fortunately, WPX has already backpedaled due to customers being upset.

The choice to block crawlers should be ours and not the hosting companies.

I'm sure there are plenty of other ways hosting companies waste resources that they could fix.

The last thing they should do is raise prices. Servers are cheap and they're making a killing marking them up.
 
A $5 plan from Linode comes with 1 TB of data transfer a month. Bandwidth is pretty cheap from the consumer side.

From WPX Hosting's side, yeah, it makes sense to look into this and make the decision for their customers, because realistically anyone who needs WordPress-specific hosting isn't trying to do a lot of decision making and wants to pay other people to do the thinking for them.

We actually encourage this and keep an up-to-date bad-robot blocking thread here that has a robots.txt version as well as .htaccess (Apache) and nginx conf versions to help members block a lot of these link tools; but the reasoning is more about putting obstacles in front of our competitors than saving bandwidth.
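
For anyone who wants the gist without digging up that thread, here's a minimal sketch of what the three versions typically look like. These are not the thread's actual configs, just illustrative snippets assuming the publicly documented user-agent strings (AhrefsBot, MJ12bot for Majestic, SemrushBot, DotBot and rogerbot for Moz); real block lists run much longer and need regular updating.

robots.txt (advisory only - well-behaved bots honor it, nothing forces them to):

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: SemrushBot
Disallow: /

.htaccess (Apache with mod_rewrite) - returns a hard 403 whether or not the bot reads robots.txt:

RewriteEngine On
# Case-insensitive match on known link-tool user agents
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot|MJ12bot|SemrushBot|DotBot|rogerbot) [NC]
RewriteRule .* - [F,L]

nginx (inside the server block):

# Return 403 to the same link-tool user agents
if ($http_user_agent ~* (AhrefsBot|MJ12bot|SemrushBot|DotBot|rogerbot)) {
    return 403;
}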

I encourage everyone to block these research tools' bots, but realistically there is very little to stop them from cloaking their user-agent if they really want to. The worst that will happen is a segment of users will be outraged for a while, but they'll eventually forget or give up the fight. I doubt any customers of these tools are going to drop their subscriptions because the tools get a bit "unethical".
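
Just to make the cloaking point concrete: the User-Agent header is entirely client-controlled, so the blocks above only stop bots that announce themselves honestly. A hypothetical one-liner (not something any specific tool is confirmed to do) shows how trivial it is to slip past a UA filter:

curl -A "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" https://example.com/

The server sees what looks like an ordinary browser, so UA-based rules in robots.txt, .htaccess, or nginx never match; only IP-range blocking or behavioral detection would catch it.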

I've also got data showing that Ahrefs has in the past... "done things" that might be frowned upon - they seem to have stopped, though.
 
Wouldn't be the first time Ahrefs did something to serve their own agenda.

I guess I'm arrogant - we've always blocked bots in the past for clients, but I have a "catch me if you can" mentality on my own work.

Regardless of what we're paying for hosting (and yes, you're spot on about WPX's model and people paying for WP-"specific" hosting), blocking these bots translates into cost savings. This also means huge sites running on true cloud hosting may consider blocking as well, because metered cloud bandwidth is $$$ compared to traditional hosting.

My thoughts, anyways!
 
From WPX Hosting's side, yeah, it makes sense to look into this and make the decision for their customers, because realistically anyone who needs WordPress-specific hosting isn't trying to do a lot of decision making and wants to pay other people to do the thinking for them.

WPEngine is the perfect example of this. They even prohibit the use of select plugins and implement certain "enhancements" server-side (or at least they claim to), and people pay a premium for WPEngine compared to many other great hosting companies.
 
For a complete newb, WPEngine is pretty great. I had a client I did a speed optimization job for recently that was on it. They do things for you, hands-free, that newbies would never be able to handle or even know exist. And the staging server setup they have is god-tier.

Still, I don't think anyone who needs that kind of service is going to care or even know that other bots exist. They're likely not worried about reverse-engineering link profiles or hiding their PBNs.

I think I agree with the sentiment that we should block bots, not the hosts.

The idea of hiding information from Google is futile. Their surveillance is ubiquitous. This is why 99% of their offerings are free. They don't need to get you in their net when they have all of your users already.

As far as restricting link-database crawlers, sure, I could see that, but at the same time I'd leave certain ones alone because I want access to the information about my own site. To get good information you have to let the best crawlers through, and the best ones are the ones your competitors are using to look at you too. It's a pickle, but not one worth worrying about unless you're working with PBNs or doing massive guest posting campaigns, I guess.

But in terms of bandwidth, you should be getting enough real visitors that all non-Google bots are a drop in the bucket, not to mention bandwidth is so cheap it's not worth the hassle.
 