Scraping Google's search engine results pages (SERPs)

Hello guys,

How can someone scrape the SERPs for, let's say, 10,000 keywords without getting blocked?

I tried a simple Java program, then emulated a browser using Selenium, and I have now created a spreadsheet which parses the SERPs via a spreadsheet function.

However, Google blocks everything after about 40 requests.

How do you scrape the SERPs from Google?

I appreciate your replies!
 
Not trying to sound obvious here, but you have two options:

either you add proxy rotation, or you have your script wait a set number of seconds between requests.

It's all down to preference, but I prefer to use PhantomJS / CasperJS as my headless browser rather than Selenium.
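To make that concrete, here's a minimal Python sketch combining both ideas: rotate through a proxy list and sleep a random interval between requests. The proxy addresses, user agent, and query parameters are placeholder assumptions, not tested values:

    import random
    import time
    import requests

    # Placeholder proxies -- swap in your own working list.
    PROXIES = [
        "http://user:pass@203.0.113.10:8080",
        "http://user:pass@203.0.113.11:8080",
    ]

    def fetch_serp(keyword):
        proxy = random.choice(PROXIES)  # rotation: new proxy for each request
        resp = requests.get(
            "https://www.google.com/search",
            params={"q": keyword, "num": 100},
            proxies={"http": proxy, "https": proxy},
            headers={"User-Agent": "Mozilla/5.0"},  # look like a real browser
            timeout=15,
        )
        resp.raise_for_status()
        return resp.text

    for kw in ["example keyword 1", "example keyword 2"]:
        html = fetch_serp(kw)
        # ... parse html for rankings here ...
        time.sleep(random.uniform(5, 15))  # random pause between requests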
 
Death by Captcha (http://www.deathbycaptcha.com/user/login) to get around those pesky captchas, plus thousands of IPs or thousands of proxies, or a combination of all three. I know "someone" who has 300K scraper IPs and counting to gather data... you can imagine the cost at $1/month per IP.

Best of luck to you.
 
I know "someone" who has 300K scraper IPs and counting to gain data... you can imagine the cost at $1/month an IP.

If they needed to use that many proxies, they would probably be using home connections as proxies; web server proxies would cost them an awful lot of money.

However, it's probably best not to discuss that here :smile:
 
If they needed to use that many proxies, they would probably be using home connections as proxies; web server proxies would cost them an awful lot of money.

However, it's probably best not to discuss that here :smile:

Those are actual IPs, not proxies.
 
Oh yeah, this is a problem I'm very familiar with. Like the guys before me have already said, you need tons of private proxies, and from the country you need the results from. Most people go with those public proxy scrapers, but you just can't get enough working proxies from one particular country to do any meaningful scraping. We used to have a few hundred private proxies at quite a cheap price from one company, but they got hacked and I have a hard time recommending them because of how they handled the whole event.

I suggest you forget about purchasing IP blocks yourself unless you're really going to make this a long-term investment. Unless you happen to know the right people, getting a large number of IP blocks is pretty hard, because the usage of the IPs needs to be justified, and "scraping Google" doesn't fly very far.

Another option is to go with cloud-based scrapers like AuthorityLabs or AWRcloud, but of course that's not very cost-efficient with a large number of keywords.

I know this is a very annoying problem to solve. Good luck :smile:
 
Don't bother with Tor; it's really rare to find an IP that Google hasn't already blocked. HMA is otherwise good, but those IPs are used by many other people as well, so you'll run into blocked IPs quite often. We used to run with public proxies, but it takes a bit of developer effort to make your scraper smart enough to deal with all of those blocked IPs and proxies, changing Google HTML templates, and so on, on a regular basis, so that you don't have to babysit it all the time.
 
Don't bother with Tor; it's really rare to find an IP that Google hasn't already blocked. HMA is otherwise good, but those IPs are used by many other people as well, so you'll run into blocked IPs quite often. We used to run with public proxies, but it takes a bit of developer effort to make your scraper smart enough to deal with all of those blocked IPs and proxies, changing Google HTML templates, and so on, on a regular basis, so that you don't have to babysit it all the time.

Which other public proxies did you use?
 
@CCarter

The reason for this is that I want to get the SERPs (top 20 pages) for around 10,000 keywords. Then I will calculate certain metrics for a specific market. So basically it's a one-time situation at the moment.
 
Personally, I'd look into outsourcing this, considering it's a one-time thing, or even a "once in a while" thing. The investment required to do this just once would be more than handing it off to someone with the infrastructure already in place.
 
So you want the top 20 pages for each keyword, meaning the top 200 results per term. That's 20 page pulls x 10,000 keywords = 200,000 successful pulls, or actually 2 x 10,000 = 20K pulls if you ask Google for 100 results at a time. What metrics are you looking at? At www.SERPWoo.com we can do numbers like that easily, but we don't pull past page 3; we can probably set up a special one-time scenario... for the right price.
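For anyone checking the arithmetic, those two figures come out like this (assuming Google's default of 10 results per page versus 100 results per request):

    keywords = 10_000
    results_needed = 200                       # top 20 pages x 10 results each

    # One request per page of 10 results:
    print(keywords * results_needed // 10)     # 200,000 pulls

    # Asking for 100 results per request instead:
    print(keywords * results_needed // 100)    # 20,000 pulls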
 
@SmokeTree
Good idea! However, we are currently in the development phase of our model and want to do as much ourselves as possible, because we don't know whether our market research model will change or not...

@CCarter Thx! Will talk with my partner about that!
 
Which other public proxies did you use?
The ones posted on various public websites and forums. Though with that, you have to have a separate process/infrastructure to constantly check whether the proxies are alive, how fast they are, whether they're anonymous, and so on. Not worth it, in my opinion.
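For reference, the kind of liveness check that process needs is roughly this; the test URL, timeout, and worker count are arbitrary assumptions, and a fuller version would also compare the echoed IP against your real one to test anonymity:

    import concurrent.futures
    import requests

    def check_proxy(proxy):
        # Returns (proxy, seconds) if the proxy answers, else None.
        try:
            resp = requests.get(
                "https://httpbin.org/ip",   # any endpoint that echoes your IP
                proxies={"http": proxy, "https": proxy},
                timeout=10,
            )
            resp.raise_for_status()
            return proxy, resp.elapsed.total_seconds()
        except requests.RequestException:
            return None

    candidates = ["http://203.0.113.10:8080", "http://203.0.113.11:3128"]
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
        alive = [r for r in pool.map(check_proxy, candidates) if r]
    print(alive)   # working proxies with their response times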
 
Just use the Pro Rank Tracker API.

Doing this yourself is just a big headache.

You're reinventing the wheel in a niche that's already pretty darn price-competitive. There's no point in being the 10th person to invent the wheel. It's cheaper to just let someone who has solved the technical headaches do it.
 
This also depends on the speed you need.
If you can mix it up KW-wise and insert random pauses... Google normally doesn't give a shit.

::emp::
 
You're reinventing the wheel in a niche that's already pretty darn price-competitive. There's no point in being the 10th person to invent the wheel. It's cheaper to just let someone who has solved the technical headaches do it.
Not really, if you are after LOTS of keywords. All the solutions available are priced for 1,000s of keywords, not 10,000s or 100,000s. With ProRankTracker, for example, you have to pay $499/m for 30,000 keywords. That's $0.017 per keyword.

If you bought IPs at $2 a piece, which is high at this quantity, you could get 850 IPs. Even if you do only one request per IP every 5 minutes, just to make sure you won't get banned, that's 244,800 keywords per day, which works out to about $0.007 per keyword.

Of course, the service offerings include a lot more, like nice UIs, reports, etc., but I'm just saying there's a lot of "innovation" left to be done when it comes to monitoring in bulk :smile:
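A quick sanity check of those numbers, using the $2/IP price and the one-request-per-5-minutes pace assumed above:

    ips = 850
    cost = ips * 2                           # $1,700 for the batch
    requests_per_ip_per_day = 24 * 60 // 5   # one request every 5 minutes = 288

    keywords_per_day = ips * requests_per_ip_per_day   # 244,800
    print(cost / keywords_per_day)                     # ~$0.007 per keyword

    print(499 / 30_000)   # ProRankTracker: ~$0.017 per keyword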
 
You really don't value your time, do you?

Good luck.
 
Not really, if you are after LOTS of keywords. All the solutions available are priced for 1,000s of keywords, not 10,000s or 100,000s. With ProRankTracker, for example, you have to pay $499/m for 30,000 keywords. That's $0.017 per keyword.

If you bought IPs at $2 a piece, which is high at this quantity, you could get 850 IPs. Even if you do only one request per IP every 5 minutes, just to make sure you won't get banned, that's 244,800 keywords per day, which works out to about $0.007 per keyword.

Of course, the service offerings include a lot more, like nice UIs, reports, etc., but I'm just saying there's a lot of "innovation" left to be done when it comes to monitoring in bulk :smile:

You're wasting a whole lot of time for a one-time pull. Setting up scripts, systems, captcha crackers, and a lot of nonsense to save money on a one-time pull: all that would make sense if you were creating a service or something else you need on a routine basis. But for a one-time job, find someone that has the API, do a deal, get your data, and move on.

Your calculations are erroneous since you didn't factor in the programming time to build all this out; all you calculated was the IP cost of about $0.007 per keyword. Unless you value your time at $0, you need to add the programming time, whether it's you or a developer you're hiring.

And if it's you programming it, then you need to figure out the "lost opportunity cost" (hours spent on this instead of using those hours to generate revenue; yes, that's a real accounting term) involved in spending, for example, 20 hours at $75 an hour on a new system to do all this ($1,500). Even 10 hours of programming at $75 = $750 of lost opportunity, plus your 850 IPs at $2 a piece = $1,700.

So this whole project is going to cost you between $2,450 and $3,200... That's what @secretagentdad means by not valuing your time, because apparently, @Anton @ Uberaff, your time is calculated as free ($0) in your model...

Just contact someone that already has an API built out, pay them $400 or whatever the negotiated price is, and move on... a whole lot cheaper and faster, without having to worry about proxies, IPs, captchas, and bad programming.
 
You're wasting a whole lot of time for a one-time pull. Setting up scripts, systems, captcha crackers, and a lot of nonsense to save money on a one-time pull: all that would make sense if you were creating a service or something else you need on a routine basis. But for a one-time job, find someone that has the API, do a deal, get your data, and move on.

Your calculations are erroneous since you didn't factor in the programming time to build all this out; all you calculated was the IP cost of about $0.007 per keyword. Unless you value your time at $0, you need to add the programming time, whether it's you or a developer you're hiring.

And if it's you programming it, then you need to figure out the "lost opportunity cost" (hours spent on this instead of using those hours to generate revenue; yes, that's a real accounting term) involved in spending, for example, 20 hours at $75 an hour on a new system to do all this ($1,500). Even 10 hours of programming at $75 = $750 of lost opportunity, plus your 850 IPs at $2 a piece = $1,700.

So this whole project is going to cost you between $2,450 and $3,200... That's what @secretagentdad means by not valuing your time, because apparently, @Anton @ Uberaff, your time is calculated as free ($0) in your model...

Just contact someone that already has an API built out, pay them $400 or whatever the negotiated price is, and move on... a whole lot cheaper and faster, without having to worry about proxies, IPs, captchas, and bad programming.
I said pretty much everything you just said earlier in this thread, so yes, I agree, and I don't recommend something like this for a one-time pull. But in the post above I was talking about the general case where you need to scrape something like 100,000 keywords regularly, which is the use case for the Pro Rank Tracker @secretagentdad was recommending.

I tend not to include the value of my time in calculations like the one above, because that's a variable that changes for every person. I trust that whoever reads the post is capable of judging the total cost of such a project for themselves and proceeding accordingly. And honestly, I would rather worry about the cost of hunting down 850 private proxies from various IP blocks than about development costs.
 
This thread is ridiculous and makes me angry.
Please stop tagging me in it.
P.S. Go give CCarter monies and get it over with. Stop trying to napkin-math your way through a rather mature industry.
 
This thread is ridiculous and makes me angry.
Please stop tagging me in it.
P.S. Go give CCarter monies and get it over with. Stop trying to napkin-math your way through a rather mature industry.
Wish granted, didn't tag you.

Not sure what part of this thread is ridiculous, because we all seem to agree that building your own tool is a waste of time if you have access to an existing API that is affordable and can handle your volume. By all means, go and give monies to SERPWoo; based on what I've seen, it looks like a totally awesome tool, and there are certainly some features I wish our in-house tools had.
 