Google indexing search results on my website

I was messing around in GSC and noticed a warning.

It said "Indexed, though blocked by robots.txt".
That makes sense, because I added Disallow: /?s= and Disallow: /search/ to my robots.txt
to make sure the search results don't get crawled.
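For context, the blocking rules in question look something like this in robots.txt (a sketch of what I added; the user-agent line is assumed):

```
User-agent: *
Disallow: /?s=
Disallow: /search/
```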

GSC was flagging /?s={search_term_string} .
I'm pretty sure the search results should be blocked, right? Otherwise you'll spend your whole crawl budget on them.

Now I'm thinking I need to do something else as well to let Google know to stay away.

Or do you guys just let them crawl the searches?
 
No, you do NOT want to block search results inside of your robots.txt.

I know you're talking about WordPress from your other thread. WordPress automatically adds <meta name="robots" content="noindex, follow" /> to the <head> of its search result pages. That meta tag tells Google not to index those pages.

What's happening is that you've told Google not to crawl the search results at all. Because they're disallowed completely, Googlebot never encounters the meta tag that tells it not to index those pages.

So the conundrum is that Google has discovered those pages but can't view them to know they aren't supposed to be indexed, so it indexes them without any content. You'll see a notice in the SERPs like "No information is available for this page" (due to robots.txt), and the title will end up being whatever anchor text other sites used when linking to your search results.

The end result is index bloat. I've written about this in other threads all around the forum, actually, because I had a technical SEO problem with one of my main websites for years. It took a while to realize it was happening, about as long to figure out what the problem was, and then more time to fix it.

If you don't get this fixed, you can end up with a Panda penalty. I'm 99% sure Google assigns sites a sitewide quality score based on whatever is indexed, and if it's indexing blank pages (blank because it can't see the content due to the disallow), that tanks your quality score and you get hit by Panda.

In your case, all you need to do is remove the search-related Disallow: lines from robots.txt. You want Google to crawl those pages so it sees the "noindex" meta tag. No, it won't eat up your crawl budget: Googlebot will rarely encounter a search page unless some spammer starts linking to search URLs on your site.
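You can sanity-check the effect of those rules with Python's standard-library robots.txt parser. This is just an illustrative sketch (example.com and the exact rules are placeholders, not anything from your site):

```python
from urllib import robotparser

# robots.txt with the search pages disallowed: a compliant crawler
# may not fetch them, so it can never see the noindex meta tag.
blocked = robotparser.RobotFileParser()
blocked.parse([
    "User-agent: *",
    "Disallow: /?s=",
    "Disallow: /search/",
])
print(blocked.can_fetch("*", "https://example.com/?s=widgets"))      # False
print(blocked.can_fetch("*", "https://example.com/search/widgets"))  # False

# robots.txt with the Disallow lines removed: crawling is allowed,
# so Googlebot can load the page and honor the noindex tag.
allowed = robotparser.RobotFileParser()
allowed.parse(["User-agent: *", "Disallow:"])
print(allowed.can_fetch("*", "https://example.com/?s=widgets"))      # True
```

The same logic applies to any other template you noindex: the meta tag only works if the page is crawlable.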
 