Indexing Issues - Spam Pages & Image URLs Combined

I got hit by Google's Helpful Content Update (HCU) and I've been trying to diagnose issues on my site.

Digging into Search Console I've noticed a couple of indexing issues.

Chinese/Japanese Internal Spam

My site has been hit with the internal Asian spam attack, where spammers run long spammy queries through your website's search function.

In my Search Console indexing report, my site has 45K URLs under Excluded by 'noindex' tag, the vast majority of which look like this:

Code:
https://mysite.com/?xe=dendv&s=สล็อตเว็บตรง ฝาก-ถอน true wallet ไม่มีขั้นต่ํา(~PG99.Asia~),สล็อตเว็บตรง ฝาก-ถอน true wallet ไม่มีขั้นต่ํา(~PG99.Asia~),สล็อตเว็บตรง ฝาก-ถอน true wallet ไม่มีขั้นต่ําxe8

About 35K similar URLs also show up in my Crawled - currently not indexed report.

Apparently, this attack is common, and all of the advice I've read says it's no big deal as long as the URLs aren't indexed.

But Google keeps trying to crawl them and a few of them have even made it into the Indexed report.

Has anyone else dealt with this? WTF do you do when Google indexes stuff you tell them not to index?

Weird Image Indexing - Combining Different Images from Different Sites

This one is kind of weird. Google is trying to index image URLs on my website that combine my image URL with two different image URLs from COMPLETELY different sites.

It looks like this:

Code:
https://mysite.com/wp-content/uploads/2019/10/example-of-my-imge.jpgΩΩΩhttps://completelydifferentsite1.com/wp-content/uploads/2019/10/a-totally-different-image-url.jpgΩΩΩhttps://completelydifferentsite2.com/wp-content/uploads/2019/10/yet-another-totally-different-image-url.jpg

I had been using Nitro CDN and the only thing I could think was that they were somehow combining images for their different users. But I don't think that could be it because some of the URLs that were being combined with my images were from Amazon and I doubt they're using Nitro CDN.

Anyone else have this? The easiest way to check is to go to Settings in Search Console, open your crawl stats report, and look under Crawl requests: Image.
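
If you export a list of crawled URLs and want to pull these out, something like the following might help. It's only a rough sketch in Python: the `ΩΩΩ` separator is taken from the example above, while the function names and the assumption that the separator may show up percent-encoded (`%CE%A9`) in an export are mine.

```python
from urllib.parse import unquote

# Separator seen in the example URL in this thread.
SEPARATOR = "ΩΩΩ"


def split_combined(url: str) -> list[str]:
    """Split a combined image URL into its individual URLs.

    The URL is percent-decoded first, on the assumption that an export
    may contain the separator as %CE%A9%CE%A9%CE%A9.
    """
    decoded = unquote(url)
    return [part for part in decoded.split(SEPARATOR) if part]


def find_combined(urls: list[str]) -> list[str]:
    """Return only the URLs that contain the separator (hypothetical helper)."""
    return [u for u in urls if SEPARATOR in unquote(u)]
```

Running `split_combined` on one of the weird URLs gives you the separate image URLs, so you can see which outside sites keep getting glued onto yours.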
 
Has anyone else dealt with this? WTF do you do when Google indexes stuff you tell them not to index?
You could take that URL parameter ?xe= or anything that has a common and identifiable pattern, and make it show a 410 error when anyone tries to load the page. This 410 HTTP response means "this is permanently gone on purpose, it's not just a whoopsy 404 error". That's about as clear as you can get in terms of communicating this to Google.
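
To make the idea concrete, here's a minimal sketch of the matching logic in Python. On a WordPress site you'd more likely do this in the server config or a plugin, so treat it as an illustration only. The `xe` key comes from the example URL in this thread; `SPAM_QUERY_KEYS` and `status_for_url` are names I made up.

```python
from urllib.parse import urlsplit, parse_qs

# Query-string keys that only appear in the spam URLs. "xe" is from the
# example in this thread; add whatever other patterns your own logs show.
SPAM_QUERY_KEYS = {"xe"}


def status_for_url(url: str) -> int:
    """Return 410 (Gone) for URLs carrying a known spam parameter, else 200."""
    query = urlsplit(url).query
    # keep_blank_values=True so that a bare "?xe=" still counts as a match.
    params = parse_qs(query, keep_blank_values=True)
    if any(key in params for key in SPAM_QUERY_KEYS):
        return 410
    return 200
```

A normal search like `https://mysite.com/?s=shoes` still gets a 200 here, since only the spam-specific key triggers the 410.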

How did you tell them not to index it?

Google is trying to index image URLs on my website that combine my image URL with two different image URLs from COMPLETELY different sites.
This could be image scraper sites screwing up with their automated programs, and Google picking it up there, in which case it's out of your hands.

What does "trying to index" mean? Do you mean they crawled it?
 
Hi Joe, I have a similar problem, I was wondering if you found a solution in the end?
 
You could take that URL parameter ?xe= or anything that has a common and identifiable pattern, and make it show a 410 error when anyone tries to load the page. This 410 HTTP response means "this is permanently gone on purpose, it's not just a whoopsy 404 error". That's about as clear as you can get in terms of communicating this to Google.

How did you tell them not to index it?


This could be image scraper sites screwing up with their automated programs, and Google picking it up there, in which case it's out of your hands.

What does "trying to index" mean? Do you mean they crawled it?
This is an excellent idea - gonna steal it myself.
 