Massive influx of links

So today I found I had a bunch of odd URLs in GSC showing as linking to my site, but most of the pages were blank when visited in a normal browser.

Neat trick to test them: change the user agent to Googlebot and you might be able to see the real page with complete source code.

Chrome Dev Tools > Network Conditions > User Agent
Select "Custom"
Add this as the user agent:
Code:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36


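If you'd rather script this check than flip the user agent in DevTools every time, here's a rough sketch of the same idea in Python (assuming the requests library; the URL is a placeholder, and keep in mind some cloakers check Googlebot's IP range rather than just the user agent):
Code:
# Compare what a page serves to a normal browser vs. what it serves to
# "Googlebot". Cloaked spam pages often return a near-empty page to browsers
# and only show the real content to a Googlebot user agent. (Some cloakers
# also verify Googlebot's IP range, so this won't catch everything.)
# The URL is a placeholder; requires the requests library.
import requests

URL = "https://example.com/suspicious-page/"

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
GOOGLEBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36"
)

for label, ua in [("browser", BROWSER_UA), ("googlebot", GOOGLEBOT_UA)]:
    resp = requests.get(URL, headers={"User-Agent": ua}, timeout=15)
    print(f"{label}: status={resp.status_code}, bytes={len(resp.content)}")

# A large difference in response size between the two fetches is a strong hint
# the page is cloaked for Googlebot.
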
At first I was thinking "why would anyone put in the effort to create these pages, there's nothing there," but once the real page was exposed I could see someone was targeting all of my top pages. Cool trick to hide it from everyone except Google.

The other fun thing about seeing the source code is that I could find the publish dates of those pages. Guess what I noticed when checking organic traffic to the pages linked from the shit sites? Hm... Trying not to jump to conclusions about what caused the drops in traffic, but it's kind of hard to look at anything else.

These pages popped up targeting only my important pages (which were high ranking), with no real competitors linked on the same pages. Core updates happen, pages continue to drop. We make content changes. Another core update, and page traffic continues to drop.

An updated disavow (maybe 40% complete?) is on the way. Adding time to my calendar to keep updating. We'll see what happens.
 
Considering that it's not relevant to your niche, my checklist would be:
  • Is the poker site ranking well in Google? If it's dead, probably disavow
  • Are the anchors exact match, strong partial match, or targeting images on your site? If so, probably disavow
  • Is the DTOX Score (if you have LRT) >500? If so, probably disavow
  • But if the anchors are natural, the site's traffic is steady/growing, and the site's backlink profile seems good? Probably keep
Thanks for the checklist - I followed your exact steps and decided it wasn't worth keeping the links.

The pages were dead, the site was already declining, and it wasn't getting much traffic. I'd already cancelled LRT so I couldn't check the DTOX score. But with the other details, I was happy to cull the links.
 
Received some sage advice from @Darth that I've added to the clean-up and filtering process I outlined earlier in the thread. Figured I'd share it here in case anyone is lurking and copying the process... if so, you do not want to skip this step.

Basically, since we're dealing with such large sets of data and it can be easy for desirable domains/URLs to slip through the cracks, pull the quality/high-authority sites that you want to keep from SEO tools (SEMrush & Ahrefs) and use those as the foundation for a "clean domain" list. I also added a list of all paid backlinks, as well as the earned backlinks I knew I needed to keep.

I'm using Sheets to manage all of this, so I simply added conditional formatting rules to highlight any duplicates in my disavow list. Any time one of the clean domains was accidentally added to the list of domains I was going to disavow, I was alerted and could go make the necessary changes.

I used a similar approach for the URL list, though I first had to pull the root domain for comparison against my clean list. This was pretty quick since I just used a series of REGEXEXTRACT formulas to pull the roots, filtered, removed any duplicates, and compared the resulting list to the clean domains. If any of my clean domains appeared, I manually went through and removed the related URLs.
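For anyone who'd rather do this comparison outside of Sheets, here's a rough Python equivalent of the same cross-check (the file names are placeholders and the root-domain logic is simplified, not the actual setup described above):
Code:
# Pull the root domain from each URL in the disavow-candidate list and flag
# anything that also appears in the "clean domain" list. File names are
# placeholders, and the root-domain logic is simplified (it won't split
# multi-part TLDs like .co.uk).
import re

def root_domain(url: str) -> str:
    host = re.sub(r"^https?://", "", url.strip().lower()).split("/")[0]
    return host[4:] if host.startswith("www.") else host

with open("clean_domains.txt") as f:
    clean = {line.strip().lower() for line in f if line.strip()}

with open("disavow_candidates.txt") as f:
    candidates = [line.strip() for line in f if line.strip()]

for url in candidates:
    if root_domain(url) in clean:
        print(f"KEEP - clean domain slipped into the disavow list: {url}")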

This ended up being an EXTREMELY important step, as I would have accidentally disavowed ~7k in high DR paid backlinks from the last year (facepalm), along with a few high DR earned backlinks that I obviously wanted to keep. Maybe I'm the only person who would have made these mistakes... maybe not.

If you're following this process and there's any chance you might miss some high DR domains, I highly recommend following Darth's advice and creating a "clean domain" list to save yourself headaches later. But don't forget to remove the clean domains from the file before disavowing.

I'm now in the waiting game... going to keep pushing through on internal anchors in the meantime.
 
Do any of you more technical guys have experience with blocking hotlinking?

I'm running a combination of PNG, JPEG, and WebP images on my site, and I have a massive number of spam sites hotlinking my images. I can block hotlinking of the PNG and JPEG images through my CDN, but not the WebP ones, so I'm considering blocking through .htaccess.

If I take the .htaccess approach, I'm worried about unintentionally blocking images for legitimate referrers. I'm guessing I'll need a whitelist of domains for this purpose, e.g. Google, Bing, Yahoo, Facebook, X, etc.

If this .htaccess approach makes sense, does anyone have a suggested whitelist or know of any resources that provide a whitelist of domains for this purpose?
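In the meantime, a rough way to sanity-check whichever rules end up in place is to request the same image with different Referer headers and see which ones get blocked. A sketch in Python (assuming the requests library; the image URL and referrers below are just placeholders):
Code:
# Sanity check for hotlink-protection rules: request the same image with
# different Referer headers and see which ones get blocked. The image URL and
# referrers below are placeholders; requires the requests library.
import requests

IMAGE_URL = "https://example.com/images/sample.webp"

referrers = {
    "no referer":       None,
    "own site":         "https://example.com/some-page/",
    "google images":    "https://www.google.com/",
    "random spam site": "https://spam-site.example/",
}

for label, ref in referrers.items():
    headers = {"User-Agent": "Mozilla/5.0"}
    if ref:
        headers["Referer"] = ref
    resp = requests.get(IMAGE_URL, headers=headers, timeout=15)
    print(f"{label}: status={resp.status_code}, bytes={len(resp.content)}")

# Ideally "own site", search engines, and empty referers come back 200 while
# unknown referrers get a 403 (or a tiny placeholder image).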
 
You're using screaming frog to track the number of duplicate internal anchor texts linked to a post or just sheets?
 
You're using screaming frog to track the number of duplicate internal anchor texts linked to a post or just sheets?
Not sure who this was directed towards?

But on my end, I'm not using SF yet...

I know for a fact I used exact match anchors throughout the entire site, so there's no need to crawl; I'm going through each article one by one. But yeah, if I were only worried about duplicates and a few exact matches, I would either use SF or other plugins if the site is WP.

I use sheets as my off-site CMS so I added a few columns that allow me to track the anchors being used. This also allows me to highlight any time I use a duplicate anchor across the entire website.
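If you're not living in Sheets, the same duplicate check is easy to script. A rough sketch (the CSV export and column name are placeholders):
Code:
# Rough equivalent of the duplicate-anchor highlighting, outside of Sheets.
# Assumes the anchor column has been exported to a CSV with a header named
# "anchor" (both the file name and column name are placeholders).
import csv
from collections import Counter

with open("internal_anchors.csv", newline="") as f:
    anchors = [row["anchor"].strip().lower() for row in csv.DictReader(f)]

for anchor, n in Counter(anchors).most_common():
    if n > 1:
        print(f"{n}x  {anchor}")  # anchors reused across the site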

Also, I have a ton of semi-programmatic content (I say semi because there is still quite a bit of variation), but the anchors used throughout this content are basically identical. That means I have hundreds of matching exact match anchors that I need to change, and change with sufficient variation. To do this, I'm asking ChatGPT to spit out large sets of synonyms and alternative phrases that I can mix and match accordingly. Without it, I'd be sitting around for hours trying to come up with new anchors. I'm sure this isn't best practice, but it's what I'm using to try and scale this project as fast as possible.

Last point... if Power Thesaurus doesn't go out of business by the end of 2025, I'll be shocked. ChatGPT is so much easier, faster, and a better experience overall. Great tool for this purpose if you're not already using it.
 
Colinkri isn't an indexer. It's a crawler. It's the only thing I'll use on my own URLs/good links.

Everything else is a fast track to getting punted out of the index.

Spamming the spam still works the best if you can provide the necessary volume.
 
Not sure who this was directed towards?

To the thread in general. I was wondering how people keep track of their internal anchors.

Also, I have a ton of semi-programmatic content (I say semi because there is still quite a bit of variation), but the anchors used throughout this content are basically identical. That means I have hundreds of matching exact match anchors that I need to change, and change with sufficient variation. To do this, I'm asking ChatGPT to spit out large sets of synonyms and alternative phrases that I can mix and match accordingly. Without it, I'd be sitting around for hours trying to come up with new anchors. I'm sure this isn't best practice, but it's what I'm using to try and scale this project as fast as possible.

Cool AI usage. I'll try it out. I can put the list in a table, cross out the ones already added, and note the ones to use next. Yup, I probably ran into something similar with this plugin. I ditched it just in case too many duplicates look like anchor spamming to a bot, or would in the future. Then even more so after that influx thread. Ryu does internal links quickly and manually, so I thought I'd try that approach along with socials. Social... is another thing (cries a river).

Good going. Thanks, Smith!
 
Jesus Christ - can't stress enough you gotta use ALL link sources to find as many potential links as possible.

My first round with LRT earlier this year netted 15k links
My second look with SEM rush told me about 35k links

Just finished a new disavow and used the following:
- LRT
- SEMRush
- Ahrefs
- Moz
- Majestic
- Zizta (OpenLinkProfiler)

Ended up with over 100k unique garbage links (of course, that includes the Majestic Historic index stuff just for good measure)

But then Google Disavow only allows 100k lines and 2 MB files MAX, so I spent a whole bunch more time truncating the file: roughly 30% of it stayed as URL-level entries and the rest went to domain-level disavows.
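For reference, the truncation logic is simple enough to script. A rough sketch (not my exact process; the file names and the per-host threshold are placeholders):
Code:
# When the combined list blows past Google's limits (100,000 lines / 2 MB),
# collapse hosts with lots of junk URLs into "domain:" entries and keep the
# rest at URL level. File names and the 10-URL threshold are placeholders.
# Assumes full URLs (with scheme) as exported from the link tools.
from collections import defaultdict
from urllib.parse import urlparse

by_host = defaultdict(list)
with open("all_garbage_urls.txt") as f:
    for line in f:
        url = line.strip()
        if url:
            by_host[urlparse(url).netloc.lower()].append(url)

lines = []
for host, urls in sorted(by_host.items()):
    if len(urls) >= 10:              # lots of junk from one host -> domain-level
        lines.append(f"domain:{host}")
    else:                            # only a few -> keep URL-level entries
        lines.extend(urls)

with open("disavow.txt", "w") as f:
    f.write("\n".join(lines) + "\n")

print(f"{len(lines)} lines written")  # needs to stay under 100,000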

Anyways, use all the sources just so you find as many URLs as possible.
 
I put the links in the disavow file. It wasn't too difficult because there were only 1,000 spam links, which is still a lot relatively speaking for this site/niche, and they all had commonalities in their URLs and linked to images.

Something I noticed was that a lot of these links seem to be on hacked pages. Many of the hacked pages, if not most, have been cleaned or suspended.

That does make it more serious, because hacked pages are, for obvious reasons, considered the worst of webspam. It's also illegal. In terms of bad neighborhood links, it doesn't get worse than a bunch of links from hacked blogs.

I still don't see the business model in this. I figure they scrape affiliate and ad results, then hack pages and repost, injecting their own ad code or their own affiliate links. Then eventually someone sees a result in Google and clicks it. It can happen, particularly for product specific keywords.

It doesn't seem like a great business model, but if you're doing it at the absolutely astonishing scale that seems to be happening, it might work.

They must have millions and millions of hacked pages in their network, given that literally all of us are seeing this.

It's similar to the massive bot networks on Twitter/X, where literally everyone has at some point been liked by romance bots.

Someone, maybe @Smith, talked about how this might be a foreign agent, and I could agree.

It very well could be Chinese or Russian hacker groups that are allowed, and maybe even funded, by their governments to attack Western companies in whatever way they can. Like a modern Letter of Marque (for you Pirates players).

It would make sense.
 
@Grind - I know you have a big backlog of people waiting for you to offer this as a service. Can you give a number or percentage for successful recoveries vs. no change or continued bleed after you apply your disavow techniques?

I see the winners, but I'm curious how many it made no difference for.

Cheers
 
@MrMedia For the ones I've done or coached personally, it's a 100% success rate. Among responses from the email list, from guys I know are actually doing the work, also a 100% success rate.

I'm defining success as a clear turnaround in the traffic decline: tens of thousands in regained traffic in either GSC or Ahrefs (when that's all I have access to) and massive ranking increases across the board.

Rando bloggers who got access to the videos and did it on their own, mixed bag. I suspect there's a baseline level of knowledge regarding link value that's necessary to produce results like I'm sharing.

I also think that the fewer quality links you have, the slower and lower the recovery will be. It's all math.

If you don't have the power to pull you back up to the top quickly after shedding the shite, you won't recover as quickly.

I still don't see the business model in this. I figure they scrape affiliate and ad results, then hack pages and repost, injecting their own ad code or their own affiliate links. Then eventually someone sees a result in Google and clicks it. It can happen, particularly for product specific keywords.

There are dozens of other already-asked-and-answered questions in this thread, but this one is worth revisiting.

It's massive foreign doorway indexing networks covering thousands of domains and hundreds of thousands of subdomains, populating new pages automagically every time the crawler hits a page.

They trap Googlebot in the network and everything they link to ranks really fast, for millions of keywords, for 24-96 hours. Then they do it again. Isn't AI wonderful?

Our sites are just the fodder for their system. Blowback if you will. Probably unintentional but I'm sure they're having a laugh about crushing the internet and all the 'experts' chasing their tails about why it's happening.

Semi-related, this is fucking criminal imo.

 
Jesus Christ - can't stress enough you gotta use ALL link sources to find as many potential links as possible.

My first round with LRT earlier this year netted 15k links
My second look with SEM rush told me about 35k links

Just finished a new disavow and used the following:
- LRT
- SEMRush
- Ahrefs
- Moz
- Majestic
- Zizta (OpenLinkProfiler)

Ended up with over 100k unique garbage links (of course, that includes the Majestic Historic index stuff just for good measure)

But then Google Disavow only allows 100k lines and 2 MB files MAX, so I spent a whole bunch more time truncating the file: roughly 30% of it stayed as URL-level entries and the rest went to domain-level disavows.

Anyways, use all the sources just so you find as many URLs as possible.
I thought about that and decided to switch to domain level disavow only.

If that site sent me a couple of shit links, what's stopping them from sending me more in the future? Therefore, domain-level made more sense to me, at least future-proofing-wise.
 
I'm not worried about determining whether or not a link is toxic. If a DR30 mom blog looks OK but only gets 100 monthly search visits, I'm tossing it in the disavow list. My theory -

If it's not toxic now, it certainly will be after I build all these spam links to it to get it recrawled.
 
I'm not worried about determining whether or not a link is toxic. If a DR30 mom blog looks OK but only gets 100 monthly search visits, I'm tossing it in the disavow list. My theory -

If it's not toxic now, it certainly will be after I build all these spam links to it to get it recrawled.
You're going to disavow a link, because it's going to become toxic, because you're going to build spam links to it and make it toxic...

 

Trolling aside, and just to be clear, you don't need to re-crawl links that you aren't disavowing.

If you know about the link from Ahrefs etc, that means it's been crawled. You can test it by searching for site:[link].

The point of sending spam links is to make Google recrawl a page you have disavowed, so Google recognises the disavow.

So there's no need to recrawl a link if you aren't disavowing it.

If there's no need to recrawl, then there's no need to send spam links.

And if you aren't sending spam links, then there's no need to disavow.

TLDR - you don't need to send spam links unless you are disavowing something. So you don't need to disavow because you're going to spam it, because you don't need to spam it if you aren't going to disavow.

My brain hurts.
 
Trolling aside, and just to be clear, you don't need to re-crawl links that you aren't disavowing.
Of course. You should only spam links you disavow. To get Google to recrawl them.

My point is that I'm not worried about disavowing links that are questionable. They probably weren't providing much benefit anyway.
 
Grind says a big issue with LRT is that it flags a bunch of solid links as toxic

Another issue I see is that 90% of the red toxic links it finds are long gone—404 errors or domain sale placeholder pages. I don't know if it's even worth disavowing those (I mean, I'll add them to the disavow file cause it only takes a second, but I don't see the point if the link no longer exists).
 
Another issue I see is that 90% of the red toxic links it finds are long gone—404 errors or domain sale placeholder pages. I don't know if it's even worth disavowing those (I mean, I'll add them to the disavow file cause it only takes a second, but I don't see the point if the link no longer exists).

That doesn't mean Google knows it's gone. It could also be cloaked to show only to Googlebot. Check the Google cache.
 
Grind says a big issue with LRT is that it flags a bunch of solid links as toxic
Yeah I don't think I'll use it again.

I've done so many of these now that I can use regex to pull out 95% of the obvious spam-pattern links and then use logic on the remaining 5%.

Much better results doing it with my home-rolled process vs. LRT. Quicker and higher returns.
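For anyone curious, the general shape of that regex pre-filter looks something like this (the patterns below are generic examples of junk-link footprints, not the actual list I use, and the file name is a placeholder):
Code:
# Illustrative version of the "regex first, judgment second" pass. The patterns
# below are generic examples of junk-link footprints, not the actual list used
# above; the file name is a placeholder too.
import re

SPAM_PATTERNS = [
    r"\.(xyz|top|icu|cyou)/",           # throwaway TLDs
    r"/(viagra|casino|porn|escort)",    # classic spam keywords in the path
    r"//[a-z0-9]{20,}\.",               # long random-string subdomains
    r"\?(p|id)=\d{5,}$",                # auto-generated parameterised pages
]
SPAM_RE = re.compile("|".join(SPAM_PATTERNS), re.IGNORECASE)

with open("all_backlink_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

obvious_spam = [u for u in urls if SPAM_RE.search(u)]
needs_review = [u for u in urls if not SPAM_RE.search(u)]

print(f"auto-flagged: {len(obvious_spam)}, left for manual review: {len(needs_review)}")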

Except this last update, which was supposed to only target Site Abuse manually, also has an algorithmic component hammering a bunch of previously hammered and recovered sites.

Le sigh.
 