Content Pruning - Did You See an Increase in SERP Visibility?

I'm also just now seeing some really solid jumps in the rankings. I have NOT pruned, because I only have 60+ posts anyway, but I've done a lot of E-A-T work, on-site and off-site.
I just saw particular ranking success on "cheap widget", which is a product feed page combining 10+ feeds with 500+ products. I'm the only one doing this. It has only 300-400 words of content. Search intent reigns supreme: "cheap" means the user wants a selection of choices ranked by price. Could Google really be smart enough to recognize a product comparison search engine?
 
This seemed like the best place to post this rather than a new thread.

Big Established Site

I came across a large fashion site that has been around for approximately 10 years, with current Ahrefs monthly traffic of over a million and 5-6k referring domains. It looks like they upped their game in 2019 with increased inbound links and a large amount of content pruning/optimising.

Standard Stuff / On A Large Scale

Articles have been consolidated into better articles, removing the ugly date-based URLs (year/month/date). But due to the nature of the fashion niche, a lot of articles were simply cut, presumably because they were no longer relevant or fashionable. The site has been slimmed down significantly and stands at about 9,000 pages in the index at the moment.

301'd To NoIndex

The interesting thing about what they've done is that everything is 301'd to an inner page named "Archived Articles", with a short paragraph explaining that things change, and that rather than serving up an old article, here's the best of their recent content, etc. The page then links to categories and profitable affiliate articles. Think of it as a custom 404 page for SEO.

But this site has marked the page noindex/follow.
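
If you wanted to replicate the noindex/follow piece on a WordPress site, a minimal sketch might look like this (the 'archived-articles' slug is my made-up placeholder, not their actual page):

    add_action( 'wp_head', function () {
        if ( is_page( 'archived-articles' ) ) { // hypothetical slug of the archive page
            echo '<meta name="robots" content="noindex, follow">' . "\n";
        }
    } );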

Any Thoughts?

The almost 400 linking domains are going to an unrelated page, but the whole domain is pretty niched down. What's the angle with noindex?

The current orthodoxy is that noindex will lead to (an effective) nofollow tag after some time (if you believe this).

I'd be interested to hear anyone's thoughts on this.
 
The current orthodoxy is that noindex will lead to (an effective) nofollow tag after some time (if you believe this).

This was one of those stupid, vague, cryptic statements that another Google employee debunked on the same day, but the debunking never outpaces the #fakenews. John Mueller sometimes plays the same old ego game Matt Cutts used to play: "I know something you don't know, not because I'm smarter, but because I have access. But if I state it like a wise old sage where you have to unravel the riddle, then you'll think I'm smart."

What John Mueller meant by this is that if a page is noindex'd then it won't ever be crawled again, so the links won't ever be followed again.

Gary Illyes came in and rained on his parade by fully explaining it. If a page has internal links leading to it, external links leading to it, menu links leading to paginated links, etc... if it's easily crawlable, it will be crawled. And the links won't ever be nofollow'd. The only way John's scenario would ever play out is if your noindex page was completely orphaned from the rest of the web.

You can definitely have pages be noindex with followed links. That's actually the default status. And if Google didn't include all of this in their link graph, they would have a very inaccurate and broken link graph. And if they didn't crawl noindex pages they wouldn't discover a lot of the web pages out there to begin with.

I like the solution that fashion site went with, though. Like you said, it's like a custom 404 for SEO, distributing the PageRank juice where they want it to go. Clever.
 
@Ryuzaki, I missed the Gary Illyes statement on this, and it was only when I saw that Yoast had changed how they handle archives that I put some faith in it. Yoast...
 
After deleting content and submitting a sitemap containing the deleted URLs, Google Search Console shows them as errors. I created the sitemap because @Ryuzaki suggested it might accelerate the deindexing of the deleted content. Should I just ignore the errors, or is there anything I can do to let Google know this is on purpose?
 
@F5K7, they are technically "errors." Google expected content it knew about and got a 404 error instead. It's an error only because you have them in a sitemap. Once you remove that sitemap it'll move to the gray tab instead of the red tab. This is fine and not really an error, it's just where they place them in the Coverage Report.

After starting this thread I ended up talking about this issue in various places on the forum.
You asked what you can do to make Google know it's on purpose. You can throw a 410 error instead of a 404. You can think of a 404 error as "Missing" and a 410 as "Gone" (as in, yes, there was content here and we purposefully have removed it).

I never did the 410 method simply because I didn't want to deal with writing code for it. But with the 404 you may have to get the pages crawled a couple of times before Google says "okay, we get it, it's gone and we'll deindex it." Most pages will drop out pretty quickly and the final stragglers will take months.
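
For anyone who does want to write that code, a minimal WordPress sketch might look something like this (the paths are hypothetical placeholders and this is an untested outline, not a drop-in solution):

    add_action( 'template_redirect', function () {
        $gone = array( '/old-post-1/', '/old-post-2/' ); // hypothetical list of pruned paths
        $path = wp_parse_url( $_SERVER['REQUEST_URI'], PHP_URL_PATH );
        if ( in_array( $path, $gone, true ) ) {
            status_header( 410 ); // "Gone" rather than the default 404
            nocache_headers();
            exit;
        }
    } );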

There were a few times, since I had so many URLs, that I ended up filtering the temporary sitemap down to what was left so I could get a fresh look in the Coverage Report. This also helped get those URLs crawled again.
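
If you want to script that temporary sitemap instead of maintaining it by hand, a bare-bones PHP sketch could look like this (the URL list and filename are assumptions):

    $deleted = array( // assumed list of deleted URLs
        'https://example.com/old-post-1/',
        'https://example.com/old-post-2/',
    );
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ( $deleted as $url ) {
        $xml .= '  <url><loc>' . htmlspecialchars( $url ) . "</loc></url>\n";
    }
    $xml .= "</urlset>\n";
    file_put_contents( 'deleted-sitemap.xml', $xml );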

_____

UPDATE
I never really updated what happened with this mini-project, but after removing the 147 low-quality posts, fixing a few indexation errors with categories, and then fixing around 700 blank URLs that were being indexed due to robots.txt blocking... I got a full recovery, as far as I can tell so far. I became convinced it was a Panda problem and treated it as such, and it took about 11 months after the fixes were deployed to finally pop back up in the SERPs:

[screenshot: H4F3YU3.png]
 
I became convinced it was a Panda problem and treated it as such, and it took about 11 months after the fixes were deployed to finally pop back up in the SERPs.
I feel like you deserve some kinda congratulatory trophy for defeating a black and white animal very publicly.
 
I never did the 410 method simply because I didn't want to deal with writing code for it. But with the 404 you may have to get the pages crawled a couple of times before Google says "okay, we get it, it's gone and we'll deindex it." Most pages will drop out pretty quickly and the final stragglers will take months.

What do you think about serving a 410 for all 404 pages? That would be pretty easy to do, and I don't see the downside.

Alternatively there is a plugin that lets you serve 410s for posts that are in trash.

What's the easiest way to serve 410s for a list of URLs?
 
@F5K7, that 410s-for-trashed-posts plugin sounds like an easy solution for this job. Once a post is deindexed you can perma-delete it, then ultimately delete the plugin once they're all gone.

Otherwise you'd probably need to use PHP to change the HTTP headers or do it within your .htaccess file. With the huge lists of URLs many of us are working with, that makes it kind of prohibitive, or at least a time-waster, to do it this way. I like that "if post is in trash, then 410" method.
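
As a rough sketch of how that trash-based approach could work (my assumption about the mechanism, not the plugin's actual code; note that WordPress appends a "__trashed" suffix to a trashed post's slug):

    add_action( 'template_redirect', function () {
        if ( ! is_404() ) {
            return;
        }
        $path = wp_parse_url( $_SERVER['REQUEST_URI'], PHP_URL_PATH );
        $slug = basename( untrailingslashit( $path ) );
        $trashed = get_posts( array(
            'name'        => $slug . '__trashed', // WordPress renames trashed slugs
            'post_status' => 'trash',
            'post_type'   => 'post',
            'numberposts' => 1,
        ) );
        if ( $trashed ) {
            status_header( 410 ); // the post existed and was deliberately removed
            nocache_headers();
            exit;
        }
    } );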

What do you think about serving a 410 for all 404 pages? That would be pretty easy to do, and I don't see the downside.

I wouldn't do this. 404 errors happen ALL the time. The amount I have showing up in the Search Console Coverage Report is insane, and that's just the ones Google is deciding to report. People link to me in crazy broken ways, they make up URLs hoping a page exists, etc.

What I don't want to do is serve a 410 error there. There's zero risk of it getting indexed because it goes to my 404 page, and the last thing I want to do is imply there was ever any content purposefully at these busted URLs, which a 410 does imply. "It was here, now it's not" versus the 404's "never was here, maybe it was, dunno, it's not here now, someone made a mistake."
 
I'm not sure if this was mentioned already, but I was just doing something similar due to an affiliate link cloaking plugin. The old Search Console has a URL removal tool. I used it today and the URLs were out of the index in just a few hours. The downside is that it may only be temporary (90 days). You can do all URLs that start with a specific prefix in one shot, too.
 
I'm not sure if this was mentioned already, but I was just doing something similar due to an affiliate link cloaking plugin. The old Search Console has a URL removal tool. I used it today and the URLs were out of the index in just a few hours. The downside is that it may only be temporary (90 days). You can do all URLs that start with a specific prefix in one shot, too.

Yeah, but the problem is this only hides them from the SERPs and doesn't remove them from the index. It's how I prolonged my own problem, trying this a couple of times. You have to actually get them deindexed if you want to recover from any negative Panda effects, even if they're hidden from view in the SERPs by the URL Removal Tool.
 
Ah, ok. I'll be sure to get them recrawled in the meantime now that the problem should be fixed. Thanks.
 
Anyone got an idea why Google won't drop domains out of the index that were moved with a 301?

I've checked several domains I redirected, some of them years ago, but they are still indexed with what appear to be all of their pages. I indicated the move in Search Console. I wonder if this also affects the number of pages Google has indexed for a site, and the overall website quality rating.
 
@F5K7, because sometimes Google thinks it's more relevant to show the SERP results of the original domain when something related to that domain is searched. The users are still redirected where you want them, but Google is showing results that result in a click and a more satisfied searcher. It helps the searcher understand that a change has occurred with the website, versus showing them a result for a website that they never searched for.

Like if the searcher looked for "Tom's BBQ Recipe" but you 301'd it to Jerry's site, it makes more sense to display Tom's site than Jerry's, even though they land on Jerry's site ultimately.
 
What do you think about installing a plugin that redirects any 404s to closely related pages, if available?

If you delete huge amounts of content, there is a good chance that the links you dismissed as irrelevant add up. This way you wouldn't lose any backlinks. Obviously that's only a solution for big sites where individual 301 redirects aren't feasible.
 
What do you think about installing a plugin that redirects any 404s to closely related pages, if available?

If you delete huge amounts of content, there is a good chance that the links you dismissed as irrelevant add up. This way you wouldn't lose any backlinks. Obviously that's only a solution for big sites where individual 301 redirects aren't feasible.
I use the redirection plugin to monitor my 404s and every few days check and manually redirect anything to the correct (or next best) content.
 
What do you think about installing a plugin that redirects any 404s to closely related pages, if available?

If you delete huge amounts of content, there is a good chance that the links you dismissed as irrelevant add up. This way you wouldn't lose any backlinks. Obviously that's only a solution for big sites where individual 301 redirects aren't feasible.

I wouldn't trust a plugin to determine which page is most relevant. I prefer to set up the redirect right when I delete the post. Rarely do I have a deleted page 404. There's usually something semi-relevant that I can 301 it to.

But if you go the 404 route, @Darth's idea is nice.
 
@F5K7, there is a plugin for WordPress, Redirection by John Godley, that's trusted and supported. Just updated 4 days ago. If you're worried about your .htaccess file getting too long and bloated, this does PHP redirects. You still have to create each one individually (is that your concern?), but it puts the redirects in the database instead of .htaccess.
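
To illustrate, a PHP-level redirect is just logic that runs before the page renders, something like this minimal sketch (a hypothetical hand-rolled version; the plugin stores and manages the mappings for you):

    add_action( 'template_redirect', function () {
        $map = array( '/deleted-post/' => '/better-post/' ); // hypothetical hand-curated mapping
        $path = wp_parse_url( $_SERVER['REQUEST_URI'], PHP_URL_PATH );
        if ( isset( $map[ $path ] ) ) {
            wp_safe_redirect( home_url( $map[ $path ] ), 301 );
            exit;
        }
    } );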
 
Thanks for all the replies.

I use the redirection plugin to monitor my 404s and every few days check and manually redirect anything to the correct (or next best) content.

That's a nice idea; however, I am more concerned about links that I missed or discredited while pruning 80% of the content (10,000+ pages).

I wouldn't trust a plugin to determine which page is most relevant. I prefer to set up the redirect right when I delete the post.

This is unfortunately not feasible for the amount of pages I am dealing with.

If you're worried about your .htaccess file getting too long and bloated, this does PHP redirects. You still have to create each one individually (is that your concern?)

It's more that I don't want to go through 10,000+ pages and redirect them manually. None of them have links above a certain DR; I kept those pages, of course. But there are many natural contextual links that I had to ignore when deleting because of their low DR, otherwise I couldn't have pruned properly. Now I think having those low-DR links pointing at 404s is less beneficial than using this plugin, which actually does decent redirects.
 
This was one of those stupid, vague, cryptic statements that another Google employee debunked on the same day, but the debunking never outpaces the #fakenews. John Mueller sometimes plays the same old ego game Matt Cutts used to play: "I know something you don't know, not because I'm smarter, but because I have access. But if I state it like a wise old sage where you have to unravel the riddle, then you'll think I'm smart."
Oh god yes, this. So much this.

As for @Stones's story, I've done something similar with a site of mine that had a ton of old links going to all sorts of thin inner content pages. I 301'd each to an "archive" page, and placed thumbnails and descriptions linking to the main categories, along with a few manually selected featured/popular posts. I didn't noindex, of course. This was done a few years ago and has worked very well. Juice and weight seem to flow as intended, and pretty much all new articles rank without having to do anything else.

This was an expired domain I took over, btw. Like, dropped, repurposed, re-regged.
 
I was wondering if it would ever make sense to noindex individual posts and let Google index all of that content on a category page rather than a number of individual posts that aren't getting much traffic alone.

Anyone ever tried it or have an opinion?

Example - product reviews that aren't all that long (maybe 500 words) that drove traffic when the product launched but died off shortly after that. I was thinking of showing the full post on the category page and only allowing that page to be indexed (and change the canonical). It would likely only be 5-10 posts per category, but some have even less. I'm wondering if this could have the same positive benefit as deleting or merging posts like a number of people have been mentioning on here. I'm not sure if it would save time or be the same amount of work as merging them all into a single post in the long run.
 
@scottb, from a robot standpoint, that makes complete sense. You get to use the content still and it doesn't bloat your indexation. The canonical part doesn't make sense unless the pages have links. If there are no links, the product pages will drop from the index anyway and the category page will get the credit.

From a human standpoint, this makes little sense. If the content isn't useful anymore, then you don't want it in your categories, pulling clicks and attention away from content that still matters.

I'd also think about what counts as quality content. Just because a page isn't getting a lot of search traffic doesn't mean the people who do find it and read it think it's useless. The word count has little to do with quality. It's whether or not it fulfills the needs of the query that determines that.

If time-sensitive content needed to be noindexed, you'd see every news site noindexing or playing canonical games with their content after 30 days, etc. But none of them are doing that, because the content is still quality and useful.

I'd tell you to stop thinking about this and move on from it. This is what I've been talking about in the past month or so when I keep mentioning on the forum that when it comes to SEO you need to get out of your own way. This isn't going to benefit you greatly.

If you don't think the content is useful any more, I'd either delete it and let it 404, or I'd noindex it and NOT display the full post on the category (just leave it like it is with an excerpt, presumably), or just leave it alone and leave it in the index because it probably is still useful.

Think about the New York Times and CNN and Fox News and anyone else. I'd suspect 99.9% of their content gets zero views from anything but bots. Yet they dominate with all that "trash" in the index, and that's because it's not trash. Old != Low Quality
 
I am going through a prune right now and was wondering about noindex vs delete.

In prior prunes, I just deleted the posts outright (safest). However, you then have potentially bad user experiences and 404 links on your other posts (like if you had linked to a pruned post from a still-live post).

Anyway, I think I will go to the effort of noindexing my pruned posts this time. Anyone have any thoughts on why delete vs noindex might be better?
 
In prior prunes, I just deleted the posts outright (safest). However, you then have potentially bad user experiences and 404 links on your other posts (like if you had linked to a pruned post from a still-live post).
Yeah, I went through and removed all internal links to the deleted posts, and I redirected the deleted posts back to their parent categories. I don't want 404s for users or Google. I truly believe that most of technical SEO is about doing Google the favor of not wasting their resources, and they return the favor.

Anyway, I think I will go to the effort of noindexing my pruned posts this time. Anyone have any thoughts on why delete vs noindex might be better?
The only thing to consider is that noindex pages will still play a role in Google's new "user experience" Core Web Vitals page speed metrics like Cumulative Layout Shift, Largest Contentful Paint, First Input Delay, etc. So they still need to be up to snuff. That could create a maintenance issue, which you could dodge by deleting the posts, redirecting them, and deleting the internal links (so as not to have a redirect chain or a bad user experience).
 