Working on a Massive Site: How to Deal With a Huge Amount of Old Content

  • 20k+ Posts
  • All around 200-400 words
  • Barely receive traffic
  • Mediocre interlinking
  • Most of them have no Backlinks, some do

The plan is to launch new content later this year.

I read some case studies on deleting old content and the positive effects it can have. Even more, I'm afraid of the negative effect that 20k+ barely visited pages might have on the new content.

Would you delete old content? If so, what metrics would you use?

I'm struggling to find a decent way to sort out what to delete and what to keep. Ahrefs metrics are pretty much useless here: their UR metric seems to be completely random, and I can't use referring domains as a metric since it doesn't give any idea of the value of those backlinks.

Right now I'm thinking about deleting 80% of the old content, preferably the posts with no strength, and using the rest to link to new money content that will be published eventually.

I could also just keep all the content and hope that Google doesn't care. Please let me know your thoughts.
 
I started a thread here that will interest you: Content Pruning - Did You See an Increase in SERP Visibility?

My first question would be how the site is performing in general right now. How much organic traffic does it get daily / per month with 20k posts that bring nearly nothing?

Content can be of high quality and not on-page optimized, and thus receive little traffic; at the same time, I don't think Google will consider a page poor simply for not receiving traffic either. News posts, seasonal posts, and other types end up getting no traffic but are still good pieces of content.

My concern is the 200-300 word posts. I know news sites can get away with that, and depending on the search queries being targeted, it can be the perfect amount of content. But Google does have their "Everflux Panda" that measures a sitewide quality score using content and technical SEO signals. So the question becomes: "are 200-word posts written for queries that tend to need 1,000 words considered low-quality content, if they'd be considered high quality for different queries with different intent?" Nobody knows.

What we do know is there is a sitewide score being calculated based on everything that is indexed.

Yeah, Ahrefs' UR isn't that great; I'd use DR, and as long as the link isn't one that I need to disavow, I'd keep it.

With that much content, do you have the posts in thematic groupings, or can you group them? I see no reason you can't create new posts optimized for specific search terms you could take down, and then use each ~250 word post as a section under a header. You could combine the related posts into monster posts, tweak the titles as headers, optimize for good terms, and end up earning traffic with the content.

All you'd need to do then is 301 the posts you combine to the new parent post. That would not only combine the content but combine the backlinks.

Normally I'd say to put the 301 redirects in .htaccess or whatever the Nginx equivalent is, but if you were to hit 5,000 redirects or more in there, that'd be silly since the file is parsed on every page load. Depending on the CMS you're using, I'd set up database-driven PHP redirects instead, since these old pages probably don't get crawled very often.
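To make the database-driven idea concrete, here's a minimal sketch in Python (the post mentions PHP; this is only to illustrate the lookup pattern). The table and column names are hypothetical, and the lookup would only run on a 404 fallthrough, so normal page loads never touch it:

```python
# Sketch of a database-backed 301 lookup instead of thousands of
# rewrite rules parsed on every request. Table/column names are
# hypothetical; adapt to whatever your CMS gives you.
import sqlite3

def build_redirect_table(db_path, mapping):
    """Store old-URL -> new-URL pairs once, at merge time."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS redirects "
        "(old_path TEXT PRIMARY KEY, new_path TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO redirects VALUES (?, ?)", mapping.items()
    )
    conn.commit()
    return conn

def lookup_redirect(conn, path):
    """Return (status, target). Called only when the CMS would
    otherwise 404, so live pages never hit this table."""
    row = conn.execute(
        "SELECT new_path FROM redirects WHERE old_path = ?", (path,)
    ).fetchone()
    return (301, row[0]) if row else (404, None)
```

The design point is that the redirect map lives in one indexed table rather than a config file the web server re-reads constantly.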

Anyway, instead of smashing delete, I'd use as much of this as I can to strengthen the domain. All this old stuff can become new money content that can earn, while still deflating bloated indexation.
 
I read your Content Pruning case study before and especially the "sitewide score" is what made me think that I really need to cut down on indexed pages.

My first question would be how the site is performing in general right now. How much organic traffic does it get daily / per month with 20k posts that bring nearly nothing?

The site gets around 3000-4000 monthly visitors.

But Google does have their "Everflux Panda" that measures a sitewide quality score using content and technical SEO signals. So the question becomes: "are 200-word posts written for queries that tend to need 1,000 words considered low-quality content, if they'd be considered high quality for different queries with different intent?" Nobody knows.

What we do know is there is a sitewide score being calculated based on everything that is indexed.

That's what I'm most concerned about as well. The sitewide score must be dragging down any new content, no matter how much better it is. All of the posts are press releases/tech news. I don't think most of them should have much more than 300 words, but really there's no need for them at all. No one is going to care about a blender or a headset that came out six years ago.

Yeah, Ahrefs' UR isn't that great; I'd use DR, and as long as the link isn't one that I need to disavow, I'd keep it.

That would force me to check manually, though. So far I've exported all "Top Pages by Links" (a huge Excel sheet) and sorted by UR, since referring domains isn't useful thanks to spam. I might have to look into scripts to automate the manual lookup.
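One way to automate that lookup: instead of sorting pages by UR, take a backlink export and rank each page by the strongest referring-domain DR it has. A rough sketch, assuming the export has `target_url`, `referring_domain`, and `dr` columns (adjust to whatever your actual export names them):

```python
# Rank pages by the best DR among their referring domains,
# as a stand-in for the unreliable UR metric.
# Column names ("target_url", "dr") are assumptions about the
# export format; match them to your own sheet.
import csv
from collections import defaultdict

def load_export(path):
    """Read an Ahrefs-style CSV export into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def best_dr_per_page(rows):
    """For each page, keep the max DR seen across its links,
    then sort with the strongest-linked pages first."""
    best = defaultdict(int)
    for row in rows:
        best[row["target_url"]] = max(best[row["target_url"]], int(row["dr"]))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

The bottom of that sorted list is your delete-candidate pool; the top is what you'd 301 into new content.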

With that much content, do you have the posts in thematic groupings, or can you group them? I see no reason you can't create new posts optimized for specific search terms you could take down, and then use each ~250 word post as a section under a header. You could combine the related posts into monster posts, tweak the titles as headers, optimize for good terms, and end up earning traffic with the content.

Unfortunately, I don't really see how I could combine them into something valuable. They are grouped by categories (hundreds in each), but the product range is so broad that I'd struggle to come up with a suitable main topic for more than a couple of posts, so it's nothing I could really scale. Also, none of the content is informational, which makes it even harder.

However, this might be easier once I've brought the post count down to a couple of thousand. I need to find a way to weed out the majority without losing value in the form of backlinks.
 
So, basically, you have product pages?

Are you selling? Do you have stats?

First step would be to kill the obvious under-performers.
Second, identify top performers.

See if it makes sense to create informational pages for those.

"The ten best X"
 
First step would be to kill the obvious under-performers.

That's what I'm currently struggling with. Traffic doesn't work as a metric here, and I haven't found a way to sort them by link strength other than Ahrefs UR/RD, which is extremely unreliable.
 
Why does traffic not work?
Are they all at 0?
 
You can pull up ALL links to the site using Ahrefs and Majestic (your best choices since they come with metrics) and come up with threshold Domain Rating and Citation Flow/Trust Flow values for each, respectively. Then just cut everything below those thresholds. Then you can see which pages have links above those thresholds and quickly see which are spam or not. You'll cut out 99% of the spam doing this and be able to see which URLs you need to 301 redirect.
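That threshold cut can be sketched as a few lines of Python. The field names and cutoff values below are illustrative assumptions, not recommendations; the point is that a link survives if it clears either tool's threshold, and only pages with at least one surviving link earn a 301:

```python
# Sketch of the two-tool threshold cut: merge Ahrefs (DR) and
# Majestic (Trust Flow) data per referring domain, drop links
# below both cutoffs, and collect the pages worth redirecting.
# DR_MIN / TF_MIN are placeholder values -- pick your own.
DR_MIN, TF_MIN = 20, 15

def surviving_links(links):
    """links: dicts with page, domain, dr, tf.
    A link survives if it clears either tool's threshold."""
    return [l for l in links if l["dr"] >= DR_MIN or l["tf"] >= TF_MIN]

def pages_to_redirect(links):
    """Pages with at least one surviving link get a 301;
    everything else is a deletion candidate."""
    return sorted({l["page"] for l in surviving_links(links)})
```

Eyeballing the short surviving list for spam afterwards is far faster than reviewing 20k pages.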

You're probably going to find that very few have worthwhile links, based on the content you've described; that's not a knock at your project, just the nature of the content.

If this were me, I'd do a quick look using the procedure above, and if it turns out there's almost nothing worth keeping, I'd probably just do a mass delete and move on, assuming you're sure this is the direction you want to go. Take a backup, though; you may want to bring it back.

Perhaps you could post a handful of the new money content first and see how it performs. You may find you don't need to delete anything; maybe Google thinks it's all valuable content and it's not holding you back. If you feel it is, then you can still delete and see how Google reacts. If you're worse off for it, you can bring the content back.

One thing you need to plan for is waiting forever to get this crap out of the index. Since you'll be sending 404 codes, it's going to take a while. Depending on your CMS, you could have these pages send 410 codes instead, which tells Google "I purposefully deleted this; it's gone for real, don't wait to take it out of the index." Otherwise, you'll have to wait for them to recrawl each page several times before they respect the 404, and then still wait for Panda to process it all to see how it plays out.
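The 404-vs-410 distinction boils down to one branch in whatever handles missing URLs. A minimal sketch (the deleted-URL set and paths are made up; how you hook this into your CMS varies):

```python
# Minimal sketch of 404 vs 410: URLs you deleted on purpose get
# 410 (Gone) so crawlers drop them faster; unknown URLs still get
# a plain 404. Example paths are hypothetical.
DELETED = {"/old-press-release-2014", "/blender-review-2013"}

def status_for(path, live_paths):
    if path in live_paths:
        return 200
    if path in DELETED:
        return 410  # gone deliberately -- don't keep retrying
    return 404      # simply not found
```

Keeping the deleted set in a database or generated file means you can build it straight from your pruning spreadsheet.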

One method to help you track it is to put every URL you delete in a custom sitemap and upload that to Search Console. Then you can see exactly how it's progressing. I laid out this method here: https://www.buildersociety.com/threads/Ω-→-∞-surrender-supremacy.446/post-41926

The benefit here is that you can update the Last Modified date on the XML sitemap to trigger them to crawl it again. They'll also keep processing it since it's a sitemap.
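Generating that deleted-URLs sitemap is a few lines of stdlib Python. A sketch, assuming you have the removed URLs in a list; the output follows the standard sitemaps.org format:

```python
# Build a sitemap of deleted URLs so Search Console keeps
# recrawling and reporting on them as they fall out of the index.
# Bump lastmod whenever you want to nudge another crawl.
from datetime import date
from xml.sax.saxutils import escape

def deleted_sitemap(urls, lastmod=None):
    lastmod = lastmod or date.today().isoformat()
    entries = "".join(
        f"  <url><loc>{escape(u)}</loc><lastmod>{lastmod}</lastmod></url>\n"
        for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}</urlset>\n"
    )
```

Write the result to a file, upload it to Search Console as its own sitemap, and watch the indexed count for it drop over time.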
 
If you go another route, maybe use the date.

As in, keep the newest X articles in each category,
or delete everything older than Y.

Then use the remaining posts to rework/refocus.
 
I used the Ahrefs API to export the exact referring domains for every one of those posts. I deleted the obvious spam but am now not 100% sure what to do with the ".blogspot.com" links. The website is quite old, and apparently ".blogspot.com" used to be way more popular back in the day, so 30% of the referring domains are ".blogspot.com" blogs. Not spam, just people running their own small blogs naturally linking to the website; definitely nothing I'd disavow.

90% of those have a DR of less than 10, and I'm wondering how much they actually benefit the website.

...and come up with threshold Domain Rating and Citation Flow/Trust Flow values for each, respectively. Then just cut everything below those thresholds. Then you can see which pages have links above those thresholds and quickly see which are spam or not.

What would be your threshold for DR?
 