Duplicity On Web Properties - How Much Does It Matter?

RomesFall
I've been thinking a lot lately about duplicity on web properties. My main focus right now has been getting my technical understanding of on-page considerations up to a decent level, and of course I'm also trying to learn GSA SER properly and understand more about how it works.

Because of this I've been thinking about how duplicate content is not the only issue: duplicity is specifically covered in a number of Google's patents, which treat individual web pages as nodes / properties and measure how relationally similar they are to other 'nodes'.

E.g. if money site A is linked to from web pages B and C, what do B and C have in common with each other, what does A have in common with B, and what does A have in common with C?

This is a super simplified way of looking at Google's relevancy factoring, and we can also assume their janitors/crawlers use this kind of logic to detect footprints that look like an attempt at manipulating their SERPs. We know they have the patents.
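
To make that more concrete, here's a minimal sketch (Python; every feature and domain in it is a made-up example, not anything taken from an actual patent) of the sort of relational comparison a crawler could run between two 'nodes':

```python
# Rough illustration only: how a crawler *might* score two "nodes" (pages/properties)
# for relational similarity. All feature sets below are hypothetical examples.

def jaccard(a: set, b: set) -> float:
    """Overlap between two feature sets, from 0.0 (nothing shared) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical feature sets for pages B and C, both linking to money site A.
page_b = {"links_to:moneysite-a.com", "ip:192.0.2.10", "cms:wordpress", "theme:xyz"}
page_c = {"links_to:moneysite-a.com", "ip:192.0.2.10", "cms:wordpress", "theme:abc"}

# High overlap between B and C = a shared footprint pointing at the same money site.
print(jaccard(page_b, page_c))
```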

I feel like a lot of people are ignoring potential footprints that are easy to identify, and that the topic of footprints has become too entwined with talk of PBNs, apart from the usual things like diversity, duplicate content, anchor text ratios et al.

There are so many more we don't take into consideration. Maybe they're fringe considerations, but surely they're going to become more important in the future? For longer-term projects where you're not going down a squeaky clean, minimal-manipulation route, I believe this stuff really matters.

I know that there are people here doing spam very smartly and I'd love to open all of this up as a topic of discussion, not just talking about spam, but how you treat any web property to avoid these issues of duplicity, footprints and the like.

- Content
- On-Page Factors
- Ownership (Cookies, IP, Flash Tracking - is this important all the time? How much can Google possibly know from third-party databases?)
- Link Diversity
- Ratios (including property-to-property ratio footprints, if that matters?)

- Plus anything else you believe, know and think of.

I'm going to continue sinking my teeth into patents, testing in environments where I control as many variables as possible, etc., but it would be great to hear the thoughts of those I consider to be pros.

I welcome anyone to contribute, but I hope this can be a technical discussion rather than people parroting what they've heard elsewhere; please be willing to explain your understanding and back up what you're saying.

- RF
 
Well, I can only speak from my experience and my own side of things with the client work I do, which is by no means the full picture and purely one side of this issue. The thing I always keep in mind with Google is that they have to weigh every tactic, technique, and practice against how it will affect legitimate websites and brands. Any little change can have far-reaching, potentially devastating effects. So the question I try to ask is: knowing this, if you were trying to solve that problem and target low-quality sites with algorithms, what creative approaches might let you sidestep those issues, or at least push through them with minimal effect on legitimate sites?

That being said, I deal with a fairly wide range of clients, everything from small local businesses to a few major national franchise brands and big international brands. If there is one thing I've seen consistently, it's that most legitimate websites are lacking some degree of due diligence on SEO best practices. Duplicate content is a massive problem for, I would say, a pretty significant percentage of legitimate sites. Most people never think to resolve the www and non-www versions of their site with a 301 redirect. Many don't even know what that means. Hell, I have several major franchise brands I've dealt with whose top-level "corporate" pages are duplicated across EVERY single franchisee location (imagine 50-75%+ duplicate content LOL).

Most people simply do not pay attention to detail. That's normal, and it's true even for a not-so-insignificant percentage of supposed "professionals" who are supposed to be managing these issues for companies. Since this is effectively the default behavior, I would guess Google has to account for it in their algorithms to some degree. What I mean is, I highly doubt the "penalty" or negative effects of duplicate content are a "scorched earth" type of thing, as they have to account for most people's commonplace lack of due diligence.
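
To illustrate the www/non-www point, a quick check along these lines (Python with the requests library; example.com is just a placeholder domain) will tell you whether one hostname 301s to the other or both answer with their own copy of the page:

```python
# Quick-and-dirty check (example.com is a placeholder): does the www version
# 301 to the non-www version (or vice versa), or do both hostnames serve content?
import requests

for url in ("http://example.com/", "http://www.example.com/"):
    r = requests.get(url, allow_redirects=False, timeout=10)
    print(url, r.status_code, r.headers.get("Location", "(no redirect)"))

# If both hostnames answer 200 instead of one 301ing to the other,
# the same content is being served at two different URLs.
```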

Anyways, I apologize. That's only one small part of what you're talking about, but I figured there might be something useful in it, even if it just points to some directions to take that line of thought.
 
Do I need to speak much philosophy here?

Duplicate content still works, but it is a severe penalty trigger.
 

You can save the Nietzsche for another time :smile:

I'm not saying it doesn't work, or denying that it's a potential flag for a penalty trigger; I just want to talk about how far these flags/footprints/duplicity might extend and what people are doing to avoid such potential issues.

You know the patents better than I do, and I know that, so it would be very interesting to hear your thoughts on things like the paid link patent and similar ones.

To be honest, duplicate content isn't really what I was hoping to talk about; I was just using it as an example.

- RF
 
I know you aren't focusing specifically on duplicate content, but I will comment on it. It's okay to use templated content that gets ad-lib'd, to some degree. It's also okay to syndicate content. But in both situations, I would make an effort to add at least a paragraph (four sentences of decent length) to each page. It doesn't take much to make a page "unique" enough.

Regarding other possible footprints, which seems to be what you're really talking about, I would say that there are algorithms that can analyze vocabulary, grammar, syntax, punctuation use, and more, and that these, along with the other items you've mentioned, can build enough statistical confidence to connect sites to the same owner. All just based on the way you type or the way your writers write.
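
As a toy illustration of that idea, and not a claim about what Google actually runs, something as simple as a character n-gram comparison shows how writing style alone can tie two samples together (Python, made-up example sentences):

```python
# Toy stylometry: compare two text samples by character trigram frequency.
# Purely an illustration of the concept, not a claim about Google's actual methods.
from collections import Counter
from math import sqrt

def trigram_profile(text: str) -> Counter:
    """Count overlapping 3-character sequences after normalizing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(p: Counter, q: Counter) -> float:
    """Cosine similarity between two trigram frequency profiles."""
    shared = set(p) & set(q)
    dot = sum(p[g] * q[g] for g in shared)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

site_a = "Honestly, we'd recommend checking the specs first, it's worth it."
site_b = "Honestly, we'd suggest checking the reviews first, it's worth it."

# Closer to 1.0 = the two samples share the same "voice".
print(cosine(trigram_profile(site_a), trigram_profile(site_b)))
```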
 

Agreed. A paragraph could represent a certain % of unique text, depending on the total words and characters of the content itself. I'd love to know if we could work out an 'acceptable' percentage of 'uniqueness' with mathematical certainty.

I tend to just use unique content even if it takes me longer, regardless of what the content is for, and if I need to spin it I'll do it manually, so I'm not sure I'll ever find out whether there's a percentage we can get away with, or whether it's SERP-dependent.
 
I don't think Google measures the amount of unique content on a page as a percentage of the overall content. I think they just require a certain number of words.

Does it have 100 unique words?
versus
Does it have 10% uniqueness?

I mean, they index pages with 100% duplicate content, and they index pages with zero content, all day every day.
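
Just to make the two measures being debated concrete, here's a rough sketch (Python, with made-up example text) of 'unique word count' versus 'percentage uniqueness' against a template:

```python
# Two ways of measuring "uniqueness" of a page against the template it was built from.
# Purely illustrative; whatever thresholds Google actually uses (if any) are unknown.

def unique_word_count(page: str, template: str) -> int:
    """Words on the page that never appear in the template."""
    return len(set(page.lower().split()) - set(template.lower().split()))

def unique_percentage(page: str, template: str) -> float:
    """Share of the page's words that are not in the template."""
    words = page.lower().split()
    template_words = set(template.lower().split())
    if not words:
        return 0.0
    return 100.0 * sum(w not in template_words for w in words) / len(words)

template = "Our team offers fast friendly service in your area."
page = template + " We have proudly served the local region since 1998."

print(unique_word_count(page, template), unique_percentage(page, template))
```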
 

That's true of course, but far more pages get indexed than ever rank on the 1st page (obvious is obvious), because of many different factors. No SERP holds a site that any of us would call perfect, but surely the closer we get, the better our chances, and the further away we get, the less likely we are to rank. That's partly what optimization is.

Let's just say for example that we have a campaign set up to build a shed load of *insert link type here*... What's the problem with using 100% duplicate content on each? Is it that we're scared it won't index? No, it's that we're worried about the property and the site it links to getting a penalty, because they've detected duplication and know that's a sign of manipulation.

What about conducting a link building campaign purely from one link type? Do we not do this purely because we've heard others say not to? Because we've conducted tests and we've seen the effects are not ideal? Why are they not ideal? Why does it not work? Footprints, duplicity... Obvious manipulation.

So let's forget content completely for now, how far does this extend?

I'd rather spend a little bit of extra time setting up my campaigns and get an extra month out of my sites, or however much longer is possible, than have to do more link building because I've flagged myself, Google has detected relational data between two properties, and, at best, the links don't pass as much value. A few little tweaks and even contextual spam can be that little bit better, that much safer.

The SEOs on here mostly identify as 'black hat' and are supposed to pride themselves on their technical ability. Let's be honest though: if this is as far as our conversation has taken us, this subject, which warrants extensive investigation, has hardly been looked into.

Is this who we are now? People who are happy with 'good enough' and then debate on blogs, forums and Skype about what's going on with Penguin and Panda, when those could be avoided if we just put the time in to understand them in the first place?

Campaigns at a certain scale will always fall foul in some way; that's inevitable and that's fine. I'm just saying we can do a lot to minimize our risk, both today and in the future.

I was hoping there would be others on here who have thought about this stuff and would be willing to share. I'm guessing there are those here who know exactly what I'm on about; they either don't want to share or they're happy doing 'just enough'.

I've only just started to sense the tip of the iceberg with how this stuff works and I'm taking it very seriously because I know how much time and money it could save me in the long run.

- RF
 
I absolutely agree with you about people ignoring huge potential footprints that would be easy for Google to identify.

The problem is the cocky mindset a lot of SEOs have: they think they're smarter than Google because they can blast 10k links and beat an algorithm for a week. They forget that Google is one of the largest technology companies in history, one that goes out of its way to pay out the ass to hire the best, smartest people in the industry. Herping and derping around with spun posts and garbage links, acting like they're some kind of brilliant genius. Look at any science or tech news site and there are stories every single day where you're like, "Wow, it's amazing that that's possible these days..." and yet people assume GOOGLE doesn't have some incredible behind-the-scenes stuff going on too? Yeah, okay.

What I really don't get is when you have guys going through all of the motions to make their sites look like the real deal... people are putting more effort into PRETENDING to be a good site than it would take to actually make a good site that would last through any algo update or refresh.

Granted, I guess that's what you've got to do in certain niches, so I can't talk too much because I avoid those niches.
 