Can I use TF * IDF as a measure to optimize my content?

A new client has requested that we optimize 5 informational posts. Any on-site SEO improvements, such as internal linking or using long-tail keywords, are allowed.

Should I use TF * IDF as a way to optimize these posts? Has anyone seen good results from this? Is there a better way to optimize content?

P.S. Go through this if you want to have your mind blown -> https://www.slideshare.net/ipullrank/technical-content-optimization-149169748

Cheers.
 
@stackcash and I have played with TF*IDF extensively. I've never jumped from spot #3 to spot #2 or anything that high stakes, where the distance between two sites' ranking scores is exponentially higher than even at the bottom of page 1.

But I've seen countless times jumping from page 2 to the bottom of page 1, or from the mid 30's to the lower 20's, etc.

The question you have to ask is... is this TF*IDF or did I just give myself a freshness refresh.

I can say though, that I've never seen a negative reaction to this type of optimization, only positive or neutral depending on where you're ranking.

If you want to measure the difference, take 3 of those posts and make TF*IDF changes to them, then take the other 2 and just rewrite a couple of paragraphs, and then update their publish dates or whatever system you have set up. They should all get a freshness bump, but the TF*IDF ones should get a bigger bump if there's any truth to it.
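A rough way to read the results of that split test is to compare the average movement of each group. This is only a sketch: the function names are made up for illustration, the rank pairs are placeholder numbers rather than real results, and five posts is far too small a sample to prove anything.

```python
from statistics import mean

def average_gain(positions):
    """positions: list of (rank_before, rank_after); a positive gain means the post moved up."""
    return mean(before - after for before, after in positions)

def compare_groups(tfidf_group, control_group):
    """Compare the TF*IDF-edited posts against the freshness-only posts."""
    tfidf_gain = average_gain(tfidf_group)
    control_gain = average_gain(control_group)
    return {
        "tfidf_gain": tfidf_gain,
        "control_gain": control_gain,
        # Whatever is left after subtracting the freshness-only movement
        "extra_lift": tfidf_gain - control_gain,
    }

# Placeholder numbers only, to show the shape of the input
tfidf_posts = [(34, 22), (18, 11), (27, 20)]   # 3 posts with TF*IDF changes
control_posts = [(31, 26), (15, 12)]           # 2 posts with a light rewrite and new date

result = compare_groups(tfidf_posts, control_posts)
print(result)
```

If the extra lift stays near zero over a few rounds, the movement is more likely down to freshness than to TF*IDF.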
 
Interesting..

From my limited playing around with TF-IDF (I'm using WebSite Auditor for it, btw), I'm beginning to see it as a way to improve the quality of content, not as a way to match your keyword count to that of a competitor. I believe TF-IDF or any similar method would be the most beneficial if used to improve content depth.

Have you guys tried out tools such as Wordlift or Clearscope? Wordlift looks interesting and also, affordable.
 
The question you have to ask is... is this TF*IDF or did I just give myself a freshness refresh.

Did you also fix a typo or two? Add another sentence here/there that made the article more logical/better explained?

If you're anything like me (better from what I read), you probably can't help yourself from improving it in addition to the TF*IDF changes, so it may be a sum of all parts type of situation.
 
Are you talking about using TF-IDF for the posts individually, or TF-IDF for each post compared to the SERP? If it's the latter, you really can't do that, since you don't know the actual IDF. You can do some guesstimation based on what Google displays in the SERPs, but remember that Google doesn't show you all the results it has in its archive, and at that point you have to ask yourself if it's really worth it.

In short, I'd forget about TF-IDF and focus on more tangible on-page stuff, such as time to first byte, general page load speed, UI, keyword density, keyword placement, etc.
 
Can someone please explain TF * IDF to me like I am a 5 year old, with some examples, again like I am a 5 year old. Every explanation I see is some shit I can't comprehend, and if I can’t get it then there has to be a vast audience that does not know what you people are talking about. Again, like a 5 year old, and I’ll owe you a big favor.
 
Can someone please explain TF * IDF to me like I am a 5 year old, with some examples, again like I am a 5 year old. Every explanation I see is some shit I can't comprehend, and if I can’t get it then there has to be a vast audience that does not know what you people are talking about. Again, like a 5 year old, and I’ll owe you a big favor.

https://medium.com/@nick_eubanks/tf-idf-and-how-it-works-with-seo-79b76d9db5c0

http://www.tfidf.com/

https://www.onely.com/blog/what-is-tf-idf/
 
That doesn't help. Usually when people ask for help, it's because they probably did research at some level and don't understand it completely. Linking to a bunch of technical articles doesn't help.

If someone can explain it to me like I'm a 5 year old, I'd appreciate that.

TF * IDF is a formula that scores how important a term is to one document within a set of documents. Here, "documents" means the pages listed in a search result. A term scores high when it appears often within a document but in relatively few documents across the set. The idea is to produce a list of relevant words to use in your content while putting less value on overused words. The lowest-value words are stop words like "a", "the", "and", etc.

For example, your competitors ranking for "Blue Widgets" will likely use "Blue Widgets" in their title and a few times in their body content. That term appears in almost every one of those documents, so on its own it does little to set a page apart. They may also use words that describe specs, dimensions, common uses, etc. tf*idf will highlight which of those words are associated with the ranking content. So ideally tf*idf will clue you in on additional things your content should cover while steering you away from putting too much emphasis on overused words.
 
That doesn't help. Usually when people ask for help, it's because they probably did research at some level and don't understand it completely. Linking to a bunch of technical articles doesn't help.

If someone can explain it to me like I'm a 5 year old, I'd appreciate that.

My qualification

I spent many, many hours on tf-idf and related algorithms, which resulted in practical outputs - don't want to divulge more than that on a public forum.

ELI5: tf-idf

In layman's terms:

It determines how relevant and rare your page content is, compared to other pages. No link-analysis is involved.

Relevancy: The more relevant terms your page has, the more relevant it is.

Rarity: The more terms your page has that are not found in most of the other competing pages, the higher your rarity score.

Relevancy is term frequency (tf).

Rarity is inverse document frequency (idf). In other words, the reverse of document frequency. If your term is not found in any other documents, you get a big boost.

Mix in a bit of logarithmic scaling and statistical sampling, and you have tf-idf.
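For anyone who does want to see it in motion, here is a minimal sketch of the plain textbook variant (tf as a share of the page's words, idf as a natural log). The three tiny "pages" and all function names are made up for illustration, there is no statistical sampling in it, and real tools use smoothed or normalized variants.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace
    return text.lower().split()

def tf(term, tokens):
    """Relevancy: share of the page's words that are this term."""
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)

def idf(term, corpus_tokens):
    """Rarity: log of (number of pages / pages containing the term)."""
    df = sum(1 for tokens in corpus_tokens if term in tokens)
    return math.log(len(corpus_tokens) / df) if df else 0.0

def tf_idf(term, tokens, corpus_tokens):
    return tf(term, tokens) * idf(term, corpus_tokens)

# Toy corpus, invented for the example
pages = [
    "blue widgets are the best widgets",
    "the red gadget is cheap",
    "the green gadget is small",
]
corpus = [tokenize(p) for p in pages]

for term in ("widgets", "blue", "the"):
    print(term, round(tf_idf(term, corpus[0], corpus), 3))
```

"the" appears on every page, so its idf is log(1) = 0 and its score is zero no matter how often it is used; "widgets" appears twice on the first page and nowhere else, so it scores highest.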

What should you do, to make sure your tf-idf goes through the roof?

Don't look at the formula unless you enjoy immersing yourself in it for its own sake.
  • Read a hundred blog posts and end up even more confused
No, not that!

Instead:
  • Make sure most of your content is very relevant to the topic. In other words, minimize irrelevant content.
  • Make sure most of that content is not found in other web pages. In other words, produce unique content.
That is it. Good luck!
 
Relevancy: The more terms your page has, the more relevant it is.

Rarity: The more terms your page has that are not found in most of the other competing pages, the higher your rarity score.

Relevancy is term frequency (tf).

Rarity is inverse document frequency (idf). In other words, the reverse of document frequency. If your term is not found in any other documents, you get a big boost.

If a term has rarity, how is it deemed relevant to the topic? How do you find terms to include that would fit that description?
 
If a term has rarity, how is it deemed relevant to the topic? How do you find terms to include that would fit that description?

ELI5 answer: Only terms relevant to the topic are considered.

If you want to do this yourself:

Look at the top 20 pages in the SERPs (because you are not Google and don't have the whole corpus!). Read the content on all those pages. Find the terms common to most of them. Make sure you have covered all of them.
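If reading 20 pages by hand gets tedious, the same check can be sketched in a few lines. Everything here is an assumption for illustration: the function names, the tiny stop-word list, the 60% cutoff for "most of them", and the sample texts. Fetching the pages and stripping their HTML is left out.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much longer
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "are"}

def terms(text):
    """Set of distinct lowercase words on a page, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}

def missing_common_terms(my_text, competitor_texts, min_share=0.6):
    """Terms used by at least min_share of competitor pages but absent from mine."""
    doc_freq = Counter()
    for text in competitor_texts:
        doc_freq.update(terms(text))  # each page counts a term at most once
    threshold = min_share * len(competitor_texts)
    common = {t for t, n in doc_freq.items() if n >= threshold}
    return sorted(common - terms(my_text))

# Invented sample texts standing in for the top-ranking pages
competitors = [
    "Blue widgets: dimensions, weight and warranty",
    "Warranty and dimensions of blue widgets",
    "Blue widgets price and warranty guide",
]
mine = "Our blue widgets are the best"

print(missing_common_terms(mine, competitors))  # ['dimensions', 'warranty']
```

This works on single words only; phrases would need n-grams, and the output is a list of topics to consider covering, not a keyword count to hit.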
 
TF-IDF
Essentially, it is how many times a word or phrase (term) occurs on your page relative to how many times it occurs across all the pages.
Example: if you are reading a book, the TF-IDF would be how often a specific word/phrase/term occurs on the page you are on relative to how many times it occurs throughout the entire book.
In that way you can determine how important or relevant the page you are on is compared to the entirety of the book.

The same explanation is also the reason why TF-IDF is for the most part irrelevant when it comes to SEO: you do not know how long the book is, since Google only shows you 1000 pages out of all those in its archive.
That said, TF-IDF can be relevant if you are trying to determine which page on your own website you should attempt to rank for a specific term/word/phrase.
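On your own site you do know the whole "book", so that last use case can be sketched directly. This is a rough illustration, not a ranking model: the function name, the URLs and the page texts are all invented, tokenizing is a plain whitespace split (no punctuation handling), and the function assumes at least one page is passed in.

```python
import math

def best_page_for_query(query, site_pages):
    """site_pages: dict of URL -> page text. Returns the URL with the highest summed tf-idf."""
    tokenized = {url: text.lower().split() for url, text in site_pages.items()}
    n_pages = len(tokenized)
    scores = dict.fromkeys(tokenized, 0.0)
    for term in query.lower().split():
        # Number of pages on the site that contain the term
        df = sum(1 for tokens in tokenized.values() if term in tokens)
        if df == 0:
            continue  # term appears nowhere on the site
        idf = math.log(n_pages / df)
        for url, tokens in tokenized.items():
            if tokens:
                scores[url] += tokens.count(term) / len(tokens) * idf
    return max(scores, key=scores.get)

# Invented site for the example
site = {
    "/widgets": "blue widgets and red widgets for sale",
    "/blue-widget-guide": "blue widgets guide how to pick blue widgets",
    "/about": "about our shop and team",
}

print(best_page_for_query("blue widgets", site))  # /blue-widget-guide
```

Note that a term used on every page of the site gets an idf of zero and adds nothing to any page's score, which is the behaviour you want when the goal is to pick one page over the others.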
 
Could TF-IDF be a lagging indicator and not a leading one?

Basically, from my understanding, Google AI is capable of putting content into broad categories from which it determines if individual content is relevant to be served for the determined query.

https://cloud.google.com/natural-language/docs/categories

"Best Basketball Shoes" - Sports/Team Sports/Basketball

From there they look at all the usual metrics.

ON-PAGE:
- EMQ - Title, H1, URL
- Query Deserves Freshness
- Keyword Density
- a bunch of other ones (domain age, etc.)

OFF-PAGE:
- referring domains (to the page, to the domain)
- link quality (power and trust)
- a bunch of other ones (linking anchors, etc.)

They rank those sites and then, based on the results, they observe what percentages of non-common keywords are being used and set those values as the required averages for that SERP.

What do you guys think - is this unlikely or what?
 
It could be a lagging indicator, but I seriously doubt it. Google has trillions of pages in its archive and new ones are being added every day, so it would take huge resources to constantly calculate TF-IDF for every query Google sees. And do remember that a sizable share of the queries Google sees are completely new.
I simply don't see the potential benefit justifying the extra cost to Google when they can settle for something less complex and less resource-demanding, such as keyword density.
 