John Mueller says "There is no such thing as LSI keywords"

Ryuzaki

This warrants a discussion:

LSI Keywords

On Twitter, Google's John Mueller has said explicitly:

"There's no such thing as LSI keywords -- anyone who's telling you otherwise is mistaken, sorry."
LSI / LSA is Latent Semantic Indexing / Analysis. It's an NLP (natural language processing) technique that ties together words and phrases with the concepts they're concerned with.
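
To make "ties words to concepts" concrete, here's a minimal sketch of LSA on four toy documents. Everything in it is an assumption for illustration: the documents, the helper names (`top_singular_directions`, `latent_doc_vectors`), and the choice of a plain subspace iteration in place of a library SVD. It says nothing about what any search engine actually runs.

```python
import math

# Toy corpus (made up): "car"/"automobile" never co-occur, but both
# appear next to "engine"; same for "cat"/"feline" and "jungle".
docs = [
    "car engine",
    "automobile engine",
    "cat jungle",
    "feline jungle",
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

# Raw term counts per document (one row of the document-term matrix each).
vocab = sorted({w for d in docs for w in d.split()})
doc_vecs = [[d.split().count(w) for w in vocab] for d in docs]

# Gram matrix A^T A: its top eigenvectors are the top right singular
# vectors of the term-document matrix A.
gram = [[dot(a, b) for b in doc_vecs] for a in doc_vecs]

def top_singular_directions(m, k, iters=100):
    """Subspace iteration with Gram-Schmidt; returns k orthonormal vectors."""
    n = len(m)
    # Deterministic, linearly independent starting vectors.
    basis = [[float((i + 1) ** (j + 1)) for i in range(n)] for j in range(k)]
    for _ in range(iters):
        ortho = []
        for v in basis:
            v = [dot(row, v) for row in m]          # multiply by the Gram matrix
            for u in ortho:                          # remove earlier directions
                c = dot(u, v)
                v = [x - c * y for x, y in zip(v, u)]
            norm = math.sqrt(dot(v, v))
            ortho.append([x / norm for x in v])
        basis = ortho
    return basis

def latent_doc_vectors(m, k):
    """Document coordinates in the rank-k latent space (sigma * v)."""
    dirs = top_singular_directions(m, k)
    sigmas = [math.sqrt(dot(v, [dot(row, v) for row in m])) for v in dirs]
    return [[s * v[j] for s, v in zip(sigmas, dirs)] for j in range(len(m))]

latent = latent_doc_vectors(gram, k=2)
raw_sim = cosine(doc_vecs[0], doc_vecs[1])      # "car engine" vs "automobile engine"
latent_sim = cosine(latent[0], latent[1])       # same pair, in the latent space
cross_sim = cosine(latent[0], latent[2])        # "car engine" vs "cat jungle"
print(raw_sim, latent_sim, cross_sim)
```

On this toy data the two car documents only half match on raw word overlap, but land on essentially the same point once squeezed into two latent "concept" dimensions, while the car and cat documents stay unrelated. That is the whole trick: synonyms get pulled together because they keep the same company.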

The idea has always been that Google probably uses it. John is saying NO. But they probably did use it. I'd venture an educated guess at what they use now.

Google never owned the patent on LSI (it came out of Bellcore in the late 1980s), but it's out there. It brings together synonyms and related phrases tied to one concept. Does Google do something like this? Yes, obviously. But now they have Hummingbird and other techniques.

So the questions become:
  • So is LSI useless?
  • How was data ever gathered on LSI terms so SEOs could access it?
  • Did we ever have access to Google's database on this?
  • Was it ever an actual "thing" or did it simply represent using related terms on-page?
  • Did you ever need to do more than simply talk about a topic in depth to end up including LSI terms?
I'd say no, it's not useless. There was never a good way to gather terms (other than what I mention below). No, we never had access to what Google used. No, nobody ever really used LSI as anything but an explanatory idea for content depth. And no, you never needed to do more than just talk about the topic in depth.

Let's be explicit and look at the context. John Mueller says there is no such thing as LSI keywords, period. He's speaking in the present tense, about right now. I'm not trying to say they never existed within Google. I don't care and am not invested in defending that.

NERD Keywords

What I do want to tell you about is how they probably do it now, using what's called NERD (part of an upcoming Crash Course day), a part of NLP. NERD is Named Entity Recognition and Disambiguation. The classic example everyone uses is: how does an algorithm know if you're talking about a Jaguar car or a jaguar cat? From the other terms used on the page, of course, plus the context of the backlinks and the anchor texts used.

A huge part of this is Named Entity Linking (NEL), literally how these named entities are linked together contextually. (You can think of it as hyperlinks, but it's really the interrelatedness of named entities in general.) Are we talking about Paris, France or Paris Hilton?
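
A rough sketch of the disambiguation idea, assuming a tiny hand-made lookup table. The `CANDIDATES` dictionary, its context terms, and the `disambiguate` helper are all hypothetical; real entity linkers work from large knowledge bases and much richer signals than raw word overlap.

```python
import re

# Hypothetical toy knowledge base: each ambiguous mention maps to candidate
# entities, and each candidate carries terms that tend to appear around it.
CANDIDATES = {
    "jaguar": {
        "Jaguar (car)": {"car", "engine", "sedan", "dealer", "horsepower", "luxury"},
        "Jaguar (cat)": {"cat", "jungle", "predator", "prey", "rainforest", "species"},
    },
    "paris": {
        "Paris, France": {"france", "city", "seine", "eiffel", "louvre", "capital"},
        "Paris Hilton": {"hilton", "celebrity", "heiress", "hotel", "reality", "tv"},
    },
}

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def disambiguate(mention, page_text, anchor_texts=()):
    """Pick the candidate whose context terms overlap most with the page
    text plus any inbound anchor texts. Ties go to the first candidate."""
    context = tokenize(page_text)
    for anchor in anchor_texts:
        context |= tokenize(anchor)
    scores = {
        entity: len(terms & context)
        for entity, terms in CANDIDATES[mention.lower()].items()
    }
    best = max(scores, key=scores.get)
    return best, scores

cat_page = "The jaguar is an apex predator of the rainforest and stalks its prey at night."
car_page = "New Jaguar models for sale"

print(disambiguate("Jaguar", cat_page))
print(disambiguate("Jaguar", car_page, anchor_texts=["luxury car dealer"]))
print(disambiguate("Paris", "Paris unveiled a new hotel line on her reality TV show"))
```

Note the second call: the page text alone gives the sketch nothing to go on, and it's the anchor text that tips it toward the car. That's the same point as above about backlink context doing part of the work.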

I can't drop all the bombs and secrets, but they're out there. I'm sworn to secrecy by a guy who shared his findings with me, so I can only point the way. Wikipedia has tons of info on this, for good reason.

If you want to get a start on driving relevance home, you want to look into NERD, since "LSI is dead." The current king of this is Wikimedia. The way Wikipedia interlinks pages, has the See Also section, etc. Huge hints. Wikimedia even has a publicly offered database.

Tangent Over

That was a giant tangent I didn't mean to get into but I hope it helps spark some discussion. LSI is "over the target" for sure, but if they aren't using that specifically, what are they using? It's some form of NLP stuff, similar to what TF*IDF tries to turn up. What do you know? What will you share?
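
For anyone who hasn't looked under the hood of TF*IDF, here's a minimal sketch of the plain textbook variant (term frequency times log inverse document frequency). The three toy documents and the `tf_idf` helper are made up for illustration; real tools use various smoothed or normalized versions, so treat the exact formula as an assumption.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = count of the term in the doc / number of words in the doc
    idf = log(number of docs / number of docs containing the term)
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        counts = Counter(toks)
        scores.append({
            term: (count / len(toks)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy corpus (made up): "jaguar" is on every page, so it tells you nothing.
docs = [
    "jaguar car engine speed",
    "jaguar cat jungle prey",
    "jaguar car dealer price",
]
scores = tf_idf(docs)
for doc, doc_scores in zip(docs, scores):
    ranked = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
    print(doc, "->", ranked)
```

The term every page shares scores zero, and the terms that set a page apart float to the top. That's the "what else should be on this page" signal people were really chasing when they talked about LSI terms.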

See Also: (get it? :D)
 
It does read like semantics...

Wikipedia usually has a page where they let you distinguish between ambiguous terms.
Apart from that, the interlinking obviously provides a LOT of context.

From what I have read, links carry more weight when they come from websites in the same or related niches, which only strengthens what you said.

Also, new crash course day? Awesome!
 
"The way Wikipedia interlinks pages, has the See Also section, etc. Huge hints. Wikimedia even has a publicly offered database."
The people over at Wikipedia are masters of internal link building, including their usage of Portals and treatment of external links. I highly recommend that any content-first/blog-first SEO do a deep dive and see how they can adapt what they find to their own projects.

I've always focused on Knowledge Graphs as part of my on-page efforts, not on LSI. Some might call that semantics, tomato tomatoe potato potatoe type shit, to which I say: KGs in the broadest sense of the word can include shit like Schema.org, Open Graph, hreflang / lang= settings, hyperlinks and surrounding text (put simply), and even damn image recognition. Linkable data in a continuously updating "database of databases".

- End of rant -
 