Google's AI vs Metadata and Ranking

Came across this interesting article on Hacker News: We were promised Strong AI, but instead we got metadata analysis (calpaterson.com)

I think this subject has been partly discussed in some other threads here, but this article explains it very well.

Basically the argument is that we've been promised 'strong' AI for years now. AI that, in the case of Google, would simply read a page and know exactly what the purpose of that page is and how that page should rank in the SERPs.

However, Google still relies extremely heavily on metadata to this day, i.e. webmasters telling Google very explicitly what a page is about, who wrote it, and so on.
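To make that concrete, here's a rough Python sketch of how much a crawler can pull out of a page purely from what the webmaster declared, without any "understanding" of the body text. The example HTML and field values are made up for illustration:

```python
# A minimal sketch of the "metadata, not understanding" point: everything
# collected below comes from tags the webmaster explicitly declared, not from
# any comprehension of the body text. The example HTML is invented.
from html.parser import HTMLParser


class DeclaredMetadata(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "content" in attrs:
            # <meta name="..."> and <meta property="og:..."> are both
            # the webmaster spelling things out for the crawler.
            key = attrs.get("name") or attrs.get("property")
            if key:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.meta["title"] = data.strip()


html = """
<html><head>
  <title>Best Cast Iron Skillets (2024 Review)</title>
  <meta name="description" content="We tested 12 cast iron skillets...">
  <meta name="author" content="Jane Doe">
  <meta property="og:type" content="article">
</head><body>...2,000 words the parser never has to understand...</body></html>
"""

parser = DeclaredMetadata()
parser.feed(html)
print(parser.meta)
```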

What makes this relevant to this forum is that the author argues this is a key reason why content sites (like many of us here operate) still rank so highly. Content site owners know very well how to feed Google the right metadata so they rank for the correct queries. Many other people do not know this, and so rank very poorly even if their information is inherently more valuable to Google's users.

The promise of AI is that it will solve this problem, and the AI will on its own identify those valuable pages that do not spoon-feed Google structured metadata. The point being that we should be thankful that the AI is nowhere near as powerful yet as it was promised, or we'd probably have a much harder time ranking.
 
The point being that we should be thankful that the AI is nowhere near as powerful yet as it was promised, or we'd probably have a much harder time ranking.
If you have useless or bad content, that would be the case.

For example, if you are just writing fluff above the actual recipe on a recipe page, this coming "stronger A.I." will detect that and drop your site to hell.

But if you are providing actual value, showing usage, the pros and cons of something, and can explain it in an entertaining way, you should win.

That entertaining part is what I believe A.I. will miss, because a technical manual is going to be a lot more informative than a gif/meme story on a subject, but who is going to read that technical manual? There is no entertainment and it will be a snoozer.

ON TOP of that, humans are visual learners. They learn from pictures, images, and video. It's SEOs who are delusional in thinking someone is really going to read a 2,000 word article about a subject with zero images, no videos, and no interactions.

People aren't spending 4-8 hours a day on Instagram, Twitter, TikTok, and social media reading long-form content.


AND they get a ton of information and news from social media, so... clearly there is a disconnect.

If the SEO's primary metric for quality is word count - how does that even make sense? Where is the salesmanship, the entertainment, the "voice" of a brand? If you don't think that matters, well, it's why SEOs can't seem to get it together.

That voice - people want to read certain people's content, just like on this forum, and they would rather not read others.

How is the A.I. going to distinguish between content that has memes/gifs and an entertainment aspect versus a technical manual, WITHOUT some help from metadata showing: "hey, this more entertaining/informative page has better time on site than this bland technical manual version"?

 
The problem is "A.I." (aka machine learning with self-referential algorithms, I've yet to see anything remotely like actual intelligence) learns from inputs and the consequence on the outputs. Even if the algorithms become a blackbox even to the developers, it's still manipulatable because it relies on inputs.

And in this case you can train it to do things it shouldn't be doing. So then they have to code "if" statements basically saying "don't do this or that".

Isaac Asimov wrote about this concept a long time ago, when this was all a dream, with his Three Laws of Robotics:
  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
The normie doesn't realize that they're reading a set of nested if loops.
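Just to make the "nested if loops" point concrete, here's a toy Python sketch of the Three Laws written exactly that way. The Action class and its fields are invented for illustration, not anything Asimov (or Google) actually specified:

```python
# A toy sketch of the Three Laws as a stack of if statements.
# Everything here (the Action class, its fields) is made up for illustration.
from dataclasses import dataclass


@dataclass
class Action:
    harms_human: bool        # would this injure a human, or let one come to harm?
    ordered_by_human: bool   # was this ordered by a human?
    protects_robot: bool     # does this preserve the robot's existence?


def permitted(action: Action) -> bool:
    # First Law: overrides everything.
    if action.harms_human:
        return False
    # Second Law: obey orders (we only get here if no human is harmed).
    if action.ordered_by_human:
        return True
    # Third Law: self-preservation, only when nothing above conflicts.
    if action.protects_robot:
        return True
    return False


print(permitted(Action(harms_human=False, ordered_by_human=True, protects_robot=False)))  # True
print(permitted(Action(harms_human=True, ordered_by_human=True, protects_robot=False)))   # False
```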

My point is that exploitable inputs lead to more if exemptions that are still exploitable within the confines of the intended use, and that's what we're concerned about anyways. We want the algorithms to rank us better, not cook us breakfast.

The Matrix makes a very strong point about this, and I feel that Google already operates like The Architect does in those movies. You CANNOT get rid of exploitable metrics in an algorithm, so what you do is either make it a moving target with enough time delay that you can penalize those exploiting it before the benefit takes place, or you keep the exploitable metrics the same and play whack-a-mole like they did with The One. And they become exceedingly good at it (Penguin, Panda, etc.).

Anyways, that's what Metadata and especially now, Schema, is for. It's to crowdsource the training of the machine learning algorithms until they have enough data that they no longer need Schema. The oldest example would be directories being usurped by algorithms, and then meta keywords being usurped by on-page indicators. A newer example might be Authorship profiles being dropped for Schema plus link signals to a home base for the author (a website, social media profile, etc.)
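For anyone who hasn't hand-rolled it, that "crowdsourced labeling" is literally just the webmaster handing over a structured record. Here's a rough Python sketch, with invented values and the schema.org Article vocabulary, of the kind of JSON-LD blob we feed the crawler:

```python
# A minimal sketch of the "crowdsourced training data" idea: Schema markup is
# just the webmaster labeling the page for the machine. The values here are
# invented for illustration; the vocabulary (Article, author, etc.) is schema.org's.
import json

article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Best Cast Iron Skillets (2024 Review)",
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "url": "https://example.com/about/jane-doe",  # the "home base" link signal
        "sameAs": ["https://twitter.com/janedoe"],     # hypothetical profile
    },
    "datePublished": "2024-03-01",
}

# This is the labeled example handed to the crawler, embedded in the page head
# as <script type="application/ld+json">.
print(json.dumps(article_schema, indent=2))
```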

What happens when they no longer need Schema? Nothing. They still take the same inputs minus the Schema that we trained them to understand. And if they use the vast data they have to tweak things, we just keep crowdsourcing the feeding of new inputs, then one of us mentions it on BuSo, then the bloggers and gurus pretend they figured it out and spread it to the normies, and we're back to square one. Some of us will exploit it and get whack-a-mole'd or they'll start moving the target around with time delays (or both, like they do now).

I'm rambling now, but people who have been in the game long enough have seen metrics, platforms, and all kinds of crap get exploited and work amazingly well, then get hammered into oblivion. Then someone like me will re-test those things 10 years later, once everyone else has forgotten about them, and find out they work amazingly well again, because the target moved.

I think the best assumption to make about Google isn't that they want to provide quality results or answers. They do, but they'll compromise on that for cash flow, so that can't be the main motivator. The main motivator, if you ask me, is simply to not be embarrassed on the public stage where the normies begin to understand Google like we do: it's simply a series of inputs that affect outputs, and no matter how opaque the algorithms are and how many nested if loops are in place, it is we, the webmasters, who are in control.

Without us creating things to rank and creating inputs to evaluate and creating things for consumers to consume, there's nothing for them. And once we realize we control those things, we understand that we're the hive mind and the algorithm and A.I. is only a derivative of what we choose.

A part of me started to compare this whole thing to lucid dreaming, but I realize now that what I'm typing is only a derivative of what the coffee is inspiring :D
 
"Indexes in disguise with extra steps" describes most web tech buzzwords.

The real competitive advantage is being able to do indexing efficiently and having a data set worth indexing.

This is why supporting the Common Crawl and other initiatives is incredibly important.
Lack of open access data sets slows everyone and everything down.
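For anyone who wants to poke at it, here's a rough sketch of hitting Common Crawl's public CDX index from Python. The collection id is an assumption (check index.commoncrawl.org for the current list), and the rest is stdlib:

```python
# A small sketch of why open crawl data matters: querying Common Crawl's CDX
# index for captures of a domain. The collection id below is an assumption --
# see https://index.commoncrawl.org/ for the current list.
import json
import urllib.parse
import urllib.request

COLLECTION = "CC-MAIN-2024-10"  # assumed/example collection id
API = f"https://index.commoncrawl.org/{COLLECTION}-index"

params = urllib.parse.urlencode({"url": "example.com/*", "output": "json"})
with urllib.request.urlopen(f"{API}?{params}") as resp:
    # The index returns one JSON object per line: captured URL, timestamp,
    # and the WARC file/offset so you can fetch the raw record yourself.
    for line in resp.read().decode().splitlines()[:5]:
        record = json.loads(line)
        print(record.get("timestamp"), record.get("url"))
```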
 
It's interesting but an AI will run into the same problems as a human when it attempts to judge value, because value is subjective.

It is easy to filter out content that is high on fluff and low on contextual value, but like ccarter says, how do you judge 500 words of compressed and engaging content vs 3000 words of semi-academic mumbo jumbo?
 
How is the A.I. going to distinguish between content that has memes/gifs and an entertainment aspect versus a technical manual, WITHOUT some help from metadata showing: "hey, this more entertaining/informative page has better time on site than this bland technical manual version"?

It is easy to filter out content that is high on fluff and low on contextual value, but like ccarter says, how do you judge 500 words of compressed and engaging content vs 3000 words of semi-academic mumbo jumbo?
Definitely true. I think this is actually an argument that fits well with the article. People think Google has this amazing AI that really 'understands' content (after all, Google keeps saying you should just keep focusing on great content and the ranking will come automagically). But really, Google is just looking at metadata like time on page, click-through rates, etc. to figure out whether something is 'engaging' content or not.

A really good 'proper' AI should be able to deal with those challenges you guys mention, and be able to understand that a short piece with a lot of visual stuff going on might be more useful and engaging. But it still can't. So Google just uses some really 'dumb' data like time on page.
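Here's a back-of-the-envelope sketch of what that 'dumb' behavioral metadata looks like as a ranking proxy. The field names, numbers, and weighting below are all invented for illustration:

```python
# A rough sketch of ranking by behavioral metadata (dwell time, click-through
# rate) instead of by understanding the content. All values and the weighting
# scheme are invented for illustration.
pages = [
    {"url": "/short-visual-guide", "avg_dwell_seconds": 210, "serp_ctr": 0.34},
    {"url": "/3000-word-manual",   "avg_dwell_seconds": 95,  "serp_ctr": 0.12},
]

def engagement_proxy(page):
    # Normalize each signal roughly into 0..1 and average them -- no notion
    # of what the page actually says, just how users behaved around it.
    dwell = min(page["avg_dwell_seconds"] / 300, 1.0)
    ctr = min(page["serp_ctr"] / 0.5, 1.0)
    return (dwell + ctr) / 2

for page in sorted(pages, key=engagement_proxy, reverse=True):
    print(f"{page['url']}: {engagement_proxy(page):.2f}")
```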

Anyways, that's what Metadata and especially now, Schema, is for. It's to crowdsource the training of the machine learning algorithms until they have enough data that they no longer need Schema. The oldest example would be directories being usurped by algorithms, and then meta keywords being usurped by on-page indicators. A newer example might be Authorship profiles being dropped for Schema plus link signals to a home base for the author (a website, social media profile, etc.)
It's interesting, because that's kind of the opposite of the argument from the article. He argues that people use AI/ML to figure out something useful (like teaching a Tesla to navigate a McDonald's drive-thru), which then gets replaced by metadata because it's way more stable and predictable (McDonald's generating XML files for drive-thru routes for all their restaurants).

But your point is a good example of it going the other way. But as you said, the AI/Algorithm is trained based on the original metadata, and sort of tries to emulate the result, but just in a fuzzy way so that we have to guess what it measures exactly. In that way it is still relatively 'dumb' and not what people think of when they hear the buzzword AI.
 