How to Use Log Files to Find 404 URLs Googlebot Is Hitting?

JasonSc

I just started analyzing some log files and noticed Googlebot is getting some 404 errors. I ran a Screaming Frog crawl and everything comes back as 200.

How do I find the referring page that is causing the 404s if it's not showing up in Screaming Frog?

I'm also assuming the bot could be following an external link to the 404 page.
 

The logs should tell you the referring page: Apache and NGINX access logs record it for every request in their standard "combined" format, and both servers can also write failed requests to error.log. (On Ubuntu Linux, look in /var/log/apache2 or /var/log/nginx.)
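If you want to look further back than today's file, the rotated copies sit in the same directory. Here is a minimal Python sketch that reads the current log plus its rotated copies, oldest first. It assumes logrotate-style names (`access.log`, `access.log.1`, `access.log.2.gz`, ...), which is typical of a default Ubuntu install; `iter_log_lines` is a hypothetical helper, not part of Apache or NGINX.

```python
import glob
import gzip
import os
from typing import Iterator


def iter_log_lines(log_dir: str, base: str = "access.log") -> Iterator[str]:
    """Yield lines from a log and its rotated copies, oldest file first.

    Assumes logrotate-style names: access.log, access.log.1, access.log.2.gz, ...
    """

    def rotation_index(path: str) -> int:
        # "access.log" -> 0, "access.log.1" -> 1, "access.log.2.gz" -> 2
        suffix = os.path.basename(path)[len(base):].lstrip(".")
        if suffix.endswith(".gz"):
            suffix = suffix[:-3]
        return int(suffix) if suffix.isdigit() else 0

    pattern = os.path.join(glob.escape(log_dir), base)
    paths = glob.glob(pattern) + glob.glob(pattern + ".*")
    # The highest rotation number is the oldest file, so read it first.
    for path in sorted(paths, key=rotation_index, reverse=True):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            for line in fh:
                yield line.rstrip("\n")
```

For example, `for line in iter_log_lines("/var/log/nginx"):` walks every access log NGINX has kept, and passing `base="error.log"` does the same for the error logs.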

Example Apache log:
192.168.255.50 - - [18/Oct/2019:14:24:26 -0400] "GET /blog/tutorials/ORM-Like-A-Pro/ HTTP/1.1" 200 15573 "https://www.serpwoo.com/blog/tutorials/ORM-with-SERPWoo/" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
192.168.255.50 - - [18/Oct/2019:13:52:03 -0400] "GET / HTTP/1.1" 200 11009 "https://www.quora.com/How-do-I-find-most-searched-words-in-YouTube" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"

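In each line, the quoted request ("GET /path HTTP/1.1") is the page that was requested, and the quoted URL right after the status code and response size is the referer, i.e. where the visitor came from.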
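To pull those fields out programmatically, a regular expression for the combined log format is enough. This is a minimal sketch assuming the standard combined layout (host, ident, user, time, request, status, bytes, referer, user agent); `COMBINED` and `parse_line` are illustrative names, and a custom `LogFormat` or `log_format` would need a different pattern.

```python
import re
from typing import Optional

# Combined format: host ident user [time] "request" status bytes "referer" "user-agent"
# Quoted fields allow backslash-escaped characters, as Apache escapes embedded quotes.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Split one combined-format line into named fields, or return None."""
    m = COMBINED.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # The request looks like "GET /path HTTP/1.1"; keep just the path.
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else ""
    return fields
```

On the first example line, `fields["path"]` is `/blog/tutorials/ORM-Like-A-Pro/` and `fields["referer"]` is the SERPWoo blog post YandexBot came from.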

The first entry is YandexBot following a link from another SERPWoo blog post and getting a 200 HTTP status.

The second entry is a visitor who came from a specific Quora question and landed on the SERPWoo.com home page. The referer shows the exact link they came from.
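To answer the original question directly, you can filter for 404 responses whose user agent mentions Googlebot and group them by requested path and referer. This is a minimal Python sketch, assuming the combined log format with no escaped quotes inside fields; `googlebot_404s` is a hypothetical name. A referer of "-" means the request carried no Referer header, so the log alone can't tell you where the link lives; in that case an external link, or a URL Google already had on file, is a reasonable suspect.

```python
from collections import Counter
from typing import Iterable


def googlebot_404s(lines: Iterable[str]) -> Counter:
    """Count (path, referer) pairs for 404s whose user agent mentions Googlebot."""
    hits: Counter = Counter()
    for line in lines:
        # Combined format has three quoted fields: request, referer, user agent.
        # Splitting on '"' puts them at indexes 1, 3 and 5 (assumes no escaped quotes).
        parts = line.split('"')
        if len(parts) < 7:
            continue  # not a combined-format line
        request, status_bytes, referer, agent = parts[1], parts[2], parts[3], parts[5]
        fields = status_bytes.split()
        status = fields[0] if fields else ""
        if status != "404" or "Googlebot" not in agent:
            continue
        req = request.split()
        path = req[1] if len(req) >= 2 else request
        # A referer of "-" means the client sent no Referer header.
        hits[(path, referer)] += 1
    return hits
```

For example, `googlebot_404s(open("/var/log/nginx/access.log")).most_common(20)` lists the most frequently hit missing URLs. The check only matches the user-agent string, so anything spoofing Googlebot's user agent is counted too.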
 
@CCarter Thank you... total mind fart on my part. I hadn't looked at log files in a long time, which isn't a good idea. I forgot about all the good information you can glean from them.
 
The Redirection plugin for WordPress might work. It's what I use to find 404s.
 