Catching 404 Errors

CCarter

In my never-ending quest for perfection, I created a script that helps me catch 404 errors quickly. It reads the two most recent Apache access logs and sends me a list of bad requests once a day (via a cron job at 8:30 AM).

It'll catch one-off stuff here and there, but it will also help you spot recurring problems that need addressing.
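To act on that distinction, the tallied `$missingPages` array (path => hit count) built by the script below could be split by a daily hit threshold. This is a minimal sketch: `splitByThreshold()` is a hypothetical helper, and the cutoff of 5 hits is an arbitrary assumption you'd tune to your own traffic.

```php
<?php

// Hypothetical helper: splits a path => hit-count map into recurring
// problems (at or above $threshold hits) and one-off misses (below it).
function splitByThreshold(array $missingPages, int $threshold = 5): array
{
    $recurring = [];
    $oneOff = [];
    foreach ($missingPages as $page => $count) {
        if ($count >= $threshold) {
            $recurring[$page] = $count;
        } else {
            $oneOff[$page] = $count;
        }
    }
    // Most-hit pages first, mirroring the arsort() in the main script.
    arsort($recurring);
    arsort($oneOff);
    return ['recurring' => $recurring, 'oneOff' => $oneOff];
}

// Example input with made-up counts, for illustration only.
$groups = splitByThreshold(['/help/' => 11, '/old-page.html' => 1, '/blog/feed/' => 7]);
print_r($groups);
```

The recurring group is the one worth fixing or redirecting first; the one-off group is mostly typos and bot noise.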

Some of these simple fixes, especially on large sites, can translate into recovering revenue lost to users hitting bad pages. Here is the PHP version of the code:

404-error.php:

Code:
<?php

// Run once a day from cron, e.g. at 8:30 AM (adjust the php and script paths):
// 30 8 * * * /usr/bin/php /path/to/404-error.php

// Current and most recently rotated Apache access logs.
$files = ["/var/log/apache2/access.log", "/var/log/apache2/access.log.1"];
$missingPages = [];

foreach ($files as $file) {
    $handle = is_readable($file) ? fopen($file, 'r') : false;
    if ($handle === false) {
        echo "Cannot read file: $file\n";
        continue;
    }

    // Stream line by line so large logs aren't loaded into memory at once.
    while (($line = fgets($handle)) !== false) {
        // Quoted request line immediately followed by a 404 status code,
        // for any method (GET, HEAD, POST, ...), not just GET.
        if (preg_match('/"[A-Z]+ (\S+) HTTP\/[0-9.]+" 404 /', $line, $matches)) {
            $page = $matches[1];
            $missingPages[$page] = ($missingPages[$page] ?? 0) + 1;
        }
    }
    fclose($handle);
}

// Most-requested missing pages first.
arsort($missingPages);

// You can email this to yourself once a day in the morning, or post it to a Slack channel or another communication channel to monitor.

function sendEmailReport(array $missingPages): void {
    $to = 'myemail@compuserve.com';
    $subject = '404 Error Report';
    $message = "404 Error Pages Report:\n\n";
    foreach ($missingPages as $page => $count) {
        $message .= $page . " - Hits: " . $count . "\n";
    }
    $headers = 'From: main_site@example.com' . "\r\n" .
               'X-Mailer: PHP/' . phpversion();

    // mail() hands the message to the local mailer (sendmail_path);
    // true only means it was accepted for delivery, not that it arrived.
    if (mail($to, $subject, $message, $headers)) {
        echo "Email sent successfully to $to\n";
    } else {
        echo "Failed to send email.\n";
    }
}

// Generate and send the report
sendEmailReport($missingPages);

results:

[Screenshot: the emailed 404 Error Report]

It tells me there were 11 requests (all from me) to /help/, which doesn't exist. If this were a live site, I'd use the following command to figure out what's calling these pages:

grep '" 404 ' /var/log/apache2/access.log | grep "/help/"

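That should help narrow down your hunt. If your access log uses Apache's combined format (the Debian/Ubuntu default), the quoted field right after the status code and response size is the Referer, so each matching line shows which page (if any) linked to the missing URL.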
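Rather than eyeballing grep output on a busy site, the Referer field can be tallied directly. This is a sketch under stated assumptions: `tallyReferrers()` is a hypothetical helper, and it assumes Apache's combined log format (request line, status, size, then the quoted Referer).

```php
<?php

// Hypothetical helper: for 404 responses to $path, count which referrers
// sent visitors there. Assumes Apache's "combined" log format.
function tallyReferrers(iterable $lines, string $path): array
{
    // Captures: 1 = requested path, 2 = status code, 3 = Referer.
    $pattern = '/"[A-Z]+ (\S+) [^"]*" (\d{3}) \S+ "([^"]*)"/';
    $referrers = [];
    foreach ($lines as $line) {
        if (!preg_match($pattern, $line, $m)) {
            continue; // not a combined-format line
        }
        [, $requested, $status, $referrer] = $m;
        if ($status !== '404' || $requested !== $path) {
            continue;
        }
        // Apache logs "-" when the client sent no Referer header.
        $referrers[$referrer] = ($referrers[$referrer] ?? 0) + 1;
    }
    arsort($referrers);
    return $referrers;
}

// Usage against a live log (SplFileObject iterates line by line):
// print_r(tallyReferrers(new SplFileObject('/var/log/apache2/access.log'), '/help/'));
```

Internal referrers point at broken links you can fix yourself; external ones are candidates for a 301 redirect.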

Also, instead of emailing the report to yourself, you can have it post to a Slack channel such as #404-errors so your whole team can see problems as they come in.
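A minimal sketch of the Slack route, assuming you've set up a Slack incoming webhook, which accepts a JSON body of the form `{"text": "..."}`. The helper names `buildSlackPayload()` and `postToSlack()` are hypothetical, and the webhook URL below is a placeholder; Slack generates the real one. It uses PHP's HTTP stream wrapper, so it needs `allow_url_fopen` enabled but not the cURL extension.

```php
<?php

// Builds the same text as sendEmailReport(), wrapped in the JSON body
// that Slack incoming webhooks accept.
function buildSlackPayload(array $missingPages): string
{
    $text = "404 Error Pages Report:\n\n";
    foreach ($missingPages as $page => $count) {
        $text .= $page . " - Hits: " . $count . "\n";
    }
    return json_encode(
        ['text' => $text],
        JSON_UNESCAPED_SLASHES | JSON_INVALID_UTF8_SUBSTITUTE | JSON_THROW_ON_ERROR
    );
}

// POSTs the payload; returns false if the request fails or the
// server answers with an error status (4xx/5xx).
function postToSlack(string $webhookUrl, string $payload): bool
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/json\r\n",
            'content' => $payload,
            'timeout' => 10,
        ],
    ]);
    return file_get_contents($webhookUrl, false, $context) !== false;
}

// Placeholder URL: replace with the webhook Slack gives you.
// postToSlack('https://hooks.slack.com/services/XXX/YYY/ZZZ', buildSlackPayload($missingPages));
```

Swapping the final `sendEmailReport($missingPages);` call for the `postToSlack()` line is all the main script would need.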

From the trenches,
CC