Website Analysis – 404 HTTP Error

Yes, it’s time to address those dreaded 404 not found errors. A 404 is returned whenever somebody follows a link to a page that doesn’t exist. For example, somebody links to your about-us.html page but a year later you decide that “About Us” pages are so last year and remove it. Now the link points to… nothing. Anyone who follows it gets a 404 not found error.

But what do we care? Well, links are very important for helping our pages rank highly in the search engines and also for funnelling visitors to our site. If we take care of our 404s we can:

  • divert the otherwise wasted SEO value of the links to real pages on our site and help them rank highly.
  • deliver visitors to a meaningful and engaging page on our site instead of a generic and unhelpful 404 not found page.

So, how do we detect 404s and how do we fix them?

Detecting 404 HTTP Errors

Enter Google Webmaster Tools. GWT is easy to set up and provides useful link information about your site (among other things). If you have GWT installed, from the dashboard click Not Found in the Crawl errors section. Doing this presents a list of URLs on your site that don’t exist but that have links to them.

Let’s pretend you have a million 404s and only a limited time to sort them out. Which ones do you fix first? Fortunately for us, there is a handy column to the right called Linked From that tells us how many inbound links each missing page has. Intuitively we know we should fix the pages that have the most links first, as then we will be gaining the most SEO value from those links, and we will also be making the most potential visitors happy. Unfortunately, you can’t re-sort this list by descending Linked From value within GWT itself. Luckily, you can “download this table” as a CSV, open it in Excel and then sort the list by descending Linked From value. Phew.
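If Excel isn’t to hand, the same sort can be done with a few lines of Python. This is only a rough sketch: the “Linked From” column name comes from the GWT screen described above, but the “URL” column name and the sample rows are my own assumptions, so check them against the header row of your actual download.

```python
import csv
import io


def sort_by_linked_from(csv_text, count_column="Linked From"):
    """Return the rows of a crawl-errors export, most-linked pages first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    def link_count(row):
        # Treat blank or non-numeric cells as zero inbound links.
        value = (row.get(count_column) or "").replace(",", "").strip()
        return int(value) if value.isdigit() else 0

    return sorted(rows, key=link_count, reverse=True)


# Made-up sample data; the "URL" header is an assumed column name.
sample = """URL,Linked From
/about-us.html,12
/old.htm,3
/monkey-training.html,47
"""

for row in sort_by_linked_from(sample):
    print(row["Linked From"], row["URL"])
```

For a real export, read the downloaded file into a string and pass that in instead of the sample.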


Fixing 404 HTTP Errors

I ain’t technical, but I know I can fix a 404 error with a 301 redirect in my .htaccess file. If these things mean nothing to you, do not despair. There is an upcoming article due at any moment that explains what these arcane terms mean.

Fixing 404 HTTP Errors With A 301 Redirect

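Download the .htaccess file from your server so you know you’re working on the most up-to-date (or at least the “live”) version. Open it in Notepad and paste in a line like the following, where /old.htm is the missing page and the URL after it is wherever you want visitors to end up (http://www.example.com/new.htm is just a placeholder):

Redirect 301 /old.htm http://www.example.com/new.htm

Beware the initial forward slash on the old path. I’ve missed that off a few times and the sky fell on my head each time.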
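If you have more than a handful of redirects to write, a small script can build the lines for you and catch a missing slash before the sky falls. A Redirect line consists of the status, the old path and then the destination. This is a minimal sketch, not a finished tool: the function name, the paths and the example.com URLs are all made up for illustration.

```python
def redirect_line(old_path, new_url):
    """Build one 'Redirect 301' line for an .htaccess file."""
    # The old path must begin with a forward slash.
    if not old_path.startswith("/"):
        raise ValueError(f"old path must start with '/': {old_path!r}")
    return f"Redirect 301 {old_path} {new_url}"


# Hypothetical mapping of missing pages to their replacements.
redirects = {
    "/old.htm": "http://www.example.com/new.htm",
    "/about-us.html": "http://www.example.com/",
}

lines = [redirect_line(old, new) for old, new in redirects.items()]
print("\n".join(lines))
```

Paste the printed lines into your .htaccess file. The sketch doesn’t handle paths containing spaces, so deal with those by hand.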

Now you have the tools to redirect missing URLs, but where do you redirect them to? That is the 64 million dollar question. You have four options:

  • Create a new page whose URL exactly matches the 404 and, hey presto, you don’t even need a redirect. The link will simply point to that new page.
  • Create a new version of the missing page and redirect the old URL to that.
  • Redirect the missing URL to the homepage.
  • Redirect the missing URL to the best matching page. For example, the missing page might be about monkey training, but you actually have a page about dog training – what the hell, it’s nearly the same thing. Redirect monkeys to dogs. This reminds me of the time I transplanted a monkey’s brain into a dog. Man, that was crazy. Redirecting pages might not be as much fun as transplanting brains, but it has more influence on the search engines. Unless it’s Matt Cutts’s brain we’re talking about…
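For the “best matching page” option, a script can at least suggest candidates when you have a long list to get through. The sketch below uses Python’s standard difflib module to pick the live URL whose name looks most like the missing one, and falls back to the homepage if nothing is close. It only compares the text of the URLs, not what the pages are about, so treat its suggestions as a starting point and check each one by eye. The function name and the page names are invented for the example.

```python
import difflib


def suggest_target(missing_path, live_paths, fallback="/"):
    """Suggest the live path that most resembles the missing one."""
    # cutoff=0.6 is difflib's default similarity threshold.
    matches = difflib.get_close_matches(missing_path, live_paths, n=1, cutoff=0.6)
    return matches[0] if matches else fallback


# Hypothetical list of pages that really exist on the site.
live_pages = ["/dog-training.html", "/contact.html"]

print(suggest_target("/monkey-training.html", live_pages))
print(suggest_target("/zzz", live_pages))
```

With this sample list, the monkey page is matched to the dog page because the URLs share most of their characters, and the unrecognisable /zzz falls back to the homepage.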


Buying New Sites And 404s

When you buy an existing site, the chances are that you’ll have to sort out some 404s somewhere along the line. The site I’m currently analysing has a mere 6 pages missing – but it’s early days yet. It could be that Google simply hasn’t found any others yet. One nightmare of a site I bought last year had around 100 404s I had to redirect.