Remember the good ol’ days back before dymanic websites where pages had .html extensions and when you tried to access a page that didn’t exist you got an ugly, yet reassuring 404 Not found page? The significance of this page is actually pretty important – not only does it tell the user that the page is not found but it returns a special HTTP status that tells web spiders the same thing. As web developers, sometimes we forget that humans aren’t the only ones accessing our pages, and as a result don’t use the correct HTTP response codes to denote what is going on.
What the hell is a HTTP response code?
When your web browser makes a request to a web server, the web server will return a status code as well as the web page, which tells your browser what has happened. This response is usually made up of two parts: a number (which is for any spiders or bots that might be accessing the web site) and a string (which is for humans) and back when everything was static the web server took care of everything.
Unfortunately for web developers, in this environment of database driven web sites, we often don’t have the luxury of letting the server take care of everything, so this article aims to show you that it isn’t that difficult to do HTTP responses correctly. As I will show later, this can adversely affect your ranking in your favourite search engine.
Let’s try it out
Firstly, lets see what happens when you actually make a request – you can see what is going on using another old school application: Telnet.
Open up command line or Terminal.app or terminal depending on you flavour and type the following:
telnet madpilot.com.au 80
You will get a prompt and type the following (Windows users might not see anything as the telnet client won’t echo what you type):
HEAD / HTTP/1.1
Host: madpilot.com.au
…and hit enter twice – you should see something like:
HTTP/1.1 200 OK
Server: Mongrel 1.0.1
Status: 200 OK
Cache-Control: no-cache
Content-Type: text/html; charset=utf-8
Content-Length: 4123
The first line of the response is the important bit – it tells the web browser that the response conforms to HTTP version 1.1 and more importantly the response code is 200 and the response type is OK! The 200 type is the most common response you will come across, it means that the page was found and served up correctly. Generally your web server WILL take care of this one for you. Let’s look at how you can change that status code.
I’ll use PHP as an example, because it is still the most common dynamic language – but you can do this in any language, leave a comment if you would like an example of how to do it in another dialect. It is all very simple – BEFORE you output any HTML, call the header() function as such:
header("HTTP/1.1 404 Page Not Found");
As you have probably guessed, this will tell the browser that the page it requested was not found. Why would you want to do that? Obviously if the script is being run, it has been found? Well, yes – that is true, however, when the HTTP specification was written, CMSs and dynamic product catalogues weren’t even thought about – so we need to think a little bit differently.
Let’s look at an example: Your customer has requested to see the details for product #10 by browsing to
http://www.yourcomany.com/products/view/10
Product #10 exists, so we serve up the details, but what if the customer decides to see what product #20 is? If product 20 doesn’t exist, then what should we do? One option is to print out “Product not found” which is fine for the user, but what happens if your favourite search engine tries to hit product #20? If you maintain the default action the search engine will receive a 200 OK status which makes it think that the product exists and it will index it! This just pollutes the search engines’ index and hurts your ranking, which is bad. So what we can do is serve up the 404 header from above and this let’s both the user AND the search engine know that what they have requested doesn’t exist.
So what other response codes can we use? There is a complete list here, but I’ll run through a couple of the common ones:
301 Moved Permanently: This response code means the resource that has been requested USED to live here but has now moved somewhere else and will never return. Returning this status code is extremely important if you are changing the structure of your website, as you can tell the browser where it needs to go to get the resource. More importantly, it also tells your search engine to update it’s index with the new URL. You need to supply the new URL as part of the request, so it looks something like this:
header("HTTP:1/1 301 http://www.yoursite.com/new-url");
302 Found: The 302 is actually generally used incorrectly. The most common use is to redirect a user to another page TEMPORARILY which is actually what the 303 code is for. unfortunately, not all browsers support 303 and actually expect a 302 in this case. So who are we to argue? If you are a PHP developer, you have probably used
header("Location http://www.yoursite.com/somewhere");
before – this is exactly what this does.
403 Forbidden: If you wanted to really play HTTP right, you would return this code every time someone tried to access a private URL when they weren’t logged in. It means the server knows what you are trying to do, but isn’t going to let you do it.
404 Page not found: This has been covered – basically if the resource the agent wants doesn’t exist, you serve this up.
410 Gone: This page you are looking for used to exist, but it doesn’t anymore. In fact there isn’t even a new URL, so if you are a search engine, just forget about it. Whether search engines listen to this, I’m not sure, but it can’t hurt.
500 Server Error: Something went wrong with the server. I would throw this up if there is an error that is stopping the page from loading, such as a missing database or a broken web service or similar.
Don’t forget that you can also server up content to the browser (in fact, if you don’t humans will just get a blank page), so it is recommended that you serve up a nice friendly message to your visitors explaining what happened.
So there you go – now there is no excuse for serving up errors to your users and forgetting about our automated friends. So when you are writing your next kick-arse web app, spare a thought for the visitors that aren’t so good at parsing human talk.