
Yep, Software Engineering is dead

You know that feeling you get when something you’ve been taught to believe in gets discredited, and because your belief was tenuous at best, the walls come tumbling down around you and a huge weight is lifted off your shoulders?

Pascal just posted this on the 220 mailing list. Amen. It’s something that I’m pretty sure I’ve been articulating for a long time. Whenever someone has asked me why software is hard, I always use this analogy:

If you ask a Civil Engineer to build you a bridge, it is easy to spec out. You know how far the bridge has to span and what sort of foundations you need, and as a result you can make a recommendation about what sort of bridge is required. The Engineer can build you a little model – you can look at the model and say “Yes! That is a bridge. That will do nicely”. They can mathematically model the bridge to make sure it doesn’t fall down. They build the bridge, and if it allows things to cross from one bank to the other, you have a success.

Unless you are building “Hello World”, a Software Engineer’s life isn’t so simple. You have different platforms, users, stakeholders and contexts – it gets exponentially harder with every feature that gets added. I once did a unit at Uni called Formal Methods, which tried to mathematically model software. It was stupid. The code we modelled was, like, nine lines long, and required a 32-page proof (I didn’t even get close). Stupid.

Of course, academics have been trying to shoehorn software into engineering forever. In first year, they taught us UML, which I guess is similar to architectural drawings or flow diagrams or something. I’m sure UML works really well with the waterfall model of software design, which has strong ties to old school, proper engineering. I couldn’t imagine having to go and update hundreds of UML documents every time a minor change was required. We were also taught in first year that the waterfall model is pants in the real world, which by association makes UML nothing more than a nice thought experiment. (I’m still bemused by the number of software firms that list it as a requirement for graduate Software Engineers – basically because coming up with job descriptions for inexperienced programmers is really hard.)

Sure, you can argue that testing is a software technique that we (should) use, but it is the exception to the rule. I guess the conclusion we need to come to is that software isn’t an engineering problem – it’s a people problem. (Some may say it’s a creative problem – that’s also true, but buy me a beer and I’ll explain why traditional engineering is too, so the argument doesn’t further my point.) This in itself is a problem, as (gross generalisation ahead) boffins who like coding tend not to deal with real people very well.

Further discussion on our internal list suggested that creating software products is the way to go. I think I want to agree with this – there are many examples of off-the-shelf products that are extremely popular: Microsoft Office, Adobe Photoshop etc. In these situations, the customer works within the workflow of the software, and that seems to work. So do we as developers need to convince our clients that the feature they want may not be needed? Do our clients actually know what they need? Of course, this view is not without its flaws either – users will generally be working against the software, rather than with it. Is working with a sub-optimal solution better than battling with requirements and budget overruns?

I can’t help but think there is something we are missing. It would seem there is a disconnect between what our clients want and what we can provide. If you look at the classic project triangle, your client wants to minimise price and time, and maximise good (I hope my English teacher isn’t reading this), whereas we want to maximise all three. So the crucial “pick two” part flies out the window. Either we start sacrificing the good, re-negotiate the price, or try to stretch out the project to restore the balance – none of which makes for happy clients.

Well, how about adding fat to the quote? In theory, this is fine – if a client sees value in an “inflated” (but more likely realistic) price, then everyone is happy, right? Well, not really – software development is much like homework assignments: you start out with plenty of time and the best intentions, and end up pulling an all-nighter to get it finished – and you still only get a C at best. I suspect this is because it’s impossible to lock down the requirements of an abstract problem. This isn’t only because of the difficulty of describing what we don’t understand, but because we don’t even know what half of the problems are going to be.

And this is our quandary – how can we estimate unknowns? Not just “we haven’t seen this before but it looks like X” unknowns, but “What the hell? How is that even possible?!” unknowns. Other areas of engineering encounter these problems occasionally – we get them all the time. So, the solution (he says, as if there is one) is to minimise the risks and/or consequences of these unknowns. Jobs that deal with people do this all the time. If you work in marketing, you can postulate all you like – you can’t be sure how a campaign will work until it does. Marketing is reactive.

When you make a change, you can’t be sure what will happen. Sure, you can put an ad in the Yellow Pages year after year because it has brought in an average of Y leads per year – but there is no guarantee this year will be the same. It seems that the humanity-based sciences are happy with this, but we quantitative-loving geeks are not. Hell, binary is black and white, not grey.

So, perhaps the key is to treat software as a living, breathing thing. Agile programming and iterative development can help, but they are means to an end – they don’t work without communication and understanding between people. We need to break down the barriers between provider and client – the question is: is that even possible?

Debugging JavaScript in Internet Explorer

As anyone who has ever received the dreaded Object doesn’t support this property or method error in IE can attest, debugging using everyone’s favourite browser is a right, royal pain in the heiny. Using Firebug in Firefox has really spoilt us as frontend developers (hey, what am I talking about? These ARE BASIC TOOLS that every other development platform has had since Ada Lovelace was in small britches, but I digress), so what is a cross-platform developer to do?

Little known fact outside of the .NET world: Visual Studio has a complete JavaScript debugger built in, which allows you to set breakpoints, add watches and mess around with variable values – and, more importantly, it gives you better error messages and actually highlights the line where things went wrong.

  1. Download and install Visual Studio Express Edition (which is free as in beer).
  2. Create a new Web Site – call it whatever you like, it’s just a placeholder.
  3. Hit “Run” (or F5) on the blank project – a local server should start, and IE7 should load a blank page.
  4. Change the URL to the page you want to debug. Once the page is loaded, the debugger is good to go – in fact, if your page has any errors, Visual Studio will get focus and politely tell you as much.
  5. To add a breakpoint, flick back to Visual Studio Express, open the JavaScript file you wish to add the breakpoint to, set it, and then refresh the browser – once execution hits that point, you will be able to step through the code (you can also trigger a break from code – see the snippet below).
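If you would rather trigger a break from your own code, the JavaScript debugger statement will pause execution whenever a script debugger like Visual Studio is attached – handy when you can’t be bothered hunting for the right line. A minimal sketch (the function here is made up):

function calculateTotal(prices) {
    var total = 0;
    for (var i = 0; i < prices.length; i++) {
        debugger; // Visual Studio will break here while attached
        total += prices[i];
    }
    return total;
}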

Whilst still lacking a DOM browser (Firebug Lite might be able to help out with that), this takes some of the fun out of debugging JavaScript in IE, from the point of view that it is now actually possible.

The need for speed

If you are a DBA, and you’re reading this – look away now, because I’m pretty sure they covered this in Database Optimisation 101 and you WILL laugh at me having this revelation. 88 Miles hasn’t been the snappiest web application around lately, thanks mainly to an influx of users (NOT that I’m complaining :P). I’d successfully added some views to speed up some of the reporting recently, and I went through today and optimised a lot of code, but it still wasn’t as quick as I would have liked (a load of the main index page was taking 1.5 seconds on average – down from the 4 seconds pre-optimisation).

I was googling the performance differences between INNER and LEFT joins (INNER wins most of the time, for those of you playing at home), and came across a word that I vaguely remembered from between dozes in my Database class at university – indexes. Now, don’t get me wrong, I KNEW these things existed, I even knew what they did, but because I don’t use them regularly, I didn’t even think to look at them. As all of the primary IDs were already primary keys, my gaze turned to the foreign keys (I use the term relatively loosely – they were foreign keys in the sense that they referred to another table’s ID, not because they had been explicitly set up that way).

I added indexes to the foreign keys on the three main tables, and voila! A ~10x speed increase on that front page! It’s such a simple optimisation too! *Slaps head*
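For those playing at home, the fix itself is a one-liner per table. A sketch in MySQL, with hypothetical table and column names (the real 88 Miles schema obviously differs):

-- project_id refers to projects.id, but was never explicitly indexed.
-- Adding an index lets MySQL seek straight to the matching rows
-- instead of scanning the whole table.
ALTER TABLE shifts ADD INDEX idx_shifts_project_id (project_id);

-- EXPLAIN will confirm whether the index is actually being used:
EXPLAIN SELECT * FROM shifts WHERE project_id = 42;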

A recipe for server migration

As any website grows, we often find ourselves having to move servers to deal with capacity, or to get better prices, or whatever. This can be a right pain to do, but with a little planning and a few tricks, you can make the process somewhat smoother. In this quick howto, I’ll cover how to move a typical web setup: web server, DNS, mail and database. Of course, there is more than one way to skin a cat, but this method works quite well if you have a couple of days up your sleeve, and has the advantage of only costing you a few minutes in downtime. In this example, we will pretend to move http://www.mycoolwebapp.com from the IP address 192.168.0.1 to the new IP address of 10.0.0.1 (fake web host and fake, local IP addresses – substitute with real values).

Step 1. Move your DNS

DNS can be the most painful step, as it can take up to 72 hours to move from one primary DNS server to another. In a nutshell, DNS acts like a big phone book, which tells your browser that http://www.mycoolwebapp.com belongs to the IP address 192.168.0.1. When you enter http://www.mycoolwebapp.com into your browser, it will ask the operating system for the IP address. If the operating system has it cached, it will return it. If the cached value looks old (i.e. the TTL has expired), or it doesn’t know about it at all, it will ask a parent DNS server (or root server) where it can find the updated records, and will then fetch them from the relevant DNS server. The TTL is usually set pretty high – in the range of days – as Name-to-IP address mappings generally don’t change much. But if you do change them, it means there could be a couple of days before all the servers around the world are updated, during which time your site won’t be found!

Most users don’t have control of the Time-to-live (TTL) value on the root servers, so you need to ensure that your old primary DNS server and your new primary DNS server mirror each other. This way, regardless of whether the user gets old information or new information, they will still be directed to the same place.

Most users WILL have control of the TTL on their own servers, so we can set that to something small, which will make the switchover much easier. Make note of the TTL on your old DNS server – you will need to wait that long after changing things before you can switch over to your new server. Set up your new DNS server to point to the old server (192.168.0.1) – ensure that all of the values are exactly the same. Now, on both servers, set the TTL to something very small, say 10 minutes (which is 600 seconds).
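You can check what TTL a record is currently advertising with dig – the second column of the answer section is the TTL in seconds (example values shown):

dig www.mycoolwebapp.com A

;; ANSWER SECTION:
www.mycoolwebapp.com.  600  IN  A  192.168.0.1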

As it can take a couple of days for the root DNS changes to propagate, you may as well change the root DNS details now. This is usually done via the domain registrar’s control panel. Simply change the primary and secondary DNS server IPs from the old DNS servers to the new ones (192.168.0.1 to 10.0.0.1 in this example). Make sure you double check all the values, because if you make a mistake it can take days to rectify!

Step 2. Setup the new webserver

Now that the DNS is set up and in the process of re-delegating, you can set up the website on the new server for final acceptance testing. This is usually the easiest part of the process. What I do here is deploy the current software to the new server, take a snapshot of the database (phpMyAdmin helps here if you are on MySQL, but each database system has a mechanism for backing itself up) and copy over any uploaded files etc. Now, most shared hosts will use virtual hosts, which means the server serves up different pages based on the domain name, not the IP address, and if your software relies on the domain name for functionality, it can be really hard to test everything without having the name-to-IP address mapping in place (remember, at this point http://www.mycoolwebapp.com is still pointing to 192.168.0.1).

Hosts file to the rescue! You might not know this, but all the major operating systems have a “hosts” file, which is checked before a DNS lookup is made – if the operating system finds the requested domain name in that file, it won’t even bother querying the DNS server. By associating the new server’s IP address with the domain name in this file, we can actually view what is going on on the new server (just don’t forget to delete the entry when you are done!). Windows users can find the file at c:\windows\system32\drivers\etc\hosts, and Linux (and I’m pretty sure OS X) users can find it at /etc/hosts. There are usually examples in the file, so it’s best to follow those, but I know this entry works for Windows and Linux:

10.0.0.1    www.mycoolwebapp.com

You may need to restart your browser for the change to take effect. Now if you go to http://www.mycoolwebapp.com you will be taken to the new server, allowing you to set it up and check everything is working properly without affecting the currently live version.
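A quick way to double-check which server the name is resolving to is to ping it and eyeball the IP address that comes back:

ping www.mycoolwebapp.com

PING www.mycoolwebapp.com (10.0.0.1): 56 data bytes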

Step 3: Setup mail

Email can be pretty painful to set up, and is one of those things that will get you in a lot of trouble if you stuff it up. First of all, you need to know all of the email addresses associated with the domain. If your hosting provider uses a web admin control panel like Plesk, this is usually pretty easy. Mirror all of the accounts on the new server, and make sure all of the quotas are either equal to or greater than the current values. If you need to set up new passwords for all of the accounts, note them down, as you will need to notify each user of their new password. Don’t forget to check for things like aliases and forwards.

Step 4: Check and double check

Make sure you note down any other bits and pieces that may have been set up, like cronjobs and other services, and make sure that things like email contact forms actually work. There is a trick here for young players with email forms – the resulting email may go to the new server, not the old one, so if you test a contact form and no email appears, check the new server.

Step 5: The big move

The way I usually play this is to replace the website on the old server with a “We are down for maintenance” page. This has the advantage of ensuring no one posts any new information whilst the site is in flux, and also gives you visual confirmation that things worked. If your server can use .htaccess files, this is pretty easy using a rewrite directive. If the framework you use has friendly URLs, there will probably already be a .htaccess file that tells the server to rewrite URLs to a file – this is the simplest solution – just create a maintenance.html file and change the rewrite to look like this:

RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ maintenance.html [QSA,L]

This allows you to still access CSS, images and JS files, so you can style the maintenance page, but any request to a file that doesn’t really exist will get piped through to the maintenance.html file.

If .htaccess isn’t an option, or you have a lot of static files, you might have to do the old fashioned “move the current site root and replace it with a new one” trick (there are rewrite rules you can use, but this just might be easier). Create a new directory containing the maintenance file (you would probably have to call it index.html in this instance) and any associated images, CSS and JS files. Then move the current site root (for example public_html) to, say, public_html.old, and move the new directory into its place (i.e. public_html). When you view the site on the old server (you might need to remove the entry in the hosts file to see it) you should now see the maintenance message.
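In shell terms, the swap boils down to a couple of moves (assuming your maintenance page lives in a directory called maintenance – adjust the names to suit):

mv public_html public_html.old
mv maintenance public_html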

Next, re-sync the database and any new uploaded files and give the new server a final test (by putting the hosts file entry back in). Once you are happy everything is working, we can flick the DNS over.
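The re-sync might look something like this – a sketch assuming MySQL, SSH access on both boxes and an uploads directory (substitute your own names and paths):

# On the old server: dump the database and push it, plus any uploads, to the new box
mysqldump -u dbuser -p mycoolwebapp > mycoolwebapp.sql
scp mycoolwebapp.sql user@10.0.0.1:~/
rsync -avz public_html/uploads/ user@10.0.0.1:public_html/uploads/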

Step 6. Flicking the DNS

On both the old and new DNS servers, change the IP addresses for all of the relevant entries from the old IP address to the new one. Within 10 minutes, everyone should be seeing the new server’s version of the site! This also means that all of the email will be heading to the new server too, so…

Step 7. Re-configure mail clients and sync up folders

If you configured new passwords for the mail accounts, users will need to enter the new password into their mail clients. I generally walk them through creating a NEW mail account in their client – the reason for this will be revealed in a second.

If they have been using POP in the past, then they will probably already have local copies of their email, so your job here is done. However, if they are using IMAP (probably more likely now) you will need to sync up the two accounts. There are tools out there that can do it, but you will need to know the passwords on both servers, and they can be a bit hit and miss – thankfully you can use the user’s mail client to do the sync.

Simply create a new account in their mail client that points at the new server, and change the old account to point to the IP address of the old server. If done correctly, the user should now be able to access their old email and folders as well as any new email on the new server. Most mail clients will actually allow you to drag folders across servers – do that and your email will be synced up! (It might take a while if they have a lot of archived mail.) After that is done, you can delete the old account and set the new account as the default.

Step 8. Shut down the old server

The final task is to shut down the old server. I usually leave it running for a week to be on the safe side, mainly to recover any email that the user forgot to move across (it ALWAYS happens).

Step 9. Go and get a drink

You deserve it – it’s probably been a big week!

I hope that this made sense, or at least acts as a resource when you try to explain to a client why it can take a week to move servers! Of course, these instructions are pretty general, so your mileage may vary. The golden rule is: don’t rush, and double check everything!

A stark realisation

There comes a moment in every career where you realise that there is a whole world outside of what you do. Sure, you don’t need three PhDs to figure out that the world of macrame is significantly different to Ruby on Rails hacking, but when was the last time you thought about process in, say, banking software development? Or had a look at what is going on in the world of Operating System code? To an outsider, these are related industries – they both involve sitting in front of a grey box for hours on end, banging gibberish onto keys with embossed letters that are out of order – but to us, they are worlds apart.

A couple of days ago, something reminded me that I – and, I assume, a great number of other web developers – have forgotten our brethren who are more at home with linked lists than unordered lists, and as a result, we are significantly reducing our ability to find the right tool for the right job.

This revelation was born of the recent Twitter downtime. As always happens, hundreds of well-meaning developers tried to offer up solutions to fix Twitter’s yo-yo-like uptime graph – there was the usual Rails bashing, some bewilderment over MySQL replication, and a general consensus that it shouldn’t be this hard – but one article really stood out: Scaling a Microblogging Service. Go and have a read if you want to know why scaling Twitter is REALLY HARD (maybe impossible).

For those of you who don’t mind a spoiler, the crux is that Twitter is designed on a platform more suited to content management, when really it is a messaging system.

Now, go to any web developer on the planet and ask them to build you a Twitter clone, and I bet each design would be pretty similar. You would have a table for users, a table to hold friend references, and a table for messages, which would all be linked via some sort of foreign key relationship. The reason is that for 90% of what we work on, day to day, this makes the most sense. Generally, you have few authors, many readers, and those readers all get basically the same information.
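To make that concrete, here is roughly the schema I suspect most of us would sketch out (MySQL-flavoured, with names invented for illustration):

CREATE TABLE users (id INT PRIMARY KEY AUTO_INCREMENT, username VARCHAR(20));
CREATE TABLE friendships (user_id INT, friend_id INT);
CREATE TABLE messages (id INT PRIMARY KEY AUTO_INCREMENT, user_id INT, body VARCHAR(140), created_at DATETIME);

-- Building a user's timeline means joining and sorting across the lot,
-- for every user, on every page load:
SELECT m.* FROM messages m
INNER JOIN friendships f ON m.user_id = f.friend_id
WHERE f.user_id = 1
ORDER BY m.created_at DESC LIMIT 20;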

Twitter is completely different. Everyone is a producer and everyone is a consumer. Every second there are hundreds, if not THOUSANDS, of writes from people leaving their 140 characters’ worth of thoughts. To make matters worse, every user gets a different view, so every page load would be absolutely thrashing the Twitter database servers.

Twitter is a MESSAGING SYSTEM – not a CMS. It’s closer to a mailing list than a blog. So why didn’t the developers make this observation straight away? Well, as web developers, this is foreign territory. If Nokia decided they were going to build Twitter, I bet things would look very different. Twitter and SMS are pretty closely related, and for someone dealing with short messages all day, that may have been the natural path to take.

You might counter this argument by saying that we are web developers – of course we think in terms of the web. We do what we know, just like everyone else. Correct. But maybe we should stop and think a little before we dive in and start coding up prototypes using our favourite RAD tools. Maybe we should try to look for metaphors in other areas before taking our hammer to unsuspecting screws, bolts and watermelons.

Maybe we should take some lessons from the User Interface guys? They use metaphors all the time when they design. Without somebody realising that people like putting paper into manila folders, we would still be kicking around text terminals. Luckily for us, our metaphors don’t need to be as abstract as that. We only need to look outside our little bubbles to see what developers from other industries are doing, and we too may see an obvious solution that we would have otherwise missed.

Working with branches in Subversion

I know that Git is the flavour of the month in regards to version control at the moment, but I still use Subversion (SVN) for my day-to-day version control needs. And since it is still very popular, I think this quick tutorial still has its place. Today, I was asked by a client to show them how to branch an SVN repository so they could start making some major changes to their application without running the risk of breaking the release version.

The scenario works something like this: you have finally launched your application and everything is purring along nicely. You decide to start working on the next iteration, which has some major changes that WILL break things initially. You are working away, and find yourself halfway through the changes when you get a call from an irate customer who can’t complete their transaction because of an obscure edge case bug that you missed. The dilemma is that your source base is in a state of flux, and you can’t release it, because in its current state, it doesn’t work. Wouldn’t it be great if you could have a maintenance version of your application to make the fix on? Enter branching.

Firstly, a bit of terminology. I’ve used the word “branch” a couple of times. The analogy is simple – you have a core line of code (the “trunk”) and you can have concurrent lines of code that “branch” from the trunk. One of those branches may well be our “stable” branch, meaning that the only changes that can occur are bug fixes – i.e. NO new features. As with any system, there is more than one way to fell a tree, but this is the system that I generally use.

So how do you do it? When I create a new SVN project, I like to add a trunk directory and a branches directory. Within that branches directory, I add another directory called stable (let’s pretend my SVN server is at svn.myserver.com):

# Create a new project - tedious stuff like locking down access omitted for clarity

svnadmin create new_project

# Now head over to your working directory, and check out the initial version

svn co svn://svn.myserver.com/new_project

» Checked out revision 0.

cd new_project

mkdir branches

mkdir trunk

# Now to add the new directories to the repository

svn add branches trunk

» A         branches

» A         trunk

svn commit -m "Adding initial directories"

» Adding         branches

» Adding         trunk

»

» Committed revision 1.

Now that we have our working copy set up and committed back to the server, you can start work on the trunk. Cut to the day before go-live. You are pretty happy with how the trunk is looking, and you would like to branch the code into stable. For this we use the copy command:

svn copy svn://svn.myserver.com/new_project/trunk svn://svn.myserver.com/new_project/branches/stable -m "Branching stable"

» Committed revision 10.

cd branches

svn update

» A    stable

» Updated to revision 10.

And we are done! Now you can continue on your merry way and change code in trunk without affecting your stable release. Once you are happy with the next version and you’re ready to create your next stable branch from your new feature-rich trunk, you can “merge” your changes from trunk into stable (note that a merge applies its changes to a working copy, which you then commit):

cd branches/stable

svn merge svn://svn.myserver.com/new_project/branches/stable svn://svn.myserver.com/new_project/trunk

» A    new_file_that_is_in_trunk.txt

svn commit -m "Merging trunk changes into stable"

And of course it works the other way – say you found that obscure edge case bug in stable and you have fixed it; you would want to merge the change back into trunk, so you don’t introduce any “regression bugs” (bugs that you fix, then inadvertently re-introduce by changing code). All you need to do is flip the order of the two URLs and run the merge from a trunk working copy:

cd trunk

svn merge svn://svn.myserver.com/new_project/trunk svn://svn.myserver.com/new_project/branches/stable

» M    fixed_file_from_stable.txt

svn commit -m "Merging edge case fix from stable"

There you go! Easy as pie – let’s just hope you don’t get too many conflicts that you need to manually resolve. It’s always a great idea to test your code again after a merge, just to be sure everything works as expected.
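If you do hit one, SVN marks the file with a C and leaves conflict markers in it – fix the file by hand, then tell SVN you have dealt with it:

svn status

» C    conflicted_file.txt

svn resolved conflicted_file.txt

svn commit -m "Resolved merge conflict"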

The next generation of web professionals

Being an active member of the local web community, I’m often speaking to students at Port80 meetups about the best way of getting work, and it isn’t an easy question to answer – and it seems that I’m not the only one: both Alex and Gary have recently blogged about apprenticeships, graduate programmes and internships.

The problem we seem to have at the moment, in Perth anyway, is that the number of companies large enough to take on interns and run graduate programmes is pretty small. I’ve seen this in the software industry – I remember vividly the last six weeks of final year, when every soon-to-be graduate was sending resumes to the big three software companies that ran graduate programmes – the numbers didn’t add up, as there were many more applications than positions. Of course, there are more than three software companies here – but many of them were looking for people with some industry experience.

So the problem is a chicken-and-egg one – no experience means no job, and no job means no experience. I think the education institutions need to get a bit creative with how they teach. A perfect example of this is the Centre for Software Research at UWA, run by my old honours supervisor and mentor, Dr. David Glance. I was lucky enough to be the first student to go through this system, and not only has it given me some great contacts, it also gave me valuable experience. They take internal university projects and get final year students and graduates to work on them. They also take on some external projects, the proceeds of which pay for the running of the centre – and the student wages. This is a win-win on so many levels.

  1. The students get practical experience working with a Project Manager, a client and deadlines. And they get paid to do it.
  2. The lecturer, who acts as the Project Manager, gets to keep their finger on the pulse of what is happening in the real world – which is invaluable in such a fast moving industry.
  3. The University gets to implement and experiment with new systems to improve its workflow for little practical cost.

Of course these programs aren’t a silver bullet, and they take resources, but in my opinion they are a big step forward from handing someone a certificate and then throwing them in the deep end of the real world.

Why do open source web apps suck?

I’m a professional web developer, so it goes without saying that I’ve seen my fair share of off-the-shelf open source web applications. I’ve also seen my fair share of web design companies take these applications and modify them up the wazoo to fit with clients’ requirements… Well, sort of. It is probably more likely that the sales staff have managed to convince the client that their requirements should fit in with what the open source project does. On behalf of all the web application developers out there who get lumped with cleaning up the mess: STOP IT.

Modifying open source software seems like a perfect solution to managers – the solution is almost done, so surely it is just a matter of a few tweaks here and there, a splash of paint, and Bob’s your uncle. Yeah – nah.

Here are some things to think about before pitching a modified off-the-shelf application to your next client.

  1. You can’t guarantee the code. Unless your developers have spent A LOT of time working with the application, they aren’t going to know the code. For them to become familiar, they are going to have to spend a lot of time getting to know it. This doesn’t save time, it wastes it. “But they will know it for the next client!” I hear you cry. Don’t bet on it. Unless you are doing the same mods for another client, they are going to have to spend the same amount of time investigating it next time.

  2. Making core changes to a system is just asking for trouble – I hope the time you saved by using the system is re-allocated to testing the FULL application, because you have no idea what you will break.

  3. Skinning pre-built applications sucks. Trying to modify someone else’s CSS is worse than modifying someone else’s PHP. Just like modifying core code libraries and hoping for the best, it is really hard to know what you will break. That is of course assuming the application isn’t a spaghetti of tables and includes with little structure (Xoops, osCommerce, Joomla – I’m looking at you).

  4. Open source developers are very narrow-minded – their contributions are made to suit their specific need, which means every developer will try to include their feature, and unless the leads are ruthless, you end up with an application that has everything that opens and shuts, but that doesn’t really open or shut very well. Not only that, you end up with a situation where there are thousands of different modifications that do the same thing. osCommerce is the perfect example of the mess this creates – I had to find a gift voucher module, and found at least 12 different variations of the same plugin, none of which worked. If I see YMMV on the end of one more of these modifications I’m going to hit someone.

  5. As soon as you modify software, forget about updating it. If there is a security fix or a new feature, you will basically have to spend a similar amount of time re-patching the new version with your changes. If you wrote your own application, you can add a feature much more easily.

  6. “Modules” are a misnomer. I am yet to see a decent module system for anything but the most basic features – they all involve modifying code to work, which, if you ask me, isn’t a module.

  7. The documentation will never be up to date. One of the selling points of open source software is that you have thousands of developers at your disposal to fix and add features quickly – unfortunately, the documentation never keeps up. You better get used to reading source code.

  8. Open source apps are hacked together, not engineered. Design by committee never works; design by ad-hoc anarchy REALLY never works – if the project doesn’t have a clear leader who has a vision and is ruthless in implementing it, you are going to end up with a mess.

  9. Support. You don’t get any. Budget time for your developer to scour the ‘net for an obscure German forum where someone has found a solution to a similar problem, which may or may not actually work.

So when is open source the right thing to use? If the system does exactly what you want, then go for it. Want a blog? WordPress is an excellent blogging system, but it isn’t a content management system, so don’t expect it to work like one.

Let me state that I’m making a big differentiation between applications and frameworks or libraries. I encourage the use of frameworks and libraries, because you can still control your code. You are leveraging low-level code, which is the boring stuff (for some), and you are left building a system that your client actually wants.

So please, continue using Rails or PHP or Apache or MySQL, but leave osCommerce and Xoops at the door. If you still want to use the latter, make sure you give your developers enough time to work through the issues you will have – about the same amount of time as you would have quoted on a custom solution in the first place should suffice.

Be a good netizen: Use the correct HTTP response code

Remember the good ol’ days, back before dynamic websites, when pages had .html extensions and when you tried to access a page that didn’t exist you got an ugly, yet reassuring, 404 Not Found page? The significance of this page is actually pretty important – not only does it tell the user that the page was not found, but it returns a special HTTP status that tells web spiders the same thing. As web developers, sometimes we forget that humans aren’t the only ones accessing our pages, and as a result we don’t always use the correct HTTP response codes to denote what is going on.

What the hell is an HTTP response code?

When your web browser makes a request to a web server, the web server returns a status code as well as the web page, which tells your browser what has happened. This status is made up of two parts: a number (which is for the spiders and bots that might be accessing the web site) and a string (which is for humans). Back when everything was static, the web server took care of all of this for you.

Unfortunately for web developers, in this environment of database-driven web sites, we often don’t have the luxury of letting the server take care of everything, so this article aims to show you that it isn’t that difficult to do HTTP responses correctly. As I will show later, getting this wrong can adversely affect your ranking in your favourite search engine.

Let’s try it out

Firstly, let’s see what happens when you actually make a request – you can see what is going on using another old school application: Telnet.

Open up the command line (or Terminal.app or a terminal, depending on your flavour) and type the following:

telnet madpilot.com.au 80

You will get a prompt, where you type the following (Windows users might not see anything, as the telnet client won’t echo what you type):

HEAD / HTTP/1.1
Host: madpilot.com.au

…and hit enter twice – you should see something like:

HTTP/1.1 200 OK
Server: Mongrel 1.0.1
Status: 200 OK
Cache-Control: no-cache
Content-Type: text/html; charset=utf-8
Content-Length: 4123

The first line of the response is the important bit – it tells the web browser that the response conforms to HTTP version 1.1, and more importantly that the response code is 200 and the response message is OK! The 200 code is the most common response you will come across; it means that the page was found and served up correctly. Generally, your web server WILL take care of this one for you. Let’s look at how you can change that status code.

I’ll use PHP as an example, because it is still the most common dynamic language – but you can do this in any language; leave a comment if you would like an example of how to do it in another dialect. It is all very simple – BEFORE you output any HTML, call the header() function like so:

header("HTTP/1.1 404 Page Not Found");

As you have probably guessed, this tells the browser that the page it requested was not found. Why would you want to do that? Obviously if the script is being run, it has been found? Well, yes – that is true. However, when the HTTP specification was written, CMSs and dynamic product catalogues weren’t even thought about – so we need to think a little differently.

Let’s look at an example: Your customer has requested to see the details for product #10 by browsing to

http://www.yourcompany.com/products/view/10

Product #10 exists, so we serve up the details. But what if the customer decides to see what product #20 is? If product #20 doesn’t exist, then what should we do? One option is to print out “Product not found”, which is fine for the user, but what happens if your favourite search engine tries to hit product #20? If you keep the default behaviour, the search engine will receive a 200 OK status, which makes it think that the product exists, and it will index it! This just pollutes the search engine’s index and hurts your ranking, which is bad. So what we can do is serve up the 404 header from above, which lets both the user AND the search engine know that what they have requested doesn’t exist.

So what other response codes can we use? There is a complete list here, but I’ll run through a couple of the common ones:

301 Moved Permanently: This response code means the resource that has been requested USED to live here, but has now moved somewhere else and will never return. Returning this status code is extremely important if you are changing the structure of your website, as you can tell the browser where it needs to go to get the resource. More importantly, it also tells your search engine to update its index with the new URL. You need to supply the new URL as part of the response, so it looks something like this:

header("HTTP:1/1 301 http://www.yoursite.com/new-url");

302 Found: The 302 is generally used incorrectly. The most common use is to redirect a user to another page TEMPORARILY, which is actually what the 303 code is for. Unfortunately, not all browsers support 303 and actually expect a 302 in this case. So who are we to argue? If you are a PHP developer, you have probably used

header("Location http://www.yoursite.com/somewhere");

before – this is exactly what this does.

403 Forbidden: If you wanted to really play HTTP right, you would return this code every time someone tried to access a private URL when they weren’t logged in. It means the server knows what you are trying to do, but isn’t going to let you do it.

404 Page not found: This has been covered – basically if the resource the agent wants doesn’t exist, you serve this up.

410 Gone: The page you are looking for used to exist, but it doesn’t anymore. In fact, there isn’t even a new URL, so if you are a search engine, just forget about it. Whether search engines listen to this, I’m not sure, but it can’t hurt.

500 Server Error: Something went wrong with the server. I would throw this up if there is an error that is stopping the page from loading, such as a missing database or a broken web service or similar.

Don’t forget that you can also serve up content with the response (in fact, if you don’t, humans will just get a blank page), so it is recommended that you serve up a nice friendly message to your visitors explaining what happened.
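Putting the two together, a 404 handler in PHP might look something like this (the page copy is up to you):

<?php header("HTTP/1.1 404 Not Found"); ?>
<html>
  <head><title>Page not found</title></head>
  <body>
    <h1>Sorry!</h1>
    <p>We couldn't find the page you were after. Try the <a href="/">home page</a>.</p>
  </body>
</html>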

So there you go – now there is no excuse for serving up errors to your users and forgetting about our automated friends. So when you are writing your next kick-arse web app, spare a thought for the visitors that aren’t so good at parsing human talk.

WGET: The poor man’s SVN – Using Capistrano on a host without Subversion

I have a client for whom I created a CakePHP-based website over a year ago. He has since come back to me and asked for a number of changes. I thought I would take the opportunity to use Capistrano, because there are a number of steps I always have to perform when updating his site, and I hate doing them manually.

I went about checking all the necessary requirements on his host:

  1. SSH access – check! The host his site is on allows an SSH connection, which is required by Capistrano.
  2. Apache follows symbolic links – check! Because Capistrano uses a symbolic link from the document root to the latest version of the site, Apache needs to be able to follow them (i.e. the site’s Apache configuration needs FollowSymLinks enabled).
  3. Has svn installed – fail! This could be a problem. Capistrano by default checks out the HEAD revision from the defined repository – if it can’t use SVN, it can’t download the latest version of the site.

So close! If only I could download the HEAD revision of a site using a common command line tool. I thought about writing an SVN-to-web interface that would check out the latest version and serve it up as a website, but then I remembered SVN does that out of the box using the SVN Apache module. Thankfully, when I was building my development machine, I made sure I installed the SVN module – it was now time to use it!

First I needed to tell Apache to serve up a copy of the SVN repository. Dropping the following into the Apache config file did the trick:

<li class="li1">
  <div class="de1">
    <ins class="in"> </ins> DAV svn
  </div>
</li>

<li class="li1">
  <div class="de1">
    <ins class="in"> </ins> SVNPath /path/to/svn/repository
  </div>
</li>

<li class="li1">
  <div class="de1">
    <ins class="in"> </ins> AuthType Basic
  </div>
</li>

<li class="li1">
  <div class="de1">
    <ins class="in"> </ins> AuthName &#8220;My Secret SVN Repository&#8221;
  </div>
</li>

<li class="li1">
  <div class="de1">
    <ins class="in"> </ins> AuthUserFile /path/to/a/.htpassword/file
  </div>
</li>

<li class="li1">
  <div class="de1">
    <ins class="in"> </ins> Require valid-user<span class="sc3"></span><span class="re1"><span class="re2" /></span>
  </div>
</li>

<li class="li1">
  </Location>
</li>

For those playing at home: replace /url/you/would/like/to/access with a nice easy path – this is the URL you will access to download the files – replace /path/to/svn/repository with the actual physical path to your repository, and create a .htpassword file so you can limit access to the repository, using the htpasswd2 command: htpasswd2 -c /path/to/a/.htpassword/file username would work in this case (after substituting a real username and path, of course).

If you point your browser at the URL you just set up, you should see the root directory of the repository after you enter the username and password. Congratulations! You are basically there. Now you just need to reconfigure Capistrano to use wget instead of svn. I do this by overriding the deploy task – because I’m not using Rails for this project, the paths and shared folders are different anyway. If you are using Rails, you might need to have a look at the original recipe file and replace the svn command with the one below.

<li class="li1">
  <div class="de1">
    <ins class="in"> </ins> <ins class="in"> </ins> <ins class="in"> </ins> <ins class="in"> </ins> run <span class="st0">&#8220;wget &#8211;user=#{wget_user} &#8211;password=#{wget_pass} -m &#8211;cut-dirs=4 -nH -P #{release_path} -q -R index.html #{repository}&#8221;</span>
  </div>
</li>

<li class="li1">
  <div class="de1">
    <ins class="in"> </ins> <ins class="in"> </ins> run <span class="st0">&#8220;ln -nfs #{release_path} #{current_path}&#8221;</span>
  </div>
</li>

<li class="li1">
  <div class="de1">
  </div>
</li>

<li class="li1">
  <div class="de1">
    <ins class="in"> </ins> <ins class="in"> </ins> <ins class="in"> </ins> <ins class="in"> </ins> run <span class="st0">&#8220;rm -rf #{release_path}/app/webroot/files&#8221;</span>
  </div>
</li>

<li class="li1">
  <div class="de1">
    <span class="kw1">end</span>
  </div>
</li>

The only modification to that line is the number after the --cut-dirs switch – it should be equal to the number of directories in the URL. In our example the URL is /url/you/would/like/to/access, so --cut-dirs needs to be equal to 6.

The last thing to do is to set the wget_user and wget_pass variables to the username and password you created using htpasswd2.
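They are just ordinary Capistrano variables, set at the top of your recipe (the values here are placeholders):

set :wget_user, "username"
set :wget_pass, "password"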

That should do it! You can now deploy to a server that is sans SVN!

Caveats: because of the way the SVN module and wget work, I’ve had to exclude the downloading of index.html (basically, wget treats the directory listing as a page, and will output it as index.html), so this technique will not work if you have any pages called index.html in your structure. Workaround: rename all instances of index.html to index.htm.

You might get some weird results if someone checks in code at the same time as you do a deploy – but unless you have a bucketload of developers working on your system and no communication between them, this is pretty unlikely.

(Names have been changed to protect the innocent)
