There are some simple rules to follow to reduce the unexpected – particularly in build and deployment as part of a Continuous Integration process. If something works, I expect it to work again next time, and I will put something in place to make sure it happens exactly the same way next time. If something fails, instead of just fixing it I want to put something in place to make sure it never happens again. Simple application of these rules can bring calm and order.
There should be no manual steps required to deploy an application to a target environment (test or production). You should not, for example, have to unpack a zip file, change the contents of files x and y, and restart service X. If the deployment instructions include the word ‘click’ then something is wrong. Every manual step introduces a chance for variation and removes an opportunity to add an automated check.
Some customers claim to have an automated deployment process – when we dig deeper we find that the instructions to run the ‘automated’ deployment process run to dozens of steps. Deployments are done into different environments by different people – each of whom interprets the manual steps differently, and uses different workarounds and additional steps where the process is not well defined or fails regularly.
What do we need to implement true repeatability of deployment?
- don’t fix problems ‘in situ’. When a deployment to a test environment fails, do not fix it in place. Investigate the problem, then add something to the deployment process to ensure it cannot happen again. This might be a ‘pre-flight check’ that makes assertions about the target environment, or a post-deployment verification test that provides fast feedback that something went wrong (there’s a minimal pre-flight sketch after this list). Sometimes this means changing the behaviour of other groups like IT operations or release management to remove the ‘quick, just patch it’ approach.
- externalise environment-specific configuration. Deploy the *exact* same artefacts in your test and production environments. Anything that is specific to “system test” should be sourced from outside the artefact – from config files, environment variables and so on (see the configuration sketch below this list). I have a lot to say here which I’ll save for a dedicated post.
- make test environments as close to production as possible. The closer test environments are to production, the less likely you are to have a ‘whoops’ on the production date. Audit this regularly – OS version, service packs, app server versions, database names, directory locations, load balancer configs (a small drift-audit sketch follows this list). This will minimise the number of items you need to place in environment-specific configuration.
- automate the deployment of *everything* – including, for example, Apache configs, load balancer config, firewall settings, and database upgrade scripts. Everything should come from a known configuration held in source control (a template-rendering sketch follows this list). I’m very keen to learn how to use tools like Puppet and Chef to assist here.
- use exactly the same deployment process from dev to production. Too many times we develop deployment automation that is only used in the test environments, and the production deployment is done by humans following an invisible set of instructions.
- share responsibility for building, maintaining, and testing deployment scripts between development and operations. Ensure that changes to scripts are checked back into source control – the easiest way is to embed them in the deployment artefact built by your CI server (see the packaging sketch after this list). Give your ops team commit access to source control.
- release everything every time. Don’t cherry-pick a set of components to deploy. In every release, try deploying all components together – including components that haven’t been changed. Two benefits I’ve realised: you eliminate the risk of forgetting a dependent change, and you gain confidence that a rarely-changed component CAN be deployed. If you feel it is risky to deploy a component unnecessarily, then you really need to address those risks. Don’t cop out with the ‘let sleeping dogs lie’ approach. That dog will bite you badly when you come to build and deploy it in a year’s time.
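To make the first point concrete, here’s a minimal pre-flight sketch (in Python, since that’s handy for glue like this). The specific checks, paths and port numbers are hypothetical examples, not a prescription – the point is that the deployment stops itself before it starts if the target environment isn’t what it expects.

```python
#!/usr/bin/env python3
"""Pre-flight check sketch: assert facts about the target environment
before a deployment is allowed to start, rather than patching failures
'in situ'. Every path, port and threshold here is a hypothetical example."""

import os
import shutil
import socket
import sys
from pathlib import Path


def target_dir_writable(path: str) -> bool:
    """The directory we deploy into must exist and be writable."""
    return Path(path).is_dir() and os.access(path, os.W_OK)


def port_is_listening(host: str, port: int) -> bool:
    """The app server we deploy behind should already be up."""
    try:
        with socket.create_connection((host, port), timeout=2):
            return True
    except OSError:
        return False


def enough_disk_space(path: str, minimum_gb: float) -> bool:
    """Deployments fail in strange ways when the disk fills up mid-copy."""
    try:
        free_gb = shutil.disk_usage(path).free / (1024 ** 3)
    except OSError:
        return False
    return free_gb >= minimum_gb


def main() -> int:
    checks = [
        ("target directory writable", target_dir_writable("/opt/myapp")),
        ("app server listening on 8080", port_is_listening("localhost", 8080)),
        ("at least 2 GB free on /opt", enough_disk_space("/opt", 2.0)),
    ]
    for name, ok in checks:
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
    # A non-zero exit code stops the pipeline before any artefact is
    # copied to the target host.
    return 0 if all(ok for _, ok in checks) else 1


if __name__ == "__main__":
    sys.exit(main())
```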
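For externalised configuration, a minimal sketch of the same idea: the artefact never changes between environments, only the values around it do. The APP_* variable names are made up for illustration.

```python
"""Configuration sketch: the deployed artefact is identical in every
environment; anything environment-specific is read from outside it at
startup. The APP_* variable names are hypothetical."""

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    payment_gateway_url: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fail fast if a mandatory value is missing; far better at deploy
    # time than when the first request reaches the database.
    return Settings(
        database_url=env["APP_DATABASE_URL"],
        payment_gateway_url=env["APP_PAYMENT_GATEWAY_URL"],
        log_level=env.get("APP_LOG_LEVEL", "INFO"),
    )


if __name__ == "__main__":
    # Example values that would normally be set on the target host or in
    # a per-environment config file, never baked into the artefact.
    example = {
        "APP_DATABASE_URL": "postgresql://db.test.internal/myapp",
        "APP_PAYMENT_GATEWAY_URL": "https://payments.test.internal",
    }
    print(load_settings(example))
```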
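The environment audit can also be automated rather than done by eye. Here’s a rough drift-audit sketch that diffs recorded facts about two environments; the facts themselves are hypothetical and would in practice be gathered per host or exported from whatever inventory you keep.

```python
"""Environment drift audit sketch: compare recorded facts about two
environments and report anything that differs. The facts below (OS
version, app server version, Java version) are hypothetical examples."""

import json


def load_facts(path: str) -> dict:
    """Facts recorded per environment, e.g. as a JSON file per host."""
    with open(path) as f:
        return json.load(f)


def drift(test: dict, prod: dict) -> list[str]:
    """List every fact that differs between the two environments."""
    problems = []
    for key in sorted(set(test) | set(prod)):
        t, p = test.get(key, "<missing>"), prod.get(key, "<missing>")
        if t != p:
            problems.append(f"{key}: test={t!r} prod={p!r}")
    return problems


if __name__ == "__main__":
    # Inline example facts so the sketch runs stand-alone; normally they
    # would come from load_facts("test-facts.json") and the like.
    test_env = {"os": "RHEL 6.4", "tomcat": "7.0.42", "java": "1.7.0_25"}
    prod_env = {"os": "RHEL 6.2", "tomcat": "7.0.42", "java": "1.7.0_25"}
    for line in drift(test_env, prod_env) or ["no drift detected"]:
        print(line)
```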
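I haven’t got far enough with Puppet or Chef to show them here, but the ‘everything from source control’ idea can be illustrated without them: infrastructure config is rendered from a versioned template plus the externalised values, never hand-edited on the box. A throwaway sketch – the vhost template and values are made up.

```python
"""Sketch of rendering infrastructure config from source control: an
Apache-style vhost file is generated from a template that lives in the
repository plus environment-specific values. The template and values are
hypothetical; a real setup would more likely use Puppet, Chef or similar."""

from string import Template

# In a real project this template would be a file checked into source
# control alongside the application code.
VHOST_TEMPLATE = Template("""\
<VirtualHost *:${port}>
    ServerName ${server_name}
    ProxyPass / http://${backend_host}:${backend_port}/
</VirtualHost>
""")


def render_vhost(values: dict) -> str:
    # substitute() raises KeyError if a value is missing, which is what
    # we want: an incomplete config should fail the deployment rather
    # than produce a half-rendered file.
    return VHOST_TEMPLATE.substitute(values)


if __name__ == "__main__":
    test_values = {
        "port": 80,
        "server_name": "myapp.test.internal",
        "backend_host": "localhost",
        "backend_port": 8080,
    }
    print(render_vhost(test_values))
```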
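And for the shared-scripts point: one way to keep deployment scripts in source control and shared between dev and ops is to have the CI build package them alongside the application artefact. A small sketch of that packaging step – the file and directory names are hypothetical.

```python
"""Packaging sketch: the CI build bundles the deployment scripts with
the application artefact, so the scripts are versioned in source control
and travel with the code they deploy. File and directory names here are
hypothetical."""

import tarfile
from pathlib import Path


def build_release_bundle(app_artifact: str, deploy_dir: str, out: str) -> None:
    """Create one release tarball containing the application plus the
    scripts that know how to deploy it (pre-flight checks, install
    scripts, smoke tests)."""
    with tarfile.open(out, "w:gz") as bundle:
        bundle.add(app_artifact, arcname=Path(app_artifact).name)
        for script in sorted(Path(deploy_dir).iterdir()):
            bundle.add(str(script), arcname=f"deploy/{script.name}")


if __name__ == "__main__":
    # Typically invoked by the CI server at the end of the build, e.g.:
    # build_release_bundle("myapp.war", "deploy", "myapp-1.2.3-release.tar.gz")
    pass
```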
These are just a few of the things I’ve learned that help to make deployments boring. Boring *should* be the goal, although you get a heck of a lot less champagne.
Sudden thought – perhaps there is a hidden incentive here that’s driving behaviour? App deployments that happen like clockwork every two weeks without raising a sweat are boring for some folk, and there is no opportunity to be a hero. I feel a little queasy at this thought…