Category Archives: Tech

Continuous Integration – Repeatability

There are some simple rules to follow to reduce the unexpected – particularly in build and deployment as part of a Continuous Integration process. If something works, I expect it to work again next time, and will put something in place to make sure it happens exactly the same next time. If something fails, instead of just fixing it I want to put something in place to make sure it never happens again. Simple application of these rules can bring calm and order.

There should be no manual steps required to deploy an application to a target environment (test or production). You should not for example have to unpack a zip file, change the contents of files x and y, and restart service X. If deployment instructions include the word ‘click’ then something is wrong. Every manual step introduces a chance for variation, and removes an opportunity to add an automated check.

Some customers claim to have an automated deployment process – when we dig deeper we find that the instructions to run the automated deployment process run to dozens of steps. Deployments are done into different environments by different people – each of whom interprets the manual steps differently, and uses different workarounds and additional steps where the process is not well defined or fails regularly.

What do we need to implement true repeatability of deployment?

  • don’t fix problems ‘in situ’. When a deployment to a test environment fails, do not fix it in place. Investigate the problem, then add something to the deployment process to ensure it cannot happen again. This might be a ‘pre-flight check’ that makes assertions about the target environment, or a post-deployment verification test that will provide fast feedback that something went wrong. Sometimes this means changing the behaviour of other groups like IT operations or release management to remove this ‘quick, just patch it’ approach.
  • externalise environment-specific configuration. Deploy the *exact* same artefacts in your test and production environments. Anything that is specific to “system test” should be sourced outside of the artefact – from config files, environment vars etc. I have a lot to say here which I’ll save for a dedicated post.
  • make test environments as close to production as possible. The closer test environments are to production, the less likely you are to have a ‘whoops’ on release day. Audit this regularly – OS version, service packs, app server versions, database names, directory locations, load balancer configs. This will also minimise the number of items you need to place in environment-specific configuration.
  • automate the deployment of *everything* – including e.g. apache configs, load balancer config, firewall settings, and database upgrade scripts. Everything should come from a known configuration held in source control. I’m very keen to learn how to use tools like puppet and chef to assist here.
  • use exactly the same deployment process from dev to production. Too many times we develop deployment automation that is only used in the test environments, and the production deployment is done by humans following an invisible set of instructions.
  • share responsibility for building, maintaining, and testing deployment scripts between development and operations. Ensure that changes to scripts are checked back in to source control (easiest way is to embed them in the deployment artefact built by your CI server). Give your ops team commit access to source control.
  • release everything every time. Don’t cherry-pick a set of components to deploy. In every release try deploying all components together – including components that haven’t been changed. Two benefits I’ve realised – eliminating the risk of forgetting a dependent change, and confidence that a rarely-changed component CAN be deployed. If you feel it is risky to deploy a component unnecessarily, then you really need to address those risks. Don’t cop out with the ‘let sleeping dogs lie’ approach. That dog will bite you badly when you come to build and deploy it in a year’s time.
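The ‘pre-flight check’ idea above can be sketched in a few lines. This is a minimal illustration, not a real deployment tool – the directory, disk-space threshold, and port are all hypothetical examples of assertions you might make about a target environment before deploying:

```python
# A minimal 'pre-flight check' sketch: assert facts about the target
# environment before deploying, so a bad environment fails fast instead of
# producing a broken deployment. Paths and ports are hypothetical examples.
import os
import shutil
import socket


def check(description, ok):
    """Print a PASS/FAIL line for one assertion and return its result."""
    print(("PASS" if ok else "FAIL") + ": " + description)
    return ok


def port_is_free(port):
    """True if we can bind the port locally (i.e. nothing is listening)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        return True
    except OSError:
        return False
    finally:
        s.close()


def preflight():
    """Run every assertion and report overall readiness."""
    results = [
        check("deploy directory exists", os.path.isdir("/opt/myapp")),
        check("enough free disk space (>1GB)",
              shutil.disk_usage("/").free > 1_000_000_000),
        check("app port 8080 is free", port_is_free(8080)),
    ]
    return all(results)
```

The point is that the script fails loudly before the deployment starts, and every time a deployment breaks for a new environmental reason, you add one more assertion here rather than patching the box by hand.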

These are just a few of the things I’ve learned help to make deployments boring. Boring *should* be the goal, although you get a heck of a lot less champagne.

Sudden thought – perhaps there is a hidden incentive here that’s driving behaviour? – app deployments that happen like clockwork every two weeks without raising a sweat are boring for some folk – and there is no opportunity to be a hero.  I feel a little queasy at this thought…

Continuous Integration – Single Code Line

A common practice in SCM is to create multiple branches (code lines) from a stable baseline, allowing teams to work in isolation on these feature branches until the work meets some quality gate. The feature branch can then be merged into the baseline to form a release. I find this approach abhorrent in almost all cases. My three main objections are:

1. Multiple active code lines force a conservative approach to design improvement (refactoring)

While there is more than one active code line, most teams will defer any widespread design improvement, as any widespread change will be difficult to merge. This means that emergent design and refactoring do not occur, and the software will build up further inconsistency and duplication. This effect must not be underestimated – effectively it’s another source of fear, preventing the teams from moving forward.

2. Deferring integration of code lines usually leads to high risk late in delivery

The longer an isolated code line lives, the more pain and risk incurred when merging. This risk can be largely mitigated if the teams are disciplined in regularly merging changes into the feature branches from baseline. However most teams I’ve observed aren’t very disciplined in this regard, and this risk becomes a real issue.

3. Multiple active code lines work against collective code ownership

Teams working in isolation on a separate code line share their work with other teams as late as possible. This leads to code ownership problems, and inconsistency. The code introduced by an isolated team is often quite clearly different to the rest of the codebase, and is disowned by other developers working on other branches.

Other issues with multiple code lines:

  • merge complexity can introduce significant errors that may not be caught by automated or manual testing, risking production stability.
  • it is very difficult to consistently spread good technical practices (automated testing, coding standards)
  • it works against the CI principle of production-ready increments – isolated branches are often used as excuses to leave the software in a broken state for some period of time, instead of working out how to implement a major change incrementally.

But what if I’m working on a feature that isn’t going to be ready in time for the next release? Firstly, are there any smaller increments that we can release to production and get benefit earlier? If not, then we need to release partial work into production, without it changing the current behaviour of the production system until the feature is complete and can be activated. This involves the introduction of ‘feature toggles’ – configuration that disables the new feature implementation in production until it is ready.

This doesn’t have to be runtime configuration – simple switches introduced to environment-specific config files will usually be enough. There is a cost in introducing this conditional behaviour, but in my opinion this is far outweighed by the enablement of single code line and regular metronomic releases.
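A feature toggle can be very simple in code. Here’s a minimal sketch – the toggle name, config format, and pricing functions are all hypothetical, and in a real system the config text would come from the environment-specific file mentioned above:

```python
# A minimal feature-toggle sketch: the new implementation ships in the same
# artefact, but stays dormant until a config switch enables it. The toggle
# name and .ini config format here are hypothetical.
import configparser


def load_toggles(text):
    """Parse feature toggles from environment-specific .ini-style config."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return {k: cfg.getboolean("features", k)
            for k in cfg.options("features")}


def old_pricing(amount):
    # Current production behaviour: flat 10% markup.
    return round(amount * 1.10, 2)


def discount(amount):
    # Part of the half-finished feature.
    return 5.0 if amount > 100 else 0.0


def new_pricing(amount):
    # The new, partially-built implementation.
    return round(amount * 1.10 - discount(amount), 2)


def calculate_price(amount, toggles):
    """The single decision point: one branch, driven by config."""
    if toggles.get("new_pricing_engine", False):
        return new_pricing(amount)
    return old_pricing(amount)


# Production config leaves the new feature off; a test environment turns it on.
prod = load_toggles("[features]\nnew_pricing_engine = false\n")
test = load_toggles("[features]\nnew_pricing_engine = true\n")
```

Both code paths live on the single code line and are built, tested, and deployed every release – the only thing that differs between environments is one line of config.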

The approach is also more challenging when altering the behaviour of an existing feature – sometimes requiring significant refactoring to introduce the switch. Sometimes we need to introduce a whole abstraction to be able to switch implementations – this is an enabler for significant ‘architectural refactorings’. This is referred to by Paul Hammant as Branch by Abstraction – and is a very powerful technique.
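Branch by Abstraction can also be sketched briefly. This is an illustrative example only – the class names are hypothetical – but it shows the shape of the technique: wrap the old code behind an abstraction, build the replacement behind the same abstraction on the single code line, then switch and delete:

```python
# A branch-by-abstraction sketch: rather than branching in version control,
# introduce an abstraction over the code you want to replace, then build the
# new implementation behind it on the single code line. Names are hypothetical.
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """The abstraction seam introduced over the old persistence code."""

    @abstractmethod
    def save(self, doc_id, body): ...

    @abstractmethod
    def load(self, doc_id): ...


class LegacyFileStore(DocumentStore):
    """Step 1: wrap the existing implementation behind the abstraction."""

    def __init__(self):
        self._files = {}

    def save(self, doc_id, body):
        self._files[doc_id] = body

    def load(self, doc_id):
        return self._files[doc_id]


class NewDbStore(DocumentStore):
    """Step 2: build the replacement incrementally, on the same code line."""

    def __init__(self):
        self._rows = {}

    def save(self, doc_id, body):
        self._rows[doc_id] = body

    def load(self, doc_id):
        return self._rows[doc_id]


def make_store(use_new_store=False):
    """Step 3: flip the switch per environment; step 4: delete the legacy one."""
    return NewDbStore() if use_new_store else LegacyFileStore()
```

The switch itself is just a feature toggle, so the migration can proceed in small, releasable increments – the codebase always builds, and production keeps running the old implementation until the new one is ready.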

Further reading:
http://martinfowler.com/bliki/FeatureBranch.html
http://paulhammant.com/blog/branch_by_abstraction.html
http://pauljulius.com/blog/2009/09/03/feature-branches-are-poor-mans-modular-architecture/

Continuous Integration – If something hurts, do it more often

The prevailing attitude in software development still seems to be that if something is difficult or expensive (or even just not much fun), we try to do it as few times as possible.  This usually means deferring it until as late as possible.

Examples:

  • merging and integrating the work of multiple people
  • merging and integrating the work of multiple teams
  • execution of tests
  • testing the integration of components
  • deploying into a production like environment
  • deploying into production

Most of these things are difficult and expensive, and the temptation is to make more rapid progress early in a project or release by deferring them until late.  Unfortunately these things are also very difficult to predict – in complexity and effort.  This means that we often find we have a significant amount of complexity late in a project or release, just as the pressure on the team starts to rise to deliver.  This inevitably leads to delays being announced very late in the delivery of a project or release, or to the team abandoning quality.

I don’t want one day to be significantly harder or anxiety-inducing than any other.  I don’t like the ‘deployment day’ being a time which people dread, or merging the work of multiple teams to be an unloved task which is risky and error-prone.

My goal with CI is to do these ‘hard’ tasks as often as possible, to invent ways to make these things easier, and to keep this up until the ‘hard’ tasks are painless and risk-free.

Blogging and Continuous Integration

I find I have a high barrier to writing – getting started is hard, and when I write infrequently I get obsessive about the quality of my writing.  I need to get back on the horse – and write a few short topics that can get me flowing.

I’m going to write about old topics – things that I talk about and teach on a daily basis, and get down in this journal the views that I hold right now.  I’m going to start with a favourite topic – Continuous Integration.

I’m a big advocate of Continuous Integration, but I regularly find myself having to explain the wider concept.  Lots of folk, I find, think of CI as the server itself (Hudson, CruiseControl etc.) which performs automated build processes and runs your automated test suite.  That’s a useful tool for achieving Continuous Integration, but it certainly doesn’t stop there.  I think of CI more as a holistic approach – one in which we seek accelerated feedback at any opportunity.

I’ll drill down into a few topics here, unfortunately all of these topics are linked together in some way, so hopefully this makes sense…

Disclaimer: I’m writing these entries to distill some of the conversations I have with customers on a regular basis – I hope it’ll help with my thinking and expression and provide some feedback.  None of this is original thought – it’s based on a few years of reading and listening to smart people.  I probably don’t remember where I heard a lot of this, so I won’t do a good job of attribution.

Danger Will Robinson

I wrote my previous post about running linux on my desktop a couple of weeks back and realise I didn’t put a disclaimer on the post, so here it is…

IF IT ALL GOES HORRIBLY WRONG DON’T BLAME ME!

What I didn’t mention in my post is that the first time I installed vmware and loaded up the windows partition it worked great.  Then I was tooling about in windows (I think uninstalling some cruft) and it required a reboot.  No worries, but then my attention drifted and I went off to another desktop on the linux host.  When I looked back a few minutes later I realised in great horror that the vm had automatically booted via GRUB into the ubuntu partition.  I was looking at the ubuntu login screen, in a vm running on the same ubuntu install.  Nasty feeling.  Power off button on the VM.

Too late – everything in the running host OS started to unravel and I quickly crashed and burned.  Someone in the audience will probably pipe up at this point and say “you could have recovered by xyz” but after a couple of people had a poke around with no success I took the quick route and reinstalled ubuntu again.

Not recommended…

Luckily the windows partition was untouched.

Linux at work again!

For the past couple of months I’ve been working in a very nice workplace with a bunch of nice people and even better some of them run Linux on their desktops without any fear of the SOE-police coming to march them out of the building! I’m also working again with a chap who despite an unnatural love of emacs is a great help at solving any problems I face running Linux at work. So off I set to install Ubuntu Hardy on the client desktop – with their blessing!

I love Linux (and particularly Ubuntu) as a working environment. Most of the applications I’ve worked on in the last 10 years have been deployed on a flavour of unix, and despite doing a lot of Java I also do a lot of scripting and glue code, build and deployment tooling in particular. Working on Windows even with cygwin is just a world of pain.

Unfortunately I’m not able to avoid Microsoft Outlook *sigh*. I spend more time in meetings than I care to admit, and I regularly have to set up meetings with invitees and meeting room resources. Webmail doesn’t cut it. Evolution really doesn’t cut it for much more than reading mail – even sending mail is a bit flaky. Great effort, I really hope one day someone gets involved who can make it stable.

So what I and some others are doing is running VMWare Server under Ubuntu, booting the physical Windows partition from disk. When I installed Ubuntu I resized the existing single partition and left the client SOE Windows installation fully functional. It works incredibly well booted under VMWare once everything is loaded – we run Outlook 2007 which is slow as a dog even non-virtualized – and I’m never going to look back. The best part is if I ever NEED to I can boot the same partition directly.

It’s a little tricky getting VMWare Server to run on Ubuntu, as you have to apply a patch (vmware-any-any) to make it install. I followed the instructions here on howtoforge, and they worked on Hardy. I found I had to run vmware-config.pl twice – once when the patch is applied, and then once more to make it work properly.

Once VMWare is installed it is important to boot under Windows directly, and set up a new hardware profile for “virtual boot”. When you boot Windows under VMWare the first time, choose the virtual boot profile. It will detect a bunch of new VMWare hardware, which will then be associated with the virtual boot profile. This means when you boot directly for some reason later, you can choose the default boot profile and everything will work as it always did.

I’m stoked at how well this works. I have VMWare tools installed so that it gives focus to Windows when I move my mouse over the VMWare window (no Ctrl-Alt-Esc). Highly recommended.

Distracted

Whenever I think I’ll write a blog post, I get as far as logging into my wordpress admin and it’s always the case that a new version of wordpress has been released and it nags me to upgrade.  Ooh I think and head off down a nice little alleyway of distraction.  I usually take a few minutes to remember my username/password for my hosting provider, upload and unpack files yada yada.  That normally takes care of my urge to write a blog post…

I’m frighteningly similar when it comes to writing code on my desktop machine at home – of course there are 227 incoming critical updates from ubuntu.  Or maybe I need to upgrade from Hefty to Iggy (or whatever).  Of course my Eclipse is 3 versions out of date, and I couldn’t possibly use that Java version (or ruby or pascal, or whatever it is I use these days).  By the time I get everything up to date I’ve usually forgotten what it was I came upstairs to work on anyway.

They’re my habits and I love them.

BarCamping (without the actual camping part)

I attended my first BarCamp today and I was really impressed. It was a very full day of interesting presentations by a lot of very smart people, and the range of topics was really quite startling. I’m familiar with user groups (although I don’t go along to as many nights as I’d like) and they tend to be quite focussed; today, however, people talked about whatever their current interest is – from Perl 5.1 to hardware devices to MythTV. I was particularly impressed with the support and encouragement attendees gave to speakers. If you’ve never given a public presentation and you’re unsure of yourself, today would have been an excellent opportunity.

Like at user group nights and conferences I’m awkwardly shy and awful at striking up conversation, so I missed talking to a lot of interesting people today. It’s a different mix of people than I’m used to (like the Ruby user group) with a slant towards freelancing and smaller projects, which is so refreshing compared to the behemoth corporate beasts I’m more familiar with these days.

Favorite presentations today: Mark Ryall’s Intro to Scala (second time around), and Paul Fenwick on an Illustrated History of Failure (with sound effects).

I winged a short demo of the Rspec Story Framework which is my current shiny-toy-of-the-week. I might have been better placed to answer questions if I’d actually used it in the wild yet…

Don’t miss BarCamp Melbourne 2009.

Build Hat

I like the build patterns that TWer Sam Newman has been blogging about. A recent favorite was a post titled “build fix flag” where Sam describes using a paper flag to show visually who is fixing the build. The rules according to Sam:

1. If you saw a CI build breakage, you looked for the flag
2. If someone had the flag, you left them alone
3. If you couldn’t see the flag, you tried to identify the person who made the last check in
4. If you couldn’t find a likely culprit, you raised the flag and fixed it yourself

We use a couple of TW branded USB build lights on our project to give visibility to the current build status across the floor, but with a large team there is a bit of time wasted asking ‘who broke the build? are you fixing it? who’s fixing it?’. We talked about the idea of an indicator, and this morning there appeared a large furry multi-colored wizard’s hat. Replace ‘flag’ in the rules above with ‘funny hat’.

I think it’s brilliant – I’m going to jump in and fix the build next time I notice it broken, just so I can wear the bloody hat!