Continuous Integration

Martin Fowler
Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly. This article is a quick overview of Continuous Integration summarizing the technique and its current usage.
Last significant update: 01 May 06
Contents
Building a Feature with Continuous Integration
Practices of Continuous Integration
Maintain a Single Source Repository
Automate the Build
Make Your Build Self-Testing
Everyone Commits Every Day
Every Commit Should Build the Mainline on an Integration Machine
Keep the Build Fast
Test in a Clone of the Production Environment
Make it Easy for Anyone to Get the Latest Executable
Everyone Can See What's Happening
Automate Deployment
Benefits of Continuous Integration
Introducing Continuous Integration
Final Thoughts
Related Articles
Continuous Integration (original version)
Evolutionary Database Design
I vividly remember one of my first sightings of a large software project. I was taking a summer internship at a large English electronics company. My manager, part of the QA group, gave me a tour of a site and we entered a huge depressing warehouse stacked full with cubes. I was told that this project had been in development for a couple of years and was currently integrating, and had been integrating for several months. My guide told me that nobody really knew how long it would take to finish integrating. From this I learned a common story of software projects: integration is a long and unpredictable process.
But this needn't be the way. Most projects done by my colleagues at ThoughtWorks, and by many others around the world, treat integration as a non-event. Any individual developer's work is only a few hours away from a shared project state and can be integrated back into that state in minutes. Any integration errors are found rapidly and can be fixed rapidly.
This contrast isn't the result of an expensive and complex tool. The essence of it lies in the simple practice of everyone on the team integrating frequently, usually daily, against a controlled source code repository.
When I've described this practice to people, I commonly find two reactions: "it can't work (here)" and "doing it won't make much difference". But when people try it, they find that it's much easier than it sounds, and see that it makes a huge difference to development. Thus the third common reaction is "yes we do that - how could you live without it?"
The term 'Continuous Integration' originated with the Extreme Programming development process, as one of its original twelve practices. When I started at ThoughtWorks, as a consultant, I encouraged the project I was working with to use the technique. Matthew Foemmel turned my vague exhortations into solid action and we saw the project go from rare and complex integrations to the non-event I described. Matthew and I wrote up our experience in the original version of this paper, which has been one of the most popular papers on my site.
See Related Article: Continuous Integration (original version)
The original article on Continuous Integration from this website. If you are following most links, this is the article they intended you to read (although I think the new one is better). It describes the experiences that we went through as Matt helped put together continuous integration on a project at ThoughtWorks in the early 00's.
Although Continuous Integration is a practice that requires no particular tooling to deploy, we've found that it is useful to use a Continuous Integration server. The best known such server is CruiseControl, an open source tool originally built by several people at ThoughtWorks and now maintained by a wide community. The original CruiseControl is written in Java but is also available for the Microsoft platform as CruiseControl.NET.
Building a Feature with Continuous Integration
The easiest way for me to explain what CI is and how it works is to show a quick example of how it works with the development of a small feature. Let's assume I have to do something to a piece of software; it doesn't really matter what the task is, for the moment I'll assume it's small and can be done in a few hours. (We'll explore longer tasks, and other issues, later on.)
I begin by taking a copy of the current integrated source onto my local development machine. I do this by using a source code management system to check out a working copy from the mainline.
The above paragraph will make sense to people who use source code control systems, but be gibberish to those who don't. So let me quickly explain that for the latter. A source code control system keeps all of a project's source code in a repository. The current state of the system is usually referred to as the mainline. At any time a developer can make a controlled copy of the mainline onto their own machine; this is called checking out. The copy on the developer's machine is called a working copy. (Most of the time you actually update your working copy to the mainline - in practice it's the same thing.)
Now I take my working copy and do whatever I need to do to complete my task. This will consist of both altering the production code, and also adding or changing automated tests. Continuous Integration assumes a high degree of tests which are automated into the software: a facility I call self-testing code. Often these use a version of the popular XUnit testing frameworks.
Once I'm done (and usually at various points when I'm working) I carry out an automated build on my development machine. This takes the source code in my working copy, compiles and links it into an executable, and runs the automated tests. Only if it all builds and tests without errors is the overall build considered to be good.
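The local build described above can be sketched as a short script: run each step in order, and treat the build as good only if every step succeeds. This is a minimal illustration in Python, not any particular team's build script; the echo commands are stand-ins for real compiler and test-runner invocations.

```python
import subprocess

# Hypothetical build steps; each is a command that must exit with status 0.
BUILD_STEPS = [
    ["echo", "compiling and linking sources"],  # stand-in for the compiler invocation
    ["echo", "running automated tests"],        # stand-in for the test-runner command
]

def local_build(steps=BUILD_STEPS):
    """Run each build step in order.

    The overall build is considered good only if every step
    (compile, link, test) finishes without error.
    """
    for step in steps:
        result = subprocess.run(step)
        if result.returncode != 0:
            print("BUILD FAILED at:", " ".join(step))
            return False
    print("BUILD SUCCESSFUL")
    return True
```

The key property is the early return: a single failing step marks the whole build bad, which is what makes "only if it all builds and tests without errors" enforceable by a machine rather than by convention.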
With a good build, I can then think about committing my changes into the repository. The twist, of course, is that other people may, and usually have, made changes to the mainline before I get a chance to commit. So first I update my working copy with their changes and rebuild. If their changes clash with my changes, it will manifest as a failure either in the compilation or in the tests. In this case it's my responsibility to fix this and repeat until I can build a working copy that is properly synchronized with the mainline.
Once I have made my own build of a properly synchronized working copy I can then finally commit my changes into the mainline, which then updates the repository.
However my commit doesn't finish my work. At this point we build again, but this time on an integration machine based on the mainline code. Only when this build succeeds can we say that my changes are done. There is always a chance that I missed something on my machine and the repository wasn't properly updated. Only when my committed changes build successfully on the integration machine is my job done. This integration build can be executed manually by me, or done automatically by CruiseControl.
If a clash occurs between two developers, it is usually caught when the second developer to commit builds their updated working copy. If not, the integration build should fail. Either way the error is detected rapidly. At this point the most important task is to fix it, and get the build working properly again. In a Continuous Integration environment you should never have a failed integration build stay failed for long. A good team should have many correct builds a day. Bad builds do occur from time to time, but should be quickly fixed.
The result of doing this is that there is a stable piece of software that works properly and contains few bugs. Everybody develops off that shared stable base and never gets so far away from that base that it takes very long to integrate back with it. Less time is spent trying to find bugs because they show up quickly.
Practices of Continuous Integration
The story above is the overview of CI and how it works in daily life. Getting all this to work smoothly is obviously rather more than that. I'll focus now on the key practices that make up effective CI.
Maintain a Single Source Repository
Software projects involve lots of files that need to be orchestrated together to build a product. Keeping track of all of these is a major effort, particularly when there are multiple people involved. So it's not surprising that over the years software development teams have built tools to manage all this. These tools - called Source Code Management tools, configuration management, version control systems, repositories, or various other names - are an integral part of most development projects. The sad and surprising thing is that they aren't part of all projects. It is rare, but I do run into projects that don't use such a system and use some messy combination of local and shared drives.
So as a simple basis make sure you get a decent source code management system. Cost isn't an issue as good quality open-source tools are available. The current open source repository of choice is Subversion. (The older open-source tool CVS is still widely used, and is much better than nothing, but Subversion is the modern choice.) Interestingly, as I talk to developers I know, most commercial source code management tools are liked less than Subversion. The only tool I've consistently heard people say is worth paying for is Perforce.
Once you get a source code management system, make sure it is the well known place for everyone to go get source code. Nobody should ever ask "where is the foo-whiffle file?" Everything should be in the repository.
Although many teams use repositories, a common mistake I see is that they don't put everything in the repository. If people use one they'll put code in there, but everything you need to do a build should be in there, including: test scripts, properties files, database schema, install scripts, and third party libraries. I've known projects that check their compilers into the repository (important in the early days of flaky C++ compilers). The basic rule of thumb is that you should be able to walk up to the project with a virgin machine, do a checkout, and be able to fully build the system. Only a minimal amount of things should be on the virgin machine - usually things that are large, complicated to install, and stable. An operating system, Java development environment, or base database system are typical examples.
You must put everything required for a build in the source control system; however you may also put other stuff that people generally work with in there too. IDE configurations are good to put in there because that way it's easy for people to share the same IDE setups.
One of the features of version control systems is that they allow you to create multiple branches, to handle different streams of development. This is a useful, nay essential, feature - but it's frequently overused and gets people into trouble. Keep your use of branches to a minimum. In particular have a mainline: a single branch of the project currently under development. Pretty much everyone should work off this mainline most of the time. (Reasonable branches are bug fixes of prior production releases and temporary experiments.)
In general you should store in source control everything you need to build anything, but nothing that you actually build. Some people do keep the build products in source control, but I consider that to be a smell - an indication of a deeper problem, usually an inability to reliably recreate builds.
Automate the Build
Getting the sources turned into a running system can often be a complicated process involving compilation, moving files around, loading schemas into the databases, and so on. However like most tasks in this part of software development it can be automated - and as a result should be automated. Asking people to type in strange commands or click through dialog boxes is a waste of time and a breeding ground for mistakes.
Automated environments for builds are a common feature of systems. The Unix world has had make for decades, the Java community developed Ant, the .NET community has had NAnt and now has MSBuild. Make sure you can build and launch your system using these scripts with a single command.
A common mistake is not to include everything in the automated build. The build should include getting the database schema out of the repository and firing it up in the execution environment. I'll elaborate my earlier rule of thumb: anyone should be able to bring in a virgin machine, check the sources out of the repository, issue a single command, and have a running system on their machine.
Build scripts come in various flavors and are often particular to a platform or community, but they don't have to be. Although most of our Java projects use Ant, some have used Ruby (the Ruby Rake system is a very nice build script tool). We got a lot of value from automating an early Microsoft COM project with Ant.
A big build often takes time, and you don't want to do all of these steps if you've only made a small change. So a good build tool analyzes what needs to be changed as part of the process. The common way to do this is to check the dates of the source and object files and only compile if the source date is later. Dependencies then get tricky: if one object file changes, those that depend on it may also need to be rebuilt. Compilers may handle this kind of thing, or they may not.
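The date-comparison rule that make-style tools apply can be stated in a few lines. This is a simplified sketch in Python of just that rule - it ignores the trickier dependency propagation the paragraph mentions:

```python
import os

def needs_rebuild(source_path, object_path):
    """Decide whether a source file must be recompiled.

    Mirrors the rule used by make-style build tools: rebuild if the
    object file is missing, or if the source file's modification date
    is later than the object file's.
    """
    if not os.path.exists(object_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(object_path)
```

A full tool then applies this transitively: when an object file is rebuilt, anything that depends on it becomes a rebuild candidate too, which is where the tricky dependency analysis comes in.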
Depending on what you need, you may need different kinds of things to be built. You can build a system with or without test code, or with different sets of tests. Some components can be built stand-alone. A build script should allow you to build alternative targets for different cases.
Many of us use IDEs, and most IDEs have some kind of build management process within them. However these files are always proprietary to the IDE and often fragile. Furthermore they need the IDE to work. It's okay for IDE users to set up their own project files and use them for individual development. However it's essential to have a master build that is usable on a server and runnable from other scripts. So on a Java project we're okay with having developers build in their IDE, but the master build uses Ant to ensure it can be run on the development server.
Make Your Build Self-Testing
Traditionally a build means compiling, linking, and all the additional stuff required to get a program to execute. A program may run, but that doesn't mean it does the right thing. Modern statically typed languages can catch many bugs, but far more slip through that net.
A good way to catch bugs more quickly and efficiently is to include automated tests in the build process. Testing isn't perfect, of course, but it can catch a lot of bugs - enough to be useful. In particular the rise of Extreme Programming (XP) and Test Driven Development (TDD) has done a great deal to popularize self-testing code and as a result many people have seen the value of the technique.
Regular readers of my work will know that I'm a big fan of both TDD and XP; however I want to stress that neither of these approaches is necessary to gain the benefits of self-testing code. Both of these approaches make a point of writing tests before you write the code that makes them pass - in this mode the tests are as much about exploring the design of the system as they are about bug catching. This is a Good Thing, but it's not necessary for the purposes of Continuous Integration, where we have the weaker requirement of self-testing code. (Although TDD is my preferred way of producing self-testing code.)
For self-testing code you need a suite of automated tests that can check a large part of the code base for bugs. The tests need to be able to be kicked off from a simple command and to be self-checking. The result of running the test suite should indicate if any tests failed. For a build to be self-testing the failure of a test should cause the build to fail.
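To make the two requirements concrete - kicked off from a simple command, and self-checking - here is a minimal sketch using Python's unittest, which is itself a member of the XUnit family. The function under test and its name are hypothetical, just to give the tests something to check:

```python
import unittest

def order_total(line_items):
    # Hypothetical production function under test.
    return sum(line_items)

class OrderTotalTest(unittest.TestCase):
    """Self-checking tests in the XUnit style: each test states its own
    expected result, so nobody has to inspect the output by hand."""

    def test_total_sums_line_items(self):
        self.assertEqual(order_total([3, 4, 5]), 12)

    def test_total_of_empty_order_is_zero(self):
        self.assertEqual(order_total([]), 0)

def run_suite():
    """Kick off the whole suite from one call and report overall success,
    so a build script can fail the build when any test fails."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderTotalTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

The important wiring is the return value of `run_suite`: a build script that calls it (or runs the suite via the command line) can turn a single failing assertion into a failing build, which is what makes the build self-testing.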
Over the last few years the rise of TDD has popularized the XUnit family of open-source tools, which are ideal for this kind of testing. XUnit tools have proved very valuable to us at ThoughtWorks and I always suggest to people that they use them. These tools, pioneered by Kent Beck, make it very easy for you to set up a fully self-testing environment.
XUnit tools are certainly the starting point for making your code self-testing. You should also look out for other tools that focus on more end-to-end testing; there's quite a range of these out there at the moment, including FIT, Selenium, Sahi, Watir, FITnesse, and plenty of others that I'm not trying to comprehensively list here.
Of course you can't count on tests to find everything. As it's often been said: tests don't prove the absence of bugs. However perfection isn't the only point at which you get payback for a self-testing build. Imperfect tests, run frequently, are much better than perfect tests that are never written at all.
Everyone Commits Every Day
Integration is primarily about communication. Integration allows developers to tell other developers about the changes they have made. Frequent communication allows people to know quickly as changes develop.
The one prerequisite for a developer committing to the mainline is that they can correctly build their code. This, of course, includes passing the build tests. As with any commit cycle the developer first updates their working copy to match the mainline, resolves any conflicts with the mainline, then builds on their local machine. If the build passes, then they are free to commit to the mainline.
By doing this frequently, developers quickly find out if there's a conflict between two developers. The key to fixing problems quickly is finding them quickly. With developers committing every few hours a conflict can be detected within a few hours of it occurring; at that point not much has happened and it's easy to resolve. Conflicts that stay undetected for weeks can be very hard to resolve.
The fact that you build when you update your working copy means that you detect compilation conflicts as well as textual conflicts. Since the build is self-testing, you also detect conflicts in the running of the code. The latter conflicts are particularly awkward bugs to find if they sit for a long time undetected in the code. Since there's only a few hours of changes between commits, there's only so many places where the problem could be hiding. Furthermore since not much has changed you can use diff-debugging to help you find the bug.
My general rule of thumb is that every developer should commit to the repository every day. In practice it's often useful if developers commit more frequently than that. The more frequently you commit, the fewer places you have to look for conflict errors, and the more rapidly you fix conflicts.
Frequent commits encourage developers to break down their work into small chunks of a few hours each. This helps track progress and provides a sense of progress. Often people initially feel they can't do something meaningful in just a few hours, but we've found that mentoring and practice helps them learn.
Every Commit Should Build the Mainline on an Integration Machine
Using daily commits, a team gets frequent tested builds. This ought to mean that the mainline stays in a healthy state. In practice, however, things still do go wrong. One reason is discipline: people not doing an update and build before they commit. Another is environmental differences between developers' machines.
As a result you should ensure that regular builds happen on an integration machine, and only if this integration build succeeds should the commit be considered to be done. Since the developer who commits is responsible for this, that developer needs to monitor the mainline build so they can fix it if it breaks. A corollary of this is that you shouldn't go home until the mainline build has passed with any commits you've added late in the day.
There are two main ways I've seen to ensure this: using a manual build or a continuous integration server.
The manual build approach is the simplest one to describe. Essentially it's a similar thing to the local build that a developer does before the commit into the repository. The developer goes to the integration machine, checks out the head of the mainline (which now houses his last commit) and kicks off the integration build. He keeps an eye on its progress, and if the build succeeds he's done with his commit. (Also see Jim Shore's description.)
A continuous integration server acts as a monitor to the repository. Every time a commit against the repository finishes, the server automatically checks out the sources onto the integration machine, initiates a build, and notifies the committer of the result of the build. The committer isn't done until she gets the notification - usually an email.
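The core loop of such a server is simple to sketch. This is an illustrative model in Python, not how CruiseControl is actually implemented: the three callbacks (for reading the latest repository revision, running the integration build, and notifying the committer) are hypothetical stand-ins for real repository, build, and email machinery.

```python
def ci_server_poll(get_latest_revision, run_build, notify, last_built, polls=1):
    """Model of a CI server's monitoring loop.

    Each poll checks the repository; when a new revision has appeared,
    the server builds it on the integration machine (run_build) and
    tells the committer the result (notify, e.g. by email).
    Returns the last revision that was built.
    """
    for _ in range(polls):
        revision = get_latest_revision()
        if revision != last_built:
            success = run_build(revision)  # check out sources and build
            notify(revision, success)      # committer isn't done until this arrives
            last_built = revision
    return last_built
```

A real server adds queuing, build logs, and a web status page on top of this loop, but the contract is the one described above: commit, then wait for the server's verdict before considering the work done.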
At ThoughtWorks, we're big fans of continuous integration servers - indeed we led the development of CruiseControl and CruiseControl.NET, the widely used open-source CI servers. ThoughtWorkers like Paul Julius, Jason Yip, and Owen Rodgers are still active committers to these open source projects. We use CruiseControl on nearly every project we do and have been very happy with the results.
Not everyone prefers to use a CI server. Jim Shore gave a well argued description of why he prefers the manual approach. I agree with him that CI is much more than just installing CruiseControl. All the practices here need to be in play to do Continuous Integration effectively. But equally many teams who do CI well find CruiseControl a helpful tool.
Many organizations do regular builds on a timed schedule, such as every night. This is not the same thing as a continuous build and isn't enough for continuous integration. The whole point of continuous integration is to find problems as soon as you can. Nightly builds mean that bugs lie undetected for a whole day before anyone discovers them. Once they are in the system that long, it takes a long time to find and remove them.
A key part of doing a continuous build is that if the mainline build fails, it needs to be fixed right away. The whole point of working with CI is that you're always developing on a known stable base. It's not a bad thing for the mainline build to break, although if it's happening all the time it suggests people aren't being careful enough about updating and building locally before a commit. When the mainline build does break, however, it's important that it gets fixed fast.
When teams are introducing CI, often this is one of the hardest things to sort out. Early on a team can struggle to get into the regular habit of working mainline builds, particularly if they are working on an existing code base. Patience and steady application does seem to regularly do the trick, so don't get discouraged.
Keep the Build Fast
The whole point of Continuous Integration is to provide rapid feedback. Nothing sucks the blood of a CI activity more than a build that takes a long time. Here I must admit a certain crotchety old guy amusement at what's considered to be a long build. Most of my colleagues consider a build that takes an hour to be totally unreasonable. I remember teams dreaming that they could get it so fast - and occasionally we still run into cases where it's very hard to get builds to that speed.
For most projects, however, the XP guideline of a ten minute build is perfectly within reason. Most of our modern projects achieve this. It's worth putting in concentrated effort to make it happen, because every minute you shave off the build time is a minute saved for each developer every time they commit. Since CI demands frequent commits, this adds up to a lot of time.
If you're staring at a one hour build time, then getting to a faster build may seem like a daunting prospect. It can even be daunting to work on a new project and think about how to keep things fast. For enterprise applications, at least, we've found the usual bottleneck is testing - particularly tests that involve external services such as a database.
Probably the most crucial step is to start working on setting up a staged build. The idea behind a staged build (also known as a build pipeline) is that there are in fact multiple builds done in sequence. The commit to the mainline triggers the first build - what I call the commit build. The commit build is the build that's needed when someone commits to the mainline. The commit build is the one that has to be done quickly; as a result it will take a number of shortcuts that will reduce the ability to detect bugs. The trick is to balance the needs of bug finding and speed so that a good commit build is stable enough for other people to work on.
Once the commit build is good then other people can work on the code with confidence. However there are further, slower, tests that you can start to do. Additional machines can run further testing routines on the build that take longer to do.
A simple example of this is a two stage build. The first stage would do the compilation and run tests that are more localized unit tests with the database completely stubbed out. Such tests can run very fast, keeping within the ten minute guideline. However any bugs that involve larger scale interactions, particularly those involving the real database, won't be found. The second stage build runs a different suite of tests that do hit the real database and involve more end-to-end behavior. This suite might take a couple of hours to run.
In this scenario people use the first stage as the commit build and use this as their main CI cycle. The second-stage build is a secondary build which runs when it can, picking up the latest good commit build for further testing. If the secondary build fails, then this doesn't have the same 'stop everything' quality, but the team does aim to fix such bugs as rapidly as possible, while keeping the commit build running. Indeed the secondary build doesn't have to stay good, as long as each known bug is identified and dealt with in the next few days.
If the secondary build detects a bug, that's a sign that the commit build could do with another test. As much as possible you want to ensure that any secondary build failure leads to new tests in the commit build that would have caught the bug, so the bug stays fixed in the commit build. This way the commit tests are strengthened whenever something gets past them. There are cases where there's no way to build a fast-running test that exposes the bug, so you may decide to only test for that condition in the secondary build. Most of the time, fortunately, you can add suitable tests to the commit build.
This example is of a two-stage build, but the basic principle can be extended to any number of later builds. The builds after the commit build can also be done in parallel, so if you have two hours of secondary tests you can improve responsiveness by having two machines that run half the tests each. By using parallel secondary builds like this you can introduce all sorts of further automated testing, including performance testing, into the regular build process. (I've run into a lot of interesting techniques around this as I've visited various ThoughtWorks projects over the last couple of years - I'm hoping to persuade some of the developers to write these up.)
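The staged-build idea can be sketched as a driver that runs stages in sequence, reporting which one broke. This is a deliberate simplification - in the pipeline described above a secondary failure doesn't stop the line the way a commit-build failure does - but it shows the basic shape; the stage names and callables are hypothetical.

```python
def staged_build(stages):
    """Run build stages in order: the fast commit build first,
    then slower secondary suites.

    Each stage is a (name, run) pair where run() returns True on success.
    Returns (overall_success, name_of_failed_stage_or_None). Stopping at
    the first failure models the commit build gating later stages.
    """
    for name, run in stages:
        if not run():
            return (False, name)
    return (True, None)
```

A usage sketch: `staged_build([("commit", run_fast_unit_tests), ("secondary", run_database_tests)])`, where the two functions stand in for a stubbed-out fast suite and the couple-of-hours end-to-end suite.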
Test in a Clone of the Production Environment
The point of testing is to flush out, under controlled conditions, any problem that the system will have in production. A significant part of this is the environment within which the production system will run. If you test in a different environment, every difference results in a risk that what happens under test won't happen in production.
As a result you want to set up your test environment to be as exact a mimic of your production environment as possible. Use the same database software, with the same versions; use the same version of operating system. Put all the appropriate libraries that are in the production environment into the test environment, even if the system doesn't actually use them. Use the same IP addresses and ports, and run it on the same hardware.
Well, in reality there are limits. If you're writing desktop software it's not practicable to test in a clone of every possible desktop with all the third party software that different people are running. Similarly some production environments may be prohibitively expensive to duplicate (although I've often come across false economies by not duplicating moderately expensive environments). Despite these limits your goal should still be to duplicate the production environment as much as you can, and to understand the risks you are accepting for every difference between test and production.
If you have a pretty simple setup without many awkward communications, you may be able to run your commit build in a mimicked environment. Often, however, you need to use test doubles because systems respond slowly or intermittently. As a result it's common to have a very artificial environment for the commit tests for speed, and use a production clone for secondary testing.
I've noticed a growing interest in using virtualization to make it easy to put together test environments. Virtualized machines can be saved with all the necessary elements baked into the virtualization. It's then relatively straightforward to install the latest build and run tests. Furthermore this can allow you to run multiple tests on one machine, or simulate multiple machines in a network on a single machine. As the performance penalty of virtualization decreases, this option makes more and more sense.
Make it Easy for Anyone to Get the Latest Executable
One of the most difficult parts of software development is making sure that you build the right software. We've found that it's very hard to specify what you want in advance and be correct; people find it much easier to see something that's not quite right and say how it needs to be changed. Agile development processes explicitly expect and take advantage of this part of human behavior.
To help make this work, anyone involved with a software project should be able to get the latest executable and be able to run it: for demonstrations, exploratory testing, or just to see what changed this week.
Doing this is pretty straightforward: make sure there's a well known place where people can find the latest executable. It may be useful to put several executables in such a store. For the very latest you should put the latest executable to pass the commit tests - such an executable should be pretty stable providing the commit suite is reasonably strong.
If you are following a process with well defined iterations, it's usually wise to also put the end of iteration builds there too. Demonstrations, in particular, need software whose features are familiar, so it's usually worth sacrificing the very latest for something that the demonstrator knows how to operate.
Everyone Can See What's Happening
Continuous Integration is all about communication, so you want to ensure that everyone can easily see the state of the system and the changes that have been made to it.
One of the most important things to communicate is the state of the mainline build. If you're using CruiseControl there's a built in web site that will show you if there's a build in progress and what was the state of the last mainline build. Many teams like to make this even more apparent by hooking up a continuous display to the build system - lights that glow green when the build works, or red if it fails, are popular. A particularly common touch is red and green lava lamps - not just do these indicate the state of the build, but also how long it's been in that state. Bubbles on a red lamp indicate the build's been broken for too long. Each team makes its own choices on these build sensors - it's good to be playful with your choice (recently I saw someone experimenting with a dancing rabbit).
If you're using a manual CI process, this visibility is still essential. The monitor of the physical build machine can show the status of the mainline build. Often you have a build token to put on the desk of whoever's currently doing the build (again something silly like a rubber chicken is a good choice). Often people like to make a simple noise on good builds, like ringing a bell.
CI servers' web pages can carry more information than this, of course. CruiseControl provides an indication not just of who is building, but what changes they made. Cruise also provides a history of changes, allowing team members to get a good sense of recent activity on the project. I know team leads who like to use this to see what people have been doing and keep track of the changes to the system.
Another advantage of using a web site is that those who are not co-located can get a sense of the project's status. In general I prefer to have everyone actively working on a project sitting together, but often there are peripheral people who like to keep an eye on things. It's also useful for groups to aggregate build information from multiple projects - providing a simple and automated status of different projects.
Good information displays are not only those on a computer screen. One of my favorite displays was for a project that was getting into CI. It had a long history of being unable to make stable builds. We put a calendar on the wall that showed a full year with a small square for each day. Every day the QA group would put a green sticker on the day if they had received one stable build that passed the commit tests, otherwise a red one. Over time the calendar revealed the state of the build process, showing a steady improvement until green squares were so common that the calendar disappeared - its purpose fulfilled.
Automate Deployment
To do Continuous Integration you need multiple environments: one to run commit tests, and one or more to run secondary tests. Since you are moving executables between these environments multiple times a day, you'll want to do this automatically. So it's important to have scripts that will allow you to deploy the application into any environment easily.
A natural consequence of this is that you should also have scripts that allow you to deploy into production with similar ease. You may not be deploying into production every day (although I've run into projects that do), but automatic deployment helps both speed up the process and reduce errors. It's also a cheap option since it just uses the same capabilities that you use to deploy into test environments.
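One way to get that reuse is a single deployment function that takes the target environment as a parameter. This is a hypothetical sketch; the releases/current directory layout and the environment paths are invented for illustration:

```shell
#!/bin/sh
# Hypothetical sketch of one deployment script reused for every
# environment. The directory layout and paths are assumptions.
set -eu

deploy() {
    artifact=$1   # executable produced by the build
    env_root=$2   # root directory of the target environment

    release="$env_root/releases/$(date +%Y%m%d%H%M%S)"
    mkdir -p "$release"
    cp "$artifact" "$release/app"
    # 'current' always points at the live release in that environment
    ln -sfn "$release" "$env_root/current"
}

# the same command serves every environment, test or production:
#   deploy build/app /srv/commit-test
#   deploy build/app /srv/production
```

Because production deployment is just another invocation of the same function, it gets exercised many times a day in the test environments.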
If you deploy into production, one extra automated capability you should consider is automated rollback. Bad things do happen from time to time, and if smelly brown substances hit rotating metal, it's good to be able to quickly go back to the last known good state. Being able to automatically revert also reduces a lot of the tension of deployment, encouraging people to deploy more frequently and thus get new features out to users quickly. (The Ruby on Rails community developed a tool called Capistrano that is a good example of a tool that does this sort of thing.)
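If each deployment lands in its own timestamped directory with a symlink pointing at the live one, rollback can be as cheap as flipping that symlink back. A hypothetical sketch, with an invented layout - Capistrano offers a far more complete version of this idea:

```shell
#!/bin/sh
# Hypothetical sketch of automated rollback. Assumes each deployment
# lives in its own directory under releases/ and a 'current' symlink
# marks the live one; all names here are invented.
set -eu

rollback() {
    env_root=$1
    current=$(readlink "$env_root/current")
    # pick the newest release that is not the live one
    previous=$(ls -1d "$env_root"/releases/* | grep -Fxv "$current" | tail -n 1)
    # flipping one symlink restores the last known good state instantly
    ln -sfn "$previous" "$env_root/current"
}
```

Because the old release is still on disk, reverting takes seconds rather than requiring a rebuild.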
In clustered environments I've seen rolling deployments where the new software is deployed to one node at a time, gradually replacing the application over the course of a few hours.
See Related Article: Evolutionary Database Design
A common roadblock for many people doing frequent releases is database migration. Database changes are awkward because you can't just change database schemas; you also have to ensure data is correctly migrated. This article describes techniques used by my colleague Pramod Sadalage to do automated refactoring and migration of databases. The article is an early attempt to capture the information that's described in more detail by Pramod and Scott Ambler's book on refactoring databases [ambler-sadalage].
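The core discipline those techniques depend on - every schema change is a small, numbered script that is applied exactly once, in order - can be sketched with a simple runner. Everything here (the layout, the state file, the pluggable apply command) is a made-up illustration, not the actual tooling from the article:

```shell
#!/bin/sh
# Hypothetical sketch of ordered, run-once database migrations.
# The file layout and the apply command are invented assumptions.
set -eu

migrate() {
    dir=$1     # numbered migration scripts, e.g. 001_add_users.sql
    state=$2   # records which scripts have already been applied
    apply=$3   # command that runs one script against the database

    touch "$state"
    for script in "$dir"/*.sql; do
        name=$(basename "$script")
        # apply each script exactly once, in numeric order
        if ! grep -qFx "$name" "$state"; then
            "$apply" "$script"          # e.g. psql -f "$script"
            echo "$name" >> "$state"
        fi
    done
}
```

Because re-running the runner is a no-op for already-applied scripts, the same command can safely run in every environment on every deployment.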
A particularly interesting variation of this that I've come across with public web applications is the idea of deploying a trial build to a subset of users. The team then sees how the trial build is used before deciding whether to deploy it to the full user population. This allows you to test out new features and user interfaces before committing to a final choice. Automated deployment, tied into good CI discipline, is essential to making this work.
Benefits of Continuous Integration
On the whole I think the greatest and most wide-ranging benefit of Continuous Integration is reduced risk. My mind still floats back to that early software project I mentioned in my first paragraph. There they were at the end (they hoped) of a long project, yet with no real idea of how long it would be before they were done.
The trouble with deferred integration is that it's very hard to predict how long it will take to do, and worse, it's very hard to see how far you are through the process. The result is that you are putting yourself into a complete blind spot right at one of the tensest parts of a project - even if you're one of the rare cases where you aren't already late.
Continuous Integration completely finesses this problem. There's no long integration; you completely eliminate the blind spot. At all times you know where you are, what works, what doesn't, and the outstanding bugs you have in your system.
Bugs - these are the nasty things that destroy confidence and mess up schedules and reputations. Bugs in deployed software make users angry with you. Bugs in work in progress get in your way, making it harder to get the rest of the software working correctly.
Continuous Integration doesn't get rid of bugs, but it does make them dramatically easier to find and remove. In this respect it's rather like self-testing code. If you introduce a bug and detect it quickly, it's far easier to get rid of. Since you've only changed a small bit of the system, you don't have far to look. Since that bit of the system is the bit you just worked on, it's fresh in your memory - again making it easier to find the bug. You can also use diff debugging - comparing the current version of the system to an earlier one that didn't have the bug.
Bugs are also cumulative. The more bugs you have, the harder it is to remove each one. This is partly because you get bug interactions, where failures show as the result of multiple faults - making each fault harder to find. It's also psychological - people have less energy to find and get rid of bugs when there are many of them - a phenomenon that the Pragmatic Programmers call the Broken Windows syndrome.
As a result, projects with Continuous Integration tend to have dramatically fewer bugs, both in production and in process. However I should stress that the degree of this benefit is directly tied to how good your test suite is. You should find that it's not too difficult to build a test suite that makes a noticeable difference. Usually, however, it takes a while before a team really gets to the low level of bugs that they have the potential to reach. Getting there means constantly working on and improving your tests.
If you have continuous integration, it removes one of the biggest barriers to frequent deployment. Frequent deployment is valuable because it allows your users to get new features more rapidly, to give more rapid feedback on those features, and generally become more collaborative in the development cycle. This helps break down the barriers between customers and development - barriers which I believe are the biggest barriers to successful software development.
Introducing Continuous Integration
So you fancy trying out Continuous Integration - where do you start? The full set of practices I outlined above give you the full benefits - but you don't need to start with all of them.
There's no fixed recipe here - much depends on the nature of your setup and team. But here are a few things that we've learned to get things going.
One of the first steps is to get the build automated. Get everything you need into source control, and get it so that you can build the whole system with a single command. For many projects this is not a minor undertaking - yet it's essential for any of the other things to work. Initially you may only do builds occasionally on demand, or just do an automated nightly build. While these aren't continuous integration, an automated nightly build is a fine step on the way.
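The target looks something like the sketch below: a single entry point that cleans, compiles, and tests in order. Every step here is a placeholder for whatever compiler, build tool, and test runner your project actually uses:

```shell
#!/bin/sh
# Hypothetical sketch of the single-command build. Each step is a
# placeholder - substitute your real tool (ant, make, etc.) for the
# ':' no-op commands.
set -eu   # any failing step must fail the whole build, loudly

echo "== cleaning old output =="
rm -rf build
mkdir build

echo "== compiling =="
: your-compile-command-here

echo "== running tests =="
: your-test-command-here

echo "BUILD SUCCESSFUL"
```

The key property is that a newcomer with a fresh checkout can type one command and either get a working system or a loud failure.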
Introduce some automated testing into your build. Try to identify the major areas where things go wrong and get automated tests to expose those failures. Particularly on an existing project it's hard to get a really good suite of tests going rapidly - it takes time to build tests up. You have to start somewhere though - all those cliches about Rome's build schedule apply.
Try to speed up the commit build. Continuous Integration on a build of a few hours is better than nothing, but getting down to that magic ten minute number is much better. This usually requires some pretty serious surgery on your code base as you break dependencies on slow parts of the system.
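One common shape for that surgery is a staged build: the commit build keeps only the fast tests, and everything slow moves to a secondary stage that runs afterwards on the integration machine. A hypothetical sketch with invented stage names:

```shell
#!/bin/sh
# Hypothetical sketch of a two-stage build. Stage names and the test
# groupings are invented; the point is that only the fast suite gates
# each commit.
set -eu

run_stage() {
    case "$1" in
        commit)
            # compile plus fast local unit tests - this is the part that
            # must stay under the ten minute target developers wait on
            echo "commit stage: fast unit tests"
            ;;
        secondary)
            # the slow pieces carved out of the commit build: database,
            # end-to-end, and performance tests
            echo "secondary stage: slow tests"
            ;;
        *)
            echo "unknown stage: $1" >&2
            return 1
            ;;
    esac
}
```

The secondary stage still runs on every integration, just not in the loop a developer sits waiting on.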
If you are starting a new project, begin with Continuous Integration from the beginning. Keep an eye on build times and take action as soon as you start going slower than the ten minute rule. By acting quickly you'll make the necessary restructurings before the code base gets so big that it becomes a major pain.
Above all get some help. Find someone who has done Continuous Integration before to help you. Like any new technique it's hard to introduce it when you don't know what the final result looks like. It may cost money to get a mentor, but you'll also pay in lost time and productivity if you don't do it. (Disclaimer / Advert - yes, we at ThoughtWorks do do some consultancy in this area. After all, we've made most of the mistakes that there are to make.)
Final Thoughts
In the years since Matt and I wrote the original paper on this site, Continuous Integration has become a mainstream technique for software development. Hardly any ThoughtWorks project goes without it - and we see others using CI all over the world. I've hardly ever heard negative things about the approach - unlike some of the more controversial Extreme Programming practices.
If you're not using Continuous Integration I strongly urge you to give it a try. If you are, maybe there are some ideas in this article that can help you do it more effectively. We've learned a lot about Continuous Integration in the last few years; I hope there's still more to learn and improve.
Acknowledgments
First and foremost to Kent Beck and my many colleagues on the Chrysler Comprehensive Compensation (C3) project. This was my first chance to see Continuous Integration in action with a meaningful amount of unit tests. It showed me what was possible and gave me an inspiration that led me for many years.
Thanks to Matt Foemmel, Dave Rice, and everyone else who built and maintained Continuous Integration on Atlas. That project was a sign of CI on a larger scale and showed the benefits it brought to an existing project.
Paul Julius, Jason Yip, Owen Rodgers, Mike Roberts and many other open source contributors have participated in building some variant of CruiseControl. Although the tool isn't essential, many teams find it helpful. Cruise has played a big part in popularizing Continuous Integration and enabling software developers to use it.
One of the reasons I work at ThoughtWorks is to get good access to practical projects done by talented people. Nearly every project I've visited has given tasty morsels of continuous integration information.
Significant Revisions
01 May 06: Complete rewrite of article to bring it up to date and to clarify the description of the approach.
10 Sep 00: Original version published.