MF Bliki: FeatureBranch

With the rise of Distributed Version Control Systems (DVCS) such as git and Mercurial, I've seen more conversations about strategies for branching and merging and how they fit in with Continuous Integration (CI). There's a bit of confusion here, particularly on the practice of feature branching and how it fits in with CI.

Simple (isolated) Feature Branch

The basic idea of a feature branch is that when you start work on a feature (or story if you prefer that term) you take a branch of the repository to work on that feature. In a DVCS, you'll do this in your personal repository, but the same kind of thing works in a centralized VCS too.

I'm going to illustrate this with a series of diagrams. I have a shared project mainline, colored blue, and two developers, colored purple and green (since the developers' names are Reverend Green and Professor Plum).

I'm using labeled colored boxes (e.g. P1 and P2) to represent local commits on the branch. Arrows between branches represent merges, and the merge boxes are colored orange to make them stand out. In this case there are updates, say a couple of bug fixes, applied to the mainline (presumably by Mrs Peacock). When these happen, our developers merge them into their work. To give this a sense of time, I'll assume we're looking at a few days' work here, with each developer committing to their local branch roughly once a day.

In order to ensure things are working properly, they can run builds and tests on their branch. Indeed, for this article I'll assume that each commit and merge comes with an automated build and test on the branch it's on.

The advantage of feature branching is that each developer can work on their own feature and be isolated from changes going on elsewhere. They can pull in changes from the mainline at their own pace, ensuring they don't break the flow of their feature. Furthermore, it allows the team to choose its features for release. If Reverend Green takes too long, we can release with just Professor Plum's changes. Or we may want to delay Professor Plum's feature, perhaps because we are uncertain that the feature works the way we want to release it. In this case we just tell the professor not to merge his changes into mainline until we are ready for the feature. This is called cherry-picking: the team decides which features to merge in before release.

Attractive though that picture looks, there can be trouble ahead.

Although our developers can develop their features in isolation, at some point their work does have to be integrated. In this case Professor Plum easily updates the mainline with his own changes. There's no merge here because he's already incorporated the mainline changes into his own branch (there will be a build). Things are, however, not so simple for Reverend Green: he needs to merge all of his changes (G1-6) with all of Professor Plum's (P1-5).

(At this point many users of DVCSs may feel I'm missing something, as this is a simple, perhaps simplistic, view of feature branching. I'll get to a more involved scheme later.)

I've made this a big merge box as it's a scary merge. It may be just fine: the developers may have been working on completely separate parts of the code base with no interaction, in which case the merge will go smoothly. But they may be working on bits that do interact, in which case here be dragons.

The dragons can come in many forms, and tooling can help slay some of them. The most obvious dragon is the complexity of merging the source code and dealing with conflicts as developers edit the same files. Modern DVCSs actually handle this rather well, indeed somewhat magically. Git in particular has quite a reputation for dealing with complicated merges. Textual merging is so much better than it used to be that I'll go so far as to discount textual conflicts for the purposes of this article.

The problem I worry more about is a semantic conflict. A simple example is Professor Plum changing the name of a method that Reverend Green's code calls. Refactoring tools allow you to rename a method safely, but only within your own code base. Say the method is called foo: if G1-6 contain new code that calls foo, Professor Plum can't tell in his code base, as he doesn't have that code. You only find out on the big merge.

A function rename is a relatively obvious case of a semantic conflict. In practice they can be much more subtle. Tests are the key to discovering them, but the more code there is to merge, the more likely you'll have conflicts and the harder it is to fix them. It's the risk of conflicts, particularly semantic conflicts, that makes big merges scary.

This fear of big merges also acts as a deterrent to refactoring. Keeping code clean is a constant effort; doing it well requires everyone to keep an eye out for cruft and fix it wherever they see it. However, this kind of refactoring on a feature branch is awkward because it makes the Big Scary Merge worse. The result we see is that teams using feature branches shy away from refactoring, which leads to uglier code bases.

Continuous Integration

It's these problems that Continuous Integration was designed to solve. With Continuous Integration my diagram looks like this.

There's a lot more merging going on here, but merging is one of those things that's much easier to do frequently and small rather than rarely and large. As a result, if Professor Plum is changing some code that Reverend Green relies on, the Reverend will find it early, such as when he merges in P1-2. At that point he only has to modify G1-2 to work with the changes, rather than G1-6.
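
To put rough numbers on that difference, here is a toy Python model. Its rule is my assumption, not something the diagrams measure: Green discovers Plum's conflicting change only at his first integration on or after the day it lands, and any commit he has made up to then may need rework. Day numbers stand in for G1-6 and for P2.

```python
def commits_to_rework(green_commit_days, conflict_day, integration_days):
    """Count Green's commits that may need adapting to a conflicting upstream change.

    Toy model (an assumption): the change is only seen at the first integration
    on or after conflict_day, and every commit up to that day may need rework.
    """
    discovery_day = min(d for d in integration_days if d >= conflict_day)
    return sum(1 for d in green_commit_days if d <= discovery_day)


green_commits = [1, 2, 3, 4, 5, 6]  # G1..G6, one commit per day
conflict_day = 2                    # Plum's conflicting change arrives with P2

# CI: Green integrates every day.
print(commits_to_rework(green_commits, conflict_day, [1, 2, 3, 4, 5, 6]))  # 2
# Isolated branch: Green integrates only at the end.
print(commits_to_rework(green_commits, conflict_day, [6]))                 # 6
```

The model is crude, but it shows the shape of the argument: the rework is bounded by the gap between integrations, not by the length of the feature.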

CI is effective at removing the problem of big merges, but it's also a vital communication mechanism. In this scenario the potential conflict will actually appear when Professor Plum merges G1 and realizes that Reverend Green is actively building on Plum's libraries. At this point Professor Plum can go and find Reverend Green and they can discuss how their two features interact. It may be that Professor Plum's feature requires some changes that don't mesh well with Reverend Green's changes. By looking at both their features they can come up with a better design that affects both their work-streams. With isolated feature branches our developers don't discover this until late, probably too late to do much about it. Communication is one of the key factors in software development, and one of CI's most important features is that it facilitates human communication.

It's important to note that, most of the time, feature branching like this is a different approach to CI. One of the principles of CI is that everyone commits to the mainline every day. So unless feature branches last less than a day, running a feature branch is a different animal from CI. I've heard people say they are doing CI because they are running builds, perhaps using a CI server, on every branch with every commit. That's continuous building, and a Good Thing, but there's no integration, so it's not CI.
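
A minimal Python sketch of that distinction, assuming a hypothetical record of when each branch last merged into mainline (branch names and timestamps are made up). The check deliberately ignores branch builds: only integration into the mainline counts.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=1)  # "everyone commits to the mainline every day"


def branches_not_integrating(last_mainline_merge, now):
    """Return branches whose work has not reached mainline within the last day.

    A passing build on the branch itself is irrelevant here; only a merge
    into mainline resets the clock.
    """
    return sorted(
        branch
        for branch, merged_at in last_mainline_merge.items()
        if now - merged_at > MAX_AGE
    )


now = datetime(2024, 1, 10, 17, 0)
last_merge = {
    "plum/feature": datetime(2024, 1, 10, 9, 30),  # merged this morning
    "green/feature": datetime(2024, 1, 4, 16, 0),  # built on every commit, never merged
}
print(branches_not_integrating(last_merge, now))  # ['green/feature']
```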

Promiscuous Integration

Earlier I said parenthetically that there are other ways of doing feature branching. Say Professor Plum and Reverend Green take tea together early in the cycle. While chatting they discover they are working on features that interact. At this point they may choose to integrate with each other directly, like this.

With this approach they only push to the mainline at the end, as before. But they merge frequently with each other, so this avoids the Big Scary Merge. The point here is that the primary issue with the isolated feature branching scheme is its isolation. When you isolate the feature branches, there is a risk of a nasty conflict growing without you realizing it. Then the isolation is an illusion, and will be shattered painfully sooner or later.

So is this more ad hoc integration a form of CI or a different animal entirely? I think it is a different animal: again, a key point of CI is that everyone integrates to the mainline every day. Integrating across feature branches, which I shall call promiscuous integration (PI), doesn't involve or even need a mainline. I think this difference is important.

I see CI as primarily giving birth to a release candidate at each commit. The job of the CI system and deployment process is to disprove the production-readiness of a release candidate. This model relies on the need to have some mainline that represents the current shared, most up-to-date picture of complete.

--Dave Farley
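
One way to picture this model is as a hypothetical pipeline in Python: each mainline commit is a release candidate, and each stage in turn tries to disprove it. The stage names and checks are placeholders, not any particular CI server's configuration.

```python
def first_failing_stage(commit, stages):
    """Run the stages in order; return the name of the stage that disproves
    the candidate, or None if it survives them all."""
    for name, check in stages:
        if not check(commit):
            return name
    return None


broken_commits = {"c2"}  # pretend c2 broke a unit test

stages = [
    ("commit build", lambda commit: True),
    ("unit tests", lambda commit: commit not in broken_commits),
    ("acceptance tests", lambda commit: True),
]

for commit in ["c1", "c2", "c3"]:
    verdict = first_failing_stage(commit, stages)
    print(commit, "releasable" if verdict is None else f"rejected at {verdict}")
```

The pipeline never proves a candidate good; it only fails to reject it, which is why it needs a single mainline whose commits are the candidates.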

Promiscuous Integration vs Continuous Integration

So if it's different, is PI better than CI, or, more realistically, under what circumstances is PI better than CI?

With CI, you lose the ability to use the VCS to do cherry-picking. Every developer is touching mainline, so all features grow in the mainline. With CI, the mainline must always be healthy, so in theory (and often in practice) you can safely release after any commit. Having a half-built feature, or a feature you'd rather not release yet, won't damage the other functionality of the software, but may require some masking if you don't want it to be visible in the user interface. This can be as simple as not including a menu item in the UI to trigger the feature.
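
A minimal Python sketch of that kind of masking, assuming a hypothetical menu whose items can be tied to a named feature (`pdf_export` is an invented example). The half-built code lives on mainline; only its menu entry is hidden until the feature is switched on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MenuItem:
    label: str
    feature: Optional[str] = None  # None: released functionality, always shown


def visible_menu(items, enabled_features):
    """Labels to render; items tied to a disabled feature are simply left out."""
    return [
        item.label
        for item in items
        if item.feature is None or item.feature in enabled_features
    ]


MENU = [
    MenuItem("Open"),
    MenuItem("Save"),
    MenuItem("Export to PDF", feature="pdf_export"),  # half-built, already on mainline
]

print(visible_menu(MENU, set()))           # ['Open', 'Save']
print(visible_menu(MENU, {"pdf_export"}))  # ['Open', 'Save', 'Export to PDF']
```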

PI can provide some middle ground here. It allows Reverend Green the choice of when to incorporate Professor Plum's changes. If Professor Plum makes some core API changes in P2, then Reverend Green can import P1-2 but leave the others until Professor Plum's feature is put onto the release.

One worry with all this picking and choosing is that PI makes it really hard to keep track of who has what in their branch. In practice, it seems tooling pretty much solves this problem. DVCSs keep a clear track of changes and their origins and can figure out that when Professor Plum pulls G3 he already has G2 but doesn't have B2. I may have made mistakes drawing the diagram by hand, but tools do keep track of these things well.
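
The bookkeeping a DVCS does here boils down to graph reachability. The Python sketch below uses a small hypothetical history shaped like the text's example: G3 is taken to be Green's commit after merging the mainline fix B2, and P2 is Plum's commit after merging G2. The set difference is similar in spirit to what `git log P2..G3` lists.

```python
# Hypothetical commit graph: each commit maps to its parent commits.
PARENTS = {
    "B1": [],
    "B2": ["B1"],
    "G1": ["B1"],
    "G2": ["G1"],
    "G3": ["G2", "B2"],  # Green has merged mainline's B2
    "P1": ["B1"],
    "P2": ["P1", "G2"],  # Plum has merged Green's G2
}


def ancestors(commit):
    """All commits reachable from `commit` by following parent links (inclusive)."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen


def missing_on_pull(their_head, my_head):
    """Commits that pulling their_head would bring in and my_head lacks."""
    return ancestors(their_head) - ancestors(my_head)


print(sorted(missing_on_pull("G3", "P2")))  # ['B2', 'G3']
```

G2 does not appear in the result because it is already reachable from Plum's head, which is exactly the "already has G2 but doesn't have B2" judgment.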

On the whole, however, I don't think cherry-picking with the VCS is a good idea.

Feature Branching is a poor man's modular architecture, instead of building systems with the ability to easy swap in and out features at runtime/deploytime they couple themselves to the source control providing this mechanism through manual merging.

--Dan Bodart

I much prefer designing the software in such a way that makes it easy to enable or disable features through configuration changes. My colleague Paul Hammant calls this Branch by Abstraction. This requires you to put some thought into what needs to be modularized and how to control that variation, but we've found the result to be far less messy than relying on the VCS.
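
A minimal Python sketch of the idea, assuming a hypothetical tax calculation being replaced: both the old and new implementations sit behind one interface on mainline, and configuration, not a VCS branch, decides which one runs. The class names, rates, and `tax_engine` key are invented for illustration.

```python
from typing import Protocol


class TaxCalculator(Protocol):
    """The abstraction that both the old and new code paths sit behind."""

    def tax_cents(self, amount_cents: int) -> int: ...


class FlatTax:
    """Existing implementation: 20% on everything."""

    def tax_cents(self, amount_cents: int) -> int:
        return amount_cents * 20 // 100


class TieredTax:
    """New implementation, still in progress: 10% up to 100.00, 25% above."""

    def tax_cents(self, amount_cents: int) -> int:
        band = min(amount_cents, 10_000)
        return band * 10 // 100 + (amount_cents - band) * 25 // 100


IMPLEMENTATIONS = {"flat": FlatTax, "tiered": TieredTax}


def make_tax_calculator(config: dict) -> TaxCalculator:
    """Choose the implementation from configuration rather than from a branch."""
    return IMPLEMENTATIONS[config.get("tax_engine", "flat")]()


print(make_tax_calculator({}).tax_cents(20_000))                        # 4000
print(make_tax_calculator({"tax_engine": "tiered"}).tax_cents(20_000))  # 3500
```

Once the new implementation is ready, the default can flip and the old class can be deleted, with every step committed to mainline along the way.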

The main thing that makes me nervous about PI is the influence on human communication. With CI the mainline acts as a communication point. Even if Professor Plum and Reverend Green never talk, they will discover the nascent conflict within a day of it forming. With PI they have to notice they are working on interacting code. An up-to-date mainline also makes it easy for someone to be sure they are integrating with everyone; they don't have to poke around to find out who is doing what, so there is less chance of some changes being hidden until a late integration.

PI arose out of open-source work, and it could be that the less intensive tempo of open source is a factor here. In a full-time job, you work several hours a day on a project. This makes it easier for features to be worked on in priority order. With an open-source project people often put in an hour here, and the next hour a few days later. A feature may take one developer quite a while to complete while other developers with more time are able to get features into a releasable state earlier. In this situation cherry-picking can be more important.

It's important to realize that the tools you use are largely independent of the integration strategy you use. Although many people associate DVCSs with feature branching, they can be used with CI. All you need to do is mark one branch on one repository as the mainline. If everyone pulls and pushes to that every day, then you have a CI mainline. Indeed, with a disciplined team, I would usually prefer to use a DVCS on a CI project rather than a centralized one. With a less disciplined team I would worry that a DVCS would nudge people towards long-lived branches, while a centralized VCS and a reluctance to branch nudge them towards frequent mainline commits. Paul Hammant may be right: "I wonder though, if a team should not be adept with trunk-based development before they move to distributed."