Branching And Forking

In branch and merge patterns, each contributor maintains their own version of the software (a branch or a fork). The contributor starts by copying the mainline, then keeps the copy up to date by merging changes from the mainline. When the contributor has a complete change to contribute, they can merge it directly to the mainline, or submit a "merge request" or "pull request" to a review system.

This is a very flexible pattern that can implement several of the previous patterns.

Informal. You can update frequently and merge each change directly, without review.

Branch review. You will naturally get the branch review pattern if you make "task branches" for each change that you want to work on.

Cascading. You will get a cascading pattern if you make branches or forks for each major feature or team. This is commonly called a "feature branch" pattern if you use it for features. If you use it for teams or individual contributors, you get a "maintainer" pattern where each contributor submits changes up to a maintainer, who integrates them into a bigger system.

Distributed Continuous Delivery. In the continuous delivery chapter, we will see how branches can be used to test and release each change.

Branching has several flavors, illustrated below.

Branch: Contributors can make a branch inside any repository. Branches and branch permissions are authorized by a repository owner. This workflow makes content and changes visible to everyone in a small team.

Stream: In a stream-based system like Perforce, the developer branches remember the upstream branches that should be the source of their updates, and the downstream branches to which their changes should flow. Stream systems are useful if you are assembling a system or a developer environment from a complicated set of components. In this example, we see a branch or a workspace that has components from several other branches.

Fork: The contributor uses a DVCS (distributed version control system) like Git to make a copy of the mainline version, called a fork. When you fork, there are two copies of the code in separate repositories, with separate permissions. The owner of the fork decides when to update the fork, and the owner of the origin decides when to accept contributions. You can grant read permission to a lot of contributors, and you do not need to administer write permissions.

This is a great way to expand the number of contributors to an open source project. It is also popular for commercial development because of the flexibility that it gives to the contributor.

Update branches or forks frequently

Critics often observe that developers tend to work a long time on branches, then cause integration problems by merging big batches of code. Finding problems in these big batches of code is difficult. Branching processes will break down when contributions from the feature branches or forks are big, or difficult to merge, or contain bugs that must be resolved at the time of the merge.

This dreaded "long running branches" problem is the major reason why advocates of centralized continuous integration are opposed to distributed forking and branching.

However, you can easily avoid this problem if you update your feature branch frequently with changes from the origin. If you use branches or forks, you should:

  • Merge frequently from the origin or production version to your development version. By updating, integrating, and testing frequently in your development version, you can stay close to the production version.
  • Test on your branch BEFORE you submit changes to the origin. You must make sure that the changes in the feature branch or fork work correctly before you submit them to be merged.
  • Release changes as frequently as possible. When you release smaller amounts of code, you will find problems faster because you have fewer places to look for problems.
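The three habits above can be sketched as one repeatable cycle: update from the mainline, test, and only then submit. The Python sketch below is a minimal illustration, not a prescribed tool. The function names, the branch name "feature-x", the remote name "origin", the branch name "main", and the test command "make test" are all assumptions made for the example. The Git subcommands it lists (checkout, fetch, merge, push) are standard.

```python
import subprocess
from typing import Callable, List, Sequence

Command = List[str]


def update_cycle(branch: str,
                 mainline_remote: str = "origin",   # assumed remote name
                 mainline_branch: str = "main",     # assumed mainline branch
                 push_remote: str = "origin",       # for a fork, your fork's remote
                 test_cmd: Sequence[str] = ("make", "test")) -> List[Command]:
    """Return the commands for one update, test, and submit cycle."""
    return [
        ["git", "checkout", branch],
        ["git", "fetch", mainline_remote],
        # Merge the latest mainline into the development version first.
        ["git", "merge", f"{mainline_remote}/{mainline_branch}"],
        # Test on the branch BEFORE anything is submitted.
        list(test_cmd),
        # Publish the tested branch so that it can be merged or reviewed.
        ["git", "push", push_remote, branch],
    ]


def run_cycle(commands: List[Command],
              runner: Callable = subprocess.run) -> None:
    """Run the commands in order and stop at the first failure."""
    for cmd in commands:
        # With check=True, subprocess.run raises CalledProcessError on a
        # non-zero exit status, so a failed merge or test prevents the push.
        runner(cmd, check=True)


if __name__ == "__main__":
    for cmd in update_cycle("feature-x"):
        print(" ".join(cmd))
```

Running the cycle often keeps each merge from the mainline small. Because the push comes last and any failure stops the run, a broken merge or a failing test never reaches the origin.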

Merging with Git

Git uses branches and forks by default, and it has excellent support for updating a branch or fork from the origin. Its "rebase" feature allows a contributor to update from the origin, and then deliver a changeset that is easy to merge. Rebase replays the contributor's commits on top of the latest version of the origin, so the delivered history no longer shows the code exchanges that led up to it. The contributor can also squash the commits during a rebase, which puts all of the changes into one package.

This is a radical approach because it "rewrites history": the rebased commits replace the originals, and the record of the previous commits is no longer part of the branch. Many other VCS are designed to prevent any loss of history. It is possible that this radical approach to rebase is the reason that Git has become the most popular DVCS. As we shall see, it is important in the distributed continuous integration pattern.
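To make "rewriting history" concrete, here is a toy model in Python. It is a sketch of the idea only and not how Git is implemented: the Commit class, the log and rebase functions, and the commit messages are all invented for the illustration. It shows a branch being replayed on top of an updated origin, with an optional squash into a single commit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None


def log(tip: Optional[Commit]) -> List[str]:
    """Return commit messages from the root to the tip."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return messages[::-1]


def rebase(branch_tip: Commit, old_base: Commit, new_base: Commit,
           squash: bool = False) -> Commit:
    """Replay the commits made after old_base on top of new_base."""
    pending = []
    commit = branch_tip
    while commit is not None and commit is not old_base:
        pending.append(commit)
        commit = commit.parent
    pending.reverse()  # oldest first
    if squash:
        # Deliver all of the changes as one package.
        combined = " + ".join(c.message for c in pending)
        return Commit(combined, new_base)
    tip = new_base
    for c in pending:
        # Each replayed commit is a NEW object with a new parent;
        # the original commit is left untouched.
        tip = Commit(c.message, tip)
    return tip


base = Commit("A")
origin = Commit("C", Commit("B", base))    # the mainline moved on: A, B, C
feature = Commit("Y", Commit("X", base))   # the contributor's work: A, X, Y

rebased = rebase(feature, base, origin)
squashed = rebase(feature, base, origin, squash=True)

print(log(rebased))    # ['A', 'B', 'C', 'X', 'Y']
print(log(squashed))   # ['A', 'B', 'C', 'X + Y']
```

In both results the contributor's work sits at the head of the origin's history, which is why it merges cleanly. The original X and Y commits still exist in the model but are no longer part of the delivered branch.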

Merging with Subversion

Subversion does not work well when you try to merge multiple times from the origin or trunk branch. Because of this, most Subversion users learn to avoid long-running branches, and most Subversion teams use a centralized continuous delivery model that avoids branching.

If you encounter a long-running branch in Subversion, you can use two tricks to merge it. The first is the new merge implementation in Subversion 1.8, from Julian Foad. He has updated Subversion merge so that it is smarter about moving changes between trunk and branches. As I write this in May 2013, the Apache Subversion team is testing these changes for release.

The second technique is a manual rebase. To do a manual rebase, first make a second branch from the trunk or origin, then merge your changes into the second branch and test it. This has the effect of putting all of your changes into one revision or set of revisions at the head of the branch history - the same effect as a Git rebase. After that you will find the revisions easy to merge to the trunk or origin.
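The manual rebase steps can be written down as a short command sequence. The Python sketch below only builds the list of commands. The function name, the branch paths, the log messages, and the test command "make test" are assumptions for the example, and the sequence assumes that you run it inside a working copy. Whether a plain "svn merge" of the old branch is sufficient depends on your Subversion version and on the merge history of the branch, so treat this as a starting point rather than a recipe.

```python
from typing import List, Sequence


def manual_rebase_commands(
        feature: str = "^/branches/feature",          # assumed path of the old branch
        rebased: str = "^/branches/feature-rebased",  # assumed path of the second branch
        trunk: str = "^/trunk",
        test_cmd: Sequence[str] = ("make", "test")) -> List[List[str]]:
    """Return the commands for a manual rebase of a long-running branch."""
    return [
        # 1. Make a second branch from the current trunk.
        ["svn", "copy", trunk, rebased, "-m", "Create fresh branch from trunk"],
        # 2. Point the working copy at the second branch.
        ["svn", "switch", rebased],
        # 3. Merge the changes from the old branch into the working copy.
        ["svn", "merge", feature],
        # 4. Test the result before committing it.
        list(test_cmd),
        # 5. Commit, which places the changes at the head of the new branch.
        ["svn", "commit", "-m", "Bring feature changes onto fresh branch"],
    ]


if __name__ == "__main__":
    for cmd in manual_rebase_commands():
        print(" ".join(cmd))
```

After the final commit, the second branch differs from the trunk only by your changes, which is what makes the later merge to the trunk straightforward.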

Why use long-running feature branches?

We talked earlier about using temporary branches or task branches for code review. You can also use long-running feature branches or forks to do code review, with automated testing, commenting and voting. In this system, contributors do not make a temporary branch for each change. Instead, they maintain a longer-running version, and they post "pull requests" or "merge requests" to ask reviewers to review and accept changes.

Long-running feature branches or forks will be useful if you:

  • Assemble components into a complete system using the cascading pattern
  • Have independent contributors who like to experiment, or who are not part of the team with inside permissions
  • Do not have full-time reviewers