Rebase is not the only way to deliver clean code

I'm a bit perplexed by fans of git's rebase feature. I often hear git users recommending it as the way to work with distributed version control. I think they're conflating “the series of patches I want to share” with “the revision history of my work”.

Rebase? What's that?

It's a feature of some version control tools, probably best known from git, but there's a plugin that adds it to bzr too. The rebase command takes a branch and rewrites its history so that it looks as if the branch had been based on a different branch or revision than the one it was actually based on.
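As a minimal sketch, here is that effect in git. The script builds a throwaway repository in a temporary directory. The branch name `feature`, the file names and the committer identity are placeholders chosen for illustration. After `git rebase`, the feature commit has the same message and the same change, but a new commit ID and a new parent: the current tip of trunk.

```shell
#!/bin/sh
# Minimal sketch: show `git rebase` giving a branch a new base.
# Branch names, file names and identity below are illustrative only.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q
echo "v1" > trunk.txt
git add trunk.txt
git commit -q -m "Trunk: initial"
trunk=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on config

# Branch off trunk and commit some work.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Feature: first step"
old_id=$(git rev-parse HEAD)
old_parent=$(git rev-parse HEAD^)

# Meanwhile, trunk gains a new commit.
git checkout -q "$trunk"
echo "v2" > trunk.txt
git commit -q -am "Trunk: second commit"

# Rewrite feature as if it had been started from the new trunk tip.
git checkout -q feature
git rebase -q "$trunk"
new_id=$(git rev-parse HEAD)
new_parent=$(git rev-parse HEAD^)

echo "before: $old_id (parent $old_parent)"
echo "after:  $new_id (parent $new_parent)"
```

The commit that used to hang off the initial revision is not moved; a new one is created in its place, and the branch now points at that instead.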

Rebasing throws away¹ the history of a branch. Unfortunately, throwing away that history hampers collaboration on that branch: if someone has branched off your branch, you now have two branches that appear unrelated to your VCS but make nearly identical changes to the same code. In other words, you now have two branches that are basically guaranteed to conflict when merged: for instance, if both branches are merged into a common trunk, almost certainly all the changes in the second branch that were also present in the first will conflict. Ouch!
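The collaboration problem can be sketched the same way in git. In this hypothetical script a colleague branches off `feature` and builds on it, and only then is `feature` rebased. Afterwards the colleague's branch still carries the original "step one" commit, which is no longer part of `feature`, so the VCS sees two distinct commits making the same change. Names, files and identity are placeholders; the script stops short of merging and only shows the duplicated change that a later merge would have to reconcile.

```shell
#!/bin/sh
# Sketch: a collaborator's branch keeps the pre-rebase commit.
# Branch names, files and identity are illustrative placeholders.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

cd "$(mktemp -d)"
git init -q
echo "base" > README
git add README
git commit -q -m "Trunk: initial"
trunk=$(git symbolic-ref --short HEAD)

# Your branch, with one step of work.
git checkout -q -b feature
echo "step one" > feature.txt
git add feature.txt
git commit -q -m "Feature: step one"
original=$(git rev-parse HEAD)

# A colleague branches off your branch and builds on it.
git checkout -q -b colleague
echo "their change" > colleague.txt
git add colleague.txt
git commit -q -m "Colleague: build on step one"

# Trunk moves on, and you rebase feature onto it.
git checkout -q "$trunk"
echo "more" >> README
git commit -q -am "Trunk: second commit"
git checkout -q feature
git rebase -q "$trunk"
rebased=$(git rev-parse HEAD)

# Commits on colleague that feature does not contain: the colleague's own
# commit and the original "step one", which duplicates $rebased's change.
git log --format='%h %s' feature..colleague
```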

That discarded history is potentially useful to humans too: sometimes someone has to dig through the original diffs and commit logs, and if those have been automatically rewritten, the chances of them making sense in their new context are significantly reduced.

So why do people use it?

Despite the disadvantages, I regularly hear people, mainly git users, say how great rebase is. When I ask why, the answer is always something like “to clean up my commits”. So I'll ask what they want to clean up, and why. Eventually I realize that they don't actually want to lose their history; what they really want is control over how their code is displayed and delivered.

For example, people use rebase to help maintain a change as a precise series of smaller changes, which can be reviewed and merged one by one when finally delivered. Specifically, the series of changes should be as readable and to the point as possible for the recipient. Patch authors don't want to subject people to a series of patches against old, deprecated code interspersed with the occasional merge from when they updated their changes for new APIs. That is longer and harder to read than a series of changes all made directly against the current version of the target branch. You'd like to present all of the steps in your series of changes against the current target branch, with no noise.

This is a good way to work. You make life easier for the recipient (always important if you want them to merge your code!), and if you have the freedom to revise the earlier steps as you go along, you make it easier for yourself too. But it's a mistake to think that maintaining and delivering code in a neat series of steps is mutually exclusive with keeping the original history of that code.

Maintaining a series of patches and the revision history

So what can you do instead of using rebase? Stop conflating “the series of patches I want to share” with “the revision history of my work”.

For example, Bazaar has a plugin designed specifically for managing a series of changes like this: Loom. With a loom you can maintain a series of steps without discarding any history (see the quick guide). It's still a fairly new tool, so the UI isn't quite as polished as core Bazaar's, but it's already a pleasure to use and will only get better.

Another example, which someone described to me on #twisted: you have started a new project, just as a personal experiment. After 200 revisions you decide it's useful and that it's time to share it with the world. But your early commit messages are junk like “lol butts” because you were just experimenting rather than thinking about sharing the code. You want to “clean” the history, i.e. rebase. Or do you?

Actually, all the person wanted to do was provide more useful commit messages after the fact. They didn't want to discard the real history if they didn't have to. Here's a simple technique that does that without forgetting the real history or synthesising a new one:

# Create a new branch with no history.
bzr init my_project
cd my_project

# Merge in the first 10 revisions of the experiment, and
# commit them with a useful message.
bzr merge ../experimental-junk -r 0..10
bzr commit -m "Initial implementation of Frobnicator"

# Merge in the next group of changes and commit those. 
bzr merge ../experimental-junk -r 10..33
bzr commit -m "Add Twizzler class, remove Frobnicator.twizzle() method"

# Etcetera...

Summary

If you're thinking of using rebase, ask yourself whether it's really the only way to do what you want. It's probably not the only way, and in my experience it's probably not even the best way.


Thanks to Mary Gardiner and Jono Lange for reading drafts of this post; without them it would have been twice as long and half as interesting!

1. “throws away” might sound too strong, because generally the revisions are still in a repository somewhere, if you know exactly where to look. But if they're going to be ignored, then that's irrelevant. From the perspective of the branch, that history is no longer there, and that's all that matters.

Andrew Bennetts, July 2008