<rss version="2.0">
    <channel>
        <title>bemusement.org: diary</title>
        <link>http://bemusement.org/diary/</link>
        <description>Updates to bemusement.org: diary</description>
        <language>en</language>
        <webMaster>andrew-puzzling@puzzling.org</webMaster>
        <ttl>1440</ttl>
        <item><title>Tests are code, doctests aren't</title><link>http://bemusement.org/diary/2008/October/24/more-doctest-problems</link><guid isPermaLink="true">http://bemusement.org/diary/2008/October/24/more-doctest-problems</guid><pubDate>Thu, 23 Oct 2008 23:41:00 GMT</pubDate><description><![CDATA[<p>In my <a
href="/diary/2008/October/23/narrative-tests">last
post</a> I explained why I think narrative-style tests make poor unit tests.
That alone is a good reason not to write unit tests in Python's <a
href="http://docs.python.org/lib/module-doctest.html">doctest</a> format.
Here are more reasons why I don't like doctest for writing tests.

<ul>

<li><strong>Writing test infrastructure becomes harder</strong>
    (any multi-line statement, like defining a class or even a function,
    becomes awkward), but test code benefits from factoring logic out just as
    much as any other code &mdash; and that means classes and functions.</li>

<li><strong>Doctests require contortions to fit the way they compare
    output</strong>, like using <code>sorted(...)</code> when comparing
    dictionaries to get a deterministic comparison.  This detracts
    from readability.  In xUnit, a simple, obvious, and clear
    <code>assertEqual</code> would just work.  In doctests, if this fails:
<pre>&gt;&gt;&gt; foo == bar
True</pre>
    then you get a completely unhelpful error, but doctest leaves you with
    little choice if you have dynamic values that vary between test runs.
    Again, this Just Works in xUnit with <code>assertEqual</code>.  In general,
    xUnit <a href="http://xunitpatterns.com/Custom%20Assertion.html">custom
    assertions</a> are more flexible and readable than doctest's output
    matching.  As Guido <a
    href="http://mail.python.org/pipermail/python-dev/2008-July/081421.html"
    >said on python-dev</a> in July:
<blockquote>
This is an example of the problem with doctest -- it's easy to
overspecify the tests. I don't think that whether the repr() of a
Decimal uses single or double quotes should be considered a spec cast
in stone by doctests.
</blockquote>
    </li>

<li><strong>It's hard to see an overview of the tests at a glance</strong>.  With
    a doctest file, individual tests are typically introduced by a sentence or
    three.  Conventions vary from file to file.  There's no tool I know of that
    can give me an outline of the unit tests in a doctest file.  In contrast,
    almost every code editor I know of has at least one way to display an
    outline of the classes and methods of a Python file, which gives a good
    overview of unit tests written in the xUnit framework.  (And if your editor
    can't do it, there's always the amusingly named <a
    href="https://launchpad.net/testdoc">testdoc</a>.)  This sort of outline is
    useful as it gives you a summary of all the conditions being explicitly
    tested.  This helps you spot gaps in coverage, understand what the code
    being tested can do, and know where the most appropriate place to add a
    particular new test is (if you can't easily browse the existing tests,
    people will just add them in arbitrary places like the end, making the test
    file a disorganised, unnavigable swamp).</li>

<li><strong>Doctest is a mini-language with ugly corners and outright
    bugs.</strong>  You cannot start expected output with an ellipsis.  The
    syntax for blanklines in expected output (“<code>&lt;BLANKLINE&gt;</code>”)
    is ugly.  The syntax for toggling various doctest features inline
    (“<code>#doctest: +IGNORE_EXCEPTION_DETAIL</code>”) is worse.  The language
    is outright buggy in places &mdash; the following doctest passes:
<pre>&gt;&gt;&gt; print 'hello'
... print 'world'
hello</pre>
    This one passes too:
<pre>&gt;&gt;&gt; assert True
... garbage
&gt;&gt;&gt; print 1
1</pre>
    
    Testing APIs like pyunit can and do have ugly corners and bugs too, but the
    scope for problems is larger with a mini-language.  I've never heard of an
    outright syntax error being silently ignored by pyunit!  I might be more
    forgiving of doctest's quirks if it wasn't <a
    href="http://groups.google.com/group/comp.lang.python/browse_thread/thread/789d4b8b9e021065/1715acb4f9d4c5ee?lnk=st&q=python+doctest#1715acb4f9d4c5ee"
    >almost 10 years old</a> already.
    </li>

</ul>
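
<p>To make the <code>assertEqual</code> point concrete, here is a minimal
sketch in current Python 3.  The <code>word_counts</code> function and the test
names are hypothetical, invented only for this illustration.  The dictionaries
are compared directly, with no <code>sorted(...)</code> contortions, and a
failing assertion would report the differing items rather than a bare
<code>False</code>.</p>

```python
import io
import unittest


def word_counts(text):
    """Hypothetical function under test: count each word in text."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


class WordCountsTests(unittest.TestCase):

    def test_counts_repeated_words(self):
        # Dict equality ignores ordering, so no sorted(...) is needed.
        self.assertEqual({"spam": 2, "eggs": 1}, word_counts("spam eggs spam"))

    def test_empty_text_has_no_counts(self):
        self.assertEqual({}, word_counts(""))


# Run the tests programmatically and keep the result object around.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordCountsTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, "tests run, success:", result.wasSuccessful())
```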

<p>But that's not all.  A more fundamental reason why I dislike doctests is
that <strong>tests are code, and code works better in a <code>.py</code> file
than a <code>.txt</code> file</strong>.  There are a couple of reasons for
this:</p>

<ul>

<li><strong>Tool support</strong>.  Text editors already know how to syntax
    highlight <code>.py</code> files correctly.  Pdb works better with normal
    code (in doctests the capturing of stdout confuses the prompting).  I can
    use standard profiling tools.  I can run <a
    href="http://pychecker.sourceforge.net/">PyChecker</a> and <a
    href="http://divmod.org/projects/pyflakes">Pyflakes</a> on <code>.py</code>
    files.  I can use <a href="http://ctags.sourceforge.net/">ctags</a>.  I can
    use <a href="http://bicyclerepair.sourceforge.net/">bicyclerepairman</a>.
    I can use <a href="http://codespeak.net/~mwh/pydoctor/">pydoctor</a> or <a
    href="http://epydoc.sourceforge.net/">epydoc</a>.  There are many more
    examples.</li>

<li><strong>Tests are code, and code needs organisation</strong>.  Test
    suites are in many ways just like any other code: logic gets reused.
    Normal Python modules provide well-known, effective ways to manage this:
    you can make classes that inherit from other classes, you can create
    modules for storing common utility functions, etc.  But you can't import
    code <em>from</em> a doctest.  Defining a function, let alone a class, in a
    doctest just plain looks weird.  And because code is code even inside a
    doctest, sometimes you want to refactor it.  Gerard Meszaros' <em>xUnit
    Test Patterns</em> book is subtitled “Refactoring Test Code” because tests
    need refactoring too.</li>

<li><strong>Prose isn't always a good substitute for comments in the
    code.</strong> A commonly stated benefit of doctests is that they make
    prose easier to write &mdash; but equally they make code comments and
    docstrings <em>harder</em> to write.  In a Python file you can write:
<pre>class Thing(object):
    """Docstring."""
    # Comment.</pre>
    In doctests, you have to write 
<pre>&gt;&gt;&gt; class Thing(object):
...     """Docstring."""
...     # Comment.</pre>
    Those tedious “<code>... </code>” mean that almost every single code
    snippet I've seen in a doctest has lacked even a single comment or
    docstring, even when they really needed it.  A prose preamble isn't always
    the best place to explain code.</li>

</ul>
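
<p>As a sketch of the kind of organisation a <code>.py</code> file allows, the
example below keeps a fixture helper and a custom assertion in one shared base
class that two test classes inherit.  Everything here (<code>Account</code>,
<code>AccountTestCase</code>, <code>assertBalance</code>) is a hypothetical
name made up for the illustration, not code from a real project.</p>

```python
import io
import unittest


class Account:
    """Hypothetical class under test."""

    def __init__(self):
        self.balance = 0
        self.history = []

    def deposit(self, amount):
        self.balance += amount
        self.history.append(("deposit", amount))

    def withdraw(self, amount):
        self.balance -= amount
        self.history.append(("withdraw", amount))


class AccountTestCase(unittest.TestCase):
    """Shared base class: one obvious home for helpers and assertions."""

    def make_account(self, opening_balance=0):
        # Fixture helper reused by every subclass.
        account = Account()
        if opening_balance:
            account.deposit(opening_balance)
        return account

    def assertBalance(self, expected, account):
        # Custom assertion: the failure message includes useful context.
        self.assertEqual(
            expected, account.balance,
            "unexpected balance; history was %r" % (account.history,))


class DepositTests(AccountTestCase):

    def test_deposit_increases_balance(self):
        account = self.make_account()
        account.deposit(10)
        self.assertBalance(10, account)


class WithdrawTests(AccountTestCase):

    def test_withdraw_decreases_balance(self):
        account = self.make_account(opening_balance=10)
        account.withdraw(3)
        self.assertBalance(7, account)


loader = unittest.defaultTestLoader
suite = unittest.TestSuite([
    loader.loadTestsFromTestCase(DepositTests),
    loader.loadTestsFromTestCase(WithdrawTests),
])
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, "tests run, success:", result.wasSuccessful())
```

<p>In a real suite the base class would normally live in its own module so that
other test modules can import it, which is exactly what a doctest file cannot
offer.</p>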

<p>Tools can be improved to cope with doctest (for instance I heard that my pdb
problems may be solved in Python 2.5), but new tools are continually being
invented, and I want to be able to use those too.  For instance, the <a
href="http://svn.python.org/view/sandbox/trunk/2to3/README?view=auto">2to3</a>
tool for converting Python 2.6 code to the upcoming Python 3.0 doesn't fix code
in doctest files.  And I still can't do “<code>set filetype=doctest</code>” in
vim, which is hardly a new tool.</p>

<p>With sufficiently improved tool support and infrastructure many (but not
all) of my concerns would be reduced.  For instance, it would help if there
were a way to easily reset all state during a long doctest, so that different
parts of the same file could be independent.  And then it would be good if
there were also then a convenient way to put names on these independent
sections.  But you'd still be left with a design that gently encourages people
to do things a worse way (write a big story), and you'd be reinventing the
wheel: xUnit already gives you those things.</p>

<p>In my experience many developers with the best of intentions will
produce poor unit tests with doctest because of the way it subtly encourages
bad practices.  One bad habit I've seen over and over again is
copying-and-pasting helper functions, even large, complicated ones, from
doctest to doctest.  Is it because it's not “real” code, so the instinct to
organise it and avoid duplication doesn't trigger?  Is it because there's no
obvious home for helper functions, because a doctest is not a module?  I wish I
knew.</p>

<p>I do not think doctests are evil.  The doctest format is fine for some
things.  For “page tests” (e.g. using <a
href="http://cheeseshop.python.org/pypi/zope.testbrowser/">zope.testbrowser</a>,
as <a
href="http://plone.org/documentation/tutorial/testing/functional-tests">demonstrated
here</a>), where there's a narrative of a user story driving them, doctests are
a pretty good fit.  They can be good for writing testable documentation (which
is not the same as tests and documentation mixed together!) too.
But those things aren't <em>unit</em> tests.</p>

<p>I've mentioned this book a couple of times, and I do recommend it:</p>

<dl>
<dt style="font-weight: bold">Title</dt>
<dd><em>xUnit Test Patterns: Refactoring Test Code</em></dd>
<dt style="font-weight: bold">Author</dt>
<dd>Gerard Meszaros</dd>
<dt style="font-weight: bold">Website</dt>
<dd><a href="http://xunitpatterns.com/">http://xunitpatterns.com/</a></dd>
</dl>

<p>You can find it <a
href="http://www.amazon.com/xUnit-Test-Patterns-Refactoring-Addison-Wesley/dp/0131495054/">on
Amazon here</a>.</p>

<p>If nothing else, reading it encourages thinking about the way you write
tests, and ways you could do it better.</p>

<p>So despite the hype, I don't think doctest has an advantage over xUnit in
producing readable tests.  Code needs to be clear (including an appropriate
amount of docstrings and comments) whether or not it's test code.  If your
developers aren't writing clear code, you have a serious problem: you are sure
to have difficulty maintaining that code.  It is just as possible to write
incomprehensible tests using doctest as it is using <code>TestCase</code>
classes with test methods.  I know this because, unfortunately, I've seen
plenty of both.  Writing good tests is a skill that takes time and practice to
learn.  Using doctest is obviously not a silver bullet.  Not using doctest
isn't a silver bullet either, but I do think it's usually the better choice.</p>

]]></description></item><item><title>Narrative tests are lousy unit tests</title><link>http://bemusement.org/diary/2008/October/23/narrative-tests</link><guid isPermaLink="true">http://bemusement.org/diary/2008/October/23/narrative-tests</guid><pubDate>Wed, 22 Oct 2008 14:26:53 GMT</pubDate><description><![CDATA[<p>I want to stop people abusing Python's <a
href="http://docs.python.org/lib/module-doctest.html">doctest</a> format.  Many
of the tests I've seen written as doctest files would have been better off as
plain <a href="http://docs.python.org/lib/module-unittest.html">unittest</a>
files.  I'm going to try to explain why.  I have many gripes about how people use
doctests, but probably the biggest is that narrative tests are lousy unit
tests.</p>

<p>Narratives tell a story.  Something happens, then another thing, and another
thing, one after the other, in sequence.  Earlier events influence later ones
as the story gradually assembles a complete picture.  Humans like stories; our
brains are used to telling them and receiving them.</p>

<p>Technical documentation is often written with a narrative.  Tutorials are an
obvious case, but not the only one.  A guide to an API may show a series of
different examples, each contrasting with the others in ways that explain to the
reader what they need to understand.</p>

<p>Automated tests can have narratives too, of course.  A narrative test is
quite easy to write: write some code that does something (and check the
result), then do something else (and check that result), and so on until you've
done (and checked) everything you want to do (and check).  Doctests make this
particularly easy.  Here's a toy example of a doctest:</p>

<pre>
   Instantiate a Frobber.

     &gt;&gt;&gt; frobber = Frobber()
     &gt;&gt;&gt; frobber.has_frobbed()
     False

   Now frob it.

     &gt;&gt;&gt; frobber.frob()
     &gt;&gt;&gt; frobber.has_frobbed()
     True

   It can't be frobbed twice.

     &gt;&gt;&gt; frobber.frob()
     Traceback (most recent call last):
     ...
     AlreadyFrobbedError: ...
</pre>
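
<p>For readers who want to run the toy narrative, the sketch below supplies a
hypothetical <code>Frobber</code> and <code>AlreadyFrobbedError</code> (neither
is defined anywhere in this post) and feeds the same text to the standard
<code>doctest</code> module.  It assumes the
<code>IGNORE_EXCEPTION_DETAIL</code> option, so that the
“<code>...</code>” after the exception name is not compared literally.</p>

```python
import doctest


class AlreadyFrobbedError(Exception):
    """Hypothetical error raised when a Frobber is frobbed a second time."""


class Frobber:
    """Hypothetical implementation that satisfies the narrative."""

    def __init__(self):
        self._frobbed = False

    def has_frobbed(self):
        return self._frobbed

    def frob(self):
        if self._frobbed:
            raise AlreadyFrobbedError("already frobbed")
        self._frobbed = True


NARRATIVE = """
Instantiate a Frobber.

  >>> frobber = Frobber()
  >>> frobber.has_frobbed()
  False

Now frob it.

  >>> frobber.frob()
  >>> frobber.has_frobbed()
  True

It can't be frobbed twice.

  >>> frobber.frob()
  Traceback (most recent call last):
  ...
  AlreadyFrobbedError: ...
"""

# Parse the text into examples, then run them in a namespace with Frobber.
parser = doctest.DocTestParser()
narrative_test = parser.get_doctest(
    NARRATIVE, {"Frobber": Frobber}, "frobber.txt", None, 0)
runner = doctest.DocTestRunner(
    verbose=False, optionflags=doctest.IGNORE_EXCEPTION_DETAIL)
results = runner.run(narrative_test)
print(results.attempted, "examples attempted,", results.failed, "failed")
```

<p>Note that every example shares the one <code>frobber</code> object: each
step depends on the state left behind by the steps before it.</p>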

<p>Narrative tests can be good acceptance tests.  An acceptance test often
takes the form of a story; an example might be “<em>a user who is not logged in visits a
web page.  They click a particular link that needs a logged in user, so they
get taken to a login screen.  The user has no account yet, so they walk through
the account creation wizard.  Once the wizard is completed, the account is
created and they are logged in, and they are taken to the link they originally
clicked on.</em>”</p>

<p>So, having shown how they are easy to write, and appropriate for some tests,
I'll now explain why narratives make lousy unit tests.</p>

<p>A typical unit test has <a
href="http://xunitpatterns.com/Four%20Phase%20Test.html">four phases</a>:

<ol>
	<li>Set up a fixture</li>
	<li>Interact with the system-under-test</li>
	<li>Verify the outcome</li>
	<li>Tear down the fixture</li>
</ol>

Or, to phrase it the way <a
href="http://en.wikipedia.org/wiki/Behavior_Driven_Development">Behaviour-driven
Development</a> people might, each unit test says: &ldquo;Given situation
<em>X</em>, when I do <em>Y</em>, then <em>Z</em> happens.&rdquo;</p>
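
<p>The four phases map onto a <code>unittest.TestCase</code> roughly as in the
sketch below.  The <code>save_note</code> function is a hypothetical
system-under-test invented for this example; the comments mark where each phase
happens.</p>

```python
import io
import os
import shutil
import tempfile
import unittest


def save_note(directory, text):
    """Hypothetical system under test: write text to note.txt."""
    path = os.path.join(directory, "note.txt")
    with open(path, "w") as f:
        f.write(text)
    return path


class SaveNoteTests(unittest.TestCase):

    def setUp(self):
        # Phase 1: set up a fixture (given an empty scratch directory).
        self.workdir = tempfile.mkdtemp()

    def tearDown(self):
        # Phase 4: tear down the fixture, whether or not the test passed.
        shutil.rmtree(self.workdir)

    def test_saved_text_can_be_read_back(self):
        # Phase 2: interact with the system under test (when I save a note).
        path = save_note(self.workdir, "remember the milk")
        # Phase 3: verify the outcome (then the file holds that text).
        with open(path) as f:
            self.assertEqual("remember the milk", f.read())


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SaveNoteTests)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
print(result.testsRun, "test run, success:", result.wasSuccessful())
```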

<p>Good unit tests are small and specific: they will test just one condition
per test method, i.e. the <em>X</em> and the <em>Y</em> will be as minimal as
reasonably possible.  There's considerable benefit to this style:</p>

<ul>

<li><strong>Every individual test has a name</strong>.  I can refer to a
    failing test precisely by name when communicating with my fellow
    developers.  I can communicate the name to the test runner too: when I am
    trying to focus on just one problem, it's extremely useful to be able to
    easily and precisely specify the subset of the full suite I want to run,
    down to just one test if necessary.  I can even jump straight to a test
    method definition with <a href="http://ctags.sourceforge.net">ctags</a>.
    Compare that with doctests, where you have to say things like “about line
    300 of foo-bar.txt” or “Just after where it says ...”.  That's awkward and
    imprecise, especially when developers are often looking at slightly
    different versions of the same file.</li>

<li><strong>Specific tests give clearer failures, and are easier to
    debug</strong>.  Good unit tests keep the context of the test fairly
    minimal (Meszaros' <em>xUnit Test Patterns</em> book explicitly describes
    “General Fixture” as a cause of the “Obscure Test”).  Narratives inherently
    accumulate context with every line, whether it's relevant to anything else
    or not.  You have to be aware of everything that happened earlier in the
    story to understand and debug a failure (and if this isn't the case, then
    what's the point in having a narrative?).  Unit tests also tend to generate
    more relevant failures, because only tests that are actually affected by
    the problem fail, rather than everything after line 100 because that's
    where the first failure was (and if you suppress the secondary failures,
    you may be suppressing interesting ones along with the irrelevant
    ones).</li>

<li><strong>Specific, narrow tests are better at communicating intent and
    ensuring coverage</strong>.  If each test is there to verify
    just one condition, then you can't accidentally lose test coverage just by
    “tidying” the code (automated coverage analysis tools won't necessarily
    notice either; there's more to coverage than just tracking lines executed).
    If you have long, rambling tests, there's a tendency to have a bunch of
    stuff that's exercised only implicitly, as a side-effect of doing it all in
    one big <a
    href="http://xunitpatterns.com/Obscure%20Test.html#Eager%20Test">eager</a>
    narrative...  so changes to that narrative can easily lose that
    coverage.  Simple, specific code is easier to maintain than a single
    meandering story that tries to hit as many cases as possible.  Make
    single-condition unit tests an explicit part of your coding standard!</li>

</ul>
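
<p>As a rough illustration, here is the toy Frobber narrative from earlier
recast as single-condition tests.  The <code>Frobber</code> and
<code>AlreadyFrobbedError</code> implementations are hypothetical stand-ins.
Each condition gets its own name, and one test can be selected and run on its
own.</p>

```python
import io
import unittest


class AlreadyFrobbedError(Exception):
    """Hypothetical error raised when a Frobber is frobbed a second time."""


class Frobber:
    """Hypothetical class under test."""

    def __init__(self):
        self._frobbed = False

    def has_frobbed(self):
        return self._frobbed

    def frob(self):
        if self._frobbed:
            raise AlreadyFrobbedError("already frobbed")
        self._frobbed = True


class FrobberTests(unittest.TestCase):

    def test_new_frobber_has_not_frobbed(self):
        self.assertFalse(Frobber().has_frobbed())

    def test_frob_sets_has_frobbed(self):
        frobber = Frobber()
        frobber.frob()
        self.assertTrue(frobber.has_frobbed())

    def test_cannot_frob_twice(self):
        frobber = Frobber()
        frobber.frob()
        self.assertRaises(AlreadyFrobbedError, frobber.frob)


# The method names double as an outline of the conditions being tested.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FrobberTests)
test_names = [test.id().split(".")[-1] for test in suite]
print(test_names)

# Run just one test, selected precisely by name.
single = FrobberTests("test_cannot_frob_twice")
single_result = unittest.TextTestRunner(stream=io.StringIO()).run(single)
print(single_result.testsRun, "test run, success:",
      single_result.wasSuccessful())
```

<p>From a shell the same selection would be something like <code>python -m
unittest some_module.FrobberTests.test_cannot_frob_twice</code>, where
<code>some_module</code> is whatever module holds the tests.</p>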

<p>So that's why I think narrative tests are poor unit tests.  And I think unit
tests ought to be the bulk of most automated test suites.</p>

<p>Tomorrow I'll post about some other problems with the doctest
format.</p>

]]></description></item><item><title>Rebase is not the only way to deliver clean code</title><link>http://bemusement.org/diary/2008/July/29/rebase-criticism</link><guid isPermaLink="true">http://bemusement.org/diary/2008/July/29/rebase-criticism</guid><pubDate>Mon, 28 Jul 2008 14:20:15 GMT</pubDate><description><![CDATA[
<p>
I'm a bit perplexed by fans of git's rebase feature.  I often hear git users
recommending it as <em>the</em> way to work with distributed version control.  I
think they're conflating “the series of patches I want to share” with “the
revision history of my work”.
</p>


<h2>Rebase?  What's that?</h2>

<p>
It's a feature of some version control tools, probably best known from <a
href="http://git.or.cz/">git</a>, but there's a plugin that adds it to <a
href="http://bazaar-vcs.org/">bzr</a> too.  The rebase command will take a
branch and <a
href="http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#using-git-rebase">rewrite
its history</a> so that it is as if the branch had been based off a different
branch or revision than the one it actually was.
</p>

<p>
Rebasing throws away<a href="#rebase-criticism-footnote-1">&sup1;</a> the
history of a branch.  Unfortunately, throwing away that history hampers
collaboration on that branch: if someone has branched off your branch, you now
have two branches that appear unrelated to your VCS but make nearly identical
changes to the same code.  In other words, you now have two branches that are
basically guaranteed to conflict when merged: for instance, if both branches are
merged to a common trunk, almost certainly all the changes in the second branch
that were present in the first will conflict.  Ouch!
</p>
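
<p>
A toy model can make the “appear unrelated” point concrete.  The Python sketch
below is not how git or bzr actually store revisions; it only assumes, as git
does, that a revision's identifier depends on its parent as well as on its own
change.  All names and change descriptions are made up for the illustration.
</p>

```python
import hashlib


def revision_id(parent_id, change):
    """Toy identifier: a hash of the parent's id plus this change."""
    data = (parent_id + ":" + change).encode("utf-8")
    return hashlib.sha1(data).hexdigest()


def build_branch(base_id, changes):
    """Apply changes on top of base_id; return the new revision ids."""
    ids = []
    parent = base_id
    for change in changes:
        parent = revision_id(parent, change)
        ids.append(parent)
    return ids


old_trunk = revision_id("", "trunk r1")
new_trunk = revision_id(old_trunk, "trunk r2")

changes = ["add frob()", "fix frob() bug"]
original = build_branch(old_trunk, changes)  # what collaborators branched from
rebased = build_branch(new_trunk, changes)   # same changes, new base

# The textual changes are identical, yet no revision is shared.
shared = set(original).intersection(rebased)
print("shared revisions:", len(shared))
```

<p>
In this model a merge tool has no common revisions to work from, so it can only
treat the two branches as independent edits to the same lines.
</p>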

<p>
That discarded history is potentially useful to humans too: sometimes someone
has to dig through the original diffs and commit logs, and if those have been
automatically rewritten, the chances of them making sense in their new context are
significantly reduced.
</p>


<h2>So why do people use it?</h2>

<p>
Despite the disadvantages, I regularly hear people, mainly git users, say how
great rebase is.  When I ask why, the answer is always something like “to clean
up my commits”.  So I'll ask what they want to clean up, and why.  Eventually I
realize that they don't actually want to lose their history; what they really
want is control over how their code is displayed and delivered.
</p>

<p>
For example, people use rebase to help with maintaining a change as a precise
series of smaller changes, which can be reviewed and merged one-by-one when
finally delivered.  Specifically, the series of changes should be as readable
and to the point as possible for the recipient.  Patch authors don't want to
subject people to a series of patches against old, deprecated code interspersed
with the occasional merge from when you updated your changes for new APIs.  It's
longer and harder to read that than a series of changes all made directly
against the current version of the target branch.  You'd like to present all 
of the steps in your series of changes against the current target branch, with
no noise.
</p>

<p>
This is a good way to work.  You make life easier for the recipient (always
important if you want them to merge your code!), and if you have the freedom to
revise the earlier steps as you go along you make it easier for yourself too.
But it's a mistake to think that maintaining and delivering code in a neat
series of steps is mutually exclusive with using the original history of that
code.
</p>


<h2>Maintaining a series of patches <em>and</em> the revision history</h2>

<p>
So what can you do instead of using rebase?  Stop conflating “the series of
patches I want to share” with “the revision history of my work”.
</p>

<p>
For example, Bazaar has a plugin designed specifically for managing a series of
changes like this: <a href="https://launchpad.net/bzr-loom">Loom</a>.  With a
loom you can maintain a series of steps without discarding any history (see <a
href="http://bazaar.launchpad.net/~bzr-loom-devs/bzr-loom/trunk/annotate/head:/HOWTO">the
quick guide</a>).  It's still a fairly new tool so the UI isn't quite as
polished as core Bazaar, but it's already a pleasure to use and will only get
better.
</p>

<p>
Another example, which was told to me on #twisted: you have started a new
project, just as a personal experiment.  After 200 revisions you decide it's
useful and that it's time to share it with the world.  But your early commit
messages are junk like “lol butts” because you were just experimenting rather
than thinking about sharing the code.  You want to "clean" the history, i.e.
rebase.  Or do you?
</p>

<p>
Actually, all the person wanted to do was to provide more useful commit
messages after the fact.  They didn't actually want to discard the real history
if they didn't have to.  Here's a simple technique that can do that without
forgetting the real history and synthesising a new one:
</p>

<pre>
<em># Create a new branch with no history.</em>
bzr init my_project
cd my_project

<em># Merge in the first 10 revisions of the experiment, and
# commit that with a useful message.</em>
bzr merge ../experimental-junk -r 0..10
bzr commit -m "Initial implementation of Frobnicator"

<em># Merge in the next group of changes and commit those. </em>
bzr merge ../experimental-junk -r 10..33
bzr commit -m "Add Twizzler class, remove Frobnicator.twizzle() method"

<em># Etcetera...</em>
</pre>


<h2>Summary</h2>

<p>
If you're thinking of using rebase, ask yourself if it's really the only way
to do what you want.  It's probably not the only way, and, in my experience, it's
probably not even the best way.
</p>

<p>Further reading:</p>
<ul>
<li><a
href="http://changelog.complete.org/posts/586-Rebase-Considered-Harmful.html">Rebase
Considered Harmful</a> by John Goerzen.</li>
<li><a
href="http://blog.madism.org/index.php/2007/02/25/124-git-rebase-is-not-harmful-it-s-just-_not_-always-the-best-solution-that-s-all">git
rebase is not harmful, it's just _not_ always the best solution, that's all.</a>
by Pierre Habouzit; a reply to John Goerzen that says rebase is fine so long as
you don't share the branch.</li>
<li><a href="http://lwn.net/Articles/269120/">linux-next and patch management
process</a> at LWN.net; includes coverage of linux-kernel mailing list
discussion about rebase.</li>
<li><a href="http://kerneltrap.org/Linux/Git_Management">Git Management</a> at
Kerneltrap; more linux-kernel discussion of rebase.</li>
</ul>

<p>
Thanks to Mary Gardiner and Jono Lange for reading drafts of this post; without
them it would have been twice as long and half as interesting!
</p>

<p>&mdash;</p>

<a name="rebase-criticism-footnote-1"></a>1. “throws away” might sound too
strong, because generally the revisions are still in a repository somewhere, if
you know exactly where to look.  But if they're going to be ignored, then that's
irrelevant.  From the perspective of the branch, that history is no longer
there, and that's all that matters.

]]></description></item><item><title>Using Bazaar to hack on Twisted</title><link>http://bemusement.org/diary/2008/April/6/twisted-bzr</link><guid isPermaLink="true">http://bemusement.org/diary/2008/April/6/twisted-bzr</guid><pubDate>Sun, 06 Apr 2008 14:58:03 GMT</pubDate><description><![CDATA[<p>At PyCon, I helped several Twisted developers to use Bazaar to work on
Twisted, even though the repository is still SVN.  Here's how to do it.</p>

<p>First, install bzr and the <a
href="http://bazaar-vcs.org/BzrForeignBranches/Subversion">bzr-svn</a> plugin.
I strongly recommend using the latest release of bzr-svn, 0.4.9, as Jelmer is
fixing bugs and improving the speed at an impressive rate.  0.4.9 is in Ubuntu
Hardy.</p>

<p>Then make a shared repository for your Twisted branches:</p>

<pre>
bzr init-repo --rich-root-pack Twisted
</pre>

<p>(Forgetting <code>--rich-root-pack</code> is a common gotcha: the default
repository format in bzr 1.3 doesn't have a feature bzr-svn needs.)</p>

<p>Then check out the SVN trunk into the repository.  You could do this
direct from SVN, but it's much faster to use an already converted copy:</p>

<pre>
cd Twisted
bzr checkout http://www.twistedmatrix.com/users/spiv/bzr/twisted/trunk/
cd trunk
bzr switch svn://svn.twistedmatrix.com/svn/Twisted/trunk
</pre>

<p>To work on a branch already in SVN, you can just check out or branch
(depending on which workflow you prefer) it directly into your local
repository.  Checkouts will be more familiar to SVN users than independent
branches, but if you want to make commits you'll need write access to Twisted's
SVN repository.</p>

<pre>
cd ..
bzr checkout svn://svn.twistedmatrix.com/svn/Twisted/branches/feature-branch-1
cd feature-branch-1
<i>hack hack hack...</i>
bzr commit
</pre>

<p>(Don't worry too much about branches vs. checkouts.  You can always use
<code>bzr reconfigure</code> to change a checkout into a branch and vice versa
if you change your mind later.)</p>

<p>To make a new branch in the SVN repository it's best to just use svn
cp as before, then make a checkout/branch of that with bzr. (See <a
href="https://bugs.launchpad.net/bzr-svn/+bug/203368">bug 203368</a>)</p>

<pre>
svn cp svn://svn.twistedmatrix.com/svn/Twisted/trunk svn://svn.twistedmatrix.com/svn/Twisted/branches/new-branch
bzr checkout svn://svn.twistedmatrix.com/svn/Twisted/branches/new-branch
</pre>

<p>If you can't commit to Twisted SVN you can still use bzr.  Just make
independent branches rather than checkouts.  To share your branches with
other people you can simply <code>bzr push</code> them to anywhere you
like, including Launchpad.</p>

<p>And that's really it.  Everything else is just regular bzr, including
the excellent merging and offline commits.</p>

<p>For example, here's how you'd merge a branch to trunk and commit it:</p>

<pre>
bzr checkout svn+ssh://svn.twistedmatrix.com/svn/Twisted/trunk
cd trunk
bzr merge ../my-branch   # or you could use an SVN url!
bzr commit
</pre>

]]></description></item><item><title>Re: Dangerous Merging</title><link>http://bemusement.org/diary/2007/August/15/re-dangerous-merging</link><guid isPermaLink="true">http://bemusement.org/diary/2007/August/15/re-dangerous-merging</guid><pubDate>Wed, 15 Aug 2007 06:28:47 GMT</pubDate><description><![CDATA[<p>Matthew Palmer <a
href="http://www.hezmatt.org/~mpalmer/blog/general/dangerous_merging.html">argues
against</a> “having your revision control system assume that patches which are
the same text is the same”:</p>

<blockquote>If you think this is right, "think about two different patches each
adding a new keyword and also changing the line ``#define NUM_OF_KEYWORDS 17''
to ``#define NUM_OF_KEYWORDS 18''." (example taken from the <a
href="http://www.darcs.net/manual/node6.html#SECTION00635000000000000000">darcs
manual</a>, because I'm not going to come up with a better example than
that).</blockquote>

<p>In the example he cites, there <em>will</em> most likely be a conflict,
assuming the added keywords are different (and depending on what the source
changes to add keywords look like, but the obvious schemes would cause a
conflict).  The conflict will be at the point where the different keywords are
added, rather than at the “#define NUM_OF_KEYWORDS”, but the human resolving
that conflict would, if your source is clear, know that they need to check the
NUM_OF_KEYWORDS value too.  (Although if your source code has the property that
you need to update multiple places to register one new thing, it's violating the
“Don't Repeat Yourself” principle, but that's another discussion...)</p>

<p>But more significantly, there are plenty of ways that two patches that don't
conflict textually can conflict semantically.  One patch might change the
signature of a function, and another may add a new caller of that function that
assumes it has the old signature.  And there are lots of more subtle ways that
they could conflict.</p>

<p>If you want to ensure your code is correct in the face of changes, then the
tool for the job isn't a Version Control System.  It's a Test Suite, preferably
an automated one.  Version Control Systems help you collaborate and manage
changes, but it's your Test Suite that tells you if the code still works.</p>

]]></description></item>
    </channel>
</rss>