Narrative tests are lousy unit tests
I want to stop people abusing Python's doctest format. Many
of the tests I've seen written as doctest files would have been better off as
plain unittest
files. I'm going to try to explain why. I have many gripes about how people use
doctests, but probably the biggest is that narrative tests are lousy unit
tests.
Narratives tell a story. Something happens, then another thing, and another
thing, one after the other, in sequence. Earlier events influence later ones
as the story gradually assembles a complete picture. Humans like stories; our
brains are used to telling them and receiving them.
Technical documentation is often written with a narrative. Tutorials are an
obvious case, but not the only one. A guide to an API may show a series of
different examples, each contrasting with the others in ways that explain to the
reader what they need to understand.
Automated tests can have narratives too, of course. A narrative test is
quite easy to write: write some code that does something (and check the
result), then do something else (and check that result), and so on until you've
done (and checked) everything you want to do (and check). Doctests make this
particularly easy. Here's a toy example of a doctest:
Instantiate a Frobber.
>>> frobber = Frobber()
>>> frobber.has_frobbed()
False
Now frob it.
>>> frobber.frob()
>>> frobber.has_frobbed()
True
It can't be frobbed twice.
>>> frobber.frob()
Traceback (most recent call last):
...
AlreadyFrobbedError: ...
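Such a file is typically wired into a test run with the standard library's
doctest module. Assuming the narrative above is saved as frobber.txt (a
filename chosen just for this sketch), something like this runs it:

import doctest

if __name__ == '__main__':
    # Run every example in the narrative file, reporting any failures.
    doctest.testfile('frobber.txt')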
Narrative tests can be good acceptance tests. An acceptance test often
takes the form of a story; an example might be “a user who is not logged in
visits a web page. They click a link that requires a logged-in user, so they
are taken to a login screen. The user has no account yet, so they walk through
the account creation wizard. Once the wizard is completed, the account is
created, they are logged in, and they are taken to the link they originally
clicked on.”
So, having shown how they are easy to write, and appropriate for some tests,
I'll now explain why narratives make lousy unit tests.
A typical unit test has four phases:
- Set up a fixture
- Interact with the system-under-test
- Verify the outcome
- Tear down the fixture
Or, to phrase it the way Behaviour-Driven Development people might, each unit
test says: “Given situation X, when I do Y, then Z happens.”
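To make those phases concrete, here is a minimal unittest sketch; the
temporary-file fixture is arbitrary, chosen only to keep the example
self-contained:

import os
import tempfile
import unittest

class FourPhaseExample(unittest.TestCase):

    def setUp(self):
        # Phase 1: set up a fixture (given an empty temporary file).
        fd, self.path = tempfile.mkstemp()
        os.close(fd)

    def test_write_then_read(self):
        # Phase 2: interact with the system-under-test (when I write a line).
        f = open(self.path, 'w')
        f.write('hello\n')
        f.close()
        # Phase 3: verify the outcome (then reading it back gives that line).
        f = open(self.path)
        try:
            self.assertEqual('hello\n', f.read())
        finally:
            f.close()

    def tearDown(self):
        # Phase 4: tear down the fixture.
        os.remove(self.path)

if __name__ == '__main__':
    unittest.main()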
Good unit tests are small and specific: they test just one condition
per test method, i.e. the X and the Y will be as minimal as
reasonably possible. There's considerable benefit to this style (the sketch
after this list makes it concrete):
- Every individual test has a name. I can refer to a
failing test precisely by name when communicating with my fellow
developers. I can communicate the name to the test runner too: when I am
trying to focus on just one problem, it's extremely useful to be able to
easily and precisely specify the subset of the full suite I want to run,
down to just one test if necessary. I can even jump straight to a test
method definition with ctags.
Compare that with doctests, where you have to say things like “about line
300 of foo-bar.txt” or “Just after where it says ...”. That's awkward and
imprecise, especially when developers are often looking at slightly
different versions of the same file.
- Specific tests give clearer failures, and are easier to
debug. Good unit tests keep the context of the test fairly
minimal (Meszaros' xUnit Test Patterns book explicitly describes
“General Fixture” as a cause of the “Obscure Test”). Narratives inherently
accumulate context with every line, whether it's relevant to anything else
or not. You have to be aware of everything that happened earlier in the
story to understand and debug a failure (and if this isn't the case, then
what's the point in having a narrative?). Unit tests also tend to generate
more relevant failures, because only tests that are actually affected by
the problem fail, rather than everything after line 100 because that's
where the first failure was (and if you suppress the secondary failures,
you may be suppressing interesting ones along with the irrelevant
ones).
- Specific, narrow tests are better at communicating intent and
ensuring coverage. If each test is there to verify
just one condition, then you can't accidentally lose test coverage just by
“tidying” the code (automated coverage analysis tools won't necessarily
notice either; there's more to coverage than just tracking lines executed).
If you have long, rambling tests, there's a tendency to have a bunch of
stuff that's exercised only implicitly, as a side-effect of doing it all in
one big eager
narrative... so changes to that narrative can easily lose that
coverage. Simple, specific code is easier to maintain than a single
meandering story that tries to hit as many cases as possible. Make
single-condition unit tests an explicit part of your coding standard!
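To see the difference, here is the Frobber narrative from earlier recast as
single-condition unit tests. This is only a sketch: the frobber module and
the names imported from it are hypothetical, carried over from the toy
example above.

import unittest

from frobber import AlreadyFrobbedError, Frobber  # hypothetical module

class FrobberTest(unittest.TestCase):

    def setUp(self):
        # Each test gets a fresh fixture; no state leaks between tests.
        self.frobber = Frobber()

    def test_new_frobber_has_not_frobbed(self):
        self.assertFalse(self.frobber.has_frobbed())

    def test_frob_sets_has_frobbed(self):
        self.frobber.frob()
        self.assertTrue(self.frobber.has_frobbed())

    def test_frob_twice_raises_already_frobbed(self):
        self.frobber.frob()
        self.assertRaises(AlreadyFrobbedError, self.frobber.frob)

if __name__ == '__main__':
    unittest.main()

Each test method can be named to the test runner individually, a failure in
one does not cascade into the others, and deleting or reordering one cannot
silently lose the coverage the rest provide.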
So that's why I think narrative tests are poor unit tests. And I think unit
tests ought to be the bulk of most automated test suites.
See also Tests are code, doctests
aren't.
— Andrew Bennetts, October 2008