Posts Tagged ‘bdd’

please don’t use before(:all)

Saturday, February 7th, 2009

Please do not use “before(:all)” (in rspec) unless it is imperative for performance reasons. Even then, it should only be used with the utmost care given towards any global or shared state, and with no expectations on order of the spec run. You will cause strange build failures if you do not heed this warning.

Each individual example (“it” block) should be able to run by itself or in any order, and the example groups (“describe” blocks) may run in any order, even in random order.

Corollaries:  If you change any global or shared state inside an example, please ensure that you reset it to a baseline before the next example.  If you depend on any global shared state, please set it to a baseline before this example.  Both of these are achieved easily with a before(:each) block (or after(:each) in some cases).

Nine times out of ten, when I see a weird sporadic build failure and there is a before(:all) nearby, I can fix it merely by changing it to a before(:each).  The original author of the spec assumed that it would always run front to back and then proceeded to change global or shared state in some of the examples (sometimes inadvertently).
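To make the failure mode concrete, here is a minimal plain-Ruby sketch; it does not use rspec at all. It imitates a fixture built once (like before(:all)) versus rebuilt for every example (like before(:each)). The names EXAMPLES, run_with_shared_setup and run_with_fresh_setup are made up for this illustration.

```ruby
# Two tiny "examples" that both receive a cart fixture.
# All names here are hypothetical; this only imitates what a spec runner does.
EXAMPLES = {
  "starts empty"    => ->(cart) { cart.empty? },
  "can add an item" => ->(cart) { cart << :apple; cart.size == 1 }
}

# Like before(:all): the fixture is built once, so mutations leak
# from one example into the next.
def run_with_shared_setup(order)
  cart = []
  order.map { |name| [name, EXAMPLES.fetch(name).call(cart)] }.to_h
end

# Like before(:each): every example gets a fresh fixture.
def run_with_fresh_setup(order)
  order.map { |name| [name, EXAMPLES.fetch(name).call([])] }.to_h
end

forward  = ["starts empty", "can add an item"]
backward = forward.reverse

p run_with_shared_setup(forward)   # front to back: both examples pass
p run_with_shared_setup(backward)  # reversed: "starts empty" now fails
p run_with_fresh_setup(backward)   # fresh fixture: order no longer matters
```

The shared version only passes in the order its author happened to run it in, which is exactly the assumption that breaks when the run order changes.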

Although I’ve phrased this advice for rspec, it applies equally to test/unit in ruby, or xUnit on any platform.

mock abuse – asserting the implementation

Tuesday, February 3rd, 2009

I use mocks (and stubs and other test doubles) for checking expectations: I’m a “mockist”.  Likewise, a guideline for specs or unit tests is to focus on testing your code (separately from integration tests which might exercise the full stack), and assume that the framework has tested its own code.  However, I think that this guideline is sometimes taken too far, leading to mock abuse.

Here are some guidelines that help me follow another important principle: test the interface, not the implementation.

  • If the test looks like a copy and paste or trivial transformation of the production code, something is probably wrong.
  • “Complex” code should be avoided in specs, especially if it is a reproduction of the production code (e.g. SQL, ActiveRecord::Base#find conditions, and non-trivial regular expressions).
  • I prefer business language examples over technical language modeling (e.g. “belongs_to” or “validates_format_of”).

Some (bad) examples

As an example of what I’m talking about: Testing Named Scope with shoulda’s should_have_named_scope (also should_belong_to, should_have_one, and should_have_many from shoulda’s ActiveRecord Macros).  The rails documentation for named_scope also suggests that you can test it by looking at the proxy_options method.

What’s wrong with the following?

# from rails rdoc:
expected_options = { :conditions => { :colored => 'red' } }
assert_equal expected_options, Shirt.colored('red').proxy_options
# from shoulda
should_have_named_scope :eighteen,  :conditions => { :age => 18 }
# example of the sort of thing I've seen in codebases I work with
Model.should_receive(:find)
     .with(:all,
           :conditions => ["foo => ? and (bar like ? or blatz in ?)",
                           foo, bar, blatz]).and_return([...])
# I've also seen bastardizations of the above for SQL building
# code which gets piped into a mocked Model.find_by_sql

Should your specs care if those conditions are created as hashes or arrays, or even at all?  From the outside, it is completely unimportant whether it uses named_scope or a conditions hash behind the scenes.   What you should care about is whether or not they return the correct objects, don’t return the wrong objects, and (maybe) stack appropriately with other named scopes or finder methods.

More practically, those conditions hashes qualify as production code that represents the very core of the business logic you are supposed to be testing.  If you expect that {:colored => 'red'}, but the column name is “color” or the value is actually stored in hex RGB, you won’t have a hint that the code is broken.  If you expect that {:age => 18}, your specs won’t remind you that you were actually supposed to be matching 18 and older.  If you expect the wrong SQL, then your specs have lost their power to help you find the right SQL.  You are wide open for letting sloppy bugs through the very specs that should have caught them.

So, these mocks verify expectations that don’t matter much, but they copy and paste the bits that do matter, the parts you really should be testing.   You are gaining “code coverage”, and thus false confidence.  It’s like a weird form of regulatory capture.  However, a few examples of what you should and should not match would clarify many mistakes immediately.  When working with examples, the boundaries that you need to poke often become obvious.
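Here is a small plain-Ruby sketch of the “18 and older” mistake described above. It uses an in-memory Struct instead of ActiveRecord, and every name in it (Person, PEOPLE, ADULT_CONDITIONS, adults) is hypothetical. The point is only that a check which restates the conditions passes, while a check based on example records fails and exposes the bug.

```ruby
# Hypothetical in-memory stand-in for a model; this is not ActiveRecord.
Person = Struct.new(:name, :age)

PEOPLE = [Person.new("Ann", 17), Person.new("Bob", 18), Person.new("Cy", 42)]

# Buggy "scope": the requirement was "18 and older",
# but these conditions only match people who are exactly 18.
ADULT_CONDITIONS = { :age => 18 }

def adults(people)
  people.select do |person|
    ADULT_CONDITIONS.all? { |attr, value| person[attr] == value }
  end
end

# Implementation-mirroring check: passes, because it restates the bug.
mirrored_check = (ADULT_CONDITIONS == { :age => 18 })

# Example-based check: Bob and Cy are adults, Ann is not.
names = adults(PEOPLE).map(&:name)
example_check = (names == ["Bob", "Cy"])

puts "mirrored check passes: #{mirrored_check}"
puts "example check passes:  #{example_check}"
puts "adults found: #{names.inspect}"
```

Only the example-based check notices that the 42-year-old is missing from the results.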

Good examples: Named scope or finder methods

# the following lines would generally be done using factories
good_examples = [FooBar.create!(...),
                 FooBar.create!(...), ...]
bad_examples  = [FooBar.create!(...),
                 FooBar.create!(...), ...]
# now exercise the code under test
results = FooBar.my_spectacular_finder_or_named_scope(...)
# expectations
good_examples.each {|e| results.should     include(e) }
bad_examples. each {|e| results.should_not include(e) }
# or perhaps
results.should == good_examples

This wins in almost every regard over mocking SQL or checking the conditions hash or proxy_options.  It is clear what is being tested and how. We are oriented towards results, not implementation. The examples should make it clear if we are missing anything.

It is not as concise, but it doesn’t matter how concise bad specs are.  We could write macros to make the above code more concise, while still retaining the clarity.  It would also be nice to follow the “one assertion per example” rule, so that developers get fine-grained failure notifications.
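As a rough idea of what such a macro could look like, here is a plain-Ruby sketch of a helper that takes the results plus the good and bad examples, and returns one message per violated expectation. The helper name finder_failures and the stand-in finder are invented for this illustration; this is not an rspec or shoulda API.

```ruby
# Hypothetical helper: one message per violated expectation, so each
# problem is reported on its own instead of as a single pass/fail.
def finder_failures(results, good:, bad:)
  missing    = good.reject { |e| results.include?(e) }
  unexpected = bad.select  { |e| results.include?(e) }
  missing.map { |e| "expected #{e.inspect} to be found" } +
    unexpected.map { |e| "expected #{e.inspect} not to be found" }
end

# Stand-in "finder" for the sketch: it keeps even numbers only.
finder = ->(records) { records.select(&:even?) }

records  = [1, 2, 3, 4]
failures = finder_failures(finder.call(records), good: [2, 4], bad: [1, 3])

puts failures.empty? ? "all examples pass" : failures
```

In a real spec, each returned message could become its own example, which gets close to the “one assertion per example” rule without repeating the setup.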

And it may be slower if it hits the DB, but it doesn’t matter how fast bad specs run.  This is a trade-off of the ActiveRecord ORM style: to test that you wrote the right queries, you need to run the queries and verify the results.

Alternatively, this style is well suited to cucumber or fit tables, enabling the subject matter experts to easily contribute their own examples (and detect mistakes that the developer wouldn’t know about).

Good examples: Validations

Sorry to pick on shoulda above… but it was the easiest example to google for.  To balance the scales there, I do like two of the macros shoulda uses for validations: should_allow_values_for and should_not_allow_values_for.  I also wrote up a short little rspec matcher that does about the same thing.

# shoulda's macros
should_not_allow_values_for :phone_number, "abcd", "1234"
should_allow_values_for     :phone_number, "(123) 456-7890"
# my rspec validation matchers
it { should.reject(:phone_number).of "abcd" }
it { should.reject(:phone_number).of "1234" }
it { should.accept(:phone_number).of "(123) 456-7890" }
it { should.accept(:phone_number).of "(123)456-7890" }
it { should.accept(:phone_number).of "123-456-7890" }

Specs that verify that the “appropriate” regular expression was sent to validates_format_of scare me.  Regular expressions aren’t just opaque blobs, they are a DSL, a programming language in their own right.  Regexes are code, and we test code.  You shouldn’t care that a particular regular expression is sent to a particular rails validation macro; you care that your model validations work appropriately, i.e. accept some values and reject others.
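The same idea works without any framework. In the plain-Ruby sketch below, the regular expression is treated as code under test and exercised with the accepted and rejected values from the examples above. PHONE_FORMAT and valid_phone? are hypothetical, written only for this illustration; they are not the validation from any real model.

```ruby
# Hypothetical pattern, written for this sketch only.
PHONE_FORMAT = /\A(?:\(\d{3}\) ?|\d{3}-)\d{3}-\d{4}\z/

def valid_phone?(value)
  !(value =~ PHONE_FORMAT).nil?
end

# Examples of what should and should not be allowed.
ACCEPTED = ["(123) 456-7890", "(123)456-7890", "123-456-7890"]
REJECTED = ["abcd", "1234"]

# Collect the examples that behave differently from what we expect.
wrongly_rejected = ACCEPTED.reject { |value| valid_phone?(value) }
wrongly_accepted = REJECTED.select { |value| valid_phone?(value) }

puts "wrongly rejected: #{wrongly_rejected.inspect}"
puts "wrongly accepted: #{wrongly_accepted.inspect}"
```

If the pattern is later rewritten, these examples still describe the behavior that matters, and none of them needs to change.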

When I’ll ignore my own advice

Here are some contraindications:

I just don’t know what to do:  I might be exploring new realms, and feel a bit lost.  Writing some specs, any specs, is a good way to learn, a good way to keep focused while working, and a good way to just get the job done (TATFT).  Even if the specs are no good and tightly coupled to implementation, at least they can act as “change notifiers”.  The next time someone edits the code, they’ll trigger a build failure that will force them to re-evaluate the specs.  Maybe they’ll come at it with more knowledge and improve the situation.

Pair programming significantly reduces the likelihood of this occurring.

too much yak shaving, time to punt:  I know what to do, but it’s too much work.  For example, if you have some models that are so truly heinous that you cannot simply or clearly create them right there in your spec, my first instinct would be to invest some time in my factories.  It’ll pay off.  But that is yak-shaving, and it may be too much for now.  “Emergency mocking” is excusable.  Document your compromise in the spec, and hopefully that technical debt will be repaid in the future.

But excessive mocking can feel like yak-shaving too.  Weigh your options before going the “quick and dirty” route; it might actually be slow and dirty.

debugging and sanity checks:  If I’m trying to figure out why the results aren’t what I expect them to be, I’ll probably throw in some extra assertions about the implementation, as a sanity check.  I might delete them when I’m done or keep them in but document them as special.

There may be other contraindications, but these are what come to my mind.

What do you think?

a better progress bar for rspec

Tuesday, December 9th, 2008

Updated 2008-12-10: added screenshot and video.

I’ve always been a little bit bemused by the default ruby test/unit and rspec output: a series of dots, one for each passing test (or example), and the dots become “F” characters in the event of a failure.  Most importantly: it’s simple, unambiguous about pass/fail, and easy to read at a glance, so no problem there.  You get a feel for the speed at which the tests are running, and you can visualize how many tests have run already.  And it’s easy enough to have them output in color (red or green), to make the errors visually pop out.  And it’s drop dead simple.  That’s good.

So what’s missing?

(for my tastes)

  • Immediate Feedback: Yeah, the default output gives me an “F”, but what am I supposed to do with an “F”? I won’t know where and what that “F” was until the end of the entire test run, when the summary is dumped.  When running through a (functional) test suite that takes three minutes or even fifteen minutes, that just doesn’t work.
  • Concise output: if everything is passing, I don’t need to see 4000+ dots displayed across my screen. It would be nice if the entire output would fit into a single line (when there are no failures).
  • Percentage: A nice to have, not a need to have. But knowing how many tests have already run is generally not as interesting to me as how many tests there are in total, and what percentage of them have run.
  • ETA: This isn’t so important for the short test runs that I do over and over again during the BDD cycle… because I expect those tests to finish in under 10 seconds anyway.  But anything more than 10 seconds, and I get distracted. An ETA helps me limit my distraction.

So what do I do about it?

Basically, what I want is a progress bar, with errors and warnings displayed immediately. I also want warnings to be printed for slow specs. When using color, I want the entire progress bar printed in green if everything is good, yellow if there has been a warning, and red if there has been an error.
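The arithmetic behind those wishes is small. The following plain-Ruby sketch shows one way to compute the percentage, a naive ETA (average time per finished example multiplied by the examples remaining), and the bar color. It is not the formatter’s code and does not use the progressbar gem; the names eta_seconds, bar_color and progress_line are hypothetical.

```ruby
# Hypothetical helpers; not taken from the formatter or the progressbar gem.

# Naive ETA: average time per finished example times examples remaining.
def eta_seconds(done, total, elapsed)
  return nil if done.zero?
  (elapsed.to_f / done) * (total - done)
end

# Red beats yellow, yellow beats green.
def bar_color(failures, warnings)
  return :red    if failures > 0
  return :yellow if warnings > 0
  :green
end

# Render the whole run as a single line: percentage, bar, and ETA.
def progress_line(done, total, elapsed, width = 20)
  filled   = total.zero? ? 0 : (width * done / total)
  percent  = total.zero? ? 100 : (100 * done / total)
  eta      = eta_seconds(done, total, elapsed)
  eta_text = eta ? format("%02d:%02d", *eta.round.divmod(60)) : "--:--"
  format("%3d%% |%s%s| ETA %s", percent, "=" * filled, " " * (width - filled), eta_text)
end

puts progress_line(50, 200, 10.0)
puts bar_color(0, 1)
```

A real formatter would redraw this line in place after each example and print failures above it as they happen.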

Fortunately, rspec makes it very simple to write a custom output formatter. So, a couple of weekends ago, I threw something together in about half an hour. I’ve used it ever since, and I’m pretty happy with it (I’ve tweaked it a little bit).

What does it look like?

A static image doesn’t really show off the progress bar very well, but here’s a screenshot (click thumbnail for full-size):

And here is a video, so you can see it in action (very low quality, sorry):

The code

http://ekenosen.net/bzr/rspec_support/compact_progress_bar_formatter.rb

It’s currently hosted in a bzr branch, but I’ll stick it up on github if anyone really cares about that.  Also, it depends upon the progressbar gem, so you’ll need to install that.

To use it

The location you download the script to is not important, but this is where I’m (currently) keeping it. You could also use bzr to pull down the entire branch.

$ gem install progressbar
 
$ mkdir -p ~/lib/rspec/
$ cd ~/lib/rspec
$ wget http://ekenosen.net/bzr/rspec_support/compact_progress_bar_formatter.rb
 
$ spec --require ~/lib/rspec/compact_progress_bar_formatter.rb -c -f Spec::Runner::Formatter::CompactProgressBarFormatter path/to/specs

I’m currently using some bash aliases to keep the spec command line nice and short. Perhaps there’s a better way?  In ~/.bash_aliases:

alias  spb='spec        --require ~/lib/rspec/compact_progress_bar_formatter.rb --format Spec::Runner::Formatter::CompactProgressBarFormatter --color'
alias sspb='script/spec --require ~/lib/rspec/compact_progress_bar_formatter.rb --format Spec::Runner::Formatter::CompactProgressBarFormatter --color'

For rake tasks, I use a spec/local_spec.opts (spec/spec.opts is checked into version control as a default).

Nitpicking

I cheated: I didn’t develop it BDD-style.  I intend to fix that soon by rewriting it spec-first; certainly before adding any more functionality. TATFT indeed.

I’d really like to be able to send the threshold (for reporting of slow specs) in on the command line.  It might be easy to do, I haven’t looked into it… I’m currently just editing the file every time I want to change the threshold (e.g. normal specs run with a threshold of 0.1 sec, functional specs run with a threshold of 5 or 10 seconds).  Messy.
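One possible workaround, sketched here in plain Ruby, is to read the threshold from an environment variable instead of a command line option. This is only an assumption about how it could be done, not something the formatter supports; SLOW_SPEC_THRESHOLD, slow_spec_threshold and slow? are hypothetical names.

```ruby
# SLOW_SPEC_THRESHOLD is a hypothetical variable name chosen for this sketch.
DEFAULT_THRESHOLD = 0.1 # seconds, the value used for normal specs

# Read the threshold from the environment, falling back to the default
# when the variable is missing or is not a number.
def slow_spec_threshold(env = ENV)
  Float(env.fetch("SLOW_SPEC_THRESHOLD", DEFAULT_THRESHOLD.to_s))
rescue ArgumentError
  DEFAULT_THRESHOLD
end

# An example counts as slow when it takes longer than the threshold.
def slow?(duration, threshold = slow_spec_threshold)
  duration > threshold
end

puts slow_spec_threshold("SLOW_SPEC_THRESHOLD" => "5")
puts slow?(0.25, DEFAULT_THRESHOLD)
```

With something like this in place, a functional run could be started with the variable set to 5 or 10 in the shell, with no file edits between runs.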

Also, it would be nice if the command line didn’t need to be so long… perhaps a supported gem/plugin system for rspec?  If that were easily doable, then I would probably package this up into a gem.

I had to break into the progressbar’s internals a little bit… it didn’t publicly expose all of the functionality I wanted.  Still, that’s better than creating my own progressbar library.

TDD (Test Driven Development) is not about testing

Sunday, October 26th, 2008

The ruby/rails developer community seems to talk far more about automated developer testing than many other developer communities.  This is great.  There’s some disagreement and debate as to the level and type of testing that should be done, and that is to be expected.  There’s some debate as to which testing tools one should use, and that is also just fine and dandy (although I’ll admit that I don’t have a clue what the rspec haters are going on about).

First, anyone who claims that automated unit tests remove the need for QA testers or usability testing or any form of exploratory (manual) testing… they’re completely nuts. :-)   Nor do they remove the need for code reviews, or refactoring, or paying attention to software design.  This seems to be the main point of posts like Hampton Hates Automated Testing and Luke Francl’s “Testing is Overrated” talk at RubyFringe.  However, this is not a claim that I’ve ever heard any XPer, agilist or TDD/BDD proponent make. Instead, they’ve been saying for years now that “test driven development is not about testing”.  And talking about automated developer testing without talking about TDD/BDD seems to me like an adventure in missing the point.

If automated developer testing leads to undue confidence, then Hampton is correct: the developer’s hubris will allow more bugs to get through. In my experience, a humble/paranoid developer will benefit greatly from BDD and put out code with fewer bugs in less time. And an arrogant “my code doesn’t have bugs because of pet theory #462” developer will eventually get themselves into trouble with or without automated tests… but the automated tests may help them dig out from it and perhaps not get into that particular brand of trouble again.

People who confuse automated developer testing with QA testing often talk about “bugs” and “reducing defect count” as if the main point of automated developer testing is to reduce bugs.  Test Driven Development is about driving yourself towards a better design (which is often also more easily testable as a happy byproduct).  This is why BDD (Behavior Driven Development) was coined: to help people grok TDD without being biased by the word “test” and some of its other connotations.  Several other terms were tried out (“eXecutable eXample Driven design” or “XXD” was my personal favorite), but BDD is the one that seemed to catch on and win out.

Also, BDD isn’t meant to be a “crutch” for “if you’re not good about thinking about programming.” It’s about giving all programmers, the so-so programmers and the guru programmers, another paradigm through which to view their code. BDD is about imagining the best possible API/interface/outcome, giving some example of how that code might work (if only the implementation were there), and then filling in the implementation until it works. And then doing it again in short incremental improvements. It’s about getting into “the flow” in minutes, instead of hours.

Yeah, those examples and their assertions also stay around until later as a regression suite. That’s nice. The better examples also hang around as documentation to future developers for how the system is expected to behave. That’s very nice. But, in my experience, they also allow me to develop better, cleaner code more quickly than otherwise… and the “tests” are both a happy byproduct and an enabler.