mock abuse – asserting the implementation

I use mocks (and stubs and other test doubles) for checking expectations: I’m a “mockist“.  Likewise, a guideline for specs or unit tests is to focus on testing your code (separately from integration tests which might exercise the full stack), and assume that the framework has tested its own code.  However, I think that this guideline is sometimes taken too far, leading to mock abuse.

Here are some guidelines that help me follow another important principle: test the interface, not the implementation.

  • If the test looks like a copy and paste or trivial transformation of the production code, something is probably wrong.
  • “Complex” code should be avoided in specs, especially if it is a reproduction of the production code (e.g. SQL, ActiveRecord::Base#find conditions, and non-trivial regular expressions).
  • I prefer business language examples over technical language modeling (e.g. “belongs_to” or “verifies_format_of“).

Some (bad) examples

As an example of what I’m talking about: Testing Named Scope with shoulda’s should_have_named_scope (also should_belong_to, should_have_one, and should_have_many from shoulda’s ActiveRecord Macros).  The rails documentation for named_scope also suggests that you can test it by looking at the proxy_options method.

What’s wrong with the following?

# from rails rdoc:
expected_options = { :conditions => { :colored => 'red' } }
assert_equal expected_options, Shirt.colored('red').proxy_options
# from shoulda
should_have_named_scope :eighteen,  :conditions => { :age => 18 }
# example of the sort of thing I've seen in codebases I work with
Model.should_receive(:find)
     .with(:all,
           :conditions => ["foo => ? and (bar like ? or blatz in ?)",
                           foo, bar, blatz]).and_return([...])
# I've also seen bastardizations of the above for SQL building
# code which gets piped into a mocked Model.find_by_sql

Should your specs care if those conditions are created as hashes or arrays, or even at all?  From the outside, it is completely unimportant whether it uses named_scope or a conditions hash behind the scenes.   What you should care about is whether or not they return the correct objects, don’t return the wrong objects, and (maybe) stack appropriately with other named scopes or finder methods.

More practically, those conditions hashes qualify as production code that represents the very core of the business logic you are supposed to be testing.  If you expect that {:colored => 'red'}, but the column name is “color” or the value is actually stored in hex RGB, you won’t have a hint that the code is broken.  If you expect that {:age => 18}, your specs won’t remind you that you were actually supposed to be matching 18 and older.  If you expect the wrong SQL, then your specs have lost their power to help you find the right SQL.  You are wide open for letting sloppy bugs through the very specs that should have caught them.

So, these mocks verify expectations that don’t matter much, but they copy and paste the bits that do matter, the parts you really should be testing.   You are gaining “code coverage”, and thus false confidence.  It’s like a weird form of regulatory capture.  However, a few examples of what you should and should not match would clarify many mistakes immediately.  When working with examples, the boundaries that you need to poke often become obvious.

Good examples: Named scope or finder methods

# the following lines would generally be done using factories
good_examples = [FooBar.create!(...),
                 FooBar.create!(...), ...]
bad_examples  = [FooBar.create!(...)
                 FooBar.create!(...), ...]
# now exercise the code under test
results = FooBar.my_spectacular_finder_or_named_scope(...)
# expectations
good_examples.each {|e| results.should     include(e) }
bad_examples. each {|e| results.should_not include(e) }
# or perhaps
results.should == good_examples

This wins in almost every regard over mocking SQL or checking the conditions hash or proxy_options.  It is clear what is being tested and how. We are oriented towards results, not implementation. The examples should make it clear if we are missing anything.

However, it is not as concise, but it doesn’t matter how concise bad specs are.  We could make macros to make the above code more concise, while still retaining the clarity.  It would also be nice to follow the “one assertion per example” rule, so that developers get fine grained failure notifications.

And it may be slower if it hits the DB, but it doesn’t matter how fast bad specs run.  This is a trade-off  of the ActiveRecord ORM style: to test that you wrote the right queries, you need run the queries and verify the results.

Alternatively, this style is well suited to cucumber or fit tables, enabling the subject matter experts to easily contribute their own examples (and detect mistakes that the developer wouldn’t know about).

Good examples: Validations

Sorry to pick on shoulda above… but it was the easiest example to google for.  To balance the scales there, I do like two of the macros shoulda uses for validations: should_allow_values_for and should_not_allow_values_for.  I also wrote up a short little rspec matcher that does about the same thing.

# shoulda's macros
should_not_allow_values_for :phone_number, "abcd", "1234"
should_allow_values_for     :phone_number, "(123) 456-7890"
# my rspec validation matchers
it { should.reject(:phone_number).of "abcd" }
it { should.reject(:phone_number).of "1234" }
it { should.accept(:phone_number).of "(123) 456-7890" }
it { should.accept(:phone_number).of "(123)456-7890" }
it { should.accept(:phone_number).of "123-456-7890" }

Specs that verify that the “appropriate” regular expression was sent to validates_format_of scare me.  Regular expressions aren’t just opaque blobs, they are a DSL, a programming language in their own right.  Regexes are code, and we test code.  You shouldn’t care that a particular regular expression is sent to a particular rails validation macro; you care that your model validations work appropriately, i.e. accept some values and reject others.

When I’ll ignore my own advice

Here are some contraindications:

I just don’t know what to do:  I might be exploring new realms, and feel a bit lost.  Writing some specs, any specs, is a good way to learn, a good way to keep focused while working, and a good way to just get the job done (TATFT).  Even if the specs are no good and tightly coupled to implementation, at least they can act as “change notifiers”.  The next time someone edits the code, they’ll trigger a build failure that will force them to re-evaluate the specs.  Maybe they’ll come at it with more knowledge and improve the situation.

Pair programming significantly reduces the likelihood of this occurring.

too much yak shaving, time to punt:  I know what to do, but it’s too much work.  For example, if you have some models that are so truly heinous that you cannot simply or clearly create them right there in your spec, my first instinct would be to invest some time in my factories.  It’ll pay off.  But that is yak-shaving, and it may be too much for now.  “Emergency mocking” is excusable.  Document your compromise in the spec, and hopefully that technical debt will be repaid in the future.

But excessive mocking can feel like yak-shaving too.  Weigh your options before going the “quick and dirty” route; it might actually be slow and dirty.

debugging and sanity checks:  If I’m trying to figure out why the results aren’t what I expect them to be, I’ll probably throw in some extra assertions about the implementation, as a sanity check.  I might delete them when I’m done or keep them in but document them as special.

There may be other contraindications, but these are what come to my mind.

What do you think?

Tags: , , , , ,

Leave a Reply