Posts Tagged ‘testing’

mock abuse – asserting the implementation

Tuesday, February 3rd, 2009

I use mocks (and stubs and other test doubles) for checking expectations: I’m a “mockist“.  Likewise, a guideline for specs or unit tests is to focus on testing your code (separately from integration tests which might exercise the full stack), and assume that the framework has tested its own code.  However, I think that this guideline is sometimes taken too far, leading to mock abuse.

Here are some guidelines that help me follow another important principle: test the interface, not the implementation.

  • If the test looks like a copy and paste or trivial transformation of the production code, something is probably wrong.
  • “Complex” code should be avoided in specs, especially if it is a reproduction of the production code (e.g. SQL, ActiveRecord::Base#find conditions, and non-trivial regular expressions).
  • I prefer business language examples over technical language modeling (e.g. “belongs_to” or “verifies_format_of“).

Some (bad) examples

As an example of what I’m talking about: Testing Named Scope with shoulda’s should_have_named_scope (also should_belong_to, should_have_one, and should_have_many from shoulda’s ActiveRecord Macros).  The rails documentation for named_scope also suggests that you can test it by looking at the proxy_options method.

What’s wrong with the following?

# from rails rdoc:
expected_options = { :conditions => { :colored => 'red' } }
assert_equal expected_options, Shirt.colored('red').proxy_options
# from shoulda
should_have_named_scope :eighteen,  :conditions => { :age => 18 }
# example of the sort of thing I've seen in codebases I work with
Model.should_receive(:find)
     .with(:all,
           :conditions => ["foo => ? and (bar like ? or blatz in ?)",
                           foo, bar, blatz]).and_return([...])
# I've also seen bastardizations of the above for SQL building
# code which gets piped into a mocked Model.find_by_sql

Should your specs care if those conditions are created as hashes or arrays, or even at all?  From the outside, it is completely unimportant whether it uses named_scope or a conditions hash behind the scenes.   What you should care about is whether or not they return the correct objects, don’t return the wrong objects, and (maybe) stack appropriately with other named scopes or finder methods.

More practically, those conditions hashes qualify as production code that represents the very core of the business logic you are supposed to be testing.  If you expect that {:colored => 'red'}, but the column name is “color” or the value is actually stored in hex RGB, you won’t have a hint that the code is broken.  If you expect that {:age => 18}, your specs won’t remind you that you were actually supposed to be matching 18 and older.  If you expect the wrong SQL, then your specs have lost their power to help you find the right SQL.  You are wide open for letting sloppy bugs through the very specs that should have caught them.

So, these mocks verify expectations that don’t matter much, but they copy and paste the bits that do matter, the parts you really should be testing.   You are gaining “code coverage”, and thus false confidence.  It’s like a weird form of regulatory capture.  However, a few examples of what you should and should not match would clarify many mistakes immediately.  When working with examples, the boundaries that you need to poke often become obvious.

Good examples: Named scope or finder methods

# the following lines would generally be done using factories
good_examples = [FooBar.create!(...),
                 FooBar.create!(...), ...]
bad_examples  = [FooBar.create!(...)
                 FooBar.create!(...), ...]
# now exercise the code under test
results = FooBar.my_spectacular_finder_or_named_scope(...)
# expectations
good_examples.each {|e| results.should     include(e) }
bad_examples. each {|e| results.should_not include(e) }
# or perhaps
results.should == good_examples

This wins in almost every regard over mocking SQL or checking the conditions hash or proxy_options.  It is clear what is being tested and how. We are oriented towards results, not implementation. The examples should make it clear if we are missing anything.

However, it is not as concise, but it doesn’t matter how concise bad specs are.  We could make macros to make the above code more concise, while still retaining the clarity.  It would also be nice to follow the “one assertion per example” rule, so that developers get fine grained failure notifications.

And it may be slower if it hits the DB, but it doesn’t matter how fast bad specs run.  This is a trade-off  of the ActiveRecord ORM style: to test that you wrote the right queries, you need run the queries and verify the results.

Alternatively, this style is well suited to cucumber or fit tables, enabling the subject matter experts to easily contribute their own examples (and detect mistakes that the developer wouldn’t know about).

Good examples: Validations

Sorry to pick on shoulda above… but it was the easiest example to google for.  To balance the scales there, I do like two of the macros shoulda uses for validations: should_allow_values_for and should_not_allow_values_for.  I also wrote up a short little rspec matcher that does about the same thing.

# shoulda's macros
should_not_allow_values_for :phone_number, "abcd", "1234"
should_allow_values_for     :phone_number, "(123) 456-7890"
# my rspec validation matchers
it { should.reject(:phone_number).of "abcd" }
it { should.reject(:phone_number).of "1234" }
it { should.accept(:phone_number).of "(123) 456-7890" }
it { should.accept(:phone_number).of "(123)456-7890" }
it { should.accept(:phone_number).of "123-456-7890" }

Specs that verify that the “appropriate” regular expression was sent to validates_format_of scare me.  Regular expressions aren’t just opaque blobs, they are a DSL, a programming language in their own right.  Regexes are code, and we test code.  You shouldn’t care that a particular regular expression is sent to a particular rails validation macro; you care that your model validations work appropriately, i.e. accept some values and reject others.

When I’ll ignore my own advice

Here are some contraindications:

I just don’t know what to do:  I might be exploring new realms, and feel a bit lost.  Writing some specs, any specs, is a good way to learn, a good way to keep focused while working, and a good way to just get the job done (TATFT).  Even if the specs are no good and tightly coupled to implementation, at least they can act as “change notifiers”.  The next time someone edits the code, they’ll trigger a build failure that will force them to re-evaluate the specs.  Maybe they’ll come at it with more knowledge and improve the situation.

Pair programming significantly reduces the likelihood of this occurring.

too much yak shaving, time to punt:  I know what to do, but it’s too much work.  For example, if you have some models that are so truly heinous that you cannot simply or clearly create them right there in your spec, my first instinct would be to invest some time in my factories.  It’ll pay off.  But that is yak-shaving, and it may be too much for now.  “Emergency mocking” is excusable.  Document your compromise in the spec, and hopefully that technical debt will be repaid in the future.

But excessive mocking can feel like yak-shaving too.  Weigh your options before going the “quick and dirty” route; it might actually be slow and dirty.

debugging and sanity checks:  If I’m trying to figure out why the results aren’t what I expect them to be, I’ll probably throw in some extra assertions about the implementation, as a sanity check.  I might delete them when I’m done or keep them in but document them as special.

There may be other contraindications, but these are what come to my mind.

What do you think?

Times and time zones considered dangerous (especially in tests)

Friday, December 5th, 2008

Just wanted to put out a note of caution to other developers.  Any code (test/spec or production) that depends on global environment settings or non-deterministic functions needs to be treated with fear and trembling.  As a concrete example of this maxim, code that deals with Time, Date, and DateTime has the possibility to trip you up with both global environment and non-determinism.  I recently discovered a potential bug related to this… but it took me 45 minutes to track down… so I wrote up the following afterwards:


nota bene: some of the methods described herein are not defined by Ruby’s core library’s but by Rails’ ActiveSupport.

Any non-determinism in tests is dangerous, and Time.now and its close cousins are certainly on that list.  Whether the code which depends on the current date is in the production code or in the spec code, we’ll need to be very careful of it, lest it trip us up.

I’ve known this for a long time… but I just spent about 45 minutes perplexed with a particularly stupid bug that simply required an attentive eye to unravel… I’ll gladly share my debugging session (condensed for time and minus the aggravation and going in circles), and hopefully save someone else the hassle:

>> DateTime.yesterday < DateTime.now
=> true
>> DateTime.tomorrow < DateTime.now
=> true

Well that shouldn’t ever happen.  Confused yet?  Or do you already know where this is going?  I spent a great deal of time wondering if it was some weird interaction between the Date, Time, and DateTime classes (and trust me, those weird interactions do exist, although maybe they weren’t the cause of this particular bug).

>> DateTime.tomorrow
=> Sun, 24 Aug 2008
>> DateTime.yesterday
=> Fri, 22 Aug 2008
>> DateTime.now
=> Sat, 23 Aug 2008 23:24:35 -0400

Ummm… nothing that would explain the comparison results above… but that’s curious… DateTime.now gives a different inspect output than the others.  I wonder why?

>> DateTime.yesterday.class
=> Date
>> DateTime.tomorrow.class
=> Date
>> DateTime.now.class
=> DateTime

Intriguing.  Certainly not what I expected, given that all of these values were obtained from class methods on DateTime.  Perhaps this means that the comparison requires some coercion to work…  I didn’t investigate the source to see if it uses “to_datetime” as opposed to “to_time”, but I’m assuming that it does, because of the following:

>> DateTime.tomorrow.to_time < DateTime.now.to_time
=> false
>> DateTime.tomorrow.to_datetime < DateTime.now.to_datetime
=> true

So what does Date#to_datetime do to screw us over?

>> DateTime.yesterday.to_datetime
=> Fri, 22 Aug 2008 00:00:00 0000
>> DateTime.tomorrow.to_datetime
=> Sun, 24 Aug 2008 00:00:00 0000
>> DateTime.now.to_datetime
=> Sat, 23 Aug 2008 23:28:11 -0400

Do you see it?  The answer is right there, but it’s easy to miss.  I missed it over and over again for about 40 minutes.  ”-0400” — the timezone.  The timezone had been screwing me up all along.  The DateTime object has my timezone from the get-go, but the Date objects were converted as UTC. Lovely… just lovely.

Now, I wasn’t the original author of the code.  Just an archaeologist.  My guess: the code worked when it was built (during business hours, not in the wee hours of the night when the timezones would trip them up), and it looked like the simplest thing needed.  Generally, you really can’t fault a developer for going that route. I certainly don’t.  But times and dates and especially anything related to the current time and date… well, these are special animals, and need to be dealt with special care.

The long and short of it?  Timezones are global environment state, and you need to be aware of it with any code (especially tests) that deal with time.  Avoid DateTime.yesterday or DateTime.tomorrow like the plague in your test/spec code (unless you enjoy tests that spuriously fail near the witching hour).  You should write your tests (and your production code) in such a way that you can simulate different timezones and different times of day in order to tease out bugs.  And if you need to compare or convert dates and times in your production code, be very aware of what time and timezone your business rules will require, especially if you have any Date objects that might be automatically coerced into DateTime objects.

TDD (Test Driven Development) is not about testing

Sunday, October 26th, 2008

The ruby/rails developer community seems to talk far more about automated developer testing than many other developer communities.  This is great.  There’s some disagreement and debate as to the level and type of testing that should be done, and that is to be expected.  There’s some debate as to which testing tools one should use, and that is also just fine and dandy (although I’ll admit that I don’t have a clue what the rspec haters are going on about).

First, anyone who claims that automated unit tests removes the need for QA testers or usability testing or any form of exploratory (manual) testing… they’re completely nuts. :-)   Nor does it remove the need for code reviews, or refactoring, or paying attention to software design.  This seems to be main point of posts like like Hampton Hates Automated Testing and Luke Francl’s “Testing is Overated” talk at RubyFringe.  However, this is not a claim that I’ve ever heard any XPer, agilist or TDD/BDD proponent make. Instead, they’ve been saying for years now that “test driven development is not about testing”.  And talking about automated developer testing without talking about TDD/BDD seems to me like an adventure in missing the point.

If automated developer testing leads to undue confidence, then Hampton is correct: the developer’s hubris will allow more bugs to get through. In my experience, a humble/paranoid developer will benefit greatly from BDD and put out code with fewer bugs in less time. And an arrogant “my code doesn’t have bugs because of pet theory #462″ developer will eventually get themselves into trouble with or without automated tests… but the automated tests may help them dig out from it and perhaps not get into that particular brand of trouble again.

People who confuse automated developer testing with QA testing often talk about “bugs” and “reducing defect count” as if the main point of automated developer testing is to reduce bugs.  Test Driven Development is about driving yourself towards a better design (which is often also more easily testable as a happy byproduct).  This is why BDD (Behavior Driven Development) was coined: to help people grok TDD without being biased by the word “test” and some of its other connotations.  Several other terms were tried out (“eXecutable eXample Driven design” or “XXD” was my personal favorite), but BDD is the one that seemed to catch on and win out.

Also, BDD isn’t meant to be a “crutch” for “if you’re not good about thinking about programming.” It’s about giving all programmers, the so-so programmers and the guru programmers, another paradigm through which to view their code. BDD is about imagining the best possible API/interface/outcome, giving some example of how that code might work (if only the implementation were there), and then filling in the implementation until it works. And then doing it again in short incremental improvements. It’s about getting into “the flow” in minutes, instead of hours.

Yeah, those examples and their assertions also stay around until later as a regression suite. That’s nice. The better examples also hang around as documentation to future developers for how the system is expected to behave. That’s very nice. But, in my experience, they also allow me to develop better, cleaner code more quickly than otherwise… and the “tests” are both a happy byproduct and an enabler.