features of rails migrations you should probably use

November 29th, 2008

I recently paired with another developer to fix a bug in a rails DB migration.  As we cleaned up the code in order to analyze the bug, we noticed two simple features that were not being used, and the other developer recommended that I write up an email to point these features out to everyone else.  And now I’m cleaning up that email to post here.  Hopefully this helps someone else out.  Both of these (and more) are documented at http://api.rubyonrails.org/classes/ActiveRecord/Migration.html

Cool feature: say_with_time

If you find yourself putting comments around your code to explain to developers what’s going on, please consider instead using “say_with_time”.  Then you can document what is happening both in the code and on the console when the migration is actually running… and you’ll get other nice info printed out (like the elapsed time) as well.
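To give a feel for what that buys you, here is a rough plain-Ruby imitation of the helper. This is only a sketch and not Rails' actual implementation: the name say_with_time_sketch and its out parameter are made up for illustration, and the real method is the one documented on ActiveRecord::Migration.

```ruby
require 'benchmark'

# Hypothetical stand-in for the migration helper, for illustration only.
# Announces the step, runs the block, reports the elapsed wall-clock time,
# and hands back whatever the block returned.
def say_with_time_sketch(message, out = $stdout)
  out.puts "-- #{message}"
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  out.puts format("   -> %.4fs", elapsed)
  result
end

# The message doubles as the comment you would otherwise have written.
total = say_with_time_sketch("summing some numbers") do
  (1..1000).inject(0) { |sum, n| sum + n }
end
```

The point is that the description of the step lives in one place, and whoever runs the migration sees it on the console alongside how long that step took.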

Important feature: ActiveRecord::IrreversibleMigration

If the migration cannot safely or easily be migrated downwards, then we need to communicate that clearly to other developers.  But “puts” isn’t good enough.  Instead, “raise ActiveRecord::IrreversibleMigration”.

For complicated migrations, even if it is possible to safely reverse the migration, I strongly prefer simply raising the exception.  It’s too easy to make a mistake, and then you’ll have a DB that claims to be at one version, but has corrupted data for that version, which will most likely lead to more pain and suffering down the line.

If the migration is just cleaning up bad data, then there’s probably no real need to reverse it.  But in that case maybe you should at least print out a message to the screen letting the developer know that nothing is happening, and why that is okay.

Since I rarely ever use down migrations, my threshold is probably lower than most; if :Rinvert from rails.vim can’t automatically generate the down migration, then I will probably simply raise the exception.  I’ve personally witnessed too many needless bugs due to corrupted data and too many broken down migrations to invest any significant time into them.  At any rate, developers should use discretion with down migrations.

Oh, and please don’t ever run down migrations in production.  That’s what database backups are for, and you are backing up before you upgrade your production database, aren’t you?

Use the progressbar gem for long running data migrations

And one other thing that is not included with rails that you should probably be using anyway: the progressbar gem.  If you have any long running data migrations, this is a must.  And just because it isn’t long running for you with your developer DB doesn’t mean it won’t be long running during deployment to production.  It’s trivially easy to use, and your deployers won’t be stuck wondering if their connection has been dropped or the migration has locked up.  And the ETA will let them know if they have time to get a cup of coffee.  The other developers and deployers will thank you.

Simple Example

(albeit, also a poorly contrived example)

require 'progressbar'

class CreateWidgetAuxiliaryFrobs < ActiveRecord::Migration

  def self.up
    create_table :widget_auxiliary_frobs do |t|
      t.integer "widget_id"
      t.string  "frob_type"
      t.integer "frobitude"
      # etc...
    end

    say_with_time("migrating frobs from widgets") do
      widgets = Widget.find(:all)
      pbar = ProgressBar.new("Generating Widget Frobs", widgets.size)
      widgets.each do |w|
        # this code changes the data irreversibly
        # this code can't be (easily) rewritten with a SQL UPDATE or INSERT
        # etc  etc  etc
        pbar.inc
      end
      pbar.finish
    end

    say_with_time("delete obsolete widget/wadget data") do
      Wadget.delete_all("value = 'kerfluffle'")
      remove_column :widgets, :foo
      remove_column :widgets, :bar_id
      # etc...
    end
  end

  def self.down
    raise ActiveRecord::IrreversibleMigration
  end
end

If the dataset is very large

If the dataset is especially large, you'll want to iterate through it in a less naive manner than I did above: "Widget.find(:all).each".  At the very least, you'll want to iterate in such a way that already handled objects can be garbage collected prior to the end of the loop.  This might be necessary to avoid the dreaded NoMemoryError (or decreased speed due to massive swapping).  This can be handled simply by iterating through the dataset using pagination, but you could also employ a more sophisticated strategy.
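Here is a minimal sketch of the pagination approach, written in plain Ruby so it runs without a database.  The helper each_in_pages and the fetch_page callable are hypothetical names of my own, not part of rails.  In a real migration the callable might wrap something like Widget.find(:all, :order => "id", :limit => limit, :offset => offset), which is an assumption about how you would query your own model.

```ruby
# Hypothetical helper (not part of rails): walks a dataset one page at a time.
# fetch_page is any callable taking (offset, limit) and returning an Array.
def each_in_pages(fetch_page, page_size = 1000)
  offset = 0
  loop do
    page = fetch_page.call(offset, page_size)
    break if page.empty?
    page.each { |record| yield record }
    # Nothing holds on to the previous page once the next one is fetched,
    # so its records can be garbage collected before the loop finishes.
    offset += page.size
    break if page.size < page_size
  end
end

# Stand-in data source so the sketch runs on its own.
ROWS = (1..10).to_a
fetch = lambda { |offset, limit| ROWS[offset, limit] || [] }

seen = []
each_in_pages(fetch, 4) { |row| seen << row }
```

One caveat: offset-based paging assumes a stable ordering and that the loop body doesn't delete rows from the set being paged, otherwise records can be skipped.  Later versions of rails (2.3 and up) ship find_each and find_in_batches, which handle this kind of batching for you.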

TDD (Test Driven Development) is not about testing

October 26th, 2008

The ruby/rails developer community seems to talk far more about automated developer testing than many other developer communities.  This is great.  There’s some disagreement and debate as to the level and type of testing that should be done, and that is to be expected.  There’s some debate as to which testing tools one should use, and that is also just fine and dandy (although I’ll admit that I don’t have a clue what the rspec haters are going on about).

First, anyone who claims that automated unit tests remove the need for QA testers or usability testing or any form of exploratory (manual) testing… they’re completely nuts. :-)   Nor do they remove the need for code reviews, or refactoring, or paying attention to software design.  This seems to be the main point of posts like Hampton Hates Automated Testing and Luke Francl’s “Testing is Overrated” talk at RubyFringe.  However, this is not a claim that I’ve ever heard any XPer, agilist or TDD/BDD proponent make. Instead, they’ve been saying for years now that “test driven development is not about testing”.  And talking about automated developer testing without talking about TDD/BDD seems to me like an adventure in missing the point.

If automated developer testing leads to undue confidence, then Hampton is correct: the developer’s hubris will allow more bugs to get through. In my experience, a humble/paranoid developer will benefit greatly from BDD and put out code with fewer bugs in less time. And an arrogant “my code doesn’t have bugs because of pet theory #462” developer will eventually get themselves into trouble with or without automated tests… but the automated tests may help them dig out from it and perhaps not get into that particular brand of trouble again.

People who confuse automated developer testing with QA testing often talk about “bugs” and “reducing defect count” as if the main point of automated developer testing is to reduce bugs.  Test Driven Development is about driving yourself towards a better design (which is often also more easily testable as a happy byproduct).  This is why BDD (Behavior Driven Development) was coined: to help people grok TDD without being biased by the word “test” and some of its other connotations.  Several other terms were tried out (“eXecutable eXample Driven design” or “XXD” was my personal favorite), but BDD is the one that seemed to catch on and win out.

Also, BDD isn’t meant to be a “crutch” for “if you’re not good about thinking about programming.” It’s about giving all programmers, the so-so programmers and the guru programmers, another paradigm through which to view their code. BDD is about imagining the best possible API/interface/outcome, giving some example of how that code might work (if only the implementation were there), and then filling in the implementation until it works. And then doing it again in short incremental improvements. It’s about getting into “the flow” in minutes, instead of hours.

Yeah, those examples and their assertions also stay around until later as a regression suite. That’s nice. The better examples also hang around as documentation to future developers for how the system is expected to behave. That’s very nice. But, in my experience, they also allow me to develop better, cleaner code more quickly than otherwise… and the “tests” are both a happy byproduct and an enabler.

Hello world!

October 26th, 2008

So… at every software conference I go to, someone asks me, “where’s your blog?”  And developers like Jay Fields have made the case for why good developers have blogs.  But Wired has decided that blogs are now obsolete (in typically pretentious Wired manner).  And that means that now is absolutely the time for me to start up my own blog.

Honestly, the thought of me blogging sort of weirds me out.

The permanence of data on the internet is part of it, but that exists with emails to public mailing lists, and comments on other blogs, and even with IRC transcripts… and doesn’t bother me as much.  But blogging is more like, “this site here represents my thoughts”.  So I’m going to put up a disclaimer here: not only does this site not represent the opinions of my employers (past, present, and future), but it might not represent my own thoughts (past, present, or future).  I reserve the right to change my mind, and also to post incomplete thoughts which a part of me disagrees with.

It seems like the responsibility and formality are expected to be a notch up from other more informal communication media: how often one posts, how well they write, etc.  I suppose these things aren’t that big of a deal.  I don’t hold other blog writers to any particularly high standards in this realm.  But still, these thoughts tend to increase stress levels, and why would I willingly do that to myself?

Most importantly, I don’t really think that I will write anything that I personally would be interested in reading.  I don’t think I have anything particularly important to say.  And I don’t really care if people across the world are affected by what I say.  Again, this isn’t something that I judge other bloggers by.

So what will I write about?  When will I write?  And what will happen when (not if) I’m embarrassed by what I’ve written?  Enough navel-gazing… we’ll see what happens.  :-)