Delete Like a Crazed Madman

28 May 2009

Lately I’ve been working with this guy Felix, who likes to rip the application to shreds when confronted with a problem. Initially, it’s a bit scary. I know we have source control and can get it all back, but ripping out large, crucial swaths of code just seems so violent. And yet it’s a pretty effective strategy.

On this one project we’ve had a problem for well over a year: our tests don’t roll back the database after each test. We’ve got all the configuration set up for that to happen, but it just doesn’t, and nobody could figure out why. Part of the trouble is that this is a Rails application that is 2.5ish years old and serves millions of users every day. To make that happen back in the early days of Rails, we had to extend and modify a lot of the Rails framework. Apparently one of those modifications broke the mechanism for rolling back the database after every test.

Not rolling back the database means that one test’s creating or updating a model can break other tests, so we have to remember to clean up after ourselves. Invariably someone forgets, and we get a lot of weird test failures that have to be worked out (or worse, disabled with the comment “runs by itself but not with all the other tests”).

Felix’s solution to our tricky problem was to create a new, crazy simple ActiveRecord model, ‘Foo’, that had one validation: its name had to be unique. Then we wrote a crazy simple set of tests, each of which tried to create and save a Foo object with the same name. If the database was rolling back properly, there should be no problem, because each test would start from a clean table. Of course they failed in our messed up environment.

Since Foo has no dependencies on any of our complicated models, we could delete lots of stuff without breaking it. And we did. We kept deleting plugins and gems until we found the problem. Of course we did it pseudo-binary-search style: cutting out large groups of gems and plugins in one go to see if it had any effect, then narrowing it down when the tests started passing. In a relatively short amount of time we had found the culprit: a plugin we had written that attempted to make Rails roll back multiple databases in nested transactions. The best solution would be to move up to a more modern version of Rails where this just works (or at least the internet says it does), but our solution was to make this particular plugin not load in the testing environment. Not a perfect solution, but a good enough one.
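For the curious, the pseudo-binary search over plugins can be sketched in a few lines of plain Ruby. This is an illustration rather than a record of what was actually run: the plugin names are made up, the block is a hypothetical stand-in for “load only these plugins and run the Foo tests”, and the sketch assumes exactly one plugin is to blame.

```ruby
# Narrow a list of suspects down to the one that causes the failure.
# The block is a hypothetical stand-in for "enable only these plugins,
# run the canary tests, and return true if they still fail".
# Assumes exactly one culprit in the list.
def find_culprit(plugins)
  candidates = plugins.dup
  while candidates.size > 1
    half = candidates.first(candidates.size / 2)
    # If the failure survives with only `half` enabled, the culprit is
    # in `half`; otherwise it must be in the remaining candidates.
    candidates = yield(half) ? half : candidates - half
  end
  candidates.first
end

# Made-up plugin names, purely for illustration.
plugins = %w[auth_ext cache_hacks multi_db_rollback paginate_fix search_glue]

# Here the "test run" is faked: the tests fail whenever the bad plugin is loaded.
culprit = find_culprit(plugins) { |enabled| enabled.include?("multi_db_rollback") }
puts culprit
```

Each round halves the number of suspects, so even a few dozen plugins only take a handful of test runs. If more than one plugin were at fault, this simple version would not be enough.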

The real takeaway from this exercise for me is that simplification and deletion are powerful tools in the bug-fixing toolkit.