One year ago last week, I wrapped up work on a Ruby gem called test_data that aims to be a more robust alternative to Rails’ rather-limited database fixtures. It also stands in sharp contrast to factory_bot, the design of which has led countless developers to tie their test suites into very slow, very confusing knots.
I iterated on
test_data’s basic design in meetings and on whiteboards for
years before I broke ground. Once I started coding, I couldn’t stop. I worked
tirelessly to cover every edge case. I plugged it into several of my own apps to
prove that it delivered the benefits I thought it would: dramatically simpler test
code and gobsmackingly fast runtime performance. And then I walked away from
the project without so much as telling anyone about it.
Why the hell would I do that? Let’s chat.
If you’ve ever seen the total runtime of a Rails test suite grow over time, and
if you’ve ever observed the superlinear rate of that duration’s growth as each
additional feature increases the cost of every existing test’s setup, then
you’ll understand what
test_data is solving for the minute you look at it.
(And if you’re not familiar with this phenomenon, a younger, faster-talking
version of me gave a sprightly description of it in 2015, 36 minutes into my
talk “How to Stop Hating Your
Tests.”) Despite the
fact that slow test suites are one of the top two or three systemic problems
plaguing large Rails apps, hardly anyone has innovated in this area since the
mid-2000s. If you follow conventional approaches to testing, once your suite
places a bottleneck on your organization’s productivity, your only hope is to
be successful enough that you can afford to pay untold sums to a cloud
provider to parallelize the shit out of your CI build in an effort to keep it
under 30 minutes.
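To make that superlinear growth concrete, here’s a back-of-the-napkin model in Ruby. The numbers and names are invented purely for illustration (this is not a benchmark): assume every feature adds a fixed number of tests, and every test pays a fixed setup cost for each feature that exists.

```ruby
# Toy model with made-up numbers, not a measurement of any real suite.
# Assumption: each feature adds TESTS_PER_FEATURE tests, and each test's
# setup costs SETUP_MS_PER_FEATURE milliseconds per existing feature.
TESTS_PER_FEATURE = 10
SETUP_MS_PER_FEATURE = 10

def suite_setup_ms(features)
  test_count = features * TESTS_PER_FEATURE
  setup_ms_per_test = features * SETUP_MS_PER_FEATURE
  test_count * setup_ms_per_test
end

[10, 20, 40].each do |features|
  puts "#{features} features: #{suite_setup_ms(features) / 1000} seconds of setup"
end
```

Under these assumptions, doubling the number of features quadruples the total setup time, because both the number of tests and the cost of each test’s setup grow together.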
That’s why we’re long past due for a novel, unconventional approach.
The test_data gem works by facilitating the creation and maintenance of a
single version-controlled database of test data that grows up alongside your
application.
It all starts with the introduction of a fourth Rails environment in addition to development, test, and production. You start up the app in the test_data environment and prepare the system just like you might for a staging or user acceptance testing environment: run seed scripts, create a user to cover each role, and generate just enough data to exercise all the features of the application.
Once you’ve generated the data your tests need, run rake test_data:dump to save that entire database’s schema and data to SQL files to be committed along with your code.
Each test that requires your test data starts by calling TestData.uses_test_data in a setup hook, which will load that database at most once and then, through the power and speed of transaction savepoints, ensure each individual test operates in a clean environment with the entirety of your test database in place.
For existing tests that depend on factories and fixtures, compatibility can be retained by way of additional transactional savepoints, accessible via
The cherry on top is that because the test_data database needs to be migrated over the life of the application, you’ll catch migration problems that would otherwise be missed, especially if you’re in the habit of resetting your
There’s a lot going on in the above description, but hopefully the README breaks down everything well enough if you’re interested in digging into how it all works.
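If the “load once, then roll back to a savepoint after every test” idea feels abstract, here is a minimal pure-Ruby sketch of the pattern. To be loud about it: this is not the gem’s implementation or API. ToyDatabase, with_test_data, and the sample rows are all hypothetical stand-ins for a real database and its savepoints.

```ruby
# Toy stand-in for a database with transaction savepoints. Every name here
# is hypothetical; this only illustrates the pattern, not the gem itself.
class ToyDatabase
  attr_reader :rows, :load_count

  def initialize
    @rows = []
    @load_count = 0
    @savepoints = []
  end

  # The expensive step: loading the shared dump. Should happen at most once.
  def load_dump(dump)
    @load_count += 1
    @rows = dump.dup
  end

  # Remember the current state so it can be restored later.
  def savepoint
    @savepoints.push(@rows.dup)
  end

  # Throw away everything that changed since the last savepoint.
  def rollback_to_savepoint
    @rows = @savepoints.pop
  end
end

DB = ToyDatabase.new
DUMP = ["admin user", "regular user", "sample order"].freeze

# Plays the role of a setup/teardown hook wrapped around each test.
def with_test_data
  DB.load_dump(DUMP) if DB.load_count.zero?
  DB.savepoint
  yield DB
ensure
  DB.rollback_to_savepoint
end

with_test_data { |db| db.rows << "order created by test 1" }
with_test_data { |db| db.rows.delete("admin user") }

puts DB.rows.inspect
puts DB.load_count
```

Both “tests” mutate the data, yet afterward the rows match the original dump and the dump was loaded only once. A real database does the same thing with far less copying, which is where the speed comes from.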
And I realize I’m burying the lede here, but it’s not an exaggeration to say
that very large test suites that use
test_data could reasonably be expected to
be 100 to 1000 times faster than a comparable suite written with
course, I’m free to pick whatever numbers I want out of thin air, because as I
stated earlier, nobody’s actually used
test_data so there’s no one out there
to dispute my claims. I’d be thrilled if you tried it out and wrote some
benchmarks just to prove me wrong, though!
While this approach represents a radical departure from how most developers treat test data, it might gin up feelings of nostalgia in anyone who’s ever worked in QA, as this is strikingly similar to how many QA departments were testing web applications 20 years ago. Along the same lines, nothing about this gem is Ruby- or Rails-specific. Consider it a reference implementation for a pattern that every web application framework should consider adopting.
Anyway, that’s the thing and what it does.
But here we are, five minutes into your having even heard of this
test_data gem and I have another piece of news
to share: I’m resigning as lead maintainer of the project, effective
I haven’t given up on
test_data—I am as convinced as ever it’s a powerful idea
that could help teams build much faster, much more comprehensible test suites.
It seems to work well already, but it needs the hardening that can only be
achieved by a brave handful of users willing to adopt it and use it in anger.
There’s not a doubt in my mind that with more edge cases covered and more
compelling documentation, this could be a game-changer for the next
generation of web apps.
But if any of that comes to pass, it won’t be me who takes it across the finish line.
I’ve created hundreds of repositories over the years. Mostly to scratch an itch that I had for a brief moment in time. And most of that work is obscure—the median number of users of my open source projects is probably zero. But at some point a few years ago, my projects cumulatively crossed an invisible threshold, beyond which I could easily fill a 40-hour workweek doing nothing but responding to issues and pull requests on projects I don’t even use anymore.
I don’t think I’m unusual in feeling a bit of guilt and shame when someone asks for help and I tell them no, or I ignore them, or I agree to help them only to realize doing so will cause me to fall short of a commitment to somebody else. That latent stress is why I never developed a rational, consistent approach to responding to users’ feature requests on GitHub. If it’s a light day and the thing somebody’s asking seems easily explicable or achievable, I’ll try to do right by them and respond (even if it ends up consuming my entire morning). If I’m busy and feeling overwhelmed, I might archive the email after skimming it; the software is provided “as-is,” after all. Even half-assing it like this, it feels like I have to hustle just to stay in place.
The upshot of all this is that when I try to tackle something new and devote my complete attention to it, I can’t. Lingering responsibilities across the web of (for lack of a better descriptor) executable content I have produced over the years inevitably interrupt my focus and represent a persistent drag on my productivity.
I want to be clear: this isn’t burnout. This is clear-eyed recognition of a predicament I created for myself.
I don’t pretend to have the solution to the seemingly intractable problem of making long-term open source maintenance sustainable. I don’t need your $5 GitHub sponsorship. I don’t want a big company to hire me to do open source drudgery full-time. I’m not interested in cultivating a community of contributors if it means doing the emotional labor of moderating their communication and mediating their conflicts. I just want to be able to build cool stuff, share it to make a broader point, and then never think about it again.
So, getting back to the question at hand: if this library is so great, why
didn’t I tell anybody about it? It’s because deep down, my open source
experience told me this was such an ambitious project and such a radical
approach that the worst thing that could happen for me personally would be for
test_data to take off and be successful. I wasn’t prepared to handle the
amount of technical and social work that would result. And I’m still not.
But at the same time, if I can be honest with myself about this, I may as well be honest with you: this thing has great potential, but I can’t be the one to carry it forward. Having admitted that, there’s no longer any reason to hide its existence from you.
I realize I’ve just done a great job making the life of an open source maintainer sound really enticing, but I’ll make the pitch anyway: somebody, anybody, please take this gem and run with it.
In my travels, I’ve met countless brilliant developers who share a similar
outlook on software and subscribe to many of the same principles that I do. One
common difference, however, is that they’ll express an interest in contributing
to open source, if only for one problem: they don’t have an idea of something
worth making. And if they have two problems, the second usually goes, “I don’t
know how to get started.” Well, if you sign up to take
test_data off my hands,
both of those problems are already solved: the idea is clear and compelling, and
an initial working version has already been released.
So here’s what I’m offering: if you start using
test_data in your Rails app
and become interested in maintaining and promoting the gem moving forward, I
will work with you to make it happen. Just e-mail
me. And if more than one person raises their
hand, we’ll figure out the best path forward, together.
And if nobody takes the leap and
test_data fades from memory (despite its
tremendous potential to reduce human suffering), that’s perfectly fine too.
I’ll have moved on to the next thing.