The future of Rails test data management

… rests entirely on your shoulders, it turns out

Publish Date: November 13, 2023
Authors: Justin Searls

One year ago last week, I wrapped up work on a Ruby gem called test_data that aims to be a more robust alternative to Rails’ rather-limited database fixtures. It also stands in sharp contrast to factory_bot, the design of which has led countless developers to tie their test suites into very slow, very confusing knots.

I iterated on test_data’s basic design in meetings and on whiteboards for years before I broke ground. Once I started coding, I couldn’t stop. I worked tirelessly to cover every edge case. I plugged it into several of my own apps to prove that it delivered the benefits I thought it would: dramatically simpler test code and gobsmackingly fast runtime performance. And then I walked away from the project without so much as telling anyone about it.

Why the hell would I do that? Let’s chat.

What `test_data` does

If you’ve ever seen the total runtime of a Rails test suite grow over time, and if you’ve ever observed the superlinear rate of that duration’s growth as each additional feature increases the cost of every existing test’s setup, then you’ll understand what test_data is solving for the minute you look at it. (And if you’re not familiar with this phenomenon, a younger, faster-talking version of me gave a sprightly description of it in 2015, 36 minutes into my talk “How to Stop Hating Your Tests.”) Despite the fact that slow test suites are one of the top two or three systemic problems plaguing large Rails apps, hardly anyone has innovated in this area since the mid-2000s. If you follow conventional approaches to testing, once your suite places a bottleneck on your organization’s productivity, your only hope is to be successful enough that you can afford to pay untold sums to a cloud provider to parallelize the shit out of your CI build in an effort to keep it under 30 minutes.

That’s why we’re long past due for a novel, unconventional approach.

The test_data gem works by facilitating the creation and maintenance of a single version-controlled database of test data that grows up alongside your app:

- It all starts with the introduction of a fourth Rails environment in addition to development, test, and production. You start up the app in the test_data environment and prepare the system just like you might for a staging or user acceptance testing environment: run seed scripts, create a user to cover each role, and generate just enough data to exercise all the features of the application.
- Once you’ve generated the data your tests need, run rake test_data:dump to save that entire database’s schema and data to SQL files to be committed along with your code.
- Each test that requires your test data starts by calling TestData.uses_test_data in a setup hook, which will load that database at most once and then, through the power and speed of transaction savepoints, ensure each individual test operates in a clean environment with the entirety of your test database in place.
- For existing tests that depend on factories and fixtures, compatibility can be retained by way of additional transactional savepoints, accessible via TestData.uses_clean_slate and TestData.uses_rails_fixtures(self), respectively.
- The cherry on top is that because the test_data database needs to be migrated over the life of the application, you’ll catch migration problems that would otherwise be missed, especially if you’re in the habit of resetting your development and test databases frequently.
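The "load at most once, then roll back to a savepoint after every test" idea from the list above can be sketched as a toy model in plain Ruby. Everything named here (ToyDatabase, ToyTestData, with_test_data) is a hypothetical stand-in invented for illustration. It is not the gem's API or its implementation, and a real database would use actual transaction savepoints instead of copying arrays.

```ruby
# Toy, in-memory stand-in for a database with savepoints.
# Hypothetical illustration only; not how the test_data gem is implemented.
class ToyDatabase
  attr_reader :rows, :load_count

  def initialize
    @rows = []
    @savepoints = {}
    @load_count = 0
  end

  # Simulates the expensive step: loading the dumped test data.
  def load_dump(dump)
    @load_count += 1
    @rows = dump.map(&:dup)
  end

  def insert(row)
    @rows << row
  end

  # Remember the current state under a name.
  def savepoint(name)
    @savepoints[name] = @rows.map(&:dup)
  end

  # Restore the state captured by the named savepoint.
  def rollback_to(name)
    @rows = @savepoints.fetch(name).map(&:dup)
  end
end

# Hypothetical helper that loads the dump once and isolates each "test".
class ToyTestData
  def initialize(db, dump)
    @db = db
    @dump = dump
    @loaded = false
  end

  # Load the dump on first use only, run the block, then roll back.
  def with_test_data
    unless @loaded
      @db.load_dump(@dump)
      @db.savepoint(:after_data_load)
      @loaded = true
    end
    yield @db
  ensure
    @db.rollback_to(:after_data_load) if @loaded
  end
end

dump = [{ name: "admin" }, { name: "guest" }]
db = ToyDatabase.new
toy = ToyTestData.new(db, dump)

# The first "test" adds a row; the second "test" should not see it.
toy.with_test_data { |d| d.insert({ name: "temp" }) }
seen_by_second_test = toy.with_test_data { |d| d.rows.map { |r| r[:name] } }
```

In this sketch the expensive load happens once no matter how many blocks run, and each block starts from the same rows. That is the property the setup hook is described as providing.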

There’s a lot going on in the above description, but hopefully the README breaks down everything well enough if you’re interested in digging into how it all works.

And I realize I’m burying the lede here, but it’s not an exaggeration to say that very large test suites that use test_data could reasonably be expected to be 100 to 1000 times faster than a comparable suite written with factory_bot. Of course, I’m free to pick whatever numbers I want out of thin air, because as I stated earlier, nobody’s actually used test_data so there’s no one out there to dispute my claims. I’d be thrilled if you tried it out and wrote some benchmarks just to prove me wrong, though!
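For a rough sense of why the gap could widen as a suite grows, here is a back-of-the-envelope cost model in Ruby. Every constant in it is an assumption made up for illustration, not a measurement of test_data or factory_bot, so it shows only the shape of the two growth curves and says nothing about real-world ratios.

```ruby
# Hypothetical cost model; every constant is an assumption, not a measurement.
FACTORY_COST_PER_FEATURE = 0.002 # seconds each feature adds to every test's setup (assumed)
TESTS_PER_FEATURE = 10           # tests written per feature (assumed)
ONE_TIME_LOAD = 2.0              # seconds to load the dumped data once (assumed)
ROLLBACK_COST = 0.001            # seconds per savepoint rollback (assumed)

# Per-test setup grows with the feature count, so the total grows quadratically.
def per_test_setup_total(features)
  tests = features * TESTS_PER_FEATURE
  tests * (features * FACTORY_COST_PER_FEATURE)
end

# One load up front, then a constant-cost rollback per test: linear growth.
def shared_data_total(features)
  tests = features * TESTS_PER_FEATURE
  ONE_TIME_LOAD + tests * ROLLBACK_COST
end

[10, 100, 1000].each do |features|
  a = per_test_setup_total(features)
  b = shared_data_total(features)
  puts format("%5d features: per-test setup %.1fs, shared data %.1fs", features, a, b)
end
```

Under these assumptions, doubling the number of features quadruples the per-test-setup total while the shared-data total only grows linearly. Real benchmarks would be needed to say anything about actual numbers.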

While this approach represents a radical departure from how most developers treat test data, it might gin up feelings of nostalgia in anyone who’s ever worked in QA, as this is strikingly similar to how many QA departments were testing web applications 20 years ago. Along the same lines, nothing about this gem is Ruby- or Rails-specific; consider it a reference implementation for a pattern that every web application framework should consider adopting.

Anyway, that’s the thing and what it does.

I have a second announcement to make

But here we are, five minutes into your having even heard of this potentially revolutionary test_data gem and I have another piece of news to share: I’m resigning as lead maintainer of the project, effective immediately.

I haven’t given up on test_data—I am as convinced as ever it’s a powerful idea that could help teams build much faster, much more comprehensible test suites. It seems to work well already, but it needs the hardening that can only be achieved by a brave handful of users willing to adopt it and use it in anger. There’s not a doubt in my mind that with more edge cases covered and more compelling documentation, this could be a game-changer for the next generation of web apps.

But if any of that comes to pass, it won’t be me who takes it across the finish line.

I’ve created hundreds of repositories over the years. Mostly to scratch an itch that I had for a brief moment in time. And most of that work is obscure—the median number of users of my open source projects is probably zero. But at some point a few years ago, my projects cumulatively crossed an invisible threshold, beyond which I could easily fill a 40-hour workweek doing nothing but responding to issues and pull requests on projects I don’t even use anymore.

I don’t think I’m unusual in feeling a bit of guilt and shame when someone asks for help and I tell them no, or I ignore them, or I agree to help them only to realize doing so will cause me to fall short of a commitment to somebody else. That latent stress is why I never developed a rational, consistent approach to responding to users’ feature requests on GitHub. If it’s a light day and the thing somebody’s asking seems easily explicable or achievable, I’ll try to do right by them and respond (even if it ends up consuming my entire morning). If I’m busy and feeling overwhelmed, I might archive the email after skimming it; the software is provided “as-is,” after all. Even half-assing it like this, it feels like I have to hustle just to stay in place.

The upshot of all this is that when I try to tackle something new and devote my complete attention to it, I can’t. Lingering responsibilities across the web of (for lack of a better descriptor) executable content I have produced over the years inevitably interrupt my focus and represent a persistent drag on my productivity.

I want to be clear: this isn’t burnout. This is clear-eyed recognition of a predicament I created for myself.

I don’t pretend to have the solution to the seemingly-intractable problem of making long-term open source maintenance sustainable. I don’t need your $5 GitHub sponsorship. I don’t want a big company to hire me to do open source drudgery full-time. I’m not interested in cultivating a community of contributors if it means doing the emotional labor of moderating their communication and mediating their conflicts. I just want to be able to build cool stuff, share it to make a broader point, and then never think about it again.

So, getting back to the question at hand: if this library is so great, why didn’t I tell anybody about it? It’s because deep down, my open source experience told me this was such an ambitious project and such a radical approach that the worst thing that could happen for me personally would be for test_data to take off and be successful. I wasn’t prepared to handle the amount of technical and social work that would result. And I’m still not.

But at the same time, if I can be honest with myself about this, I may as well be honest with you: this thing has great potential, but I can’t be the one to carry it forward. Having admitted that, there’s no longer any reason to hide its existence from you.

The call to action part

I realize I’ve just done a great job making the life of an open source maintainer sound really enticing, but I’ll make the pitch anyway: somebody, anybody, please take this gem and run with it.

In my travels, I’ve met countless brilliant developers who share a similar outlook on software and subscribe to many of the same principles that I do. One common difference, however, is that they’ll express an interest in contributing to open source, if not for one problem: they don’t have an idea of something worth making. And if they have two problems, the second usually goes, “I don’t know how to get started.” Well, if you sign up to take test_data off my hands, both of those problems are already solved: the idea is clear and compelling, and an initial working version has already been released.

So here’s what I’m offering: if you start using test_data in your Rails app and become interested in maintaining and promoting the gem moving forward, I will work with you to make it happen. Just e-mail me. And if more than one person raises their hand, we’ll figure out the best path forward, together.

And if nobody takes the leap and test_data fades from memory (despite its tremendous potential to reduce human suffering), that’s perfectly fine too. I’ll have moved on to the next thing.
