My buddy Aaron “tenderlove” Patterson and I paired on his livestream recently on a brand-spankin’ new testing framework for Ruby. I spent the next few days and, yes, nights sporting my mad scientist lab coat and experimenting to see if I could make a test framework real enough for real people to use in real life, and I think the answer is… yes, maybe!
The big idea is to run as much of your test suite as you can as often as you can, as opposed to only running whatever test you’re actively working on and saving the full test suite for CI.
```
# LESS of this:
$ tldr test/thing/i_am_working_on_test.rb:13

# MORE of this:
$ tldr
```
TLDR automatically prepends the most-recently modified test file to the beginning of the suite, so running the `tldr` command without arguments still ensures the test you’re actively working on will run, even if you’re over the 1.8s time limit.
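The idea behind that prioritization can be sketched in a few lines of plain Ruby (an illustration of the concept, not TLDR’s actual implementation; `prioritize_freshest` is a made-up name):

```ruby
# Illustrative sketch, NOT TLDR's internals: given candidate test files,
# move the most recently modified one to the front of the run order.
def prioritize_freshest(test_files)
  return [] if test_files.empty?

  freshest = test_files.max_by { |path| File.mtime(path) }
  [freshest, *(test_files - [freshest])]
end
```

That way, even in a huge suite, the file you just saved is always first in line.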
It has some other quality-of-life features, too:
- A no-nonsense CLI that runs your tests in random order and in parallel by default (modern machines have so many CPU cores, it really cooks! 🔥)
- Command-line options to filter tests by one or more line numbers (`foo_test.rb:42:99`), names (`--name test_42,test_99`), or patterns
- A Minitest-like API with the same basic class structure (`teardown` methods) and the same assertions (`refute_*`), so you should be able to try out TLDR as a drop-in replacement for most existing suites
- Very nice, full-color diff output when `assert_equal` assertions fail, thanks to the super_diff gem
- A `--fail-fast` mode, for when you’re not expecting a failure and 1.8 seconds is too long to wait
- A pluggable reporter system that can be swapped out with `--reporter YourClassName` (you can extend `TLDR::Reporters::Base` for this)
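For a sense of what that drop-in compatibility looks like, here’s a sketch of a test file based on TLDR’s Minitest-like API (consult the gem’s README for the authoritative shape):

```ruby
require "tldr"

class MathTest < TLDR
  def teardown
    # clean up after each test, just like Minitest
  end

  def test_addition
    assert_equal 4, 2 + 2
    refute_equal 5, 2 + 2
  end
end
```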
Anyway, it’d be awesome if you ran `gem install tldr` and gave it a try. It was a lot of fun to build. And if you’ve ever been curious about how to write a test runner, you might enjoy perusing the codebase as well.
Aaron and I have advocated for better software testing practices for more than a decade. In the early days, it was a victory to convince programmers to write tests at all. Later, the challenge shifted to writing tests that reliably verified the thing being tested actually worked. Now, we’ve moved onto an entirely new problem: people write so many tests they don’t have time to run them.
Just me, or does nobody run their full test suite on their own computer anymore? If they do, it probably isn’t often. And how many people who diligently run their tests locally secretly enjoy the excuse to go make a sandwich as they drop that one xkcd about compiling into Slack?
tl;dr, Running some of your tests frequently is better than running all of your tests infrequently.
So, what do people do instead of running their tests? They push their code to a CI server to run the tests for them. Instead of shipping, they’re waiting. Or, let’s be real, they’re immediately moving on to the next thing in the hope the build will eventually turn green. But what if the build fails? Now they have to switch context back to whatever they did to break it, reproduce the failure locally (note: this may take all day), and then fix the issue. Neat.
Look, I’m not here to convince you that “fast tests are good”. I’m not even here to convince you that fast feedback loops are vital to programmer focus, productivity, and happiness. You seem smart; you probably know that already. Besides, nobody wakes up and decides, “I’m going to design a sprawling suite of leisurely-paced tests that will require an exorbitantly-priced parallel CI service to run in under an hour, and then I’m going to organize my life around the fear that I won’t notice an email or Slack message about an inscrutable build failure until hours after pushing my code.” Almost nobody wakes up and thinks that.
As a consultant with few non-testing-related skills, I have talked to quite a few teams about testing practices over the years, and there’s one thing I can tell you: the people trying the hardest to do this stuff right always end up in the worst situation. You know who isn’t waiting for CI to finish only for more commits to land so they can wait for CI to finish? People who don’t write so many damn tests.
No, the real reason this keeps happening, especially to us testing enthusiasts, is that our testing tools lack any limiting factors to help teams prevent their build duration from spiraling out of control.
Ask developers bogged down by very slow test suites what they want from their tests, and they might feel too defeated to even imagine a better way. But, because Aaron and I have escaped the responsibility of working on real systems for most of our careers, I can at least tell you what the two of us want.
We want tests that are so fast we look forward to running them all the time. Run them before each commit. Run them with a keyboard shortcut. Run them after each file save. We want to see our tests fail so quickly that fixing them doesn’t feel like work. We want our editor to still have the right file open so we can hit undo a few times and get back to passing.
What don’t we want? We don’t want to wait a few extra seconds, minutes, or hours to find out our tests are failing. We don’t want to figure out what went wrong after a bunch of other changes have been piled on top of a bug. We don’t want a simple fix to require hours of forensic analysis because no one was running the tests. And we really don’t want to waste our time trying to figure out what we were thinking at 4pm last Tuesday—we can’t even remember what we had for dinner that night.
Each time a developer crosses a line from “my tests are so fast I run them constantly,” to “I run them before each `git push`,” to “I occasionally verify they can still be run on a MacBook,” to “scratch that, they can no longer be run on a MacBook,” there’s absolutely nothing incentivizing them to go back and reclaim their lost productivity. Everything is pulling in the opposite direction. Every new feature needs new tests. Old tests are only revisited when changing old functionality (and we only revisit old functionality to add features to it, resulting in more tests). And once a test is committed to the repository, even if it’s poorly designed, unnecessarily slow, dependent on countless network requests, and nobody is sure what it was supposed to verify, literally no one ever feels the freedom to delete it.
(Fun fact: the highest bill rate I ever charged a client was as an advisor to a company that just needed someone to say out loud that they had permission to delete their worthless tests. We are happy to provide this service to anyone who might have the need.)
So TLDR humbly imposes a limiting factor to discourage slow test suites: it strictly enforces a maximum time budget of 1.8 seconds for your tests. And, taking a cue from our Standard Ruby gem, you can’t simply configure a longer timeout for yourself.
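The enforcement idea can be sketched in plain Ruby (a simplified illustration, not TLDR’s internals; the `budget:` parameter exists here only for demonstration, since TLDR’s 1.8 seconds is deliberately not configurable):

```ruby
# Simplified sketch of a hard time budget (not TLDR's actual internals):
# stop scheduling new tests once the allowance is spent, and report
# which tests never got a turn.
def run_with_budget(tests, budget: 1.8)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + budget
  ran, skipped = [], []
  tests.each do |name, test|
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
      test.call
      ran << name
    else
      skipped << name
    end
  end
  [ran, skipped]
end
```

The real thing goes further than this sketch (for instance, it can cancel tests that are still in progress when time expires), but the core trade is the same: the clock, not the test count, decides when the run ends.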
This is a radical design, so let’s briefly zoom out and compare the fundamental difference between `tldr` and its leading competition.
When you use RSpec like this:
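```
$ rspec
```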
RSpec’s guarantee is that it will run all of your tests, no matter how long it takes. If you end up writing a lot of tests, the trade-off is that you’ll eventually stop running the `rspec` command at all. And at that point, it doesn’t matter that `rspec` runs your full suite: zero of your tests are being run.
When you use `tldr` like this:
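```
$ tldr
```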
TLDR’s guarantee is that it will run in under 1.8 seconds, no matter how many tests you have. If you end up writing a lot of tests, the trade-off is that you’ll only exercise a subset of your tests each time you run `tldr`. But because the command never gets slower, you can keep running `tldr` throughout the day, ensuring all your tests run frequently in the course of your work.
Do you deal with a lot of flaky tests, where certain tests fail sporadically or only in a specific build order? You’ll catch a lot of those flakes sooner by running a big chunk of your full test suite locally with TLDR many times a day than by waiting for CI to run the full suite far less frequently.
tl;dr, Running some of your tests frequently is better than running all of your tests infrequently.
You can still write all the tests you want. And if time expires, you’ll get a report telling you how many tests ran, which ones were cancelled in progress, a top 10 list of your slowest ones, and a snazzily-reconstructed CLI command you can copy-paste to execute only the tests that didn’t run successfully.
With TLDR, you’ll have an incentive to optimize your code, speed up your tests, combine multiple slow tests into one, and start deleting your low-value tests. Why? Because once your suite crosses the 1.8 second threshold, your test runner will be staring you in the face telling you that your tests didn’t fail, you did.
Time is our most valuable resource. If our tests are so slow they derail our ability to do our best work, are we really succeeding when all the tests pass? It’s time we started hesitating before answering “yes” to that question.
TLDR’s time limit is designed as a liberating constraint, supplanting the primacy of infrequently-run CI builds over your working life and freeing you to stay focused and productive. We hope you’ll give it a try. 💚