Maintenance Matters: Good Tests
Automated testing is a (perhaps THE) critical component of sustainable software development. Here's what I consider good testing.
This article is part of a series focusing on how developers can center and streamline software maintenance. The other articles in the Maintenance Matters series are: Continuous Integration, Code Coverage, Documentation, Default Formatting, Building Helpful Logs, Timely Upgrades, Code Reviews, and Monitoring.
In this latest entry to our Maintenance Matters series, I want to talk about automated testing. Annie said it well in her intro post:
There is a lot to say about testing, but from a maintainer’s perspective, let’s define good tests as tests that prevent regressions. Unit tests should have clear expectations and fail when behavior changes, so a developer can either update the expectations or fix their code. Feature tests should pass when features work and break when features break.
This is a topic better suited to a book than a blog post (and indeed there are many), but I do think there are a few high-level concepts that are important to internalize in order to build robust, long-lasting software.
My first exposure to automated testing was with Ruby on Rails. Since then, I’ve written production software in many different languages, but nothing matches the Rails testing story. Tom MacWright said it well in “A year of Rails”:
Testing fully-server-rendered applications, on the other hand, is amazing. A vanilla testing setup with Rails & RSpec can give you fast, stable, concise, and actually-useful test coverage. You can actually assert for behavior and navigate through an application like a user would. These tests are solving a simpler problem - making requests and parsing responses, without the need for a full browser or headless browser, without multiple kinds of state to track.
Partly, I think Rails testing is so good because it’s baked into the framework: run rails generate to create a new model or controller and the relevant test files are generated automatically. This helped establish a community focus on testing, which led to a robust third-party ecosystem around it. Additionally, Ruby is such a flexible language that automated testing is really the only viable way to ensure things are working as expected.
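This post isn’t about Rails testing specifically, but I wanted to be clear on my perspective before we really dive in. And with that out of the way, here’s what we’ll cover: why to test at all, the types of tests I write, flaky tests, slow tests, and how test code differs from application code.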
The single most important reason to make automated testing part of your development process is that it gives you confidence to make changes. This gets more and more important over time. With a reliable test suite in place, you can refactor code, change functionality, and make upgrades with reasonable certainty that you haven’t broken anything. Without good tests … good luck.
Beyond that, a good test suite:
- helps during the development process (testable code is correlated with well-factored code, and it’s a good way to review your work before you ship it off);
- provides a guide to code reviewers; and
- serves as a kind of documentation (though not a particularly concise one, and not as a replacement for proper written docs).
Types of Tests
I write two main kinds of tests, which I call unit tests and integration tests, though my definitions differ slightly from the original meanings.
- Unit tests call application code directly – instantiate an object, call a method on it, make assertions about the result. I don’t particularly care what the object under test does in the course of doing its work – calling off to other objects, performing I/O, etc. (this is where I differ from the official definition).
- Integration tests test the entire system end-to-end, using a framework like Capybara or Playwright. We sometimes refer to these as “feature” tests in our codebases.
End-to-end, black-box integration tests are absolutely critical and can cover most of your application’s functionality by themselves. But it often makes sense to wrap complex logic in a module, test that directly (this is where test-driven development can come into play), and then write a simple integration test to ensure that the module is getting called correctly. I avoid mocking and stubbing if at all possible – again, “tests should pass when features work and break when features break” – and really only reach for it when it’s the only option to hit 100% code coverage. In all cases, each test case should run against an empty database to avoid ordering issues.
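As a minimal sketch of the "wrap complex logic in a module and test that directly" idea, assume a hypothetical ShippingCalculator module. The module name, rates, and thresholds are invented for illustration and are not from any real codebase. The tests use Minitest, which ships with Ruby, and call the module directly without mocks.

```ruby
require "minitest/autorun"

# Hypothetical module: the name and pricing rules are illustrative assumptions.
module ShippingCalculator
  FREE_SHIPPING_THRESHOLD_CENTS = 5_000
  BASE_RATE_CENTS = 500
  PER_KG_RATE_CENTS = 150

  # Returns the shipping cost in cents for an order.
  def self.cost_cents(subtotal_cents:, weight_kg:)
    raise ArgumentError, "weight must be positive" unless weight_kg.positive?
    return 0 if subtotal_cents >= FREE_SHIPPING_THRESHOLD_CENTS

    # Partial kilograms are billed as a full kilogram.
    BASE_RATE_CENTS + (weight_kg.ceil * PER_KG_RATE_CENTS)
  end
end

# Unit tests: call the module directly and assert on the result.
class ShippingCalculatorTest < Minitest::Test
  def test_orders_at_the_threshold_ship_free
    assert_equal 0, ShippingCalculator.cost_cents(subtotal_cents: 5_000, weight_kg: 2)
  end

  def test_partial_kilograms_round_up
    assert_equal 800, ShippingCalculator.cost_cents(subtotal_cents: 1_200, weight_kg: 1.2)
  end

  def test_rejects_non_positive_weight
    assert_raises(ArgumentError) do
      ShippingCalculator.cost_cents(subtotal_cents: 1_200, weight_kg: 0)
    end
  end
end
```

With the edge cases covered here, the matching integration test only needs to confirm that the checkout page shows the calculated amount.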
One important exception to the “avoid mocking” rule is third-party APIs: your test suite should be entirely self-contained and shouldn’t call out to outside services. We use webmock in our Ruby apps to block access to the wider web entirely. Some providers offer mock services that provide API-conformant responses you can test against (e.g., stripe-mock). If that’s not an option, you can use something like VCR, which stores network responses as files and returns cached values on subsequent calls. Beware, though: VCR works impressively in small doses, but re-recording “cassettes” can eat up a lot of time.
Rather than leaning on VCR, I’ve instead adopted the following approach:
- Wrap the API integration into a standalone object/module
- Create a second stub module with the same interface for use in tests
- Create a JSON Schema that defines the acceptable API responses
- Use that schema to validate what comes back from your API modules (both the real one and the stub)
If ever the responses coming from the real API fail to match the schema, that indicates that your app and your tests have fallen out of sync, and you need to update both.
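The four steps above might be sketched as follows. Everything named here is an assumption made for illustration: the weather API, its URL, the field names, and the class names are all hypothetical. The hand-rolled validate! method checks only the "required" and "type" keywords and is a stand-in for a real JSON Schema validator, which a production app would more likely pull in as a library.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical integration with an imaginary weather API.
module Weather
  class SchemaError < StandardError; end

  # Step 3: a schema describing acceptable responses (JSON Schema style).
  RESPONSE_SCHEMA = {
    "type" => "object",
    "required" => %w[city temperature_c conditions],
    "properties" => {
      "city" => { "type" => "string" },
      "temperature_c" => { "type" => "number" },
      "conditions" => { "type" => "string" }
    }
  }.freeze

  TYPE_CHECKS = {
    "string" => ->(value) { value.is_a?(String) },
    "number" => ->(value) { value.is_a?(Numeric) }
  }.freeze

  # Step 4: simplified validation, covering only "required" and "type".
  def self.validate!(payload)
    raise SchemaError, "expected a JSON object" unless payload.is_a?(Hash)

    missing = RESPONSE_SCHEMA["required"] - payload.keys
    raise SchemaError, "missing keys: #{missing.join(', ')}" unless missing.empty?

    RESPONSE_SCHEMA["properties"].each do |key, rule|
      next if TYPE_CHECKS.fetch(rule["type"]).call(payload[key])

      raise SchemaError, "#{key} should be a #{rule['type']}"
    end
    payload
  end

  # Step 1: the real client wraps the HTTP call. Never used in tests.
  class Client
    BASE_URL = "https://weather.example.com/v1/current" # placeholder URL

    def current(city)
      uri = URI(BASE_URL)
      uri.query = URI.encode_www_form(city: city)
      Weather.validate!(JSON.parse(Net::HTTP.get(uri)))
    end
  end

  # Step 2: the stub exposes the same interface with canned data.
  class StubClient
    def current(city)
      Weather.validate!(
        { "city" => city, "temperature_c" => 21.5, "conditions" => "clear" }
      )
    end
  end
end

# Application code depends only on the shared interface.
def packing_advice(city, client:)
  report = client.current(city)
  report["temperature_c"] < 10 ? "Bring a coat" : "No coat needed"
end

puts packing_advice("Durham", client: Weather::StubClient.new)
```

Because both clients pass their responses through the same validation, a stub that drifts away from the schema fails in tests, and a real API that drifts away from it fails loudly in the app.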
Flaky tests (tests that fail intermittently, or only fail under certain conditions) are bad. They eat up a lot of development time, especially as build times increase. It’s important to stay on top of them and squash them as they arise. A single test that fails one time in five may not seem so bad, and it’s easier to rerun the build than to spend time tracking it down. But five independent tests like that mean the build fails about two-thirds of the time (1 − 0.8^5 ≈ 0.67).
Some frameworks have libraries that will retry a failing test a set number of times before giving up (e.g., rspec-retry, pytest-rerunfailures). These can be helpful, but they’re a bandage, not a cure.
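The arithmetic behind the two-thirds figure, and the effect of retries on it, can be checked in a few lines of Ruby. This is a back-of-the-envelope sketch: the function name is made up, and it assumes every flaky test fails independently of the others and that each retry is an independent attempt, which real flaky tests often are not.

```ruby
# Probability that a build fails when it contains `flaky_tests` independent
# tests that each fail with probability `failure_rate`, where a test only
# counts as failed if all `attempts` tries fail.
def build_failure_probability(flaky_tests:, failure_rate:, attempts: 1)
  test_fails = failure_rate**attempts   # every attempt has to fail
  1 - (1 - test_fails)**flaky_tests     # at least one test fails
end

no_retries = build_failure_probability(flaky_tests: 5, failure_rate: 0.2)
with_retries = build_failure_probability(flaky_tests: 5, failure_rate: 0.2, attempts: 3)

puts format("without retries: %.1f%%", no_retries * 100)   # without retries: 67.2%
puts format("with 3 attempts: %.1f%%", with_retries * 100) # with 3 attempts: 3.9%
```

Under those assumptions, retries turn most red builds green, but the tests are exactly as flaky as before and every retry adds to the build time.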
The speed of your test suite is a much lower priority than the performance of your application. All else being equal, faster is better, but a slow test suite that fully exercises your application is vastly preferable to a fast one that doesn’t. Time spent performance-tuning your tests can generally be better spent on other things. That said, it is worth periodically looking for low-hanging speed-ups – if parallelizing your test runs cuts the build time in half, that’s worth a few hours’ time investment.
During local development, I’ll often run a subset of tests, either by invoking a test file or specific test case directly, or by using a wildcard pattern to run all the relevant tests. Combining that with running the full suite in CI provides a good balance of flow and rigor. At some point, if your test suite is getting so slow that it’s meaningfully impacting your team’s work, it’s probably a sign that your app has gotten too large and needs to be broken up into multiple discrete services.
App Code vs. Test Code
Tests are code, but they’re not application code, and the way you approach them should be slightly different. Some (or even a lot of) repetition is OK; don’t be too quick to refactor. Ideally, someone can get a sense of what a test is doing by looking at a single screen of code, as opposed to jumping around between early setup, shared examples, complex factories with side-effects, etc.
I think of a test case sort of like a page in a book. I don’t expect to be able to open any random page in any random book and immediately grasp the material, but assuming I’m otherwise familiar with the book’s content, I should be able to look at a single page and have a pretty good sense of what’s going on. A book that frequently required me to jump to multiple other pages to understand a concept would not be a very good book, and a test that spreads its setup across multiple other files is not a very good test.
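Here is one way a test file written in that spirit could look. The Subscription object is hypothetical and defined inline only so the example runs on its own. The point is the shape of the tests: each one repeats its own setup so it can be read top to bottom without consulting shared helpers or factories.

```ruby
require "date"
require "minitest/autorun"

# Hypothetical domain object, included here only to make the example runnable.
Subscription = Struct.new(:plan, :trial_ends_on, keyword_init: true) do
  # Paid plans are always active; trials are active through their end date.
  def active_on?(date)
    plan == :paid || date <= trial_ends_on
  end
end

class SubscriptionTest < Minitest::Test
  # Each test builds exactly what it needs, even though the setup repeats.
  def test_trial_is_active_before_the_trial_ends
    subscription = Subscription.new(plan: :trial, trial_ends_on: Date.new(2024, 3, 31))

    assert subscription.active_on?(Date.new(2024, 3, 15))
  end

  def test_trial_lapses_after_the_trial_ends
    subscription = Subscription.new(plan: :trial, trial_ends_on: Date.new(2024, 3, 31))

    refute subscription.active_on?(Date.new(2024, 4, 1))
  end

  def test_paid_plan_stays_active_after_the_trial_date
    subscription = Subscription.new(plan: :paid, trial_ends_on: Date.new(2024, 3, 31))

    assert subscription.active_on?(Date.new(2024, 4, 1))
  end
end
```

The three setup lines could be pulled into a shared helper, but leaving them inline keeps each test readable as a single page.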
Automated testing is a (perhaps the) critical component of sustainable software development. It’s not a replacement for human testing, but with a reliable automated test suite in place, your testers can focus on what’s changed and not worry about regressions in other parts of the system. It really doesn’t add much time to the development process (provided you know what you’re doing), and any increase in velocity you gain by forgoing testing is quickly erased by time spent fixing bugs.
The next article in this series is Maintenance Matters: Monitoring.