It’s 2021, and I’m sure we all agree that running automated tests is a crucial aspect of delivering excellent software. Continuous Integration (CI) systems allow us to run tests before changes make it to the main code branch. This is to ensure we don’t introduce any mistakes or regressions. These days, teams use CI not only for tests, but for a plethora of other checks as well: formatters, linters, type checkers — you name it. CI is required to ship high-quality code.
But something always feels off about it. I’ve never heard anyone say they love their CI solution. Software engineers can spend hours glorifying their new favorite programming language, but when it comes to CI, it’s either silence or angry grumbling from above the keyboard (or in a Slack channel). I believe the reason is that CI is always slow.
Slow CI curbs your team’s productivity and speed of delivery. While fast CI isn’t a magic wand that will make your team 10 times more effective, the ripple effect created by slow CI jeopardizes all activities that engineers do to ensure code quality.
Slow CI encourages large changes. If it takes an hour or more to run automated checks on your codebase, then it’s no wonder that all the changes end up in huge feature branches. Engineers who work on those changes make an (often unconscious) choice: “If every set of changes I make needs to go through the same sluggish process, then in the end, the fewer changesets I create, the shorter the wait time.” But this thinking doesn’t consider that another team member will need to review those changes. It’s much harder to comprehend a large change than a small one, which makes it more difficult to achieve two of the main goals of code reviews: sharing knowledge about the codebase, and ensuring quality.
Slow CI discourages continuous improvement. Have you ever spotted a typo in a variable name, noticed that a module could be renamed to better express its intent, or found out that a new version of a library you use would allow you to delete some code? Great! You make the change, but now you need to wait at least an hour to find out whether you broke anything. Of course, you can do something else in the meantime, but then you’ll interrupt that work at least a few times to check whether the tests have passed. Slow CI makes for long feedback loops, and long feedback loops discourage change. And if you don’t make small improvements on a regular basis, then you’ll need to do much larger refactors once in a while, which takes us back to the previous argument.
So far, you know that slow CI is bad for your team’s productivity and code quality. But how slow exactly is slow? There’s no definitive answer to this question, so instead of giving you a number, I encourage you to look for the following symptoms:
Large feature branches. I already talked about this to explain why slow CI is bad, but this is also one of its major symptoms. I won’t argue that all large changesets are bad; after all, some changes can’t be broken down into smaller pieces. But if you notice that most of the changes in the system you maintain are brewing in silos over long periods of time, your CI speed might be below the productivity threshold.
Engineers not running tests and other checks locally. If checks on your CI take a lot of time, they most likely take a lot of time when run on your laptop too. This symptom is simple in principle, but it can be a hard one to spot — your team needs to be really honest about it. A good indicator that you’ve run into this problem is when you notice that you only run tests that directly exercise the part that you’ve changed, and you never run the whole test suite.
It’s important that you look at the above signals, and not the build duration, when assessing if your CI is fast enough. That said, having a specific number in mind is important when you decide to make it faster.
When it comes to finally speeding up your CI, tech isn’t the biggest obstacle. The hardest part is to pursue that goal as a team and commit to it.
To get there, teams need to treat CI slowness with the same severity as user-facing bugs. Hopefully, during your development cycle, you set aside some time for fixing bugs. You do it because bugs come with a cost: They make systems unstable, they make users unhappy, and they’re a sign of poor quality. Slow CI also comes with a cost: engineers’ wasted time and accrued technical debt can often exceed the cost of some critical bugs. Dedicate some of the time you spend on bug fixing and refactoring to CI maintenance.
The next step is to commit to a goal. For example, you can say that you want your build times to not exceed 15 minutes. It’s important that this goal is ambitious but also feasible. The exact steps you take to speed up your CI pipeline will be specific to your tech stack (stay tuned for a blog post about how we do it for Elixir projects).
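One way to keep such a goal visible is to check recent build durations against it on a regular basis. The following is only a minimal sketch in Python, not the approach used by any particular CI product: the function name `summarize_builds`, the sample durations, and the idea of feeding it a plain list of minutes are all assumptions made for illustration. How you export durations from your own CI system will differ.

```python
from statistics import median

BUDGET_MINUTES = 15  # the example goal from the text; pick your own


def summarize_builds(durations_minutes, budget=BUDGET_MINUTES):
    """Summarize build durations (in minutes) against a time budget.

    A build counts as over budget only if it strictly exceeds the budget,
    matching a goal phrased as "build times should not exceed N minutes".
    """
    if not durations_minutes:
        raise ValueError("need at least one build duration")
    over = [d for d in durations_minutes if d > budget]
    return {
        "builds": len(durations_minutes),
        "median_minutes": median(durations_minutes),
        "slowest_minutes": max(durations_minutes),
        "over_budget": len(over),
        "within_goal": not over,
    }


if __name__ == "__main__":
    # Hypothetical durations, in minutes, for the last few builds.
    recent = [12.5, 14.0, 17.2, 13.1, 21.4]
    print(summarize_builds(recent))
```

Reporting the median and the slowest build side by side is a deliberate choice in this sketch: the median shows what engineers typically wait for, while the slowest build shows how far the worst case drifts from the agreed-upon number.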
Last, it’s critical that everyone on the team has ownership of the CI system. I’m not suggesting that engineers should be responsible for everything down to the hardware level, but it’s important that they have control of exactly what steps are executed during the build. A common understanding of the CI ensures that no single engineer leaving the team takes away the team’s ability to keep the CI speed at the agreed-upon level.
As engineers, we can often spend hours fine-tuning our code editors, IDEs, and other tools that run on our computers. At the same time, we neglect the tool that’s shared by everyone on the team: the CI pipeline. Your CI pipeline bundles the tools that everyone on your team uses, and so it is the tool for ensuring code quality in your team. Show it some love.
I hope this article showed you that it’s important to make your CI fast. Stay tuned for a second part where we show you how we make the CI for our Elixir projects at PSPDFKit fast.
If you’re interested in how we manage our CI for macOS, take a look at our post on continuous integration for small iOS/macOS teams.