How Amazon and Google view CI/CD in an entirely different way

To Pre- or to Post-, that is the Question

Carlos Arguelles
9 min read · Jul 9, 2024

If Medium puts this content behind a paywall, you can also view it here (LinkedIn).

I was the Technical Lead for company-wide Infrastructure for Integration Testing at both Amazon and Google for many years, and the way these two companies think about CI/CD is vastly different.

I spent over 11 years at Amazon (2009–2020). I was a Principal Engineer in the Developer Tools organization, which owned all the CI/CD infrastructure and tooling for the company. We operated all the software that tens of thousands of Amazonians used every day to write code, review code, build code, test code, and deploy code.

In 2020, I was curious to try something entirely different, and I joined Google, where I spent 4 years as a Tech Lead of Infrastructure for Integration Testing, a critical part of Google’s CI/CD tooling. While the domain is similar, I couldn’t think of two more different tech stacks.

I am now back at Amazon, after having gained experience on how an entirely different company, Google, thinks about this. It’s been interesting to view things with a different lens now.

Definitions first

Some definitions first so that we’re all on the same page:

  • “CI/CD” stands for Continuous Integration (CI) and Continuous Delivery (CD). Basically, how does a code change make it from a local developer workspace to being deployed to production.
  • “Pre-Submit” refers to the developer experience before you submit a piece of code. What was specifically interesting to me was what kind of validation can you perform on a piece of code before it is checked in. This could include the kind of testing that you do against changes in your local workspace, or as part of the code review process.
  • “Post-Submit” refers to the developer experience after you submit a piece of code. How do we merge changes, how do we deploy them to prod-realistic test environments and validate them there, and how do we shepherd them to production.
  • “Testing” in this case refers to integration testing, or end-to-end testing, not unit testing. Unit testing is trivial to run anywhere, but integration testing requires the candidate code to be deployed to a System Under Test and wired to its dependencies, which adds a whole new level of complexity to the infrastructure (the sketch after this list illustrates the difference).
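To make that last bullet concrete, here is a minimal sketch in Python. It is not either company’s actual tooling; the SUT_ENDPOINT variable, the service names, and the helper function are all hypothetical, and the genuinely hard work (provisioning the System Under Test and its dependencies) happens outside the test.

import os
import unittest
import urllib.request


def total_with_tax(subtotal, tax_rate):
    # Hypothetical pure function under test.
    return round(subtotal * (1 + tax_rate), 2)


class CheckoutUnitTest(unittest.TestCase):
    # Unit test: pure logic, no deployment needed, runs anywhere.
    def test_total_adds_tax(self):
        self.assertEqual(total_with_tax(10.00, 0.10), 11.00)


class CheckoutIntegrationTest(unittest.TestCase):
    # Integration test: needs the candidate code already deployed to a
    # System Under Test (SUT), wired to real or faked dependencies.
    def test_checkout_service_is_up(self):
        # SUT_ENDPOINT is provided by whatever infrastructure deployed the
        # candidate build (an ephemeral pre-submit environment, or a
        # post-submit test stage). That provisioning is where the
        # complexity lives; the test itself stays simple.
        sut = os.environ["SUT_ENDPOINT"]
        with urllib.request.urlopen(sut + "/healthcheck", timeout=5) as resp:
            self.assertEqual(resp.status, 200)


if __name__ == "__main__":
    unittest.main()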

The realization that I reached eventually is that Google is great at Pre-Submit and not-so-great at Post-Submit, while Amazon is great at Post-Submit, and not-so-great at Pre-Submit.

Introducing Monorepo and Microrepos

For a while this difference puzzled me, but I think I understand the root cause now. Google uses a monorepo, with most of its 120k engineers sharing a single repository, with no branches. Amazon uses tens of thousands of microrepos (which we call “version sets” — technically not quite microrepos, but for the purposes of this discussion you can think of them as microrepos). Roughly, each service gets its own microrepo (there are all kinds of exceptions, but we’ll skip those, again for the purposes of this discussion).

My good friend Alex Xu, cofounder of ByteByteGo, has a nice article that goes in depth on the monorepo vs. microrepos philosophy, so I won’t fully rehash it here, but I’ll cover it as it pertains to pre-submit and post-submit testing.

Note: This article is not meant to be a comprehensive “monorepo vs. microrepo” discussion. It is scoped to pre- and post-submit testing within the context of CI/CD, which happens to be significantly impacted by that choice.

Google, Pre-Submit

When you have more than a hundred thousand developers using a single repository with no branches, the blast radius of a bad checkin is massive. There is a non-zero probability that your checkin will block thousands or tens of thousands of other engineers. While at Google, I was sometimes blocked by a random checkin from a random person in a random part of the company, and for the life of me I couldn’t quite understand why.

Google had made significant investments to be able to run end-to-end integration tests from a local dev environment, or a code review, against ephemeral, hermetic test environments. In fact, Google being Google, it wasn’t just one infra, but four competing, duplicative infras that had organically and independently grown over the course of the last twenty years. I was part of the effort to converge them, which I’m certain will continue well past the time I’m dead. This was a significant effort, with more than a hundred engineers working on it. Overall, being able to run proper end-to-end integration tests in pre-submit was pretty magical for my Amazon reptilian brain.
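To give a feel for what that pre-submit flow looks like, here is a minimal sketch in Python. This is emphatically not Google’s actual infrastructure (which, as noted, is really four infrastructures); every name below is a hypothetical stand-in for whatever the real tooling provides.

# A minimal sketch of the pre-submit flow described above. Not Google's
# actual infra; every name here is a hypothetical stand-in.
import contextlib
import uuid


def provision(env_name: str) -> None:
    # Hypothetical: spin up the System Under Test plus hermetic fakes for
    # its dependencies. In reality, this is the expensive part.
    print(f"provisioning hermetic environment {env_name}")


def deploy(candidate_build: str, env_name: str) -> None:
    # Hypothetical: deploy the unsubmitted candidate build into the env.
    print(f"deploying {candidate_build} to {env_name}")


def run_e2e_suite(env_name: str) -> bool:
    # Hypothetical: run the end-to-end tests against the environment.
    print(f"running e2e tests against {env_name}")
    return True


def teardown(env_name: str) -> None:
    print(f"tearing down {env_name}")


@contextlib.contextmanager
def ephemeral_env(change_id: str):
    # A throwaway environment that lives only as long as one code review check.
    env_name = f"presubmit-{change_id}-{uuid.uuid4().hex[:8]}"
    provision(env_name)
    try:
        yield env_name
    finally:
        teardown(env_name)  # always reclaimed, which is what keeps it "ephemeral"


def presubmit_check(change_id: str, candidate_build: str) -> bool:
    # Block submission unless e2e tests pass against the candidate build.
    with ephemeral_env(change_id) as env:
        deploy(candidate_build, env)
        return run_e2e_suite(env)


if __name__ == "__main__":
    print("presubmit passed:", presubmit_check("cl12345", "//shop/checkout:server"))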

Google, Post-Submit

As magical as pre-submit was, the flip side was that the post-submit experience was surprisingly poor. After my first checkin, I excitedly asked an engineer on my team, “When will I see these changes in Prod?” He casually responded, “Oh, Thursday.” I was shocked. To my Amazonian reptilian brain, that was blasphemous. Amazon strongly believes every code change should go to prod within hours (more on this later). Sure, that is aspirational: code changes spanning multiple services, for example, have a lot of additional complexity and take much longer. But the vast majority of code changes do get to Prod within hours. The notion of deploying only on specific days, at specific times, felt very old-fashioned to me.

The problem that Google faces (with its monorepo) is that a single code change can fan out to hundreds of thousands of deployments. Batching code changes helps, but then a single deployment can include dozens or hundreds of independent changes, which means that if a deployment fails, you now have to identify the culprit. Again, Google being Google, we had dozens of culprit finders for this purpose, always duplicative and competing (you’re beginning to see a pattern here…).
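The core job of a culprit finder is easy to show in miniature. The sketch below bisects a batch of changes to find the one that broke the deployment; it is not any of Google’s actual culprit finders, and the is_healthy callback stands in for the genuinely expensive part, which is redeploying and retesting a prefix of the batch.

# Toy culprit finder: a deployment that batched N independent changes
# failed; bisect to find which change broke it.
from typing import Callable, Sequence


def find_culprit(changes: Sequence[str],
                 is_healthy: Callable[[Sequence[str]], bool]) -> str:
    # Assumes the system was healthy before change 0, is unhealthy with all
    # changes applied, and that a single change is to blame.
    lo, hi = 0, len(changes)  # invariant: changes[:lo] healthy, changes[:hi] broken
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_healthy(changes[:mid]):
            lo = mid          # culprit is somewhere in changes[mid:hi]
        else:
            hi = mid          # culprit is somewhere in changes[lo:mid]
    return changes[hi - 1]


if __name__ == "__main__":
    batch = [f"cl{i}" for i in range(1, 9)]   # 8 changes batched into one deployment
    culprit = "cl6"

    def healthy(prefix):
        # Fake health check: healthy only if the culprit isn't included.
        return culprit not in prefix

    print(find_culprit(batch, healthy))        # -> cl6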

Amazon, Pre-Submit

The philosophy of sticking to microrepos gives Amazon a very lovely built-in blast-radius reduction mechanism. A bad checkin can break an individual microrepo, but post-submit testing generally catches it and blocks it before it flows into other microrepos and breaks them. Your bad checkin can annoy your immediate team members, but it can’t immediately break tens of thousands of strangers (it can eventually, though, if your microrepo’s post-submit testing doesn’t catch it). The sketch below shows the shape of that containment.
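Here is a minimal sketch, in Python, of the containment idea: downstream microrepos only consume the last build of an upstream microrepo that passed that repo’s own post-submit tests. This is not Amazon’s actual version-set machinery; the class and field names are made up.

# Toy model of blast-radius containment between microrepos.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Microrepo:
    name: str
    builds: List[Tuple[str, bool]] = field(default_factory=list)  # (build_id, tests_passed)
    last_good: Optional[str] = None   # the only build downstream repos are allowed to see

    def submit(self, build_id: str, tests_passed: bool) -> None:
        # Post-submit: record the build, but only promote it if tests pass.
        self.builds.append((build_id, tests_passed))
        if tests_passed:
            self.last_good = build_id   # downstream microrepos pick this up
        else:
            print(f"{self.name}: {build_id} blocked; consumers stay on {self.last_good}")


if __name__ == "__main__":
    upstream = Microrepo("shared-auth-lib")
    upstream.submit("build-41", tests_passed=True)    # promoted
    upstream.submit("build-42", tests_passed=False)   # bad checkin: annoys this team only
    print("downstream repos consume:", upstream.last_good)   # -> build-41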

Having a built-in blast radius reduction mechanism meant it wasn’t business critical for us to invest heavily in pre-submit testing infra like it was for Google. It was a nice-to-have, but not a must-have.

It’s not that we didn’t want to invest in better pre-submit testing infra, but with limited headcount we got more bang for our buck by making sure our post-submit infra was best-in-class (since issues were inevitably going to escape to post-submit at some point, regardless of how amazing pre-submit tests were). To put things in perspective, Amazon being Frugal, our investment was roughly a tenth of what Google had spent. So as much as I wanted us to fund and build ephemeral test environments to enable pre-submit testing, I had to be ruthlessly pragmatic about where I advocated my org make its headcount investments.

Of course, Shifting Testing to the Left is a marvelous thing. Catching problems in local development, or at worst in code review, is much more desirable than catching them post-submit. The later in the software development lifecycle that you catch a bug, the more expensive it is to fix. And in retrospect, had I made the case for investing in ephemeral, hermetic test environments at Amazon a decade ago, it would have more than paid for itself in saved developer toil. Hindsight is 20/20.

Amazon, Post-Submit

As for the post-submit experience, this is where Amazon excels. A microrepo is a much more contained environment than a monorepo, so we can guarantee each and every code change goes through proper testing and makes it to production within hours, for a given service (the sketch below shows the shape of that flow). This gets more complicated when you have changes that affect multiple services and require synchronization of multiple microrepos (a monorepo is so much nicer for this!), but in actuality the vast majority of code changes are pretty local to a single service.

The Amazon philosophy of microrepos works extremely well for independent microservices. Amazon bought heavily into this dating back to the 2002 Bezos manifesto on APIs, and built the entire concept of 2-pizza teams around it. When you point your browser at http://amazon.com, it’s not a monolithic app in a single repo. Your action results in calls to literally hundreds or thousands of services that reside in just as many microrepos, each one fetching independent bits of data so that you can place your order on Amazon.
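A minimal sketch of that per-microrepo flow, assuming hypothetical stage names (this is not Amazon’s actual internal pipeline tooling): every submitted change is promoted through the same gated stages on its way to production, with no fixed deployment day.

# Toy per-microrepo pipeline: each submitted change flows through gated stages.
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(change_id: str, stages: List[Stage]) -> bool:
    # Promote one change through each stage; stop at the first failing gate.
    for name, gate in stages:
        if not gate(change_id):
            print(f"{change_id}: blocked at '{name}', never reaches prod")
            return False
        print(f"{change_id}: passed '{name}'")
    print(f"{change_id}: in production, hours (not days) after submit")
    return True


if __name__ == "__main__":
    # Each gate is a stand-in for real work: building, deploying to a
    # prod-realistic test environment, running integration tests, canarying.
    stages = [
        ("build", lambda c: True),
        ("deploy to test env + integration tests", lambda c: True),
        ("canary / bake in one region", lambda c: True),
        ("production", lambda c: True),
    ]
    run_pipeline("change-789", stages)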

Final thoughts

The debate of monorepo vs. microrepos is about as fruitful as “cats vs. dogs, which makes a better pet?” or “is a hot dog a sandwich?”

I’ve worked in both environments. Here’s what I noticed: if you cut your teeth on microrepos, you tend to dislike monorepos, and vice versa. Engineers from Meta or Google who come to Amazon find microrepos awkward and wish they could convince us to switch to a monorepo. I, on the other hand, had a nearly visceral reaction to the monorepo while I was at Google. It’s a very personal preference. I don’t think I could make a compelling, data-driven argument that one is objectively better than the other.

Ultimately, there’s always complexity in CI/CD for large systems. Google, and the companies that bought into monorepos, have chosen to deal with the complexity in one location. Amazon, and the companies that bought into microrepos, have chosen to deal with entirely different complexity in another location. It’s roughly the same amount of complexity, just different places.

In order for the monorepo to work, not only did Google have to invest in ephemeral test environments to enable pre-submit testing, they also had to invest in smart test selection and deflaking. How do you decide what tests to run for a given code change? Build dependencies are modeled in Blaze (Google’s build system), so in theory you can just compute the build graph and execute all the tests of everybody who has taken a dependency on the package you’re changing. But when hundreds of thousands of engineers have worked on the same repo for decades, those dependency graphs can be enormous, and you simply can’t be constantly running millions of tests “just in case.” So Google had built a ton of infra around smart test selection (to trim down the number of tests to execute), and, Google being Google, there were dozens of duplicative, competing efforts for reducing test flakiness.

That mattered because test flakiness compounds. If you have N tests in your test suite and each one has a probability P of succeeding, the likelihood that your entire test suite will succeed is P^N. Even something as simple as 3 tests with an individual success rate of 99% means that combined they have a success rate of only 97% (0.99³ ≈ 0.97). So reducing the number of tests to run, and reducing their flakiness, was paramount to be able to operate a monorepo at Google scale and keep it healthy. The sketch below illustrates both ideas.

Overall, those infra investments were massive: say 300 engineers at a cost of $400k/yr each (just using publicly available numbers from levels.fyi here; the average L4 SWE makes $278k/yr, but keep in mind a SWE costs the company a lot more than what they make), and you’re easily above 100 million dollars per year. Making a monorepo “work” is expensive.
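Here is a toy Python illustration of those two ideas: selecting only the tests that transitively depend on a changed package, and how per-test pass rates compound across a suite. This is not Blaze/Bazel or Google’s actual test-selection infrastructure; the build graph and the numbers are made up.

# Toy test selection over a reverse dependency graph, plus the P^N math.
from collections import deque


def affected_tests(reverse_deps: dict, tests: set, changed: str) -> set:
    # Walk the reverse dependency graph from the changed package and keep
    # only the nodes that are tests.
    seen, queue = {changed}, deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in reverse_deps.get(node, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen & tests


def suite_pass_probability(per_test_pass_rate: float, num_tests: int) -> float:
    # If each test passes with probability P, the whole suite passes with P^N.
    return per_test_pass_rate ** num_tests


if __name__ == "__main__":
    # package -> things that depend on it (a tiny, hypothetical build graph)
    reverse_deps = {
        "//lib/auth":     ["//svc/checkout", "//test/auth_e2e"],
        "//svc/checkout": ["//test/checkout_e2e"],
        "//lib/images":   ["//test/images_e2e"],
    }
    tests = {"//test/auth_e2e", "//test/checkout_e2e", "//test/images_e2e"}
    print(affected_tests(reverse_deps, tests, "//lib/auth"))
    # -> {'//test/auth_e2e', '//test/checkout_e2e'}; images_e2e is skipped

    print(round(suite_pass_probability(0.99, 3), 4))    # -> 0.9703, the 97% above
    print(round(suite_pass_probability(0.99, 200), 4))  # -> ~0.134: flakiness compounds fast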

In order for the microrepos to work, Amazon had to invest in infrastructure to manage how code changes flow safely and promptly from microrepo to microrepo, which also required a sizable team. In the monorepo world, a change is immediately available to all; in the microrepo world, if you vend a library or a shared component, you need to push or pull changes between microrepos. If you’re curious, my friend and fellow Amazon Senior Principal Engineer Clare Liguori has written extensively about Amazon’s CI/CD philosophy.

My article focuses on the Test aspect of CI/CD, but I should call out that a monorepo does make things like security patches and cross-company initiatives much simpler. And dependency version resolution can be a challenge with microrepos: what if your microrepo is using FooBar-1.2, and one of your dependencies is using FooBar-1.3? (The sketch below shows the kind of conflict I mean.) But for brevity I wanted to scope my article to the pre- and post-submit integration test story for both companies.
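A toy illustration of that version-resolution headache, assuming a flattened dependency closure. FooBar comes from the example above; everything else is made up, and this is not Amazon’s actual version-set resolution logic.

# Toy conflict check over a flattened dependency closure.
from collections import defaultdict


def find_conflicts(requirements: dict) -> dict:
    # requirements maps each microrepo to the package versions it pulls in.
    # Report any package required at more than one version across the closure.
    versions_by_package = defaultdict(set)
    for consumer, deps in requirements.items():
        for package, version in deps:
            versions_by_package[package].add((consumer, version))
    return {
        pkg: sorted(requirers)
        for pkg, requirers in versions_by_package.items()
        if len({v for _, v in requirers}) > 1
    }


if __name__ == "__main__":
    requirements = {
        "my-service":      [("FooBar", "1.2"), ("BazQux", "2.0")],
        "some-shared-lib": [("FooBar", "1.3")],   # transitive dependency disagrees
    }
    print(find_conflicts(requirements))
    # -> {'FooBar': [('my-service', '1.2'), ('some-shared-lib', '1.3')]}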

As for Pre-Submit or Post-Submit: ultimately, this doesn’t need to be an OR, it should be an AND. We should have great pre-submit infra and great post-submit infra. It just takes deliberate investment, but when you have tens of thousands of devs, investing in reducing developer toil is massively leveraged, so it pays off.

