A Tale of Elephants

How I grew a tool to impact thousands of engineers at Amazon

Carlos Arguelles
Nerd For Tech

A lot of what I do as a Senior Staff Engineer at Google today, and a lot of what I did as a Principal Engineer at Amazon for many years before, comes down to getting people to do things. It’s the human problems, not the technical ones, that you have to work hardest on and that have the most impact. It was critical for me to develop tools to influence effectively, because I chose a non-managerial leadership path, where I lead without authority.

Earlier in my career, I exclusively read technical books. These days, I split my reading time between [a] technical books, [b] business books and [c] books about human behavior. In that third category, I recently read the book Switch: How to Change Things When Change Is Hard by Chip Heath (a professor at Stanford) and his brother Dan Heath (a professor at Duke University). It is an amazing read. Switch puts forth a thought framework for changing behavior. We tend to think of two tools at our disposal: the carrot, and the stick. Entice somebody to do something, or punish them if they don’t! The Switch framework is much more sophisticated than that. What was most astonishing to me was to realize that, without ever having heard of the book, I had been following their formula, organically and mostly by intuition. And it had worked well for me.

I created, and grew, what became the internal load and performance testing platform at Amazon (TPSGenerator). I started in 2013, and by the time I left in 2020, it was used by tens of thousands of people in business-critical applications, running at hundreds of millions of transactions per second, and maintained by an actual team of amazing engineers. It had prevented hundreds of operational issues, and saved millions of dollars.

I wanted to spend my time today telling you about the Switch framework, and to give you a concrete case study: how I (unknowingly) applied these concepts to create and grow a thing from a weekend-and-nights pet project, to what it is today.

My story starts in 2012. On Black Friday, the busiest shopping day of the year, the service I was responsible for experienced operational issues due to load, and it affected the entire operation of amazon.com. It was a multimillion dollar outage. And it was on me. I was mortified, so I set out to redeem myself. I spent months understanding how increasing load changed distributed systems at high scale, and creating code and infrastructure for simulating high fidelity load for testing purposes. I found, and prevented, serious operational issues for Black Friday of 2013, so I sort of redeemed myself. But by then, I had reached an epiphany. All engineers had the intention of load testing their service. Yet most didn’t, because they didn’t have time — it just took too long to create the infrastructure to conduct the tests. OK, so people weren’t doing it because it took 3 weeks. What if it took 3 days? What if it took 3 hours? At some point, I thought, I would find an inflection point when a task that was important, but infrequently done due to cost, could become just another thing that happened every day with no effort.

So I created that thing. Frantically coding away, tens of thousands of lines of Java, mostly on weekends and nights. I did it for fun, but I felt I had created something special, and I wanted others to love it too. A few early adopters understood this and saw the vision, and a few even joined me in building it. We eventually opened it up for Amazon-wide consumption. At first, adoption was modest. One random engineer here and there. I proudly kept track and I showed it to my VP one day. I think I had about 25 customers by then. Llew frowned, unimpressed, and told me “Your adoption graph is linear. This is shit. Come back when it’s exponential!” He was blunt, but he was an awesome dude and he was right. I needed to think bigger and take bolder steps.

Here’s where Switch comes in. I was essentially facing a non-technical challenge: I needed people to change, and change was hard. Switch says: think of a Rider, guiding an Elephant, on a Path. You essentially have three dimensions you can influence. You can [a] Direct the Rider, you can [b] Motivate the Elephant, or you can [c] Shape the Path. The Rider is the logical part of our brain: it is rational, it deliberates and analyzes. The Elephant is our emotions: it behaves based on instinct, pain, or pleasure. Who is the leader? Well, the Rider is 150 pounds, the Elephant is 10,000 pounds. So while the Rider provides the direction, the Elephant provides the energy. They have to work together.

Direct the Rider

Let’s talk about that first dimension. There are three tools for Directing the Rider: [a] Find the Bright Spots, [b] Script the Critical Moves, and [c] Point to the Destination.

To understand Find the Bright Spots, think of a classroom with a bunch of poorly behaved children. A teacher can choose to punish the poorly behaved kids, but it’s exhausting and often does not actually change anybody’s behavior. Alternatively, the teacher can choose to focus on the well-behaved students, and highlight their accomplishments. Finding the Bright Spots shifts you from focusing on “What’s broken and how can I fix it?” to “What’s working and how can I do more of it?” In my case, I relentlessly talked about my load framework to anybody who would listen to me. Most people would smile and nod politely, but not take any action to adopt it. It was a bit demoralizing. But some would. And a smaller fraction of those would actually love it and accomplish some awesome things by using it. For example, they would realize their service was way over-scaled and reduce the resources dedicated to it (thus saving money), or realize their service was under-scaled and fix that (thus saving operational pain). I started compiling these Success Stories, and eventually I ended up with twenty pages of happy customers saying “I used your thing, it was awesome and here’s my story!” I featured a link to that from the main documentation page of my product so that others could see. Also, it was a nice artifact for my promotion!

To understand Script the Critical Moves, think about two booths in a market. One has 6 flavors of jam. The other booth has 24 flavors of jam: those 6 and 18 additional ones. Which booth sells more jam? Surprisingly, the booth with 6 flavors sells 10x the amount of jam as the booth with 24. How is that possible? Well, when the road is uncertain, the Rider has analysis paralysis, and the Elephant takes the most familiar path (which often is the status quo: don’t do anything!). In my case, I realized that if you wanted to load test your service, you had dozens of choices, internally and externally, with varying degrees of quality, support, and feature set, so it was terribly confusing to the average entry-level engineer who just wanted to accomplish the task as soon as possible and move on with their busy life. I grabbed coffee with some of my internal “competitors” and sweet-talked them into collaborating with me and joining forces, creating win-win situations and simplifying the landscape. Not everybody agreed to join forces. A particularly stubborn “competitor” showed up to an internal tools fair with a giant comparison table with 2 columns: his product vs. mine. He was providing a fair bit of misinformation (perhaps willfully ignorant). I created a more comprehensive, objective comparison table, got it signed off by dozens of Principals around me, and emailed it to the “competitor” for approval, essentially saying “I think a table is a great idea because it helps engineers choose, let’s provide the right data to our engineers!” When they ignored me after 3 emails, I escalated to their Director, who approved it. I didn’t remove that competitor, but I scripted the critical moves so that the landscape wasn’t so confusing for potential customers. Both the entry page for my tool and the entry page for their tool had a link to the exact same agreed-upon comparison table.

To understand Point to the Destination, think about JFK, who in 1962 put forth the vision of “by the end of the decade, we will put a man on the moon.” It was a bold vision, but it did inspire a generation of NASA engineers and in 1969 we landed on the moon. Closer to my own life, Bill Gates said, “A computer on every desk and in every home” in 1980, and it inspired generations of Microsoft engineers including myself in the nineties. Pointing to the Destination does two things: [a] it shows the Rider where you’re headed, and [b] it shows the Elephant why the journey is worth it. I did not love getting on a plane with two little kids in Seattle and putting up with a connection in Dallas, but I knew I’d be sitting on a tropical beach in Jamaica in a few hours sipping piña coladas. In my specific case at work, I pointed to the postcard destination “No more than 2 days of work to get a load test up and running for your team.” That was appealing to engineers who didn’t want to spend weeks and weeks doing this thing.

Motivate the Elephant

OK, let’s switch to Motivating the Elephant. There are three tools for that: [a] Find the Feeling, [b] Shrink the Change, and [c] Grow Your People.

To understand Find the Feeling, think about every anti-smoking campaign you’ve ever seen. Smokers know perfectly well that smoking is bad for them. You don’t need to explain that to the Rider; you need to scare the Elephant straight. So anti-smoking campaigns go after an emotional response. They show you what your lungs actually look like after a decade of smoking. Or a smoker who had his larynx removed and speaks through an artificial voice box. In my case, I went after the feeling of pride. Engineers take pride in efficiency: it is baked into our brains and into our training. When, as an engineer, you prevented an operational issue due to under-scaling, or were able to reduce the hardware footprint of your over-scaled service, you felt pride. By highlighting those success stories, I not only Found the Bright Spots, but also helped would-be customers Find the Feeling. I think those success stories became inspirational and motivated a generation of Amazonians to adopt my little tool.

To understand Shrink the Change, think about the way you can get your kids to clean their rooms. The rooms are a mess and the task looks daunting, so they never start it. But if you say, “spend ten minutes picking up twenty things!” that’s easy and doable, right? The Elephant is easily demoralized by big tasks, as it hates things without immediate payoff. So you motivate people by making them feel they’re closer to the finish line than they might have thought! In my case, I shrank the change in multiple ways. One, I focused (relentlessly) on simplifying customer onboarding. Initially, the process for getting a load test environment up and running was fairly convoluted; eventually it was about five text fields and a couple of clicks. Also initially, you had to write a bunch of code for the load generator; then I introduced Java annotations that took care of a bunch of the boilerplate; eventually I made my platform fully compatible with TestNG and JUnit, so if you had pre-existing test code you could just re-use it. I also Shrank the Change by offering a 2-hour hands-on bootcamp: you came in, listened to me, worked on some hands-on exercises and came out 2 hours later with a working load test. Come on, who can’t spare just two hours of their lives? I scaled the bootcamp up by finding others who could also teach the course; by the time I left Amazon we had given the course to 1,400 people in 15 countries.
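
The annotation idea can be sketched in a few lines. This is only an illustration under stated assumptions, not TPSGenerator's actual API: it uses a Python decorator as a stand-in for the Java annotations, and the names `load_transaction`, `REGISTRY`, and `run_load` are hypothetical.

```python
import random
from typing import Callable, Dict, List

# Hypothetical registry: maps each transaction function to its relative weight.
REGISTRY: Dict[Callable[[], None], int] = {}


def load_transaction(weight: int = 1):
    """Mark a function as a load-test transaction (stand-in for a Java annotation)."""
    def register(func: Callable[[], None]) -> Callable[[], None]:
        REGISTRY[func] = weight
        return func  # the function is returned unchanged, so existing tests still work
    return register


def run_load(total_calls: int, seed: int = 0) -> Dict[str, int]:
    """Call registered transactions total_calls times, chosen in proportion to weight."""
    rng = random.Random(seed)
    funcs: List[Callable[[], None]] = list(REGISTRY)
    weights = [REGISTRY[f] for f in funcs]
    counts = {f.__name__: 0 for f in funcs}
    for _ in range(total_calls):
        func = rng.choices(funcs, weights=weights)[0]
        func()
        counts[func.__name__] += 1
    return counts


# The test author writes only the transaction bodies; the boilerplate lives in run_load.
calls: List[str] = []


@load_transaction(weight=3)
def get_item() -> None:
    calls.append("get_item")  # a real test would call the service here


@load_transaction(weight=1)
def put_item() -> None:
    calls.append("put_item")


counts = run_load(total_calls=1000)
```

The point of the design is that marking a function is the only thing the engineer has to learn; scheduling, weighting, and counting stay inside the framework.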

So you’ve got People, and a Change that you want them to undertake. If you can’t Shrink the Change, you can Grow Your People. JFK did this brilliantly: “Ask not what your country can do for you, ask what you can do for your country” was the inspiration that our country needed. Humans are pack animals. When faced with a situation, we ask ourselves 3 questions: [a] Who am I? [b] What kind of a situation is this? And most importantly [c] What would someone like me do in a situation like this? So I needed to appeal to my would-be customers to be the kind of person that would load test their product. This was easy. My message, that I repeated over and over again, was: You’re an Amazon engineer. Amazon hires some of the smartest engineers in the world to work on some of the hardest scaling problems in the world, and Amazonians take pride in load testing their product to ensure it scales the way Amazon products should scale! I also very candidly and frequently told my story about how I hadn’t load tested my product, and it had failed miserably during the busiest shopping day of the year; “Don’t be like me, be the other person!” was a strong message.

Shape the Path

So we talked about the Rider and the Elephant. The third dimension here is Shape The Path. Like in the other two dimensions, there are three things you can do.

First, you can Tweak the Environment. Let me illustrate with an example. When I travel to India, I am always fascinated by the dynamics of a busy intersection. There is a mass of humanity: cars, rickshaws, camels, elephants, bikes, motorcycles, pedestrians, all somehow navigating the chaos. It’s not particularly tidy. Then after a couple of weeks backpacking through India, I come back to the United States, and how orderly the streets are is a shock to me. We’re all humans; how come one set of humans is behaving so differently from the other set of humans? Think about all the little things in the US that are tweaking that environment: traffic lights, stop signs, turn lanes, the expectation you’ll use your turn signal when you want to turn, separate lanes for bicycles, sidewalks for pedestrians, red-light cameras and jaywalking fines. You can go a long way getting people to behave a certain way by tweaking their environment. As for how that translates to my load test platform, while originally it was executed via a command-line tool, I quickly integrated it with Amazon’s CI/CD (Continuous Integration/Continuous Deployment) tooling, Pipelines, so that load testing could easily be added as an approval step to somebody’s development release cycle. Pipelines already orchestrated the entire cycle: you checked in a piece of code, it would get auto-built, it would get auto-deployed to a staging environment, integration tests would automatically run against the staging environment, and if they passed, the code would be deployed to production. Pipelines was already used by tens of thousands of Amazon developers. Why not add load testing as another approval step? Pipelines provided the UI to configure the test, the scheduling of the runs, the hardware resources to run the test, and persisted and presented the logs. And it did it in a way that was homogeneous with the way integration tests ran at Amazon, so developers were accustomed to the model. My load test platform was also easily discoverable now, because it became a checkbox in a UI that was mainstream. I had tweaked the environment.

The second way to control the Path is to Build Habits. This is very powerful because when we’re in behavioral autopilot there’s less work for the Rider. Let me illustrate with an example. Many years ago, our team was following Agile and we were struggling with being on time for standup. Our scrum master bought a slingshot-flying-screaming monkey. It was essentially a stuffed animal, and you could fling it across the room and as it was flying it would make a horrible monkey screech. The first day we all jumped out of our chairs, startled. “Standup!” he yelled. We all obliged. After a week, he had trained us all to get up and go to the meeting as soon as we heard the noise: 100% of us were always on time! As for how that relates to my load platform case study, integrating it into Amazon Pipelines did something amazing. Before, load testing was something that, at best, was done once a year by most Amazon services, generally a little bit before the engineers expected peak traffic. Then they would forget it existed. Because it was an infrequent task, it was painful to do. I thought, if I could turn this into something that happened all the time, ubiquitously, that would shift culture and build habits. Adding it as an approval step to CI/CD turned my product into a sticky product. Before, it was a command-line tool that ran only when people remembered; after, engineers could block any and all code check-ins that negatively impacted the product’s throughput or latency.
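
The approval-step idea can be made concrete with a small sketch. It assumes the gate compares a load test's measured throughput and latency against a stored baseline and blocks the release on a regression. The `LoadResult` fields, the `approve` function, and the 10% tolerance are all hypothetical; this is not how Pipelines or TPSGenerator actually implemented it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LoadResult:
    throughput_tps: float  # sustained transactions per second
    p99_latency_ms: float  # 99th percentile latency in milliseconds


def approve(baseline: LoadResult, candidate: LoadResult,
            tolerance: float = 0.10) -> Tuple[bool, List[str]]:
    """Return (approved, reasons); reject if the candidate regresses beyond tolerance."""
    reasons: List[str] = []
    if candidate.throughput_tps < baseline.throughput_tps * (1 - tolerance):
        reasons.append("throughput dropped more than tolerance")
    if candidate.p99_latency_ms > baseline.p99_latency_ms * (1 + tolerance):
        reasons.append("p99 latency rose more than tolerance")
    return (not reasons, reasons)


baseline = LoadResult(throughput_tps=1000.0, p99_latency_ms=200.0)

# Small drift stays within tolerance, so the check-in is approved.
ok, _ = approve(baseline, LoadResult(throughput_tps=980.0, p99_latency_ms=210.0))

# A clear regression on both metrics blocks the check-in and explains why.
blocked, why = approve(baseline, LoadResult(throughput_tps=850.0, p99_latency_ms=260.0))
```

Because a check like this runs on every check-in rather than once a year, a regression is caught when it is introduced, which is what turns load testing into a habit.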

Lastly, Rally the Herd. Humans are herd animals. We do things because others in the community do. Do you know the term “Designated Driver”? It was actually an ad campaign. Rather than having ads directly telling people not to drink and drive, they managed to embed story lines about people being a designated driver in 160 prime time shows between 1988 and 1992, reducing the drunk driving deaths from 24k/yr to 18k/yr. You heard, show after show, the term ‘designated driver’, and you saw people behaving responsibly in your favorite shows, so you decided to behave responsibly too. For me, rallying the herd came in many forms to build community. I started an email list for people to exchange ideas about load testing. I created a Success Stories page where my customers could add their happy findings. I made a coveted load testing patch that would be attached to people’s information in Amazon’s phonetool. I spoke at every single internal conference I could about load testing. I created a load testing hands-on bootcamp that we presented to over a thousand engineers over the years. And, I enthusiastically encouraged anybody who seemed to have a passion to collaborate and add features to the framework. That shifted the load framework from “this thing that Carlos made” to “this thing we all own together!”. Dozens of engineers contributed amazing features over the years in their free time, and I made amazing friendships. This vibrant community was an accelerant. The more people that were engaged in the community, the more appealing it became to be a part of it. I knew it had succeeded when people made memes about my platform!

Summary

I (together with an amazing team of great engineers) had to solve a ton of interesting technical challenges to operate a load test framework at Amazon scale. But even if the clouds had parted, and a ray of sunshine had come down from the heavens as I wrote the most beautiful, perfect code in the world, without understanding that this was also a human problem, my tool would never have succeeded. It was interesting to see that the steps I took, sort of haphazardly, and organically, as they felt right, actually fit within a deliberate thought framework, significantly more complex than carrots-and-sticks, that people much smarter than me conceived over entire careers spent studying psychology. Often, we tend to forget that even software companies need a little help from psychology: shifting culture, helping ideas stick, and switching people’s behaviors.

Carlos Arguelles
Nerd For Tech

Hi! I'm a Senior Principal Engineer (L8) at Amazon. In the last 26 years, I've worked at Google and Microsoft as well.