This framework helps engineers proactively identify bottlenecks, evaluate capacity, and ensure system reliability through robust, decentralized, and automated load testing integrated with CI/CD.
Comprehensive Load Testing with Load Generator, Dependency Mocker, Traffic Collector, and More

Authors: Chenhao Yang, Haoyue Wang, Xiaoya Wei, Zay Guan, Yaolin Chen and Fei Yuan
System-level load testing is crucial for reliability and efficiency. It identifies bottlenecks, evaluates capacity for peak traffic, establishes performance baselines, and detects errors. At a company of Airbnb’s size and complexity, we’ve learned that load testing needs to be robust, flexible, and decentralized. This requires the right set of tools to enable engineering teams to do self-service load tests that integrate seamlessly with CI.
Impulse is one of our internal load-testing-as-a-service frameworks. It provides tools that can generate synthetic loads, mock dependencies, and collect traffic data from production environments. In this blog post, we’ll share how Impulse is architected to minimize manual effort, seamlessly integrate with our observability stack, and empower teams to proactively address potential issues.
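Impulse is a comprehensive load testing framework that allows service owners to conduct context-aware load tests, mock dependencies, and collect traffic data to verify how the system performs under various conditions. It includes the following components:
- Load generator: runs context-aware synthetic load tests, written as code, in containers.
- Dependency mocker: replaces downstream dependencies with out-of-process mock services.
- Traffic collector: captures production upstream and downstream traffic so it can be replayed.
- Testing API generator: wraps asynchronous flows (MQ events and delayed jobs) in HTTP APIs for the testing environment.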

Each of these four tools is independent, so service owners can use one or more of them, depending on their load testing needs.

Context-aware
When load testing, requests made to the system under test (SUT) often need information from a previous response or must be sent in a specific order. For example, if an update API requires the entity_id of the entity to update, that entity must already exist in the testing environment's context.
Our load generator lets users write arbitrary test logic in Java or Kotlin and launches containers to run these tests at scale against the SUT. Why write code instead of using a DSL or configuration?
Here is an example of a synthetic context-aware test case:
```kotlin
class HelloWorldLoadGenerator : LoadGenerator {
    override suspend fun run() {
        // Create an entity first so later requests have a valid context
        val createdEntity = sutApiClient.create(CreateRequest(name = "foo", ...)).data
        // Request with the id from the previous response (context)
        val updateResponse = sutApiClient.update(UpdateRequest(id = createdEntity.id, name = "bar"))
        // ... other operations
        // Clean up so repeated runs do not leave test data behind
        sutApiClient.delete(DeleteRequest(id = createdEntity.id))
    }
}
```
Decentralized
The load generator is decentralized and containerized, which means each time a load test is triggered, a set of new containers will be created to run the test. This design has several benefits:
What’s more, because our services are cloud-based, Impulse distributes workers evenly across all of our data centers, and the load is emitted evenly from all workers. The load generator keeps the overall triggers per second (TPS) at the configured value. This lets us better leverage the locality settings in load balancers and more closely mimic the real traffic distribution in production.
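To make the even distribution concrete, here is a minimal sketch, not Impulse's actual scheduler, of how a configured total TPS could be split across workers placed round-robin in data centers. `WorkerAssignment`, `planWorkers`, `pacingIntervalMillis`, and the data center names are hypothetical.

```kotlin
/** Load assigned to one load-generator worker (hypothetical model, not Impulse's API). */
data class WorkerAssignment(val dataCenter: String, val workerIndex: Int, val tps: Double)

/**
 * Places [workerCount] workers round-robin across [dataCenters], so no data center gets more
 * than one extra worker, and splits [totalTps] evenly so the workers' rates sum to the total.
 */
fun planWorkers(totalTps: Double, workerCount: Int, dataCenters: List<String>): List<WorkerAssignment> {
    require(totalTps >= 0.0) { "totalTps must be non-negative" }
    require(workerCount > 0) { "need at least one worker" }
    require(dataCenters.isNotEmpty()) { "need at least one data center" }
    val perWorkerTps = totalTps / workerCount
    return (0 until workerCount).map { index ->
        WorkerAssignment(dataCenters[index % dataCenters.size], index, perWorkerTps)
    }
}

/** Gap between consecutive requests for a worker pacing itself at [tps] requests per second. */
fun pacingIntervalMillis(tps: Double): Double {
    require(tps > 0.0) { "tps must be positive" }
    return 1000.0 / tps
}
```

For example, `planWorkers(1200.0, 6, listOf("dc-a", "dc-b", "dc-c"))` places two workers in each data center, each pacing itself at 200 TPS (one request every 5 ms), so the aggregate stays at the configured 1,200 TPS.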
Execution
The load generator is designed to run in the CI/CD pipeline, so load tests can be triggered automatically. Developers can split the test spec into multiple phases, e.g., a warm-up phase, a steady-state phase, and a peak phase, each with its own configuration.

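A phased spec like this could be modeled roughly as follows. This is a hypothetical sketch: the `Phase` type, its fields (duration and target TPS), `targetTpsAt`, and `expectedRequests` are our illustration, not Impulse's spec format.

```kotlin
/** One phase of a load test spec (hypothetical fields: duration and target TPS). */
data class Phase(val name: String, val durationSeconds: Long, val targetTps: Double)

/** Target TPS at [elapsedSeconds] after the test starts, or null once every phase has finished. */
fun targetTpsAt(phases: List<Phase>, elapsedSeconds: Long): Double? {
    var phaseStart = 0L
    for (phase in phases) {
        // Each phase covers the half-open interval [phaseStart, phaseStart + duration).
        if (elapsedSeconds < phaseStart + phase.durationSeconds) return phase.targetTps
        phaseStart += phase.durationSeconds
    }
    return null
}

/** Requests the spec sends in total if every phase holds its target TPS for its full duration. */
fun expectedRequests(phases: List<Phase>): Double =
    phases.sumOf { it.durationSeconds * it.targetTps }
```

With a 60-second warm-up at 10 TPS, a 300-second steady state at 100 TPS, and a 120-second peak at 250 TPS, the target switches at 60 s and 360 s, and the test ends at 480 s.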
Impulse is a decentralized framework in which each service has its own dependency mocker. This can eliminate interference between services and reduce communication costs. Each dependency mocker is an out-of-process service, so the SUT calls its dependencies over the network just as it does in production. We run the mockers on separate instances so they do not affect the SUT's performance. The mock servers are all short-lived: they start just before the tests run and shut down afterwards, saving cost and maintenance effort. Response latency and exceptions are configurable, and the number of mocker instances can be adjusted on demand to handle large volumes of traffic.
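Configurable exceptions and on-demand scaling can be illustrated with a small sketch. `MockOutcome`, `FaultInjectingMock`, and `mockerInstancesNeeded` are hypothetical names, and the per-instance capacity is an assumed input rather than a measured Impulse figure.

```kotlin
import kotlin.math.ceil
import kotlin.random.Random

/** Result a mocked dependency returns to the system under test. */
sealed class MockOutcome {
    data class Success(val body: String) : MockOutcome()
    data class Failure(val error: String) : MockOutcome()
}

/** Hypothetical mock that fails a configurable fraction of calls, to exercise error handling. */
class FaultInjectingMock(private val errorRate: Double, private val random: Random = Random.Default) {
    init {
        require(errorRate in 0.0..1.0) { "errorRate must be between 0 and 1" }
    }

    fun respond(body: String): MockOutcome {
        // nextDouble() is uniform in [0, 1), so a call fails with probability errorRate.
        return if (random.nextDouble() < errorRate) {
            MockOutcome.Failure("injected error")
        } else {
            MockOutcome.Success(body)
        }
    }
}

/** Mocker instances needed to absorb [peakTps] when each instance handles [tpsPerInstance]. */
fun mockerInstancesNeeded(peakTps: Double, tpsPerInstance: Double): Int {
    require(peakTps >= 0.0 && tpsPerInstance > 0.0) { "invalid capacity inputs" }
    return maxOf(1, ceil(peakTps / tpsPerInstance).toInt())
}
```

Passing a seeded `Random` makes the injected failures reproducible across runs; with an assumed capacity of 1,000 TPS per instance, a 2,500 TPS peak would need three mocker instances.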
Other noteworthy features:
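Impulse supports two options for generating mock responses:
- Synthetic responses, produced by logic that users write in code.
- Replayed responses, based on production downstream traffic captured by the traffic collector.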
Here is an example of a synthetic response with latency in Kotlin:
```kotlin
downstreamsMocking.every(
    thriftRequest<FooRequest>().having { it.message == "hello" }
).returns { request ->
    ThriftDownstream.Response.thriftEncoded(
        HttpStatus.OK,
        FooResponse.builder.reply("${request.message} world").build()
    )
}.with {
    // Response delay derived from a p95 of 500 ms, bounded between 200 ms and 2 s
    delay = latencyFromP95(p95 = 500.milliseconds, min = 200.milliseconds, max = 2000.milliseconds)
}
```
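The example does not show how `latencyFromP95` turns its arguments into a delay. One plausible approach, sketched here under our own assumptions, is to start at `min`, add an exponentially distributed offset chosen so the 95th percentile lands exactly on `p95`, and cap rare long-tail samples at `max`. `sampleLatency` is a hypothetical stand-in, not Impulse's implementation.

```kotlin
import kotlin.math.ln
import kotlin.random.Random
import kotlin.time.Duration
import kotlin.time.Duration.Companion.milliseconds

/** Hypothetical latency sampler: shifted exponential with a given p95, capped at [max]. */
fun sampleLatency(
    p95: Duration,
    min: Duration,
    max: Duration,
    random: Random = Random.Default,
): Duration {
    require(min <= p95 && p95 <= max) { "expected min <= p95 <= max" }
    // The offset above min is Exp-distributed: P(offset <= x) = 1 - e^(-x / mean).
    // Setting that to 0.95 at x = p95 - min gives mean = (p95 - min) / ln(20).
    val meanOffsetMs = (p95 - min).inWholeMilliseconds / ln(20.0)
    // Inverse-CDF sampling: offset = mean * ln(1 / u), with u uniform in (0, 1].
    val u = 1.0 - random.nextDouble()
    val sampled = min + (meanOffsetMs * ln(1.0 / u)).milliseconds
    // Cap the rare long-tail samples at max.
    return sampled.coerceAtMost(max)
}
```

Shifting the distribution by `min`, rather than clamping a plain exponential from below, avoids piling a large share of samples exactly at the minimum while still reproducing the requested p95.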

The traffic collector component captures both upstream and downstream traffic, along with the relationships between them. This allows Impulse to accurately replay production traffic during load testing without inconsistencies in downstream data or behavior. Because the dependency mocker replicates the recorded downstream responses, including production-like latency and errors, load tests achieve high fidelity: services in the testing environment behave as they do in production, which makes performance evaluations more realistic and reliable.
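Keeping the upstream-to-downstream relationship is what lets replayed downstream responses stay consistent with the replayed upstream request. The sketch below indexes captured calls by trace ID; the schema (`DownstreamCall`, `CapturedTrace`, `ReplayStore`) and the assumption that a trace ID is propagated from the upstream request to its downstream calls are ours, not Impulse's.

```kotlin
/** One downstream call observed while serving an upstream request (hypothetical schema). */
data class DownstreamCall(
    val service: String,
    val request: String,
    val response: String,
    val latencyMs: Long,
)

/** An upstream request together with the downstream calls it triggered. */
data class CapturedTrace(
    val traceId: String,
    val upstreamRequest: String,
    val downstreamCalls: List<DownstreamCall>,
)

/** Indexes captured traces so a mock can answer downstream calls made during replay. */
class ReplayStore(traces: List<CapturedTrace>) {
    private val byKey: Map<Triple<String, String, String>, DownstreamCall> =
        traces.flatMap { trace ->
            trace.downstreamCalls.map { call -> Triple(trace.traceId, call.service, call.request) to call }
        }.toMap()

    /** Upstream requests to replay, in capture order, paired with their trace IDs. */
    val upstreamRequests: List<Pair<String, String>> = traces.map { it.traceId to it.upstreamRequest }

    /** The recorded call (response and latency) for this trace, or null if nothing was captured. */
    fun lookup(traceId: String, service: String, request: String): DownstreamCall? =
        byKey[Triple(traceId, service, request)]
}
```

During a replay, the load generator would send each upstream request with its trace ID, and the mock would answer downstream calls via `lookup`, applying the recorded `latencyMs` before responding.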
We rely heavily on event-driven, asynchronous workflows that are critical to our business operations. These include processing events from a message queue (MQ) and executing delayed jobs. Most of the MQ events/jobs are emitted from synchronous flows (e.g., API calls), so theoretically they can be covered by API load testing. However, the real world is more complex. These asynchronous flows often involve long chains of event and job emissions originating from various sources, making it difficult to replicate and test them accurately using only API-based methods.
To address this, the testing API generator component creates HTTP APIs during the CI stage based on the event or job schema. These APIs wrap the underlying asynchronous flows and are registered only in the testing environment. Load testing tools, such as the load generator, can then send traffic to these synthetic APIs and exercise asynchronous flows as if they were synchronous. This makes it possible to run targeted, realistic load tests on asynchronous logic that would otherwise be hard to simulate.
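The wrapping idea can be sketched without an HTTP server: derive one synthetic route per schema and dispatch requests straight to the handler that an MQ consumer or job worker would normally run. Everything here is hypothetical, including `AsyncSchema`, `AsyncHandler`, `TestingApiGenerator`, the `/testing/async/` path prefix, and the `"testing"` environment name.

```kotlin
/** Minimal stand-in for an MQ event or delayed-job schema: only its name matters here. */
data class AsyncSchema(val name: String)

/** Business logic that an MQ consumer or job worker would normally run for one event/job. */
fun interface AsyncHandler {
    fun handle(payload: String): String
}

/** Hypothetical generator: one synthetic route per schema, dispatched without any MQ. */
class TestingApiGenerator(private val environment: String) {
    private val routes = mutableMapOf<String, AsyncHandler>()

    /** "ReservationCreated" -> "/testing/async/reservation-created" */
    fun pathFor(schema: AsyncSchema): String =
        "/testing/async/" + schema.name.replace(Regex("([a-z0-9])([A-Z])"), "\$1-\$2").lowercase()

    fun register(schema: AsyncSchema, handler: AsyncHandler) {
        // Synthetic APIs are registered only in the testing environment.
        check(environment == "testing") { "refusing to register testing APIs in $environment" }
        routes[pathFor(schema)] = handler
    }

    /** What a POST to a synthetic route would do: run the async logic synchronously. */
    fun post(path: String, payload: String): String {
        val handler = routes[path] ?: error("no testing API registered at $path")
        return handler.handle(payload)
    }
}
```

Because the handler runs synchronously inside the request, a load generator can measure the async logic's latency and error rate directly, with the MQ itself left out of the picture.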

The goal of the testing API generator is to help developers identify performance bottlenecks and other potential issues in their async flow implementations under high traffic. It does this by enabling direct load testing of async flows without involving middleware components such as MQs. The rationale is that developers typically want to evaluate the behavior of their own logic, not the middleware, which is usually already well tested. Bypassing these components simplifies the load testing process and lets developers manage and execute their own tests independently.
Airbnb emphasizes product quality and uses versatile testing frameworks that cover integration and API tests across development, staging, and production environments and integrate smoothly into CI/CD pipelines. Impulse's modular design makes it easy to integrate with these frameworks, enabling systematic service testing.

In this blog post, we shared how Impulse and its four core components help developers perform self-service load testing at Airbnb. As of this writing, Impulse has been adopted by several customer support backend services, and other teams across the company are reviewing it with plans to use it for their own load testing.
We’ve received a lot of good feedback in the process. For example: “Impulse helps us to identify and address potential issues in our service. During testing, it detected an ApiClientThreadToolExhaustionException caused by thread pool pressure. Additionally, it alerted us about occasional timeout errors in client API calls during service deployments. Impulse helped us identify high memory usage in the main service container, enabling us to fine-tune the memory allocation and optimize our service’s resource usage. Highly recommend utilizing Impulse as an integral part of the development and testing processes.”
Thanks to Jeremy Werner, Yashar Mehdad, Raj Rajagopal, Claire Cheng, Tim L., Wei Ji, Jay Wu, and Brian Wallace for their support on the Impulse project.
Does this type of work interest you? Check out our open roles here.
Load Testing with Impulse at Airbnb was originally published in The Airbnb Tech Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.