This innovation significantly streamlines frontend and mobile development by automating the creation of realistic, type-safe mock data. It frees engineers from tedious manual work, accelerates feature delivery, and improves the reliability of tests and demos.
How Airbnb combines GraphQL infra, product context, and LLMs to generate and maintain convincing, type-safe mock data using a new directive.

Producing valid and realistic mock data for testing and prototyping with GraphQL has been a persistent challenge across the industry for years. Mock data is tedious to write and maintain, and attempts to improve the process, such as random value generation and field-level stubbing, fall short because they lack essential domain context to make test data realistic and meaningful. The time spent on this manual work ultimately takes away from what most engineers would like to focus on: building features.
In this post, we’ll explore how we’ve reimagined mocking GraphQL data at Airbnb by combining GraphQL validation, rich product and schema context, and LLMs to generate and maintain convincing, type-safe mock data. Our solution centers around a simple new GraphQL client directive — @generateMock — that engineers can add to any operation, fragment, or field. This approach eliminates the need for engineers to manually write and maintain mocks as queries evolve, freeing up time to focus on building the product.
After meeting with Airbnb product engineers and analyzing results from internal surveys, we distilled the most common pain points around GraphQL mocking down into three key challenges:
These challenges are not unique to Airbnb and are common across the industry. Although tooling like random value generators and local field resolvers can provide some assistance, they lack the domain knowledge and context needed to produce realistic, meaningful data for high-quality demos, quick product iteration, and reliable testing.
When setting out to solve these challenges at Airbnb, we established three north-star goals:
To generate mock data while keeping engineers in their local focus loops, we introduced a new client GraphQL directive called @generateMock, which engineers can use to automatically generate mock data for a given GraphQL operation, fragment, or field:

This directive accepts a few optional arguments that engineers can use to customize the generated mock data, and the directive itself can be repeated with different input arguments to generate different mock variations:
At Airbnb, engineers use a command line tool we call Niobe to generate code for their GraphQL queries and fragments. After modifying a .graphql file locally, engineers run this code generator, then use the generated TypeScript/Kotlin/Swift files to send GraphQL requests. To generate mock data using @generateMock, engineers simply need to run Niobe code generation after adding the directive — just as they would after making any other GraphQL change.
During code generation, Niobe produces both a JSON file containing the actual mock data for each @generateMock directive, as well as a source file that provides functions for loading and consuming mock data from demo apps, snapshot tests, and unit tests. As shown in the Swift code below, the mockMixedStatusIndicators() function is generated on the InboxSyncQuery’s root Data type. It provides access to an instantiated type that’s populated with the generated mock data for mixed_status_indicators, allowing engineers to use the mock without having to load the JSON data manually:

Engineers are free to modify the generated mock JSON data as well — as we’ll see below, Niobe will avoid overwriting their modifications on subsequent generation invocations.
The context that we provide to the LLM is vital to generating data that is realistic enough to use in demos. To this end, Niobe collects the following information and includes it in the context passed to the LLM:

All this information is consolidated into a prompt we fine-tuned against Gemini 2.5 Pro. We chose this model because of its 1-million token context window, plus the fact that in our internal tests this configuration performed significantly faster than comparable models while producing mock data of similar quality. Using this approach, we’re able to produce highly realistic JSON mocks which, when loaded into the application, yield very convincing results as shown below:

The data in the screenshot on the right looks quite realistic, but if you look closely you may notice that the data is indeed mocked — all the photos are coming from the seed data set that we feed the LLM.
When an engineer uses the Niobe CLI to generate code for their GraphQL files, Niobe automatically performs mock generation as the final step of this process, as shown in the flowchart below:

In addition to generating realistic mock data with @generateMock, we also wanted to empower client engineers to iterate on features without waiting for the backend server implementation. A second directive, @respondWithMock, works alongside @generateMock to make this possible:

When this directive is present, Niobe alters the code that’s generated alongside the mock data to include extra details about this annotation. At runtime, the GraphQL client uses this to load the generated mock data, then seamlessly returns the mocked response instead of using data from the server. This effectively allows client engineers to unblock themselves from waiting on the server implementation, since they can easily use locally mocked data when querying unimplemented fields. The screenshot of the inbox screen earlier in this post is actually a real screenshot that was taken by generating with these two directives and running the Airbnb app in an iOS simulator — no manual mocking, proxying, or response modification needed!
@respondWithMock can also be specified on individual fields. When used on fields within a query instead of on the query itself, the GraphQL client will actually request all fields from the server except those annotated with @respondWithMock, then patch in locally mocked data for the remaining fields — producing a hybrid of production and mock data, and making it possible for client engineers to develop against new (unimplemented) fields in existing queries. Engineers can even repeat this directive and use query input variables to decide if and when to return a specific generated mock at runtime, as shown below:

The final challenge we addressed was the issue of keeping mocks in sync with queries as they evolve over time. Since Niobe manages mock data that is generated via the @generateMock directive, it can be smart about maintaining that mock data. As part of mock generation, Niobe embeds two extra keys in each generated JSON file:

Each time code generation runs, Niobe determines whether existing mocks’ hashes differ from what their current hashes should be based on the GraphQL document. If they match, it skips mock generation for those types. On the other hand, if one of the hashes changed, Niobe intelligently updates that mock by including the existing mock in the context provided to the LLM, along with instructions on how to modify it.
It’s important that Niobe doesn’t unnecessarily modify existing mock data for fields that are unchanged and still valid, since doing so could overwrite manual tweaks that were made to the JSON by engineers or break existing tests that rely on this data. To avoid this, we provide the LLM with a diff of what changed in the query, and tuned the prompt to focus on that diff and avoid making spurious changes to unrelated fields.
Finally, each client codebase includes an automated check that ensures mock version hashes are up to date when code is submitted. This provides a guarantee that all generated mocks stay in sync with queries as they evolve over time. When engineers encounter these validation failures, they just re-run code generation locally — no manual updates required.
“@generateMock has significantly sped up my local development and made working with local data much more enjoyable.” — Senior Software Engineer
By integrating highly contextualized LLMs — informed by the GraphQL schema, product context, and UX designs — directly into existing GraphQL tooling, we’ve unlocked the ability to generate valid and realistic mock data while eliminating the need for engineers to manually hand-write and maintain mocks. The directive-driven approach of @generateMock and @respondWithMock allows engineers to build clients before the server implementation is complete while keeping them in their focus loops and providing a guarantee that mock data stays in sync as queries evolve.
In just the past few months, Airbnb engineers have generated and merged over 700 mocks across iOS, Android, and Web using @generateMock, and we plan to roll out internal support for backend services soon. These tools have fundamentally changed how engineers mock GraphQL data for tests and prototypes at Airbnb, allowing them to focus on building product features rather than crafting and maintaining mock data.
Special thanks to Raymond Wang and Virgil King for their contributions bringing @generateMock support to Web and Android clients, as well as to many other engineers and teams at Airbnb who participated in design reviews, built supporting infrastructure, and provided usage feedback.
Does this type of work interest you? Check out our open roles here.
GraphQL Data Mocking at Scale with LLMs and @generateMock was originally published in The Airbnb Tech Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
Continue reading on the original blog to support the author
Read full articleThis approach demonstrates how to adapt NLP architectures for travel recommendations by balancing short-term intent with long-term history. It addresses the cold-start problem for dormant users while improving geolocation accuracy through multi-task learning.
Airbnb's research demonstrates how to bridge the gap between academic theory and production-scale systems. By using bimodal embeddings and specialized ranking metrics, they solve complex marketplace challenges, providing a blueprint for driving revenue through advanced machine learning.
This article provides crucial insights into SwiftUI's underlying performance characteristics, especially its view diffing mechanism. Understanding these nuances and implementing strategies like custom Equatable conformance is vital for building high-performance and scalable mobile applications.
Validating alert behavior before deployment prevents alert fatigue and missed incidents. By shifting validation left through backtesting and visual diffs, teams can iterate on complex monitoring patterns at scale without risking production reliability or developer trust.