This article provides a blueprint for scaling data architecture during rapid product expansion. It demonstrates how to balance consistency and flexibility through a principled framework, preventing technical debt and data silos while supporting diverse business requirements.

By: Patrick Lam, Namrata Lamba, Jamie Stober
With the May 2025 Summer Release, Airbnb redesigned its app, relaunched Experiences, and debuted Services, pushing us beyond our traditional Homes focus. For the data teams, this meant rapidly evolving a decade-old infrastructure to integrate two brand-new product pillars. Our data engineers and analytics engineers rose to the challenge by building a consistent and flexible framework to serve as a robust and scalable data foundation for the next decade of growth.
But getting there wasn’t straightforward. This fundamental shift surfaced a critical question for our data organization: How do you evolve your offline data architecture to support new product lines without introducing disorder into vital analytics services?
We knew the approach we took would have long-lasting implications. A fragmented strategy risked creating data silos, inconsistent analytics, and a tangled web of technical debt that would likely slow down future innovation. In this post, we’ll take you behind the scenes to share the key decisions we made, the framework that emerged, and the lessons that helped reshape our offline data warehouse for the future. Note that we focus specifically on our offline data warehouse (the analytics-oriented data infrastructure owned by our data engineers and analytics engineers) rather than the online data systems that serve the app directly. The two domains have fundamentally different requirements, constraints, and design philosophies, so they warrant separate treatment.
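The first and most critical question was how to structure offline data for the new three-product world of Homes, a refreshed Experiences product, and the new Services offering. This involved a trade-off between two main approaches: a monolithic model, in which all product lines share common tables, and separate models, in which each product line gets its own tables.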
It became clear that neither approach was universally superior. The optimal choice depended heavily on the specific business domain. A model that was perfect for guest data, for example, would be suboptimal for payments data.
We chose a path that balanced consistency with flexibility. We established a framework that combined firm, centralized principles with decentralized modeling guidelines, empowering each data team to make the right choice for its domain.
To ensure a baseline of consistency across all teams, and to keep the door open for any new product categories that emerge in the future, we established three foundational principles. These principles ensured that no matter which modeling path a team chose, the results would be consistent, scalable, and easy for all data consumers to understand:
These principles set firm boundaries for our modeling efforts and ensured a consistent foundation across the company.
With the foundational principles in place, we gave each team a set of guidelines to help them decide how to model their data. This empowered them to pick the right model for their specific domain, using a common set of considerations.
These guidelines gave every domain team a common framework for analyzing its specific situation.
When teams applied this framework to their specific data, a clear pattern emerged. While every guideline played a role, one question in particular proved to be the most decisive factor: Do the product lines share mostly common data attributes, or do they have significant unique attributes?
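The post names the decisive question but not a formula for answering it. As a hypothetical illustration only, the sketch below scores attribute overlap between product lines with Jaccard similarity and suggests a model from that score. The attribute names, the `suggest_model` helper, and the 0.6 threshold are all assumptions, and in practice this signal would be weighed alongside the other guidelines rather than applied mechanically.

```python
# Hypothetical sketch: quantify how much schema product lines share, as one
# input to the "monolithic vs. separate models" decision.
# Attribute names and the 0.6 threshold are illustrative assumptions.


def attribute_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two attribute sets (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def suggest_model(attrs_by_product: dict[str, set[str]], threshold: float = 0.6) -> str:
    """Suggest 'monolithic' only if every pair of product lines overlaps enough."""
    products = sorted(attrs_by_product)
    for i, p in enumerate(products):
        for q in products[i + 1:]:
            if attribute_overlap(attrs_by_product[p], attrs_by_product[q]) < threshold:
                return "separate"
    return "monolithic"


# Product-facing attributes diverge: 3 shared out of 9 distinct -> overlap 1/3.
homes = {"listing_id", "host_id", "price", "bedrooms", "bathrooms", "max_guests"}
services = {"listing_id", "host_id", "price", "service_category",
            "duration_minutes", "travels_to_guest"}
print(suggest_model({"homes": homes, "services": services}))  # separate
```

A cross-cutting concept such as payments, where every product line carries the same attributes, would score 1.0 and come out as "monolithic", which mirrors the split described next.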
Teams working on features closest to the user experience found that the product attributes were too distinct to combine. They overwhelmingly chose to build separate models to capture the unique nature of each product, particularly as we introduced several new concepts with our Services product. Some examples include:
While product-facing domains chose separation, other teams managing more foundational, cross-cutting concepts found that a monolithic model was a much better fit.
This clear delineation (separate models for product-specific logic, monolithic models for cross-cutting concepts) allowed us to successfully model our new, complex business landscape. The framework gave us the flexibility we needed within a consistent and scalable structure.
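To make the two shapes concrete, here is a minimal sketch using Python dataclasses as stand-ins for warehouse tables. All table and column names (`PaymentFact`, `HomeListingDim`, `ServiceOfferingDim`, `provider_id`, and so on) are hypothetical and not Airbnb’s actual schema. The monolithic table answers cross-product questions with a single group-by, while the separate tables let each product line keep its own attributes.

```python
from dataclasses import dataclass
from enum import Enum


class ProductLine(Enum):
    HOMES = "homes"
    EXPERIENCES = "experiences"
    SERVICES = "services"


# Monolithic model (hypothetical): one table for a cross-cutting concept,
# with a product_line column distinguishing Homes, Experiences, and Services.
@dataclass(frozen=True)
class PaymentFact:
    payment_id: str
    guest_id: str
    product_line: ProductLine
    amount_usd_cents: int


# Separate models (hypothetical): one table per product line, so each keeps its
# own attributes instead of a wide table full of columns null for other lines.
@dataclass(frozen=True)
class HomeListingDim:
    listing_id: str
    host_id: str
    bedrooms: int
    max_guests: int


@dataclass(frozen=True)
class ServiceOfferingDim:
    offering_id: str
    provider_id: str
    service_category: str
    duration_minutes: int


def revenue_by_product_line(payments: list[PaymentFact]) -> dict[ProductLine, int]:
    """A cross-product question is a single group-by over the monolithic table."""
    totals: dict[ProductLine, int] = {}
    for p in payments:
        totals[p.product_line] = totals.get(p.product_line, 0) + p.amount_usd_cents
    return totals
```

The cost of the separate shape shows up when a question spans products: it then needs a union across tables, which is one reason cross-cutting concepts favored the monolithic shape.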
Designing a framework on paper and implementing it across a large, fast-moving organization are two very different things. To make this initiative successful, we would need to navigate two significant, real-world challenges.
Our offline data warehouse doesn’t exist in a vacuum. Upstream online data models and their corresponding event logging were rightly optimized for the immediate needs of running the app, such as transactional speed and stability, rather than the structural clarity that is ideal for offline analytics.
As a result, the raw data flowing into our warehouse was often structured in ways that were not ideal for analytics. This reality underscores a core tenet of our data strategy: the offline data warehouse must act as a crucial translation layer. Taking the raw production data and transforming it into a standardized, reliable source of truth is a key function performed by our data engineers and analytics engineers, enabling our downstream consumers to surface insights quickly and accurately.
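The post does not describe the raw production schemas, so the following is only a minimal sketch of what one translation step can look like, assuming a hypothetical nested booking event with an app-side item type enum, a millisecond timestamp, and a price in micros. The event shape, `PRODUCT_LINE_MAP`, and `translate_booking_event` are all illustrative assumptions. The step flattens the event, maps app enums to warehouse product lines, and normalizes time and currency units.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical mapping from app-side enum values to warehouse product lines.
PRODUCT_LINE_MAP = {"HOME": "homes", "EXPERIENCE": "experiences", "SERVICE": "services"}


@dataclass(frozen=True)
class BookingFact:
    booking_id: str
    product_line: str
    created_at_utc: datetime
    amount_minor: int  # minor currency units, e.g. cents
    currency: str


def translate_booking_event(raw: dict) -> BookingFact:
    """Flatten a (hypothetical) raw booking event into a standardized analytics row."""
    payload = raw["payload"]
    item_type = payload["item"]["type"]
    if item_type not in PRODUCT_LINE_MAP:
        # Fail loudly so a new product category cannot slip through unmapped.
        raise ValueError(f"unmapped item type: {item_type!r}")
    price = payload["price"]
    return BookingFact(
        booking_id=payload["booking_id"],
        product_line=PRODUCT_LINE_MAP[item_type],
        created_at_utc=datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
        # micros -> minor units, assuming a currency with two decimal places
        amount_minor=price["amount_micros"] // 10_000,
        currency=price["currency"].upper(),
    )
```

Raising on an unmapped item type is one design choice among several: it turns a new product category into a visible pipeline error rather than letting it land silently in a catch-all bucket.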
The new standards gave us a clear vision, but they also highlighted legacy tables and dashboards, especially from the older version of Experiences, that no longer met the criteria. Migrating and deprecating these assets is always a massive undertaking. These tables often have hundreds of downstream consumers, so the process requires extreme care, involving extensive communication, dual pipeline runs for validation, and a painstakingly slow deprecation cycle to avoid breaking the business processes that depend on each asset.
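The post mentions dual pipeline runs for validation but not how the two outputs were compared. A minimal sketch, assuming both the legacy and the new pipeline outputs for the same partition can be loaded as rows keyed by a primary key: it reports keys missing on either side and per-column value mismatches. `compare_dual_run` and `DualRunReport` are hypothetical names; at warehouse scale the same checks would typically run as SQL against the two tables.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DualRunReport:
    legacy_rows: int
    new_rows: int
    missing_in_new: set[Any] = field(default_factory=set)
    missing_in_legacy: set[Any] = field(default_factory=set)
    mismatched: dict[Any, dict[str, tuple[Any, Any]]] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # Equal row counts also catch duplicate keys, which collapse in the dicts below.
        return (
            self.legacy_rows == self.new_rows
            and not self.missing_in_new
            and not self.missing_in_legacy
            and not self.mismatched
        )


def compare_dual_run(
    legacy: list[dict[str, Any]],
    new: list[dict[str, Any]],
    key: str,
    columns: list[str],
    float_tol: float = 1e-9,
) -> DualRunReport:
    """Diff the legacy and new pipeline outputs for the same partition, row by row."""
    legacy_by_key = {row[key]: row for row in legacy}
    new_by_key = {row[key]: row for row in new}
    report = DualRunReport(legacy_rows=len(legacy), new_rows=len(new))
    report.missing_in_new = legacy_by_key.keys() - new_by_key.keys()
    report.missing_in_legacy = new_by_key.keys() - legacy_by_key.keys()

    for k in legacy_by_key.keys() & new_by_key.keys():
        diffs: dict[str, tuple[Any, Any]] = {}
        for col in columns:
            old, cur = legacy_by_key[k].get(col), new_by_key[k].get(col)
            if isinstance(old, float) and isinstance(cur, float):
                # Tolerate tiny floating-point drift between the two pipelines.
                if abs(old - cur) > float_tol:
                    diffs[col] = (old, cur)
            elif old != cur:
                diffs[col] = (old, cur)
        if diffs:
            report.mismatched[k] = diffs
    return report
```

A report that keeps passing across several daily runs is the kind of evidence that lets downstream consumers be moved to the new table before the legacy one is deprecated.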
These challenges are the reality for any data organization supporting rapid product innovation. Months after our initial launch, we continue to refine our translation layers and to carefully migrate legacy assets.
The journey to a multi-product data architecture was as much about people and process as it was about technology. By establishing clear principles while empowering teams with a flexible set of modeling guidelines, we successfully navigated the data complexity of launching the new Services product line and overhauling the existing Experiences product line, without putting the core Homes product line at risk.
This journey reinforced a key principle of data modeling at scale. The best answer is rarely “one size fits all.” It’s about creating a system that balances central consistency with domain-specific flexibility, and having the discipline to not only build for the future but also to thoughtfully address the past. It’s a continuous effort, one that has already paid dividends in scalability, clarity, and our ability to deliver insights quickly and accurately.
If this type of work interests you, check out some of our related positions!
This was one of Airbnb’s biggest product releases ever, and the data work behind it was a true team effort. Thanks to everyone who contributed, big and small.
All product names, logos, and brands are property of their respective owners. All company, product, and service names used in this website are for identification purposes only. Use of these names, logos, and brands does not imply endorsement.
Scaling beyond one: How Airbnb evolved its data architecture for a multi-product world was originally published in The Airbnb Tech Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.