Airbnb Engineering
https://medium.com/airbnb-engineering

Why it matters: This architecture demonstrates how to scale global payment systems by abstracting vendor-specific complexities into standardized archetypes. It enables rapid expansion into new markets while maintaining high reliability and consistency through domain-driven design and asynchronous orchestration.
- Replatformed from a monolith to a domain-driven microservices architecture (Payments LTA) to improve scalability and team autonomy.
- Implemented a connector and plugin-based architecture to standardize third-party Payment Service Provider (PSP) integrations.
- Developed the Multi-Step Transactions (MST) framework, a processor-agnostic system for handling complex flows like redirects and SCA.
- Categorized 20+ local payment methods into three standardized archetypes—Redirect, Async, and Direct flows—to maximize code reuse.
- Utilized asynchronous orchestration with webhooks and polling to manage external payment confirmations and ensure data consistency.
- Enforced strict idempotency and built comprehensive observability dashboards to monitor transaction success rates and latency across regions.
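The archetype idea above can be sketched in a few lines: classify each local payment method once, then write flow-handling code against three archetypes instead of 20+ bespoke integrations. The class, method names, and mappings below are illustrative assumptions, not Airbnb's actual API.

```python
from enum import Enum

# Hypothetical archetype taxonomy; names are illustrative, not Airbnb's real code.
class Archetype(Enum):
    REDIRECT = "redirect"  # buyer is sent to the PSP's hosted page
    ASYNC = "async"        # confirmation arrives later via webhook or polling
    DIRECT = "direct"      # synchronous authorization in a single call

# Each local payment method is classified exactly once (example entries).
METHOD_ARCHETYPES = {
    "ideal": Archetype.REDIRECT,
    "boleto": Archetype.ASYNC,
    "card": Archetype.DIRECT,
}

def start_payment(method: str) -> str:
    """Route a payment to the handler for its archetype, not its PSP."""
    archetype = METHOD_ARCHETYPES[method]
    if archetype is Archetype.REDIRECT:
        return "redirect_buyer_to_psp"
    if archetype is Archetype.ASYNC:
        return "await_webhook_or_poll"
    return "authorize_synchronously"
```

Adding a new market's payment method then reduces to one mapping entry plus a PSP connector, with no new flow logic.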
Why it matters: This innovation significantly streamlines frontend and mobile development by automating the creation of realistic, type-safe mock data. It frees engineers from tedious manual work, accelerates feature delivery, and improves the reliability of tests and demos.
- Airbnb introduces @generateMock, a new GraphQL client directive, to automate the creation and maintenance of realistic, type-safe mock data.
- The solution combines GraphQL schema validation, rich product context, and Large Language Models (LLMs) to generate convincing mock data.
- Engineers can use @generateMock on any GraphQL operation, fragment, or field, providing optional hints and design URLs to guide the LLM's data generation.
- Integrated with Airbnb's Niobe CLI tool, it generates JSON mock files and helper functions (TypeScript/Kotlin/Swift) for seamless consumption in tests and demo apps.
- This approach eliminates the tedious manual process of writing and updating mocks, enabling faster parallel client/server development and ensuring data consistency.
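An annotated operation might look like the following; the query shape, field names, and the `hint` argument are illustrative guesses, since the post summarized here does not publish the directive's exact signature.

```graphql
query ListingCard @generateMock(hint: "a beachfront listing in Lisbon") {
  listing(id: "demo-listing") {
    title
    nightlyPrice
    host {
      name
      isSuperhost
    }
  }
}
```

The tooling would validate the generated values against the schema's types, so a mocked `nightlyPrice` can never come back as a string.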
Why it matters: This article details how to build resilient distributed systems by moving beyond static rate limits to adaptive traffic management. Engineers can learn to maximize goodput and ensure reliability in high-traffic, multi-tenant environments.
- Airbnb evolved Mussel, their multi-tenant key-value store, from static QPS rate limiting to adaptive traffic management for improved reliability and goodput during traffic spikes.
- The initial QoS system used simple Redis-backed QPS limits, effective for basic isolation but unable to account for varying request costs or adapt to real-time traffic shifts.
- Resource-aware rate control (RARC) was introduced, charging requests in "request units" (RU) based on fixed overhead, rows processed, payload bytes, and, crucially, latency, reflecting actual backend load.
- RARC uses a linear model for RU calculation, allowing the system to differentiate between cheap and expensive operations even when their surface metrics look similar.
- Future layers include load shedding with criticality tiers for priority traffic and hot-key detection/DDoS mitigation to handle skewed access patterns and shield the backend.
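The linear RU model described above can be sketched as a weighted sum over the four cost inputs. The coefficients here are made-up placeholders, not Airbnb's actual tuning; the point is that a large scan and a point read are charged very differently even if both are "one request".

```python
# Illustrative linear request-unit (RU) model. Coefficients are hypothetical.
BASE_RU = 1.0      # fixed per-request overhead
RU_PER_ROW = 0.1   # cost per row processed
RU_PER_KB = 0.05   # cost per KB of payload
RU_PER_MS = 0.02   # cost per millisecond of backend latency

def request_units(rows: int, payload_bytes: int, latency_ms: float) -> float:
    """Charge a request in RUs reflecting actual backend load."""
    return (BASE_RU
            + RU_PER_ROW * rows
            + RU_PER_KB * (payload_bytes / 1024)
            + RU_PER_MS * latency_ms)
```

A tenant's budget is then debited in RUs rather than raw QPS, so one expensive scan consumes the quota of many cheap reads.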
Why it matters: This article details how a large-scale key-value store was rearchitected to meet modern demands for real-time data, scalability, and operational efficiency. It offers valuable insights into addressing common distributed system challenges and executing complex migrations.
- Airbnb rearchitected its core key-value store, Mussel, from v1 to v2 to handle real-time demands, massive data, and improve operational efficiency.
- Mussel v1 faced issues with operational complexity, static partitioning leading to hotspots, limited consistency, and opaque costs.
- Mussel v2 leverages Kubernetes for automation, dynamic range sharding for scalability, flexible consistency, and enhanced cost visibility.
- The new architecture includes a stateless Dispatcher, Kafka-backed writes for durability, and an event-driven model for ingestion.
- Bulk data loading is supported via Airflow orchestration and distributed workers, maintaining familiar semantics.
- Automated TTL in v2 uses a topology-aware expiration service for efficient, parallel data deletion, improving on v1's compaction cycle.
- A blue/green migration strategy with custom bootstrapping and dual writes ensured a seamless transition with zero downtime and no data loss.
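The dual-write phase of such a blue/green migration can be sketched as a thin wrapper: every write lands in both stores, reads stay on the old store until parity is verified, then reads flip. The class and interface below are a hypothetical illustration, not Mussel's actual API.

```python
# Minimal sketch of the dual-write phase of a blue/green store migration.
class DualWriteStore:
    """Writes go to both stores; reads come from the primary until cutover."""

    def __init__(self, primary: dict, secondary: dict):
        self.primary = primary      # e.g. the v1 store during migration
        self.secondary = secondary  # e.g. the v2 store being bootstrapped

    def put(self, key, value):
        # Write both copies so the new store stays consistent with the old one.
        self.primary[key] = value
        self.secondary[key] = value

    def get(self, key):
        return self.primary.get(key)

    def cutover(self):
        # Flip reads to the new store once parity checks pass; keep dual writes
        # so a rollback is just another swap.
        self.primary, self.secondary = self.secondary, self.primary
```

Because both stores hold every post-bootstrap write, cutover and rollback are both metadata-only operations with no data movement.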
Why it matters: This article showcases a successful approach to managing a large, evolving data graph in a service-oriented architecture. It provides insights into how a data-oriented service mesh can simplify developer experience, improve modularity, and scale efficiently.
- Viaduct, Airbnb's data-oriented service mesh, has been open-sourced after five years of significant growth and evolution within the company.
- It's built on three core principles: a central, integrated GraphQL schema, hosting business logic directly within the mesh, and re-entrancy for modular composition.
- The "Viaduct Modern" initiative simplified its developer-facing Tenant API, reducing complexity from multiple mechanisms to just node and field resolvers.
- Modularity was enhanced through formal "tenant modules," enabling teams to own schema and code while composing via GraphQL fragments and queries, avoiding direct code dependencies.
- This modernization effort has allowed Viaduct to scale dramatically (8x traffic, 3x codebase) while maintaining operational efficiency and reducing incidents.
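A Tenant API reduced to node and field resolvers can be modeled in miniature: a node resolver fetches an object by id, and a field resolver derives a field from it. Everything below (registries, decorator names, the `Listing` example) is a toy illustration, not Viaduct's actual interfaces.

```python
# Toy model of a resolver-only Tenant API; names are illustrative.
NODE_RESOLVERS = {}   # GraphQL type -> fetch-by-id function
FIELD_RESOLVERS = {}  # (type, field) -> compute function

def node_resolver(type_name):
    def register(fn):
        NODE_RESOLVERS[type_name] = fn
        return fn
    return register

def field_resolver(type_name, field):
    def register(fn):
        FIELD_RESOLVERS[(type_name, field)] = fn
        return fn
    return register

@node_resolver("Listing")
def fetch_listing(listing_id):
    # In a real mesh this would call the owning backend service.
    return {"id": listing_id, "nightly_price": 120, "nights": 3}

@field_resolver("Listing", "totalPrice")
def total_price(listing):
    # Derived field: business logic lives inside the mesh, next to the schema.
    return listing["nightly_price"] * listing["nights"]

def resolve(type_name, obj_id, field):
    obj = NODE_RESOLVERS[type_name](obj_id)
    return FIELD_RESOLVERS[(type_name, field)](obj)
```

With only these two extension points, every tenant contribution fits the same shape, which is what makes a two-mechanism API easier to learn than the multiple mechanisms it replaced.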
Why it matters: This article introduces a novel approach to managing complex microservice architectures. By shifting to a data-oriented service mesh with a central GraphQL schema, engineers can significantly improve modularity, simplify dependency management, and enhance data agility in large-scale SOAs.
- Airbnb introduced Viaduct, a data-oriented service mesh, to improve modularity and address the complexity of massive dependency graphs in microservices-based Service-Oriented Architectures (SOA).
- Traditional service meshes are procedure-oriented, leading to 'spaghetti SOA' where managing and modifying services becomes increasingly difficult.
- Viaduct shifts to a data-oriented design, leveraging GraphQL to define a central schema comprising types, queries, and mutations across the entire service mesh.
- This data-oriented approach abstracts service dependencies from data consumers, as Viaduct intelligently routes requests to the appropriate microservices.
- The central GraphQL schema acts as a single source of truth, aiming to define service APIs and potentially database schemas, which significantly enhances data agility.
- By centralizing schema definition, Viaduct seeks to streamline changes, allowing database updates to propagate to client code with a single, coordinated update, reducing weeks of effort.
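The routing abstraction above boils down to ownership metadata on the central schema: consumers ask for a type, never for a service. The mapping and service names below are hypothetical.

```python
# Sketch of data-oriented routing: the mesh, not the caller, knows which
# service owns each schema type. Service names are hypothetical.
SCHEMA_OWNERS = {
    "User": "user-service",
    "Listing": "listing-service",
    "Reservation": "reservation-service",
}

def route(query_type: str) -> str:
    """Resolve the owning service for a requested type."""
    return SCHEMA_OWNERS[query_type]
```

Moving a type to a different backing service then changes one ownership entry, and no consumer has to be updated, which is the modularity win over procedure-oriented meshes.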
Why it matters: This article demonstrates how a large-scale monorepo build system migration can dramatically improve developer productivity and build reliability. It provides valuable insights into leveraging Bazel's features like remote execution and hermeticity for complex JVM environments.
- Airbnb migrated its JVM monorepo (Java, Kotlin, Scala) to Bazel, achieving 3-5x faster local builds/tests and 2-3x faster deploys over 4.5 years.
- The move to Bazel was driven by needs for superior build speed via remote execution, enhanced reliability through hermeticity, and a uniform build infrastructure across all language repos.
- Bazel's remote build execution (RBE) and "Build without the Bytes" boosted performance by enabling parallel actions and reducing data transfer.
- Hermetic builds, enforced by sandboxing, ensured consistent, repeatable results by isolating build actions from external environment dependencies.
- The migration strategy included a proof-of-concept on a critical service with co-existing Gradle/Bazel builds, followed by a breadth-first rollout.
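For readers unfamiliar with Bazel, a JVM target in such a monorepo is declared in a BUILD file like the sketch below. This is a config fragment, not code from the post: the target names and paths are invented, and the `load` path follows current rules_kotlin conventions, which may differ from what Airbnb uses internally.

```python
# Hypothetical BUILD file for a Kotlin library target.
load("@rules_kotlin//kotlin:jvm.bzl", "kt_jvm_library")

kt_jvm_library(
    name = "payments_lib",
    srcs = glob(["src/main/kotlin/**/*.kt"]),
    deps = ["//common:logging"],  # explicit deps enable hermetic, cacheable builds
)
```

Because every input is declared explicitly, Bazel can sandbox the compile action, cache it remotely, and skip downloading intermediate outputs ("Build without the Bytes") when only the final artifact is needed.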
Why it matters: This article details how to perform large-scale, zero-downtime Istio upgrades across diverse environments. It offers a blueprint for managing complex service mesh updates, ensuring high availability and minimizing operational overhead for thousands of workloads.
- Airbnb developed a robust process for seamless Istio upgrades across tens of thousands of pods and VMs on dozens of Kubernetes clusters.
- The strategy employs Istio's canary upgrade model, running multiple Istiod revisions concurrently within a single logical service mesh.
- Upgrades are atomic, rolling out new istio-proxy versions and connecting them to the corresponding new Istiod revision simultaneously.
- A rollouts.yml file dictates the gradual rollout, defining namespace patterns and percentage distributions for Istio versions using consistent hashing.
- For Kubernetes, MutatingAdmissionWebhooks inject the correct istio-proxy and configure its connection to the specific Istiod revision based on an istio.io/rev label.
- The process prioritizes zero downtime, gradual rollouts, easy rollbacks, and independent upgrades for thousands of diverse workloads.
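The percentage-plus-consistent-hashing idea can be sketched as follows. The function, revision names, and bucketing scheme are illustrative assumptions; the key property is that a namespace's bucket is derived from a stable hash, so raising the rollout percentage only ever moves namespaces from old to new, never back.

```python
import hashlib

# Sketch of percentage-based revision assignment via consistent hashing,
# in the spirit of a rollouts.yml entry like {pattern: "payments-*", new_pct: 25}.
def revision_for(namespace: str, new_pct: int,
                 old_rev: str = "istio-1-19", new_rev: str = "istio-1-20") -> str:
    """Deterministically pin a namespace to a revision based on a rollout %."""
    digest = hashlib.sha256(namespace.encode()).digest()
    bucket = digest[0] * 100 // 256  # stable bucket in [0, 100)
    return new_rev if bucket < new_pct else old_rev
```

Determinism means repeated reconciliations never flap a namespace between revisions, and rollback is just lowering `new_pct`.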
Why it matters: This article provides a detailed blueprint for achieving high availability and fault tolerance for distributed databases on Kubernetes in a multi-cloud environment. Engineers can learn best practices for managing stateful services, mitigating risks, and designing resilient systems at scale.
- Airbnb achieved high availability for a distributed SQL database by deploying it across multiple Kubernetes clusters, each in a different AWS Availability Zone, a complex but effective strategy.
- They addressed challenges of running stateful databases on Kubernetes, particularly node replacements and upgrades, using custom Kubernetes operators and admission hooks.
- A custom Kubernetes operator coordinates node replacements, ensuring data consistency and preventing service disruption during various event types.
- Deploying across three independent Kubernetes clusters in different AWS AZs significantly limits the blast radius of infrastructure or deployment issues.
- AWS EBS provides rapid volume reattachment and durability, with tail latency spikes mitigated by read timeouts, transparent retries, and stale reads.
- Overprovisioning database clusters ensures sufficient capacity even if an entire AZ or Kubernetes cluster fails.
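The overprovisioning requirement follows from simple arithmetic: with n zones, the surviving n-1 must absorb peak load, so each zone needs peak/(n-1) capacity. The function and numbers below are a back-of-the-envelope illustration, not Airbnb's actual capacity model.

```python
# AZ-failure sizing sketch: survive the loss of any one zone at peak load.
def per_zone_capacity(peak_load: float, zones: int) -> float:
    """Capacity each zone needs so losing one zone still serves peak_load."""
    if zones < 2:
        raise ValueError("need at least two zones to survive a zone failure")
    return peak_load / (zones - 1)
```

With three zones and a peak of 90 units, each zone needs 45 units, i.e. the fleet runs at 1.5x peak in steady state, which is the cost of tolerating a full-AZ failure.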