Posts tagged with dist
Why it matters: This article demonstrates how Netflix improved its workflow orchestrator's performance by 100X, which was crucial for supporting evolving business needs like real-time data processing and low-latency applications. It highlights the importance of engine redesign for scalability and developer productivity.
- Netflix's Maestro workflow orchestrator achieved a 100X performance improvement, reducing overhead from seconds to milliseconds for Data/ML workflows.
- The previous Maestro engine, based on the deprecated Conductor 2.x, suffered from performance bottlenecks and race conditions due to its internal flow engine layer.
- New business needs like Live, Ads, Games, and low-latency use cases necessitated a high-performance workflow engine.
- The team evaluated options including upgrading Conductor, adopting Temporal, or implementing a custom internal flow engine.
- They opted to rewrite Maestro's internal flow engine to simplify the architecture, eliminate complex database synchronizations, and ensure strong guarantees (see the sketch below).
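The article doesn't include code, but the gist of an internal flow engine — advancing a workflow's steps in-process instead of synchronizing state through a separate engine's tables — can be shown with a minimal, hypothetical Python sketch (the `Flow`/`FlowEngine` names and steps are illustrative, not Maestro's actual API):

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical illustration of an in-process flow engine: a workflow is a list
# of named steps, and the engine applies state transitions in memory rather
# than synchronizing them through a separate engine's database tables.
@dataclass
class Flow:
    name: str
    steps: list[tuple[str, Callable[[dict], dict]]]
    state: dict = field(default_factory=dict)

class FlowEngine:
    def run(self, flow: Flow) -> dict:
        for step_name, step_fn in flow.steps:
            # Each transition is applied directly; a real engine would hook
            # persistence, retries, and callbacks in here.
            flow.state = step_fn(flow.state)
            flow.state["last_completed"] = step_name
        return flow.state

if __name__ == "__main__":
    flow = Flow(
        name="demo-workflow",
        steps=[
            ("extract", lambda s: {**s, "rows": 100}),
            ("transform", lambda s: {**s, "rows": s["rows"] * 2}),
        ],
    )
    print(FlowEngine().run(flow))  # {'rows': 200, 'last_completed': 'transform'}
```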
Why it matters: This article details how Netflix built a robust WAL system to solve common, critical data challenges like consistency, replication, and reliable retries at massive scale. It offers a blueprint for building resilient data platforms, enhancing developer efficiency and preventing outages.
- Netflix developed a generic, distributed Write-Ahead Log (WAL) system to address critical data challenges at scale, including data loss, corruption, and replication.
- The WAL provides strong durability guarantees and reliably delivers data changes to various downstream consumers.
- Its simple WriteToLog API abstracts internal complexities, using namespaces to define storage targets (Kafka, SQS) and configurations (see the sketch after this list).
- Key use cases (personas) include enabling delayed message queues for reliable retries in real-time data pipelines.
- It facilitates generic cross-region data replication for services like EVCache.
- The WAL also supports complex operations like handling multi-partition mutations in Key-Value stores, ensuring eventual consistency via two-phase commit.
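The WriteToLog API is only described at a high level; the following is a hypothetical Python sketch of what a namespace-driven client could look like, where the namespace selects the backing queue and an optional delivery delay. The `WalClient` class, namespace names, and record fields are assumptions, not Netflix's actual interface:

```python
import json
import time
import uuid

# Hypothetical sketch of a WAL client: the namespace carries the storage
# configuration (Kafka topic, SQS queue, optional delay), so callers only
# ever make one call.
NAMESPACES = {
    "evcache-replication": {"backend": "kafka", "target": "wal.evcache.us-east-1"},
    "pipeline-retries":    {"backend": "sqs", "target": "wal-retries", "delay_seconds": 300},
}

class WalClient:
    def write_to_log(self, namespace: str, payload: dict) -> str:
        cfg = NAMESPACES[namespace]
        record = {
            "id": str(uuid.uuid4()),
            "written_at": time.time(),
            "deliver_after": time.time() + cfg.get("delay_seconds", 0),
            "payload": payload,
        }
        # A real implementation would durably append to the configured backend
        # (Kafka producer, SQS SendMessage) and only then acknowledge the write.
        print(f"append to {cfg['backend']}:{cfg['target']} -> {json.dumps(record)}")
        return record["id"]

if __name__ == "__main__":
    WalClient().write_to_log("pipeline-retries", {"key": "user:42", "op": "PUT"})
```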
Why it matters: This article details how a large-scale key-value store was rearchitected to meet modern demands for real-time data, scalability, and operational efficiency. It offers valuable insights into addressing common distributed system challenges and executing complex migrations.
- Airbnb rearchitected its core key-value store, Mussel, from v1 to v2 to handle real-time demands and massive data volumes while improving operational efficiency.
- Mussel v1 faced operational complexity, static partitioning that led to hotspots, limited consistency options, and opaque costs.
- Mussel v2 leverages Kubernetes for automation, dynamic range sharding for scalability, flexible consistency, and enhanced cost visibility.
- The new architecture includes a stateless Dispatcher, Kafka-backed writes for durability, and an event-driven ingestion model (see the sketch after this list).
- Bulk data loading is supported via Airflow orchestration and distributed workers, maintaining familiar semantics.
- Automated TTL in v2 uses a topology-aware expiration service for efficient, parallel data deletion, improving on v1's compaction cycle.
- A blue/green migration strategy with custom bootstrapping and dual writes ensured a seamless transition with zero downtime and no data loss.
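As a rough illustration of a Kafka-backed write path, here is a minimal sketch assuming the kafka-python client; the topic name and mutation format are invented, not Mussel's actual wire protocol:

```python
import json

from kafka import KafkaProducer  # pip install kafka-python

# Illustrative sketch: the dispatcher appends the mutation to a log topic for
# durability, and storage nodes apply it later via an event-driven consumer.
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for the log to confirm durability before acking the client
)

mutation = {"table": "user_events", "key": "user:42", "value": {"clicks": 7}, "op": "PUT"}
producer.send("mussel-writes", value=mutation)
producer.flush()
```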
Why it matters: This article details how Netflix scaled a critical OLAP application to handle trillions of rows and complex queries. It showcases practical strategies using approximate distinct counts (HLL) and in-memory precomputed aggregates (Hollow) to achieve high performance and data accuracy.
- Netflix's Muse application, an OLAP system for creative insights, evolved its architecture to handle trillions of rows and complex queries.
- The updated data serving layer leverages HyperLogLog (HLL) sketches for efficient, approximate distinct counts, reducing query latencies by roughly 50% (see the sketch after this list).
- Hollow is used as a read-only, in-memory key-value store for precomputed aggregates, offloading Druid and improving performance for specific data access patterns.
- The architecture now includes React, GraphQL, and Spring Boot gRPC microservices, with significant tuning applied to the Druid cluster.
- The solution addresses challenges like dynamic analysis by audience affinities and combinatorial data explosion.
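To make the HLL idea concrete, here is a minimal sketch using the Apache DataSketches Python bindings: per-segment sketches are merged at query time to estimate a distinct count without rescanning raw rows. The post's sketches live inside Druid and the serving layer; this only shows the mergeability that makes the approach fast:

```python
from datasketches import hll_sketch, hll_union  # pip install datasketches

# Build one HLL sketch per data segment, then merge them at query time to get
# an approximate distinct count without touching the raw rows again.
def sketch_segment(user_ids, lg_k=12):
    sk = hll_sketch(lg_k)
    for uid in user_ids:
        sk.update(uid)
    return sk

segment_a = sketch_segment(f"user-{i}" for i in range(0, 60_000))
segment_b = sketch_segment(f"user-{i}" for i in range(40_000, 100_000))  # overlaps a

union = hll_union(12)
union.update(segment_a)
union.update(segment_b)
estimate = union.get_result().get_estimate()
print(f"approx distinct users: {estimate:.0f}")  # ~100,000 despite the overlap
```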
Why it matters: This article showcases a successful approach to managing a large, evolving data graph in a service-oriented architecture. It provides insights into how a data-oriented service mesh can simplify developer experience, improve modularity, and scale efficiently.
- Viaduct, Airbnb's data-oriented service mesh, has been open-sourced after five years of significant growth and evolution within the company.
- It is built on three core principles: a central, integrated GraphQL schema, hosting business logic directly within the mesh, and re-entrancy for modular composition.
- The "Viaduct Modern" initiative simplified its developer-facing Tenant API, reducing multiple resolution mechanisms to just node and field resolvers (see the sketch after this list).
- Modularity was enhanced through formal "tenant modules," enabling teams to own schema and code while composing via GraphQL fragments and queries, avoiding direct code dependencies.
- This modernization allowed Viaduct to scale dramatically (8x traffic, 3x codebase) while maintaining operational efficiency and reducing incidents.
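Viaduct's Tenant API itself isn't shown in the post; the following is a conceptual Python sketch of the two resolver kinds it reduces to — a node resolver loads an entity by ID, a field resolver computes a single field — with all names invented for illustration:

```python
# Conceptual sketch (not Viaduct's real API): the Tenant API reduces to two
# hooks. Node resolvers fetch an object by ID; field resolvers compute one
# field, possibly by querying other parts of the schema.
NODE_RESOLVERS = {}
FIELD_RESOLVERS = {}

def node_resolver(type_name):
    def register(fn):
        NODE_RESOLVERS[type_name] = fn
        return fn
    return register

def field_resolver(type_name, field_name):
    def register(fn):
        FIELD_RESOLVERS[(type_name, field_name)] = fn
        return fn
    return register

@node_resolver("Listing")
def resolve_listing(node_id):
    return {"id": node_id, "nightly_price": 120, "nights_booked": 14}

@field_resolver("Listing", "projectedRevenue")
def resolve_projected_revenue(listing):
    return listing["nightly_price"] * listing["nights_booked"]

listing = NODE_RESOLVERS["Listing"]("listing:123")
print(FIELD_RESOLVERS[("Listing", "projectedRevenue")](listing))  # 1680
```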
Why it matters: This article introduces a novel approach to managing complex microservice architectures. By shifting to a data-oriented service mesh with a central GraphQL schema, engineers can significantly improve modularity, simplify dependency management, and enhance data agility in large-scale SOAs.
- Airbnb introduced Viaduct, a data-oriented service mesh, to improve modularity and tame the massive dependency graphs of microservices-based Service-Oriented Architectures (SOA).
- Traditional service meshes are procedure-oriented, leading to "spaghetti SOA" where managing and modifying services becomes increasingly difficult.
- Viaduct shifts to a data-oriented design, leveraging GraphQL to define a central schema comprising types, queries, and mutations across the entire service mesh.
- This data-oriented approach abstracts service dependencies away from data consumers, as Viaduct routes requests to the appropriate microservices (see the sketch after this list).
- The central GraphQL schema acts as a single source of truth, aiming to define service APIs and potentially database schemas, which significantly enhances data agility.
- By centralizing schema definition, Viaduct seeks to streamline changes, allowing database updates to propagate to client code in a single, coordinated update instead of weeks of effort.
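A toy sketch of the data-oriented routing idea: consumers request fields against the central schema, and the mesh — not the caller — maps each field to the service that owns it. The ownership table below is invented for illustration:

```python
# Toy illustration of data-oriented routing: the caller names the data it wants
# against the central schema; the mesh knows which service owns each field.
# The ownership map below is invented for illustration.
FIELD_OWNERS = {
    "User.profile": "users-service",
    "User.reservations": "reservations-service",
    "Reservation.listing": "listings-service",
}

def route(query_fields):
    """Group requested schema fields by the service that owns them."""
    plan = {}
    for field in query_fields:
        owner = FIELD_OWNERS[field]
        plan.setdefault(owner, []).append(field)
    return plan

print(route(["User.profile", "User.reservations", "Reservation.listing"]))
# {'users-service': ['User.profile'],
#  'reservations-service': ['User.reservations'],
#  'listings-service': ['Reservation.listing']}
```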
Why it matters: This article details Pinterest's approach to building a scalable data processing platform on EKS, covering deployment and critical logging infrastructure. It offers insights into managing large-scale data systems and ensuring observability in cloud-native environments.
- Pinterest is transitioning to Moka, a new data processing platform, deploying it on AWS EKS across standardized test, dev, staging, and production environments.
- EKS cluster deployment uses Terraform with a layered structure of AWS-originated and Pinterest-specific modules and Helm charts.
- A comprehensive logging strategy for Moka addresses EKS control plane logs (via CloudWatch), Spark application logs (driver, executor, event logs), and system pod logs.
- A key logging challenge is ensuring Spark event logs reliably reach S3, even when jobs fail, so Spark History Server can consume them.
- They are exploring custom Spark listeners and sidecar containers to guarantee event log persistence and availability for debugging and performance analysis (see the sketch below).
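The listener and sidecar designs aren't spelled out in the post; as a simpler illustration of the requirement, this sketch uploads whatever event logs exist to S3 in a `finally` block so they survive job failures. It assumes boto3, and the bucket, paths, and `run_job` function are placeholders:

```python
import glob
import os

import boto3  # pip install boto3

# Illustrative only: upload event logs to S3 even if the Spark job raised, so
# Spark History Server can still render the failed run. Bucket, prefix, and
# log directory are placeholders; run_job() stands in for the real application.
EVENT_LOG_DIR = "/var/log/spark/events"
BUCKET, PREFIX = "my-spark-history-bucket", "moka/event-logs"

def upload_event_logs():
    s3 = boto3.client("s3")
    for path in glob.glob(os.path.join(EVENT_LOG_DIR, "*")):
        key = f"{PREFIX}/{os.path.basename(path)}"
        s3.upload_file(path, BUCKET, key)

def run_job():
    raise RuntimeError("simulated Spark job failure")

if __name__ == "__main__":
    try:
        run_job()
    finally:
        # Runs on success and on failure, mirroring the "always persist event
        # logs" requirement described in the post.
        upload_event_logs()
```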
Why it matters: This article details how Netflix is innovating data engineering to tackle the unique challenges of media data for advanced ML. It offers insights into building specialized data platforms and roles for multi-modal content, crucial for any company dealing with large-scale unstructured media.
- Netflix is evolving its data engineering function into "Media ML Data Engineering" to handle complex, multi-modal media data at scale.
- This new specialization focuses on centralizing, standardizing, and managing media assets and their metadata for machine learning applications.
- The "Media Data Lake" is introduced as a platform for storing and serving media assets, leveraging vector storage solutions like LanceDB (see the sketch after this list).
- Its architecture includes a Media Table for metadata, a robust data model, a Pythonic Data API, and distributed compute for ML training and inference.
- The initiative aims to bridge creative media workflows with cutting-edge ML demands, enabling applications like content embedding and quality measures.
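As a minimal illustration of a vector-backed media table, here is a sketch assuming the lancedb Python package; the table name, columns, and toy embeddings are invented, not Netflix's Media Data Lake schema:

```python
import lancedb  # pip install lancedb

# Minimal sketch of a vector-backed media table: store per-asset metadata next
# to an embedding, then retrieve similar assets by vector search. Table name,
# columns, and the toy 4-dimensional vectors are illustrative.
db = lancedb.connect("/tmp/media_data_lake")

rows = [
    {"asset_id": "trailer_001", "title_id": "show_42", "vector": [0.1, 0.9, 0.0, 0.3]},
    {"asset_id": "trailer_002", "title_id": "show_42", "vector": [0.2, 0.8, 0.1, 0.2]},
    {"asset_id": "poster_009",  "title_id": "show_77", "vector": [0.9, 0.1, 0.7, 0.0]},
]
table = db.create_table("media_assets", data=rows, mode="overwrite")

# Nearest-neighbor lookup against the stored embeddings.
query = [0.15, 0.85, 0.05, 0.25]
print(table.search(query).limit(2).to_pandas())
```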
Why it matters: This article demonstrates how a large-scale monorepo build system migration can dramatically improve developer productivity and build reliability. It provides valuable insights into leveraging Bazel's features like remote execution and hermeticity for complex JVM environments.
- Airbnb migrated its JVM monorepo (Java, Kotlin, Scala) to Bazel, achieving 3-5x faster local builds/tests and 2-3x faster deploys over 4.5 years.
- The move to Bazel was driven by needs for superior build speed via remote execution, enhanced reliability through hermeticity, and a uniform build infrastructure across all language repos.
- Bazel's remote build execution (RBE) and "Build without the Bytes" boosted performance by enabling parallel actions and reducing data transfer.
- Hermetic builds, enforced by sandboxing, ensured consistent, repeatable results by isolating build actions from external environment dependencies.
- The migration strategy included a proof-of-concept on a critical service with co-existing Gradle/Bazel builds, followed by a breadth-first rollout.
Why it matters: This article details how to perform large-scale, zero-downtime Istio upgrades across diverse environments. It offers a blueprint for managing complex service mesh updates, ensuring high availability and minimizing operational overhead for thousands of workloads.
- Airbnb developed a robust process for seamless Istio upgrades across tens of thousands of pods and VMs on dozens of Kubernetes clusters.
- The strategy employs Istio's canary upgrade model, running multiple Istiod revisions concurrently within a single logical service mesh.
- Upgrades are atomic, rolling out new istio-proxy versions and connecting them to the corresponding new Istiod revision simultaneously.
- A rollouts.yml file dictates the gradual rollout, defining namespace patterns and percentage distributions for Istio versions using consistent hashing (see the sketch after this list).
- For Kubernetes, MutatingAdmissionWebhooks inject the correct istio-proxy and configure its connection to the specific Istiod revision based on an istio.io/rev label.
- The process prioritizes zero downtime, gradual rollouts, easy rollbacks, and independent upgrades for thousands of diverse workloads.
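The rollouts.yml mechanics aren't detailed in the summary; this toy sketch shows how a stable hash can place each namespace in a fixed percentage bucket, so raising a version's percentage only ever moves workloads forward. The rollout entries and field names are invented:

```python
import hashlib

# Toy illustration of percentage-based revision assignment with a stable hash:
# each namespace always lands in the same bucket (0-99), so increasing the new
# version's percentage moves namespaces forward without reshuffling the rest.
# The entries below are invented, not Airbnb's rollouts.yml format.
ROLLOUT = [
    {"revision": "istio-1-20", "percent": 25},   # new version covers buckets 0-24
    {"revision": "istio-1-19", "percent": 100},  # everything else stays on the old one
]

def bucket(namespace: str) -> int:
    digest = hashlib.sha256(namespace.encode()).hexdigest()
    return int(digest, 16) % 100

def revision_for(namespace: str) -> str:
    b = bucket(namespace)
    # Percentages are cumulative thresholds: the first entry whose threshold
    # exceeds the namespace's bucket wins.
    for entry in ROLLOUT:
        if b < entry["percent"]:
            return entry["revision"]
    return ROLLOUT[-1]["revision"]

for ns in ["payments", "search", "listings-api"]:
    print(ns, "->", revision_for(ns))
```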