Netflix Tech Blog
https://netflixtechblog.com/

Why it matters: Translating natural language to complex DSLs reduces friction for subject matter experts interacting with massive, federated datasets. This approach bridges the gap between intuitive human intent and rigid technical schemas, improving productivity across hundreds of enterprise applications.
- Netflix is evolving its Graph Search platform to support natural language queries using Large Language Models (LLMs).
- The system translates ambiguous user input into a structured Filter Domain-Specific Language (DSL) for federated GraphQL data.
- Accuracy is maintained by ensuring syntactic, semantic, and pragmatic correctness through schema validation and controlled vocabularies.
- The architecture utilizes Retrieval-Augmented Generation (RAG) to provide domain-specific data processing without replacing existing UIs.
- Pre-processing and context engineering are critical to prevent LLM hallucinations and ensure fields match the underlying index.
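The post doesn't publish the Filter DSL grammar or the index schema, but the validation idea can be sketched. Assuming a simple field/operator/value filter shape (the field names, operators, and schema below are hypothetical), a validator against a controlled vocabulary might look like:

```python
# Hypothetical sketch: the real Filter DSL and index schema are Netflix-internal.
# The goal is to reject LLM output whose fields, operators, or value types do
# not exist in the underlying index, before the query ever runs.

ALLOWED_FIELDS = {"title": str, "releaseYear": int, "genre": str}  # assumed schema
ALLOWED_OPS = {"eq", "lt", "gt", "contains"}                       # assumed grammar

def validate_filter(f: dict) -> list:
    """Return a list of validation errors; an empty list means the filter is valid."""
    errors = []
    field, op, value = f.get("field"), f.get("op"), f.get("value")
    if field not in ALLOWED_FIELDS:
        errors.append(f"unknown field: {field!r}")       # semantic: field must exist in index
    if op not in ALLOWED_OPS:
        errors.append(f"unknown operator: {op!r}")       # syntactic: op must be in the grammar
    elif field in ALLOWED_FIELDS and not isinstance(value, ALLOWED_FIELDS[field]):
        errors.append(f"type mismatch for {field!r}")    # pragmatic: value type must match schema
    return errors

print(validate_filter({"field": "releaseYear", "op": "lt", "value": 2020}))  # valid filter
print(validate_filter({"field": "rating", "op": "lt", "value": 2020}))       # unknown field
```

A hallucinated field name fails fast with a machine-readable error, which can be fed back to the LLM for a corrected attempt.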
Why it matters: This article demonstrates how a Durable Execution platform like Temporal can drastically improve the reliability of critical cloud operations and continuous delivery pipelines, reducing complex failure handling and state management for engineers.
- Netflix significantly improved the reliability of its Spinnaker deployments by adopting Temporal, reducing transient failures from 4% to 0.0001%.
- Temporal is a Durable Execution platform that allows engineers to write resilient code, abstracting away complexities of distributed system failures.
- The previous Spinnaker architecture suffered from complex, undifferentiated internal orchestration, retry logic, and a homegrown Saga framework within its Clouddriver service.
- Prior to Temporal, Clouddriver's instance-local task state led to lost operation progress if the service crashed, impacting deployment reliability.
- Temporal helped streamline cloud operations by offloading complex state management and failure handling, allowing services like Clouddriver to focus on core infrastructure changes.
Why it matters: This article details how Netflix built a robust, high-performance live streaming origin and optimized its CDN for live content. It offers insights into handling real-time data defects, ensuring resilience, and optimizing content delivery at scale.
- Netflix Live Origin is a multi-tenant microservice bridging cloud live streaming pipelines and Open Connect CDN, managing content distribution.
- It ensures resilience through redundant regional pipelines and server-side failover, utilizing epoch locking for intelligent segment selection.
- The Origin detects and mitigates live stream defects (e.g., short, missing segments) by selecting valid candidates from multiple pipelines.
- Open Connect's nginx-based CDN was optimized for live streaming, extending proxy-caching and adding millisecond-grain caching.
- Live Origin "holds open" requests for yet-to-be-published segments, reducing network chatter and improving efficiency.
- HTTP headers are leveraged for scalable streaming metadata, providing real-time event notifications to client devices via OCAs.
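Epoch locking and the real selection logic are internal to Live Origin, but the defect-mitigation idea from the bullets above can be sketched: given the same segment produced by redundant regional pipelines, serve the first candidate that is neither missing nor short (the duration-tolerance check is an assumed, simplified defect test):

```python
# Simplified sketch of server-side failover across redundant pipelines (the
# real Live Origin also uses epoch locking and richer defect detection).
# candidates: list of (pipeline_name, segment_or_None) for the same segment.

def pick_segment(candidates, expected_duration_ms, tolerance_ms=50):
    for pipeline, segment in candidates:
        if segment is None:                                    # missing from this pipeline
            continue
        if abs(segment["duration_ms"] - expected_duration_ms) > tolerance_ms:
            continue                                           # short/long segment: a defect
        return pipeline, segment                               # first healthy candidate wins
    return None                                                # no valid candidate anywhere

# One pipeline produced a short segment; failover serves the healthy copy.
short = {"duration_ms": 1200}
good = {"duration_ms": 2000}
choice = pick_segment([("us-east", short), ("us-west", good)], expected_duration_ms=2000)
```

Because every pipeline publishes the same segment timeline, the selection is transparent to clients: they always receive a valid segment regardless of which region produced it.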
Why it matters: This article highlights how open video codecs like AV1 drive significant improvements in streaming quality and network efficiency. It showcases a successful large-scale rollout across diverse devices, offering valuable insights into optimizing content delivery and user experience.
- Netflix's AV1 codec adoption has reached 30% of all streaming, becoming their second most-used codec due to its superior efficiency.
- AV1 delivers higher video quality (4.3 VMAF points over AVC) with one-third less bandwidth and 45% fewer buffering interruptions.
- The rollout began with Android mobile in 2020 using the dav1d software decoder, expanding to smart TVs, web browsers, and Apple devices with hardware support.
- This advanced codec significantly improves network efficiency for Netflix's Open Connect CDN and partner ISPs by reducing overall internet bandwidth consumption.
- AV1 enables advanced features like HDR10+ streaming and cinematic film grain, enhancing the overall viewing experience for members.
Why it matters: This article introduces "Spin," a new Metaflow feature that significantly improves the iterative development experience for ML/AI engineers. It allows faster experimentation and debugging, bridging the gap between workflow orchestrators and interactive notebooks.
- Metaflow, an open-sourced Netflix framework, streamlines ML/AI workflows from prototype to production, emphasizing rapid iteration and reliable operations.
- The new "Spin" command in Metaflow 2.19 significantly accelerates iterative ML/AI development by enabling quick, stateful execution of individual steps.
- ML/AI development requires fast, stateful iteration due to large, mutable data and models, and computationally expensive processes.
- Metaflow steps function as explicit, deterministic checkpoints, persisting state as versioned artifacts.
- "Spin" allows developers to execute a single Metaflow step with inherited state, mimicking notebook cell behavior for instant feedback.
- Unlike `run` or `resume`, `spin` is designed for fast, untracked, throw-away iterations, optimizing the development loop.
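The `spin` internals aren't shown in the post; conceptually, it runs one step function against artifacts a previous run already persisted, rather than re-executing upstream steps. A toy, non-Metaflow illustration of that "inherited state" behavior (the artifact store and step names here are invented):

```python
# Toy illustration of spin-style iteration (not the Metaflow API): steps read
# from checkpointed artifacts, so a single step can be re-run in isolation
# with notebook-cell-like feedback while upstream results stay untouched.

artifact_store = {  # stand-in for artifacts persisted by a prior `run`
    "train": {"model": [0.1, 0.2], "n_rows": 1000},
}

def spin_step(step_fn, upstream_step):
    state = dict(artifact_store[upstream_step])   # inherit checkpointed state, don't recompute
    return step_fn(state)                         # execute just this one step

def evaluate(state):
    # The step under iteration: edit, re-spin, see results instantly.
    return {"score": sum(state["model"]), "rows": state["n_rows"]}

result = spin_step(evaluate, "train")
```

The payoff is the tight loop: editing `evaluate` and re-spinning never pays the cost of re-training, because `train`'s artifacts are simply loaded.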
Why it matters: This article introduces A-SFT, a novel post-training algorithm for generative recommenders. It addresses key challenges like noisy reward models and lack of counterfactual data, offering a practical way to improve recommendation quality by better aligning models with user preferences.
- Generative Recommenders (GRs) model user behavior as a sequential transduction task, inspired by transformer architectures.
- Applying RLHF to GRs is challenging due to the lack of counterfactual feedback and the inherent noisiness of recommendation reward signals.
- User feedback is on-policy, making it impractical to obtain evaluations for hypothetical or unseen recommendations.
- Reward models in recommendation systems often exhibit high uncertainty, as user choices are less structured and more random than language data.
- The paper proposes Advantage-Weighted Supervised Fine-tuning (A-SFT) to overcome these post-training challenges.
- A-SFT combines supervised fine-tuning with the advantage function, effectively guiding optimization even with high-variance reward models.
- This approach improves alignment between pre-trained generative recommenders and reward models, balancing offline RL and behavior cloning.
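The paper's exact objective isn't reproduced in this summary; schematically, advantage-weighted SFT scales each example's log-likelihood term by a function of its advantage, so high-advantage actions are imitated more strongly. A minimal numeric sketch using exponential weighting (the specific weighting and the temperature `beta` are illustrative assumptions, not the paper's formulation):

```python
import math

# Schematic advantage-weighted SFT loss (illustrative only): each example's
# negative log-likelihood is weighted by exp(A / beta). Large beta approaches
# plain behavior cloning; small beta leans harder on the (noisy) advantages,
# which is the offline-RL vs. behavior-cloning balance the summary describes.

def a_sft_loss(log_probs, advantages, beta=1.0):
    weights = [math.exp(a / beta) for a in advantages]   # soft advantage weighting
    total_w = sum(weights)
    # normalized, advantage-weighted negative log-likelihood
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / total_w

# With zero advantages the loss reduces to ordinary SFT (behavior cloning).
bc_loss = a_sft_loss(log_probs=[-1.0, -1.0], advantages=[0.0, 0.0])
```

The weighted form tolerates high-variance reward models better than hard filtering, since a noisy low-advantage example is down-weighted rather than discarded.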
Why it matters: This article details how Netflix scaled real-time recommendations for live events to millions of users, solving the "thundering herd" problem. It offers a robust, two-phase architectural pattern for high-concurrency, low-latency updates, crucial for distributed systems engineers.
- Netflix developed a real-time recommendation system for live events to handle millions of concurrent users without overwhelming cloud services.
- The core solution involves a two-phase approach: prefetching data to devices ahead of time and broadcasting low-cardinality messages to trigger updates.
- Prefetching distributes load over a longer period, avoiding traffic spikes and optimizing request throughput and compute cardinality.
- Real-time broadcasting uses state keys and timestamps to ensure devices update locally with prefetched data, guaranteeing delivery even on unstable networks.
- This system successfully delivers updates to over 100 million devices in under a minute during peak live event loads.
- It leverages a robust two-tier pub/sub architecture built on Pushy (WebSocket proxy), Apache Kafka, and Netflix's KV store for efficient, low-latency fanout.
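The two-phase pattern above can be sketched without any of the real infrastructure (Pushy, Kafka, and the KV store are replaced here by in-memory stand-ins, and the state key name is invented): full payloads are prefetched during a quiet period, and the live broadcast carries only a tiny state key plus timestamp that each device resolves locally.

```python
import time

# Sketch of the two-phase prefetch + broadcast pattern (not Netflix's
# implementation). Phase 1 pushes full payloads to devices ahead of time;
# phase 2 broadcasts only (state_key, timestamp), so millions of devices
# update without a thundering herd of synchronous fetches.

class Device:
    def __init__(self):
        self.prefetched = {}   # state_key -> payload, filled during the quiet period
        self.active = None

    def prefetch(self, state_key, payload):
        self.prefetched[state_key] = payload            # phase 1: load spread over time

    def on_broadcast(self, state_key, ts):
        payload = self.prefetched.get(state_key)
        if payload is not None:
            self.active = (payload, ts)                 # phase 2: purely local swap
        return payload is not None                      # False -> device falls back to a fetch

devices = [Device() for _ in range(3)]
for d in devices:
    d.prefetch("halftime_row", {"rows": ["live-recs-v2"]})   # hypothetical payload
applied = [d.on_broadcast("halftime_row", time.time()) for d in devices]
```

Because the broadcast message is identical for every device, its cardinality is constant no matter how many devices are connected; only the prefetch phase scales with audience size, and it is deliberately spread over time.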
Why it matters: This article details how Netflix built a real-time distributed graph to unify disparate data from microservices, enabling complex relationship analysis and personalized experiences. It showcases a robust stream processing architecture for internet-scale data.
- Netflix developed a Real-Time Distributed Graph (RDG) to unify member interaction data across diverse services and devices, addressing data silos from their microservices architecture.
- The RDG provides advantages like relationship-centric queries, schema flexibility, and efficient pattern detection over traditional data warehousing.
- Its ingestion and processing pipeline relies on a stream processing architecture for real-time updates, crucial for maintaining an up-to-date graph.
- Apache Kafka acts as the ingestion backbone, handling up to 1M messages/second, with Avro-encoded records and schema registry.
- Apache Flink jobs process these Kafka streams in near real-time, leveraging robust internal platform support for integration.
- Data is also persisted to Apache Iceberg for backfilling, complementing Kafka's retention policies.
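The shape of the update path can be illustrated with a tiny in-memory stand-in (the real pipeline consumes Avro records from Kafka via Flink; the event schema below is invented for illustration): each interaction event upserts an edge into the graph, and the upsert is idempotent so Kafka's at-least-once redelivery is harmless.

```python
from collections import defaultdict

# Illustrative only: an in-memory stand-in for the RDG update path. Each
# event upserts an edge keyed by (subject, relation); using a set makes the
# write idempotent, so duplicate deliveries from the stream are safe.

graph = defaultdict(set)  # (entity, relation) -> set of related entities

def apply_event(event):
    key = (event["subject"], event["relation"])
    graph[key].add(event["object"])               # idempotent upsert: replays are no-ops

stream = [
    {"subject": "member:1", "relation": "WATCHED", "object": "title:42"},
    {"subject": "member:1", "relation": "WATCHED", "object": "title:42"},  # duplicate delivery
    {"subject": "member:1", "relation": "RATED",   "object": "title:7"},
]
for event in stream:
    apply_event(event)
```

The relationship-centric payoff is that a query like "everything member:1 watched" is a single key lookup rather than a join over warehouse tables.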
Why it matters: This article demonstrates how Netflix optimized its workflow orchestrator by 100X, crucial for supporting evolving business needs like real-time data processing and low-latency applications. It highlights the importance of engine redesign for scalability and developer productivity.
- Netflix's Maestro workflow orchestrator achieved a 100X performance improvement, reducing overhead from seconds to milliseconds for Data/ML workflows.
- The previous Maestro engine, based on the deprecated Conductor 2.x, suffered from performance bottlenecks and race conditions due to its internal flow engine layer.
- New business needs like Live, Ads, Games, and low-latency use cases necessitated a high-performance workflow engine.
- The team evaluated options including upgrading Conductor, using Temporal, or implementing a custom internal flow engine.
- They opted to rewrite Maestro's internal flow engine to simplify the architecture, eliminate complex database synchronizations, and ensure strong guarantees.
Why it matters: This article details how Netflix built a robust WAL system to solve common, critical data challenges like consistency, replication, and reliable retries at massive scale. It offers a blueprint for building resilient data platforms, enhancing developer efficiency and preventing outages.
- Netflix developed a generic, distributed Write-Ahead Log (WAL) system to address critical data challenges at scale, including data loss, corruption, and replication.
- The WAL provides strong durability guarantees and reliably delivers data changes to various downstream consumers.
- Its simple WriteToLog API abstracts internal complexities, using namespaces to define storage (Kafka, SQS) and configurations.
- Key use cases (personas) include enabling delayed message queues for reliable retries in real-time data pipelines.
- It facilitates generic cross-region data replication for services like EVCache.
- The WAL also supports complex operations like handling multi-partition mutations in Key-Value stores, ensuring eventual consistency via two-phase commit.
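The namespace idea behind the WriteToLog API can be sketched as follows (only the name WriteToLog comes from the article; the namespace names, config fields, and in-memory backends are assumptions): a namespace resolves to a backing target and its configuration, so producers never hard-code Kafka-versus-SQS details.

```python
# Sketch of namespace-based dispatch for a WriteToLog-style API (illustrative;
# the real WAL's targets, configs, and durability machinery are internal).

NAMESPACES = {
    "retry-queue": {"target": "SQS",   "delay_s": 30},  # delayed-retry persona
    "replication": {"target": "Kafka", "delay_s": 0},   # cross-region replication persona
}

logs = {ns: [] for ns in NAMESPACES}  # in-memory stand-in for the durable backends

def write_to_log(namespace, payload):
    cfg = NAMESPACES[namespace]                      # resolve storage + config by namespace
    entry = {"payload": payload,
             "target": cfg["target"],
             "delay_s": cfg["delay_s"]}
    logs[namespace].append(entry)                    # append durably before acking the caller
    return entry

entry = write_to_log("retry-queue", {"op": "mutate", "key": "user:9"})
```

Swapping a namespace's backend (say, SQS to Kafka) then becomes a configuration change rather than a code change in every producer, which is the abstraction the simple API is buying.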