Engineering at Meta

https://engineering.fb.com/

Engineering at Meta · Nov 20, 2025

Why it matters: This article details how Meta scaled a critical security feature, Key Transparency, to Messenger's massive user base. Engineers can learn about distributed system challenges, cryptographic key management, and infrastructure resilience for high-volume, security-sensitive applications.

Messenger launched Key Transparency for end-to-end encrypted chats, providing verifiable and auditable public key records to prevent tampering.
This feature automates the verification of encryption keys, addressing the complexity of manual checks for users with multiple devices and frequent key changes.
The implementation leverages the Auditable Key Directory (AKD) library and integrates Cloudflare's key transparency auditor for enhanced security.
Scaling challenges included managing billions of key entries and hundreds of thousands of updates per 2-minute epoch due to Messenger's multi-device user base.
Engineering advancements involved optimizing AKD algorithmic efficiency for smaller proof sizes and improving infrastructure resilience and recovery processes.
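Key transparency works because a client can check, without trusting the server, that its key is in the directory the server committed to for an epoch. A common building block for that check is a Merkle inclusion proof. The sketch below is a generic binary Merkle tree over SHA-256 with a hypothetical `label|value` leaf encoding. AKD's real tree structure, VRF-derived labels, and proof format differ, so treat this only as an illustration of the idea.

```python
import hashlib
from typing import List, Tuple

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(label: bytes, value: bytes) -> bytes:
    # Hypothetical leaf encoding; the 0x00/0x01 prefixes separate leaves from inner nodes.
    return _h(b"\x00" + label + b"|" + value)

def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)

def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Build the tree bottom-up; an unpaired node is carried up unchanged."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [node_hash(cur[i], cur[i + 1]) if i + 1 < len(cur) else cur[i]
               for i in range(0, len(cur), 2)]
        levels.append(nxt)
    return levels

def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect sibling hashes on the path to the root, flagging left siblings."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(root: bytes, label: bytes, value: bytes,
           proof: List[Tuple[bytes, bool]]) -> bool:
    """Recompute the root from the claimed entry and proof; any tampering changes it."""
    acc = leaf_hash(label, value)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root
```

A client that knows an epoch's root can reject any server answer whose proof does not recompute to that root. This is what makes a silently swapped key detectable.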

#security #dist

Engineering at Meta · Nov 18, 2025

Efficient Optimization With Ax, an Open Platform for Adaptive Experimentation

Why it matters: Engineers can leverage Ax, an open-source ML-driven platform, to efficiently optimize complex systems like AI models and infrastructure. It streamlines experimentation, reduces resource costs, and provides deep insights into system behavior, accelerating development and deployment.

Ax 1.0 is an open-source adaptive experimentation platform leveraging machine learning for efficient optimization of complex systems.
It's widely used at Meta to improve AI models, tune production infrastructure, and accelerate advances in ML and hardware design.
The platform employs Bayesian optimization to guide resource-intensive experiments, identifying optimal configurations efficiently.
Ax provides advanced analytical tools, including Pareto frontiers and sensitivity analysis, for deeper system understanding beyond just finding optimal settings.
An accompanying paper details Ax's core architecture, methodology, and performance comparison against other black-box optimization libraries.
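When several objectives trade off against each other, the Pareto frontier is the set of configurations that no other configuration beats on every objective. The sketch below shows that concept in plain Python. It does not use Ax's API. It assumes two objectives that are both maximized, and the trial names and numbers are made up.

```python
from typing import List, Tuple

# (trial name, objective_a, objective_b); both objectives are maximized.
Trial = Tuple[str, float, float]

def dominates(p: Trial, q: Trial) -> bool:
    """p dominates q if it is no worse on both objectives and strictly better on one."""
    no_worse = p[1] >= q[1] and p[2] >= q[2]
    strictly_better = p[1] > q[1] or p[2] > q[2]
    return no_worse and strictly_better

def pareto_frontier(trials: List[Trial]) -> List[Trial]:
    """Keep every trial that no other trial dominates (input order is preserved)."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]
```

For example, with trials `("a", 0.9, 10)`, `("b", 0.8, 20)`, `("c", 0.7, 15)`, and `("d", 0.95, 5)`, trial `c` is dropped because `b` is better on both objectives. The other three each win on at least one axis.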

#mlp #sre #data

Engineering at Meta · Nov 18, 2025

Announcing the Completion of the Core 2Africa System: Building the Future of Connectivity Together

Why it matters: This project demonstrates cutting-edge subsea cable engineering, utilizing SDM and optical switching to build massive-scale, open-access infrastructure. It's crucial for global connectivity, supporting future AI, cloud, and high-bandwidth applications across three continents.

The core 2Africa system, the world's longest open-access subsea cable, is complete, connecting 33 countries across Africa, Europe, and Asia.
It's the first cable to continuously link East and West Africa, and connect Africa to the Middle East, South Asia, and Europe.
The project, led by a Meta-led consortium, uses an open-access model to promote competition and accelerate digital transformation.
Engineering innovations include Spatial Division Multiplexing (SDM), which allows 16 fiber pairs (twice as many as older systems), and undersea optical wavelength switching.
This infrastructure supports evolving demands for AI, cloud, and high-bandwidth applications, enabling connectivity for 3 billion people.

#dist #data

Engineering at Meta · Nov 17, 2025

Enhancing HDR on Instagram for iOS With Dolby Vision

Why it matters: This article details the intricate process of preserving HDR video metadata (Dolby Vision, AMVE) across a large-scale video pipeline. It's crucial for engineers working on media processing, mobile development, and ensuring high-quality user experiences on global platforms.

Instagram for iOS now supports Dolby Vision and Ambient Viewing Environment (AMVE) metadata for enhanced HDR video playback.
This involved preserving unique Dolby Vision and AMVE metadata from iPhone-produced HDR videos throughout Meta's video processing pipeline.
Previously, FFmpeg-based transcoding systems discarded this metadata, impacting picture consistency, especially at low screen brightness.
Meta collaborated with the community to add AMVE support to FFmpeg and adopted Dolby Vision Profile 10 for AV1 delivery.
This enhancement makes Instagram the first Meta app to support Dolby Vision video, with future expansion across other Meta platforms.
The solution addresses challenges like carrying Dolby Vision metadata in non-HEVC codecs and managing different Dolby Vision profiles.

#mobile #dist

Engineering at Meta · Nov 14, 2025

Open Source Is Good for the Environment

Why it matters: Engineers can learn how open hardware, AI, and collaborative projects like OCP are crucial for achieving environmental sustainability goals in tech. It highlights practical applications of AI in reducing carbon footprints for IT infrastructure and data centers.

Meta's podcast discusses open hardware and the Open Compute Project (OCP) for environmental sustainability.
OCP, a collaborative initiative with over 400 companies, focuses on open hardware designs to reduce environmental impact.
Meta leverages AI and open hardware to advance its goal of achieving net-zero emissions by 2030.
A new open methodology employs AI to enhance the accuracy of Scope 3 emission estimates for IT hardware.
AI is also being used to innovate concrete mixes, leading to lower-carbon data center construction.

#data #mlp #sre

Engineering at Meta · Nov 11, 2025

StyleX: A Styling Library for CSS at Scale

Why it matters: StyleX offers a robust solution for managing CSS at scale, providing performance benefits of static CSS with the developer experience of CSS-in-JS. It ensures maintainability, reduces bundle sizes, and prevents styling conflicts in large, complex applications.

StyleX is Meta's open-sourced styling system, combining CSS-in-JS ergonomics with static CSS performance for large-scale applications.
It functions as a build-time compiler, extracting styles to generate collision-free, atomic CSS, significantly reducing CSS bundle size.
StyleX addresses historical CSS challenges at Meta, such as specificity wars and large bundles, by enforcing constraints for predictable and scalable styling.
The system enables expressive, type-safe style authoring in JavaScript, supporting composition and conditional logic while compiling to static output.
Its core is a Babel plugin that processes style objects, normalizes values, and outputs optimized, atomic CSS classes for efficient rendering.
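The atomic-CSS idea behind StyleX can be shown in a few lines. Each `(property, value)` pair becomes one class that is emitted once, no matter how many components use it. Composition resolves conflicts per property ("last style wins") rather than by stylesheet order. The sketch below is a minimal sketch under those assumptions. It is not StyleX's Babel plugin. The `x` + hash naming scheme, `AtomicCompiler`, and `merge` are hypothetical, and the real compiler's normalization and conflict rules are more involved.

```python
import hashlib
from typing import Dict

def class_name(prop: str, value: str) -> str:
    # Hypothetical naming: a short hash of the declaration.
    return "x" + hashlib.sha256(f"{prop}:{value}".encode()).hexdigest()[:6]

class AtomicCompiler:
    def __init__(self) -> None:
        self.rules: Dict[str, str] = {}  # class name -> CSS rule

    def compile(self, style: Dict[str, str]) -> Dict[str, str]:
        """Map each property to an atomic class, emitting each rule only once."""
        mapping: Dict[str, str] = {}
        for prop, value in style.items():
            name = class_name(prop, value)
            self.rules.setdefault(name, f".{name}{{{prop}:{value}}}")
            mapping[prop] = name
        return mapping

    def stylesheet(self) -> str:
        return "\n".join(self.rules.values())

def merge(*compiled: Dict[str, str]) -> str:
    """Later styles win per property, so the result never depends on CSS rule order."""
    result: Dict[str, str] = {}
    for c in compiled:
        result.update(c)
    return " ".join(result.values())
```

Because identical declarations share one class, the stylesheet grows with the number of distinct declarations, not with the number of components. This is the source of the bundle-size reduction described above.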

#frontend #dist

Engineering at Meta · Nov 10, 2025

Meta’s Generative Ads Model (GEM): The Central Brain Accelerating Ads Recommendation AI Innovation

Why it matters: This article details how Meta built and scaled a massive LLM-inspired foundation model for ads, showcasing innovations in architecture, training, and knowledge transfer for significant performance gains. It offers insights into building large-scale recommendation systems.

Meta's Generative Ads Model (GEM) is a new LLM-inspired foundation model enhancing ad recommendation performance and advertiser ROI.
Its novel architecture allows efficient scaling and precise predictions, leveraging thousands of GPUs for training.
GEM propagates learnings across Meta's ad model fleet through advanced post-training and knowledge transfer techniques.
It has already delivered significant increases in ad conversions on Instagram (5%) and Facebook (3%).
GEM is 4x more efficient at driving ad performance gains, makes knowledge transfer 2x more effective, and brings a 23x increase in training FLOPS.
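The summary does not say which transfer techniques GEM uses. Knowledge distillation is one common way a large "teacher" model passes what it has learned to smaller models. The student is trained against a blend of the true labels and the teacher's soft predictions. The sketch below shows that loss for a binary conversion-prediction task. It is a generic illustration and not Meta's method. The function names and the `alpha` weighting are hypothetical.

```python
import math
from typing import Sequence

def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted probability and a (possibly soft) target."""
    p = min(max(p, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def distillation_loss(student: Sequence[float], teacher: Sequence[float],
                      labels: Sequence[int], alpha: float = 0.5) -> float:
    """Mean of alpha * loss vs. labels + (1 - alpha) * loss vs. teacher probabilities."""
    total = 0.0
    for s, t, y in zip(student, teacher, labels):
        total += alpha * bce(s, y) + (1 - alpha) * bce(s, t)
    return total / len(labels)
```

With `alpha = 1.0` the teacher is ignored and this reduces to ordinary training on labels. Lowering `alpha` pulls the student toward the teacher's calibrated probabilities, which carry more information per example than 0/1 labels.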

#mlp #data #dist

Engineering at Meta · Nov 4, 2025

Video Invisible Watermarking at Scale

Why it matters: This article details how Meta scaled invisible video watermarking, a critical technology for content provenance. It's vital for engineers tackling challenges like detecting AI-generated media and ensuring content authenticity at massive scale with operational efficiency.

Meta utilizes invisible watermarking for content provenance, enabling detection of AI-generated videos, verification of original posters, and identification of content sources.
Invisible watermarking embeds imperceptible signals into media, designed to be robust and persistent through transcodes and edits, unlike traditional metadata.
Scaling this technology presented significant challenges related to deployment environments, bitrate increases, and maintaining visual quality.
Meta developed a CPU-based solution for invisible video watermarking that achieves performance comparable to GPU-based systems while offering superior operational efficiency.
This technology is crucial for maintaining content authenticity and for distinguishing real media from AI-generated media.
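The post does not describe Meta's embedding algorithm. A classic way to hide a robust, imperceptible signal is spread-spectrum watermarking. You add a low-amplitude pseudo-random ±1 pattern derived from a secret key, then detect the mark by correlating with the same pattern. The sketch below works on a 1-D list of luma samples rather than real video frames. The key, `strength`, and `threshold` values are hypothetical. Production systems embed in transform domains so the mark survives transcoding, which this toy does not attempt.

```python
import random
from typing import List

def pattern(key: int, n: int) -> List[int]:
    """Keyed pseudo-random +/-1 sequence; only holders of the key can regenerate it."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]

def embed(samples: List[float], key: int, strength: float = 2.0) -> List[float]:
    """Add the keyed pattern at low amplitude so each sample changes by at most `strength`."""
    return [s + strength * w for s, w in zip(samples, pattern(key, len(samples)))]

def detect(samples: List[float], key: int, threshold: float = 1.0) -> bool:
    """Correlate mean-removed samples with the keyed pattern; a high score means marked."""
    w = pattern(key, len(samples))
    mean = sum(samples) / len(samples)
    score = sum((s - mean) * wi for s, wi in zip(samples, w)) / len(samples)
    return score > threshold
```

For a marked signal the correlation score lands near `strength`. For unmarked content or a wrong key it lands near zero, and the host signal's own variation averages out as the number of samples grows.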

#security #dist #mlp

Engineering at Meta · Oct 23, 2025

Scaling Privacy Infrastructure for GenAI Product Innovation

Why it matters: This article is crucial for engineers building GenAI products, demonstrating how to integrate privacy-aware infrastructure and data lineage to manage complex data flows, ensure compliance, and accelerate innovation responsibly.

Meta addresses GenAI privacy challenges by scaling its Privacy Aware Infrastructure (PAI), using AI glasses as a key example.
GenAI products like AI glasses introduce new data types, increased volumes, and complex real-time data flows, necessitating robust privacy systems.
Key challenges include managing explosive data growth, adapting to shifting privacy requirements, and supporting rapid innovation cycles.
PAI leverages data lineage insights and automated privacy controls to embed privacy deeply into product development.
This approach enables Meta to accelerate GenAI product innovation while upholding user trust and data protection.
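Data lineage lets privacy controls follow data from where it is collected to everywhere it flows. The sketch below assumes lineage is stored as a directed graph of asset names, all of them hypothetical. A breadth-first traversal finds every downstream asset a sensitive source reaches, and a set difference flags the assets its policy does not allow. PAI's actual lineage model and enforcement points are not described in this summary.

```python
from collections import deque
from typing import Dict, List, Set

def downstream(lineage: Dict[str, List[str]], source: str) -> Set[str]:
    """Every asset reachable from `source` by following lineage edges (BFS)."""
    seen: Set[str] = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in lineage.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def policy_violations(lineage: Dict[str, List[str]], source: str,
                      allowed: Set[str]) -> Set[str]:
    """Downstream assets that receive the source's data but are not permitted to."""
    return downstream(lineage, source) - allowed
```

The value of lineage here is that a new downstream job shows up in the traversal automatically. A policy written once against the source is then checked against every new flow without manual review.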

#security #data

Engineering at Meta · Oct 20, 2025

Disaggregated Scheduled Fabric: Scaling Meta’s AI Journey

Why it matters: DSF revolutionizes AI network scaling by overcoming traditional fabric limitations. Its disaggregated architecture, packet spraying, and advanced congestion control ensure high-performance, lossless connectivity for massive GPU clusters, crucial for the future of large-scale AI model training.

Meta's Disaggregated Scheduled Fabric (DSF) is a next-generation network technology designed to scale AI training networks beyond the physical limits of traditional Clos-based architectures.
DSF disaggregates line cards (Interface Nodes) and fabric cards (Fabric Nodes) into distinct hardware, creating a distributed system for enhanced scalability and performance.
It addresses critical challenges in AI workloads, such as "elephant flows" and "low entropy" traffic patterns, which cause congestion and suboptimal utilization in conventional IP fabrics.
The system employs a two-domain architecture, packet spraying, and a credit-based congestion control algorithm for efficient, lossless traffic management.
Built on open standards like OCP-SAI and managed by FBOSS, DSF enables the creation of large virtual chassis switches capable of interconnecting thousands of GPUs for massive AI clusters.
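Two of the mechanisms named above can be modeled in a toy simulation. The first is packet spraying. Per-flow hashing, as in the ECMP typical of IP Clos fabrics, pins each whole flow to one link, so a few elephant flows can collide on the same link. Spraying spreads cells of every flow across all links. The second is credit-based flow control: the ingress sends only what the egress has granted credit for, so the egress buffer cannot overflow and no traffic is dropped. The sketch below is not FBOSS or the DSF scheduler. Link counts, cell counts, and buffer sizes are hypothetical.

```python
from typing import List

def ecmp_load(flow_cells: List[int], flow_hashes: List[int], links: int) -> List[int]:
    """Per-flow hashing: every cell of a flow follows the same link."""
    load = [0] * links
    for cells, flow_hash in zip(flow_cells, flow_hashes):
        load[flow_hash % links] += cells
    return load

def sprayed_load(flow_cells: List[int], links: int) -> List[int]:
    """Spraying: cells of all flows are spread round-robin over every link."""
    load = [0] * links
    next_link = 0
    for cells in flow_cells:
        for _ in range(cells):
            load[next_link] += 1
            next_link = (next_link + 1) % links
    return load

class CreditChannel:
    """The ingress may send only as many cells as the egress has granted credits for."""

    def __init__(self, egress_buffer_cells: int) -> None:
        self.capacity = egress_buffer_cells
        self.buffered = 0
        self.credits = 0

    def grant(self) -> None:
        # The egress grants exactly its free buffer space, so credits + buffered <= capacity.
        self.credits = self.capacity - self.buffered

    def send(self, cells: int) -> int:
        sent = min(cells, self.credits)
        self.credits -= sent
        self.buffered += sent
        return sent

    def drain(self, cells: int) -> None:
        # The egress forwards cells toward the destination, freeing buffer space.
        self.buffered -= min(cells, self.buffered)
```

With two 100-cell elephant flows whose hashes collide, per-flow hashing puts 200 cells on one of four links while the others sit nearly idle. Spraying the same 204 cells yields 51 per link. In real hardware, spraying can deliver cells out of order, so the receiving side must reassemble them.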

#dist #data #mlp
