Explore the latest engineering posts and summaries

Search by topic, company, or concept and scan results quickly.

Posts indexed584

Last indexedApr 30, 2026

Microsoft Azure BlogOct 23, 2025

Fully managed cloud-to-cloud transfers with Azure Storage Mover

Why it matters: This simplifies complex cloud-to-cloud data migrations, especially from AWS S3 to Azure Blob, reducing operational overhead and costs. Engineers can now securely and efficiently move large datasets, accelerating multicloud strategies and leveraging Azure's advanced analytics and AI.

Azure Storage Mover now offers General Availability for cloud-to-cloud migration from AWS S3 to Azure Blob Storage.
This fully managed service simplifies data transfers by removing the need for agents, scripts, or third-party tools, reducing overhead and costs.
Key features include high-speed parallel transfers, integrated automation, secure encrypted data movement, and incremental sync capabilities.
The service provides comprehensive monitoring via Azure Monitor and Log Analytics for tracking migration progress.
Customers have successfully migrated petabytes of data, leveraging Azure's analytics and AI capabilities immediately.
New updates also include migration support for on-premises SMB shares to Azure Object storage and NFS shares to Azure Files NFS 4.1.

#data #dist #sre

Read original

Dropbox Tech BlogOct 23, 2025

With Mobius Labs' Aana models, we're bringing deeper multimodal understanding to Dropbox Dash

Why it matters: Engineers must process massive unstructured multimedia data efficiently. This integration demonstrates how specialized architectures can achieve deep multimodal understanding at exabyte scale while maintaining low computational overhead and high search relevance.

Dropbox is integrating Mobius Labs' Aana models into Dropbox Dash to enhance multimodal search and understanding.
The Aana architecture is designed for high efficiency, significantly reducing computational requirements compared to traditional multimodal models.
Unlike siloed processing, Aana analyzes the relationships between text, audio, and video to interpret complex scenes and actions.
The system is built to handle 'Dropbox scale,' processing exabytes of rich media content across various applications.
This integration allows users to query multimedia files for specific insights without manual tagging or folder navigation.

#mlp #data #dist

Read original

Engineering at MetaOct 23, 2025

Scaling Privacy Infrastructure for GenAI Product Innovation

Why it matters: This article is crucial for engineers building GenAI products, demonstrating how to integrate privacy-aware infrastructure and data lineage to manage complex data flows, ensure compliance, and accelerate innovation responsibly.

Meta addresses GenAI privacy challenges by scaling its Privacy Aware Infrastructure (PAI), using AI glasses as a key example.
GenAI products like AI glasses introduce new data types, increased volumes, and complex real-time data flows, necessitating robust privacy systems.
Key challenges include managing explosive data growth, adapting to shifting privacy requirements, and supporting rapid innovation cycles.
PAI leverages data lineage insights and automated privacy controls to embed privacy deeply into product development.
This approach enables Meta to accelerate GenAI product innovation while upholding user trust and data protection.

#security #data

Read original

Dropbox Tech BlogOct 22, 2025

Half-Quadratic Quantization of large machine learning models

Why it matters: HQQ enables engineers to deploy massive LLMs on consumer-grade hardware with minimal setup. By removing the need for calibration data and drastically reducing quantization time, it simplifies the pipeline for optimizing and testing state-of-the-art models at scale.

Introduces Half-Quadratic Quantization (HQQ), a data-free quantization technique for Large Language Models.
Achieves quantization speeds up to 50x faster than GPTQ, processing a Llama-2-70B model in under 5 minutes.
Eliminates the need for calibration datasets, removing data bias and reducing computational overhead during deployment.
Utilizes sparsity-promoting loss and hyper-Laplacian distributions to better model weight outliers compared to standard squared error.
Demonstrates that 2-bit quantized large models can outperform smaller full-precision models within similar memory constraints.

#mlp

Read original

Pinterest EngineeringOct 21, 2025

Identify User Journeys at Pinterest

Why it matters: This article details how Pinterest uses advanced ML and LLMs to understand complex user intent, moving beyond simple recommendations to goal-oriented assistance. It offers a practical blueprint for building robust, extensible recommendation systems from limited initial data.

Pinterest developed a system to identify "user journeys" – sequences of user-item interactions revealing long-term goals beyond immediate interests.
The system uses a dynamic keyword extraction approach, leveraging user search history, activity, and boards.
Keywords are processed with pretrained text embeddings (e.g., SearchSage) and then hierarchically clustered to form journey candidates.
Specialized models handle journey naming (currently keyword-based, evolving to LLMs), expansion (LLM-generated recommendations), ranking, and diversification.
The architecture emphasizes lean development, starting small with annotated data, and extensibility for future advanced ML/LLM techniques.
The inference pipeline runs on a streaming system for quick adaptation to recent user activities.

#mlp #data

Read original

Netflix Tech BlogOct 21, 2025

Behind the Streams: Real-Time Recommendations for Live Events Part 3

Why it matters: This article details how Netflix scaled real-time recommendations for live events to millions of users, solving the "thundering herd" problem. It offers a robust, two-phase architectural pattern for high-concurrency, low-latency updates, crucial for distributed systems engineers.

Netflix developed a real-time recommendation system for live events to handle millions of concurrent users without overwhelming cloud services.
The core solution involves a two-phase approach: prefetching data to devices ahead of time and broadcasting low-cardinality messages to trigger updates.
Prefetching distributes load over a longer period, avoiding traffic spikes and optimizing request throughput and compute cardinality.
Real-time broadcasting uses state keys and timestamps to ensure devices update locally with prefetched data, guaranteeing delivery even on unstable networks.
This system successfully delivers updates to over 100 million devices in under a minute during peak live event loads.
It leverages a robust two-tier pub/sub architecture built on Pushy (WebSocket proxy), Apache Kafka, and Netflix's KV store for efficient, low-latency fanout.

#dist #sre

Read original

Engineering at MetaOct 20, 2025

Disaggregated Scheduled Fabric: Scaling Meta’s AI Journey

Why it matters: DSF revolutionizes AI network scaling by overcoming traditional fabric limitations. Its disaggregated architecture, packet spraying, and advanced congestion control ensure high-performance, lossless connectivity for massive GPU clusters, crucial for the future of large-scale AI model training.

Meta's Disaggregated Scheduled Fabric (DSF) is a next-generation network technology designed to scale AI training networks beyond the physical limits of traditional Clos-based architectures.
DSF disaggregates line cards (Interface Nodes) and fabric cards (Fabric Nodes) into distinct hardware, creating a distributed system for enhanced scalability and performance.
It addresses critical challenges in AI workloads, such as "elephant flows" and "low entropy" traffic patterns, which cause congestion and suboptimal utilization in conventional IP fabrics.
The system employs a two-domain architecture, packet spraying, and a credit-based congestion control algorithm for efficient, lossless traffic management.
Built on open standards like OCP-SAI and managed by FBOSS, DSF enables the creation of large virtual chassis switches capable of interconnecting thousands of GPUs for massive AI clusters.

#dist #data #mlp

Read original

Netflix Tech BlogOct 17, 2025

How and Why Netflix Built a Real-Time Distributed Graph: Part 1 — Ingesting and Processing Data…

Why it matters: This article details how Netflix built a real-time distributed graph to unify disparate data from microservices, enabling complex relationship analysis and personalized experiences. It showcases a robust stream processing architecture for internet-scale data.

Netflix developed a Real-Time Distributed Graph (RDG) to unify member interaction data across diverse services and devices, addressing data silos from their microservices architecture.
The RDG provides advantages like relationship-centric queries, schema flexibility, and efficient pattern detection over traditional data warehousing.
Its ingestion and processing pipeline relies on a stream processing architecture for real-time updates, crucial for maintaining an up-to-date graph.
Apache Kafka acts as the ingestion backbone, handling up to 1M messages/second, with Avro-encoded records and schema registry.
Apache Flink jobs process these Kafka streams in near real-time, leveraging robust internal platform support for integration.
Data is also persisted to Apache Iceberg for backfilling, complementing Kafka's retention policies.

#dist #data

Read original

Engineering at MetaOct 17, 2025

Scaling LLM Inference: Innovations in Tensor Parallelism, Context Parallelism, and Expert Parallelism

Why it matters: This article details Meta's innovations in LLM inference parallelism, offering critical strategies for engineers to achieve high throughput, low latency, and better resource efficiency when deploying large language models at scale. It provides practical solutions for optimizing performance.

Meta developed advanced parallelism techniques to optimize LLM inference for resource efficiency, throughput, and latency, crucial for applications like Meta AI.
LLM inference comprises two stages: compute-bound prefill for prompt processing and memory-bound decoding for token generation, each with distinct computational demands.
Tensor Parallelism shards model layers across GPUs, utilizing novel Direct Data Access (DDA) algorithms (flat, tree) to significantly reduce 'allreduce' communication latency.
DDA solutions demonstrated substantial speedups (10-50% for decode, 10-30% for prefill) on AMD MI300X, achieving performance parity with NVIDIA H100.
Context Parallelism, implemented via 'ring attention' variants (Pass-KV, Pass-Q), addresses the challenges of processing extremely long contexts by distributing input tokens and exchanging tensors.

#dist #mlp #sre

Read original

Engineering at MetaOct 16, 2025

Branching in a Sapling Monorepo

Why it matters: This article introduces Sapling's innovative directory branching solution for monorepos, enabling scalable version management and merging without compromising performance or developer experience. It's crucial for engineers working with large codebases to maintain agility.

Meta's Sapling monorepo utilizes two distinct branching workflows to effectively balance scalability and developer experience.
Non-mergeable full-repo branching, supported by `sl bookmark`, is ideal for temporary product releases that do not require merging back to the main branch.
Mergeable directory branching is a novel solution for product development, allowing specific directories to be treated like traditional repository branches.
This new workflow enables copying, cherry-picking, and merging changes between directories using `sl subtree` commands.
Crucially, directory merges appear as linear commits in the monorepo's commit graph, preserving performance for operations like `sl log` and `sl blame`.
This approach resolves the challenges of managing multiple code versions within a large monorepo without sacrificing performance or essential developer tools.

#sre #dist

Read original

Page 50 of 59

Prev 1...48 49 50 51 52...59 Next