Posts tagged with dist

Pinterest Engineering · Feb 13, 2026

GPU-Serving Two-Tower Models for Lightweight Ads Engagement Prediction

Why it matters: Transitioning to GPU serving for lightweight ranking allows engineers to deploy sophisticated architectures like MMOE-DCN. This shift significantly improves prediction accuracy and business metrics without sacrificing the strict latency requirements of real-time recommendation systems.

Pinterest transitioned its ads lightweight ranking from CPU to GPU serving to support more complex model architectures while maintaining low latency.
The new architecture replaces Multi-Task Multi-Domain (MTMD) models with a Multi-gate Mixture-of-Experts (MMOE) and Deep & Cross Network (DCN) design.
GPU serving enabled a 5-10% reduction in offline CTR loss and significant improvements in online metrics like Cost-Per-Click (CPC) and Click-Through Rate (CTR).
Training efficiency was optimized using BF16 precision, fused kernels, GPU prefetching, and increased batch sizes on p4d instances.
Segmenting standard and shopping ad scenarios for separate training doubled offline model iteration speed.
The two-tower paradigm uses offline batch updates for Pin embeddings and real-time generation for query embeddings to balance performance and latency.
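
The split between offline Pin embeddings and real-time query embeddings can be sketched roughly as follows. This is a minimal illustration, not Pinterest's implementation: the `PIN_EMBEDDINGS` table, the `query_tower` stand-in (a plain L2 normalization rather than a learned network), and the dot-product-plus-sigmoid scoring are all assumptions made for the example.

```python
import math

# Hypothetical offline store: Pin embeddings refreshed by a batch job.
PIN_EMBEDDINGS = {
    "pin_a": [0.9, 0.1, 0.0],
    "pin_b": [0.1, 0.8, 0.3],
    "pin_c": [-0.5, 0.2, 0.7],
}


def query_tower(features):
    """Stand-in for the real-time query tower: L2-normalize raw features."""
    norm = math.sqrt(sum(x * x for x in features)) or 1.0
    return [x / norm for x in features]


def score(query_emb, pin_emb):
    """Dot product of the two tower outputs, squashed into (0, 1)."""
    logit = sum(q * p for q, p in zip(query_emb, pin_emb))
    return 1.0 / (1.0 + math.exp(-logit))


def rank(features, candidates):
    """Compute the query embedding once, then score each precomputed Pin."""
    q = query_tower(features)
    scored = [(pid, score(q, PIN_EMBEDDINGS[pid])) for pid in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because the Pin side is a lookup, the per-request cost is one query-tower pass plus one cheap similarity per candidate, which is what keeps this stage lightweight.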

#mlp #dist

Cloudflare Blog · Feb 13, 2026

Shedding old code with ecdysis: graceful restarts for Rust services at Cloudflare

Why it matters: Graceful restarts are critical for high-availability services where even millisecond outages cause millions of failed requests. ecdysis provides a battle-tested Rust implementation for zero-downtime upgrades, ensuring continuous connection handling during security patches and deployments.

Cloudflare open-sourced ecdysis, a Rust library for zero-downtime graceful restarts of high-traffic network services.
It solves the orphaned connection problem inherent in SO_REUSEPORT by passing socket file descriptors directly between processes.
The mechanism uses fork() and execve(), allowing the new process to inherit listening sockets via a named pipe.
It ensures crash safety: if a new version fails during initialization, the existing process continues serving traffic without interruption.
The library integrates natively with the Tokio async runtime and supports systemd-notify for seamless service management.
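
The core idea of handing a listening socket to a successor process can be sketched with Python's standard library. This is not the ecdysis API (which is Rust), and it covers only the descriptor hand-off, not the fork()/execve() step or the named pipe mentioned above. The function names `open_listener`, `export_listener_fd`, and `adopt_listener` are hypothetical.

```python
import os
import socket


def open_listener(host="127.0.0.1", port=0):
    """Old-process side: create a normal listening TCP socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(128)
    return sock


def export_listener_fd(sock):
    """Duplicate the descriptor and mark it inheritable across exec."""
    fd = os.dup(sock.fileno())
    os.set_inheritable(fd, True)
    return fd


def adopt_listener(fd):
    """New-process side: rebuild a socket object around the inherited fd."""
    return socket.socket(fileno=fd)
```

Both descriptors refer to the same kernel listening socket, so connections queued while the old process shuts down can still be accepted by the new one. In a real restart, the descriptor number would have to be communicated to the new process, for example through an environment variable or a pipe.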

#sre #dist

Netflix Tech Blog · Feb 13, 2026

Scaling LLM Post-Training at Netflix

Why it matters: Scaling LLM post-training requires solving complex distributed systems problems like GPU synchronization. This framework allows engineers to focus on model innovation rather than infrastructure, enabling faster iteration on domain-specific AI experiences at scale.

Netflix developed an internal Post-Training Framework to abstract infrastructure complexity for LLM alignment tasks like SFT, DPO, and Reinforcement Learning.
The framework addresses data engineering hurdles including precise loss masking for chat templates and efficient sequence packing to minimize GPU idle time.
It utilizes PyTorch FSDP and Ray to manage distributed state and orchestrate multi-node GPU clusters for models that exceed single-device memory.
The architecture supports complex RL workflows by interleaving rollout generation with policy updates across decoupled Ray actors.
Modular components for data, model, compute, and workflow allow developers to customize architectures and vocabularies for domain-specific Netflix use cases.
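
Two of the data-engineering steps mentioned above, loss masking and sequence packing, can be illustrated in a few lines. This is a simplified sketch, not Netflix's framework: it assumes each token carries a role label, that only assistant tokens contribute to the loss, and that packing uses a first-fit-decreasing heuristic. All function names are hypothetical.

```python
def loss_mask(roles):
    """1 for tokens the model should learn to produce (assistant), else 0."""
    return [1 if role == "assistant" else 0 for role in roles]


def masked_mean_loss(token_losses, mask):
    """Average per-token loss over unmasked positions only."""
    kept = [loss for loss, m in zip(token_losses, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


def pack_sequences(lengths, max_len):
    """First-fit-decreasing packing; returns bins of sequence indices."""
    bins = []  # each entry: [remaining_capacity, [indices]]
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        if lengths[i] > max_len:
            raise ValueError(f"sequence {i} is longer than max_len")
        for b in bins:
            if b[0] >= lengths[i]:
                b[0] -= lengths[i]
                b[1].append(i)
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([max_len - lengths[i], [i]])
    return [indices for _, indices in bins]
```

Fewer, fuller bins mean fewer padding tokens per batch, which is the mechanism by which packing cuts GPU idle time.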

#mlp #dist #data

Cloudflare Blog · Feb 12, 2026

Introducing Markdown for Agents

Why it matters: As AI agents become primary web consumers, optimizing content for them is crucial. This feature can reduce LLM token usage by roughly 80% and simplifies data ingestion pipelines, making it easier to build efficient, agent-friendly applications at the edge.

Cloudflare introduced 'Markdown for Agents' to automatically convert HTML content to Markdown in real-time for AI crawlers.
Converting HTML to Markdown can reduce token usage by approximately 80%, lowering costs and processing complexity for AI pipelines.
The feature leverages HTTP content negotiation, enabling agents to request Markdown via the 'Accept: text/markdown' header.
Responses include an 'x-markdown-tokens' header to help developers manage context windows and chunking strategies.
The service integrates with the Content Signals framework to define how content is used for AI training and search.
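
From the agent's side, the content negotiation described above might look like the sketch below. The `Accept: text/markdown` request header and the `x-markdown-tokens` response header come from the summary; the helper names, and the assumption that the header carries a plain integer token estimate, are hypothetical.

```python
import urllib.request


def build_markdown_request(url):
    """Prepare a GET request that asks the server for Markdown."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})


def markdown_token_count(headers):
    """Read 'x-markdown-tokens'; return None if missing or not an integer.

    Assumes the header value is a plain integer token estimate.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return int(lowered.get("x-markdown-tokens"))
    except (TypeError, ValueError):
        return None


def fits_in_context(headers, budget):
    """True when the reported token count is known and within the budget."""
    count = markdown_token_count(headers)
    return count is not None and count <= budget
```

A caller would pass the prepared request to `urllib.request.urlopen` and hand the response headers to `fits_in_context` to decide whether the page needs chunking before it goes into a prompt.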

#mlp #dist #finops

Cloudflare Blog · Feb 12, 2026

Introducing Markdown for Agents

Why it matters: As AI agents become primary web consumers, serving raw HTML is inefficient and costly. This feature treats agents as first-class citizens, drastically reducing LLM token costs and improving parsing accuracy by providing clean, structured data directly at the network edge.

Cloudflare introduced 'Markdown for Agents,' a feature that automatically converts HTML content to Markdown in real-time at the edge.
Markdown reduces token consumption by up to 80% compared to HTML, lowering costs and freeing context window space for LLMs.
The feature utilizes standard HTTP content negotiation, allowing AI agents to request Markdown via the 'Accept: text/markdown' header.
Responses include an 'x-markdown-tokens' header to help developers manage context windows and chunking strategies effectively.
The system integrates with the Content Signals framework to define how content should be used for AI training and search indexing.

#dist #mlp #finops

Airbnb Engineering · Feb 11, 2026

My Journey to Airbnb — Anna Sulkina

Why it matters: This article provides a roadmap for career growth from IC to senior leadership while highlighting technical transitions from monoliths to microservices. It emphasizes the importance of designing for failure in distributed systems and the cultural impact of infrastructure on developer velocity.

Anna Sulkina transitioned from hardware diagnostics through the full stack to Senior Director of Engineering for Application & Cloud infrastructure at Airbnb.
During her tenure at Twitter, she managed the migration from a monolith to a microservices architecture to handle high-scale traffic events.
She emphasizes that failure is inevitable in complex distributed systems, requiring engineers to design for resilience rather than avoidance.
Sulkina successfully championed GraphQL adoption at Twitter by building cross-team consensus, which significantly accelerated product development velocity.
At Airbnb, her focus is on unifying siloed infrastructure projects into a cohesive strategy to improve the overall developer experience.

#culture #dist #sre

Salesforce Engineering · Feb 9, 2026

How Agentic Memory Enables Durable, Reliable AI Agents Across Millions of Enterprise Users

Why it matters: This architecture solves the statelessness problem in AI agents, enabling long-term context and reliability at scale. It provides a blueprint for building governable, auditable AI systems that maintain user trust while reducing prompt noise and latency through structured memory layers.

Agentic Memory transforms stateless AI agents into durable collaborators by externalizing memory into a structured, persistent data layer linked to a profile graph.
The architecture separates short-term session context from long-term memory, ensuring continuity across different communication channels and sessions.
To ensure reliability, the system uses a pipeline with confidence scoring, write/read gates, and hybrid semantic validation to filter and update memory records.
Adaptive context allows agents to dynamically prioritize and prune information in real-time, reducing latency and noise compared to raw prompt injection.
Structured reasoning and session-level tracing provide an auditable history of agent decisions, making AI behavior explainable and compliant with enterprise standards.
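
The write/read gates with confidence scoring can be pictured with a toy store like the one below. This is a loose illustration, not the Salesforce design: the thresholds, the `MemoryRecord` and `MemoryStore` names, and the rule that a weaker candidate never overwrites a stronger record are assumptions. The hybrid semantic validation step is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryRecord:
    key: str
    value: str
    confidence: float  # assumed to be a score in [0, 1]


@dataclass
class MemoryStore:
    write_threshold: float = 0.7  # hypothetical gate values
    read_threshold: float = 0.8
    records: dict = field(default_factory=dict)

    def write(self, candidate):
        """Write gate: reject low-confidence or weaker-than-existing records."""
        if candidate.confidence < self.write_threshold:
            return False
        current = self.records.get(candidate.key)
        if current is not None and current.confidence > candidate.confidence:
            return False
        self.records[candidate.key] = candidate
        return True

    def read(self, key):
        """Read gate: only surface records confident enough for a prompt."""
        record = self.records.get(key)
        if record is None or record.confidence < self.read_threshold:
            return None
        return record.value
```

Keeping the read threshold above the write threshold lets the store retain tentative facts without injecting them into prompts until later evidence raises their score.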

#mlp #data #dist

Engineering at Meta · Feb 9, 2026

Building Prometheus: How Backend Aggregation Enables Gigawatt-Scale AI Clusters

Why it matters: Scaling AI to gigawatt levels requires solving massive networking bottlenecks. BAG enables petabit-scale interconnectivity between distributed data centers, allowing thousands of GPUs to function as a single cluster, which is essential for training next-generation large-scale AI models.

Meta is developing Prometheus, a 1-gigawatt AI cluster designed to interconnect tens of thousands of GPUs across multiple data centers.
Backend Aggregation (BAG) serves as a centralized Ethernet-based super spine layer, enabling petabit-scale bandwidth (16-48 Pbps) between regions.
The architecture bridges two distinct network fabrics: Disaggregated Scheduled Fabric (DSF) and Non-Scheduled Fabric (NSF).
BAG utilizes planar and spread topologies to optimize for either management simplicity or enhanced path diversity and resilience.
The system manages strict distance, buffer, and latency constraints to maintain high-performance GPU-to-GPU communication.
BAG acts as the critical aggregation point between regional networks and Meta's backbone to support massive AI training demands.

#dist #mlp

Pinterest Engineering · Feb 5, 2026

Next Generation DB Ingestion at Pinterest

Why it matters: Transitioning from batch to real-time ingestion is critical for modern data-driven apps. Pinterest's architecture shows how to use CDC and Iceberg to reduce latency from days to minutes while cutting costs and ensuring compliance through efficient row-level updates and unified pipelines.

Pinterest replaced fragmented, high-latency batch ingestion with a unified CDC-based framework using Flink, Spark, and Apache Iceberg.
The system captures changes from MySQL, TiDB, and KVStore via a custom CDC service, writing events to Kafka with sub-second latency.
A dual-table architecture uses append-only CDC tables for change logs and Base tables for mirrored snapshots updated via Spark's MERGE INTO.
Standardizing on Iceberg's Merge-on-Read (MOR) strategy significantly reduced storage and compute costs compared to Copy-on-Write (COW).
The framework supports row-level deletions natively, improving data compliance and handling petabyte-scale data across thousands of pipelines.
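
The relationship between the append-only CDC table and the Base table can be sketched as an upsert-and-delete merge keyed by primary key. The sketch uses plain Python dictionaries rather than Spark or Iceberg, and the event shape (`op`, `key`, `seq`, `row`) is an assumption made for illustration.

```python
def merge_changes(base, changes):
    """Apply CDC events to a base snapshot keyed by primary key.

    Each event is a dict with 'op' ('upsert' or 'delete'), 'key',
    'seq' (a monotonically increasing change sequence) and 'row'.
    """
    # Keep only the newest event per key so stale changes cannot win.
    latest = {}
    for event in changes:
        seen = latest.get(event["key"])
        if seen is None or event["seq"] > seen["seq"]:
            latest[event["key"]] = event

    merged = dict(base)  # leave the input snapshot untouched
    for key, event in latest.items():
        if event["op"] == "delete":
            merged.pop(key, None)  # row-level delete
        else:
            merged[key] = event["row"]  # insert or update
    return merged
```

Deduplicating to the latest event per key before merging mirrors the usual shape of a MERGE INTO job, where the source side has to contain at most one row per target key.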

#data #dist #finops

Salesforce Engineering · Feb 5, 2026

Re-Architecting Enterprise Applications for an Agentic System of Action

Why it matters: This shift moves beyond AI wrappers to fundamental architectural changes. It enables software to handle edge cases and cross-domain coordination autonomously, reducing the need for human intervention while maintaining reliability through governed action contracts.

Enterprise software is shifting from static systems of record to dynamic systems of action by integrating agentic reasoning into core architectures.
Deterministic workflows are combined with agents in a hybrid model where agents handle situational judgment within governed, validated execution paths.
Applications must be decomposed into explicit, machine-readable actions that encode business intent, preconditions, and constraints.
One-way system events are reimagined as conversational entry points, allowing agents to maintain state and continuity across multiple communication channels.
Scalable orchestration requires managing interdependent agents that share structured context while balancing reasoning fidelity with latency and cost.
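
One way to picture an explicit, machine-readable action is a small contract object that an agent must satisfy before execution. The sketch below is hypothetical throughout: the `ActionContract` type, the `issue_refund` example, and the refund limit are invented for illustration and do not describe Salesforce's schema.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ActionContract:
    name: str
    intent: str  # the business intent, in plain language
    preconditions: Dict[str, Callable[[dict], bool]]
    constraints: Dict[str, Callable[[dict], bool]]

    def violations(self, context):
        """Return the names of every failed precondition or constraint."""
        checks = {**self.preconditions, **self.constraints}
        return [label for label, check in checks.items() if not check(context)]

    def can_execute(self, context):
        return not self.violations(context)


# Hypothetical example action with one precondition and one constraint.
issue_refund = ActionContract(
    name="issue_refund",
    intent="Return money to a customer for a delivered order",
    preconditions={"order_delivered": lambda c: c.get("status") == "delivered"},
    constraints={"within_limit": lambda c: c.get("amount", 0) <= 500},
)
```

The agent supplies the judgment about when to propose `issue_refund`, while the contract decides deterministically whether the proposal may run. Returning the list of violations gives the agent something concrete to reason about when a request is refused.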

#mlp #dist #data
