Posts tagged with data

Why it matters: This system provides real-time, statistically robust insights into content safety, enabling platforms to proactively identify and mitigate harms. It's crucial for maintaining user trust and scaling content moderation efficiently with AI.

  • Pinterest developed an AI-assisted system to measure the "prevalence" of policy-violating content, defined as the percentage of total views that land on violating content.
  • This system addresses the shortcomings of report-only metrics, which often miss under-reported harms and lack statistical power.
  • It utilizes ML-assisted sampling from daily user impressions, leveraging production risk scores for efficiency while ensuring unbiased prevalence estimates.
  • A multimodal LLM (vision + text) enables bulk labeling of sampled content, significantly reducing latency and cost compared to human review.
  • Inverse-probability weighting ensures unbiased, design-consistent prevalence metrics, decoupling measurement from enforcement model thresholds.
  • Continuous calibration, human validation, and periodic checks against SME-labeled gold sets maintain LLM accuracy and detect model drift.
  • The system provides daily, statistically powered insights for faster interventions and effective content safety tracking.
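The sampling-plus-weighting scheme above amounts to a Horvitz-Thompson estimator. A minimal sketch, not Pinterest's code: the `estimate_prevalence` helper and item fields are hypothetical, and in production the inclusion probability would be derived from the enforcement model's risk scores rather than fixed constants.

```python
import random

def estimate_prevalence(impressions, inclusion_prob, label_fn):
    """Horvitz-Thompson prevalence estimate over a day of impressions.

    impressions: list of items, one record per view
    inclusion_prob: callable item -> sampling probability in (0, 1]
    label_fn: callable item -> True if the item violates policy
              (stand-in for the multimodal-LLM / human labeling step)
    """
    total = len(impressions)
    weighted_violations = 0.0
    for item in impressions:
        p = inclusion_prob(item)
        if random.random() < p:            # ML-assisted sampling decision
            if label_fn(item):             # label only the sampled items
                weighted_violations += 1.0 / p   # inverse-probability weight
    return weighted_violations / total
```

Because each sampled violation is up-weighted by 1/p, the estimate stays unbiased even though risky items are oversampled, which is what decouples the measurement from any particular enforcement threshold.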

Why it matters: This article demonstrates a practical approach to de-biasing recommendation systems by integrating direct user feedback via surveys into ML model training. Engineers can learn how to move beyond pure engagement metrics to build more user-centric and high-quality content platforms.

  • Pinterest implemented in-app Pinner surveys to gather direct user feedback on content visual quality, moving beyond traditional engagement metrics.
  • The survey design collected at least 10 ratings per image for 5k Pins across diverse interest verticals, averaging scores to ensure data reliability and reduce subjectivity.
  • A machine learning model was trained using this aggregated survey data, mapping image embedding features to a single score (0-1) indicating perceived visual quality.
  • This ML model is integrated into Pinterest's core recommendation systems, including Homefeed, Related Pins, and Search, to promote higher quality content.
  • The approach aims to de-bias recommendation systems, prevent the promotion of low-quality "clickbait," and align content delivery with user well-being and satisfaction.
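The shape of such a model can be sketched as a logistic scorer mapping embedding features to a single 0-1 quality score, fit against averaged survey ratings. This is an illustrative toy, assuming made-up function names and a hand-rolled training loop; the real system would use learned image embeddings and a production training stack.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_quality_model(embeddings, avg_ratings, epochs=200, lr=0.5):
    """Fit a linear-plus-sigmoid map from embedding features to a
    0-1 score by stochastic gradient descent on squared error
    against the averaged survey ratings."""
    dim = len(embeddings[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, avg_ratings):
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = (pred - y) * pred * (1 - pred)   # d(loss)/d(logit)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def quality_score(w, b, embedding):
    """Score a new image embedding on the learned 0-1 quality scale."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, embedding)) + b)
```

Averaging at least 10 ratings per image before training is what keeps the individual raters' subjectivity out of the regression target.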

Why it matters: This article demonstrates how Pinterest achieves high-performance AI at significantly lower costs by prioritizing open-source models and fine-tuning with domain-specific data. It's crucial for engineers seeking efficient, scalable, and cost-effective AI development strategies.

  • Pinterest is strategically shifting AI investments towards fine-tuned open-source models, achieving similar quality at less than 10% the cost of proprietary solutions.
  • The competitive edge in AI is moving from large general-purpose LLMs to domain-specific data, personalization, and deep product integration.
  • Pinterest develops user recommendation systems and visual foundation models in-house, leveraging unique, large-scale datasets.
  • For text-based LLMs, Pinterest utilizes a mix of open-source and third-party proprietary models.
  • Open-source multimodal LLMs are enabling differentiation through fine-tuning with proprietary data and end-to-end optimization.
  • The Pinterest Assistant exemplifies this, using an agentic multimodal LLM to route tasks to specialized, Pinterest-native tools, prioritizing tool quality.

Why it matters: This article demonstrates how to overcome legacy observability challenges by pragmatically integrating AI agents and context engineering, offering a blueprint for unifying fragmented data without costly overhauls.

  • Pinterest faced fragmented observability data (logs, traces, metrics) due to legacy infrastructure predating OpenTelemetry, hindering efficient root-cause analysis.
  • They adopted a pragmatic solution using AI agents and a Model Context Protocol (MCP) server to unify disparate observability signals without a full infrastructure overhaul.
  • The MCP server allows AI agents to interact simultaneously with various data pillars (metrics, logs, traces, change events) to find correlations and build hypotheses.
  • This "context engineering" approach aims to provide intelligent agents with comprehensive data, leading to faster, clearer root-cause analysis and actionable insights.
  • The initiative represents a "shift-left" (proactive integration) and "shift-right" (production visibility) strategy, leveraging AI to overcome existing observability limitations.
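The correlation step the MCP server enables might look roughly like this: given an anomaly timestamp, pull observations from each pillar inside a time window and hand them to the agent as hypothesis material. All names and record shapes here are invented for illustration; a real MCP server would expose each pillar as a tool over the protocol rather than as in-process lists.

```python
from datetime import datetime, timedelta

def correlate_signals(anomaly_time, metrics, logs, change_events,
                      window_minutes=15):
    """Gather observations from each observability pillar that fall
    inside a window around an anomaly, as raw material for the
    agent's root-cause hypotheses."""
    window = timedelta(minutes=window_minutes)

    def in_window(ts):
        return abs(ts - anomaly_time) <= window

    return {
        "metrics": [m for m in metrics if in_window(m["ts"])],
        # only error-level logs are likely to be hypothesis material
        "logs": [l for l in logs
                 if in_window(l["ts"]) and l["level"] == "ERROR"],
        # deploys/config changes near the anomaly are prime suspects
        "changes": [c for c in change_events if in_window(c["ts"])],
    }
```

The value of the unified view is exactly this cross-pillar join: a change event landing minutes before a metric anomaly is a correlation no single backend would surface on its own.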

Why it matters: This article highlights how Azure Local provides engineers with flexible, sovereign, and resilient cloud capabilities on-premises or at the edge. It enables deploying AI and critical workloads while meeting strict compliance and operational autonomy requirements, even in disconnected environments.

  • Azure Local extends Azure public cloud infrastructure to customer datacenters and distributed locations, ensuring control, resilience, and operational autonomy for mission-critical workloads.
  • It addresses data sovereignty and compliance needs, enabling AI, scalable compute, and advanced analytics to run locally or at the edge.
  • Key advancements include General Availability for Microsoft 365 Local, NVIDIA RTX GPUs for on-premises AI, and Azure Migrate support.
  • Preview features like AD-less deployments, Rack-Aware Clustering, multi-rack deployments, and fully disconnected operations enhance flexibility and autonomy.
  • Leveraging Azure Arc, Azure Local provides a unified platform for hybrid and disconnected environments, supporting diverse industries like manufacturing and public sector.
  • Integration with Azure IoT and Microsoft Fabric facilitates intelligent physical operations and real-time insights from operational data.

Why it matters: This article demonstrates how to scale agentic AI in complex enterprise environments by balancing LLM reasoning with deterministic logic. It provides a blueprint for reducing latency and ensuring architectural consistency across multi-brand deployments while maintaining high accuracy.

  • Restructured architecture by offloading deterministic tasks like JSON parsing and hierarchical decisioning from the LLM to Apex code to ensure consistency.
  • Reduced multi-stage reasoning latency by approximately 20 seconds by consolidating sequential model calls into a single execution step.
  • Optimized data retrieval by combining Data 360 lookups and order API calls into single, efficient pulls rather than incremental passes.
  • Developed a multi-brand architecture using a shared core logic layer while allowing brand-specific prompt overrides for unique tone and voice.
  • Improved response times by 3–5x through the elimination of redundant reasoning loops and the stabilization of data-flow boundaries.
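The article describes this pattern in Apex; a language-agnostic sketch of the same idea (deterministic JSON parsing and hierarchical decisioning outside the LLM, with a shared core and brand-level prompt overrides) might look like the following. Every intent, action, and prompt string here is invented for illustration.

```python
import json

# Shared core logic; brand-specific overrides change only tone and voice.
BRAND_PROMPTS = {
    "default": "You are a helpful support agent.",
    "brand_a": "You are a cheerful assistant for Brand A.",
}

def route_request(llm_json, brand="default"):
    """Parse the LLM's structured output deterministically and apply
    hierarchical decisioning in code, replacing what would otherwise
    be an extra model call per decision."""
    payload = json.loads(llm_json)
    intent = payload.get("intent", "unknown")
    # Hierarchical decisioning: order issues outrank FAQs outrank fallback.
    if intent == "order_status":
        action = "lookup_order"
    elif intent == "faq":
        action = "answer_from_kb"
    else:
        action = "escalate_to_human"
    prompt = BRAND_PROMPTS.get(brand, BRAND_PROMPTS["default"])
    return {"action": action, "system_prompt": prompt}
```

Moving this branch logic out of the LLM is what buys both the consistency (the same intent always maps to the same action) and the latency win (one reasoning call instead of several).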

Why it matters: Replicate's acquisition by Cloudflare signifies a major step towards building a comprehensive, integrated AI infrastructure. It promises to simplify the deployment and scaling of complex AI applications by combining model serving with a global network and full-stack primitives.

  • Replicate, founded in 2019, aimed to democratize access to research-grade ML models by abstracting away infrastructure complexities.
  • They developed Cog for model packaging and the Replicate platform for running models as cloud API endpoints, successfully scaling with models like Stable Diffusion.
  • The modern AI stack has evolved beyond just model inference, requiring a full suite of services like microservices, storage, and databases.
  • Replicate is joining Cloudflare to leverage Cloudflare's extensive network, Workers, R2, and other primitives to build a complete, integrated AI infrastructure layer.
  • This acquisition will enable faster edge models, model pipelines on Workers, and streaming model I/O, realizing a vision where "the network is the computer" for AI.

Why it matters: This article highlights Python's enduring appeal, its foundational design principles emphasizing readability and accessibility, and its continued dominance in AI and data science, offering insights into language evolution and developer preferences.

  • Python, created by Guido van Rossum, emerged to simplify programming by offering a safer, more expressive alternative to C and shell scripting.
  • Despite TypeScript's recent lead on GitHub, Python grew 49% in 2025, maintaining its status as the default language for AI, science, and education.
  • Its core design emphasizes readability, intuitive syntax, friendly error messages, and a rich standard library, fostering accessibility.
  • Python's open-source nature, cross-platform support, and strong community are key to its versatility and widespread adoption.
  • The language's "irreverent" name reflects a deliberate choice to make programming less intimidating and more welcoming.

Why it matters: This article details advanced techniques in training AI for developer tools, showcasing how custom data collection, SFT, and RL overcome challenges in real-time code prediction. It's crucial for engineers building AI-powered developer experiences and understanding practical LLM deployment.

  • GitHub Copilot's Next Edit Suggestions (NES) uses a custom, low-latency model designed to predict developers' next code edits in real time.
  • Initial attempts with general LLMs and pull request data failed; a custom, high-quality dataset derived from real-time editing sessions was crucial for training.
  • The foundational NES model was developed using Supervised Fine-Tuning (SFT) on this specialized dataset.
  • Reinforcement Learning (RL), incorporating a custom 'grader' model, further refined the NES model, addressing SFT limitations by leveraging unlabeled data and explicitly defining criteria for 'bad' suggestions.
  • This 'AI-native' approach emphasizes end-to-end co-design of model training, prompt engineering, and user experience for seamless IDE integration.
  • Recent improvements focus on prompt optimization to reduce latency and enhance the relevance and quality of suggestions.
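The grader's role can be illustrated with a toy stand-in: a deterministic scorer that explicitly encodes classes of "bad" suggestions. The real grader is itself a trained model applied to unlabeled editing data; the rules below are invented purely to show the interface an RL reward might consume.

```python
def grade_suggestion(original, suggestion):
    """Toy grader: score a proposed next edit in [0, 1], explicitly
    penalizing defined classes of bad suggestions, the kind of
    criteria the RL stage uses as a reward signal."""
    if suggestion.strip() == "":
        return 0.0   # empty suggestion: nothing was proposed
    if suggestion == original:
        return 0.0   # no-op edit: pure noise for the developer
    if len(suggestion) > 4 * max(len(original), 1):
        return 0.2   # oversized rewrite: rarely the intended next edit
    return 1.0       # plausible, proportionate edit
```

The point of grading rather than labeling is that it defines what "bad" means once, and then scales to unlabeled editing sessions where SFT-style supervision is unavailable.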

Why it matters: Engineers can leverage Ax, an open-source ML-driven platform, to efficiently optimize complex systems like AI models and infrastructure. It streamlines experimentation, reduces resource costs, and provides deep insights into system behavior, accelerating development and deployment.

  • Ax 1.0 is an open-source adaptive experimentation platform leveraging machine learning for efficient optimization of complex systems.
  • It's widely used at Meta to improve AI models, tune production infrastructure, and accelerate advances in ML and hardware design.
  • The platform employs Bayesian optimization to guide resource-intensive experiments, identifying optimal configurations efficiently.
  • Ax provides advanced analytical tools, including Pareto frontiers and sensitivity analysis, for deeper system understanding beyond just finding optimal settings.
  • An accompanying paper details Ax's core architecture, methodology, and performance comparison against other black-box optimization libraries.
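One of the analyses mentioned, the Pareto frontier, is easy to sketch in plain Python. This is not Ax's API; it assumes hypothetical trial records with an accuracy to maximize and a latency to minimize, just to show what the frontier computation returns.

```python
def pareto_frontier(trials):
    """Return the trials not dominated by any other trial, where a
    dominator is at least as good on both objectives (accuracy up,
    latency down) and strictly better on one."""
    frontier = []
    for t in trials:
        dominated = any(
            o["accuracy"] >= t["accuracy"] and o["latency"] <= t["latency"]
            and (o["accuracy"] > t["accuracy"] or o["latency"] < t["latency"])
            for o in trials
        )
        if not dominated:
            frontier.append(t)
    return frontier
```

In Ax the frontier is computed over modeled (not just observed) outcomes, but the decision it supports is the same: pick a configuration on the frontier, trading one objective against the other deliberately.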