Pinterest Engineering

Why it matters: Optimizing for sparse conversion events is a major challenge in ad tech. This architecture shows how to combine sparse conversion labels with dense engagement signals, using parallel DCN v2 layers and multi-task learning, to improve advertiser return on ad spend (ROAS).

Pinterest developed a dedicated candidate generation model to optimize for lower-funnel conversions, addressing the sparsity and noise of offsite purchase signals.
The architecture utilizes a two-tower model with parallel DCN v2 and MLP layers, decoupling the learning of explicit feature interactions from implicit abstract patterns.
To mitigate data sparsity, the model uses multi-task learning, supplementing conversion labels with log-weighted engagement data based on click duration.
Feature engineering combines real-time context via GraphSAGE embeddings with long-term user history processed through a Transformer.
A unified multi-task architecture with a single dot-product head was adopted to simplify retrieval while maintaining performance across multiple objectives.
Training incorporates hard negatives from non-engaged impressions to better reflect the actual distribution of served ads and improve model robustness.
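
As a rough illustration of the parallel design described above, the sketch below runs one DCN v2 cross layer (the standard form x0 * (W x + b) + x) beside a single ReLU layer and concatenates the two outputs into a tower embedding, which a dot product then scores. The function names, the one-layer depth, and the log1p weighting of click duration are assumptions made for this example, not details taken from the Pinterest model.

```python
import math


def cross_layer(x0, x, weight, bias):
    """One DCN v2 cross layer: x0 * (W @ x + b) + x, with an elementwise product."""
    projected = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]
    return [a * p + v for a, p, v in zip(x0, projected, x)]


def dense_relu(x, weight, bias):
    """One fully connected ReLU layer, standing in for the MLP branch."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(weight, bias)]


def tower(x, cross_params, mlp_params):
    """Run the cross branch and the MLP branch on the same input, then concatenate."""
    cross_w, cross_b = cross_params
    mlp_w, mlp_b = mlp_params
    return cross_layer(x, x, cross_w, cross_b) + dense_relu(x, mlp_w, mlp_b)


def dot(a, b):
    """Single dot-product head used to score a user embedding against an ad embedding."""
    return sum(p * q for p, q in zip(a, b))


def engagement_weight(click_duration_seconds):
    """Hypothetical log weighting: longer clicks count more, with diminishing returns."""
    return math.log1p(max(0.0, click_duration_seconds))
```

In a two-tower setup, one such tower would embed the user and another the ad, with dot() producing the retrieval score. The engagement weight would scale the loss on click examples so they supplement, rather than drown out, the sparse conversion labels.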

#mlp #data

Pinterest Engineering, Apr 20, 2026

Smarter URL Normalization at Scale: How MIQPS Powers Content Deduplication at Pinterest

Why it matters: Redundant processing of duplicate URLs wastes massive computational resources. This automated, data-driven approach to normalization reduces infrastructure costs and improves data quality by identifying content identity before expensive rendering or ingestion steps occur.

Pinterest developed MIQPS (Minimal Important Query Param Set) to automate URL normalization and content deduplication across millions of merchant domains.
The algorithm classifies query parameters as 'neutral' (noise like tracking IDs) or 'non-neutral' (content-defining like product IDs) through empirical testing.
URLs are grouped by parameter patterns to evaluate parameter importance within specific contexts, such as distinguishing between product and category pages.
The system uses visual content fingerprints to determine if stripping a parameter changes the rendered page, ensuring high-fidelity deduplication.
By normalizing URLs before ingestion, Pinterest significantly reduces redundant fetching, rendering, and computational waste in its media platform.
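
The empirical test described above can be sketched as follows: strip one query parameter at a time, compare the content fingerprint before and after, and keep only the parameters whose removal changes the page. This is a minimal sketch, not the MIQPS implementation. The fingerprint callable is a hypothetical stand-in for the visual fingerprinting step, the rule that one observed change marks a parameter as non-neutral is an assumption, and the grouping of URLs by parameter pattern is omitted.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_param(url, name):
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def classify_params(urls, fingerprint):
    """Label a parameter non-neutral if removing it changes any sample's fingerprint."""
    changed_by = {}
    for url in urls:
        baseline = fingerprint(url)
        for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True):
            changed = fingerprint(strip_param(url, name)) != baseline
            changed_by[name] = changed_by.get(name, False) or changed
    return {name: "non-neutral" if changed else "neutral" for name, changed in changed_by.items()}


def normalize(url, verdicts):
    """Drop parameters known to be neutral and sort the rest for a stable key."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if verdicts.get(k) != "neutral"  # unknown parameters are kept, to be safe
    )
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Two URLs that differ only in tracking parameters then normalize to the same string, so the expensive fetch and render step runs once for both.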

#data #dist #finops

Pinterest Engineering, Apr 15, 2026

Finding zombies in our systems: A real-world story of CPU bottlenecks

Why it matters: This case study demonstrates how high-level ML workloads can cause low-level kernel starvation, leading to network driver resets. It is a critical lesson in debugging performance bottlenecks that span the entire stack from distributed frameworks to cloud infrastructure drivers.

Pinterest's Ray-based ML training jobs on Kubernetes experienced intermittent crashes linked to AWS ENA network driver resets.
The ENA driver triggered resets after detecting that Tx threads were paused for over five seconds, a symptom of CPU starvation.
Impacted nodes showed high system CPU utilization and significant page faulting, correlating with the network instability.
The investigation focused on the interaction between Ray's high-volume gRPC traffic and the underlying EC2 network infrastructure.
Initial mitigation attempts included tuning memory allocators and enabling huge pages to reduce page fault overhead.

#mlp #sre #dist

Pinterest Engineering, Apr 13, 2026

Scaling Recommendation Systems with Request-Level Deduplication

Why it matters: Scaling ML models often drives infrastructure costs up sharply. This approach demonstrates how architectural changes like request-level deduplication and SyncBatchNorm can decouple model complexity from infrastructure overhead, enabling large scale-ups without proportional cost increases.

Pinterest implemented request-level deduplication to manage infrastructure costs as recommendation models scaled 100x in parameter count.
By sorting data by request ID in Apache Iceberg, the team achieved 10-50x storage compression for user-heavy feature columns.
Request-sorted training data initially violated the IID assumption behind mini-batch training, which destabilized Batch Normalization and caused performance regressions in ranking models.
The team resolved training regressions by implementing Synchronized Batch Normalization (SyncBatchNorm) to aggregate statistics across all devices.
Deduplication allows processing massive user sequences (16K tokens) once per request rather than redundantly for every candidate item scored.
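
The two ideas above can be sketched in a few lines, under stated assumptions. The first function walks rows that are already sorted by request ID and runs the expensive user encoder once per request rather than once per candidate. The second shows the arithmetic behind synchronized batch statistics: mean and variance are computed over the values from all devices together, so a device holding rows from only a few requests does not normalize with skewed local statistics. The row layout and the function names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def score_request_sorted(rows, encode_user, score):
    """rows: (request_id, user_sequence, candidate) tuples, sorted by request_id."""
    results = []
    for request_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        user_vector = encode_user(group[0][1])  # expensive step, once per request
        for _, _, candidate in group:
            results.append((request_id, candidate, score(user_vector, candidate)))
    return results


def synced_batch_stats(device_batches):
    """Mean and biased variance over the values from all devices combined."""
    count = sum(len(batch) for batch in device_batches)
    total = sum(sum(batch) for batch in device_batches)
    total_sq = sum(sum(v * v for v in batch) for batch in device_batches)
    mean = total / count
    return mean, total_sq / count - mean * mean
```

With three candidates in one request and two in another, encode_user runs twice instead of five times. The saving grows with the number of candidates scored per request.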

#mlp #data #finops

Pinterest Engineering, Apr 8, 2026

Performance for Everyone

Why it matters: Automating performance metrics lowers the barrier for product teams to prioritize speed. By making Visually Complete latency a default feature, engineers can focus on optimization rather than instrumentation, ensuring a consistently fast user experience across all app surfaces.

Pinterest automated the measurement of Visually Complete latency by integrating tracking logic directly into base UI classes.
The system reduces engineering effort from two weeks per surface to zero by automatically measuring any feature built on the BaseSurface class.
Measurement is achieved by walking the Android view tree to verify the rendering status of visible media components like images, text, and video.
Standardized interfaces such as PerfImageView and PerfTextView provide methods like isDrawn() to report precise rendering timestamps.
The solution currently monitors over 60 surfaces on Android and has been successfully ported to iOS and Web platforms for cross-platform consistency.
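
The view-tree walk described above can be illustrated with a small language-neutral sketch. The View class and its fields are hypothetical stand-ins written in Python, not the Android classes named in the article. The check passes only when every visible media node reports that it has been drawn.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class View:
    """Hypothetical stand-in for a UI view node."""
    name: str
    visible: bool = True
    is_media: bool = False
    drawn: bool = False
    children: List["View"] = field(default_factory=list)

    def is_drawn(self) -> bool:
        return self.drawn


def visually_complete(root: View) -> bool:
    """True when every visible media view in the tree reports that it is drawn."""
    stack = [root]
    while stack:
        view = stack.pop()
        if not view.visible:
            continue  # hidden subtrees do not count toward completeness
        if view.is_media and not view.is_drawn():
            return False
        stack.extend(view.children)
    return True
```

A base surface class could run this check after each layout pass and record a timestamp the first time it returns True, which is how a feature could get the metric without any per-surface instrumentation.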

#mobile #frontend

Pinterest Engineering, Apr 7, 2026

Evolution of Multi-Objective Optimization at Pinterest Home feed

Why it matters: This article demonstrates how moving from heuristic-heavy re-ranking to sophisticated algorithms like SSD improves both system performance and long-term user retention. It highlights the importance of balancing immediate clicks with content diversity in large-scale recommendation engines.

Pinterest uses a cascaded funnel design where the final re-ranking layer performs multi-objective optimization to balance short-term engagement with long-term user satisfaction.
The system evolved from Determinantal Point Process (DPP) to Sliding Spectrum Decomposition (SSD) for feed diversification to address computational complexity and numerical stability issues.
SSD views the candidate feed as a mixture of latent spectra, using a sliding window to penalize over-represented topics and promote under-represented ones.
The SSD implementation leverages standard linear-algebra blocks in PyTorch, avoiding the log-determinants and Cholesky failures common in DPP-based approaches.
A Unified Soft-Spacing Framework was integrated into the SSD objective to prevent low-quality content from clustering together while maintaining engagement.
Moving re-ranking logic to company-wide PyTorch model serving clusters improved developer velocity and enabled the use of more complex pairwise similarity features.
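
To make the sliding-window idea concrete, here is a deliberately simplified greedy re-ranker. It is not SSD itself: instead of a spectrum decomposition, it subtracts a penalty proportional to the cosine similarity between a candidate and the last few items already placed in the feed. The function names and the window and penalty parameters are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def rerank(candidates, window=2, penalty=1.0):
    """candidates: list of (item_id, relevance, embedding). Returns ids in feed order."""
    remaining = list(candidates)
    feed = []
    while remaining:
        recent = feed[-window:]  # only the sliding window influences the penalty

        def adjusted(candidate):
            overlap = sum(cosine(candidate[2], placed[2]) for placed in recent)
            return candidate[1] - penalty * overlap

        best = max(remaining, key=adjusted)
        remaining.remove(best)
        feed.append(best)
    return [item[0] for item in feed]
```

With two near-identical high-relevance items and one different item, the penalty moves the different item into second place. Setting penalty to zero falls back to a plain relevance sort.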

#mlp #data

Pinterest Engineering, Mar 19, 2026

Building an MCP Ecosystem at Pinterest

Why it matters: This architecture demonstrates how to scale AI agent capabilities securely in an enterprise environment. By standardizing tool access via MCP and a central registry, Pinterest enables safe, automated engineering workflows while maintaining strict governance and security controls.

Pinterest implemented a Model Context Protocol (MCP) ecosystem using cloud-hosted, domain-specific servers rather than a monolithic architecture.
A central MCP registry serves as the source of truth for discovery, governance, and security validation across internal AI clients.
The platform uses a unified deployment pipeline to abstract infrastructure management, allowing domain experts to focus on tool business logic.
Security is enforced through a dedicated MCP Security Standard, utilizing JWT-based authentication and mesh identities for granular access control.
Production integrations include internal LLM web chat, IDE plugins, and AI bots that automatically handle OAuth flows and tool binding.
High-leverage servers for Presto, Spark, and internal knowledge bases enable agents to perform tasks like log analysis and data retrieval.

#mlp #security #data

Pinterest Engineering, Mar 6, 2026

Unified Context-Intent Embeddings for Scalable Text-to-SQL

Why it matters: Scaling Text-to-SQL in large enterprises fails with simple RAG due to schema complexity. By encoding historical analyst intent and governance metadata into embeddings, engineers can build agents that provide trustworthy, context-aware queries instead of just syntactically correct ones.

Pinterest evolved its Text-to-SQL system into a production Analytics Agent by focusing on analytical intent rather than just raw SQL syntax.
The system utilizes unified context-intent embeddings, which translate historical SQL queries into semantically rich natural language descriptions using LLMs.
A three-step pipeline injects domain context, such as glossary terms and metric definitions, before converting SQL to structured text summaries.
Retrieval is enhanced by structural and statistical patterns, extracting validated join keys and aggregation logic from historical query data.
A governance-aware ranking system prioritizes trustworthy data by incorporating table tiers, usage signals, and documentation quality from the PinCat catalog.
This approach addresses the challenges of a massive data warehouse by grounding AI outputs in patterns that have historically worked for human analysts.
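
One way to picture governance-aware ranking is as a weighted blend of semantic similarity and trust signals. The sketch below is an assumption-heavy illustration: the field names, the weights, and the 1/tier scoring are all hypothetical, and the article does not describe how the production system combines these signals.

```python
def rank_tables(candidates, weights=(0.6, 0.2, 0.1, 0.1)):
    """Sort candidate tables by a blend of similarity and governance signals.

    Each candidate is a dict with: similarity, usage, doc_quality (all in [0, 1])
    and tier (1 is the most trusted). All of these names are hypothetical.
    """
    w_sim, w_tier, w_usage, w_doc = weights

    def score(table):
        return (
            w_sim * table["similarity"]
            + w_tier / table["tier"]  # a lower tier number earns a larger bonus
            + w_usage * table["usage"]
            + w_doc * table["doc_quality"]
        )

    return sorted(candidates, key=score, reverse=True)
```

Under these weights, a well-documented, heavily used tier 1 table can outrank a slightly closer semantic match from a lower tier, which is the behavior the summary above describes.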

#data #mlp

Pinterest Engineering, Mar 3, 2026

Unifying Ads Engagement Modeling Across Pinterest Surfaces

Why it matters: Consolidating fragmented ML models reduces technical debt and operational overhead while boosting performance through shared representations. This case study provides a blueprint for balancing architectural unification with the need for surface-specific specialization in large-scale systems.

Pinterest unified fragmented ads engagement models from Home Feed and Search into a single architecture to increase iteration velocity and reduce maintenance.
The unified model utilizes a multi-task learning design with surface-specific tower trees and calibration layers to handle distinct user intents across surfaces.
To mitigate latency increases from larger feature maps, the team implemented DCN v2 projection layers to compress Transformer outputs.
Infrastructure efficiency was improved via request-level broadcasting, fetching user embeddings once per unique user rather than per candidate pin.
The approach leverages shared representation learning, allowing surface-specific models to benefit from combined training data and complementary features.

#mlp #data

Pinterest Engineering, Feb 27, 2026

Bridging the Gap: Diagnosing Online–Offline Discrepancy in Pinterest’s L1 Conversion Models

Why it matters: This case study highlights that even mathematically superior models fail if serving infrastructure lacks feature parity with training. It provides a blueprint for diagnosing ML system discrepancies by auditing the entire pipeline from embedding generation to funnel alignment.

Pinterest investigated why L1 conversion models with 20-45% offline LogMAE gains failed to produce online CPA improvements during A/B testing.
The team ruled out offline evaluation bugs, exposure bias, and serving latency, focusing instead on structural discrepancies between training and inference.
A critical root cause was feature disparity: high-impact signals like targeting specs and conversion counts were available in training logs but missing from the L1 embedding builder.
Temporal misalignment between query and Pin tower embeddings further degraded online performance, as the two towers were not synchronized during real-time serving.
The investigation highlights the necessity of a full-stack diagnosis framework covering model evaluation, serving pipelines, and funnel utility to isolate ML system failures.
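
A feature parity audit of the kind described above can start as a simple set comparison between the features seen in training logs and those the serving path can actually supply. This is a minimal sketch; the function name and the example feature names are hypothetical.

```python
def feature_parity_gaps(training_features, serving_features):
    """Report features present on one side of the train/serve boundary but not the other."""
    training, serving = set(training_features), set(serving_features)
    return {
        "missing_in_serving": sorted(training - serving),
        "missing_in_training": sorted(serving - training),
    }


gaps = feature_parity_gaps(
    ["pin_embedding", "targeting_spec", "conversion_count"],
    ["pin_embedding"],
)
# gaps["missing_in_serving"] lists the signals the model learned from
# but cannot see at inference time.
```

Running a check like this before an A/B test would flag offline gains that depend on signals the online embedding builder never receives.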

#mlp #data
