Pinterest Engineering

Why it matters: Transitioning to GPU serving for lightweight ranking allows engineers to deploy sophisticated architectures like MMOE-DCN. This shift significantly improves prediction accuracy and business metrics without sacrificing the strict latency requirements of real-time recommendation systems.

Pinterest transitioned its ads lightweight ranking from CPU to GPU serving to support more complex model architectures while maintaining low latency.
The new architecture replaces Multi-Task Multi-Domain (MTMD) models with a Multi-gate Mixture-of-Experts (MMOE) and Deep & Cross Network (DCN) design.
GPU serving enabled a 5-10% reduction in offline CTR loss and significant improvements in online metrics like Cost-Per-Click (CPC) and Click-Through Rate (CTR).
Training efficiency was optimized using BF16 precision, fused kernels, GPU prefetching, and increased batch sizes on p4d instances.
Segmenting standard and shopping ad scenarios for separate training doubled offline model iteration speed.
The two-tower paradigm uses offline batch updates for Pin embeddings and real-time generation for query embeddings to balance performance and latency.
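
The MMOE idea named above can be sketched in a few lines: shared experts produce outputs, and each task has its own softmax gate that mixes them. This is a minimal illustration, not Pinterest's model; the task names ("ctr", "cvr"), the two toy experts, and the gate logits are all assumptions made for the example.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mmoe_mix(expert_outputs, gate_logits_per_task):
    """Mix shared expert outputs with one softmax gate per task.

    expert_outputs: list of equal-length vectors, one per expert.
    gate_logits_per_task: {task_name: [one logit per expert]} (hypothetical names).
    """
    dim = len(expert_outputs[0])
    mixed = {}
    for task, logits in gate_logits_per_task.items():
        weights = softmax(logits)
        mixed[task] = [
            sum(w * expert[i] for w, expert in zip(weights, expert_outputs))
            for i in range(dim)
        ]
    return mixed

# Two toy experts and two hypothetical tasks.
expert_outputs = [[1.0, 0.0], [0.0, 1.0]]
gate_logits = {"ctr": [0.0, 0.0], "cvr": [2.0, 0.0]}
task_inputs = mmoe_mix(expert_outputs, gate_logits)
```

Each task tower would then consume its own mixed vector, which is what lets tasks share experts while weighting them differently.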

#mlp #dist


Pinterest Engineering, Feb 5, 2026

Next Generation DB Ingestion at Pinterest

Why it matters: Transitioning from batch to real-time ingestion is critical for modern data-driven apps. Pinterest's architecture shows how to use CDC and Iceberg to reduce latency from days to minutes while cutting costs and ensuring compliance through efficient row-level updates and unified pipelines.

Pinterest replaced fragmented, high-latency batch ingestion with a unified CDC-based framework using Flink, Spark, and Apache Iceberg.
The system captures changes from MySQL, TiDB, and KVStore via a custom CDC service, writing events to Kafka with sub-second latency.
A dual-table architecture uses append-only CDC tables for change logs and Base tables for mirrored snapshots updated via Spark's MERGE INTO.
Standardizing on Iceberg's Merge-on-Read (MOR) strategy significantly reduced storage and compute costs compared to Copy-on-Write (COW).
The framework supports row-level deletions natively, improving data compliance and handling petabyte-scale data across thousands of pipelines.
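
The dual-table flow above can be pictured as replaying an append-only change log onto a keyed snapshot. The sketch below imitates the semantics of a MERGE INTO (upsert or delete by primary key, latest change wins) using plain Python dicts. It is not Spark or Iceberg code, and the event fields "pk", "seq", "op", and "row" are hypothetical names chosen for the example.

```python
def merge_into(base, cdc_events):
    """Apply change events to a base snapshot keyed by primary key.

    Only the latest event per key (highest "seq") is applied, so late or
    duplicate events in the change log do not overwrite newer state.
    """
    latest = {}
    for ev in cdc_events:
        key = ev["pk"]
        if key not in latest or ev["seq"] > latest[key]["seq"]:
            latest[key] = ev

    result = dict(base)
    for key, ev in latest.items():
        if ev["op"] == "delete":
            result.pop(key, None)      # row-level delete
        else:
            result[key] = ev["row"]    # insert or update
    return result

base_table = {1: {"name": "a"}, 2: {"name": "b"}}
cdc_log = [
    {"pk": 1, "seq": 5, "op": "update", "row": {"name": "a2"}},
    {"pk": 2, "seq": 6, "op": "delete", "row": None},
    {"pk": 3, "seq": 7, "op": "insert", "row": {"name": "c"}},
    {"pk": 1, "seq": 4, "op": "update", "row": {"name": "stale"}},
]
new_base = merge_into(base_table, cdc_log)
```

Keeping the change log append-only and deriving the snapshot from it means the same log can serve both audit-style consumers and mirrored tables.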

#data #dist #finops


Pinterest Engineering, Feb 2, 2026

Beyond Two Towers: Re-architecting the Serving Stack for Next-Gen Ads Lightweight Ranking Models…

Why it matters: Moving beyond Two-Tower models allows for more expressive ranking but can add substantial latency. This architecture demonstrates how to integrate heavy GPU inference into real-time stacks by optimizing feature fetching and moving business logic to the device.

Transitioned from Two-Tower architectures to complex neural networks to enable interaction features and target attention.
Implemented an Inventory Segmentation Strategy, bundling high-value document features directly into PyTorch model registered buffers to eliminate network I/O.
Moved business logic, including utility calculations and top-k sorting, into the PyTorch model to minimize data transfer between GPU and CPU.
Reduced GPU inference latency from 4000 ms to 20 ms by using multi-stream CUDA to overlap compute and data transfer.
Leveraged in-house model inference engines supporting PyTorch traced models and CUDAGraphs for high-performance serving.
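
The point of moving utility calculation and top-k sorting next to the scores is that only k results, rather than every candidate's score, have to leave the place where they were computed. The sketch below shows that shape in plain Python as a stand-in for the on-device logic; it is not PyTorch code, and the utility formula (predicted click probability times bid) is an assumption made for the example.

```python
import heapq

def rank_on_device(ad_ids, p_click, bids, k):
    """Compute a utility per candidate and return only the top k.

    Utility here is p_click * bid, a hypothetical stand-in for the real
    business logic. Only k (ad_id, utility) pairs are returned.
    """
    utilities = [p * b for p, b in zip(p_click, bids)]
    top = heapq.nlargest(k, zip(utilities, ad_ids))
    return [(ad_id, utility) for utility, ad_id in top]

top_ads = rank_on_device(
    ad_ids=["a", "b", "c", "d"],
    p_click=[0.10, 0.02, 0.05, 0.20],
    bids=[1.0, 10.0, 3.0, 0.4],
    k=2,
)
```

In a GPU setting the same idea means the full score tensor stays on the device and only the k winners are copied back to the host.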

#mlp #dist


Pinterest Engineering, Jan 28, 2026

Ads Candidate Generation using Behavioral Sequence Modeling

Why it matters: This article demonstrates how to scale personalized recommendation systems using transformer-based sequence modeling. It provides a blueprint for transitioning from coarse-grained to fine-grained candidate generation, improving ad relevance and efficiency in large-scale production environments.

Pinterest implemented a transformer-based two-tower model to predict future user interactions with advertisers and specific products based on historical offsite behavior.
The architecture uses a bidirectional transformer for user event sequences and an MLP for advertiser/item representations, trained using sampled softmax loss with log-Q bias correction.
To handle a corpus of over 1 billion items, the system utilizes a combination of in-batch negatives and a randomly sampled set of 20 million Pins for contrastive learning.
The serving flow involves daily offline batch inference for user embeddings, stored in an online feature store for low-latency retrieval during ad requests.
Online experiments showed significant conversion volume increases and CPA reductions, demonstrating the effectiveness of moving from advertiser-level to item-level personalization.
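
Sampled softmax with log-Q correction can be sketched as follows: each candidate's logit is reduced by the log of its sampling probability before the softmax, so frequently sampled (popular) negatives are not over-penalized. This is a generic illustration of the technique, not the article's training code; the function name, arguments, and toy numbers are assumptions.

```python
import math

def sampled_softmax_loss(pos_logit, pos_q, neg_logits, neg_qs):
    """Cross-entropy over one positive and sampled negatives.

    pos_q / neg_qs are the (assumed known) probabilities with which each
    candidate is drawn by the negative sampler. Subtracting log(q) from
    each logit is the log-Q bias correction.
    """
    corrected = [pos_logit - math.log(pos_q)]
    corrected += [s - math.log(q) for s, q in zip(neg_logits, neg_qs)]
    # Stable log-sum-exp for the softmax normalizer.
    m = max(corrected)
    log_norm = m + math.log(sum(math.exp(c - m) for c in corrected))
    return log_norm - corrected[0]

# With equal sampling probabilities the correction cancels out.
loss_uniform = sampled_softmax_loss(0.0, 0.1, [0.0, 0.0], [0.1, 0.1])
# A negative that is sampled more often has its logit reduced more.
loss_popular = sampled_softmax_loss(0.0, 0.1, [0.0, 0.0], [0.5, 0.1])
```

When all candidates share the same sampling probability the loss reduces to an ordinary softmax cross-entropy, which is a handy sanity check.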

#mlp #data


Pinterest Engineering, Jan 13, 2026

PinLanding: Turn Billions of Products into Instant Shopping Collections with Multimodal AI

Why it matters: It demonstrates how to scale multimodal LLMs for production by combining expensive VLM extraction with efficient dual-encoder retrieval. This architecture allows platforms to organize billions of items into searchable collections while maintaining high precision and low operational costs.

PinLanding is a production pipeline that transforms massive product catalogs into structured shopping collections using multimodal AI.
The system uses Vision-Language Models (VLMs) to extract normalized key-value attributes from product images and metadata.
A curation layer employs LLM-as-judge and embedding-based clustering to consolidate sparse attributes into a searchable vocabulary.
To scale, Pinterest uses a CLIP-style dual-encoder model to map products and attributes into a shared embedding space for efficient assignment.
The infrastructure leverages Ray for distributed batch inference, allowing independent scaling of CPU-bound preprocessing and GPU-bound model execution.
The pipeline processes billions of items in approximately 12 hours on 8 NVIDIA A100 GPUs, costing roughly $500 per run.
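
Once products and attributes live in a shared embedding space, assignment can be as simple as a similarity threshold. The sketch below uses cosine similarity on tiny hand-written vectors; the product IDs, attribute names, vectors, and the 0.7 threshold are all made up for illustration, and the real system presumably uses learned embeddings and a large-scale batch job.

```python
import math

def cosine(u, v):
    # Cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def assign_attributes(product_vecs, attribute_vecs, threshold):
    """Assign to each product every attribute whose embedding is close enough."""
    return {
        pid: sorted(
            name
            for name, avec in attribute_vecs.items()
            if cosine(pvec, avec) >= threshold
        )
        for pid, pvec in product_vecs.items()
    }

products = {"p1": [1.0, 0.0], "p2": [0.6, 0.8]}
attributes = {"color=red": [1.0, 0.0], "style=boho": [0.0, 1.0]}
collections = assign_attributes(products, attributes, threshold=0.7)
```

The appeal of this split is that the expensive VLM runs only to build the attribute vocabulary, while assignment at catalog scale is cheap vector math.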

#mlp #dist #data


Pinterest Engineering, Dec 10, 2025

LLM-Powered Relevance Assessment for Pinterest Search

Why it matters: This approach enables faster, more cost-effective evaluation of search ranking models in A/B tests. Engineers can detect smaller, more nuanced effects, accelerating product iteration and improving user experience by deploying features with higher confidence.

Pinterest uses fine-tuned open-source LLMs to automate search relevance assessment, overcoming the limitations of costly and slow human annotations.
The LLMs are trained on a 5-level relevance guideline using a cross-encoder architecture and comprehensive Pin textual features, supporting multilingual search.
This approach significantly reduces labeling costs and time, enabling much larger and more sophisticated stratified query sampling designs.
Stratified sampling, based on query interest and popularity, ensures sample representativeness and drastically reduces measurement variance.
The approach reduced Minimum Detectable Effects (MDEs) from 1.3-1.5% to 0.25% or less, accelerating A/B experiment velocity and feature deployment.
Paired sampling and sDCG@K are used to measure the relevance impact of A/B experiments on search ranking.
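
The link between stratified sampling, variance, and MDE can be made concrete with the textbook formulas: the stratified estimate is a weighted sum of per-stratum means, its variance is a weighted sum of per-stratum variances, and the MDE scales with the standard error. The sketch below is a generic statistics example, not the article's pipeline; the strata, weights, and sample values are invented, and each sample is assumed to be a per-query paired difference in a relevance metric.

```python
from statistics import NormalDist, mean, variance

def stratified_estimate(strata):
    """strata: list of (population_weight, samples); weights sum to 1.

    Returns the stratified mean and the variance of that estimate.
    """
    estimate = sum(w * mean(xs) for w, xs in strata)
    var = sum(w * w * variance(xs) / len(xs) for w, xs in strata)
    return estimate, var

def minimum_detectable_effect(std_error, alpha=0.05, power=0.8):
    # Two-sided test: MDE = (z_{1-alpha/2} + z_{power}) * standard error.
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * std_error

# Two hypothetical strata (for example, head and tail queries).
strata = [(0.5, [1.0, 3.0]), (0.5, [10.0, 12.0])]
estimate, var = stratified_estimate(strata)
mde = minimum_detectable_effect(var ** 0.5)
mde_unit = minimum_detectable_effect(1.0)
```

Because only within-stratum variance enters the formula, strata that separate very different query types shrink the standard error, and cheaper labels allow larger samples per stratum; both lower the MDE.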

#mlp #data


Pinterest Engineering, Dec 8, 2025

How Pinterest Built a Real‑Time Radar for Violative Content using AI

Why it matters: This system provides real-time, statistically robust insights into content safety, enabling platforms to proactively identify and mitigate harms. It's crucial for maintaining user trust and scaling content moderation efficiently with AI.

Pinterest developed an AI-assisted system to measure the "prevalence" of policy-violating content, expressed as a percentage of total views.
This system addresses the shortcomings of report-only metrics, which often miss under-reported harms and lack statistical power.
It utilizes ML-assisted sampling from daily user impressions, leveraging production risk scores for efficiency while ensuring unbiased prevalence estimates.
A multimodal LLM (vision + text) enables bulk labeling of sampled content, significantly reducing latency and cost compared to human review.
Inverse-probability weighting ensures unbiased, design-consistent prevalence metrics, decoupling measurement from enforcement model thresholds.
Continuous calibration, human validation, and periodic checks against SME-labeled gold sets maintain LLM accuracy and detect model drift.
The system provides daily, statistically powered insights for faster interventions and effective content safety tracking.
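
Inverse-probability weighting is what lets risk-based oversampling coexist with an unbiased estimate: each sampled impression is weighted by one over its probability of being sampled. The sketch below uses a ratio (Hajek-style) form of the estimator on invented data; the tuple layout and the sampling probabilities are assumptions for the example, not details from the article.

```python
def ipw_prevalence(samples):
    """Estimate prevalence from impressions sampled with unequal probability.

    samples: list of (inclusion_probability, is_violating) pairs, where
    inclusion_probability is the chance that impression was sampled.
    """
    weighted_violating = sum(1.0 / p for p, bad in samples if bad)
    weighted_total = sum(1.0 / p for p, _ in samples)
    return weighted_violating / weighted_total

# High-risk impressions are sampled at 50%, low-risk ones at 1%.
labeled = [(0.5, True), (0.5, False), (0.01, False), (0.01, False)]
prevalence = ipw_prevalence(labeled)
naive_rate = sum(1 for _, bad in labeled if bad) / len(labeled)
```

The naive rate overstates prevalence because risky content was deliberately oversampled; the weighted estimate undoes that, and it does not depend on whatever threshold the enforcement model uses.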

#mlp #data #security


Pinterest Engineering, Dec 5, 2025

Improving Quality of Recommended Content through Pinner Surveys

Why it matters: This article demonstrates a practical approach to de-biasing recommendation systems by integrating direct user feedback via surveys into ML model training. Engineers can learn how to move beyond pure engagement metrics to build more user-centric and high-quality content platforms.

Pinterest implemented in-app Pinner surveys to gather direct user feedback on content visual quality, moving beyond traditional engagement metrics.
The survey design collected at least 10 ratings per image for 5k Pins across diverse interest verticals, averaging scores to ensure data reliability and reduce subjectivity.
A machine learning model was trained using this aggregated survey data, mapping image embedding features to a single score (0-1) indicating perceived visual quality.
This ML model is integrated into Pinterest's core recommendation systems, including Homefeed, Related Pins, and Search, to promote higher quality content.
The approach aims to de-bias recommendation systems, prevent the promotion of low-quality "clickbait," and align content delivery with user well-being and satisfaction.
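
The label-building step described above (at least 10 ratings per image, averaged, mapped to a 0-1 score) might look like the sketch below. The 1-5 rating scale, the function name, and the Pin IDs are assumptions for illustration; the source states only the minimum rating count and the 0-1 output range.

```python
from collections import defaultdict
from statistics import mean

def build_quality_labels(ratings, min_ratings=10, scale=(1, 5)):
    """Average survey ratings per Pin and rescale to the 0-1 range.

    ratings: iterable of (pin_id, score). Pins with fewer than
    min_ratings responses are dropped as too noisy to use as labels.
    The 1-5 scale is an assumed survey format.
    """
    by_pin = defaultdict(list)
    for pin_id, score in ratings:
        by_pin[pin_id].append(score)

    lo, hi = scale
    return {
        pin_id: (mean(scores) - lo) / (hi - lo)
        for pin_id, scores in by_pin.items()
        if len(scores) >= min_ratings
    }

survey = [("pin_a", 5)] * 6 + [("pin_a", 3)] * 4 + [("pin_b", 1)] * 3
labels = build_quality_labels(survey)
```

These per-Pin scores would then serve as regression targets for a model that takes image embeddings as input.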

#mlp #data


Pinterest Engineering, Dec 4, 2025

On the (re)-prioritization of open-source AI

Why it matters: This article demonstrates how Pinterest achieves high-performance AI at significantly lower costs by prioritizing open-source models and fine-tuning with domain-specific data. It's crucial for engineers seeking efficient, scalable, and cost-effective AI development strategies.

Pinterest is strategically shifting AI investments towards fine-tuned open-source models, achieving similar quality at less than 10% of the cost of proprietary solutions.
The competitive edge in AI is moving from large general-purpose LLMs to domain-specific data, personalization, and deep product integration.
Pinterest develops user recommendation systems and visual foundation models in-house, leveraging unique, large-scale datasets.
For text-based LLMs, Pinterest utilizes a mix of open-source and third-party proprietary models.
Open-source multimodal LLMs are enabling differentiation through fine-tuning with proprietary data and end-to-end optimization.
The Pinterest Assistant exemplifies this, using an agentic multimodal LLM to route tasks to specialized, Pinterest-native tools, prioritizing tool quality.

#mlp #data


Pinterest Engineering, Dec 3, 2025

Autonomous Observability at Pinterest (Part 1 of 2)

Why it matters: This article demonstrates how to overcome legacy observability challenges by pragmatically integrating AI agents and context engineering, offering a blueprint for unifying fragmented data without costly overhauls.

Pinterest faced fragmented observability data (logs, traces, metrics) due to legacy infrastructure predating OpenTelemetry, hindering efficient root-cause analysis.
They adopted a pragmatic solution using AI agents and a Model Context Protocol (MCP) server to unify disparate observability signals without a full infrastructure overhaul.
The MCP server allows AI agents to interact simultaneously with various data pillars (metrics, logs, traces, change events) to find correlations and build hypotheses.
This "context engineering" approach aims to provide intelligent agents with comprehensive data, leading to faster, clearer root-cause analysis and actionable insights.
The initiative represents a "shift-left" (proactive integration) and "shift-right" (production visibility) strategy, leveraging AI to overcome existing observability limitations.

#sre #dist #data
