Why it matters: This article demonstrates how to scale multimodal LLMs for production by combining expensive VLM extraction with efficient dual-encoder retrieval. The architecture allows platforms to organize billions of items into searchable collections while maintaining high precision and low operational cost.

  • PinLanding is a production pipeline that transforms massive product catalogs into structured shopping collections using multimodal AI.
  • The system uses Vision-Language Models (VLMs) to extract normalized key-value attributes from product images and metadata.
  • A curation layer employs LLM-as-judge and embedding-based clustering to consolidate sparse attributes into a searchable vocabulary.
  • To scale, Pinterest uses a CLIP-style dual-encoder model to map products and attributes into a shared embedding space for efficient assignment.
  • The infrastructure leverages Ray for distributed batch inference, allowing independent scaling of CPU-bound preprocessing and GPU-bound model execution.
  • The pipeline processes billions of items in approximately 12 hours on 8 NVIDIA A100 GPUs, costing roughly $500 per run.
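The dual-encoder assignment step above can be sketched as follows. This is a minimal illustration, not Pinterest's implementation: the toy three-dimensional vectors, the attribute names, and the fixed similarity threshold are all assumptions standing in for real CLIP-style embeddings and a tuned cutoff.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def assign_attributes(item_emb, attr_embs, threshold=0.5):
    # Assign every attribute whose embedding lands close enough to the item
    # in the shared embedding space produced by the two encoders.
    return [name for name, emb in attr_embs.items()
            if cosine(item_emb, emb) >= threshold]

# Toy embeddings standing in for dual-encoder outputs (hypothetical attributes).
attrs = {
    "style:boho": [0.9, 0.1, 0.0],
    "color:red": [0.1, 0.9, 0.1],
}
item = [0.85, 0.2, 0.05]
```

Because assignment reduces to nearest-neighbor lookups in one shared space, the expensive VLM only runs once per item during extraction, while attribute matching stays cheap at billion-item scale.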

Why it matters: This approach enables faster, more cost-effective evaluation of search ranking models in A/B tests. Engineers can detect smaller, more nuanced effects, accelerating product iteration and improving user experience by deploying features with higher confidence.

  • Pinterest uses fine-tuned open-source LLMs to automate search relevance assessment, overcoming the limitations of costly and slow human annotations.
  • The LLMs are trained on a 5-level relevance guideline using a cross-encoder architecture and comprehensive Pin textual features, supporting multilingual search.
  • This approach significantly reduces labeling costs and time, enabling much larger and more sophisticated stratified query sampling designs.
  • Stratified sampling, based on query interest and popularity, ensures sample representativeness and drastically reduces measurement variance.
  • The implementation led to a significant reduction in Minimum Detectable Effects (MDEs) from 1.3-1.5% to ≤ 0.25%, accelerating A/B experiment velocity and feature deployment.
  • Paired sampling and sDCG@K are used to measure the relevance impact of A/B experiments on search ranking.
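The paired-measurement idea above can be sketched with a plain DCG@K; the exact sDCG@K definition Pinterest uses is not given in the summary, so the exponential gain on the 5-level (0-4) scale and the per-query pairing below are assumptions.

```python
import math

def dcg_at_k(relevances, k):
    # Graded relevance labels (0-4 on a 5-level scale), standard log2 discount.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def paired_delta(control, treatment, k=10):
    # Paired per-query differences cancel query-level variance, which is what
    # shrinks the minimum detectable effect in an A/B comparison.
    diffs = [dcg_at_k(t, k) - dcg_at_k(c, k)
             for c, t in zip(control, treatment)]
    return sum(diffs) / len(diffs)
```

Each query appears in both arms with LLM-assigned labels, so the metric difference is computed per query before averaging rather than comparing two independent pools.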

Why it matters: This system provides real-time, statistically robust insights into content safety, enabling platforms to proactively identify and mitigate harms. It's crucial for maintaining user trust and scaling content moderation efficiently with AI.

  • Pinterest developed an AI-assisted system to measure the "prevalence" of policy-violating content, defined as the percentage of total views that land on violating content.
  • This system addresses the shortcomings of report-only metrics, which often miss under-reported harms and lack statistical power.
  • It utilizes ML-assisted sampling from daily user impressions, leveraging production risk scores for efficiency while ensuring unbiased prevalence estimates.
  • A multimodal LLM (vision + text) enables bulk labeling of sampled content, significantly reducing latency and cost compared to human review.
  • Inverse-probability weighting ensures unbiased, design-consistent prevalence metrics, decoupling measurement from enforcement model thresholds.
  • Continuous calibration, human validation, and periodic checks against SME-labeled gold sets maintain LLM accuracy and detect model drift.
  • The system provides daily, statistically powered insights for faster interventions and effective content safety tracking.
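The inverse-probability weighting described above can be sketched as a ratio-form Horvitz-Thompson estimator. The sampling probabilities and labels below are illustrative; in the real system the probabilities would come from the ML-assisted sampler's production risk scores.

```python
def prevalence_estimate(samples):
    # samples: list of (is_violating, inclusion_prob) for sampled impressions.
    # Weighting each label by 1/p of being sampled keeps the estimate unbiased
    # even when high-risk impressions are deliberately oversampled.
    weighted_hits = sum(int(v) / p for v, p in samples)
    total_weight = sum(1.0 / p for _, p in samples)
    return weighted_hits / total_weight

# Hypothetical draw: one oversampled high-risk impression (p = 0.5) and two
# low-risk impressions sampled at p = 0.01.
estimate = prevalence_estimate([(True, 0.5), (False, 0.01), (False, 0.01)])
```

Because the weights depend only on the sampling design, not on any enforcement threshold, the prevalence metric stays stable when enforcement models are retuned.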

Why it matters: This article demonstrates a practical approach to de-biasing recommendation systems by integrating direct user feedback via surveys into ML model training. Engineers can learn how to move beyond pure engagement metrics to build more user-centric and high-quality content platforms.

  • Pinterest implemented in-app Pinner surveys to gather direct user feedback on content visual quality, moving beyond traditional engagement metrics.
  • The survey design collected at least 10 ratings per image for 5k Pins across diverse interest verticals, averaging scores to ensure data reliability and reduce subjectivity.
  • A machine learning model was trained using this aggregated survey data, mapping image embedding features to a single score (0-1) indicating perceived visual quality.
  • This ML model is integrated into Pinterest's core recommendation systems, including Homefeed, Related Pins, and Search, to promote higher quality content.
  • The approach aims to de-bias recommendation systems, prevent the promotion of low-quality "clickbait," and align content delivery with user well-being and satisfaction.
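The embedding-to-score mapping above can be sketched as a tiny logistic head trained on averaged survey ratings. This is an assumption-laden toy: the one-dimensional "embeddings", squared loss, and plain gradient descent stand in for whatever architecture Pinterest actually trains on its image embedding features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_quality_head(embs, scores, lr=0.1, epochs=500):
    # Fit w, b so sigmoid(w.x + b) approximates the averaged survey score
    # in [0, 1] for each Pin's image embedding.
    dim = len(embs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(embs, scores):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = (p - y) * p * (1 - p)  # gradient of squared loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Toy data: one "high quality" and one "low quality" averaged survey score.
w, b = train_quality_head([[1.0], [-1.0]], [0.9, 0.1])
```

Averaging at least 10 ratings per image before training is what makes a single regression target like this reasonably stable despite subjective individual ratings.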

Why it matters: This article demonstrates how Pinterest achieves high-performance AI at significantly lower costs by prioritizing open-source models and fine-tuning with domain-specific data. It's crucial for engineers seeking efficient, scalable, and cost-effective AI development strategies.

  • Pinterest is strategically shifting AI investments towards fine-tuned open-source models, achieving similar quality at less than 10% of the cost of proprietary solutions.
  • The competitive edge in AI is moving from large general-purpose LLMs to domain-specific data, personalization, and deep product integration.
  • Pinterest develops user recommendation systems and visual foundation models in-house, leveraging unique, large-scale datasets.
  • For text-based LLMs, Pinterest utilizes a mix of open-source and third-party proprietary models.
  • Open-source multimodal LLMs are enabling differentiation through fine-tuning with proprietary data and end-to-end optimization.
  • The Pinterest Assistant exemplifies this, using an agentic multimodal LLM to route tasks to specialized, Pinterest-native tools, prioritizing tool quality.

Why it matters: This article demonstrates how to overcome legacy observability challenges by pragmatically integrating AI agents and context engineering, offering a blueprint for unifying fragmented data without costly overhauls.

  • Pinterest faced fragmented observability data (logs, traces, metrics) due to legacy infrastructure predating OpenTelemetry, hindering efficient root-cause analysis.
  • They adopted a pragmatic solution using AI agents and a Model Context Protocol (MCP) server to unify disparate observability signals without a full infrastructure overhaul.
  • The MCP server allows AI agents to interact simultaneously with various data pillars (metrics, logs, traces, change events) to find correlations and build hypotheses.
  • This "context engineering" approach aims to provide intelligent agents with comprehensive data, leading to faster, clearer root-cause analysis and actionable insights.
  • The initiative represents a "shift-left" (proactive integration) and "shift-right" (production visibility) strategy, leveraging AI to overcome existing observability limitations.
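The unification idea above can be illustrated with a minimal tool registry: each "tool" exposes one data pillar so an agent can query them side by side and correlate the results. This is a generic sketch, not the MCP protocol itself; the tool names and record shapes are invented for illustration.

```python
def make_observability_tools(metrics, logs, changes):
    # Each tool closes over one data pillar; an agent can call them in any
    # order and join the answers when building a root-cause hypothesis.
    return {
        "latest_metric": lambda svc: metrics.get(svc),
        "recent_errors": lambda svc: [l for l in logs
                                      if l["service"] == svc and l["level"] == "ERROR"],
        "recent_changes": lambda svc: [c for c in changes if c["service"] == svc],
    }

# Hypothetical fragmented signals for one service.
metrics = {"search": {"p99_ms": 840}}
logs = [{"service": "search", "level": "ERROR", "msg": "timeout"}]
changes = [{"service": "search", "kind": "deploy"}]
tools = make_observability_tools(metrics, logs, changes)
```

The point of the design is that the legacy stores stay untouched: only a thin tool layer is added, so no infrastructure overhaul is needed before agents can cross-reference signals.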

Why it matters: This article demonstrates how investing in in-house test infrastructure and smart sharding can drastically improve CI/CD efficiency and developer velocity by reducing build times and flakiness. It highlights the benefits of taking control over critical testing environments.

  • Pinterest significantly reduced Android E2E CI build times by 36% by transitioning from Firebase Test Lab to an in-house testing platform, PinTestLab.
  • The core innovation is a runtime-aware sharding mechanism that uses historical test duration and stability data to balance test loads across parallel shards.
  • This in-house solution, running on EC2 bare-metal instances with optimized resource allocation, provided direct control over the testing stack and eliminated third-party flakiness.
  • The new sharding approach decreased the slowest shard's runtime by 55% and drastically reduced the variance between fastest and slowest shards.
  • Building PinTestLab was driven by FTL's high setup overhead, infrastructure instability, and the lack of suitable third-party alternatives for large-scale native emulator support.
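The runtime-aware sharding above can be sketched with a longest-processing-time-first greedy: sort tests by historical duration and always place the next one on the currently lightest shard. This is a simplified sketch; PinTestLab also folds stability data into placement, which is omitted here, and the test names and durations are invented.

```python
import heapq

def shard_tests(durations, num_shards):
    # durations: {test_name: historical_runtime_seconds}.
    # Greedy LPT: biggest tests first, each onto the lightest shard so far.
    shards = [(0.0, i, []) for i in range(num_shards)]
    heapq.heapify(shards)
    for name, secs in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, i, tests = heapq.heappop(shards)
        tests.append(name)
        heapq.heappush(shards, (load + secs, i, tests))
    return [tests for _, _, tests in sorted(shards, key=lambda s: s[1])]

# Hypothetical historical runtimes (seconds) for five E2E tests.
durations = {"LoginTest": 120, "FeedTest": 95, "SearchTest": 40,
             "ProfileTest": 35, "ShareTest": 20}
plan = shard_tests(durations, 2)
```

Balancing by measured runtime rather than test count is what narrows the gap between the fastest and slowest shard, since a single slow test no longer dominates one shard by chance.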

Why it matters: This article offers valuable lessons on building and scaling an AI platform over a decade, emphasizing the interplay between technical choices, organizational alignment, and adapting to rapid ML advancements. It's crucial for engineers developing complex ML infrastructure.

  • Pinterest's AI Platform evolved over a decade from fragmented team stacks to a unified system, driven by organizational alignment and technical necessity.
  • Platform foundations are layered, bottom-up, and temporary, demanding rebuilds to adapt to new ML paradigms like DNNs, GPUs, and LLMs.
  • Early efforts like Linchpin DSL and Scorpion inference unified features and serving, addressing training-serving skew.
  • Custom DSLs proved brittle with evolving ML, emphasizing the need for flexible, industry-standard solutions.
  • Successful platform adoption requires strong organizational incentives, leadership sponsorship, and alignment with product goals.
  • Efficiency and velocity are boosted by concurrent advances in modeling and platform infrastructure, especially for frontier models.

Why it matters: This article details how Pinterest uses advanced ML and LLMs to understand complex user intent, moving beyond simple recommendations to goal-oriented assistance. It offers a practical blueprint for building robust, extensible recommendation systems from limited initial data.

  • Pinterest developed a system to identify "user journeys" – sequences of user-item interactions revealing long-term goals beyond immediate interests.
  • The system uses a dynamic keyword extraction approach, leveraging user search history, activity, and boards.
  • Keywords are processed with pretrained text embeddings (e.g., SearchSage) and then hierarchically clustered to form journey candidates.
  • Specialized models handle journey naming (currently keyword-based, evolving to LLMs), expansion (LLM-generated recommendations), ranking, and diversification.
  • The architecture emphasizes lean development, starting small with annotated data, and extensibility for future advanced ML/LLM techniques.
  • The inference pipeline runs on a streaming system for quick adaptation to recent user activities.
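The keyword-clustering step above can be sketched with a simple leader-style agglomeration over pretrained text embeddings: a keyword joins the first journey candidate whose seed embedding is similar enough, otherwise it starts a new one. The 2-D vectors, keywords, and threshold below are illustrative assumptions, not SearchSage outputs.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def cluster_keywords(embeddings, threshold=0.8):
    # Greedy leader clustering: each keyword joins the first cluster whose
    # seed embedding is similar enough, else it seeds a new journey candidate.
    clusters = []
    for kw, emb in embeddings.items():
        for cluster in clusters:
            if cosine(cluster["seed"], emb) >= threshold:
                cluster["members"].append(kw)
                break
        else:
            clusters.append({"seed": emb, "members": [kw]})
    return [c["members"] for c in clusters]

# Toy embeddings: two wedding-related keywords and one unrelated keyword.
journeys = cluster_keywords({
    "wedding dress": [1.0, 0.0],
    "bridal shoes": [0.9, 0.1],
    "tax filing": [0.0, 1.0],
})
```

A hierarchical method would merge clusters bottom-up instead of this single greedy pass, but either way the output is the same kind of object: groups of semantically related keywords that downstream naming, expansion, and ranking models treat as journey candidates.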