This article details how Pinterest uses advanced ML and LLMs to understand complex user intent, moving beyond simple recommendations to goal-oriented assistance. It offers a practical blueprint for building robust, extensible recommendation systems from limited initial data.
Lin Zhu | Sr. Staff Machine Learning Engineer
Jaewon Yang | Principal Machine Learning Engineer
Ravi Kiran Holur Vijay | Director, Machine Learning Engineering
Pinterest has always been a go-to destination for inspiration, a place where users explore everything from daily meal ideas to major life events like planning a wedding or renovating a home. Our core mission is to be an inspiration-to-realization platform. To fulfill this, we recognized a critical challenge: we needed to move beyond understanding immediate interests and comprehend the underlying, long-term goals of our users. Therefore, we introduce user journeys as the foundation for recommendations.
We define a journey as the intersection of a user’s interests, intent, and context at a specific point in time. A user journey is a sequence of user-item interactions, often spanning multiple sessions, that centers on a particular interest and reveals a clear intent — such as exploring trends or making a purchase. For example, a journey might involve an interest in “summer dresses,” an intent to “learn what’s in style,” and a context of being “ready to buy.” Users can have multiple, sometimes overlapping, journeys occurring simultaneously as their interests and goals evolve.
Inferring user journeys goes beyond understanding immediate interests, it allows us to comprehend the underlying, long-term goals of our users. By identifying user journeys, we can move from simple content recommendations to becoming a platform that assists users in achieving their goals, whether it’s planning a wedding, renovating a kitchen, or learning a new skill. This aligns with Pinterest’s mission to be an inspiration-to-realization platform, and provides the foundation for journey-aware recommendations.

From the outset, we knew we were building a new product without large amounts of training data. This constraint shaped our engineering philosophy for this project:
To identify these journeys, we evaluated two primary approaches:
We chose the Dynamic Extraction approach to generate journeys based on the user’s information. It offered greater flexibility, personalization, and adaptability, allowing the system to respond to emerging trends and unique user behaviors. This method also allowed us to leverage existing infrastructure and simplify the modeling process by focusing on clustering activities for individual users.

At a high level, we extract keywords from multiple sources and employ hierarchical clustering to generate keyword clusters; each cluster is a journey candidate. We then build specialized models for journey ranking, stage prediction, naming, and expansion. This inference pipeline runs on a streaming system, allowing us to run full inference if there’s algorithm change, or daily incremental inference for recent active users so the journeys respond quickly to a user’s most recent activities.

Let’s break down the key components of this innovative system:
This foundational component is designed to generate fresh, personalized journeys for each user.
Clear and intuitive journey names are crucial for user experience.
To ensure the most relevant journeys are presented, and to prevent monotony, we built a ranking model and applied diversification afterwards.
Similar to traditional ranking problems, our initial approach is to build a point-wise ranking model. We get labels from user email feedback and human annotation. The model takes user features, engagement features (how frequently the user engaged on this journey through search, actions on Pins, etc.) and recency features. This provides a simple, immediate baseline.
To prevent the top ranked journeys from always being similar, we implement a diversifier after the journey ranking stage. The most straightforward approach is to apply a penalty if the journey is similar to the journeys that ranked higher (the similarity is measured using pretrained keyword embedding). For each journey i, score will be updated based on the formula below. Finally, we re-rank the journeys according to the updated score.

Occurrence is the number of similar journeys ranked before the current journey, and penalty is a hyperparameter to tune, usually chosen as 0.95.
Understanding a journey’s lifecycle helps us determine appropriate notification timing. We simplify this into two objectives:
The user journeys could be used in downstream applications for retrieval and ranking. The desired output is a list of distinct user journeys. Each journey should ideally be represented with:
Confidence Score: The confidence score for this predicted journey.

We aim to establish a robust evaluation and monitoring pipeline to ensure consistent and reliable quality assessment of top-k user journey predictions. Because human evaluation is costly and sometimes inconsistent, we leverage LLMs to assess the relevance of predicted user journeys. By providing user features and engagement history, we ask the LLM to generate a 5-level score with explanations. We have validated that LLM judgments closely correlate with human assessments in our use case, giving us confidence in this approach.
We applied user journeys inference to deliver notifications related to the user’s ongoing journeys. Our initial experiments demonstrate the significant impact of Journey-Aware Notifications¹:
As a follow up, we are working on leveraging large language models (LLMs) to infer user journeys given user information and activities, while offering several key benefits:
We tuned our prompts and used GPT to generate ground truth labels for fine-tuning Qwen, enabling us to scale in-house LLM inference while maintaining competitive relevance. Next, we utilized Ray batch inference to improve the efficiency and scalability. Finally, we are implementing full inference for all users and incremental inference for recently active users to reduce overall inference costs. All generated journeys will go through safety checks to ensure they meet our safety standards.

We’d like to thank Kevin Che, Justin Tran, Rui Liu, Anya Trivedi, Binghui Gong, Randy Tumalle, Tianqi Wang, Fangzheng Tian, Eric Tam, Manan Kalra, Mengtian Hu and Mengying Yang for their contribution!
Thanks Jeanette Mukai, Darien Boyd, Samuel Owens, Justin Pangilinan, Blake Weber, Gloria Lee, Jess Adamiak for the product insights!
Thanks Tingting Zhu, Shivani Rao, Dimitra Tsiaousi, Ye Tian, Vishwakarma Singh, Shipeng Yu, Rajat Raina and Randall Keller for the support!
¹Pinterest Internal Data, USA, April-May 2025
Identify User Journeys at Pinterest was originally published in Pinterest Engineering Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
Continue reading on the original blog to support the author
Read full articleScaling Text-to-SQL in large enterprises fails with simple RAG due to schema complexity. By encoding historical analyst intent and governance metadata into embeddings, engineers can build agents that provide trustworthy, context-aware queries instead of just syntactically correct ones.
Consolidating fragmented ML models reduces technical debt and operational overhead while boosting performance through shared representations. This case study provides a blueprint for balancing architectural unification with the need for surface-specific specialization in large-scale systems.
This case study highlights that even mathematically superior models fail if serving infrastructure lacks feature parity with training. It provides a blueprint for diagnosing ML system discrepancies by auditing the entire pipeline from embedding generation to funnel alignment.
This article demonstrates how to scale personalized recommendation systems using transformer-based sequence modeling. It provides a blueprint for transitioning from coarse-grained to fine-grained candidate generation, improving ad relevance and efficiency in large-scale production environments.