This article details Pinterest's approach to building a scalable data processing platform on EKS, covering deployment and critical logging infrastructure. It offers insights into managing large-scale data systems and ensuring observability in cloud-native environments.
Soam Acharya: Principal Engineer · Rainie Li: Manager, Data Processing Infrastructure · William Tom: Senior Staff Software Engineer · Ang Zhang: Sr. Director, Big Data Platform
As Pinterest’s data processing needs grow and our current Hadoop-based platform (Monarch) ages, the Big Data Platform (BDP) team within Pinterest Data Engineering began evaluating alternatives for our next-generation, massive-scale data processing platform. In part one we shared the overall design of Moka, our next-gen data processing platform, and detailed its application-focused components. In part two of our series, we spotlight the infrastructure-focused aspects of the platform: how we deploy Moka using Amazon Elastic Kubernetes Service (EKS), our approach to logging and observability, image management, and how we built a UI for Moka. We conclude with our learnings and future direction.
We have standardized on four cluster environments at Pinterest: test, dev, staging, and production.
Each environment has different levels of access and isolation. For example, it is not possible to write production data from a dev environment. We deploy our EKS clusters within each environment using Terraform, augmented by a collection of AWS-originated modules and Helm charts. All of our Terraform root modules live in a single internal git repository, terraform-aws-eks-live. They use the following reusable modules:
terraform-aws-eks-data-addons: Forked from open source, contains addons for data on EKS. Used by doeks for Spark Operator.
Figure 1 illustrates how these modules are interlinked.

Our deployment process remains an area of active development. For example, we expect to move the items from #3 above away from Terraform to a separate deploy-focused pipeline in the future.
Effective management of the logs emitted by cluster components and Spark applications is critical for determining how well they are running, identifying issues, and performing post-mortem analysis. Our users expect this as a matter of course when running jobs, so it was important for us to find an effective alternative to the functionality Hadoop provided. Broadly, a logging solution for Moka would have to cover three categories: 1) Amazon EKS control plane logs, 2) Spark application logs, and 3) system pod logs. The following figure illustrates the various log categories.

Control plane logs are those generated by the components that constitute the Amazon EKS control plane. These include the K8s API server, audit system, scheduler, and authenticator. The latter is unique to AWS and represents the control plane component that EKS uses for K8s Role-Based Access Control (RBAC) authentication using AWS Identity and Access Management (IAM) credentials.
Because the control plane is managed by Amazon EKS, logs for these components are not available directly. Instead, Amazon EKS exports the logs for each component to Amazon CloudWatch. Each log type listed previously can be enabled or disabled independently. Once the logs are ingested into CloudWatch, they can be analyzed in place using CloudWatch Logs Insights. Because a solution for collecting these logs already exists, the remainder of this section focuses on how best to collect system pod and Spark application logs.
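Since our clusters are deployed with Terraform, toggling these control plane log types is a single attribute on the cluster resource. A minimal sketch (the cluster name, IAM role, and surrounding module layout are hypothetical, not our actual root module):

```hcl
# Export EKS control plane logs to CloudWatch.
# Each log type in the list can be enabled or disabled independently
# by adding or removing its entry.
resource "aws_eks_cluster" "moka" {
  name     = "moka-dev"                  # hypothetical cluster name
  role_arn = aws_iam_role.cluster.arn

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  enabled_cluster_log_types = [
    "api",
    "audit",
    "authenticator",
    "scheduler",
    "controllerManager",
  ]
}
```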
Spark applications generate a variety of logs depending on the application component:
We also felt it would be crucial to persist logs from non-Spark, system-critical pods in order to diagnose failures that may occur under heavy load, both because of the transient nature of pods in K8s (and the logs they produce) and because of our initial lack of familiarity with operating Amazon EKS at scale.
Taken together, a logging solution for Moka would need to meet the following requirements:
We explored a number of possible solutions and ultimately settled on Fluent Bit, a Cloud Native Computing Foundation (CNCF) graduated project, and a well-known solution for handling K8s logs. Fluent Bit is able to filter, forward, and augment logs, and it can be extended through plugins. In particular, an Amazon S3 plugin allows a Fluent Bit agent to directly upload files to Amazon S3.
We collaborated with members of the AWS Solution Architects team to deploy Fluent Bit on our EKS clusters as a DaemonSet, making sure each node has a Fluent Bit pod running. We configured Fluent Bit to perform the following tasks:
The following figure illustrates the various log flows performed by Fluent Bit on a single node.

Once Spark application logs are uploaded to S3, Archer is able to retrieve and piece sections of the logs together on demand (recall that Fluent Bit uploads all logs to S3 in chunks). For more details on our logging setup, please refer to our joint blog post series with AWS: Inside Pinterest’s Custom Spark Job logging and monitoring on Amazon EKS: Using AWS for Fluent Bit, Amazon S3, and ADOT Part I.
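Conceptually, the reassembly Archer performs is straightforward: list the chunk objects for an application's container, order them, and concatenate. A minimal sketch of that step (the bucket layout, key naming scheme, and in-memory stand-in for an S3 listing are all hypothetical; the real implementation reads from S3):

```python
def coalesce_log_chunks(objects: dict[str, bytes], app_id: str, container: str) -> bytes:
    """Reassemble a container's log from chunked uploads.

    `objects` maps object keys to chunk bytes, standing in for an S3
    listing. Keys are assumed to end in a zero-padded chunk index,
    e.g. "logs/<app_id>/<container>/chunk-0001", so a lexicographic
    sort restores upload order.
    """
    prefix = f"logs/{app_id}/{container}/"
    keys = sorted(k for k in objects if k.startswith(prefix))
    return b"".join(objects[k] for k in keys)


# Example: three chunks uploaded out of order are stitched back together.
chunks = {
    "logs/app-42/driver/chunk-0002": b"line 3\n",
    "logs/app-42/driver/chunk-0001": b"line 2\n",
    "logs/app-42/driver/chunk-0000": b"line 1\n",
}
print(coalesce_log_chunks(chunks, "app-42", "driver").decode())
```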
In order to operate a K8s platform efficiently, storing metrics in a queryable, displayable format is critical to overall platform stability, performance/efficiency, and, ultimately, operating costs. Prometheus-formatted metrics are the standard for K8s ecosystem tools. Observability frameworks such as Prometheus (the project, not the format), OTEL, and other CNCF projects continue to see increases in activity year over year.
At Pinterest, the current observability framework, Statsboard, is TSDB-based and ingests metrics via a sidecar metrics-agent that runs on every host. Systems typically use TSDB libraries to write to their local metrics-agent, which passes the metrics on to Kafka clusters, after which they are ingested into Goku, Pinterest’s custom TSDB implementation, and made available in Statsboard dashboards. In contrast, Prometheus-style frameworks have systems expose their metrics for scraping by agents. Unfortunately, support for TSDB as a metrics destination within the open source CNCF/K8s ecosystem is inactive. To address this gap, the Cloud Runtime team at Pinterest developed kubemetricsexporter, a K8s sidecar container that can periodically scrape Prometheus endpoints in a pod and write the scraped metrics to the local metrics-agent. Because Amazon EKS pods can be in a different network than the host, the batch processing platform team at Pinterest worked with the Cloud Runtime team to extend kubemetricsexporter so that it could be configured to use the host IP address instead of localhost. The following figure shows the deployment pattern.
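The core of that sidecar pattern is simple: periodically fetch a pod's Prometheus endpoint and rewrite each sample as a point for the local metrics-agent. A rough sketch of the parsing step (the payload, metric names, and the surrounding fetch/forward loop are hypothetical; the real exporter handles the full Prometheus exposition format, including labels with embedded spaces):

```python
def parse_prometheus_text(payload: str) -> list[tuple[str, float]]:
    """Parse a (simplified) Prometheus text exposition into (name, value) pairs.

    Skips comments (# HELP / # TYPE) and blank lines; any label block
    is kept as part of the metric name.
    """
    samples = []
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples.append((name, float(value)))
    return samples


# A sidecar loop would fetch http://<host-ip>:<port>/metrics (the host IP,
# not localhost, since EKS pods may sit on a different network than the
# host) and hand each parsed sample to the local metrics-agent.
payload = """\
# HELP jvm_threads_live Current live thread count.
# TYPE jvm_threads_live gauge
jvm_threads_live 42
spark_executor_count{cluster="moka-dev"} 8
"""
for name, value in parse_prometheus_text(payload):
    print(name, value)
```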

After exploring a variety of options and configurations, we ultimately decided to use a combination of OTEL for extracting detailed insights from our EKS clusters and kube-state-metrics, an open source K8s tool, for providing a broader overview of the Amazon EKS control plane. In contrast with Prometheus, the OTEL framework only focuses on metrics collection and pre-processing, leaving metrics storage to other solutions. A key portion of the framework is the OpenTelemetry Collector, which is an executable binary that can extract telemetry data, optionally process it, and export it further. The Collector supports several popular open source protocols for receiving and sending telemetry data, as well as offering a pluggable architecture for adding more protocols.
Data receiving, processing, and exporting in OTEL is done using Pipelines. The Collector can be configured to have one or more pipelines. Each pipeline includes:
After extensive experimentation, we ended up with a pipeline consisting of a Prometheus receiver, Attributes processor, and a Prometheus exporter. Our OTEL metrics pipeline looks like the following:

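A Collector configuration in that shape looks roughly like the following (the job name, attribute key/value, and ports are illustrative, not our production values):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: moka-eks            # illustrative job name
          scrape_interval: 30s
          kubernetes_sd_configs:
            - role: pod

processors:
  attributes:
    actions:
      - key: cluster                    # tag every metric with its cluster
        value: moka-dev                 # illustrative value
        action: upsert

exporters:
  prometheus:
    endpoint: "0.0.0.0:9090"            # scraped downstream, e.g. by kubemetricsexporter

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [attributes]
      exporters: [prometheus]
```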
For more information on our observability infrastructure for EKS, please visit the second part of our joint blog with AWS: Inside Pinterest’s Custom Spark Job logging and monitoring on Amazon EKS: Using AWS for Fluent Bit, Amazon S3, and ADOT Part II.
Containerization is a key difference between how Pinterest runs Spark applications on Monarch compared to how they run on Moka. On Monarch, Spark drivers and executors were containerized only from the resource perspective but still shared a common environment including things like Hadoop, Spark, and Java versions. Furthermore, containers running on non-Kerberized Monarch clusters had full access to any other container running on that host. In Moka, we get the full isolation benefits of containerization (cgroups and namespaces) by default with Kubernetes. Given our previous operating model, we did not have a structured system in place for defining, building, deploying, and maintaining container images, nor did we have any support for ARM builds. We wanted applications running on Moka to be architecture agnostic, so not only did we have to build our image generation pipelines from scratch, but we had to ensure that they supported both Intel and ARM from the beginning.
Our images needed to mirror the base environment that each Spark application was accustomed to when running on Monarch, with the main requirements being Java, Hadoop libraries, Spark, and in the case of PySpark, both Python and Python modules.
We built three main pipelines:

Each driver for a running Spark application serves a dedicated UI showing the status of various aspects of the job. Because driver pods can run anywhere on a K8s cluster, setting up a dynamic ingress solution per live application can be tricky. Our ingress solution is built using an ingress-nginx controller, the AWS LoadBalancer controller, and an AWS Network Load Balancer (NLB) with IP-based routing for each cluster. The AWS LoadBalancer controller provisions and manages the NLB, which serves as the user-facing entry point for the ingress controller. The Spark on K8s Operator has a uiService component that provisions a Service resource and an Ingress resource for each Spark application. The Service resource is of type ClusterIP and points to the UI port (4045) on the driver pod. The Ingress resource has the following mappings:
In the example below, the ingress resource for App-A would be configured with host: Moka-NLB.elb.us-east1.amazonaws.com, path: /App-A, and backend: App-A-ui-svc. Users access the actual link to each running application from the Moka UI. Figure 7 visualizes the resulting workflow.

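Rendered as Kubernetes manifests, the uiService output for App-A would look roughly like this sketch (resource names, labels, and annotations are illustrative; K8s requires lowercase resource names, so App-A-ui-svc becomes app-a-ui-svc):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app-a-ui-svc
spec:
  type: ClusterIP
  selector:
    spark-role: driver        # illustrative selector on the driver pod
    spark-app-name: app-a
  ports:
    - port: 4045              # Spark driver UI port
      targetPort: 4045
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-a-ui-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: Moka-NLB.elb.us-east1.amazonaws.com
      http:
        paths:
          - path: /App-A
            pathType: Prefix
            backend:
              service:
                name: app-a-ui-svc
                port:
                  number: 4045
```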
In Moka, there is one Spark History Server (SHS) cluster (consisting of one or more nodes) per environment. This is a shift from the layout in Monarch, our Hadoop setup, which had one or more Spark History Servers per cluster. The rationale for moving from per-cluster to per-environment is to reduce the overhead of managing many Moka clusters.
Users access SHS through dedicated moka-&lt;environment&gt;-spark-historyserver ingress endpoints, which route traffic to the corresponding cluster and perform load balancing across the history servers in the cluster. We’ve modified SHS for faster parsing and management of event logs, which are now uploaded to S3 by our logging infrastructure.
One of the main components that we had to build from scratch was an interface to provide both a comprehensive overview of the platform and detailed information about a Spark application. Coming from Hadoop YARN, many users were accustomed to the native interface that the Hadoop project provided. The YARN interface had existing proxy support for Spark applications, which was seamless to the user. However, one downside to the YARN UI was that it is per-cluster, meaning that users had to have knowledge about the different Hadoop clusters underlying the data platform.
When designing Moka, one of our goals was to abstract away individual clusters from the user and have them interact with it as a monolithic platform. To build the interface, we chose to use Internal Tools Platform (ITP), which is a Typescript React-based internal framework for building internal tooling. The first interface we built is our Application Explorer, which aggregates applications running on different clusters and exposes basic information to the user.

The second UI we built was the Moka Application UI, which gives users information about their Spark application. It surfaces commonly used pieces of information such as the identity of the client that submitted the job, the identity of the job itself, where the job ran, and its current state. The UI also surfaces dynamic links, such as those to the driver log or Spark UI, which redirect based on the state of the underlying Spark application. For example, while the application is running, the log links return logs fetched from the Kubernetes control plane, which allows users to debug and track their application in real time. After the application completes, or if the user requests logs that have already been rotated out of the control plane, Archer coalesces the chunked logs in S3 and serves them back to the user.
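The routing logic behind those dynamic links can be sketched as a small dispatch on application state (the function name, state strings, and return values are hypothetical labels, not Moka's actual API):

```python
def resolve_log_source(app_state: str, rotated_from_control_plane: bool) -> str:
    """Decide where the UI should fetch driver logs from.

    Live applications stream logs from the Kubernetes control plane;
    completed applications, or logs already rotated out of the control
    plane, are reassembled from chunked S3 objects by Archer.
    """
    if app_state == "RUNNING" and not rotated_from_control_plane:
        return "kubernetes-control-plane"   # live tail for real-time debugging
    return "archer-s3"                      # Archer coalesces chunked S3 logs


print(resolve_log_source("RUNNING", rotated_from_control_plane=False))
print(resolve_log_source("COMPLETED", rotated_from_control_plane=False))
```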

Pinterest’s transition from Monarch to Moka has marked a significant advancement for infrastructure at Pinterest beyond just batch data processing. Spark on EKS is resource intensive beyond just CPUs — it has bursty AWS API requirements and requires a significant number of IP addresses. Consequently, supporting the Spark on EKS use case has catalyzed infrastructure modernization efforts at Pinterest including:
Finally, Moka has opened the doors for EKS adoption by other use cases at Pinterest Data Engineering, particularly those that require access to the Kubernetes API. These include both TiDB on EKS for online systems use cases and Flink for our Stream processing platform. We’re currently working on adopting Ray and PyTorch on EKS and are particularly excited about the possibility of commingling CPU and GPU focused workloads.
Moka was a massive project that required, and continues to require, extensive cross-functional collaboration between teams across multiple organizations at Pinterest and elsewhere. Here’s an incomplete list of the folks and teams that helped us build our first set of production Moka clusters:
Data Processing Infra: Aria Wang, Bhavin Pathak, Hengzhe Guo, Royce Poon, Bogdan Pisica
Big Data Query Platform: Zaheen Aziz, Sophia Hetts, Ashish Singh
Batch Processing Platform: Nan Zhu, Yongjun Zhang, Zirui Li, Frida Pulido, Chen Qin
SRE: Ashim Shrestha, Samuel Bahr, Ihor Chaban, Byron Benitez, Juan Pablo Daniel Borgna
TPM: Ping-Huai Jen, Svetlana Vaz Menezes Pereira, Hannah Chen
Cloud Architecture: James Fogel, Sekou Doumbouya
Traffic: James Fish, Scott Beardsley
Security: Henry Luo, Jeremy Krach, Ali Yousefi, Victor Savu, Vedant Radhakrishnan, Cedric Staub
Continuous Integration Platform: Anh Nguyen
Infra Provisioning: Su Song, Matthew Tejo
Cloud Runtime: David Westbrook, Quentin Miao, Yi Li, Ming Zong
Workflow Platform: Dinghang Yu, Yulei Li, Jin Hyuk Chang
ML platform: Se Won Jang, Anderson Lam
AWS Team: Doug Youd, Alan Tyson, Vara Bonthu, Aaron Miller, Sahas Maheswari, Vipul Viswanath, Raj Gajjar, Nirmal Mehta
Leadership: Chunyan Wang, Dave Burgess, David Chaiken, Madhuri Racherla, Jooseong Kim, Anthony Suarez, Amine Kamel, Rizwan Tanoli, Alvaro Lopez Ortega
Next Gen Data Processing at Massive Scale At Pinterest With Moka (Part 2 of 2) was originally published in Pinterest Engineering Blog on Medium.