Engineering Platform Trust: Cutting Customer Case Volume 20x with Petabyte-Scale Health Signals

Salesforce Engineering, March 9, 2026

Why it matters

This system demonstrates how to transform massive, fragmented telemetry into actionable insights. By standardizing health metrics and isolating analytics from production, engineers can proactively identify risks, reduce support overhead, and ensure platform stability at petabyte scale.

Key takeaways

Salesforce's Technical Health Score (THS) quantifies implementation health across five pillars: Security, Efficiency, Operational Excellence, Customization, and Observability.
The architecture processes petabytes of telemetry via an off-core analytics platform, ensuring zero impact on live transactional workloads.
Diverse metrics are normalized into a 1–100 scale using distribution-based methods to compare organizations against peers of similar complexity.
A signal-qualification framework filters for actionability, ensuring the score reflects customer-controlled configurations rather than platform-level issues.
Back-testing shows that customers who improve their score from Fair to Excellent see support case volume drop by nearly 20x.

Keywords

Telemetry, Analytics Pipelines

By Sanjeevani Bhardwaj, Ganesh Prasad, Sukumar Surya, and Thomas Bohn.

In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Sanjeevani Bhardwaj, CSG Product Director, who leads the Technical Health Score, an effort that makes platform trust measurable by scoring Salesforce implementations through analytics pipelines that process petabytes of telemetry and historical context.

Explore how the team engineered a system that converts platform trust into actionable signals by defining technical health consistently across multi-tenant environments and building scalable machine learning pipelines that deliver proactive health insights.

What is your team’s mission in building the Technical Health Score within Customer Success Core?

The team builds a transparency layer for the Salesforce platform to turn trust from a subjective sentiment into a measurable engineering signal. Understanding implementation health becomes difficult as you adopt more products and deepen your customizations. Technical Health provides an objective view of that status and offers a clear path toward improvement.

Trust erodes when health indicators stay fragmented across tools or hidden in logs until incidents occur. To solve this, the team designed a continuous feedback loop that aggregates signals across efficiency, security, operational excellence, customization, and observability. This structure allows you to identify risks and optimize your implementation before issues surface as escalations.

The ultimate goal centers on your independence. Maintaining a healthy Salesforce implementation requires continuous effort as your organization evolves, and this score guides that effort over time. By standardizing technical health through a consistent interface, the team helps you balance innovation with stability throughout the lifecycle of your Salesforce footprint.

Mission framework showing how Technical Health builds a transparency layer that turns trust from a subjective sentiment into a measurable engineering signal and enables customer independence through continuous feedback.

What definition and standardization constraints shaped how the team defined “technical health” for Salesforce customers?

Inconsistency creates a major hurdle for Salesforce users. Customers span various industries and architectural patterns, yet everyone needs a shared definition of health. Without a standard framework, technical status remains subjective and impossible to compare across different organizations.

The team introduced a five-pillar taxonomy to serve as a universal interface for technical health:

Security
Efficiency
Operational Excellence
Customization
Observability

Every signal maps into one of these pillars. This structure allows the system to evaluate health consistently regardless of which clouds or features you use. This abstraction helps the score scale across an evolving platform while maintaining its core meaning.
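To make the "every signal maps into one pillar" rule concrete, here is a minimal Python sketch of the taxonomy as a lookup table. The `Pillar` enum mirrors the five pillars above; the signal names, `SIGNAL_PILLARS`, and `group_by_pillar` are hypothetical illustrations, not the team's actual schema.

```python
from enum import Enum


class Pillar(Enum):
    """The five Technical Health pillars described in the post."""
    SECURITY = "Security"
    EFFICIENCY = "Efficiency"
    OPERATIONAL_EXCELLENCE = "Operational Excellence"
    CUSTOMIZATION = "Customization"
    OBSERVABILITY = "Observability"


# Hypothetical signal names; each signal maps to exactly one pillar.
SIGNAL_PILLARS: dict[str, Pillar] = {
    "api_limit_breaches": Pillar.EFFICIENCY,
    "unencrypted_fields": Pillar.SECURITY,
    "apex_exceptions": Pillar.OPERATIONAL_EXCELLENCE,
    "unused_custom_objects": Pillar.CUSTOMIZATION,
    "missing_event_monitoring": Pillar.OBSERVABILITY,
}


def group_by_pillar(signal_scores: dict[str, float]) -> dict[Pillar, dict[str, float]]:
    """Bucket normalized signal scores under their pillar.

    Every pillar appears in the result, even if empty; an unmapped signal
    name raises KeyError so it cannot silently escape the taxonomy.
    """
    grouped: dict[Pillar, dict[str, float]] = {p: {} for p in Pillar}
    for name, score in signal_scores.items():
        grouped[SIGNAL_PILLARS[name]][name] = score
    return grouped
```

Because the mapping lives in a single table, supporting a new cloud or feature means registering its signals under an existing pillar rather than changing the taxonomy itself.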

Standardization also requires a common health currency. The team normalized diverse metrics into a unified 1–100 scale, which allows you to view health holistically instead of interpreting disconnected indicators. Distribution-based normalization ensures the system evaluates you against peers with similar scale and complexity. This approach creates a definition of technical health that stays both precise and fair.
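One common way to implement distribution-based normalization is percentile ranking within a peer cohort. The sketch below assumes the `peers` list already holds the same raw metric for orgs of similar scale and complexity; `percentile_score` and its mid-rank tie handling are illustrative choices, not necessarily the method the team uses.

```python
from bisect import bisect_left, bisect_right


def percentile_score(value: float, peers: list[float], higher_is_better: bool = False) -> int:
    """Map a raw metric onto a 1-100 scale by its percentile rank among peer orgs.

    Ties use the mid-rank convention, so an org equal to several peers lands
    in the middle of that group rather than at its top or bottom.
    """
    if not peers:
        raise ValueError("peer cohort must not be empty")
    ordered = sorted(peers)
    below = bisect_left(ordered, value)                # peers strictly lower
    equal = bisect_right(ordered, value) - below       # peers with the same value
    pct = (below + 0.5 * equal) / len(ordered)
    if not higher_is_better:
        pct = 1.0 - pct  # for "bad" metrics such as error counts, fewer is healthier
    return max(1, min(100, round(1 + 99 * pct)))
```

For an error-count metric with peer values `[0, 2, 5, 10, 50]`, an org with zero errors scores 90 and an org with 50 errors scores 11. The same raw count can therefore land differently depending on which cohort an org is compared against.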

What data-scale constraints shaped how the team curated technical health signals from petabytes of Salesforce telemetry?

Extracting meaningful health signals from a massive telemetry surface presents a significant data challenge. These signals originate from UI interactions, API traffic, and security configurations spread across various databases and logs. Many of these sources only retain raw data for short periods.

Engineering architecture addressing petabytes of telemetry through strategic signal curation and an off-core analytics platform, ensuring the system remains invisible to customer workloads.

The team designed the system around strategic curation instead of ingesting every data point. They identified signals that predict unhealthy behavior by focusing on common pain points like limits, errors, and security vulnerabilities. This method improves the signal-to-noise ratio and keeps the system manageable at scale.

The architecture runs all analytics on an off-core data platform. This isolation from live transactional systems prevents any impact on your daily operations. Aggregation occurs near the source to reduce data volume before ingestion. This approach allows the platform to process massive amounts of telemetry with historical context while remaining invisible to your workloads.
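Aggregating near the source typically means collapsing raw events into compact counters before they leave the originating system. In this minimal sketch, `RawEvent`, its fields, and `pre_aggregate` are hypothetical. The point is that only per-org, per-day, per-signal counts are shipped to the off-core platform.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RawEvent:
    """One raw telemetry event as it might appear in a source log (hypothetical shape)."""
    org_id: str
    timestamp: datetime
    signal: str  # e.g. "api_limit_breach", "apex_exception"


def pre_aggregate(events: list[RawEvent]) -> dict[tuple[str, str, str], int]:
    """Roll raw events up to (org_id, ISO day, signal) counts before ingestion.

    Downstream volume then scales with orgs x days x signals rather than
    with the number of raw events.
    """
    counts: Counter[tuple[str, str, str]] = Counter()
    for e in events:
        counts[(e.org_id, e.timestamp.date().isoformat(), e.signal)] += 1
    return dict(counts)
```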

What correctness and explainability constraints shaped how the Technical Health Score distinguishes customer misconfiguration from platform issues?

Maintaining trust requires a clear distinction between platform behavior and user configuration. Performance issues often stem from both sources, but conflating them undermines the credibility of any health metric.

The team engineered a signal-qualification framework based on shared responsibility. Every signal must pass an actionability gate. If you cannot fix the issue through code or configuration changes, the system excludes that signal from your score. This ensures your Technical Health Score reflects your specific implementation choices rather than platform incidents.
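At its simplest, the actionability gate is a filter over qualified signals. This sketch assumes each signal already carries flags for whether the customer can fix it through code or configuration and whether it traces to a platform incident. The `Signal` fields and `actionability_gate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    score: float                     # normalized 1-100 health value
    customer_actionable: bool        # fixable via the customer's code or configuration
    platform_incident: bool = False  # traced to platform-side behavior


def actionability_gate(signals: list[Signal]) -> list[Signal]:
    """Keep only signals the customer can fix; platform-caused issues never reach the score."""
    return [s for s in signals if s.customer_actionable and not s.platform_incident]
```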

Unified framework showing the signal-qualification mechanism and explainable ML pipeline, ensuring scores reflect only customer-actionable issues with a complete audit trail from score to root cause.

Transparency drives the modeling process. While complex neural networks offer theoretical accuracy, they often fail to explain why a score changed. The team built a multi-stage machine learning pipeline to prioritize explainability:

Signals normalize onto a common 1–100 scale using statistical distributions.
Partial Least Squares regression weights these signals against historical outcomes.
Simple weighted averages aggregate the final data.
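The second and third stages can be sketched in plain Python. The first function below is a one-component PLS1 fit (a single latent variable, NIPALS-style) written without libraries for clarity. Using the magnitude of each coefficient as the aggregation weight is an assumption made here so that weights stay positive and sum to 1. `pls1_weights` and `weighted_score` are hypothetical names.

```python
from math import sqrt


def pls1_weights(X: list[list[float]], y: list[float]) -> list[float]:
    """Derive per-signal weights from a one-component PLS1 fit of y on X.

    X holds one row per org and one column per normalized signal; y holds a
    historical outcome per org (for example, high-severity incident counts).
    Each weight is the magnitude of the signal's PLS coefficient, rescaled to sum to 1.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    # Weight vector: the direction in signal space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    if norm == 0:
        raise ValueError("no signal covaries with the outcome")
    w = [v / norm for v in w]

    # Latent scores t = Xc w, then regress yc on t to get the y-loading q.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    beta = [wj * q for wj in w]  # regression coefficients in centered space

    total = sum(abs(b) for b in beta)
    return [abs(b) / total for b in beta]


def weighted_score(signal_scores: list[float], weights: list[float]) -> float:
    """Aggregate normalized 1-100 signal scores with weights that sum to 1."""
    return sum(s * w for s, w in zip(signal_scores, weights))
```

With a single component, these weights reduce to each signal's covariance with the outcome in magnitude. A production pipeline would more plausibly use a library implementation, such as scikit-learn's `PLSRegression`, with more components.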

This design provides a complete audit trail. You can drill down from a top-level score to individual root causes without any ambiguity.
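Because the final stage is a weighted average, the score decomposes exactly into per-signal contributions, and this exact decomposition is what keeps the drill-down unambiguous. The sketch below uses a hypothetical `explain_score` helper to rank signals by the points each one costs relative to a perfect 100, assuming weights that sum to 1.

```python
def explain_score(
    signal_scores: dict[str, float], weights: dict[str, float]
) -> list[tuple[str, float]]:
    """Rank signals by how many points each one costs the overall score.

    With weights summing to 1, the overall score equals 100 minus the sum of
    the weighted gaps, so each gap is an exact, additive share of the shortfall.
    """
    gaps = {name: weights[name] * (100.0 - score) for name, score in signal_scores.items()}
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
```

For scores `{"a": 90, "b": 40, "c": 70}` with weights `{"a": 0.5, "b": 0.3, "c": 0.2}`, the overall score is 71, and signal `b` accounts for 18 of the 29 missing points.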

What outcome-validation constraints shaped how the team proved the Technical Health Score drives measurable results?

Validating impact requires operationalizing the score within existing workflows. The team embedded Technical Health into customer success processes to trigger proactive engagement. This shift moves the focus from reactive support to preventive action.

Back-testing confirms the value of this metric. Data shows that users with low scores experience more high-severity incidents and higher costs. Users who improve their score from Fair to Excellent see case volume drop by nearly 20x, and their support costs fall by approximately 35x.

This system provides significant benefits for both internal teams and users:

Internal teams reduce data gathering cycles from weeks to hours.
Users access 12 months of curated health history.
Proactive refactoring before peak seasons flattens support demand.

These outcomes prove that Technical Health serves as a lever for reliability. It provides a clear path toward sustained success on the platform.

Learn more

Stay connected — join our Talent Community!
Check out our Technology and Product teams to learn how you can get involved.

The post Engineering Platform Trust: Cutting Customer Case Volume 20x with Petabyte-Scale Health Signals appeared first on Salesforce Engineering Blog.