This system demonstrates how to transform massive, fragmented telemetry into actionable insights. By standardizing health metrics and isolating analytics from production, engineers can proactively identify risks, reduce support overhead, and ensure platform stability at a petabyte scale.
By Sanjeevani Bhardwaj, Ganesh Prasad, Sukumar Surya, and Thomas Bohn.
In our Engineering Energizers Q&A series, we highlight the engineering minds driving innovation across Salesforce. Today, we spotlight Sanjeevani Bhardwaj, CSG Product Director, who leads the Technical Health Score to make platform trust measurable by scoring Salesforce implementations through analytics pipelines that process petabytes of telemetry and historical context.
Explore how the team engineered a system that converts platform trust into actionable signals by defining technical health consistently across multi-tenant environments and building scalable machine learning pipelines that deliver proactive health insights.
The team builds a transparency layer for the Salesforce platform to turn trust from a subjective sentiment into a measurable engineering signal. Understanding implementation health becomes difficult as you adopt more products and deepen your customizations. Technical Health provides an objective view of that status and offers a clear path toward improvement.
Trust erodes when health indicators stay fragmented across tools or hidden in logs until incidents occur. To solve this, the team designed a continuous feedback loop that aggregates signals across efficiency, security, operational excellence, customization, and observability. This structure allows you to identify risks and optimize your implementation before issues surface as escalations.
The ultimate goal centers on your independence. Maintaining a healthy Salesforce implementation requires continuous effort as your organization evolves, and this score guides that effort over time. By standardizing technical health through a consistent interface, the team helps you balance innovation with stability throughout the lifecycle of your Salesforce footprint.

Mission framework showing how Technical Health builds a transparency layer, transforming trust from subjective sentiment to measurable engineering signal, enabling customer independence through continuous feedback.
Inconsistency creates a major hurdle for Salesforce users. Customers span various industries and architectural patterns, yet everyone needs a shared definition of health. Without a standard framework, technical status remains subjective and impossible to compare across different organizations.
The team introduced a five-pillar taxonomy to serve as a universal interface for technical health:
Every signal maps into one of these pillars. This structure allows the system to evaluate health consistently regardless of which clouds or features you use. This abstraction helps the score scale across an evolving platform while maintaining its core meaning.
Standardization also requires a common health currency. The team normalized diverse metrics into a unified 1–100 scale, which allows you to view health holistically instead of interpreting disconnected indicators. Distribution-based normalization ensures the system evaluates you against peers with similar scale and complexity. This approach creates a definition of technical health that stays both precise and fair.
Extracting meaningful health signals from a massive telemetry surface presents a significant data challenge. These signals originate from UI interactions, API traffic, and security configurations spread across various databases and logs. Many of these sources only retain raw data for short periods.

Engineering architecture addressing petabytes of telemetry through strategic signal curation and off-core analytics platform, ensuring system remains invisible to customer workloads.
The team designed the system around strategic curation instead of ingesting every data point. They identified signals that predict unhealthy behavior by focusing on common pain points like limits, errors, and security vulnerabilities. This method improves the signal-to-noise ratio and keeps the system manageable at scale.
The architecture runs all analytics on an off-core data platform. This isolation from live transactional systems prevents any impact on your daily operations. Aggregation occurs near the source to reduce data volume before ingestion. This approach allows the platform to process massive amounts of telemetry with historical context while remaining invisible to your workloads.
Maintaining trust requires a clear distinction between platform behavior and user configuration. Performance issues often stem from both sources, but conflating them undermines the credibility of any health metric.
The team engineered a signal-qualification framework based on shared responsibility. Every signal must pass an actionability gate. If you cannot fix the issue through code or configuration changes, the system excludes that signal from your score. This ensures your Technical Health Score reflects your specific implementation choices rather than platform incidents.

Unified framework showing signal qualification mechanism and explainable ML pipeline — ensuring scores reflects only customer-actionable issues with complete audit trail from score to root cause.
Transparency drives the modeling process. While complex neural networks offer theoretical accuracy, they often fail to explain why a score changed. The team built a multi-stage machine learning pipeline to prioritize explainability:
This design provides a complete audit trail. You can drill down from a top-level score to individual root causes without any ambiguity.
Validating impact requires operationalizing the score within existing workflows. The team embedded Technical Health into customer success processes to trigger proactive engagement. This shift moves the focus from reactive support to preventive action.
Back-testing confirms the value of this metric. Data shows that users with low scores experience more high-severity incidents and higher costs. Users who improve their score from Fair to Excellent see case volumes drop by nearly 20 times. Support costs for these users also decrease by approximately 35 times.
This system provides significant benefits for both internal teams and users:
These outcomes prove that Technical Health serves as a lever for reliability. It provides a clear path toward sustained success on the platform.
The post Engineering Platform Trust: Cutting Customer Case Volume 20x with Petabyte-Scale Health Signals appeared first on Salesforce Engineering Blog.
Continue reading on the original blog to support the author
Read full articleIt demonstrates how to build a scalable, trust-first AI agent architecture. By integrating deterministic graphs with unstructured data and open standards like MCP, it provides a blueprint for enterprise-grade AI orchestration and governance beyond simple chat interfaces.
Automating large-scale infrastructure migrations is critical for reducing operational risk. MIPS demonstrates how to build a deterministic decision engine that maintains auditability and customer trust while scaling to handle tens of thousands of complex organization moves.
This shift to native speech automation eliminates third-party security risks and simplifies complex AI integration. It demonstrates how to build resource-intensive AI features within a multi-tenant environment while maintaining strict data residency and platform stability.
For global-scale perimeter services, traditional sequential rollbacks are too slow. This architecture demonstrates how to achieve 10-minute global recovery through warm-standby blue-green deployments and synchronized autoscaling, ensuring high availability for trillions of requests.