This article demonstrates how to scale agentic AI in complex enterprise environments by balancing LLM reasoning with deterministic logic. It provides a blueprint for reducing latency and ensuring architectural consistency across multi-brand deployments while maintaining high accuracy.
By Kunal Pal and Krista Hardebeck.
In our Engineering Energizers Q&A series, we showcase the engineering minds driving innovation across Salesforce. Today, we introduce Krista Hardebeck, Regional Vice President of Forward Deployed Engineering (FDE), whose team collaborated with a large multi-brand specialty retailer to launch their initial production Agentforce service experience on an accelerated timeline. This effort involved optimizing an evolving architecture, aligning multiple clouds, and establishing the groundwork for brand-specific conversational AI within a complex enterprise environment.
Discover how the team restructured deterministic and LLM-driven responsibilities to resolve early inconsistencies, reduced multi-stage reasoning latency by approximately 20 seconds through consolidated model and Data 360 execution, and designed a multi-brand architecture capable of supporting differentiated tone and voice without compromising accuracy or maintainability.
Our mission started with validating the first production agent. This agent delivered reliable, business-aligned value for the customer’s initial brand. For this specific retailer, the team ensured the service agent handled high-volume order interactions with consistent accuracy and a clear link to operational benefit. Establishing this trust proved essential for expanding Agentforce across additional brands, each having its own identity and requirements.
As the engagement progressed, the mission broadened to support more advanced conversational experiences under fixed timelines. The retailer aimed to move beyond basic service workflows. They sought to explore new forms of interactive AI that could showcase emerging capabilities at enterprise scale. Meeting these expectations required stabilizing early patterns, aligning data sources, and ensuring the architecture supported future scenarios without rework.
Across all phases, our mission remained consistent. We established a durable technical foundation. We ensured the earliest agents demonstrated clear and repeatable value. We also enabled the organization to scale Agentforce with confidence across a diverse multi-brand portfolio.
Early diagnostic work revealed that the initial proof-of-concept design relied heavily on the large language model (LLM) for tasks demanding deterministic precision. The model handled JSON parsing, hierarchical decisioning, and conditional evaluation. These areas, however, introduced small inconsistencies, which created downstream variability. Hardcoded instructions accumulated inside the prompt over time. This resulted in overlapping directives that behaved differently based on subtle user phrasing.
The team also uncovered ambiguity in how upstream data flowed through the system. Failing to resolve this early would have introduced unnecessary rework when scaling across multiple brands. It also would have created architectural dependencies limiting flexibility.
By rebuilding deterministic components in Apex and restructuring the prompt to remove overloaded instructions, the team created a clear separation between conversational reasoning and rule-based processing. These changes eliminated early inconsistencies, made responses more predictable, and laid the groundwork for a scalable multi-brand deployment.

Inside the refactor: engineering a successful Agentforce deployment and launch.
Deeply nested order structures presented the most challenging friction for the LLM. Retail orders include multiple items with independent delivery timelines, accessory attributes, and layered flags that influence operations. The LLM performed well in many scenarios. However, slight variations in structure or vocabulary caused it to choose inconsistent reasoning paths in edge cases needing deterministic precision.
During quality assurance, the customer observed that small changes in user language sometimes triggered different branches of the overloaded prompt instructions. When multiple flags existed at various hierarchy levels, the model occasionally misinterpreted their precedence or applied logic out of sequence. These inconsistencies made repeatable outcomes difficult across every scenario.
To correct this, the team moved all hierarchical interpretation and branching logic out of the model and into deterministic code. The LLM focused on conversational tasks. Structured rule evaluation occurred in Apex. This shift removed ambiguity, closed reliability gaps, and ensured complete consistency across the retailer’s full set of order flows.
Latency constraints came from two major areas:
The second major contributor was the multi-call LLM reasoning loop. Early versions made several sequential model calls to refine relevance scoring and determine next-step logic, which increased latency. Additional latency surfaced when data was retrieved in multiple incremental passes rather than through a single optimized pull. This created unpredictable delays across different interaction paths.
To address these constraints, our FDE team implemented several key optimizations:
Together, these changes reduced end-to-end latency by 75%. This delivered the responsiveness needed for a production-grade conversational experience. It also established a scalable performance baseline for future multi-brand expansion.
The team weighed two options: a single agent for all brands or a dedicated agent for each. A unified model offered simplicity but struggled with brand voice, release schedules, and distinct domain needs. Each brand’s unique identity made a single abstraction compromise quality.
A multi-agent approach proved superior. Each brand gained a tailored agent for its voice, workflows, and user experience. This reduced coordination overhead and allowed independent brand evolution. It also ensured sensitive user interactions, like language and style, could be tuned individually.
The team built on a common architectural foundation, which allowed us to accelerate subsequent brand delivery by 5x. However, they avoided forcing a one-size-fits-all solution. Choosing one agent per brand preserved experience fidelity, simplified long-term maintenance, and created a clear path for future growth.
The post How Agentforce Achieved 3–5x Faster Response Times While Solving Enterprise-Scale Architectural Complexity appeared first on Salesforce Engineering Blog.
Continue reading on the original blog to support the author
Read full articleThis architecture solves the statelessness problem in AI agents, enabling long-term context and reliability at scale. It provides a blueprint for building governable, auditable AI systems that maintain user trust while reducing prompt noise and latency through structured memory layers.
This shift moves beyond AI wrappers to fundamental architectural changes. It enables software to handle edge cases and cross-domain coordination autonomously, reducing the need for human intervention while maintaining reliability through governed action contracts.
This article details the architectural shift from fragmented point solutions to a unified AI stack. It provides a blueprint for solving data consistency and metadata scaling challenges, essential for engineers building reliable, real-time agentic systems at enterprise scale.
Engineers must evolve recommendation engines from passive click-based tracking to active intent extraction. This shift enables autonomous agents to provide contextually relevant responses in real-time, solving the cold-start problem and handling unstructured data at enterprise scale.