This article demonstrates how to scale agentic AI in complex enterprise environments by balancing LLM reasoning with deterministic logic. It provides a blueprint for reducing latency and ensuring architectural consistency across multi-brand deployments while maintaining high accuracy.
By Kunal Pal and Krista Hardebeck.
In our Engineering Energizers Q&A series, we showcase the engineering minds driving innovation across Salesforce. Today, we introduce Krista Hardebeck, Regional Vice President of Forward Deployed Engineering (FDE), whose team collaborated with a large multi-brand specialty retailer to launch their initial production Agentforce service experience on an accelerated timeline. This effort involved optimizing an evolving architecture, aligning multiple clouds, and establishing the groundwork for brand-specific conversational AI within a complex enterprise environment.
Discover how the team restructured deterministic and LLM-driven responsibilities to resolve early inconsistencies, reduced multi-stage reasoning latency by approximately 20 seconds through consolidated model and Data 360 execution, and designed a multi-brand architecture capable of supporting differentiated tone and voice without compromising accuracy or maintainability.
Our mission started with validating the first production agent. This agent delivered reliable, business-aligned value for the customer’s initial brand. For this specific retailer, the team ensured the service agent handled high-volume order interactions with consistent accuracy and a clear link to operational benefit. Establishing this trust proved essential for expanding Agentforce across additional brands, each having its own identity and requirements.
As the engagement progressed, the mission broadened to support more advanced conversational experiences under fixed timelines. The retailer aimed to move beyond basic service workflows. They sought to explore new forms of interactive AI that could showcase emerging capabilities at enterprise scale. Meeting these expectations required stabilizing early patterns, aligning data sources, and ensuring the architecture supported future scenarios without rework.
Across all phases, our mission remained consistent: establish a durable technical foundation, ensure the earliest agents demonstrated clear and repeatable value, and enable the organization to scale Agentforce with confidence across a diverse multi-brand portfolio.
Early diagnostic work revealed that the initial proof-of-concept design relied heavily on the large language model (LLM) for tasks demanding deterministic precision. The model handled JSON parsing, hierarchical decisioning, and conditional evaluation. These areas, however, introduced small inconsistencies, which created downstream variability. Hardcoded instructions accumulated inside the prompt over time. This resulted in overlapping directives that behaved differently based on subtle user phrasing.
The team also uncovered ambiguity in how upstream data flowed through the system. Failing to resolve this early would have introduced unnecessary rework when scaling across multiple brands. It also would have created architectural dependencies limiting flexibility.
By rebuilding deterministic components in Apex and restructuring the prompt to remove overloaded instructions, the team created a clear separation between conversational reasoning and rule-based processing. These changes eliminated early inconsistencies, made responses more predictable, and laid the groundwork for a scalable multi-brand deployment.
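The separation described above can be illustrated with a minimal Python sketch. The team's actual implementation was in Apex, and all names here (`evaluate_order_rules`, `build_prompt`, the JSON fields) are hypothetical; the point is only the boundary: parsing and conditional evaluation happen in deterministic code, while the LLM receives pre-computed facts and handles conversation.

```python
import json

def evaluate_order_rules(order_json: str) -> dict:
    """Deterministic layer: parse JSON and apply branching rules in code,
    not in the prompt, so outcomes never vary with user phrasing."""
    order = json.loads(order_json)
    facts = {
        "item_count": len(order["items"]),
        "has_backorder": any(i.get("backordered") for i in order["items"]),
    }
    # Conditional evaluation lives here, with explicit precedence.
    facts["status_summary"] = (
        "partially_delayed" if facts["has_backorder"] else "on_track"
    )
    return facts

def build_prompt(facts: dict, user_message: str) -> str:
    """LLM layer: the model only sees pre-computed facts and the user's
    message -- no JSON parsing or rule logic inside the prompt."""
    return (
        f"Order status: {facts['status_summary']} "
        f"({facts['item_count']} items).\n"
        f"Customer said: {user_message}\n"
        "Reply helpfully in the brand's voice."
    )
```

Because the rule layer is ordinary code, it can be unit-tested independently of any model, which is what makes the responses predictable.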

Inside the refactor: engineering a successful Agentforce deployment and launch
Deeply nested order structures were the biggest source of friction for the LLM. Retail orders include multiple items with independent delivery timelines, accessory attributes, and layered flags that influence operations. The LLM performed well in many scenarios, but slight variations in structure or vocabulary caused it to choose inconsistent reasoning paths in edge cases that demanded deterministic precision.
During quality assurance, the customer observed that small changes in user language sometimes triggered different branches of the overloaded prompt instructions. When multiple flags existed at various hierarchy levels, the model occasionally misinterpreted their precedence or applied logic out of sequence. These inconsistencies made repeatable outcomes difficult across every scenario.
To correct this, the team moved all hierarchical interpretation and branching logic out of the model and into deterministic code. The LLM focused on conversational tasks. Structured rule evaluation occurred in Apex. This shift removed ambiguity, closed reliability gaps, and ensured complete consistency across the retailer’s full set of order flows.
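The hierarchical-precedence problem the model struggled with can be sketched as a deterministic traversal. This is an illustrative Python version with hypothetical flag names and a hypothetical precedence order, not the retailer's Apex code: flags are collected from every level of the order hierarchy, then resolved against a fixed precedence list so the outcome is repeatable regardless of where a flag appears or how the user phrases the question.

```python
from typing import Optional

# Hypothetical precedence, highest first: a fraud hold outranks an
# address issue, which outranks a backorder.
PRECEDENCE = ["fraud_hold", "address_issue", "backordered"]

def resolve_flags(order: dict) -> Optional[str]:
    """Walk the order -> shipment -> item hierarchy deterministically and
    return the single highest-precedence flag found anywhere in it."""
    found = set(order.get("flags", []))
    for shipment in order.get("shipments", []):
        found.update(shipment.get("flags", []))
        for item in shipment.get("items", []):
            found.update(item.get("flags", []))
    # A fixed precedence scan guarantees the same answer every time,
    # which the prompt-embedded logic could not.
    for flag in PRECEDENCE:
        if flag in found:
            return flag
    return None
```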
Latency constraints came from two major areas. The first was data retrieval: data was pulled in multiple incremental passes rather than through a single optimized query, which created unpredictable delays across different interaction paths. The second was the multi-call LLM reasoning loop: early versions made several sequential model calls to refine relevance scoring and determine next-step logic, and each additional call compounded the delay.
To address these constraints, our FDE team consolidated the sequential model calls into a single reasoning pass and combined the incremental data retrievals into one optimized Data 360 pull, trimming approximately 20 seconds from multi-stage reasoning.
Together, these changes reduced end-to-end latency by 75%. This delivered the responsiveness needed for a production-grade conversational experience. It also established a scalable performance baseline for future multi-brand expansion.
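The consolidation principle behind these gains can be shown with a small sketch. Everything here is hypothetical (`fetch_fields` stands in for a data-layer query; the round-trip counter stands in for network latency): the early pattern paid one round trip per piece of data, while the optimized pattern retrieves everything a conversational turn needs in a single pull.

```python
# Stand-in for network latency: every data-layer call is one round trip.
ROUND_TRIPS = {"count": 0}

def fetch_fields(record_id: str, fields: list) -> dict:
    """Hypothetical data-layer query; returns the requested fields."""
    ROUND_TRIPS["count"] += 1
    return {f: f"value_of_{f}" for f in fields}

def incremental(record_id: str, fields: list) -> dict:
    """Early pattern: one round trip per field, so latency grows with
    every incremental pass and varies by interaction path."""
    result = {}
    for f in fields:
        result.update(fetch_fields(record_id, [f]))
    return result

def consolidated(record_id: str, fields: list) -> dict:
    """Optimized pattern: a single pull retrieves everything the turn
    needs, making latency flat and predictable."""
    return fetch_fields(record_id, fields)
```

With five fields, the incremental version pays five round trips where the consolidated version pays one; the same fan-in applies to collapsing sequential LLM calls into a single reasoning pass.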
The team weighed two options: a single agent for all brands or a dedicated agent for each. A unified model offered simplicity but struggled with brand voice, release schedules, and distinct domain needs. Each brand’s unique identity made a single abstraction compromise quality.
A multi-agent approach proved superior. Each brand gained a tailored agent for its voice, workflows, and user experience. This reduced coordination overhead and allowed independent brand evolution. It also ensured sensitive user interactions, like language and style, could be tuned individually.
The team built on a common architectural foundation, which accelerated subsequent brand delivery by 5x. However, they avoided forcing a one-size-fits-all solution: choosing one agent per brand preserved experience fidelity, simplified long-term maintenance, and created a clear path for future growth.
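The shared-foundation, per-brand-agent pattern can be sketched as follows. The brand names, greetings, and `respond` method are all hypothetical illustrations, not the retailer's configuration: deterministic order logic is common to every agent, while each brand's agent carries only its own voice and tone and can evolve independently.

```python
from dataclasses import dataclass

@dataclass
class BrandAgent:
    """Shared foundation: every brand agent reuses the same deterministic
    order logic and differs only in brand-specific voice settings."""
    brand: str
    greeting: str
    tone: str

    def respond(self, status_summary: str) -> str:
        # The facts are identical across brands; only the surface
        # voice changes per agent, so brands release independently.
        return f"{self.greeting} Your order is {status_summary}. [{self.tone}]"

# One dedicated agent per brand (hypothetical brands).
outdoor = BrandAgent("TrailCo", "Hey there!", "casual")
luxury = BrandAgent("MaisonLux", "Good day.", "formal")
```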
The post How Agentforce Achieved 3–5x Faster Response Times While Solving Enterprise-Scale Architectural Complexity appeared first on Salesforce Engineering Blog.