How AI-Driven Kubernetes Optimization Reclaimed Millions from 47% Idle Capacity

Salesforce Engineering, April 30, 2026

Why it matters

Manual cloud cost optimization fails at scale due to configuration drift and lack of trust. This hybrid AI/deterministic approach automates the last mile of FinOps, turning complex resource tuning into safe, reviewable code changes that significantly reduce infrastructure waste.

Key takeaways

Salesforce addressed 47% idle Kubernetes capacity by automating resource allocation across 8,000+ services within its Hyperforce platform.
The system employs a hybrid architecture where LLMs handle repository discovery and configuration parsing, while deterministic algorithms perform the actual optimization.
An Integer Linear Programming (ILP) solver replaces probabilistic LLM reasoning for resource planning to ensure consistent and verifiable results.
The agent automates the 'last mile' of optimization by generating pull requests for Helm charts, moving from manual dashboards to a closed-loop developer workflow.
To maintain safety and trust, the agent only modifies CPU requests while leaving limits untouched, ensuring scaling headroom remains intact during rollouts.

Keywords

Kubernetes, Integer Linear Programming

By Shan Appajodu and Tuhin Kanti Sharma.

Salesforce faced significant infrastructure challenges while scaling Hyperforce, the platform powering thousands of cloud services. Fragmentation across more than 8,000 Kubernetes-based services led to widespread over-provisioning, with nearly half of managed Kubernetes spend tied to idle capacity. While internal systems identified these inefficiencies, our service owners lacked the visibility and confidence to address them in an environment shaped by years of configuration drift.

The Capacity Optimization Agent solves this by embedding optimization directly into the service owner’s development workflow. The agent analyzes service configurations across repositories, computes optimal resource allocations, and generates pull requests to apply those changes safely. This shifts the process from manual investigation and coordination to a closed-loop agent that continuously evaluates services.

Building this system required us to solve challenges across infrastructure and development workflows. We addressed configuration sprawl and the reasoning limitations of standard LLM agents by developing deterministic optimization that is consistent and safe. This design ensured service owners could trust and safely adopt automated changes at production scale, turning complex optimization into a pull request.

From Config Sprawl to Actionable Optimization

The primary challenge was enabling service owners to act on identified inefficiencies rather than simply finding them. Hyperforce infrastructure grew over years, resulting in configuration sprawl across Helm charts and other configuration files. This complexity meant that even when over-provisioning appeared obvious, service owners struggled to identify where to apply optimizations or how those modifications might propagate.

Traditional methods relied on dashboards that required manual interpretation and implementation. This created uncertainty regarding safety and ownership, which slowed progress. We solved this by shifting focus from delivering insights to executing changes.

The agent now generates pull requests that modify the specific configuration files responsible for capacity allocation. These automated updates remain environment-aware and align with established deployment patterns. By utilizing the pull request model, service owners review precise changes instead of investigating system behavior. This transformation turns capacity optimization into a repeatable, agent-driven workflow that scales across thousands of services.
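As a rough illustration of what "reviewing a precise change" can look like, the sketch below rewrites one CPU request in a Helm values file and renders the result as a unified diff. The values layout, the function names, and the line-based patching are all assumptions made for this example; the post does not describe how the agent edits files, and a production tool would more likely use a YAML-aware parser.

```python
import difflib

# Hypothetical Helm values layout; real charts vary widely.
VALUES = """\
resources:
  requests:
    cpu: "2"
    memory: 4Gi
  limits:
    cpu: "4"
    memory: 4Gi
"""


def patch_cpu_request(values_text: str, new_cpu: str) -> str:
    """Replace the cpu value under `requests:` and leave every other line as-is.

    Line-based on purpose to keep the sketch short; it is not a YAML parser.
    """
    patched = []
    in_requests = False
    for line in values_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # A bare "key:" line opens a new mapping; track whether it is `requests`.
            in_requests = stripped == "requests:"
        elif in_requests and stripped.startswith("cpu:"):
            indent = line[: len(line) - len(line.lstrip())]
            line = f'{indent}cpu: "{new_cpu}"'
        patched.append(line)
    return "\n".join(patched) + "\n"


def render_diff(old: str, new: str, path: str) -> str:
    """Render the change as a unified diff, the form a reviewer sees in a pull request."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


if __name__ == "__main__":
    print(render_diff(VALUES, patch_cpu_request(VALUES, "500m"), "values.yaml"))
```

The diff contains a single changed line, so the reviewer evaluates one value rather than reconstructing how the chart resolves its resources.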

When LLM-Based Systems Couldn’t Scale Across Infrastructure

In our initial iterations, we built a simple agent for automated optimization, but these early versions produced non-deterministic results on our infrastructure. The large language models could not reason reliably over fragmented service configurations spread across multiple repositories and Helm charts.

Non-deterministic outputs further reduced trust, as identical configurations often produced different results. Additionally, these models struggled with global optimization because capacity decisions require simultaneous evaluation of multiple constraints, whereas model reasoning remained localized.

We resolved these issues by narrowing the role of the LLM. It now handles repository discovery, configuration parsing, and context extraction. Meanwhile, a deterministic algorithm manages all optimization decisions.

This architecture creates a clear separation between probabilistic reasoning and deterministic computation. The model navigates the infrastructure while downstream systems perform optimization. By limiting the model to specific responsibilities, the system improves reliability and expands coverage across our complex services.

From Probabilistic AI to Deterministic Optimization

Providing accurate and globally optimal decisions became our priority after redefining the role of the model. Because large language models are probabilistic, they can produce different capacity optimization plans for the same configurations.

We replaced the model-driven planning layer with a deterministic optimization engine to ensure correctness. The new multi-stage pipeline uses AI for repository discovery and configuration parsing, while a dedicated planning layer calculates the optimal configuration. An Integer Linear Programming solver sits at the core to minimize resource waste across all containers.
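To make the idea of a deterministic planning layer concrete, here is a minimal sketch of a small integer program over CPU requests. It is not the post's solver: it uses exhaustive search from the standard library as a stand-in for a real ILP solver, and the container data, candidate sizes, headroom factor, and the cap on changes per plan are all hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical inputs: observed peak CPU usage and current request, in millicores.
CONTAINERS = {
    "api": {"peak": 300, "current": 1000},
    "worker": {"peak": 700, "current": 2000},
    "cache": {"peak": 150, "current": 500},
}
CANDIDATES = [250, 500, 1000, 2000]  # allowed request sizes (assumption)
HEADROOM = 1.2                       # request must cover peak * headroom (assumption)
MAX_CHANGES = 2                      # at most this many containers changed per plan (assumption)


def solve(containers, candidates, headroom, max_changes):
    """Pick one request size per container, minimizing total requested CPU.

    Constraints: each request covers peak * headroom, and at most
    `max_changes` containers differ from their current request.
    Exhaustive search is exponential in the number of containers; a real
    ILP solver would replace this loop at scale.
    """
    names = sorted(containers)  # fixed iteration order keeps the result reproducible
    best_plan, best_cost = None, None
    for choice in product(candidates, repeat=len(names)):
        plan = dict(zip(names, choice))
        feasible = all(plan[n] >= containers[n]["peak"] * headroom for n in names)
        changes = sum(plan[n] != containers[n]["current"] for n in names)
        if not feasible or changes > max_changes:
            continue
        cost = sum(plan.values())
        if best_cost is None or cost < best_cost:  # strict "<" makes tie-breaking stable
            best_plan, best_cost = plan, cost
    return best_plan


if __name__ == "__main__":
    print(solve(CONTAINERS, CANDIDATES, HEADROOM, MAX_CHANGES))
```

With these inputs the search resizes `api` and `worker` and leaves `cache` alone, because the change cap forces a trade-off that per-container reasoning would miss. Running it again on the same inputs returns the same plan, which is the property the planning layer is meant to guarantee.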

A preprocessing layer now maps configuration relationships and identifies control points. Structured schemas enforce strict contracts between stages, replacing free-form text. Isolated execution logic ensures the system applies plans exactly as intended. These changes created a deterministic engine that produces verifiable results across thousands of services.
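One way to picture a strict contract between stages is a typed record that rejects anything the parsing stage emits outside an agreed shape. The sketch below assumes the model stage returns JSON; the field names and the `parse_stage_output` helper are hypothetical and are not taken from the post.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerConfig:
    """Hypothetical record handed from the parsing stage to the planner."""
    service: str
    container: str
    values_path: str
    cpu_request_millicores: int


# Exact field set and types the downstream stage accepts.
REQUIRED = {
    "service": str,
    "container": str,
    "values_path": str,
    "cpu_request_millicores": int,
}


def parse_stage_output(raw: str) -> ContainerConfig:
    """Validate model output against the schema; raise ValueError on any mismatch."""
    data = json.loads(raw)  # free-form text fails here (JSONDecodeError is a ValueError)
    if not isinstance(data, dict) or set(data) != set(REQUIRED):
        raise ValueError("unexpected or missing fields")
    for key, expected in REQUIRED.items():
        if type(data[key]) is not expected:
            raise ValueError(f"{key} must be {expected.__name__}")
    if data["cpu_request_millicores"] <= 0:
        raise ValueError("cpu_request_millicores must be positive")
    return ContainerConfig(**data)
```

Because the planner only ever receives a validated `ContainerConfig`, a malformed or chatty model response stops at the stage boundary instead of influencing the plan.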

Agents can be made deterministic if paired with code execution

Scaling Adoption Through Trust and Safety

When AI agents generate pull requests across thousands of repositories, adoption hinges on trust. Service owners require a clear understanding of changes before they approve them for production.

We redesigned the presentation and validation of optimizations to build confidence. Each pull request now includes clear explanations, projected impacts, and supporting context. Visualization tools demonstrate how configuration updates affect system behavior.

Multiple layers enforce safety throughout the process. The agent modifies only Kubernetes CPU requests and leaves CPU limits untouched to maintain scaling headroom. All changes move through existing deployment pipelines, which include validation and staged rollouts. This transparency and control allow service owners to apply changes safely across production environments.
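The "CPU requests only" rule lends itself to a mechanical check. The following sketch compares a resource block before and after a proposed change and accepts the change only if the sole differing path is the CPU request. The dictionary layout and the function names are assumptions for this example, not the agent's actual validation code.

```python
def flatten(mapping, prefix=()):
    """Yield (path, value) pairs for every leaf of a nested dict."""
    for key, value in mapping.items():
        if isinstance(value, dict):
            yield from flatten(value, prefix + (key,))
        else:
            yield prefix + (key,), value


def changed_paths(before, after):
    """Return the set of leaf paths whose values differ, including added or removed keys."""
    b, a = dict(flatten(before)), dict(flatten(after))
    return {path for path in b.keys() | a.keys() if b.get(path) != a.get(path)}


# Only this path may change (hypothetical layout of a container's resources block).
ALLOWED = {("resources", "requests", "cpu")}


def is_safe_change(before, after):
    """True when nothing outside the CPU request differs between the two configs."""
    return changed_paths(before, after) <= ALLOWED
```

A check like this would reject a plan that also lowered `limits.cpu` or touched memory, so the scaling headroom described above cannot be altered by accident.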

Expanding Optimization Across Hyperforce

The next phase focuses on expanding coverage and capability across other configurations. Support for multi-repository environments will allow agents to reach services outside standardized paths, even as distributed configurations add complexity to discovery and coordination.

Optimization efforts now include memory utilization and Horizontal Pod Autoscaler (HPA) configurations alongside CPU. These additions require enhancements to the deterministic planning layer to manage the increased complexity.

Scaling adoption remains a priority for the team. Improved visualization, feedback loops, and post-deployment validation help trust grow alongside automation. These expansions will move the system closer to delivering continuous efficiency gains across the entire Hyperforce infrastructure.

From Optimization to Autonomous Infrastructure

The agent fundamentally changes infrastructure management by replacing manual, periodic analysis with continuous, automated decision-making. This approach embeds intelligence directly into the development lifecycle to ensure efficiency happens by design rather than by accident.

AI drives the discovery of optimization opportunities, while deterministic planning provides the necessary guardrails for accuracy, safety, and consistency. This synergy moves the organization away from reactive fixes toward a model where systems optimize themselves in real time.

As these capabilities evolve, infrastructure optimization becomes a permanent, always-on feature of the environment. The focus now shifts toward scaling these processes safely and transparently to maintain the trust of the engineers who manage these complex systems every day.

Learn more

Stay connected — join our Talent Community!
Check out our Technology and Product teams to learn how you can get involved.

The post How AI-Driven Kubernetes Optimization Reclaimed Millions from 47% Idle Capacity appeared first on Salesforce Engineering Blog.
