Curated topic

finops

Posts tagged with finops

Salesforce Engineering, Jun 9, 2026

How to Build Reliable AI Agents: 5 Engineering Patterns from a Production System

Why it matters: Transitioning AI agents from demos to production requires a shift from prompt engineering to system engineering. This article highlights how to handle non-deterministic tasks in critical infrastructure, ensuring agents can safely automate complex cloud optimization worth millions.

AI agents often fail in production because they struggle with non-deterministic environments and scattered sources of truth in infrastructure.
Relying on increasingly complex prompts creates unmaintainable 'software written in English' without improving underlying reliability.
Multi-agent architectures do not automatically solve consistency issues if the model is tasked with problems that require deterministic logic.
Reliability is achieved by engineering the systems around the model rather than focusing solely on model capability or prompt refinement.
The lack of a single source of truth in complex deployment stacks (Terraform, Helm, etc.) prevents agents from reasoning effectively without external guardrails.

#mlp #finops #sre


Cloudflare Blog, Jun 5, 2026

Your AI bill is out of control. Cloudflare can fix it now.

Why it matters: Uncontrolled AI spend is a major challenge for organizations. These tools provide the observability and governance needed to scale AI usage sustainably by offering granular cost attribution and automated guardrails to prevent unexpected bill shock.

Cloudflare AI Gateway now features dollar-based spend limits to prevent budget overages across multiple AI providers.
A new closed beta introduces identity-driven budgets, integrating with Cloudflare Access to attribute costs to specific users or teams.
Dynamic routing allows for automatic fallback to cheaper models once a primary budget threshold is reached, ensuring service continuity.
The gateway provides unified logging and real-time analytics for token counts and costs across OpenAI, Anthropic, Google, and others.
Administrators can define granular policies based on custom attributes, model types, or identity provider (IdP) groups.
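The fallback behavior described above can be illustrated with a minimal sketch. This is not Cloudflare's API: the `BudgetRouter` class, its field names, the model names, and the dollar amounts are all hypothetical, chosen only to show how a dollar threshold might switch traffic to a cheaper model before a hard limit stops it.

```python
from dataclasses import dataclass


@dataclass
class BudgetRouter:
    """Hypothetical router: pick a model based on dollars spent so far."""
    primary_model: str
    fallback_model: str
    fallback_threshold_usd: float  # at or above this spend, use the cheaper model
    hard_limit_usd: float          # at or above this spend, refuse the request
    spent_usd: float = 0.0

    def choose_model(self) -> str:
        if self.spent_usd >= self.hard_limit_usd:
            raise RuntimeError("spend limit reached")
        if self.spent_usd >= self.fallback_threshold_usd:
            return self.fallback_model
        return self.primary_model

    def record_cost(self, cost_usd: float) -> None:
        # Accumulate the cost reported for a completed request.
        self.spent_usd += cost_usd


router = BudgetRouter(
    primary_model="primary-large",    # hypothetical model names
    fallback_model="fallback-small",
    fallback_threshold_usd=10.0,
    hard_limit_usd=20.0,
)
first_choice = router.choose_model()
router.record_cost(10.0)
second_choice = router.choose_model()
```

Keeping the threshold check separate from cost recording means the routing decision stays deterministic for a given spend total, which makes it easy to test.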

#finops #mlp #security


PlanetScale Tech Blog, May 14, 2026

Egress problems and where to find them

Why it matters: Optimizing database egress is a rare double win that simultaneously improves application latency and reduces cloud infrastructure costs. By refining query patterns and networking, engineers can prevent scaling bottlenecks and unexpected billing spikes.

Minimize egress by selecting specific columns instead of using SELECT * to reduce payload size.
Implement cursor pagination for consistent performance and bounded data transfer as datasets grow.
Use database-native JSON functions to filter large objects before they leave the database.
Explicitly define columns in ORM RETURNING clauses to avoid fetching unnecessary data during writes.
Leverage caching and CDNs to prevent redundant database requests for frequently accessed content.
Utilize private networking like AWS PrivateLink to keep traffic off the public internet and lower costs.
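Two of the points above, selecting specific columns and cursor pagination, can be sketched together. The example below uses Python's built-in `sqlite3` module with an in-memory database; the `posts` table, its columns, and the `fetch_page` helper are assumptions made for illustration and are not taken from the original post.

```python
import sqlite3


def fetch_page(conn, after_id=0, page_size=2):
    """Return one page of (id, title) rows whose id is greater than after_id."""
    rows = conn.execute(
        # Name the columns explicitly so the large body column never leaves the database.
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    # The last id on the page is the cursor for the next request.
    next_cursor = rows[-1][0] if rows else None
    return rows, next_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO posts (title, body) VALUES (?, ?)",
    [(f"post {i}", "x" * 1000) for i in range(1, 6)],
)

page1, cursor = fetch_page(conn)
page2, cursor2 = fetch_page(conn, after_id=cursor)
```

Unlike `OFFSET`, the `WHERE id > ?` filter lets the database seek directly to the next page, so each request transfers at most `page_size` rows regardless of how deep the client has paged.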

#data #finops #sre


GitHub Engineering, May 12, 2026

GitHub Copilot individual plans: Introducing flex allotments in Pro and Pro+, and a new Max plan

Why it matters: This update shifts Copilot to a usage-based model while providing extra value through flex allotments. It allows developers to scale AI usage for complex agentic workflows and multi-step tasks without immediate overage charges, providing more transparency into AI consumption costs.

GitHub is transitioning Copilot individual plans to usage-based billing starting June 1, 2026.
Paid plans now include 'Base credits' matching the subscription price plus a variable 'Flex allotment' for extra usage.
A new 'Max' plan is introduced at $100/month, offering $200 in total included usage for high-volume agentic work.
Code completions and next edit suggestions remain unlimited and do not consume credits on paid tiers.
Flex allotments are designed to adapt over time based on AI model pricing and efficiency improvements.
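As a quick worked example of the figures above, assuming total included usage is simply Base credits plus the Flex allotment (the summary implies this but does not state it), the Max plan's flex portion can be derived as follows. The function name is hypothetical.

```python
def flex_allotment_usd(total_included_usd: int, subscription_price_usd: int) -> int:
    """Derive the flex portion, assuming total = base credits + flex allotment."""
    base_credits_usd = subscription_price_usd  # base credits match the subscription price
    return total_included_usd - base_credits_usd


# Max plan figures from the summary: $100/month with $200 in total included usage.
max_plan_flex = flex_allotment_usd(total_included_usd=200, subscription_price_usd=100)
```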

#mlp #finops


GitHub Engineering, May 7, 2026

Improving token efficiency in GitHub Agentic Workflows

Why it matters: Optimizing agentic workflows is critical for managing CI/CD costs. By moving data retrieval out of the LLM reasoning loop and pruning unused tool schemas, engineers can significantly reduce token consumption and latency without sacrificing agent performance.

Implemented an API proxy to capture normalized token usage data across different agent frameworks including Claude and Copilot CLI.
Deployed automated Auditor and Optimizer workflows to identify usage anomalies and propose specific code-level optimizations.
Reduced context overhead by pruning unused Model Context Protocol (MCP) tool registrations, saving 8-12 KB of schema per call.
Shifted data-fetching operations from LLM tool calls to deterministic GitHub CLI commands to minimize reasoning steps and round-trips.
Developed an Effective Tokens (ET) metric to normalize costs across different models and account for prompt caching benefits.
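The summary does not give the formula behind the Effective Tokens metric, so the sketch below is only one plausible shape for such a normalization, not GitHub's definition. The weights for cached input and output tokens and the per-model multiplier are hypothetical values picked for illustration.

```python
def effective_tokens(
    uncached_input: int,
    cached_input: int,
    output: int,
    cached_weight: float = 0.25,   # hypothetical discount for cache hits
    output_weight: float = 4.0,    # hypothetical premium for generated tokens
    model_multiplier: float = 1.0, # hypothetical relative cost of the model
) -> float:
    """Collapse token counts into one comparable number (illustrative only)."""
    weighted = (
        uncached_input
        + cached_input * cached_weight
        + output * output_weight
    )
    return weighted * model_multiplier


baseline = effective_tokens(uncached_input=1000, cached_input=4000, output=500)
pricier_model = effective_tokens(1000, 4000, 500, model_multiplier=2.0)
```

A single weighted number like this makes runs on different models comparable and gives credit for prompt caching, which raw token totals would hide.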

#finops #mlp #sre


Pinterest Engineering, May 1, 2026

Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer

Why it matters: This approach addresses the common bottleneck where network I/O limits ML serving efficiency. By implementing feature trimming based on model signatures, engineers can maximize GPU utilization and significantly reduce infrastructure costs by moving away from network-optimized instances.

Pinterest's root-leaf architecture separates CPU-heavy feature fetching from GPU-heavy model inference, but the split created a network bandwidth bottleneck between root and leaf nodes.
Network usage, rather than compute, dictated scaling needs, preventing full GPU utilization and requiring expensive network-optimized AWS instances.
Implementing lz4 compression provided a 20% bandwidth reduction but increased CPU usage and latency.
The Feature Trimmer system implements a Send What You Use strategy, filtering features at the root level before transmission to leaf nodes.
By extracting required features from model signatures, the system ensures only necessary data is sent for each specific model version.
This optimization reduces network pressure, allowing for fleet downscaling and a transition to cheaper, standard compute instances.
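The Send What You Use idea can be sketched in a few lines. The signature layout (a dict with an `inputs` list) and the function names are assumptions for illustration; the summary does not describe how Pinterest represents model signatures.

```python
def required_features(model_signature: dict) -> set:
    """Collect the input feature names a model signature declares."""
    return set(model_signature["inputs"])


def trim_features(features: dict, model_signature: dict) -> dict:
    """Keep only the features the model reads, dropping the rest before sending."""
    needed = required_features(model_signature)
    return {name: value for name, value in features.items() if name in needed}


# Hypothetical feature payload fetched at the root, and one model version's signature.
fetched = {"user_age": 31, "pin_embedding": [0.1, 0.2], "unused_counter": 7}
signature_v2 = {"inputs": ["user_age", "pin_embedding"]}

payload_for_leaf = trim_features(fetched, signature_v2)
```

Because the filter is driven by the signature rather than a hand-maintained list, a new model version that reads a different feature set changes what is sent without any change to the trimming code.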

#mlp #dist #finops


Salesforce Engineering, Apr 30, 2026

How AI-Driven Kubernetes Optimization Reclaimed Millions from 47% Idle Capacity

Why it matters: Manual cloud cost optimization fails at scale due to configuration drift and lack of trust. This hybrid AI/deterministic approach automates the last mile of FinOps, turning complex resource tuning into safe, reviewable code changes that significantly reduce infrastructure waste.

Salesforce addressed 47% idle Kubernetes capacity by automating resource allocation across 8,000+ services within their Hyperforce platform.
The system employs a hybrid architecture where LLMs handle repository discovery and configuration parsing, while deterministic algorithms perform the actual optimization.
An Integer Linear Programming (ILP) solver replaces probabilistic LLM reasoning for resource planning to ensure consistent and verifiable results.
The agent automates the 'last mile' of optimization by generating pull requests for Helm charts, moving from manual dashboards to a closed-loop developer workflow.
To maintain safety and trust, the agent only modifies CPU requests while leaving limits untouched, ensuring scaling headroom remains intact during rollouts.
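The safety rule in the last point, changing CPU requests while leaving limits untouched, can be sketched with a deterministic right-sizing function. Note that this sketch swaps in a simple percentile-plus-headroom rule in place of the ILP solver the article describes; the resource dict layout, the function names, and the 95th-percentile and 20% headroom defaults are all assumptions.

```python
def recommend_cpu_request(usage_millicores, percentile_pct=95, headroom_pct=20):
    """Recommend a CPU request from usage samples using integer arithmetic only."""
    ordered = sorted(usage_millicores)
    # Nearest-rank percentile: ceil(percentile_pct * n / 100), at least 1.
    rank = max(1, -(-percentile_pct * len(ordered) // 100))
    observed = ordered[rank - 1]
    # Add headroom, rounding up to a whole millicore.
    return -(-observed * (100 + headroom_pct) // 100)


def patch_resources(resources, usage_millicores):
    """Return a copy in which only requests.cpu_m changes; limits are copied as-is."""
    limit = resources["limits"]["cpu_m"]
    new_request = min(recommend_cpu_request(usage_millicores), limit)
    return {
        "requests": {**resources["requests"], "cpu_m": new_request},
        "limits": dict(resources["limits"]),
    }


current = {
    "requests": {"cpu_m": 2000, "memory_mi": 512},
    "limits": {"cpu_m": 4000, "memory_mi": 1024},
}
samples = [200, 220, 250, 240, 210, 230, 260, 300, 280, 270]
proposed = patch_resources(current, samples)
```

Because the function is pure and deterministic, the same usage samples always yield the same proposal, which is the property that makes the resulting pull request reviewable.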

#finops #sre #mlp


Cloudflare Blog, Apr 30, 2026

Agents can now create Cloudflare accounts, buy domains, and deploy

Why it matters: This integration removes manual friction from infrastructure setup, allowing AI agents to handle end-to-end deployment. By standardizing service discovery, identity, and payments, it enables fully autonomous DevOps workflows while maintaining human-in-the-loop oversight.

Cloudflare and Stripe have co-designed a new protocol allowing AI agents to autonomously provision accounts, register domains, and deploy software.
The system utilizes Stripe as an identity provider to automatically create or link Cloudflare accounts without manual dashboard intervention.
Agents can discover infrastructure services via a REST API catalog, enabling them to select and configure necessary resources for a project.
Integrated payment tokenization allows agents to handle paid subscriptions and domain purchases on behalf of the user.
The workflow supports Cloudflare’s Code Mode MCP server and Agent Skills to enhance the agent's ability to interact with Cloudflare APIs.
Human-in-the-loop oversight is maintained for granting permissions and accepting terms of service, while eliminating manual API token management.

#sre #finops #mlp


GitHub Engineering, Apr 27, 2026

GitHub Copilot is moving to usage-based billing

Why it matters: This change reflects the increasing cost of running agentic AI models. For engineers, it introduces a metered cost structure, requiring better management of AI consumption while enabling access to high-compute agentic features without the previous hard gates on usage.

GitHub Copilot will transition all plans to usage-based billing starting June 1, 2026, replacing the current premium request model.
Users will receive a monthly allotment of GitHub AI Credits based on their subscription tier, with usage calculated by token consumption (input, output, and cached).
Core features like code completions and Next Edit suggestions will remain included in all plans without consuming AI Credits.
Enterprise and Business plans will benefit from pooled usage across the organization and granular budget controls at the user and cost center levels.
The shift is driven by the evolution of Copilot into an agentic platform, which requires significantly higher compute and inference resources for multi-step coding sessions.

#mlp #finops #culture


GitHub Engineering, Apr 20, 2026

Changes to GitHub Copilot Individual plans

Why it matters: High-intensity agentic workflows are forcing a shift in AI resource management. Engineers must now optimize token consumption and model selection to maintain productivity within new usage constraints and avoid service interruptions.

GitHub is pausing new sign-ups for Copilot Pro, Pro+, and Student plans to stabilize service for existing users.
Usage limits are tightening as agentic workflows and parallelized sessions consume significantly more compute than original plan structures supported.
Claude Opus models are removed from the Pro plan; Opus 4.7 remains exclusive to Pro+ subscribers.
New token-based weekly limits address the high costs of long-trajectory requests and complex coding tasks.
Real-time usage tracking is now integrated into VS Code and Copilot CLI to improve transparency.
Pro+ plans offer over 5x the usage limits of standard Pro plans to accommodate power users.

#mlp #finops #sre

