Curated topic

finops

Posts tagged with finops

Microsoft Azure Blog, Feb 11, 2026

Agentic cloud operations: A new way to run the cloud

Why it matters: As cloud complexity outpaces human capacity, agentic operations allow engineers to move from manual toil to high-level orchestration. By automating context-aware diagnosis and remediation, teams can maintain reliability and efficiency at the scale required for modern AI workloads.

Agentic cloud operations shift from manual, dashboard-centric management to dynamic, AI-driven systems that correlate signals and take autonomous actions.
Azure Copilot serves as the central agentic interface, integrating with subscriptions, resources, and policies to provide context-aware operational intelligence.
Specialized agents cover the full cloud lifecycle, including migration planning, infrastructure-as-code generation, and automated deployment validation.
Real-time observability and troubleshooting agents accelerate root cause analysis by diagnosing health signals across the full stack and recommending fixes.
Resiliency and optimization agents continuously identify gaps in recovery configurations and execute cost-saving or performance-enhancing adjustments.

#sre #mlp #finops

Microsoft Azure Blog, Feb 10, 2026

Can high-temperature superconductors transform the power infrastructure of datacenters?

Why it matters: As AI workloads drive unprecedented power demands, traditional copper infrastructure faces efficiency and space limits. HTS technology offers a path to lossless power delivery and higher density, enabling sustainable scaling of next-generation datacenter architecture.

Microsoft is investigating High-Temperature Superconductors (HTS) to meet the massive power demands of AI and data-intensive computing.
HTS cables provide lossless power transmission with zero electrical resistance, eliminating heat generation and voltage drops.
The technology allows for higher power density in smaller footprints, enabling more compact and efficient datacenter designs.
Operationalizing HTS requires specialized cryogenic cooling systems to maintain materials at temperatures necessary for superconductivity.
By reducing transmission losses and infrastructure size, HTS supports sustainability goals and helps scale cloud infrastructure effectively.

#sre #mlp #finops

Pinterest Engineering, Feb 5, 2026

Next Generation DB Ingestion at Pinterest

Why it matters: Transitioning from batch to real-time ingestion is critical for modern data-driven apps. Pinterest's architecture shows how to use CDC and Iceberg to reduce latency from days to minutes while cutting costs and ensuring compliance through efficient row-level updates and unified pipelines.

Pinterest replaced fragmented, high-latency batch ingestion with a unified CDC-based framework using Flink, Spark, and Apache Iceberg.
The system captures changes from MySQL, TiDB, and KVStore via a custom CDC service, writing events to Kafka with sub-second latency.
A dual-table architecture uses append-only CDC tables for change logs and Base tables for mirrored snapshots updated via Spark's MERGE INTO.
Standardizing on Iceberg's Merge-on-Read (MOR) strategy significantly reduced storage and compute costs compared to Copy-on-Write (COW).
The framework supports row-level deletions natively, improving data compliance and handling petabyte-scale data across thousands of pipelines.
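
The merge step in the dual-table design can be illustrated with a minimal sketch in plain Python. It is not Pinterest's implementation and does not use Spark or Iceberg; `ChangeEvent`, `merge_changes`, and the `seq` ordering field are hypothetical names chosen for illustration. It only shows the general idea behind a MERGE INTO style update: keep the latest change per key from the append-only change log, then upsert or delete rows in the base snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One entry of an append-only change log (hypothetical shape)."""
    key: int                    # primary key of the affected row
    op: str                     # "insert", "update", or "delete"
    seq: int                    # position in the change log; higher is newer
    row: Optional[dict] = None  # new row image; None for deletes


def merge_changes(base: Dict[int, dict],
                  changes: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Return a new snapshot with the latest change per key applied."""
    # Step 1: deduplicate the log, keeping only the newest event per key.
    latest: Dict[int, ChangeEvent] = {}
    for event in changes:
        current = latest.get(event.key)
        if current is None or event.seq > current.seq:
            latest[event.key] = event

    # Step 2: apply upserts and row-level deletes to a copy of the base table.
    merged = dict(base)
    for key, event in latest.items():
        if event.op == "delete":
            merged.pop(key, None)
        else:
            merged[key] = event.row
    return merged


base = {1: {"name": "a"}, 2: {"name": "b"}}
log = [
    ChangeEvent(1, "update", 10, {"name": "a2"}),
    ChangeEvent(2, "delete", 11),
    ChangeEvent(3, "insert", 12, {"name": "c"}),
    ChangeEvent(1, "update", 13, {"name": "a3"}),
]
snapshot = merge_changes(base, log)
```

In this sketch the base table is never mutated in place, which loosely mirrors the separation between the change log and the mirrored snapshot; a real pipeline would express the same logic as a table-level merge rather than a dictionary update.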

#data #dist #finops

Cloudflare Blog, Jan 27, 2026

Building a serverless, post-quantum Matrix homeserver

Why it matters: This proof of concept demonstrates how to transform heavy, stateful communication protocols into serverless architectures. It reduces operational overhead and costs to near zero while future-proofing security with post-quantum encryption at the edge.

Ported the Matrix homeserver protocol to Cloudflare Workers using TypeScript and the Hono framework.
Replaced traditional stateful infrastructure with serverless primitives: D1 for SQL, KV for caching, R2 for media, and Durable Objects for state resolution.
Achieved a scale-to-zero cost model, eliminating the fixed overhead of running dedicated virtual private servers.
Integrated post-quantum cryptography by default using hybrid X25519MLKEM768 key agreement for TLS 1.3 connections.
Leveraged Cloudflare's global edge network to reduce latency by executing homeserver logic in over 300 locations.
Maintained end-to-end encryption (Megolm) while adding a quantum-resistant transport layer for defense-in-depth.

#dist #security #finops

Microsoft Azure Blog, Jan 26, 2026

Maia 200: The AI accelerator built for inference

Why it matters: Maia 200 represents a shift toward custom first-party silicon optimized for LLM inference. It offers engineers high-performance FP4/FP8 compute and a flexible software stack, significantly reducing the cost and latency of deploying massive models like GPT-5.2 at scale.

Maia 200 is built on a TSMC 3nm process, featuring 140 billion transistors and delivering 10 petaFLOPS of FP4 and 5 petaFLOPS of FP8 performance.
The memory architecture utilizes 216GB of HBM3e at 7 TB/s alongside 272MB of on-chip SRAM to maximize token generation throughput.
It employs a custom Ethernet-based scale-up network providing 2.8 TB/s of bidirectional bandwidth for clusters of up to 6,144 accelerators.
The software ecosystem includes the Maia SDK with a Triton compiler, PyTorch integration, and a low-level programming language (NPL).
Engineered for efficiency, it achieves 30% better performance per dollar than existing hardware for models like GPT-5.2 and synthetic data generation.

#mlp #dist #finops

Salesforce Engineering, Jan 15, 2026

How a Mock LLM Service Cut $500K in AI Benchmarking Costs, Boosted Developer Productivity

Why it matters: Benchmarking AI systems against live providers is expensive and noisy. This mock service provides a deterministic, cost-effective way to validate performance and reliability at scale, allowing engineers to iterate faster without financial friction or external latency fluctuations.

Salesforce developed an internal LLM mock service to simulate AI provider behavior, supporting benchmarks of over 24,000 requests per minute.
The service reduced annual token-based costs by over $500,000 by replacing live LLM dependencies during performance and regression testing.
Deterministic latency controls allow engineers to isolate internal code performance from external provider variability, ensuring repeatable results.
The mock layer enables rapid scale and failover benchmarking by simulating high-volume traffic and controlled outages without external infrastructure.
By providing a shared platform capability, the service accelerates development loops and improves confidence in performance signals.
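
As a rough illustration of the idea, a mock provider with deterministic latency and a controllable outage switch might look like the sketch below. This is not Salesforce's service; the `MockLLM` class, its fields, and the response format are hypothetical. The sleep function is injectable so that the configured latency can be recorded in tests instead of actually waiting.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class MockLLM:
    """Hypothetical stand-in for a live LLM provider."""
    latency_s: float = 0.05                       # fixed, repeatable latency
    fail: bool = False                            # simulate a provider outage
    sleep: Callable[[float], None] = time.sleep   # injectable for tests

    def complete(self, prompt: str) -> str:
        # Every call waits the same configured time, so measured variance
        # comes from the caller's own code, not from the provider.
        self.sleep(self.latency_s)
        if self.fail:
            raise RuntimeError("simulated provider outage")
        # The response depends only on the prompt, so runs are repeatable.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:8]
        return f"mock-response-{digest}"


recorded = []  # collects requested delays instead of sleeping
llm = MockLLM(latency_s=0.2, sleep=recorded.append)
first = llm.complete("hello")
second = llm.complete("hello")
```

Because no tokens are sent to a live provider, a benchmark built on a mock like this costs nothing per request, and flipping `fail` to `True` gives a controlled way to exercise failover paths.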

#mlp #finops #sre

Microsoft Azure Blog, Jan 15, 2026

Chart your AI and agent strategy with Microsoft Marketplace

Why it matters: Engineers must balance speed-to-market with customizability. This ecosystem simplifies the 'build vs. buy' decision by providing pre-vetted models and agents that integrate with existing stacks while ensuring governance and cost optimization through cloud consumption commitments.

Microsoft Marketplace provides a central catalog of over 11,000 AI models and 4,000 apps to support build, buy, or hybrid AI strategies.
Pro-code developers can access foundational models from Anthropic, Meta, and OpenAI via Microsoft Foundry to maintain full control over custom logic and IP.
Low-code development is enabled through Microsoft Copilot Studio, allowing teams to build agents grounded in organizational data with minimal coding.
Ready-made agents and multi-agent systems can be deployed directly into Microsoft 365 Copilot to accelerate time-to-value for common business use cases.
Governance tools like Private Azure Marketplace allow IT teams to curate approved solutions and maintain oversight of AI deployments.
Marketplace transactions can be applied toward Microsoft Azure Consumption Commitment (MACC), helping organizations optimize cloud spend and procurement.

#mlp #finops #data

Airbnb Engineering, Jan 12, 2026

Pay As a Local

Why it matters: This architecture demonstrates how to scale global payment systems by abstracting vendor-specific complexities into standardized archetypes. It enables rapid expansion into new markets while maintaining high reliability and consistency through domain-driven design and asynchronous orchestration.

Replatformed from a monolith to a domain-driven microservices architecture (Payments LTA) to improve scalability and team autonomy.
Implemented a connector and plugin-based architecture to standardize third-party Payment Service Provider (PSP) integrations.
Developed the Multi-Step Transactions (MST) framework, a processor-agnostic system for handling complex flows like redirects and SCA.
Categorized 20+ local payment methods into three standardized archetypes (Redirect, Async, and Direct flows) to maximize code reuse.
Utilized asynchronous orchestration with webhooks and polling to manage external payment confirmations and ensure data consistency.
Enforced strict idempotency and built comprehensive observability dashboards to monitor transaction success rates and latency across regions.
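
The idempotency requirement can be sketched generically as follows. This is not Airbnb's code: `IdempotentProcessor`, `fake_charge`, and the in-memory result store are hypothetical, and a production system would persist keys durably and handle concurrent requests. The sketch only shows the core contract: a retried request with the same idempotency key returns the stored result instead of charging again.

```python
from typing import Callable, Dict, List


class IdempotentProcessor:
    """Hypothetical wrapper that runs a charge at most once per key."""

    def __init__(self, charge: Callable[[int], str]) -> None:
        self._charge = charge
        self._results: Dict[str, str] = {}  # idempotency key -> stored result

    def process(self, idempotency_key: str, amount_cents: int) -> str:
        # A retry (for example after a webhook timeout) hits this branch
        # and gets the original result without a second charge.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._charge(amount_cents)
        self._results[idempotency_key] = result
        return result


calls: List[int] = []  # records every real call to the payment provider


def fake_charge(amount_cents: int) -> str:
    """Stand-in for a payment service provider call."""
    calls.append(amount_cents)
    return f"txn-{len(calls)}"


processor = IdempotentProcessor(fake_charge)
a = processor.process("order-42", 1999)
b = processor.process("order-42", 1999)  # retry with the same key
c = processor.process("order-43", 500)
```

Here the retried `order-42` request reuses the first transaction, so the provider stand-in is called only twice across the three requests.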

#dist #finops #sre

PlanetScale Tech Blog, Dec 15, 2025

$50 PlanetScale Metal is GA for Postgres

Why it matters: Engineers can now access high-performance, NVMe-backed Postgres hardware at a fraction of the previous cost. The decoupling of storage and compute allows for better resource optimization and cost efficiency for diverse workloads, from small high-traffic apps to large data-heavy systems.

PlanetScale Metal for Postgres now offers smaller instances starting at $50/month with 1GiB RAM.
Storage and compute are now decoupled, allowing for up to 300GB of storage per GiB of RAM.
All instances utilize locally attached NVMe drives to ensure low latency and high reliability.
Users can choose from eight storage capacities ranging from 10GB to 1.2TB across various CPU/RAM tiers.
The service supports online resizing and is available on AWS with both Intel and ARM CPU options.

#data #finops #sre

GitHub Engineering, Dec 12, 2025

The future of AI-powered software optimization (and how it can help your team)

Why it matters: This article introduces "Continuous Efficiency," an AI-driven method to embed sustainable and efficient coding practices directly into development workflows. It offers a practical path for engineers to improve code quality and performance while reducing operational costs, without manual effort.

"Continuous Efficiency" integrates AI-powered automation with green software principles to embed sustainability into development workflows.
This approach combines LLM-powered Continuous AI for CI/CD with Green Software practices, aiming for more performant, resilient, and cost-effective code.
It addresses the low priority typically given to green software by enabling near-effortless, always-on optimization for efficiency and reduced environmental impact.
Implemented via Agentic Workflows in GitHub Actions, it lets teams define engineering standards in natural language and apply them at scale.
Benefits include declarative rule authoring, semantic generalizability across languages, and intelligent remediation like automated pull requests.
Pilot projects demonstrate success in applying green software rules and Web Sustainability Guidelines, yielding measurable performance gains.

#mlp #sre #finops
