Curated topic

finops

Posts tagged with finops

Cloudflare Blog, Jan 27, 2026

Building a serverless, post-quantum Matrix homeserver

Why it matters: This proof of concept demonstrates how to transform heavy, stateful communication protocols into serverless architectures. It reduces operational overhead and costs to near zero while future-proofing security with post-quantum encryption at the edge.

Ported the Matrix homeserver protocol to Cloudflare Workers using TypeScript and the Hono framework.
Replaced traditional stateful infrastructure with serverless primitives: D1 for SQL, KV for caching, R2 for media, and Durable Objects for state resolution.
Achieved a scale-to-zero cost model, eliminating the fixed overhead of running dedicated virtual private servers.
Integrated post-quantum cryptography by default using hybrid X25519MLKEM768 key agreement for TLS 1.3 connections.
Leveraged Cloudflare's global edge network to reduce latency by executing homeserver logic in over 300 locations.
Maintained end-to-end encryption (Megolm) while adding a quantum-resistant transport layer for defense-in-depth.

#dist #security #finops

Microsoft Azure Blog, Jan 26, 2026

Maia 200: The AI accelerator built for inference

Why it matters: Maia 200 represents a shift toward custom first-party silicon optimized for LLM inference. It offers engineers high-performance FP4/FP8 compute and a flexible software stack, significantly reducing the cost and latency of deploying massive models like GPT-5.2 at scale.

Maia 200 is built on a TSMC 3nm process, featuring 140 billion transistors and delivering 10 petaFLOPS of FP4 and 5 petaFLOPS of FP8 performance.
The memory architecture utilizes 216GB of HBM3e at 7 TB/s alongside 272MB of on-chip SRAM to maximize token generation throughput.
It employs a custom Ethernet-based scale-up network providing 2.8 TB/s of bidirectional bandwidth for clusters of up to 6,144 accelerators.
The software ecosystem includes the Maia SDK with a Triton compiler, PyTorch integration, and a low-level programming language (NPL).
Engineered for efficiency, it achieves 30% better performance per dollar than existing hardware for models like GPT-5.2 and synthetic data generation.
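
The throughput figures above can be turned into a rough roofline estimate. The sketch below uses only the published peak compute, HBM bandwidth, and HBM capacity numbers; the 100B-parameter model and the assumption that every weight is read once per generated token are hypothetical simplifications, not figures from the article.

```python
# Back-of-envelope roofline arithmetic from the Maia 200 figures quoted above.
PEAK_FLOPS = {"fp4": 10e15, "fp8": 5e15}  # FLOP/s (10 and 5 petaFLOPS)
HBM_BANDWIDTH = 7e12                       # bytes/s (7 TB/s of HBM3e)
HBM_CAPACITY = 216e9                       # bytes (216 GB)


def ridge_point(dtype: str) -> float:
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return PEAK_FLOPS[dtype] / HBM_BANDWIDTH


def decode_tokens_per_second(params: float, bytes_per_param: float) -> float:
    """Upper bound on batch-1 decode rate, assuming (hypothetically) that
    every weight is streamed from HBM exactly once per generated token."""
    weight_bytes = params * bytes_per_param
    if weight_bytes > HBM_CAPACITY:
        raise ValueError("weights do not fit in one accelerator's HBM")
    return HBM_BANDWIDTH / weight_bytes


# Hypothetical 100B-parameter model stored in FP4 (0.5 bytes per parameter).
print(round(ridge_point("fp4")))             # 1429 FLOP/byte
print(round(ridge_point("fp8")))             # 714 FLOP/byte
print(decode_tokens_per_second(100e9, 0.5))  # 140.0 tokens/s
```

Real decode throughput depends on batching, KV-cache traffic, and interconnect behavior, so these numbers are ceilings for intuition rather than predictions.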

#mlp #dist #finops

Salesforce Engineering, Jan 15, 2026

How a Mock LLM Service Cut $500K in AI Benchmarking Costs, Boosted Developer Productivity

Why it matters: Benchmarking AI systems against live providers is expensive and noisy. This mock service provides a deterministic, cost-effective way to validate performance and reliability at scale, allowing engineers to iterate faster without financial friction or external latency fluctuations.

Salesforce developed an internal LLM mock service to simulate AI provider behavior, supporting benchmarks of over 24,000 requests per minute.
The service reduced annual token-based costs by over $500,000 by replacing live LLM dependencies during performance and regression testing.
Deterministic latency controls allow engineers to isolate internal code performance from external provider variability, ensuring repeatable results.
The mock layer enables rapid scale and failover benchmarking by simulating high-volume traffic and controlled outages without external infrastructure.
By providing a shared platform capability, the service accelerates development loops and improves confidence in performance signals.
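
The idea of a deterministic mock provider can be sketched in a few lines. This is a minimal illustration, not Salesforce's service: the `MockLLM` class, its `latency_s` and `outage` knobs, and the hash-based reply format are all assumptions made for the example.

```python
import hashlib
import time
from dataclasses import dataclass


class ProviderUnavailable(Exception):
    """Raised while a simulated outage is active."""


@dataclass
class MockLLM:
    """Hypothetical stand-in for a live LLM provider."""
    latency_s: float = 0.05  # fixed latency injected on every successful call
    outage: bool = False     # set to True to simulate a provider outage
    calls: int = 0           # number of requests received

    def complete(self, prompt: str) -> str:
        self.calls += 1
        if self.outage:
            raise ProviderUnavailable("simulated outage")
        time.sleep(self.latency_s)
        # The reply depends only on the prompt, so runs are repeatable.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:8]
        return f"mock-completion-{digest}"


mock = MockLLM(latency_s=0.01)
assert mock.complete("hello") == mock.complete("hello")  # deterministic reply
mock.outage = True  # exercise the caller's failover path
try:
    mock.complete("hello")
except ProviderUnavailable:
    print("failover path triggered")
```

Because the injected latency is constant, any change in measured end-to-end time can be attributed to the code under test rather than to the provider.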

#mlp #finops #sre

Microsoft Azure Blog, Jan 15, 2026

Chart your AI and agent strategy with Microsoft Marketplace

Why it matters: Engineers must balance speed-to-market with customizability. This ecosystem simplifies the 'build vs. buy' decision by providing pre-vetted models and agents that integrate with existing stacks while ensuring governance and cost optimization through cloud consumption commitments.

Microsoft Marketplace provides a central catalog of over 11,000 AI models and 4,000 apps to support build, buy, or hybrid AI strategies.
Pro-code developers can access foundational models from Anthropic, Meta, and OpenAI via Microsoft Foundry to maintain full control over custom logic and IP.
Low-code development is enabled through Microsoft Copilot Studio, allowing teams to build agents grounded in organizational data with minimal coding.
Ready-made agents and multi-agent systems can be deployed directly into Microsoft 365 Copilot to accelerate time-to-value for common business use cases.
Governance tools like Private Azure Marketplace allow IT teams to curate approved solutions and maintain oversight of AI deployments.
Marketplace transactions can be applied toward Microsoft Azure Consumption Commitment (MACC), helping organizations optimize cloud spend and procurement.

#mlp #finops #data

Airbnb Engineering, Jan 12, 2026

Pay As a Local

Why it matters: This architecture demonstrates how to scale global payment systems by abstracting vendor-specific complexities into standardized archetypes. It enables rapid expansion into new markets while maintaining high reliability and consistency through domain-driven design and asynchronous orchestration.

Replatformed from a monolith to a domain-driven microservices architecture (Payments LTA) to improve scalability and team autonomy.
Implemented a connector and plugin-based architecture to standardize third-party Payment Service Provider (PSP) integrations.
Developed the Multi-Step Transactions (MST) framework, a processor-agnostic system for handling complex flows like redirects and SCA.
Categorized 20+ local payment methods into three standardized archetypes—Redirect, Async, and Direct flows—to maximize code reuse.
Utilized asynchronous orchestration with webhooks and polling to manage external payment confirmations and ensure data consistency.
Enforced strict idempotency and built comprehensive observability dashboards to monitor transaction success rates and latency across regions.
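
The combination of flow archetypes and idempotent requests described above can be sketched as follows. This is not Airbnb's code: the `PaymentService` class, the in-memory result store, and the payment method names are hypothetical, chosen only to show how a repeated request with the same key avoids a second provider call.

```python
from dataclasses import dataclass, field
from enum import Enum


class Archetype(Enum):
    REDIRECT = "redirect"  # shopper is sent to the provider's page
    ASYNC = "async"        # confirmation arrives later via webhook or polling
    DIRECT = "direct"      # provider answers within the same request


# Hypothetical payment method names, for illustration only.
METHOD_ARCHETYPES = {
    "wallet_x": Archetype.REDIRECT,
    "bank_transfer_y": Archetype.ASYNC,
    "card_z": Archetype.DIRECT,
}


@dataclass
class PaymentService:
    results: dict = field(default_factory=dict)  # idempotency key -> result
    provider_calls: int = 0

    def charge(self, idempotency_key: str, method: str, amount_minor: int) -> dict:
        # A retry with the same key returns the stored result unchanged.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        archetype = METHOD_ARCHETYPES[method]
        self.provider_calls += 1  # stands in for the real provider request
        status = "succeeded" if archetype is Archetype.DIRECT else "pending"
        result = {"method": method, "amount_minor": amount_minor,
                  "archetype": archetype.value, "status": status}
        self.results[idempotency_key] = result
        return result

    def confirm(self, idempotency_key: str) -> dict:
        """Webhook or polling handler: mark a stored payment as succeeded."""
        result = self.results[idempotency_key]
        result["status"] = "succeeded"  # applying this twice changes nothing
        return result


svc = PaymentService()
first = svc.charge("order-1", "bank_transfer_y", 1000)
retry = svc.charge("order-1", "bank_transfer_y", 1000)
print(first is retry, svc.provider_calls)  # True 1
print(svc.confirm("order-1")["status"])    # succeeded
```

A production system would persist the key-to-result mapping durably and handle concurrent retries; the in-memory dictionary here only demonstrates the contract.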

#dist #finops #sre

PlanetScale Tech Blog, Dec 15, 2025

$50 PlanetScale Metal is GA for Postgres

Why it matters: Engineers can now access high-performance, NVMe-backed Postgres hardware at a fraction of the previous cost. The decoupling of storage and compute allows for better resource optimization and cost efficiency for diverse workloads, from small high-traffic apps to large data-heavy systems.

PlanetScale Metal for Postgres now offers smaller instances starting at $50/month with 1GiB RAM.
Storage and compute are now decoupled, allowing for up to 300GB of storage per GiB of RAM.
All instances utilize locally attached NVMe drives to ensure low latency and high reliability.
Users can choose from eight storage capacities ranging from 10GB to 1.2TB across various CPU/RAM tiers.
The service supports online resizing and is available on AWS with both Intel and ARM CPU options.

#data #finops #sre

GitHub Engineering, Dec 12, 2025

The future of AI-powered software optimization (and how it can help your team)

Why it matters: This article introduces "Continuous Efficiency," an AI-driven method to embed sustainable and efficient coding practices directly into development workflows. It offers a practical path for engineers to improve code quality and performance and to reduce operational costs without manual effort.

"Continuous Efficiency" integrates AI-powered automation with green software principles to embed sustainability into development workflows.
This approach combines LLM-powered Continuous AI for CI/CD with Green Software practices, aiming for more performant, resilient, and cost-effective code.
It addresses the low priority of green software by enabling near-effortless, always-on optimization for efficiency and reduced environmental impact.
Implemented via Agentic Workflows in GitHub Actions, it allows defining engineering standards in natural language for scalable application.
Benefits include declarative rule authoring, semantic generalizability across languages, and intelligent remediation like automated pull requests.
Pilot projects demonstrate success in applying green software rules and Web Sustainability Guidelines, yielding measurable performance gains.

#mlp #sre #finops

Microsoft Azure Blog, Dec 2, 2025

A decade of open innovation: Celebrating 10 years of Microsoft and Red Hat partnership

Why it matters: This article highlights how a decade-long partnership between Microsoft and Red Hat has driven significant advancements in hybrid cloud, open source, and AI. Engineers can learn about integrated platforms like ARO, cost-saving benefits, and tools for modernizing applications and scaling AI.

Microsoft and Red Hat mark a decade of partnership, advancing open source and enterprise cloud innovation, particularly for hybrid cloud transformation.
Key offerings include Red Hat Enterprise Linux (RHEL) on Azure and Azure Red Hat OpenShift (ARO), a jointly engineered, fully managed application platform.
The collaboration has enabled digital transformation, cost savings, and accelerated AI initiatives for global enterprises across various industries.
Technical accomplishments include deep integration of Red Hat solutions on Azure, OpenShift Virtualization, Confidential Containers, and contributions to Kubernetes.
The partnership provides a secure, governable foundation for scalable AI adoption, leveraging ARO with Azure OpenAI Service and Microsoft Foundry.
Flexible pricing through Azure Hybrid Benefit for RHEL helps optimize costs for organizations running workloads on Azure.

#dist #mlp #finops

Engineering at Meta, Nov 21, 2025

Zoomer: Powering AI Performance at Meta’s Scale Through Intelligent Debugging and Optimization

Why it matters: Zoomer is crucial for optimizing AI performance at Meta's massive scale, ensuring efficient GPU utilization, reducing energy consumption, and cutting operational costs. This accelerates AI development and innovation across all Meta products, from GenAI to recommendations.

Zoomer is Meta's automated, comprehensive platform for debugging and optimizing AI training and inference workloads at scale.
It provides deep performance insights, leading to significant energy savings, accelerated workflows, and improved efficiency across Meta's AI infrastructure.
The platform has reduced training times and improved Queries Per Second (QPS), making it Meta's primary tool for AI performance optimization.
Zoomer's architecture comprises an Infrastructure/Platform layer for scalability, an Analytics/Insights Engine for deep analysis (using Kineto, StrobeLight, dyno telemetry), and a Visualization/UI layer for actionable insights.
It addresses critical challenges of GPU underutilization, operational costs, and suboptimal hardware use in large-scale AI environments.

#mlp #sre #finops

PlanetScale Tech Blog, Nov 14, 2025

$5 PlanetScale is live

Why it matters: PlanetScale lowers the entry barrier for developers by offering affordable Postgres instances with advanced features like branching. It provides a seamless growth path from a single node to sharded architectures without requiring painful database migrations.

PlanetScale has launched $5/month single-node Postgres databases globally for startups and side projects.
Development branch pricing is reduced from $10 to $5 per month, lowering the cost of staging environments.
Single-node instances include advanced features like Query Insights, schema recommendations, and branching.
Users can vertically scale clusters or upgrade to High Availability mode with multi-replica configurations.
The platform offers a seamless growth path to horizontal scaling via Neki, their upcoming sharded Postgres solution.

#data #finops #sre
