Posts tagged with dist
Why it matters: This article demonstrates how Netflix improved its workflow orchestrator's performance by 100X, which was crucial for supporting evolving business needs like real-time data processing and low-latency applications. It highlights the importance of engine redesign for scalability and developer productivity.
- Netflix's Maestro workflow orchestrator achieved a 100X performance improvement, reducing overhead from seconds to milliseconds for Data/ML workflows.
- The previous Maestro engine, based on the deprecated Conductor 2.x, suffered from performance bottlenecks and race conditions due to its internal flow engine layer.
- New business needs like Live, Ads, Games, and low-latency use cases necessitated a high-performance workflow engine.
- The team evaluated options including upgrading Conductor, adopting Temporal, or implementing a custom internal flow engine.
- They opted to rewrite Maestro's internal flow engine to simplify the architecture, eliminate complex database synchronizations, and ensure strong guarantees (see the sketch below).
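The article doesn't include code, but the gist of an internal flow engine — advancing a workflow's steps in-process instead of synchronizing state through a separate engine's tables — can be shown with a minimal, hypothetical Python sketch (the `Flow`/`FlowEngine` names and steps are illustrative, not Maestro's actual API):

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical illustration of an in-process flow engine: a workflow is a list
# of named steps, and the engine applies state transitions in memory rather
# than synchronizing them through a separate engine's database tables.
@dataclass
class Flow:
    name: str
    steps: list[tuple[str, Callable[[dict], dict]]]
    state: dict = field(default_factory=dict)

class FlowEngine:
    def run(self, flow: Flow) -> dict:
        for step_name, step_fn in flow.steps:
            # Each transition is applied directly; a real engine would hook
            # persistence, retries, and callbacks in here.
            flow.state = step_fn(flow.state)
            flow.state["last_completed"] = step_name
        return flow.state

if __name__ == "__main__":
    flow = Flow(
        name="demo-workflow",
        steps=[
            ("extract", lambda s: {**s, "rows": 100}),
            ("transform", lambda s: {**s, "rows": s["rows"] * 2}),
        ],
    )
    print(FlowEngine().run(flow))  # {'rows': 200, 'last_completed': 'transform'}
```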
Why it matters: This article details how Netflix built a robust WAL system to solve common, critical data challenges like consistency, replication, and reliable retries at massive scale. It offers a blueprint for building resilient data platforms, enhancing developer efficiency and preventing outages.
- Netflix developed a generic, distributed Write-Ahead Log (WAL) system to address critical data challenges at scale, including data loss, corruption, and replication.
- The WAL provides strong durability guarantees and reliably delivers data changes to various downstream consumers.
- Its simple WriteToLog API abstracts internal complexities, using namespaces to define storage targets (Kafka, SQS) and configurations (see the sketch after this list).
- Key use cases (personas) include enabling delayed message queues for reliable retries in real-time data pipelines.
- It facilitates generic cross-region data replication for services like EVCache.
- The WAL also supports complex operations like handling multi-partition mutations in Key-Value stores, ensuring eventual consistency via two-phase commit.
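The WriteToLog API is only described at a high level; the following is a hypothetical Python sketch of what a namespace-driven client could look like, where the namespace selects the backing queue and an optional delivery delay. The `WalClient` class, namespace names, and record fields are assumptions, not Netflix's actual interface:

```python
import json
import time
import uuid

# Hypothetical sketch of a WAL client: the namespace carries the storage
# configuration (Kafka topic, SQS queue, optional delay), so callers only
# ever make one call.
NAMESPACES = {
    "evcache-replication": {"backend": "kafka", "target": "wal.evcache.us-east-1"},
    "pipeline-retries":    {"backend": "sqs", "target": "wal-retries", "delay_seconds": 300},
}

class WalClient:
    def write_to_log(self, namespace: str, payload: dict) -> str:
        cfg = NAMESPACES[namespace]
        record = {
            "id": str(uuid.uuid4()),
            "written_at": time.time(),
            "deliver_after": time.time() + cfg.get("delay_seconds", 0),
            "payload": payload,
        }
        # A real implementation would durably append to the configured backend
        # (Kafka producer, SQS SendMessage) and only then acknowledge the write.
        print(f"append to {cfg['backend']}:{cfg['target']} -> {json.dumps(record)}")
        return record["id"]

if __name__ == "__main__":
    WalClient().write_to_log("pipeline-retries", {"key": "user:42", "op": "PUT"})
```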
Why it matters: This article details how a large-scale key-value store was rearchitected to meet modern demands for real-time data, scalability, and operational efficiency. It offers valuable insights into addressing common distributed system challenges and executing complex migrations.
- Airbnb rearchitected its core key-value store, Mussel, from v1 to v2 to handle real-time demands and massive data volumes while improving operational efficiency.
- Mussel v1 faced operational complexity, static partitioning that led to hotspots, limited consistency options, and opaque costs.
- Mussel v2 leverages Kubernetes for automation, dynamic range sharding for scalability, flexible consistency, and enhanced cost visibility.
- The new architecture includes a stateless Dispatcher, Kafka-backed writes for durability, and an event-driven ingestion model (see the sketch after this list).
- Bulk data loading is supported via Airflow orchestration and distributed workers, maintaining familiar semantics.
- Automated TTL in v2 uses a topology-aware expiration service for efficient, parallel data deletion, improving on v1's compaction cycle.
- A blue/green migration strategy with custom bootstrapping and dual writes ensured a seamless transition with zero downtime and no data loss.
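As a rough illustration of a Kafka-backed write path, here is a minimal sketch assuming the kafka-python client; the topic name and mutation format are invented, not Mussel's actual wire protocol:

```python
import json

from kafka import KafkaProducer  # pip install kafka-python

# Illustrative sketch: the dispatcher appends the mutation to a log topic for
# durability, and storage nodes apply it later via an event-driven consumer.
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",  # wait for the log to confirm durability before acking the client
)

mutation = {"table": "user_events", "key": "user:42", "value": {"clicks": 7}, "op": "PUT"}
producer.send("mussel-writes", value=mutation)
producer.flush()
```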
Why it matters: This article details how Netflix scaled a critical OLAP application to handle trillions of rows and complex queries. It showcases practical strategies using approximate distinct counts (HLL) and in-memory precomputed aggregates (Hollow) to achieve high performance and data accuracy.
- Netflix's Muse application, an OLAP system for creative insights, evolved its architecture to handle trillions of rows and complex queries.
- The updated data serving layer leverages HyperLogLog (HLL) sketches for efficient, approximate distinct counts, reducing query latencies by roughly 50% (see the sketch after this list).
- Hollow is used as a read-only, in-memory key-value store for precomputed aggregates, offloading Druid and improving performance for specific data access patterns.
- The architecture now includes React, GraphQL, and Spring Boot gRPC microservices, with significant tuning applied to the Druid cluster.
- The solution addresses challenges like dynamic analysis by audience affinities and combinatorial data explosion.
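To make the HLL idea concrete, here is a minimal sketch using the Apache DataSketches Python bindings: per-segment sketches are merged at query time to estimate a distinct count without rescanning raw rows. The post's sketches live inside Druid and the serving layer; this only shows the mergeability that makes the approach fast:

```python
from datasketches import hll_sketch, hll_union  # pip install datasketches

# Build one HLL sketch per data segment, then merge them at query time to get
# an approximate distinct count without touching the raw rows again.
def sketch_segment(user_ids, lg_k=12):
    sk = hll_sketch(lg_k)
    for uid in user_ids:
        sk.update(uid)
    return sk

segment_a = sketch_segment(f"user-{i}" for i in range(0, 60_000))
segment_b = sketch_segment(f"user-{i}" for i in range(40_000, 100_000))  # overlaps a

union = hll_union(12)
union.update(segment_a)
union.update(segment_b)
estimate = union.get_result().get_estimate()
print(f"approx distinct users: {estimate:.0f}")  # ~100,000 despite the overlap
```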
Why it matters: This article showcases a successful approach to managing a large, evolving data graph in a service-oriented architecture. It provides insights into how a data-oriented service mesh can simplify developer experience, improve modularity, and scale efficiently.
- Viaduct, Airbnb's data-oriented service mesh, has been open-sourced after five years of significant growth and evolution within the company.
- It is built on three core principles: a central, integrated GraphQL schema, hosting business logic directly within the mesh, and re-entrancy for modular composition.
- The "Viaduct Modern" initiative simplified its developer-facing Tenant API, reducing multiple resolution mechanisms to just node and field resolvers (see the sketch after this list).
- Modularity was enhanced through formal "tenant modules," enabling teams to own schema and code while composing via GraphQL fragments and queries, avoiding direct code dependencies.
- This modernization allowed Viaduct to scale dramatically (8x traffic, 3x codebase) while maintaining operational efficiency and reducing incidents.
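Viaduct's Tenant API itself isn't shown in the post; the following is a conceptual Python sketch of the two resolver kinds it reduces to — a node resolver loads an entity by ID, a field resolver computes a single field — with all names invented for illustration:

```python
# Conceptual sketch (not Viaduct's real API): the Tenant API reduces to two
# hooks. Node resolvers fetch an object by ID; field resolvers compute one
# field, possibly by querying other parts of the schema.
NODE_RESOLVERS = {}
FIELD_RESOLVERS = {}

def node_resolver(type_name):
    def register(fn):
        NODE_RESOLVERS[type_name] = fn
        return fn
    return register

def field_resolver(type_name, field_name):
    def register(fn):
        FIELD_RESOLVERS[(type_name, field_name)] = fn
        return fn
    return register

@node_resolver("Listing")
def resolve_listing(node_id):
    return {"id": node_id, "nightly_price": 120, "nights_booked": 14}

@field_resolver("Listing", "projectedRevenue")
def resolve_projected_revenue(listing):
    return listing["nightly_price"] * listing["nights_booked"]

listing = NODE_RESOLVERS["Listing"]("listing:123")
print(FIELD_RESOLVERS[("Listing", "projectedRevenue")](listing))  # 1680
```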
Why it matters: This article introduces a novel approach to managing complex microservice architectures. By shifting to a data-oriented service mesh with a central GraphQL schema, engineers can significantly improve modularity, simplify dependency management, and enhance data agility in large-scale SOAs.
- Airbnb introduced Viaduct, a data-oriented service mesh, to improve modularity and tame the massive dependency graphs of microservices-based Service-Oriented Architectures (SOA).
- Traditional service meshes are procedure-oriented, leading to "spaghetti SOA" where managing and modifying services becomes increasingly difficult.
- Viaduct shifts to a data-oriented design, leveraging GraphQL to define a central schema comprising types, queries, and mutations across the entire service mesh.
- This data-oriented approach abstracts service dependencies away from data consumers, as Viaduct routes requests to the appropriate microservices (see the sketch after this list).
- The central GraphQL schema acts as a single source of truth, aiming to define service APIs and potentially database schemas, which significantly enhances data agility.
- By centralizing schema definition, Viaduct seeks to streamline changes, allowing database updates to propagate to client code in a single, coordinated update instead of weeks of effort.
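A toy sketch of the data-oriented routing idea: consumers request fields against the central schema, and the mesh — not the caller — maps each field to the service that owns it. The ownership table below is invented for illustration:

```python
# Toy illustration of data-oriented routing: the caller names the data it wants
# against the central schema; the mesh knows which service owns each field.
# The ownership map below is invented for illustration.
FIELD_OWNERS = {
    "User.profile": "users-service",
    "User.reservations": "reservations-service",
    "Reservation.listing": "listings-service",
}

def route(query_fields):
    """Group requested schema fields by the service that owns them."""
    plan = {}
    for field in query_fields:
        owner = FIELD_OWNERS[field]
        plan.setdefault(owner, []).append(field)
    return plan

print(route(["User.profile", "User.reservations", "Reservation.listing"]))
# {'users-service': ['User.profile'],
#  'reservations-service': ['User.reservations'],
#  'listings-service': ['Reservation.listing']}
```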
Why it matters: This article details Pinterest's approach to building a scalable data processing platform on EKS, covering deployment and critical logging infrastructure. It offers insights into managing large-scale data systems and ensuring observability in cloud-native environments.
- Pinterest is transitioning to Moka, a new data processing platform, deploying it on AWS EKS across standardized test, dev, staging, and production environments.
- EKS cluster deployment uses Terraform with a layered structure of AWS-originated and Pinterest-specific modules and Helm charts.
- A comprehensive logging strategy for Moka addresses EKS control plane logs (via CloudWatch), Spark application logs (driver, executor, event logs), and system pod logs.
- A key logging challenge is ensuring Spark event logs reliably reach S3, even when jobs fail, so Spark History Server can consume them.
- They are exploring custom Spark listeners and sidecar containers to guarantee event log persistence and availability for debugging and performance analysis (see the sketch below).
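The listener and sidecar designs aren't spelled out in the post; as a simpler illustration of the requirement, this sketch uploads whatever event logs exist to S3 in a `finally` block so they survive job failures. It assumes boto3, and the bucket, paths, and `run_job` function are placeholders:

```python
import glob
import os

import boto3  # pip install boto3

# Illustrative only: upload event logs to S3 even if the Spark job raised, so
# Spark History Server can still render the failed run. Bucket, prefix, and
# log directory are placeholders; run_job() stands in for the real application.
EVENT_LOG_DIR = "/var/log/spark/events"
BUCKET, PREFIX = "my-spark-history-bucket", "moka/event-logs"

def upload_event_logs():
    s3 = boto3.client("s3")
    for path in glob.glob(os.path.join(EVENT_LOG_DIR, "*")):
        key = f"{PREFIX}/{os.path.basename(path)}"
        s3.upload_file(path, BUCKET, key)

def run_job():
    raise RuntimeError("simulated Spark job failure")

if __name__ == "__main__":
    try:
        run_job()
    finally:
        # Runs on success and on failure, mirroring the "always persist event
        # logs" requirement described in the post.
        upload_event_logs()
```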
Why it matters: This article details how Netflix is innovating data engineering to tackle the unique challenges of media data for advanced ML. It offers insights into building specialized data platforms and roles for multi-modal content, crucial for any company dealing with large-scale unstructured media.
- Netflix is evolving its data engineering function into "Media ML Data Engineering" to handle complex, multi-modal media data at scale.
- This new specialization focuses on centralizing, standardizing, and managing media assets and their metadata for machine learning applications.
- The "Media Data Lake" is introduced as a platform for storing and serving media assets, leveraging vector storage solutions like LanceDB (see the sketch after this list).
- Its architecture includes a Media Table for metadata, a robust data model, a Pythonic Data API, and distributed compute for ML training and inference.
- The initiative aims to bridge creative media workflows with cutting-edge ML demands, enabling applications like content embedding and quality measures.
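As a minimal illustration of a vector-backed media table, here is a sketch assuming the lancedb Python package; the table name, columns, and toy embeddings are invented, not Netflix's Media Data Lake schema:

```python
import lancedb  # pip install lancedb

# Minimal sketch of a vector-backed media table: store per-asset metadata next
# to an embedding, then retrieve similar assets by vector search. Table name,
# columns, and the toy 4-dimensional vectors are illustrative.
db = lancedb.connect("/tmp/media_data_lake")

rows = [
    {"asset_id": "trailer_001", "title_id": "show_42", "vector": [0.1, 0.9, 0.0, 0.3]},
    {"asset_id": "trailer_002", "title_id": "show_42", "vector": [0.2, 0.8, 0.1, 0.2]},
    {"asset_id": "poster_009",  "title_id": "show_77", "vector": [0.9, 0.1, 0.7, 0.0]},
]
table = db.create_table("media_assets", data=rows, mode="overwrite")

# Nearest-neighbor lookup against the stored embeddings.
query = [0.15, 0.85, 0.05, 0.25]
print(table.search(query).limit(2).to_pandas())
```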
Why it matters: This article demonstrates how a large-scale monorepo build system migration can dramatically improve developer productivity and build reliability. It provides valuable insights into leveraging Bazel's features like remote execution and hermeticity for complex JVM environments.
- Airbnb migrated its JVM monorepo (Java, Kotlin, Scala) to Bazel, achieving 3-5x faster local builds/tests and 2-3x faster deploys over 4.5 years.
- The move to Bazel was driven by needs for superior build speed via remote execution, enhanced reliability through hermeticity, and a uniform build infrastructure across all language repos.
- Bazel's remote build execution (RBE) and "Build without the Bytes" boosted performance by enabling parallel actions and reducing data transfer.
- Hermetic builds, enforced by sandboxing, ensured consistent, repeatable results by isolating build actions from external environment dependencies.
- The migration strategy included a proof-of-concept on a critical service with co-existing Gradle/Bazel builds, followed by a breadth-first rollout.
Why it matters: This article details how to perform large-scale, zero-downtime Istio upgrades across diverse environments. It offers a blueprint for managing complex service mesh updates, ensuring high availability and minimizing operational overhead for thousands of workloads.
- Airbnb developed a robust process for seamless Istio upgrades across tens of thousands of pods and VMs on dozens of Kubernetes clusters.
- The strategy employs Istio's canary upgrade model, running multiple Istiod revisions concurrently within a single logical service mesh.
- Upgrades are atomic, rolling out new istio-proxy versions and connecting them to the corresponding new Istiod revision simultaneously.
- A rollouts.yml file dictates the gradual rollout, defining namespace patterns and percentage distributions for Istio versions using consistent hashing (see the sketch after this list).
- For Kubernetes, MutatingAdmissionWebhooks inject the correct istio-proxy and configure its connection to the specific Istiod revision based on an istio.io/rev label.
- The process prioritizes zero downtime, gradual rollouts, easy rollbacks, and independent upgrades for thousands of diverse workloads.
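The rollouts.yml mechanics aren't detailed in the summary; this toy sketch shows how a stable hash can place each namespace in a fixed percentage bucket, so raising a version's percentage only ever moves workloads forward. The rollout entries and field names are invented:

```python
import hashlib

# Toy illustration of percentage-based revision assignment with a stable hash:
# each namespace always lands in the same bucket (0-99), so increasing the new
# version's percentage moves namespaces forward without reshuffling the rest.
# The entries below are invented, not Airbnb's rollouts.yml format.
ROLLOUT = [
    {"revision": "istio-1-20", "percent": 25},   # new version covers buckets 0-24
    {"revision": "istio-1-19", "percent": 100},  # everything else stays on the old one
]

def bucket(namespace: str) -> int:
    digest = hashlib.sha256(namespace.encode()).hexdigest()
    return int(digest, 16) % 100

def revision_for(namespace: str) -> str:
    b = bucket(namespace)
    # Percentages are cumulative thresholds: the first entry whose threshold
    # exceeds the namespace's bucket wins.
    for entry in ROLLOUT:
        if b < entry["percent"]:
            return entry["revision"]
    return ROLLOUT[-1]["revision"]

for ns in ["payments", "search", "listings-api"]:
    print(ns, "->", revision_for(ns))
```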