This partnership delivers advanced AI infrastructure and models, enabling engineers to deploy complex AI workloads from cloud to edge, addressing critical needs like low-latency inferencing, data residency, and scalable AI application development with greater flexibility and performance.
Microsoft and NVIDIA are deepening our partnership to power the next wave of AI industrial innovation. For years, our companies have helped fuel the AI revolution, bringing the world’s most advanced supercomputing to the cloud, enabling breakthrough frontier models, and making AI more accessible to organizations everywhere. Today, we’re building on that foundation with new advancements that deliver greater performance, capability, and flexibility.
With added support for NVIDIA RTX PRO 6000 Blackwell Server Edition on Azure Local, customers can deploy AI and visual computing workloads in distributed and edge environments with the same seamless orchestration and management they use in the cloud. New NVIDIA Nemotron and NVIDIA Cosmos models in Azure AI Foundry give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents. With NVIDIA Run:ai on Azure, enterprises can get more from every GPU to streamline operations and accelerate AI. Finally, Microsoft is redefining AI infrastructure with the world’s first deployment of NVIDIA GB300 NVL72.
Today’s announcements mark the next chapter in our full-stack AI collaboration with NVIDIA, empowering customers to build the future faster.
Microsoft and NVIDIA continue to drive advancements in artificial intelligence, offering innovative solutions that span the public and private cloud, the edge, and sovereign environments.
As highlighted in the March blog post for NVIDIA GTC, Microsoft will offer NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure. Now, with expanded availability of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure Local, organizations can optimize their AI workloads, regardless of location, to provide customers with greater flexibility and more options than ever. Azure Local leverages Azure Arc to empower organizations to run advanced AI workloads on-premises while retaining the management simplicity of the cloud or operating in fully disconnected environments.
NVIDIA RTX PRO 6000 Blackwell GPUs provide the performance and flexibility needed to accelerate a broad range of use cases, from agentic AI, physical AI, and scientific computing to rendering, 3D graphics, digital twins, simulation, and visual computing. This expanded GPU support unlocks a range of edge use cases that fulfill the stringent requirements of critical infrastructure for our healthcare, retail, manufacturing, government, defense, and intelligence customers. This may include real-time video analytics for public safety, predictive maintenance in industrial settings, rapid medical diagnostics, and secure, low-latency inferencing for essential services such as energy production and critical infrastructure. The NVIDIA RTX PRO 6000 Blackwell enables improved virtual desktop support by leveraging NVIDIA vGPU technology and Multi-Instance GPU (MIG) capabilities. This can not only accommodate a higher user density, but also power AI-enhanced graphics and visual compute capabilities, offering an efficient solution for demanding virtual environments.
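To make the virtual-desktop density point concrete, here is a back-of-envelope sizing sketch. It assumes the RTX PRO 6000 Blackwell's 96 GB of GPU memory; the vGPU profile names and per-user memory sizes are illustrative assumptions, not official NVIDIA vGPU profiles.

```python
# Hypothetical sizing sketch: estimating virtual-desktop user density on a
# single RTX PRO 6000 Blackwell (96 GB of GPU memory). The profile names and
# per-user frame-buffer sizes below are illustrative, not official NVIDIA
# vGPU profiles.
GPU_MEMORY_GB = 96  # RTX PRO 6000 Blackwell Server Edition

assumed_profiles_gb = {"light-vdi": 2, "knowledge-worker": 4, "designer": 8}

for name, per_user_gb in assumed_profiles_gb.items():
    users = GPU_MEMORY_GB // per_user_gb
    print(f"{name:>16}: {per_user_gb} GB per user -> up to {users} users per GPU")
```

Actual density also depends on compute and encoder limits, so memory-only math like this is an upper bound, not a sizing guarantee.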
Earlier this year, Microsoft announced a multitude of AI capabilities at the edge, all enriched with NVIDIA accelerated computing:
With Azure Local, customers can meet strict regulatory, data residency, and privacy requirements while harnessing the latest AI innovations powered by NVIDIA.
Whether you need ultra-low latency for business continuity, robust local inferencing, or compliance with industry regulations, we’re dedicated to delivering cutting-edge AI performance wherever your data resides. Customers can now access the breakthrough performance of the NVIDIA RTX PRO 6000 Blackwell GPUs in new Azure Local solutions—including Dell AX-770, HPE ProLiant DL380 Gen12, and Lenovo ThinkAgile MX650a V4.
To find out more about upcoming availability and sign up for early ordering, visit:
At Microsoft, we’re committed to bringing the most advanced AI capabilities to our customers, wherever they need them. Through our partnership with NVIDIA, Azure AI Foundry now brings world-class multimodal reasoning models directly to enterprises, deployable anywhere as secure, scalable NVIDIA NIM™ microservices. The portfolio spans a range of different use cases:
Together, these open models reflect the depth of the Azure and NVIDIA partnership: combining Microsoft’s adaptive cloud with NVIDIA’s leadership in accelerated computing to power the next generation of agentic AI for every industry. Learn more about the models here.
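NIM microservices expose an OpenAI-compatible HTTP API, so calling a deployed model is a standard chat-completions request. The sketch below only builds the request payload; the base URL and model name are placeholders for your own deployment, not real endpoints.

```python
import json

# Sketch of calling a self-hosted NIM microservice, which serves an
# OpenAI-compatible chat-completions API. The base URL and model identifier
# are placeholders for your own deployment.
NIM_BASE_URL = "http://localhost:8000/v1"   # assumed local NIM endpoint
MODEL = "nvidia/example-nemotron-model"     # placeholder model name

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Summarize our incident reports from last week."}
    ],
    "max_tokens": 256,
}

# In a live deployment you would POST this JSON to
# f"{NIM_BASE_URL}/chat/completions", e.g. with requests.post(..., json=payload).
print(json.dumps(payload, indent=2))
```

Because the API surface is OpenAI-compatible, existing client libraries can usually be pointed at a NIM deployment by changing only the base URL and model name.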
As an AI workload and GPU orchestration platform, NVIDIA Run:ai helps organizations make the most of their compute investments, accelerating AI development cycles and driving faster time-to-market for new insights and capabilities. By bringing NVIDIA Run:ai to Azure, we’re giving enterprises the ability to dynamically allocate, share, and manage GPU resources across teams and workloads, helping them get more from every GPU.
NVIDIA Run:ai on Azure integrates seamlessly with core Azure services, including Azure NC and ND series instances, Azure Kubernetes Service (AKS), and Azure Identity Management, and offers compatibility with Azure Machine Learning and Azure AI Foundry for unified, enterprise-ready AI orchestration. We’re bringing hybrid scale to life to help customers transform static infrastructure into a flexible, shared resource for AI innovation.
With smarter orchestration and cloud-ready GPU pooling, teams can drive faster innovation, reduce costs, and unleash the power of AI across their organizations with confidence. NVIDIA Run:ai on Azure enhances AKS with GPU-aware scheduling, helping teams allocate, share, and prioritize GPU resources more efficiently. Operations are streamlined with one-click job submission, automated queueing, and built-in governance. This ensures teams spend less time managing infrastructure and more time focused on building what’s next.
This impact spans industries, supporting the infrastructure and orchestration behind transformative AI workloads at every stage of enterprise growth:
Powered by Microsoft Azure and NVIDIA, Run:ai is purpose-built for scale, helping enterprises move from isolated AI experimentation to production-grade innovation.
Microsoft is redefining AI infrastructure with the new NDv6 GB300 VM series, delivering the first at-scale production cluster of NVIDIA GB300 NVL72 systems, featuring more than 4,600 NVIDIA Blackwell Ultra GPUs connected via NVIDIA Quantum-X800 InfiniBand networking. Each NVIDIA GB300 NVL72 rack integrates 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace™ CPUs, delivering over 130 TB/s of NVLink bandwidth and up to 136 kW of compute power in a single cabinet. Designed for the most demanding workloads—reasoning models, agentic systems, and multimodal AI—GB300 NVL72 combines ultra-dense compute, direct liquid cooling, and smart rack-scale management to deliver breakthrough efficiency and performance within a standard datacenter footprint.
Azure’s co-engineered infrastructure enhances GB300 NVL72 with technologies like Azure Boost for accelerated I/O and integrated hardware security modules (HSM) for enterprise-grade protection. Each rack arrives pre-integrated and self-managed, enabling rapid, repeatable deployment across Azure’s global fleet. As the first cloud provider to deploy NVIDIA GB300 NVL72 at scale, Microsoft is setting a new standard for AI supercomputing—empowering organizations to train and deploy frontier models faster, more efficiently, and more securely than ever before. Together, Azure and NVIDIA are powering the future of AI.
Learn more about Microsoft’s systems approach in delivering GB300 NVL72 on Azure.
Our collaboration with NVIDIA focuses on optimizing every layer of the computing stack to help customers maximize the value of their existing AI infrastructure investments.
To deliver high-performance inference for compute-intensive reasoning models at scale, we’re bringing together a solution that combines the open-source NVIDIA Dynamo framework, our ND GB200-v6 VMs with NVIDIA GB200 NVL72, and Azure Kubernetes Service (AKS). We’ve demonstrated the performance this combined solution delivers at scale: the gpt-oss 120b model, deployed in a production-ready, managed AKS cluster, processing 1.2 million tokens per second. We’ve also published a deployment guide so developers can get started today.
Dynamo is an open-source, distributed inference framework designed for multi-node environments and rack-scale accelerated compute architectures. By enabling disaggregated serving, LLM-aware routing and KV caching, Dynamo significantly boosts performance for reasoning models on Blackwell, unlocking up to 15x more throughput compared to the prior Hopper generation, opening new revenue opportunities for AI service providers.
These efforts enable AKS production customers to take full advantage of NVIDIA Dynamo’s inference optimizations when deploying frontier reasoning models at scale. We’re dedicated to bringing the latest open-source software innovations to our customers, helping them fully realize the potential of the NVIDIA Blackwell platform on Azure.
Learn more about Dynamo on AKS.
The post Building the future together: Microsoft and NVIDIA announce AI advancements at GTC DC appeared first on Microsoft Azure Blog.