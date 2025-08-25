

Readers help support Windows Report. We may get a commission if you buy through our links.

Microsoft is taking Azure Machine Learning up a notch with its latest addition, the ND H200 v5 virtual machines.

As Microsoft notes, these VMs are powered by NVIDIA’s H200 Tensor Core GPUs and are designed to handle the heaviest AI workloads, from training massive language models to serving high-throughput inference at scale.

It’s worth noting that the ND H200 v5 packs eight H200 GPUs, offering a combined 1,128 GB of high-bandwidth HBM3e memory. That’s a massive 76% jump over the previous H100 generation.
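The 76% figure can be checked with quick arithmetic. The sketch below assumes 141 GB of HBM3e per H200 and 80 GB per H100, with eight GPUs per VM in both generations. Those per-GPU numbers come from NVIDIA's public spec sheets, not from Microsoft's announcement, so treat them as assumptions.

```python
# Per-GPU memory in GB. Assumed from NVIDIA's public specs, not stated in the article.
H200_GB_PER_GPU = 141
H100_GB_PER_GPU = 80
GPUS_PER_VM = 8  # eight GPUs per VM, per the article


def total_memory_gb(per_gpu_gb, gpus=GPUS_PER_VM):
    """Combined GPU memory of one VM, in GB."""
    return per_gpu_gb * gpus


h200_total = total_memory_gb(H200_GB_PER_GPU)
h100_total = total_memory_gb(H100_GB_PER_GPU)

# Relative increase of the H200 pool over the H100 pool, in percent.
increase_pct = (h200_total / h100_total - 1) * 100

print(f"{h200_total} GB vs {h100_total} GB: +{increase_pct:.0f}%")
# prints "1128 GB vs 640 GB: +76%"
```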

In other words, the massive memory pool means larger models, longer context windows, and bigger batch sizes can now run with fewer compromises. Microsoft says this setup also reduces cross-GPU communication, cutting training overhead and boosting efficiency.
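To make "fewer compromises" concrete, here is a rough sketch of how much room the weights of a 405-billion-parameter model would leave in each memory pool. It counts weights only and ignores activations, KV cache, and framework overhead, so real headroom will be smaller. The 640 GB figure for an eight-GPU H100 VM is an assumption (8 × 80 GB), and the helper names are hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weights_gb(num_params, dtype):
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def headroom_gb(num_params, dtype, pool_gb):
    """GB left over after loading the weights; negative means they do not fit."""
    return pool_gb - weights_gb(num_params, dtype)


PARAMS = 405e9  # illustrative 405B-parameter model

# 640 GB is an assumed H100 pool (8 x 80 GB); 1,128 GB is the article's H200 figure.
for pool_name, pool_gb in [("8x H100", 640), ("8x H200", 1128)]:
    for dtype in ("bf16", "fp8"):
        spare = headroom_gb(PARAMS, dtype, pool_gb)
        verdict = "fits" if spare > 0 else "does not fit"
        print(f"{pool_name}, {dtype}: {verdict} ({spare:+.0f} GB)")
```

Under these assumptions, BF16 weights (810 GB) overflow the 640 GB pool but fit in 1,128 GB with roughly 318 GB to spare for KV cache and larger batches.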

Moving on, NVIDIA NVLink delivers 900 GB/s per GPU inside a VM, enabling fast parallel training across all eight GPUs. Between VMs, each node is equipped with 3.2 Tb/s of InfiniBand bandwidth, complemented by GPUDirect RDMA for low-latency GPU-to-GPU communication.
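The two figures use different units (gigabytes vs. terabits per second), so a quick conversion helps when comparing them. This is a back-of-the-envelope sketch using nominal peak numbers from the paragraph above. It assumes the InfiniBand bandwidth is split evenly across the eight GPUs and ignores protocol overhead.

```python
NVLINK_GBYTE_PER_S_PER_GPU = 900     # GB/s per GPU inside a VM, from the article
INFINIBAND_GBIT_PER_S_PER_VM = 3200  # 3.2 Tb/s per VM, expressed in Gb/s
GPUS_PER_VM = 8


def gbit_to_gbyte(gbit_per_s):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbit_per_s / 8


ib_gbyte_per_vm = gbit_to_gbyte(INFINIBAND_GBIT_PER_S_PER_VM)
# Assumption: the node's InfiniBand bandwidth is shared evenly by its GPUs.
ib_gbyte_per_gpu = ib_gbyte_per_vm / GPUS_PER_VM
ratio = NVLINK_GBYTE_PER_S_PER_GPU / ib_gbyte_per_gpu

print(f"InfiniBand: {ib_gbyte_per_vm:.0f} GB/s per VM, {ib_gbyte_per_gpu:.0f} GB/s per GPU")
print(f"NVLink is about {ratio:.0f}x faster per GPU than the cross-node link")
```

The gap of roughly 18x per GPU is why keeping as much traffic as possible inside a single VM matters for training efficiency.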

This design makes scaling across hundreds of nodes smoother and more predictable, ultimately helping teams move from experiments to production with fewer roadblocks.

On the software side, ND H200 v5 slots right into existing Azure ML workflows, supporting frameworks like PyTorch, TensorFlow, and JAX. Optimized containers, distributed training via NCCL, and direct CLI provisioning ensure that data science teams can get started quickly.
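As a rough illustration of what distributed training looks like from inside a worker process, the sketch below reads the `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` environment variables that launchers such as `torchrun` export, and works out which VM a rank lives on. It assumes ranks are assigned contiguously, eight per node, and the helper names are hypothetical. It is not taken from Microsoft's documentation.

```python
import os

GPUS_PER_NODE = 8  # eight H200 GPUs per ND H200 v5 VM, per the article


def rank_layout(env=os.environ, gpus_per_node=GPUS_PER_NODE):
    """Describe this process's place in the job from launcher-provided variables."""
    rank = int(env.get("RANK", "0"))
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = int(env.get("LOCAL_RANK", str(rank % gpus_per_node)))
    return {
        "rank": rank,
        "world_size": world_size,
        "local_rank": local_rank,
        # Assumption: ranks are numbered contiguously within each node.
        "node": rank // gpus_per_node,
        "num_nodes": -(-world_size // gpus_per_node),  # ceiling division
    }


def same_node(rank_a, rank_b, gpus_per_node=GPUS_PER_NODE):
    """True when two ranks share a VM, so their traffic can stay on NVLink."""
    return rank_a // gpus_per_node == rank_b // gpus_per_node


layout = rank_layout()
print(layout)

# In a PyTorch job, a worker would typically follow this with:
#   torch.cuda.set_device(layout["local_rank"])
#   torch.distributed.init_process_group(backend="nccl")
# NCCL then chooses the transport for each pair of ranks.
```

Run outside a launcher, the function falls back to a single-process layout (rank 0, world size 1).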

Early benchmarks suggest up to 35% better throughput for large model inference compared to previous-gen setups, especially for models like Llama 3.1 405B. Microsoft notes that high-performance simulations and scientific workloads also stand to benefit from the combination of memory bandwidth and compute density.

With support for auto-scaling clusters, Azure ML users can spin up anything from a single ND H200 VM to hundreds of nodes, paying only for what they use. In short, this is not just a hardware bump, but a full-stack upgrade aimed at fueling the next wave of AI innovation.