AMD Announces Day 0 Support for Alibaba Qwen 3.5 on Instinct MI300X, MI325X, and MI355X

News

Milan Stanojevic

Windows Toubleshooting Expert

News

3 min. read

Published on February 17, 2026

AMD has announced Day 0 support for Alibaba’s Qwen 3.5 large language model across its Instinct MI300X, MI325X, and MI355X accelerators. The move strengthens AMD’s AI stack by aligning high-context, multimodal models with its latest data center GPUs.

The integration was developed in collaboration with the Alibaba Qwen team and arrives fully optimized through AMD’s ROCm software ecosystem.

Built for massive context and multimodal AI workloads

Qwen 3.5 targets long-context and enterprise-scale AI workloads. The model supports context windows up to 256K tokens, addressing the quadratic scaling limitations that traditional Transformer architectures often face.

To overcome long-sequence bottlenecks, Qwen 3.5 uses a Hybrid Attention architecture. It alternates full multi-head attention layers with linear attention layers, reducing computational overhead while maintaining strong reasoning performance.

Gated Delta Networks further enable linear scaling with sequence length. As a result, inference throughput improves significantly beyond 32K tokens, where many models begin to slow down.

Ultra-sparse MoE design reduces compute overhead

Qwen 3.5 introduces an Ultra-Sparse Mixture-of-Experts (MoE) architecture. During inference, the model activates only a fraction of its parameters instead of the full network.

It combines a Shared Expert mechanism with routed experts using Top-K routing, such as selecting the top 8 experts out of 64. This approach lowers compute usage while preserving performance levels comparable to dense models.

For AMD hardware, this design pairs well with optimized hipBLASLt GEMM kernels and AITER FusedMoE implementations inside ROCm.

Native multimodal capabilities with visual agent support

Qwen 3.5 ships as a multimodal model by default. It integrates a DeepStack Vision Transformer along with 3D convolution components for advanced visual reasoning.

The architecture merges multi-layer visual encoder features using DeepStack. Vision-specific components such as mRoPE and Conv3d run through MIOpen and PyTorch kernels optimized for AMD GPUs.

The model can operate as a “Visual Agent,” identifying objects in complex environments and handling multimodal workflows that combine text and image inputs.

Optimized for ROCm, SGLang, and vLLM

AMD enables Qwen 3.5 deployment through its optimized ROCm software stack. Developers can run inference using SGLang and vLLM frameworks with Triton-based kernels for linear attention.

Large HBM capacity on Instinct MI300X, MI325X, and MI355X GPUs allows full-scale models and extended context windows to run on a single GPU or node. That reduces multi-node overhead and simplifies enterprise deployment.

AMD has also released quickstart and prerequisite guides to help AI developers, system architects, and DevOps teams integrate Qwen 3.5 into production pipelines.

Strategic timing amid shifting AI alliances

The Day 0 launch positions Qwen 3.5 as a production-ready, open-weight model optimized specifically for AMD accelerators. As enterprise AI stacks diversify, hardware vendors increasingly align with alternative model ecosystems.

In related industry developments, the Pentagon is reportedly considering blacklisting Anthropic and companies associated with it over supply chain risk concerns. Meanwhile, as Microsoft gradually moves away from exclusive OpenAI alignment, open-source AI agents like OpenClaw continue to gain traction.

Via TechPowerUp