AMD Announces Day 0 Support for Alibaba Qwen 3.5 on Instinct MI300X, MI325X, and MI355X
AMD has announced Day 0 support for Alibaba’s Qwen 3.5 large language model across its Instinct MI300X, MI325X, and MI355X accelerators. The move strengthens AMD’s AI stack by aligning high-context, multimodal models with its latest data center GPUs.
The integration was developed in collaboration with the Alibaba Qwen team and arrives fully optimized through AMD’s ROCm software ecosystem.
Built for massive context and multimodal AI workloads
Qwen 3.5 targets long-context and enterprise-scale AI workloads. The model supports context windows up to 256K tokens, addressing the quadratic scaling limitations that traditional Transformer architectures often face.
To overcome long-sequence bottlenecks, Qwen 3.5 uses a Hybrid Attention architecture. It alternates full multi-head attention layers with linear attention layers, reducing computational overhead while maintaining strong reasoning performance.
Gated Delta Networks further enable linear scaling with sequence length. As a result, inference throughput improves significantly beyond 32K tokens, where many models begin to slow down.
Ultra-sparse MoE design reduces compute overhead
Qwen 3.5 introduces an Ultra-Sparse Mixture-of-Experts (MoE) architecture. During inference, the model activates only a fraction of its parameters instead of the full network.
It combines a Shared Expert mechanism with routed experts using Top-K routing, such as selecting the top 8 experts out of 64. This approach lowers compute usage while preserving performance levels comparable to dense models.
For AMD hardware, this design pairs well with optimized hipBLASLt GEMM kernels and AITER FusedMoE implementations inside ROCm.
Native multimodal capabilities with visual agent support
Qwen 3.5 ships as a multimodal model by default. It integrates a DeepStack Vision Transformer along with 3D convolution components for advanced visual reasoning.
The architecture merges multi-layer visual encoder features using DeepStack. Vision-specific components such as mRoPE and Conv3d run through MIOpen and PyTorch kernels optimized for AMD GPUs.
The model can operate as a “Visual Agent,” identifying objects in complex environments and handling multimodal workflows that combine text and image inputs.
Optimized for ROCm, SGLang, and vLLM
AMD enables Qwen 3.5 deployment through its optimized ROCm software stack. Developers can run inference using SGLang and vLLM frameworks with Triton-based kernels for linear attention.
Large HBM capacity on Instinct MI300X, MI325X, and MI355X GPUs allows full-scale models and extended context windows to run on a single GPU or node. That reduces multi-node overhead and simplifies enterprise deployment.
AMD has also released quickstart and prerequisite guides to help AI developers, system architects, and DevOps teams integrate Qwen 3.5 into production pipelines.
Strategic timing amid shifting AI alliances
The Day 0 launch positions Qwen 3.5 as a production-ready, open-weight model optimized specifically for AMD accelerators. As enterprise AI stacks diversify, hardware vendors increasingly align with alternative model ecosystems.
In related industry developments, the Pentagon is reportedly considering blacklisting Anthropic and companies associated with it over supply chain risk concerns. Meanwhile, as Microsoft gradually moves away from exclusive OpenAI alignment, open-source AI agents like OpenClaw continue to gain traction.
Via TechPowerUp
Read our disclosure page to find out how can you help Windows Report sustain the editorial team. Read more
User forum
0 messages