Microsoft AI (MAI) has officially unveiled its latest pair of models, MAI-Voice-1 and MAI-1-preview. The former is a highly expressive speech generation system, while the latter is the company’s first large-scale foundation model trained end-to-end in-house.

MAI-Voice-1 brings natural speech to Copilot

MAI-Voice-1 is Microsoft’s first in-house speech model built for expressiveness and speed. According to the company, it can generate a full minute of audio in under a second on a single GPU, which would make it one of the fastest systems of its kind.
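
To put the speed claim in perspective, the reported figures can be turned into a rough "faster than real time" ratio. The sketch below is only back-of-the-envelope arithmetic: the function name is illustrative, and it uses 1.0 second as an upper bound on compute time because Microsoft has only said "under a second", so the result is a lower bound rather than a measured benchmark.

```python
def realtime_speedup(audio_seconds: float, compute_seconds: float) -> float:
    """Return seconds of audio produced per second of compute."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


# Reported figures: 60 s of audio generated in under 1 s on a single GPU.
# Assumption: 1.0 s is used as the upper bound on compute time,
# which yields a lower bound on the speedup.
lower_bound = realtime_speedup(60.0, 1.0)
print(f"at least {lower_bound:.0f}x faster than real time")
```

In other words, if the claim holds, the model produces audio at least 60 times faster than it takes to play back.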

The model is already integrated into Copilot Daily and Podcasts, with a dedicated playground inside Copilot Labs. Users can try demos such as storytelling experiences or guided meditations, which showcase the model’s high-fidelity audio in both single-speaker and multi-speaker scenarios.

MAI-1-preview hits public testing

Alongside the voice model, Microsoft introduced MAI-1-preview, its first internally trained foundation model. The model uses a mixture-of-experts architecture and was trained on roughly 15,000 NVIDIA H100 GPUs. It is now being publicly tested on LMArena, a popular community evaluation platform.

MAI-1-preview is designed for instruction-following and general assistance, and Microsoft plans to roll it out for select Copilot text use cases in the coming weeks. API access is also being extended to trusted testers for early feedback.

Both releases are part of Microsoft’s long-term vision to deliver responsible, reliable AI tailored to user needs. The company says its next-generation GB200 cluster is already operational and that more specialized models are on the way.