Microsoft Brings Fireworks AI to Foundry to Accelerate Open Model Deployment

News

Milan Stanojevic

Windows Troubleshooting Expert

Published on March 12, 2026

Organizations are increasingly adopting open AI models to gain greater control over performance, costs, customization, and compliance. However, fragmented tools and complex infrastructure still make it difficult for many teams to deploy and scale these models.

Microsoft aims to simplify this with its Foundry platform. The company has now announced the public preview integration of Fireworks AI with Microsoft Foundry, bringing high-performance open-model inference capabilities directly into Azure.

Microsoft Foundry aims to unify the AI development lifecycle

Microsoft Foundry is designed as a centralized control plane for enterprise AI workloads. The platform combines model management, deployment, evaluation, agent development, and governance features in a single environment.

The goal is to help organizations move from experimentation to production faster while maintaining consistent infrastructure and operational oversight. Foundry also includes enterprise-grade capabilities such as governance controls, observability tools, and security features to support compliance at scale.

Fireworks AI integration brings high-throughput inference

The new integration adds Fireworks AI’s inference infrastructure to the Foundry ecosystem. Fireworks AI specializes in optimized serving stacks designed to run large AI models efficiently.

According to Microsoft, Fireworks AI already processes more than 13 trillion tokens each day and handles roughly 180,000 requests per second. Its infrastructure can generate more than 1,000 tokens per second on large models, making it suitable for demanding enterprise workloads.
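To put those figures in perspective, a rough back-of-envelope calculation can relate them to each other. The sketch below assumes both numbers are sustained averages over the same period and that the daily token count covers all requests, neither of which the announcement states, so the result is indicative only. The function names are illustrative.

```python
# Back-of-envelope arithmetic on the throughput figures quoted above.
# Assumption: both figures are averages measured over the same period.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tokens_per_second(tokens_per_day: float) -> float:
    """Convert a daily token volume into an average per-second rate."""
    return tokens_per_day / SECONDS_PER_DAY


def avg_tokens_per_request(tokens_per_day: float, requests_per_second: float) -> float:
    """Average tokens handled per request, given the two aggregate figures."""
    return tokens_per_second(tokens_per_day) / requests_per_second


if __name__ == "__main__":
    daily_tokens = 13e12        # "more than 13 trillion tokens each day"
    request_rate = 180_000      # "roughly 180,000 requests per second"

    print(f"Average tokens per second: {tokens_per_second(daily_tokens):,.0f}")
    print(f"Average tokens per request: {avg_tokens_per_request(daily_tokens, request_rate):,.0f}")
```

Under those assumptions, the platform averages about 150 million tokens per second, or somewhere above 800 tokens per request. Because the quoted figures are lower bounds and approximations, the real averages may differ.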

Through Microsoft Foundry, developers can access Fireworks AI capabilities directly from Azure endpoints. This allows organizations to evaluate and deploy models without building their own custom serving infrastructure.

Developers can access several major open models

The integration gives developers access to a growing catalog of open models through the Foundry platform. Supported models currently include:

DeepSeek V3.2
OpenAI gpt-oss-120b
Kimi K2.5
MiniMax M2.5

These models can be evaluated and deployed within the same environment used for enterprise management and governance.

The system also supports bring-your-own-weights (BYOW), allowing teams to deploy custom or fine-tuned models alongside the hosted catalog.

Flexible deployment options for experimentation and production

Developers using Fireworks AI within Foundry can choose between different deployment models depending on workload requirements.

For early testing and experimentation, serverless pay-per-token inference allows teams to run models without reserving dedicated infrastructure. This approach helps developers quickly evaluate different models and performance configurations.

For production workloads, Microsoft offers provisioned throughput units (PTUs). These provide predictable performance levels and consistent capacity for applications that require stable response times and throughput.
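The choice between the two options is partly a question of volume. The following is a minimal sketch of a break-even comparison, assuming a flat monthly cost for provisioned capacity and a single blended per-token price. Both prices are hypothetical placeholders, not Microsoft or Fireworks AI pricing, and real PTU billing may be structured differently.

```python
# Hypothetical break-even comparison between pay-per-token and provisioned
# capacity. All prices are made-up placeholders for illustration only.


def serverless_monthly_cost(tokens_per_month: float, price_per_million_tokens: float) -> float:
    """Cost of pay-per-token inference for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(provisioned_monthly_cost: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which both options cost the same."""
    return provisioned_monthly_cost / price_per_million_tokens * 1_000_000


def cheaper_option(tokens_per_month: float,
                   price_per_million_tokens: float,
                   provisioned_monthly_cost: float) -> str:
    """Return which option is cheaper on cost alone for the given volume."""
    serverless = serverless_monthly_cost(tokens_per_month, price_per_million_tokens)
    return "serverless" if serverless < provisioned_monthly_cost else "provisioned"


if __name__ == "__main__":
    price = 1.00          # hypothetical: dollars per million tokens
    provisioned = 5000.0  # hypothetical: flat dollars per month

    print(f"Break-even volume: {break_even_tokens(provisioned, price):,.0f} tokens/month")
    for volume in (2_000_000_000, 8_000_000_000):
        print(f"{volume:,} tokens/month -> {cheaper_option(volume, price, provisioned)}")
```

The comparison covers cost only. It ignores the predictable latency and reserved capacity that provisioned throughput is meant to provide, which may justify it even below the break-even volume.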

Part of Microsoft’s broader open-model strategy

The Fireworks AI integration reflects Microsoft’s broader strategy to support the full lifecycle of open AI models within Azure. Instead of forcing organizations to manage multiple tools and infrastructure layers, the company is positioning Foundry as a single platform for evaluating, deploying, and operating AI systems.

By combining high-performance inference with enterprise governance and development tools, Microsoft hopes to make it easier for organizations to build scalable AI applications using open models.

The company says the approach reduces operational complexity while giving developers flexibility to experiment with different models and deployment strategies.

AI ecosystem competition continues to accelerate

The Foundry update comes amid rapid developments across the AI ecosystem.

Microsoft recently announced Wave 3 of Microsoft 365 Copilot, introducing new agentic AI capabilities designed to automate tasks across enterprise workflows.

Meanwhile, Google has expanded Gemini AI features across its Workspace productivity tools, and OpenAI has launched a program offering six months of free ChatGPT Pro access to open-source developers who build using its Codex platform.

As enterprises continue exploring open models and AI infrastructure platforms, competition between major technology companies is expected to intensify.