Microsoft's new Phi-4 Multimodal model is better than most competitors
The new Phi-4 models pack a punch

We recently reported that Microsoft’s Phi-4 has a few bugs that are slowing it down. Hopefully, those have been fixed, because Microsoft has just rolled out the new Phi-4 Multimodal and Phi-4 Mini language models.
Phi-4-mini brings significant enhancements in multilingual support, reasoning, and mathematics, and it finally adds the long-awaited function calling feature. Phi-4-multimodal, meanwhile, is a fully multimodal model that handles vision, audio, text, multilingual understanding, strong reasoning, coding, and more.
So, the new Phi-4 models might be small, but they really pack a punch. The more exciting of the two is Phi-4 Multimodal. As its name suggests, it can handle multiple types of inputs, and it’s the first of its kind from Microsoft. On the flip side, Phi-4 Mini focuses on efficiency: this smaller, streamlined model is crafted for situations where you need a swift, reliable AI without the heavy computational baggage.
In the latest Phi-4 tech report, Microsoft also provided a comparison table against other models such as Gemini, Qwen, and Claude. The data shows that Microsoft’s 5.6B-parameter Phi-4 Multimodal achieves better scores on many of the listed benchmarks.
To keep these models secure and reliable, Microsoft ran extensive tests with both in-house security experts and external specialists, using tailored strategies from the Microsoft AI Red Team (AIRT). The company also confirmed that, when optimized with ONNX Runtime, both Phi-4-mini and Phi-4-multimodal can run directly on devices across different platforms, which is ideal when you need fast performance on a budget.
Even better, both models are now available on Azure AI Foundry and the NVIDIA API Catalog, so you can test them right now.
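If you would rather experiment locally, here is a minimal sketch using the Hugging Face transformers library. The model ID microsoft/Phi-4-mini-instruct is an assumption on our part; check the official model card for the exact name and hardware requirements.

```python
# Minimal sketch: querying Phi-4-mini through Hugging Face transformers.
# The model ID below is an assumption; verify it against the official model card.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",  # assumed Hugging Face ID
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain in one sentence what a multimodal model is."},
]

result = pipe(messages, max_new_tokens=64)
# The pipeline returns the full chat history; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```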