Microsoft's AI now outperforms doctors in complex NEJM diagnostic cases
A remarkable feat

Microsoft’s AI tools may be on track to reshape healthcare, at least judging by how well they’re handling some of the toughest medical challenges.
In a new experiment, Microsoft’s AI Diagnostic Orchestrator (MAI-DxO) correctly diagnosed 85.5% of complex cases pulled from the New England Journal of Medicine, a journal known for its deeply challenging case studies.
The tool works by turning large language models into a virtual panel of clinicians. It can ask follow-up questions, order tests, and issue diagnoses as the case progresses, just like a team of doctors would.
When paired with OpenAI’s o3 model, MAI-DxO performed best, far surpassing 21 practicing physicians from the US and UK. Despite having 5 to 20 years of experience, the doctors averaged only 20% accuracy on the same benchmark.
To evaluate this properly, Microsoft created a new benchmark called the Sequential Diagnosis Benchmark (SD Bench) using 304 NEJM cases. This setup allowed AI models to step through each case as a clinician would, reviewing symptoms, asking for more data, and narrowing down diagnoses with each step.
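For readers curious what a step-by-step evaluation like this might look like in practice, here is a minimal Python sketch. It is not Microsoft's code: the `Case` structure, the `run_case` loop, the toy agent, and the two sample cases are all hypothetical, and scoring by exact string match is a simplifying assumption, since the article does not describe how SD Bench grades a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical benchmark case: an opening summary plus hidden findings."""
    summary: str
    findings: dict   # question or test name -> result, revealed only on request
    diagnosis: str   # reference diagnosis used for scoring


def run_case(case, agent, max_steps=10):
    """Step through one case until the agent commits to a diagnosis."""
    revealed = {}
    for _ in range(max_steps):
        action, value = agent(case.summary, revealed)
        if action == "diagnose":
            return value
        # Any other action is treated as a request for a finding.
        revealed[value] = case.findings.get(value, "not available")
    return None  # ran out of steps without committing


def accuracy(cases, agent):
    """Fraction of cases where the final diagnosis matches the reference."""
    correct = sum(run_case(c, agent) == c.diagnosis for c in cases)
    return correct / len(cases)


def toy_agent(summary, revealed):
    """Stand-in for a model: orders one test, then diagnoses from its result."""
    if "chest x-ray" not in revealed:
        return ("request", "chest x-ray")
    if "consolidation" in revealed["chest x-ray"]:
        return ("diagnose", "pneumonia")
    return ("diagnose", "unknown")


# Two made-up cases, for illustration only.
CASES = [
    Case("fever and cough", {"chest x-ray": "lobar consolidation"}, "pneumonia"),
    Case("fever and cough", {"chest x-ray": "clear"}, "bronchitis"),
]

print(accuracy(CASES, toy_agent))
```

The point of the sketch is that the agent never sees the full case up front. It has to decide what to ask for at each step, which is what separates a sequential benchmark from a single-shot quiz.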
Microsoft says the potential impact is massive. These tools could support clinicians in tough diagnostic situations and even help patients handle routine care themselves. But the company also acknowledges this is just a starting point.
The research needs to be tested in real clinical settings, with proper regulatory guardrails and oversight in place. Microsoft is now partnering with healthcare institutions to do exactly that.