Microsoft's new prompting techniques give GPT-4 an edge over Google's Gemini Ultra

It took less than a week for Microsoft to counter the positive press Google’s hot new generative AI model was receiving with a new research project that nets GPT-4 higher scores.

Google unveiled its new multimodal artificial intelligence model with a laundry list of high-scoring benchmarks posted by its largest model, dubbed Gemini Ultra. Gemini Ultra 1.0 came out of the gate exceeding the performance of OpenAI’s wildly popular GPT-4 on 30 of 32 state-of-the-art large language model (LLM) benchmarks.

However, according to a newly published piece on the Microsoft Research Blog, OpenAI’s GPT-4 can reclaim its LLM supremacy with a tweak to a prompting technique the team calls Medprompt.

Microsoft’s chief scientific officer Eric Horvitz, director of research engineering Harsha Nori, and principal researcher Yin Tat Lee report “that steering GPT-4 with a modified version of Medprompt achieves the highest score ever achieved on the complete MMLU (Measuring Massive Multitask Language Understanding).”
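For background, Medprompt bundles a few general-purpose prompting tricks: dynamically selecting few-shot examples, having the model write its own chain-of-thought reasoning, and choice-shuffling ensembling, which reshuffles multiple-choice options across several runs and takes a majority vote to dampen position bias. Below is a rough Python sketch of that last idea; query_model is a hypothetical stand-in for a real GPT-4 call, not Microsoft’s implementation.

import random
from collections import Counter

def query_model(question: str, options: list[str]) -> str:
    # Hypothetical stand-in for a GPT-4 call; this dummy always picks
    # the first-listed option, mimicking the position bias that
    # choice shuffling is meant to wash out.
    return options[0]

def choice_shuffle_ensemble(question: str, options: list[str], runs: int = 5) -> str:
    # Ask the same question several times with the options reshuffled,
    # then return the majority-vote answer.
    votes = Counter()
    for _ in range(runs):
        shuffled = random.sample(options, len(options))
        votes[query_model(question, shuffled)] += 1
    return votes.most_common(1)[0][0]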

Google’s Gemini Ultra has only just begun being seeded to select organizations, but with Microsoft Research’s modified Medprompt technique in hand, its highly touted benchmark success is quickly becoming old news.

To eke out better scores from GPT-4, Microsoft Research “extended Medprompt to Medprompt+ by adding a simpler prompting method and formulating a policy for deriving a final answer by integrating outputs from both the base [strategy] and inferred confidences of candidate answers.”
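In plainer terms, GPT-4 is asked the same question both ways, and whichever candidate answer the model appears more confident in becomes the final answer. A minimal Python sketch of such an integration policy follows; the function names and hard-coded candidates are illustrative assumptions, not Microsoft’s pipeline.

from dataclasses import dataclass

@dataclass
class Candidate:
    answer: str
    confidence: float  # assumed to be derived from token logprobs

def ask_medprompt(question: str) -> Candidate:
    # Hypothetical stand-in for the full Medprompt pipeline
    # (few-shot selection, chain of thought, ensembling).
    return Candidate(answer="B", confidence=0.92)

def ask_simple(question: str) -> Candidate:
    # Hypothetical stand-in for a plain, simpler prompt to the same model.
    return Candidate(answer="C", confidence=0.71)

def medprompt_plus(question: str) -> str:
    # Integration policy: keep whichever candidate answer carries
    # the higher inferred confidence.
    candidates = [ask_medprompt(question), ask_simple(question)]
    return max(candidates, key=lambda c: c.confidence).answer

print(medprompt_plus("Which vitamin deficiency causes scurvy?"))  # prints "B"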

According to Microsoft Research, Google employs a similar pairing of complex and simple queries for evaluating its model’s responses.

The net result of applying the Medprompt+ technique to GPT-4 is an overall boost in benchmark scores over Gemini Ultra in each of the evaluated sections, some by as much as 10 percent.

With that being said, Microsoft Research notes that Medprompt+ “relies on accessing confidence scores (logprobs) from GPT-4,” which aren’t currently publicly available. While this kind of expert-level tweaking can boost GPT-4’s scores past Gemini’s in synthetic benchmarks, Microsoft Research is still looking into squeezing the most out of the out-of-the-box experience most people will encounter with “zero- or few-shot prompting strategies.”
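For the curious, a logprob is the log-probability the model assigned to a generated token, and exponentiating it yields a rough 0-to-1 confidence score, the raw material for Medprompt+’s answer-selection policy. A toy illustration, assuming the answer token’s logprob has already been retrieved:

import math

def confidence_from_logprob(answer_logprob: float) -> float:
    # Convert a token log-probability into a 0-to-1 confidence value.
    return math.exp(answer_logprob)

# Example: a logprob of -0.05 on the answer token corresponds to
# roughly 95% confidence in that answer.
print(round(confidence_from_logprob(-0.05), 3))  # 0.951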

Microsoft Research documents all of its prompting techniques and tools in its GitHub repo here.