Microsoft's new prompting techniques give GPT-4 an edge over Google's Gemini Ultra

It took less than a week for Microsoft to counter the positive press Google’s hot new generative AI model was receiving with a new research project that nets GPT-4 higher scores.

Google unveiled its new multimodal artificial intelligence model with a laundry list of high-scoring benchmarks posted by its largest model, dubbed Gemini Ultra. Gemini Ultra 1.0 came out of the gate exceeding the performance of OpenAI’s wildly popular GPT-4 on 30 of 32 state-of-the-art large language model (LLM) benchmarks.

However, according to a newly published piece on the Microsoft Research Blog, OpenAI’s GPT-4 can reclaim its LLM supremacy with a tweak to a prompting technique the team calls Medprompt.

Microsoft’s chief scientific officer Eric Horvitz, director of research engineering Harsha Nori, and principal researcher Yin Tat Lee report “that steering GPT-4 with a modified version of Medprompt achieves the highest score ever achieved on the complete MMLU (Measuring Massive Multitask Language Understanding).”
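For background, Medprompt bundles a few general-purpose prompting tricks: dynamically selecting few-shot examples, having the model write its own chain-of-thought reasoning, and choice-shuffling ensembling, which reshuffles multiple-choice options across several runs and takes a majority vote to dampen position bias. Below is a rough Python sketch of that last idea; query_model is a hypothetical stand-in for a real GPT-4 call, not Microsoft’s implementation.

import random
from collections import Counter

def query_model(question: str, options: list[str]) -> str:
    # Hypothetical stand-in for a GPT-4 call; this dummy always picks
    # the first-listed option, mimicking the position bias that
    # choice shuffling is meant to wash out.
    return options[0]

def choice_shuffle_ensemble(question: str, options: list[str], runs: int = 5) -> str:
    # Ask the same question several times with the options reshuffled,
    # then return the majority-vote answer.
    votes = Counter()
    for _ in range(runs):
        shuffled = random.sample(options, len(options))
        votes[query_model(question, shuffled)] += 1
    return votes.most_common(1)[0][0]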

Google’s Gemini Ultra has only just begun being seeded to select organizations, but with Microsoft Research’s modified Medprompt technique in hand, its highly touted benchmark success is quickly becoming old news.

To eke out better scores from GPT-4, Microsoft Research “extended Medprompt to Medprompt+ by adding a simpler prompting method and formulating a policy for deriving a final answer by integrating outputs from both the base [strategy] and inferred confidences of candidate answers.”
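In plainer terms, GPT-4 is asked the same question both ways, and whichever candidate answer the model appears more confident in becomes the final answer. A minimal Python sketch of such an integration policy follows; the function names and hard-coded candidates are illustrative assumptions, not Microsoft’s pipeline.

from dataclasses import dataclass

@dataclass
class Candidate:
    answer: str
    confidence: float  # assumed to be derived from token logprobs

def ask_medprompt(question: str) -> Candidate:
    # Hypothetical stand-in for the full Medprompt pipeline
    # (few-shot selection, chain of thought, ensembling).
    return Candidate(answer="B", confidence=0.92)

def ask_simple(question: str) -> Candidate:
    # Hypothetical stand-in for a plain, simpler prompt to the same model.
    return Candidate(answer="C", confidence=0.71)

def medprompt_plus(question: str) -> str:
    # Integration policy: keep whichever candidate answer carries
    # the higher inferred confidence.
    candidates = [ask_medprompt(question), ask_simple(question)]
    return max(candidates, key=lambda c: c.confidence).answer

print(medprompt_plus("Which vitamin deficiency causes scurvy?"))  # prints "B"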

According to Microsoft Research, Google employs a similar pairing of complex and simple queries for evaluating its model’s responses.

The net result of applying the Medprompt+ technique to GPT-4 is an overall boost in benchmark scores over Gemini Ultra in each of the evaluated sections, some by as much as 10 percent.

With that being said, Microsoft Research notes that Medprompt+ “relies on accessing confidence scores (logprobs) from GPT-4,” which aren’t currently publicly available. While this kind of expert-level tweaking can boost GPT-4’s scores past Gemini’s in synthetic benchmarks, Microsoft Research is still looking into squeezing the most out of the out-of-the-box experience most people will encounter with “zero- or few-shot prompting strategies.”
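For the curious, a logprob is the log-probability the model assigned to a generated token, and exponentiating it yields a rough 0-to-1 confidence score, the raw material for Medprompt+’s answer-selection policy. A toy illustration, assuming the answer token’s logprob has already been retrieved:

import math

def confidence_from_logprob(answer_logprob: float) -> float:
    # Convert a token log-probability into a 0-to-1 confidence value.
    return math.exp(answer_logprob)

# Example: a logprob of -0.05 on the answer token corresponds to
# roughly 95% confidence in that answer.
print(round(confidence_from_logprob(-0.05), 3))  # 0.951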

Microsoft Research documents all of its prompting techniques and tools in its GitHub repo here.