xAI unveils Grok-1.5 Vision with the capability of 'understanding' images

The new model outperforms its market rivals in the RealWorldQA benchmark



Elon Musk’s xAI has recently announced its first multimodal model, Grok-1.5 Vision, also known as Grok-1.5V. It follows the company’s announcement last month of Grok-1 AI, intended to take on ChatGPT.

Grok-1.5V not only understands text but can also process visual information, including documents, images, screenshots, charts, and diagrams. In a recent blog post describing Grok-1.5 Vision’s capabilities, the company said:

Grok-1.5V is competitive with existing frontier multimodal models in a number of domains, ranging from multi-disciplinary reasoning to understanding documents, science diagrams, charts, screenshots, and photographs.

Grok-1.5 Vision outperforms its rivals in the RealWorldQA benchmark

The company also demonstrated Grok-1.5 Vision’s capabilities with seven sample use cases:

  1. Writing code from a diagram
  2. Calculating calories
  3. From a drawing to a bedtime story
  4. Explaining a meme
  5. Converting a table to CSV
  6. Help with rotten wood on a deck
  7. Solving a coding problem

The Musk-led AI company also shared a chart comparing its first multimodal model with rival models. According to the test results, Grok-1.5 Vision is competitive with GPT-4 with Vision, Claude 3 Sonnet, Claude 3 Opus, and Gemini Pro 1.5.

Comparison chart detailing Grok-1.5 Vision’s performance against rival models
Image credit: xAI

The results look promising overall, and Grok-1.5V stands out most in the RealWorldQA benchmark, where it outperforms all of its competitors. According to the company, RealWorldQA is a new benchmark designed to evaluate the basic real-world spatial understanding capabilities of multimodal models.

It is clear that Musk’s AI company is in no mood to take a back seat and is moving aggressively to keep up with its rivals. That said, its AI models have drawn a fair amount of criticism in the past. Most recently, Grok AI was criticized for spreading misinformation, among other issues.

Grok-1.5V will soon be available to existing Grok users and early testers. If you are among the early testers of Grok-1.5 Vision, please share your experience with our readers in the comments.
