Microsoft's new patent unveils AI capable of turning background audio noise into images

The tool can be used during meetings or chat sessions.

Published on February 25, 2025

by Flavius Floare

AI is one of Microsoft’s top priorities, and the Redmond-based tech giant has been investing a lot of time and resources into developing models to serve as assistants in Windows and Microsoft 365 products. From Copilot in Windows 11 to Windows Recall and other AI-powered features, the technology is definitely having its moment.

As such, it should be no surprise that Microsoft is working on yet another AI technology that might eventually come to Windows 11 and Microsoft 365 platforms, such as Teams. In a recently published patent document, the company describes a technology that uses AI models to turn background noises in virtual meetings into images.

The document, titled Intelligent Display of Auditory World Experiences, describes an AI system that creates smart visual displays from sounds, especially during meetings or chat sessions. The system listens to the audio of an event, which can include speech from participants as well as other sounds in the environment.

It features several specialized AI models: a Sentiment Recognition Model, a Speech Recognition Model, and an Audio Recognition Model.

The first model can understand specific speech characteristics, such as how loud someone speaks (volume) or the emotional tone (happy, sad, angry, etc.). It helps understand how someone is feeling or the intensity of their speech.

The second model picks up essential keywords or phrases from the speech. It can highlight these words or phrases in a transcript to show which parts of the conversation are important or relevant.

The third model, the Audio Recognition AI, analyzes non-speech sounds, such as applause, laughter, or background noises. Based on these sounds, it identifies different events that are happening.

The system combines all this information to create visual displays. For speech characteristics, it shows visual indicators for volume or tone. For keywords, it highlights the essential words or phrases from the conversation. For non-speech events, it indicates occurrences such as applause or background noises.
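To make that flow concrete, here is a minimal Python sketch of how the outputs of the three models might be merged into a single display. It is not taken from the patent: every name in it (`SpeechCharacteristics`, `DisplayFrame`, `highlight_keywords`, `build_display`) is hypothetical, and the model outputs are passed in as plain values rather than produced by real models.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechCharacteristics:
    """Hypothetical output of the sentiment recognition model."""
    volume: str  # e.g. "quiet", "normal", "loud"
    tone: str    # e.g. "happy", "sad", "angry"


@dataclass
class DisplayFrame:
    """What a hypothetical UI layer would render for one audio segment."""
    indicators: list[str] = field(default_factory=list)
    transcript: str = ""


def highlight_keywords(transcript: str, keywords: list[str]) -> str:
    """Wrap each keyword in asterisks so a UI could emphasize it."""
    for keyword in keywords:
        transcript = transcript.replace(keyword, f"*{keyword}*")
    return transcript


def build_display(
    characteristics: SpeechCharacteristics,
    transcript: str,
    keywords: list[str],
    audio_events: list[str],
) -> DisplayFrame:
    """Merge the three (assumed) model outputs into one display frame."""
    frame = DisplayFrame()
    # Speech characteristics become volume and tone indicators.
    frame.indicators.append(f"volume: {characteristics.volume}")
    frame.indicators.append(f"tone: {characteristics.tone}")
    # Non-speech events (applause, alarms, ...) become event indicators.
    for event in audio_events:
        frame.indicators.append(f"event: {event}")
    # Keywords are highlighted inside the transcript text.
    frame.transcript = highlight_keywords(transcript, keywords)
    return frame


frame = build_display(
    SpeechCharacteristics(volume="loud", tone="happy"),
    transcript="The launch date is confirmed",
    keywords=["launch date"],
    audio_events=["applause"],
)
print(frame.indicators)
print(frame.transcript)
```

In a real product, the indicators would presumably be drawn as icons or overlays in the meeting window rather than printed as text.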

Why is it important? The system would make it easier to understand and interact with what’s happening during an event. Visual indicators allow users to see the event’s mood, essential points, and other happenings, including potential incidents, such as fire alarms.

It could be a valuable accessibility tool for hearing-impaired users, helping them follow everything that happens in a virtual meeting or chat.

Microsoft has released dozens of accessibility features for Windows 10/11 and Microsoft 365, so the chances of the company releasing this AI system are high.

You can read the paper here.
