Microsoft wants users to interact with elements, such as persons and objects, while watching video content

The technology is absolutely revolutionary, but it's quite eerie.


Published on January 22, 2024


Readers help support Windows Report. We may get a commission if you buy through our links.

Microsoft recently filed a patent describing a technology that would accurately detect and identify elements in video content, such as persons and objects, and allow viewers to interact with them while watching.

The patent, titled "Detecting Prominence of Objects in Video Information," describes in detail how the technology could facilitate interactive shopping, as well as tracking and identification, in a system that would surely send shivers down the spine.

According to the patent, which can be read in its entirety here, the technology works this way:

A video-processing system uses machine learning to detect and track the people who appear in the videos.
The system then assigns each person a score based on how prominent they are in the videos, producing a set of scores.
A person's score reflects how likely that person is to interest viewers. For example, the score partly reflects how often the person appears in the videos.
The system computes the scores from information specific to each person, which it produces by aggregating the features that belong to that person.
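
To make the scoring step more concrete, here is a minimal Python sketch under our own assumptions. The patent does not publish code, so the Detection record, the prominence_scores function, the two features (how often a person appears and how large they appear on screen), and the weights are all hypothetical choices for illustration. It also assumes an upstream tracker has already given each person a stable ID.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Detection:
    person_id: str        # stable ID from an assumed upstream tracker (hypothetical)
    frame: int            # index of the frame where the person was detected
    area_fraction: float  # share of the frame covered by the person's bounding box (0..1)


def prominence_scores(detections, total_frames, w_freq=0.7, w_size=0.3):
    """Score each tracked person by prominence (hypothetical features and weights)."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    frames_seen = defaultdict(set)  # person_id -> frames the person appears in
    area_sum = defaultdict(float)   # person_id -> summed bounding-box area
    count = defaultdict(int)        # person_id -> number of detections
    # Aggregate the features that belong to each person.
    for d in detections:
        frames_seen[d.person_id].add(d.frame)
        area_sum[d.person_id] += d.area_fraction
        count[d.person_id] += 1
    scores = {}
    for pid, frames in frames_seen.items():
        frequency = len(frames) / total_frames  # how often the person shows up
        avg_size = area_sum[pid] / count[pid]   # how large they appear on average
        scores[pid] = w_freq * frequency + w_size * avg_size
    return scores
```

With these illustrative weights, a person detected in every frame of a four-frame clip at a modest size would outrank someone who appears in only one frame, even if the latter fills the whole screen.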

The technology could be used in a variety of applications. For example, it could track down a specific product, such as a sweater worn by a character in a movie, allowing the user to identify it and save the information for later, as you can see in the image below.

[Image: Microsoft interactive video content]

However, it could also be used to accurately detect and identify people, whether public figures or private individuals, by searching a database for similar faces.

A face detection component determines the identities of the people who appear in the video information by recognizing their faces. For instance, in some implementations, the face detection component determines whether any of the individuals that appear in the video information have been previously identified as public persons, such as celebrities or politicians.
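
One common way such a lookup could work is to compare a numeric face embedding against reference embeddings of known public figures. The sketch below is our assumption, not Microsoft's method: it presumes embeddings come from some face recognition model (not shown), and the identify_public_person function, the names, the vectors, and the 0.8 similarity threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_public_person(face_embedding, known_people, threshold=0.8):
    """Return the best-matching known person, or None if no match clears the threshold.

    known_people maps a name to a reference embedding; both the data and the
    threshold are hypothetical placeholders.
    """
    best_name, best_score = None, -1.0
    for name, reference in known_people.items():
        score = cosine_similarity(face_embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    # Below the threshold, the face is not treated as a previously identified public person.
    return best_name if best_score >= threshold else None
```

A threshold like this is what separates "looks a lot like a known celebrity" from "unknown face"; where it is set would decide how often private individuals get mistaken for public ones.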

One of the eeriest aspects of this tool would be its ability to capture the emotions on each face, using audio and video content to build a map of the emotions expressed by the people in the video.

The emotion detection component detects emotions of interest by determining whether the audio information contains predetermined sounds indicative of these emotions.
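
The audio cue described above can be sketched as a simple lookup from recognized sounds to emotions. This is a hypothetical illustration only: it assumes a separate sound classifier has already labeled audio events with timestamps and confidences, and the SOUND_TO_EMOTION table, the detect_emotions function, and the 0.5 confidence cutoff are invented for the example.

```python
# Hypothetical mapping from predetermined sounds to the emotions they indicate.
SOUND_TO_EMOTION = {
    "laughter": "joy",
    "cheering": "excitement",
    "scream": "fear",
    "sobbing": "sadness",
    "gasp": "surprise",
}


def detect_emotions(audio_events, min_confidence=0.5):
    """Turn labeled audio events into a timeline of (timestamp, emotion) pairs.

    audio_events: list of (timestamp_seconds, sound_label, confidence) tuples,
    assumed to come from an upstream sound classifier.
    """
    emotions = []
    for timestamp, label, confidence in audio_events:
        emotion = SOUND_TO_EMOTION.get(label)
        # Keep only predetermined sounds detected with enough confidence.
        if emotion is not None and confidence >= min_confidence:
            emotions.append((timestamp, emotion))
    return emotions
```

Sounds outside the table (background music, for instance) are simply ignored, so the resulting timeline only marks moments tied to one of the predetermined sounds.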

As such, the technology could be used by various parties, from enterprises to regular users and even police departments, to place products and persons in video content and later detect and identify those products and persons using the same technology.

Microsoft has been filing many patents, such as the one describing a technology that would make Teams meetings hyperrealistic, though few of them ever see the light of day.

However, as video streaming platforms take hold and video becomes the predominant form of media on platforms such as TikTok and Instagram, a technology like this would surely be a game changer, and it could revolutionize the way we consume video content.

But it is quite eerie.


Flavius Floare

Tech Journalist

Flavius is a writer and a media content producer with a particular interest in technology, gaming, media, film and storytelling. He's always curious and ready to take on everything new in the tech world, covering Microsoft's products on a daily basis. The passion for gaming and hardware feeds his journalistic approach, making him a great researcher and news writer that's always ready to bring you the bleeding edge!