Microsoft might want to turn Copilot into a multi-modal AI chatbot

Microsoft released a paper detailing the tech.


Microsoft recently revealed that Copilot, its AI model, will run natively on Windows and rely on NPUs to complete tasks. That is excellent news, as the tool might finally become popular among users. However, there is good reason to believe that the Redmond-based tech giant might also turn Copilot into a multi-modal AI chatbot: we found evidence for it in a recently published paper describing a tool capable of communicating with users through all sorts of media-based responses.

The paper, called Videochat, discusses a new chatbot technology called “multi-modal chatting.” This technology allows chatbots to interact with humans using different types of responses, including text, images, videos, and sounds.

Microsoft envisions a multi-modal chatbot that communicates through various types of media, such as pictures and videos, rather than through text alone. It can still reply with text if users prefer.

The goal? First, Copilot could express itself in richer and more diverse ways using different types of responses. This would make conversations more engaging and enjoyable for users, and make the chatbot easier to get along with.

Second, the ability to respond naturally in multiple media would make Copilot more practical in the context of an operating system such as Windows. The paper describes the Videochat-based multi-modal chatbot as something similar to a search engine, such as Google or Bing, but one capable of understanding the user.

For example, when communicating through videos, the tool uses a video search engine to process users’ queries and respond to them accordingly.

Could it be possible? Surely so. The technology could potentially open dozens of opportunities for Microsoft: the AI model could become Windows’ de facto search engine, with no need to turn to Google. It could also act as a video aggregator, showcasing videos and other forms of content.

What do you think about this?

You can read the whole paper here.
