Microsoft's latest patent reveals a Copilot able to compose music that matches videos and PowerPoint presentations

The technology has been patented, so who knows?

While the Redmond-based tech giant has started updating Copilot with a brand-new interface that makes the AI model stand out with a sleek look, it seems the company has even bigger plans for it.

A recently published patent shows that Microsoft is working on an artificial intelligence model for composing audio scores, one that can create music or audio to match videos, text, PowerPoint presentations, virtual reality environments, or even video games in development.

The patent, aptly titled “Artificial intelligence model for composing audio scores,” describes the methods Copilot would use to create music.

First, it collects a large amount of training data, including many audiovisual datasets that contain both video and audio components.

Each of these datasets is analyzed to extract different types of features. For example, the model would look at a video’s visual features, such as colors, shapes, movements, and scenes. Any text that appears in the video, such as subtitles or on-screen text, would also be extracted. Lastly, it would extract in-video audio features: sounds and music that are already present in the video and are not part of a musical score.

After extracting these features, Copilot would analyze them and look for correlations between them. For example, certain scenes (like a sunset) are often paired with specific types of music (like calm, soothing tunes).

Copilot would be trained on these features and, using the learned correlations, would generate audio scores that match the visual and textual features of new videos.
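To make the idea of learning correlations more concrete, here is a deliberately tiny Python sketch. It is not the method described in the patent: the training data, the feature tags, the mood labels, and the function names `learn_correlations` and `suggest_mood` are all made up for illustration, and simple co-occurrence counting stands in for whatever model Microsoft would actually train. It also stops at suggesting a mood rather than generating any audio.

```python
from collections import Counter, defaultdict

# Hypothetical training data: (features extracted from a video, mood of its score).
# Both the tags and the moods are invented for this illustration.
TRAINING = [
    ({"sunset", "beach"}, "calm"),
    ({"sunset", "mountains"}, "calm"),
    ({"car chase", "city"}, "tense"),
    ({"city", "night"}, "tense"),
    ({"beach", "party"}, "upbeat"),
]

def learn_correlations(examples):
    """Count how often each feature co-occurs with each score mood."""
    counts = defaultdict(Counter)
    for features, mood in examples:
        for feature in features:
            counts[feature][mood] += 1
    return counts

def suggest_mood(counts, features):
    """Return the mood with the highest summed co-occurrence count, or None."""
    totals = Counter()
    for feature in features:
        # .get avoids creating empty entries for features never seen in training.
        totals.update(counts.get(feature, {}))
    if not totals:
        return None
    return totals.most_common(1)[0][0]

counts = learn_correlations(TRAINING)
print(suggest_mood(counts, {"sunset", "beach"}))  # prints "calm"
```

In this toy version, a new video tagged with "sunset" and "beach" is matched to a calm score because those tags appeared most often alongside calm music in the training examples.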

In real life, this technology can be used in various applications, such as:

  • Film and Video Production: Automatically generating background scores for movies, TV shows, or online videos.
  • Advertising: Creating music that perfectly fits the mood and message of commercials.
  • Gaming: Producing dynamic soundtracks that change based on the game’s visuals and actions.
  • Virtual Reality: Enhancing immersive experiences with audio that adapts to the visual environment.

By automating the composition of audio scores, Copilot could also save time and help ensure that the audio complements the visual content.

It’s worth mentioning that the AI model can already create music in a very rudimentary form using the Suno plugin, which was released earlier this year.

However, an improvement on that plugin would be more than welcome. It would allow creators to pin down their product’s music concept before pitching it to an actual music composer.

While the prospect of AI actually replacing music composers should be considered, giving Copilot the ability to compose music would ultimately streamline production work down the line. But what are your thoughts on this?

You can read the paper here.
