Azure text-to-speech avatar might disturb users with its uncanny valley characteristics

Microsoft will surely refine the technology in time to make it more human.

Microsoft announced the Azure text-to-speech avatar at the Microsoft Ignite conference, held in Seattle from November 14 to 17, 2023. The feature is now in public preview, and Azure users can build avatar videos using only text input.

We are excited to announce the public preview release of Azure AI Speech text to speech avatar, a new feature that enables users to create talking avatar videos with text input, and to build real-time interactive bots trained using human images.

Microsoft

The Redmond-based tech giant thinks the Azure text-to-speech avatar could be a suitable alternative to traditional video content creation, and small companies, such as startups, could greatly benefit from such a tool.

Traditional video content creation requires a lot of time and budget, including setting up video shooting environment, filming videos, editing, etc. With text to speech avatar, users can more efficiently create video. Users can use the avatar to build training videos, product introductions, customer testimonials, etc., simply with text input. 

Microsoft

The text-to-speech avatar can be used for various applications:

  • A chatbot for a travel website
  • Virtual sales in a live commercial
  • AI teacher who teaches online and can answer questions
  • A virtual HR to respond to employees’ questions

While the tool will be quite useful to many companies, the videos it generates lack the full spectrum of human expression. Here’s why:

The Azure text-to-speech avatar could be useful, but it doesn’t feel real

It’s important to know that Microsoft offers two ways to generate an avatar:

  • Prebuilt text-to-speech avatars, with Microsoft providing a list of options users can choose from; these avatars will be able to speak different languages and have different voices based on the input received from users.
  • Custom text-to-speech avatars, which users build from real-life images and videos. The system takes those resources and automatically generates an avatar that matches them, so the avatar can resemble the user if the user provides their own voice and appearance.

Even so, the avatars lack certain expressions, a fact that makes them look quite robotic.

Take the two example videos Microsoft included in its blog post about the product. Both were generated using the Azure text-to-speech avatar. The first one features an avatar showcasing how users can generate video content using Azure avatars.

From the YouTube thumbnail, you can’t tell that the model presented in the video is actually an avatar, but as soon as you play the video, it becomes clear that it’s entirely AI-generated. The synchronization between the avatar’s facial expressions and its voice is somewhat odd.

The Azure text-to-speech avatar technology also allows building interactive avatars, and the second example showcases the uncanny valley feeling (the unease caused by something that acts like a human but is not one).

As Microsoft says, the interactive avatars use the Azure OpenAI Service GPT-3.5 model to respond to customer queries, including spoken dialogs with customers in different languages. This alone makes the tool incredibly useful, but again, the exchange looks artificial and lacks any human quality, which could be disturbing for some.
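To make the flow described above more concrete, here is a minimal Python sketch of how such an interactive loop might be wired together: a customer query goes to a chat model, and the reply is handed to the avatar to be spoken. Everything in it is an assumption made for illustration. `AvatarSession`, `generate_reply`, `speak`, and `echo_model` are hypothetical names, not part of any Azure SDK, and the model and the avatar are replaced with simple stand-ins so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AvatarSession:
    """One conversation between a customer and an avatar (hypothetical)."""
    # Stand-in for a chat model call (for example, a GPT-3.5 deployment).
    generate_reply: Callable[[List[Tuple[str, str]]], str]
    # Stand-in for whatever sends text to the avatar for speech and video.
    speak: Callable[[str], None]
    history: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, customer_text: str) -> str:
        # Record the query, ask the model for a reply, then have the avatar voice it.
        self.history.append(("customer", customer_text))
        reply = self.generate_reply(self.history)
        self.history.append(("avatar", reply))
        self.speak(reply)
        return reply


def echo_model(history: List[Tuple[str, str]]) -> str:
    # Placeholder model: it only repeats the latest customer message.
    return "You asked: " + history[-1][1]


# Collect the "spoken" replies in a list instead of rendering a real avatar.
spoken: List[str] = []
session = AvatarSession(generate_reply=echo_model, speak=spoken.append)
```

Calling `session.handle("Where is my booking?")` runs one turn of the loop. In a real deployment the two stand-ins would be swapped for actual service calls, whose exact shape this sketch does not attempt to reproduce.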


In time, Microsoft might solve this issue, and with new AI technologies emerging, the Redmond-based tech giant could turn the Azure avatar into the industry’s go-to tool. Why? Because companies already love it.

We are using Azure AI Services for our AI Banking Avatar due to the unique combination of leading-edge AI and Visualization services in one platform. By using different Azure AI Speech text to speech avatar we will be able to generate a next level customer experience and really simplify banking and banking interactions.

Gerald Ertl, Managing Director, Commerzbank AG

However, Microsoft doesn’t seem to have considered how customers will react to these avatars. They could be a much cheaper and faster option for companies, since a marketer should be able to create AI-generated tutorials without resorting to external resources, but the lack of any meaningful physical expressions makes these avatars look like robots.

AI cannot be ignored, especially if we’re talking about tools such as Copilot on Windows 11 or Microsoft 365, but when it tries to resemble humans, it can get quite uncanny.

Microsoft will refine these avatars, there is no doubt about it, but for now, I get a chill down my spine every time I look at one of them, whether it is forcing a grin or showing no expression at all.

What do you think about these avatars?
