The future of voice – artificial intelligence (AI) or human?
Rachel Griffiths, Client Director at RADA Business
Yesterday, our new AI voice assistant arrived.
This morning, without prompting, my three-year-old son used it to request the weather forecast for today. He managed to do it. I felt two things: uncomfortable that he could be granted a request without having to say please or thank you, and amazed at how quickly he adapted to this new visitor in our lounge.
I can’t help thinking that this is a sign of things to come. Frequently talking to virtual assistants like Siri, Alexa or Google Assistant is becoming commonplace.
As we welcome these increasingly familiar voices into our lives, what, if any, are the implications for the evolution of the human voice?
Last year, RADA Business tutor Liz Barber joined speakers from organisations such as the BBC and Amazon at an event hosted by Apadmi in Manchester to consider this question. Few people are better qualified to speak on this topic. Liz holds a Postgraduate Degree in Voice Studies from the Royal Central School of Speech and Drama, with her Master’s research focusing on ‘AI and voice’.
It was quite an event. Here are just some of the things that stood out from Liz’s presentation.
What can artificial intelligence voice development learn from the world of acting?
Our voice is determined by what we do with our body
There’s a reason why some people find the sound of AI voice assistants soulless. Their lack of physical presence means that you don’t hear a personality.
Voice is a result of a dynamic relationship between mind and body. It is a physical process. Actors understand this. Their profession is grounded in the relationship between physical presence and voice. They begin to study any character by first examining their physical presence. They find the character’s physicality first because they know that it is the body that determines the breath and ultimately defines their voice.
At RADA Business, this is the foundation of all our work. Body, breath and voice. These are the qualities that underpin every powerful performance.
The need to develop your sonic persona
In AI voice terms, brands and companies are having to develop their sonic persona: their personality in sound. The sonic persona acts as the audio ‘key’ or ‘hashtag’ that enables customers to access the brand’s specific audio content. The aim for the brand is to get its phrase or words on everyone’s lips so that customers can engage effortlessly with its content through AI voice.
Without a physical presence, the sonic persona is literally just words. Voice is so much more.
With a physical presence, your voice is the single most powerful expression of your personality. Finding your authentic voice is fundamental if you are seeking to deliver an exceptional performance. It takes skill and practice. An actor’s training will start with recognising and ‘unlearning’ the habits they have developed through life, which get in the way of their free voice. They work with body and breath to strip their voice back to its most natural and neutral. Only when this is done can they enable their true vocal range and begin to embody the voice of their character.
Voice reveals emotions; it doesn’t describe them
The voice is a powerful way of expressing emotion. Is it possible to programme AI voice to express (or respond to) emotion, purely through words? I don’t think so.
Our voice is psychosomatic; it reveals our emotions. That is because our voice is rooted in our personality and in our persona, a word that literally means ‘through sound’.
Imagine that you are waiting in a restaurant and the person you are meeting is 30 minutes late. You are cross. They arrive and you say, ‘you’re late.’ Notice what happens to your body, your breath and your voice. Now repeat the same words, imagining that you are delighted that they have only just arrived because you are late too. Notice how your emotion changes your body, how your breath changes, how your voice alters to create a very different impact.
They may be machines but they still exert an influence over us
AI voice can trigger some extreme emotions in us. We can feel anger, frustration and disappointment if the content we are seeking is not found, or if our request is misheard for the sixth time.
As humans, we use emotion in our voice to connect with and move our audience. With AI voice assistants, our intent and emotion are significantly affected. When the aim is to find content rather than to move an audience, emotion becomes redundant.
When we interact with AI, we change our vocal production and override our natural voice. Our tone and pitch alter, our volume increases, our pace slows. We arrange our words and phrases more consciously to give ourselves the best chance of obtaining the exact piece of content that we are seeking. Our body, breath and voice become almost robotic.
Human voice versus AI voice: what does the future hold?
Without doubt, AI voice will continue to proliferate in society. What is in question is how the human voice will evolve with our increasing engagement with AI voice. Will we replicate the disembodied characteristics of AI voice? Or are we going to develop an entirely new form of communication from human to machine?
The one thing that will never change is the richness, complexity and capability of the human voice.