Voice assistant technology is in danger of trying to be too human

tomasso79/Shutterstock

Posted 25 October, 2019

Leigh Clark, Swansea University and Benjamin Cowan, University College Dublin

More than 200m homes now have a smart speaker providing voice-controlled access to the internet, according to one global estimate. Add this to the talking virtual assistants installed on many smartphones, not to mention kitchen appliances and cars, and that’s a lot of Alexas and Siris.

Because talking is a fundamental part of being human, it is tempting to think these assistants should be designed to talk and behave like us. While this would give us a relatable way to interact with our devices, replicating genuinely realistic human conversations is incredibly difficult. What’s more, research suggests making a machine sound human may be unnecessary and even dishonest. Instead, we might need to rethink how and why we interact with these assistants and learn to embrace the benefits of them being a machine.

Speech technology designers often talk about the concept of “humanness”. Recent developments in artificial voices have blurred the line between human and machine, with these systems sounding increasingly humanlike. There have also been efforts to make the language of these interfaces appear more human.

Perhaps the most famous is Google Duplex, a service that can book appointments over the phone. To add to the human-like nature of the system, Google added utterances like “hmm” and “uh” to its assistant’s speech output – sounds we commonly use to signal we are listening to the conversation or that we intend to start speaking soon. In the case of Google Duplex, these were used with the aim of sounding natural. But why is sounding natural or more human-like so important?

Chasing this goal of making systems sound and behave like us perhaps stems from the pop culture inspirations we use to fuel the design of these systems. The idea of talking to machines has fascinated us in literature, television and film for decades, through characters such as HAL 9000 in 2001: A Space Odyssey or Samantha in Her. These characters portray seamless conversations with machines. In the case of Her, there is even a love story between an operating system and its user. Critically, all these machines sound and respond the way we think humans would.

We need to remember virtual assistants aren’t human. Phonlamai Photo/Shutterstock

There are interesting technological challenges in trying to achieve something resembling conversations between us and machines. To this end, Amazon has recently launched the Alexa Prize, looking to “create socialbots that can converse coherently and engagingly with humans on a range of current events and popular topics such as entertainment, sports, politics, technology, and fashion”. The current round of competition asks teams to produce a 20-minute conversation between one of these bots and a human interactor.

These grand challenges, like others across science, clearly advance the state of the art, bringing planned and unplanned benefits. Yet when striving to give machines the ability to truly converse with us like other human beings, we need to think about what our spoken interactions with people are actually for and whether this is the same as the type of conversation we want to have with machines.

We converse with other people to get stuff done and to build and maintain relationships with one another – and often these two purposes intertwine. Yet people see machines as tools serving limited purposes and hold little appetite for building the kind of relationships with machines that we do every day with other people.

Pursuing natural conversations with machines that sound like us can become an unnecessary and burdensome objective. It creates unrealistic expectations of systems that can actually communicate and understand like us. Anyone who has interacted with an Amazon Echo or Google Home knows this is not possible with existing systems.

This matters because people need to have an idea of how to get a system to do things. Because voice-only interfaces have limited buttons and visuals, that idea is guided significantly by what the system says and how it says it. The importance of interface design means humanness itself may not only be questionable but deceptive, especially if used to fool people into thinking they are interacting with another person. Even if their intent may be to create intelligible voices, tech companies need to consider the potential impact on users.

Looking beyond humanness

Rather than consistently embracing humanness, we can accept that there may be fundamental limits, both technological and philosophical, to the types of interactions we can and want to have with machines.

We should be inspired by human conversations rather than using them as a perceived gold standard for interaction. For instance, looking at these systems as performers rather than human-like conversationalists may be one way to help create more engaging and expressive interfaces. Incorporating specific elements of conversation may be necessary for some contexts, but we need to think about whether human-like conversational interaction is necessary, rather than using it as a default design goal.

It is hard to predict what technology will be like in the future and how social perceptions will change and develop around our devices. Maybe people will be OK with having conversations with machines, becoming friends with robots and seeking their advice.

But we are currently sceptical of this. In our view it is all to do with context. Not all interactions and interfaces are the same. Some speech technology may be required to establish and foster some form of social or emotional bond, such as in specific healthcare applications. If that is the aim, then it makes sense to have machines converse more appropriately for that purpose – perhaps sounding human so the user forms the right kind of expectations.

Yet this is not universally needed. Crucially, this human-likeness should link to what the systems can actually do with conversation. Making systems that do not have the ability to converse like a human sound human may do far more harm than good.

Leigh Clark, Lecturer in Computer Science, Swansea University and Benjamin Cowan, Assistant Professor, School of Information & Communication Studies, University College Dublin

This article is republished from The Conversation under a Creative Commons license.
