Voice Design Series, Part 2: VUI 101 - Getting Practical

Before developing and integrating a VUI strategy into your marketing plan, the first step is to understand the technology, including key concepts and nomenclature. Keep in mind that designers approach VUI differently from GUI (graphical user interface). In some ways, it will flip a designer’s practical training upside down. Rather than creating visual experiences and graphical content to drive a physical action (a click), the voice-based designer must think about, among other variables¹:

A Brand’s Voice: Everything from the gender of your brand’s voice to its pitch, tone, and conversational style. Unlike graphical interfaces, your brand now has an identity that can be heard but not seen, so it’s critical to think through and workshop “who” the brand truly is. Personas and journey mapping can be valuable tools here.
Anticipatory Design: Given the infinite number of possible questions and responses, it’s critical to structure the architecture so that the conversational steps between the user and Alexa can be anticipated, in order to develop the optimal user experience.
Brevity and Mental Models: Prioritizing information is critical in voice design, and that starts with understanding the user’s mental model when it comes to their commands and goals. Less is more.
A “Visual” Information Architecture: There’s no back button, home button, scrolling, etc. That’s obvious. So designing for VUI requires a very clear structure of possible commands and a well-vetted information architecture, so that users are guided through a conversation that achieves their goals, without extraneous content to veer them off-path. A strong “error strategy” is essential so users know when things have gone amiss. Anticipating users’ needs and how they’ll phrase requests matters just as much: users shouldn’t have to go through prompts if they can just say what they want!

VUI 101 – Know the “New” Lingo

Designers should understand some common terms for VUI as a baseline for moving forward:

Skills – Conversational applications that can be added to your Alexa device, much like an app on a smartphone.
Wake word – What your Amazon Alexa device listens out for before it bursts into action, such as “Alexa, can you…?”
Invocation – The name users say to invoke and interact with your skill. It does not need to match your skill’s name (although it often does), but it must relate to your skill for easy recall.
Intents² – An intent is what a user is trying to accomplish. Within the code, this is how you define your function. ‘Intent’ doesn’t relate to the specific words that a user says, but the high-level goal they are aiming for.
Utterances – The specific phrases used when requesting a response from Alexa (e.g. “What time is it?”).
Welcome/Help – A welcome message that provides users brief help on how to use the skill.
Out of bounds – These are requests your skill isn’t designed to accommodate. Like a well-designed 404 page, the out of bounds requests should be handled gracefully and redirect the user as much as possible.
One shots – A one-shot utterance is given all at once and fully satisfies what is needed to activate an intent. These are the most direct way for a user to interface with your skill.
Smart homes – A home equipped with lights, outlets, thermostats, cameras and other electronic devices that can be controlled remotely by phone or computer.
Alexa for business (closed systems) – Uses information about the devices, user accounts, and skills in an organization to respond or perform the requested action, such as scheduling or starting meetings. A notable example is Marriott Hotels’ venture to let you order room service, request housekeeping, and check out via voice.
SDK potential – Alexa has SDKs (Software Development Kits) for Java, Node.js, and Python, which are sets of software development tools and libraries that give you programmatic access to Alexa features.
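
To make the vocabulary above concrete, here is a minimal Python sketch of how an invocation name, intents, and utterances relate to each other. It is an illustration only, not the Alexa Skills Kit API: the skill name "daily planner", the intent names, and the match_intent helper are all hypothetical, and Alexa itself uses a trained language model rather than the exact string matching assumed here.

```python
# Hypothetical interaction model: every name and phrase here is illustrative.
INTERACTION_MODEL = {
    "invocation_name": "daily planner",  # what the user says to open the skill
    "intents": {
        # intent (the user's goal) -> sample utterances (the words they may say)
        "GetTimeIntent": ["what time is it", "tell me the time"],
        "SetTimerIntent": ["set a timer", "start a timer"],
    },
}


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so spoken variants compare equal."""
    kept = [ch for ch in text.lower() if ch.isalnum() or ch.isspace()]
    return "".join(kept).strip()


def match_intent(utterance: str, model: dict = INTERACTION_MODEL) -> str:
    """Map the specific words a user said to the high-level goal behind them."""
    spoken = normalize(utterance)
    for intent, samples in model["intents"].items():
        if spoken in samples:
            return intent
    # Anything the skill was not designed for is treated as out of bounds.
    return "OutOfBoundsIntent"
```

Two different utterances ("What time is it?" and "Tell me the time") resolve to the same intent, which is the point of separating the user's goal from the user's wording.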

Applying the “Old”, Familiar Design Heuristics

At Red Privet, we’ve begun work with clients on Alexa-based VUI projects (ask us about it). And what we’re learning is that, while the ever-changing landscape can be both challenging and new, there are still some familiar design heuristics (principles) that have direct relevance to VUI. Below are some key design principles to keep top of mind when developing an Alexa skill:

Recognition versus Recall – “Recognition refers to our ability to ‘recognize’ an event or piece of information as being familiar, while recall designates the retrieval of related details from memory.”³ This one is huge given that there is no inherent recognition in VUI. For that reason, supporting recall is important, as is feedback (see below). Since users cannot “browse” the functionality, we need to find simple, unobtrusive ways to present capabilities and options to them, even repeatedly.
Balance of structure and freedom – Include prompts and conversation versus one-shots and direct commands.
Feedback – Provide ample information to communicate what has happened and why. Note that Alexa has introduced a “Brief” mode, in which some conversational feedback items are replaced with a tone acknowledgement.
Layered – Present more detailed content/options based on a past selection. Voice can be annoying when responses are too long. Layering allows users to get just what they need or ask for.
Error prevention – Include validations wherever possible. Robust utterance synonyms can meet the user’s intents without causing an error. Prompts that list options not only present the user with what they can do but also model the way to phrase the command.
Error recovery – Handle errors gracefully. Provide out-of-bounds messages, and repeat the options when a response can’t be understood.
Answers descriptive questions – Along the way, be sure users can understand: “What is this, what does it do”. The standard Welcome Message in your skill is the ideal place for this!
Answers procedural questions – Such as: “How do I…?”. The standard Help Message in your skill can support this principle.
Hick’s law – More options make decisions harder. Keep it simple. This is especially true in voice, where trying to listen to, much less remember, long option lists can be frustrating.
Ockham’s razor – The simpler of two equivalent design options is best. Never more true than in voice. Keep it simple, be concise.
Progressive disclosure – Present only necessary or requested information.
Signal to noise ratio – Eliminate irrelevant information (particularly in prompts or answers).
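
Several of the principles above (error prevention, error recovery, answering procedural questions, and keeping option lists short) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not an Alexa SDK handler: the OPTIONS content, the options_prompt and respond helpers, and the wording of each message are all hypothetical.

```python
# Hypothetical skill content: two options only, in keeping with Hick's law.
OPTIONS = {
    "store hours": "We are open nine to five.",
    "store location": "We are at 1 Main Street.",
}


def options_prompt(options: dict) -> str:
    """List the choices in a way that also models how to phrase them."""
    phrases = [f"'{name}'" for name in options]
    return "You can say " + " or ".join(phrases) + "."


def respond(utterance: str, options: dict = OPTIONS) -> str:
    """Return a short answer, help text, or a graceful out-of-bounds message."""
    spoken = utterance.lower().strip(" ?.!")
    if spoken in options:
        # Progressive disclosure: answer only what was asked.
        return options[spoken]
    if spoken == "help":
        # Procedural question: tell the user how to proceed.
        return options_prompt(options)
    # Error recovery: acknowledge the miss, then repeat the options.
    return "Sorry, I can't help with that. " + options_prompt(options)
```

For example, respond("play jazz") does not fail silently; it apologizes and repeats the two phrases the skill understands, much as a well-designed 404 page redirects a lost visitor.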

Conversational versus Command (or Hybrid)

In recent years, we have also seen a progression from conversational, engagement-rich communications (e.g. writing formal emails and complex graphical UI interactions) to a preference for succinct messages (SMS conventions such as U2; l8tr) and command-line-like interfaces (Slack). It isn’t clear that, just because we are now interfacing with voice, we will always prefer conversational interactions to direct commands. So as we move forward with VUI, it’s important to understand that there may be conversational skills, command-based skills, and hybrids, and to know when each is most appropriate in design.

We can certainly see how a conversational UI, particularly one tied to smart agent technology, could benefit certain task contexts, like planning a wedding. However, the most popular Alexa functions today are command-based, like “set a timer” or “play a song”.
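
A hybrid approach can be sketched as follows: if a one-shot command already carries everything the skill needs, act on it immediately; otherwise fall back to a conversational prompt for the missing piece. The slot names, the PROMPTS text, and the next_turn helper below are hypothetical and only illustrate the idea, assuming the spoken request has already been parsed into a dictionary of slots.

```python
# Hypothetical prompts for details the user may have left out.
PROMPTS = {
    "duration": "For how long?",
    "label": "What should I call the timer?",
}


def next_turn(slots: dict, required: tuple = ("duration",)) -> str:
    """Act on a complete one-shot command, or ask for the first missing detail."""
    for name in required:
        if not slots.get(name):
            # Conversational path: prompt only for what is still missing.
            return PROMPTS[name]
    # Command path: everything was supplied up front, so just confirm.
    return "Timer set for " + slots["duration"] + "."
```

With this shape, "set a timer for ten minutes" is handled in a single turn, while a bare "set a timer" leads to one short follow-up question rather than an error.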

In our next blog, we’ll talk about tips to strategically approach your Alexa app to best suit your project’s goals.

Sources:

1. Babich, Nick. (2018). How conversational interfaces will change our lives. https://theblog.adobe.com/how-conversational-user-interfaces-will-change-our-lives/.

2. Screenmedia (2017). Intents, Utterances, and Slots: The New Vocabulary Needed To Develop For Voice. https://medium.com/screenmedia-lab/utterances-slots-and-skills-the-new-vocabulary-needed-to-develop-for-voice-7428bff4ed79.
