Part 1: Voice-First Experiences for Kids

June 22, 2021


Voice technology offers education publishers and toy and game developers new avenues and approaches to creating immersive learning and play experiences for kids. Working closely with our clients, SoapBox Head of UX Declan Moore and Senior Software Engineer Ronan Tumelty see these voice-driven experiences come to life first-hand.

In this Part 1 of our new Beginner’s Guide series, Declan and Ronan walk us through a sample use case for speech recognition tech in a kids’ app and share some of the lessons and best practices they’ve learned working with clients to design and integrate voice experiences into their apps, tools, and games for kids.

Introduction:

Product managers, designers, and engineers all aim to create experiences that work seamlessly for kids and bring them confidence, joy, and engagement. But interactive voice experiences are still brand new, and our understanding of what actually works for kids is not a one-size-fits-all exercise. It’s still very much evolving.

Voice interactions, although intuitive, are still new to a lot of people.

This Beginner’s Guide is written for those who aspire to create and deliver great voice experiences for kids online, offline and on-device. In Part 1 we walk through an online experience and kick off the discussion with 3 core concepts:

3 Core Concepts:

  1. Voice-First Experiences

A “voice-first” or “voice-led” experience is one where a child uses their voice as the primary means of interacting. This is distinct from a “voice-only” experience – think of voice assistants like Siri or Alexa on screenless smart speakers, where there is little or no UI.

  2. Inputs and Outputs

In a voice-first learning or play experience for kids — be it a tool, toy or game experience — voice is rarely used in isolation. Other interfaces, like physical buttons, touch screens, and sensors, are also leveraged, offering the designer a wide range of inputs and outputs to work with.

In the voice-first use case detailed below, the child is using a tablet. On an average tablet we can assume we have access to the following:

  • Inputs: touchscreen, microphone
  • Outputs: screen (visuals), speakers (audio)

Making all of these inputs and outputs gel together and feel natural is key to the success of any kids’ voice experience.

In the use case below, we’re showcasing an online experience on a tablet, leveraging its graphical interface, touchscreen, microphone, and audio capabilities. We’ll use the microphone as the input, and visuals and audio as the outputs. The same principles apply to a browser-based (web) experience.
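
To make the later steps concrete, the sketches in this guide assume a browser-based version of the experience built with TypeScript and standard Web APIs; the function names are ours, for illustration, and not part of any SoapBox SDK. Here’s a minimal sketch of acquiring the two capabilities we lean on: the microphone as input, and audio (alongside the on-screen visuals) as output.

```typescript
// Minimal browser sketch: acquire the input and output we rely on.
// Assumes a web-based version of the experience; names are illustrative.

async function setUpAudioIO(): Promise<{ micStream: MediaStream; audioCtx: AudioContext }> {
  // Input: the tablet's microphone (the browser asks for permission).
  const micStream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // Output: an AudioContext for prompts, sound effects, and feedback audio.
  // Visual output is simply the App's regular on-screen UI.
  const audioCtx = new AudioContext();

  return { micStream, audioCtx };
}
```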

  3. How Visuals Support Voice

While voice experiences come in many different shapes and sizes, there are a number of key considerations to take into account when designing for kids. In an interactive play experience on a tablet, for example, visuals provide context, clarity and direction. Visuals help a child to:

  • Know when they can speak
  • Know when the system can hear them
  • Know when the system is taking action
  • See the results of the action

A Sample App Experience

Let’s step through a sample App experience, and going forward, let’s refer to the product as the “App” and the user as the “child”. To deliver maximum clarity and immersion in this experience, kid-specific voice technology like the SoapBox voice engine is combined with third-party voiceover, visuals, and sound effects.

The aim is to deliver an experience that doesn’t require the child to touch the screen once the game has started.

Note: For accessibility and pedagogical reasons, some products may need to offer more than one way to interact with a given activity.

In this example, the child is playing a reading game.


Step by Step:

  1. Present the experience
  2. Start the voice interaction
  3. Show the App is listening
  4. Let the child know they can be heard
  5. End the voice interaction
  6. The processing/thinking state (optional)
  7. Provide feedback
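
Before walking through each step, here’s a rough sketch of how those steps might be modelled as voice UI states in code. The state names are illustrative, not taken from any particular SDK; the later sketches reuse this setState helper.

```typescript
// The voice UI moves through a small set of states that mirror the steps above.
type VoiceUIState =
  | "off"         // 01: default state, the voice UI is inactive
  | "prompting"   // 02: playing the audio instruction
  | "listening"   // 03-04: the mic is open and the UI reacts to the child's voice
  | "ending"      // 05: speech has stopped; a short grace period before we commit
  | "processing"  // 06: scoring the audio; voice input is unavailable
  | "feedback";   // 07: showing the result of the child's attempt

let state: VoiceUIState = "off";

function setState(next: VoiceUIState): void {
  state = next;
  // In the real App this would also drive the visuals described in each step.
  console.log(`Voice UI state -> ${next}`);
}
```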

01. Present the Experience
This is the default state of the experience. The Voice UI is in its “off” state.


02. Start the Voice Interaction
We then instruct the child on what to do. In this case we use an audio prompt.


Lessons:
– Simply asking a child to read may prompt them to read silently in their head. Be clear that they should read out loud by giving them audio or visual instructions, or both.
– Ensure any audio – instructions, background music, or sound effects – has finished before moving to the next step (see the sketch below). Loud audio may impact voice recognition performance.
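
One way to honour that second lesson in a browser, continuing the state sketch above: play the instruction audio and only open the mic once it has fully ended. The asset path and the startListening helper are hypothetical.

```typescript
// Play the spoken instruction and start listening only once it has finished,
// so the prompt audio doesn't bleed into (and degrade) the child's recording.

declare function startListening(): void; // hypothetical: opens the mic (see steps 03-04)

function playPrompt(url: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const prompt = new Audio(url);
    prompt.addEventListener("ended", () => resolve(), { once: true });
    prompt.addEventListener("error", () => reject(new Error("prompt audio failed")), { once: true });
    void prompt.play();
  });
}

async function startVoiceInteraction(): Promise<void> {
  setState("prompting");
  await playPrompt("/audio/read-this-out-loud.mp3"); // hypothetical asset path
  setState("listening");
  startListening(); // open the mic only after the prompt has fully ended
}
```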


03. Show the App is listening
Give the child a cue that they can speak, that the App is ready. To communicate this clearly, we’ve scaled the UI element and coupled it with a color change.


The blue circle with the dots symbol will by now be familiar to the child, as we will have introduced it during the App’s onboarding.


Lesson: Be consistent with any UI element you choose. This use case is no different to a Home or Back button experience in a regular GUI.
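
As a sketch, the scale-and-color cue could be as simple as toggling a CSS class on the indicator element when the App starts and stops listening. The element id and class name below are illustrative.

```typescript
// Toggle the listening cue: the .listening class scales the blue circle up
// and changes its color; removing it returns the UI to its "off" look.

function showListeningCue(listening: boolean): void {
  const indicator = document.getElementById("voice-indicator"); // hypothetical element id
  indicator?.classList.toggle("listening", listening);
}
```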


04. Let the child know they can be heard
With a touch-based UI interaction, you can give feedback through physical, visual, auditory, or animation cues. A “hands-free” experience doesn’t have those options, but the feedback is just as important. In this example, we’re using an audio-reactive animation.


The size of the white bars will react, in real time, to the amplitude of the audio coming in from the mic. This signals to the child that the App is listening. Think of it as the digital equivalent of eye contact or nodding during a conversation.

Lesson: Strike a balance between input and UI. Make sure the child focuses on the text they are reading and is not distracted by the UI.
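
Here’s one way the audio-reactive bars might be driven in a browser, using the Web Audio API with the audioCtx and micStream from the earlier sketch. The updateBars callback, which would resize the bar elements on screen, is illustrative.

```typescript
// Drive the white bars from live microphone amplitude.

function startLevelMeter(
  audioCtx: AudioContext,
  micStream: MediaStream,
  updateBars: (level: number) => void
): void {
  const source = audioCtx.createMediaStreamSource(micStream);
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 256;
  source.connect(analyser);

  const samples = new Uint8Array(analyser.fftSize);

  const tick = () => {
    analyser.getByteTimeDomainData(samples);
    // Rough amplitude: how far the waveform deviates from the 128 midpoint.
    let peak = 0;
    for (const s of samples) {
      peak = Math.max(peak, Math.abs(s - 128) / 128);
    }
    updateBars(peak); // 0 (silence) to ~1 (loud) scales the bars in real time
    requestAnimationFrame(tick);
  };
  requestAnimationFrame(tick);
}
```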


05. End the Voice Interaction


Once the child has finished speaking, we want to stop listening and score their audio. At that point, we need to show another visual cue to let them know the recording is stopping.


Lessons:
– If we do this immediately, we’ll inadvertently cut off a child who wants to keep speaking after a pause. To deliver more flexibility and ensure a positive experience, the UI should only revert to its “off” state after a short grace period (sketched below).
– Remember, we are still listening for speech during that period, so if at any point the child starts to speak again, we need to immediately jump back to listening.
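
A sketch of that grace period, continuing the state example: when silence is detected, start a timer before committing; if speech is detected again first, cancel it and jump straight back to listening. The 1,500 ms value and the helper functions are illustrative.

```typescript
// Wait briefly after the child stops speaking before ending the interaction,
// and abandon that wait if they start speaking again.

declare function stopListeningAndScore(): void; // hypothetical: close the mic, send audio for scoring

const GRACE_PERIOD_MS = 1500; // illustrative value; tune through testing with kids

let endTimer: number | undefined;

function onSilenceDetected(): void {
  setState("ending"); // show the "stopping" cue, but don't commit yet
  endTimer = window.setTimeout(() => {
    setState("processing");
    stopListeningAndScore();
  }, GRACE_PERIOD_MS);
}

function onSpeechDetected(): void {
  // The child has started speaking again: jump straight back to listening.
  if (endTimer !== undefined) {
    window.clearTimeout(endTimer);
    endTimer = undefined;
  }
  setState("listening");
}
```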


06. The processing/thinking state (optional)


Once the App has stopped listening for speech, two things need to happen:

  1. Let the child know that they cannot interact using their voice at this point.
  2. If work needs to be done by the App before the child can interact again (processing of data, gamification calculations), we must show this.

To reduce the potential for frustration the child needs to understand that the App is “thinking” and that while thinking, speech isn’t available.
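
In code, the thinking state is simply a stretch where voice input is ignored while we wait for the scoring work to finish. The scoreReading call below is a hypothetical stand-in for whatever speech engine the App integrates.

```typescript
// While the App is "thinking", voice input is unavailable; the state drives
// the thinking indicator until scoring is complete.

declare function scoreReading(recording: Blob): Promise<number>; // hypothetical: returns a 0-100 score

async function processAttempt(recording: Blob): Promise<number> {
  setState("processing");                      // show the thinking indicator
  const score = await scoreReading(recording); // e.g. the child's reading score
  return score;
}
```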


07. Provide Feedback


In this example, the SoapBox voice engine has scored the child’s reading attempt. The threshold set for “correct” is 70%, and since the child has exceeded it, the App plays celebratory “correct” feedback.

One of the most powerful things you can show a child is meaningful feedback. Showing them that their voice has determined the outcome is magical.
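
The feedback step then reduces to a simple comparison against the threshold from this example. The 70% value comes from the example above; the feedback helpers are illustrative, as is the “try again” branch, which this walkthrough doesn’t cover.

```typescript
// Compare the score against the "correct" threshold and play matching feedback.

declare function playCelebration(): void;   // hypothetical: confetti plus a positive sound effect
declare function playEncouragement(): void; // hypothetical: gentle prompt to try again

const CORRECT_THRESHOLD = 70; // the "correct" score used in this example

function provideFeedback(score: number): void {
  setState("feedback");
  if (score >= CORRECT_THRESHOLD) {
    playCelebration();   // at or above the threshold counts as correct in this sketch
  } else {
    playEncouragement();
  }
}
```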


On Visual Metaphors


Voice user interfaces (VUIs) are still evolving. We haven’t yet reached a point of standardization, and different visuals may suit different interactions. So when testing your product, give consideration to your visual cues. You and I may know what a microphone icon represents, but will every 4-, 5-, or 8-year-old?

What’s Next?

From the child’s perspective, they’ve just read a short sentence out loud and the App has reacted and fed back to them. Behind the scenes, however, so much more is happening!

For the inventors, designers, and engineers involved in creating these kinds of experiences for kids, so many questions remain, like:

  • How do I know when to start listening?
  • When do I stop listening (or how long do I listen for)?
  • How do I score the child’s reading?

Do not fear! We’ll answer these and more in Part 2 of this Guide, and following on from there, we’ll be sharing best practices around microphone usage, and how voice data can help with scoring and gamification of voice-first products.

Get In Touch

Voice experiences for kids are as varied as the approaches we take to tackling them. At SoapBox we may be experts when it comes to speech recognition for kids, but we’re still at the beginning of an exciting journey of experimentation and learning when it comes to voice-first experiences for kids, and we’d love to keep learning as we help you with yours.

All feedback, questions and suggestions for future topics in this Beginner’s Guide series are welcome. Watch this space for Part 2 and if you’d like to get in touch just email us: Hello@SoapBoxLabs.com
