Ulas leads the Actions on Google developer platform, building the APIs and runtimes for Actions. At Google, he previously worked on Android, Search, and NLU initiatives. Before that, Ulas co-founded and worked at several startups in the search and recommendations domains, after starting his career in the networking equipment space. He holds a bachelor's degree in Computer Science from Harvard University.
Saba is a senior interaction designer on the Google Assistant team. She helps create frameworks for Google and third-party experiences that are scalable across existing and emerging devices, including smart displays. Previously, she led design for Help and Feedback across Google products. Prior to Google, she worked at Ancestry.com and Inflection. She holds a bachelor's degree with honors in Engineering Sciences from Harvard University.
About the talk
In an ever-expanding world of surfaces for the Google Assistant, developers and designers face new challenges in creating effective and consistent experiences for users. Drawing on examples, this talk will highlight some key considerations for designing Actions across surfaces - with tips to understand when and how to leverage visuals alongside voice interactions, to create rich, delightful conversational experiences.
My husband and I have been in a somewhat long-distance relationship for a few years now. We talk on the phone and message a lot, but perhaps what's most enjoyable for me is when I visit him: I leave little notes all over his apartment. And sometimes when I'm home alone, he'll start changing the color of the smart lights in my living room to let me know that he's thinking of me. Our human conversations take a lot of different forms. We don't just talk verbally; depending on where we are and what we're trying to say, we might write, hold hands, or even change the color of smart lights. It's these human conversations that have inspired us to create conversations with technology, so it's no surprise that we're starting to incorporate more modalities into our digital interactions as well. But with this increased number of devices and complexity of interactions, it might feel overwhelming to design for the Google Assistant. If you look at the Google Assistant today, I can talk to it not just through the speaker in my living room, but also through my car or my headphones. I can tap on my phone or my watch. So how do we design for this increased number and complexity of devices? I'm Saba, and I'm an interaction designer on the Google Assistant team. I'll be talking about how you can design Actions across surfaces and give you some frameworks and some tips. Ulas is an engineer on Actions on Google, and later on he's going to tell you how to build an Action using the design principles I talk about. Before we can get started, to understand how to design Actions across surfaces, we need to get a better understanding of
what these experiences can look like. Let's start with a journey through a user's day. You wake up in the morning to the sound of an alarm ringing on your Google Home. Without even getting out from under your blanket, you can say, "Hey Google, stop." You get up, get ready, and as you're about to head out, you want to make sure you don't need an umbrella. You turn to the smart display in your hallway and you ask, "Hey Google, what's the weather today?" You're able to hear a summary and also see the hourly forecast at a glance. As you're walking to your car, you take out your phone and tap on it to ask the Google Assistant to order your favorite from Starbucks. It's able to connect you to Starbucks, where you can quickly place your order. As you're driving, you want to listen to your favorite podcast for the news, and you can ask the Google Assistant for help in a hands-free way; it can connect you, for example, to the NPR news update. You go about your day, and when you come back home, it's time to make dinner for your family. You turn to the smart display in your kitchen and you ask, for example, Tasty for help with a recipe, like in this case pizza dough. Tasty comes back, and you can hear and see on the screen step-by-step instructions and ingredients. After dinner, it's time to unwind with your family in the living room. You decide to play a BuzzFeed personality quiz; it's a great way for your family to get together around a shared device and have some fun. And when you head to bed, you say, "Hey Google, good night," and it's able to start your custom bedtime routine, which includes, for example, setting your alarms or telling you about your day tomorrow. As you can see,
users are able to interact with the Google Assistant, similar to human interactions, in a lot of different ways and contexts. Although there are a lot of different devices, there are some overarching principles, so let's take a look at them. First, the experience was familiar: whether it was morning, evening, or the commute, users could access their favorite Google Assistant Actions whenever and wherever they needed them. Second, the Assistant was available in different contexts: it was used on the go and at home, up close and from a distance, and in shared and private settings. So as you're thinking about your Actions, think about all the different contexts they may be used in. And lastly, different devices lend themselves to different modes of interaction: some are voice only, some are visual only, and some are a mix of both. We'll talk a bit more about the strengths and weaknesses of each of these modalities in a bit. First, let's take a deeper look at a couple of those devices and see how these principles apply. You've already heard about smart displays, which were announced earlier this year. Even though it's a new device, users can expect it to feel familiar, like a Google Home with a screen. It's designed to be used at home, from a distance, and as a shared device. And as you can see in this shoe inspiration Action example, even though there's a screen, users still interact with the device through voice. They don't have to tap through complex app navigation; instead, the visuals are designed to be seen at a distance. And of course, the user can walk up to the screen and touch it if they want. Next, let's take a quick look at phones. One thing to note here is that we're making phones more visually assistive as well, similar to smart displays, allowing for a greater
focus on the content. These devices, as you know, are great for use cases on the go, up close, and in a private environment. And users can interact with the Google Assistant on the phone through both voice and visuals. Hopefully this gives you a better sense of what experiences on the Google Assistant can look like across devices. Now, in order to design for so many devices, it helps to have a vocabulary to categorize them. At Google we use a design framework called the multimodal spectrum, which categorizes devices based on their interaction type. On one end, you have voice-only devices, like the Google Home and other smart speakers, that you have to hear or talk to. On the other end, you have visual-only devices, like a phone or a Chromebook that's on mute, and most watches: you have to look at these devices or touch them. And in the middle, you have what we call multimodal devices, which are a mix of both. Cars and smart displays, which rely primarily on voice but have optional visuals, are known as voice-forward devices. Phones and Chromebooks with audio on, which can use a mix of both voice and visuals, are known as intermodal devices. Now we have a vocabulary for categorizing these devices. But before we can start designing for them, it helps to understand the strengths and weaknesses of each of these modalities. Let's talk about voice first. Voice is great for natural input; we've been using it for millennia. Whether you're a kid, a senior, or someone who's not tech-savvy, it's still really intuitive. It's great for hands-free, far-field use cases, like setting a timer in the kitchen while you're
cooking. And it helps reduce task navigation, too. For example, if you were out on a run, you could ask your favorite fitness Action about your workouts instead of having to pull out your phone, navigate to that app, and search for that answer. Similarly, you can ask the Google Assistant to play the next song without having to mess with any controls. So voice has a lot of benefits, but it does have some limitations, and that's where visuals come in. Think about the last time you were at a cafe. You probably walked past all the pastries, looked at the menu, and made eye contact with the cashier. Think about how difficult that interaction would be if it was just through voice. Have a listen to what the menu would sound like through voice alone: espresso, latte, vanilla latte, cappuccino, mocha, Americano, flat white, hot chocolate, black coffee, and tea. That feels pretty overwhelming, right? It's like watching all the options go by and trying to catch the right one; it's like looking at a ticker. Voice is ephemeral and linear, and that makes it very difficult to hold a lot of information in your head. By contrast, the menu is a lot easier to scan if it looks like this. And you can imagine that the problem gets compounded if you also had to compare prices and calories. So visuals are great for scanning and comparing. We also use them a lot to reference objects in the world, so I can look at all the baked goods and then point at the one that I want instead of, for example, having to hear or say out loud something like "that small sugar cookie with a chocolate drizzle."
Voice and visuals both have their benefits, and it often is useful to use both. In this example, we usually prefer to look at the menu but then talk to the cashier in order to check out and pay. Similar benefits to using both voice and visuals exist in the digital world as well, and that's what makes this such a unique opportunity: by leveraging the best of both voice and visuals, we're able to provide really rich interactions. But how do we design for them along with designing for speakers? One thing to keep in mind, again, is to start with the human conversation. You might have an app already, but avoid the temptation to duplicate it. Instead, try to observe a relevant conversation in the real world, or role-play it yourself, and write down that dialogue. You'll realize that not everything that's in your app works well as conversation, or vice versa. Instead, think of your Action as a companion to your app that's faster in certain use cases. I won't go into detail on how to write good dialogue or craft a persona, but I highly recommend you check out our brand new conversation design website at that link there; it goes into great detail on how to get started. So we've learned that we need to create spoken dialogue and then add visuals to it. Let's take a look at an example of how to do that and how that helps with scale. If you haven't already, I'd encourage you to check out the Google I/O 2018 Action that helps you learn more about this event. We started by writing a spoken dialogue as if it was for a voice-only device like a Google Home, and it includes turns like this one: a user can say "browse sessions," and we respond with a spoken prompt like "Here are some of the topics left for today," and so on.
Now, in order to take this dialogue and scale it, we need to take every turn like this one and think about all the ways we can incorporate visual components into it. This would include, for example, display prompts, cards, and suggestion chips. In our example, we can accompany that spoken prompt with a display prompt like "Which topic are you interested in?"; this helps carry the written conversation on a screen. We can add a list of sessions as a card that a user could tap on, for example, and we could have a suggestion chip like "none of these"; this helps the user know how to pivot or follow up the conversation. Once we've constructed our response to have spoken and visual elements, we can then map that response to the multimodal spectrum from earlier. So depending on whether the device has visual or audio capabilities, or how important voice is, we can choose the right components. You already saw what our response would look like on a Google Home: we would simply have the spoken prompt. Let's take a look at what the response would look like across some of the other devices. A smart display is a voice-forward device, so we still keep the spoken prompt and make it carry the whole conversation. We don't really need a display prompt anymore, especially if we're going to have richer visuals like the list and the chips. A phone, on the other hand, is an intermodal device, and we need to have both the voice and the visuals carry the conversation. In this case, you might notice that we shortened the spoken prompt, because we can direct the user to look at the screen for more details, and we'll talk more about how you can do that. And of course, the rest of the visual components are there as well. And finally, if your phone was on
silent, we would simply omit the spoken prompt, and the visual components are able to carry the complete conversation. So now we've learned that in order to scale our dialogue, we need to write spoken prompts and then add visuals to them, and that helps us map across devices. But how do we know what kinds of visuals to add? I'd like to leave you with five tips for how you can incorporate visuals into your dialogue. For that, let's take this made-up Assistant Action called the National Anthem Player on a smart display. As the name suggests, a user can ask for a country, and it will come back with the national anthem for that country. When you invoke this Action, it gives you a welcome message that sounds like this: "Welcome to National Anthem Player. I can play the national anthems from 20 different countries, including the United States, Canada, and the United Kingdom. Which would you like to hear?" As you can see, the device is currently writing on the screen exactly what it's saying out loud, and this is really a missed opportunity, especially given that by now we've learned that visuals have some strengths over voice and that smart displays are great for showing rich, immersive visuals. So tip number one is to consider cards rather than just display prompts. In this case, we've swapped out the words for a carousel; users can quickly browse through the list and select the country that they want. You'll notice it's kind of similar to the menu example we looked at in the cafe, where visuals are helping someone scan and compare options. Additionally, things like maps, charts, and images are also great on visual devices, but they're difficult to describe through voice, similar to the cookie. Second, consider varying your spoken and
your display prompts. This is particularly useful for intermodal devices that might have a display prompt next to a card, where some of that information might be redundant. In this case, we're stripping out the examples of countries from the display prompt because the card already shows a lot of examples. Third, consider visuals for suggestions. Here, we know that the user is a repeat user, so we're reordering the list so that their most frequently visited countries show up first. We're also using suggestion chips to show the user how they can follow up or pivot the conversation; this kind of discovery can be quite difficult through voice alone. Next, you can use visuals to increase your brand expression. We already allow you to change your voice and add a logo, but now we're also going to be allowing you to choose a font and a background image, and we'll talk more about how you can do that. As you can see here, the experience looks a lot more custom and immersive. And lastly, visual devices are great for carrying on conversations that started on a voice-only device. For example, if I use this National Anthem Action on a Google Home and I wanted to see the lyrics, the Action can send a notification after a few steps to my phone; I can take out my phone and read them there. So hopefully those five tips will help you in incorporating more visuals into your dialogue. Let's summarize what we've learned so far on how to design Actions for the Assistant. First, users interact with the Google Assistant in a variety of different ways and contexts; this could include at home or on the go, up close or at a distance, and through voice or visuals. In order to design across so many modalities, it helps to keep in mind the multimodal spectrum and think of your responses as having visual components as well as spoken components. And lastly, learn and leverage the strengths of each of these modalities. We learned, for example, that visuals are great for scanning, brand expression, and discovery; instead of just showing on the screen what you're saying out loud, try to use cards instead. All right, now I'm going to hand it over to my colleague Ulas, who's going to talk about how you can develop these Actions.
Hi. So we said that the Assistant runs on many types of devices, and in the future it will run on many others. How do you build your Action in a way that it will run well on all of today's devices as well as devices in the future? Let's go through an example. To walk you through this, I'm going to use a test Action I created called California Surf Report, which gives wave height and weather information for beaches in California for surfers. Currently, I only have spoken responses, no visuals yet. So let's hear what this sounds like on a voice-only device like the Google Home: "Waves for most of today will be from two to three feet in the morning to three to four feet in the afternoon. Expect waist-high swell in the morning with northwest winds, shoulder-high surf in the afternoon with southwest winds." Okay, great, pretty informative. So now let's take a look at what this sounds like and looks like on a device like a smart display. "OK Google, talk to California Surf Report." "Okay, let's get the test version of California Surf Report."
"Welcome to California Surf Report." "Tell me the surf report for Santa Cruz." "Surfing Santa Cruz beach looks fair. For most of today, waves will be from two to three feet in the morning to three to four feet in the afternoon." So as you can see, spoken responses are a good way to get started, and they work well on many devices. But when we have a screen, we can make it a lot better. One of the best visuals we can add, and one of the easiest, is the basic card. Here's an example from the Node.js client library of how to add a basic card to responses: we start with the spoken prompt as usual, and the second statement adds a basic card. A basic card can have a title, subtitle, body text, and an optional image. So let's see what this looks like on our smart display again, please. "OK Google, show me the surf report for Santa Cruz." "Surfing Santa Cruz beach looks fair. For most of today, waves will be from two to three feet in the morning to three to four feet in the afternoon. Expect waist-high swell in the morning with northwest winds, shoulder-high surf in the afternoon with southwest winds." Okay, great, that looks much better: we have a nice visual rather than just lines of text. There are some other kinds of cards you can use as well. There's a list card that allows you to display a set of things the user can choose from, and there's also a newly introduced table card. Another great way to add visuals to your Action is to use suggestion chips; suggestion chips allow the user to understand what they can do at this turn in the conversation. You can learn more about responses at the link. By the way, all of this, like I promised, works equally well on an intermodal device like a phone; as you can see, we have formatted the font sizes and the layout to fit the intermodal form factor. Okay, great. So next, maybe what we want to do is shorten the spoken response a bit, because it's a bit repetitive with information that's already on the card; users can just look at the display for
these details. So how do we do this? We have a feature in the API called capabilities. Instead of thinking "if Google Home, do this; if smart display, do that," you think about the surface the user is interacting with you on: does it have a screen, does it have audio output? The capabilities of the device are reported to you in every webhook call, so you get to know what they are on every conversation turn. Here's a sample list of capabilities that we support, and you can learn more at the link. For our use case, what we're looking for is the screen output capability. This indicates that the user's device has a screen, so we can show them a card. Oh, and by the way, if you don't want your responses to differentiate between surfaces with displays and ones that are audio only, you can always add a card and we'll just strip it out for you silently. This makes it easy for you to build if you don't want to differentiate. And here again is the Node.js client library snippet that shows how to use this. In the first if statement, we determine that the user's device does not have a screen, so we put the full content in the spoken response. In the else clause, we know that there's a screen, so we shorten the spoken response, end with a phrase like "here's the report" to lead the user to the screen, and then append a basic card to the response. So let's see how this looks on our smart display now; switch to the demo, please. "OK Google, show me the shortened surf report for Santa Cruz." "Surfing Santa Cruz beach looks fair for most of today, with two to three foot waves in the morning and three to four foot waves in the afternoon. Here's the report."
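The branching just demonstrated can be sketched in plain JavaScript. The capability string mirrors the real `actions.capability.SCREEN_OUTPUT` name; the response shapes and the `surfReportResponse` helper are simplified stand-ins for the client library, not its actual API:

```javascript
const SCREEN_OUTPUT = 'actions.capability.SCREEN_OUTPUT';

// Build a response based on the surface's capabilities rather than its
// device type (simplified stand-in shapes, not the real client library).
function surfReportResponse(capabilities, report) {
  if (!capabilities.includes(SCREEN_OUTPUT)) {
    // Voice-only surface: the spoken prompt must carry the full content.
    return { speech: report.full };
  }
  // Screened surface: shorten the speech and lean on a basic card.
  return {
    speech: report.summary + " Here's the report.",
    card: { title: report.title, text: report.full },
  };
}
```

The point is that the same webhook code keeps working on any new surface: a future device simply reports its capabilities, and the right response shape falls out.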
So that sounds a lot more concise and user-friendly. Another way you can use capabilities is to require that your Action only run on devices that have a given capability. This is what we call static capabilities, and you can configure these through the Actions on Google developer console, as you can see here. But only use this if your Action absolutely makes no sense without that capability. For example, the National Anthem Player Action Saba talked about would not make sense on a device without audio, so that would be a good place to use it. The surf report, however, works equally well on voice-only and display-only devices, so it wouldn't be a good place to use it. Another high-quality and easy way to target multiple surfaces is to use library features we call helpers. So far I've been asking California Surf Report for the surf report with the beach name, but if I don't say the beach name, I get a prompt, "which beach?", that just doesn't tell me what I can say. It doesn't tell me which beaches this Action actually supports. We can fix that with a helper called askWithCarousel. What askWithCarousel does is present the user with a list of options to pick from and associate visuals with each item. When the user utters a query or selects an item, Google does the matching of the query to the items, so we can deal with variations in how people pick things much better. So let's make our prompt better with the askWithCarousel helper.
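The idea behind the helper can be sketched like this in plain JavaScript. The item shapes and the `matchCarouselOption` function are hypothetical; with the real askWithCarousel helper, the matching happens on Google's side and handles far more variation than exact string comparison:

```javascript
// Carousel items: each carries a key, a title, and synonym phrases the
// user might say to pick it (illustrative data).
const beaches = [
  { key: 'SANTA_CRUZ', title: 'Santa Cruz', synonyms: ['santa cruz', 'the boardwalk'] },
  { key: 'OCEAN_BEACH', title: 'Ocean Beach', synonyms: ['ocean beach', 'ob'] },
];

// Match a spoken or tapped utterance to an item key, or null if nothing fits.
function matchCarouselOption(utterance, items) {
  const text = utterance.trim().toLowerCase();
  const hit = items.find(
    (item) => item.title.toLowerCase() === text || item.synonyms.includes(text)
  );
  return hit ? hit.key : null;
}
```

Because each item declares the phrases that should resolve to it, the Action receives a stable key back rather than having to parse free-form speech itself.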
And again, here's the Node.js library snippet. We start with the spoken response with the prompt, and we add a carousel to it. A carousel is made up of items; each item has a list of phrases that you think the user might say to match that item, and visuals associated with it, so the user can understand what they're about to tap on. So let's switch to the demo and see what this looks like. "OK Google, show me the surf report." "Which beach do you want to see the report for?" Okay, so this is the example where I'm a little confused as a user. "OK Google, show me the beach carousel." "Which beach would you like to know about?" Okay, great. Now it's much easier for me to understand what the possible options are, and even tap on one if I want to go with that. And we continuously improve the experience with these helpers; that's one of their advantages: we keep modifying them, optimizing for new surfaces. Now, since we launched the carousel API last year, we've come out with smart displays, and on smart displays each conversation turn takes up the entire screen. Given this, maybe we can make our visuals more branded and give them a little bit more flair. So we're introducing styling options this year. Here's how it works; let's switch to the demo, please. Here's a new tab in the Actions on Google console called theme customization. You can modify the background color, the primary color (that's the font color of the text), and the typography, and even set a background image. So let's try a few things here. Let's add a background image, like one of the ones here. This is the landscape aspect ratio image, and then we want to add a portrait image as well. Now all we have to do is save, and then we click "test" right here to update our test version. All right, now let's see what this looks like on the demo. "OK Google, show me the surf report for Santa Cruz." "Surfing Santa Cruz beach looks fair for most of today, with two to three foot waves in the morning and three to four foot waves in the afternoon."
"Here's the report." I think that looks really beautiful now. Smart displays are coming out later this year; however, you can start building your Action against these visuals today using the updated simulator. We've added a new simulator device type for smart displays, as you can see, and we've also added a display tab, which shows you the full-screen version of what you would get on a smart display. You can also use this with the phone. And on the left side, as usual, you have the spoken responses as well as an input box where you can put user queries. One last thing: we said that the Assistant is in many places, so a user interacting with your Action on a voice-only device may also have a device with a display on it, for example a phone. So what if, in the current turn of the conversation, you really want your response to display something? For example, in the surf report Action, the user might ask us for the full report, and we want to return the hour-by-hour wave height graph. So how do we do that? There's a feature in the API called multi-surface conversations, and here's how it works. In every API call to your webhook, we report not only the capabilities of the device that the user is currently using, but also the union of the capabilities of all the devices that the user owns. So in this example, what you see is that the current user device only has a voice output capability; it has no screen.
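Here's a plain-JavaScript sketch of that hand-off decision. The capability string mirrors the real `actions.capability.SCREEN_OUTPUT` name; the `planDelivery` function and the response shapes are hypothetical, standing in for the client library's surface-checking and transfer helpers:

```javascript
const SCREEN_OUTPUT = 'actions.capability.SCREEN_OUTPUT';

// Decide how to deliver a visual-heavy response, given the current
// surface's capabilities and the union across the user's other devices.
function planDelivery(currentCapabilities, availableCapabilities) {
  if (currentCapabilities.includes(SCREEN_OUTPUT)) {
    return { action: 'SHOW_HERE' };               // already on a screened surface
  }
  if (availableCapabilities.includes(SCREEN_OUTPUT)) {
    // Ask permission, then notify the screened device; the conversation
    // resumes there with full context.
    return { action: 'TRANSFER', notification: 'Your full surf report' };
  }
  return { action: 'SPEAK_SUMMARY' };             // no screen anywhere
}
```

The union of capabilities is what makes the middle branch possible: the webhook can offer a transfer even though the current device could never render the graph itself.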
But the user seems to have another device with a screen on it. So how do we use this? Again, in the client library, we have a function to help you inspect whether the user has a device with a certain capability among their devices. And how do we transfer the user to the other device? We have a function, askForNewSurface, that does this, and you can give it a notification title that will appear on the target device, in addition to the list of capabilities that you require for continuing your conversation. I'm not going to demo this, but here's what it looks like. Let's say the user asks "show me the full report" and they're talking to you on a Google Home. You would call the askForNewSurface function that I showed you earlier, and we ask the user for permission to send the conversation over to their phone. If the user accepts, then a notification is sent to the new device and the conversation ends on the current device. And when the user taps on the notification, they resume the conversation from where they left off, like this. Note that this is not just for single responses; this works equally well when you want to continue the conversation, since we bring the full context over so you can continue from where you left off. So to sum up: we've built a lot of features in the API for you to add visuals to your responses, so please use them, and we make it such that we take your responses and optimize them as best as possible for all
these surfaces, and surfaces in the future, without extra work from you. And if you want to customize your responses, always think in terms of capabilities and not individual device types; this way we can run your Action on new devices without any extra work from you. I'd just like to end with an invitation: the next time you order coffee at a cafe, or give a presentation like this one, start to notice all the different modes of interaction you use every day, and let the richness of those human conversations inspire how you design Actions for your users and help us evolve what it means to have conversations with technology. There are some links to resources we mentioned and how to get started. Saba and I, along with our team, will be at the Assistant office hours and sandboxes, ready to answer your questions and show off some of the devices. So thank you, and good luck.