Chris Kelley is a design lead in AR/VR prototyping at Google, tasked with exploring the future of immersive computing. While at Google, Chris has been the lead UX designer for the Google Pixelbook and Pen, and has also worked in wearables, including Google Glass. Prior to joining Google, Chris was an Emmy-nominated motion designer and interactive creator. He is also an accomplished rugby player, with two national championships and five national all-star selections.
Elly is a software engineer and prototyper on the Google Daydream team, exploring new AR use cases. Previously, she worked on interactive and immersive exhibits for Google's client-facing spaces. Elly has a Ph.D. from the MIT Media Lab, where she focused on technology for live music and theater performance, and a BA from Amherst College, where she majored in computer science and theater & dance.
Luca is a visual designer on Daydream and part of Daydream Labs, a rapid prototyping team at Google that investigates compelling use cases for AR and VR. Previously, he was part of Spotlight Stories, a Google ATAP project that explored immersive storytelling. Before Google, Luca co-founded Curious Hat, a company dedicated to creating innovative children's mobile apps. He was a character technical director supervisor at PDI/DreamWorks for 17 years, working on award-winning movies such as Shrek, Antz, and Kung Fu Panda.
About the talk
The AR team at Google has built hundreds of augmented reality prototypes to explore how immersive computing can make interaction with technology more natural and contextual. In this session, the team will share helpful takeaways for how you can build useful and delightful AR experiences.
Thank you so much for joining us. My name is Chris; I'm a designer and prototyper working on immersive prototyping at Google, and I'm joined by Elly and Luca to talk about exploring AR interaction. It's really awesome to be here. We explore the future of immersive computing through rapid prototyping of AR and VR experiments, often with a focus on use case exploration or app ideas. We work fast, which means we fail fast, but it also means we learn fast. We spend a week or two on each prototyping sprint, and by the end of the sprint we've built a functional prototype starting from a tightly scoped question. Then we put that prototype in people's hands and see what we can learn. This talk is about the takeaways from those explorations.

But first I want to set the table a little and talk about what we mean when we say augmented reality. When a lot of people think about AR, the first thing they think about is bringing virtual objects to users in the world, and that's part of it; we call this the "out" of AR. But AR also means more than that. It means being able to understand the world visually to bring information to users; we call this the "in" of AR. Many of the tools and techniques created for computer vision and machine learning perfectly complement tools like ARCore, which is Google's AR development platform. So when we explore AR, we build experiences that include one of these approaches, or both.

This talk is about three magic powers we've found for AR. We think these magic powers can help you build better AR experiences for your users, and we'll share prototypes we've built and our learnings from each of these three areas. First, I'll talk about context-driven superpowers: combining visual and physical understanding of the world to make magical AR experiences. Then Elly will talk about shared augmentations, which is all about the different ways we can connect people together in AR and how we can empower them just by putting them together.
And Luca will cover expressive inputs: how AR can help unlock authentic and natural input for users.

So let's start with context-driven superpowers. What this really means is using AR technologies to deeply understand the context of a device, and then building experiences that directly leverage that context. There are two parts to AR context: one is visual understanding, and the other is physical understanding. With ARCore, your phone has the ability to understand and sense its environment physically. With computer vision and machine learning, we can make sense of the world visually. By combining these results, we get an authentic understanding of the scene, which is a natural building block of magical AR.

Let's start with visual understanding. The prototyping community has done some awesome explorations here, and we've done a few of our own that I'd like to share. To start, we wondered if we could trigger custom experiences from visual signals in the world. Traditional apps today leverage all kinds of device signals to trigger experiences: GPS, the IMU, and so on. Could we use visual input as a signal as well? We built a really basic implementation of this concept: it uses ARCore and the Google Cloud Vision API to detect any kind of snowman in the scene, which triggers a particle system that makes it start to snow. With visual understanding, we were able to tailor an experience to specific cues in the user's environment, which enables adaptable and context-aware applications. Even though this example is a simple one, the concept can be extended much further. For example, yesterday we announced the Augmented Images API for ARCore. With it, you can make an experience that reacts to device movement relative to an image in the scene, or even to distance from an object in the world. If you think this concept is interesting, I highly recommend checking out the AR/VR demo tent; they have some amazing Augmented Images demos there.

The next thing we wanted to know was whether we could bridge the gap between digital and physical, and, for example, bring some of the most delightful features of e-readers to
the physical books. The digital age has brought all kinds of improvements to traditional human behaviors, and e-readers have brought lots of cool new things to reading. But if you're like me, sometimes you just miss holding a book in your hands. So we wanted to know if we could bridge that gap. In this prototype, users highlight a passage or a word with their finger, and they get back a definition. This is a great example of a short-form, focused interaction that required no setup for users; it was an easy win, made possible only by visual understanding. But as we tried this prototype, two downfalls became immediately apparent. The first is that it's really difficult to aim your finger at a small target on a phone, especially when the page may be moving too; trying to target a single word is really hard. The second is that when you're highlighting a word, your finger is blocking the exact thing you're trying to see. These are easily solvable with a follow-up UX iteration, but they illustrate a larger lesson: with any kind of immersive computing, you really have to try it before you can judge it. An interaction might sound great when you talk about it, and it might even look good in a visual mock, but until you have it in your hand and can feel it and try it, you're not going to know if it works. You really have to put it in a prototype so you can create your own facts.

Another thing we think about a lot is whether we can help people learn more effectively. Could we use AR to make learning better? There are many styles of learning, and combining those styles often results in faster
and higher quality learning. In this prototype, we combined visual, aural, verbal, and kinesthetic learning to teach people how to make the perfect espresso. We placed videos around the espresso machine in the physical locations where each step occurs; if you're learning how to use the grinder, the video for the grinder is right next to it. To trigger a video, users move their phone to that area and then watch the lesson. The physical proximity of the video to the actual device made a huge difference in overall understanding: in our studies, users who had never built or used an espresso machine before usually made a good espresso after using this prototype. So for some kinds of learning, this can be really beneficial. Unfortunately, one thing we learned here is that it's actually really hard to hold your phone and make an espresso at the same time. You need to be really mindful of the fact that your users might be splitting their physical resources between the phone and the world, so if it applies to your use case, try building experiences that are snackable and hands-free.

Speaking of combining learning and superpowers, we wondered if AR could help us learn from hidden information layered in the world all around us. This is part of a prototype we built: an immersive language-learning app. We show translations roughly next to objects of interest, and we position the labels by taking a point cloud sample from around the object and putting the label in the middle of those points. Users found this kind of immersive learning really
fun, and we saw users freely exploring the world looking for other things to learn about. So we found that if you give people the freedom to roam, and tools that are simple and flexible, the experiences you build for them can create immense value.

Now let's talk about physical understanding. This is AR's ability to extract and infer information and meaning from the world around you. When your device knows exactly where it is, not only in space but also relative to other devices, we can start to do things that really feel like superpowers. For example, we can start to make interactions that are extremely physical, natural, and delightful. Humans have been physically interacting with each other for a really long time, but digital life has abstracted away some of those interactions. We wondered if we could swing the pendulum back the other way a little using AR. In this prototype, much like a carnival milk bottle game, you fling a baseball out of the top of your phone at milk bottles shown on other devices; you just point where you want it to go, and it goes. This works by putting multiple devices in a shared coordinate system, which you can do using the new ARCore Cloud Anchors, if you caught the announcement yesterday. One thing you'll notice here is that we aren't even showing users a pass-through camera view. We did that deliberately, because we really wanted to stretch and see how far we could take this concept of physical interaction. One thing we learned was that once people figured it out, they found it really natural and actually had a lot of fun with it. But almost every user who tried it had to be not only told how to do it but shown how to do it. People had to flip a mental switch away from the expectations they have for how a 2D smartphone interaction works, so you really need to be mindful of the context people are bringing and the mental models they have for their actions.

We also wanted to know if we could help someone visualize the future in a way that would let them make better decisions. We pay attention to the things that matter to us; in a literal sense, the imagery that appears in our peripheral vision takes a lower cognitive priority than the things we focus on. Why should a smartphone be any different? In this experiment, we overlaid the architectural mesh of a homeowner's remodel on top of the active construction project, so they could visualize in context what the changes to their home were going to look like. At the time this prototype was created, we had to do a manual alignment of the model on top of the house. If I rebuilt it today, I would use the Augmented Images API that we announced yesterday, which would make it much easier to affix the model to a location and have everything come together. But even with that initial UX friction, the homeowner got tremendous value out of this. In fact, they went back to their architect after seeing it and changed the design of their new home, because they realized they weren't going to have enough space in the upstairs bathroom, something they hadn't noticed in the plans before. The lesson is that if you provide people high-quality, personally relevant content, you can create experiences that people will find really valuable and attention-grabbing. But when does modifying the real environment start to break down?
You might be familiar with the uncanny valley: the idea that when things that are really familiar to humans are almost right, but just a little bit off, they make us feel uneasy. Subtle manipulations of the real environment in AR can sometimes feel similar, and they can be difficult to get right. In this specific example, we tried removing things from the world: we created an AR invisibility cloak for a plant. We built a point cloud around the object, attached little cubes to the points, applied a material to them, and extracted the texture from the surrounding environment. That works pretty well on uniform backgrounds; unfortunately, the world doesn't have too many of those. It's made up of dynamic lighting and subtle patterns, so this always ended up looking a little bit weird. Be thoughtful about the way you add or remove things from the environment; people are really perceptive, and we need to strive to build experiences that align with their expectations, or at the very least notify them.

Is physical understanding always critical? All the points in this section have their place, but ultimately you have to be guided by your critical user journeys. In this example, we wanted to build a viewer for an amazing 3D model by Damon Petty. It was important that people could see the model in 3D and move around to discover the object; the challenge was that the camera feed was creating a lot of visual noise and distraction, and people were having a hard time seeing the nuances of the model. So we borrowed a concept from filmmaking and guided users with focus and depth of field, both controlled by the user's motion. This resulted in people feeling encouraged to explore, and they stopped getting distracted by the physical environment.

People are already great at so many things, and AR really allows us to leverage those existing capabilities to make interactions feel invisible. If we leverage visual and physical understanding together, we can build experiences that give people superpowers. With that, Elly is going to talk about the special opportunities we have in shared augmentations.

So, I'm Elly Nattinger, a software engineer and prototyper on Google's VR and AR team. Chris has talked about the kinds of
experiences you can start to have when your devices understand the world around you; I'm going to talk about what happens when you can share those experiences with the people around you. We're interested not only in adding AR augmentations to your own reality, but also in sharing those augmentations. If you listened to the developer keynote yesterday, you know that shared AR experiences are a really big topic for us these days. For one thing, a shared reality lets people be immersed in the same experience. Think about a movie theater. Why do movie theaters exist? Everybody's watching a movie that they could probably watch at home on their television or computer by themselves, much more comfortably, without having to go anywhere. But it feels qualitatively different to be in a space with other people sharing that experience. And beyond those kinds of shared positive experiences, having a shared reality lets you collaborate: it lets you learn, build, and play together. We think you should be able to share your augmented realities with your friends, your families, and your colleagues.

So we've done a variety of explorations into how you build those kinds of shared realities in AR. First, there's a technical question: how do you get people aligned in a shared AR space? There are a number of ways. If you don't need a lot of accuracy, you could just start your app with all the devices in approximately the same location. You could use markers or augmented images, so multiple users can all point their devices at one picture and get a common point of reference, a kind of (0, 0, 0) of the virtual world. And you can even use the new ARCore Cloud Anchors API that we just announced yesterday to localize multiple devices against the visual features of a particular space.

In addition to the technical considerations, we found three axes of experience that are really useful to consider when designing these kinds of shared augmented experiences. First is co-located versus remote: are users in the same physical space or in different physical spaces? Second is how much precision is required: does everybody have to see the virtual bunny at exactly the same point in the world, or is there a little flexibility? And third is whether your experience is synchronous or asynchronous: is everybody participating at exactly the same time, or at slightly different times? These aren't necessarily binary axes, but more of a continuum you can consider when designing multi-person AR experiences. So let's talk about some prototypes and apps that fall on different points of the spectrum and the lessons we've learned
from them. To start with, we found that when you've got a group interacting with the same content in the same space, you really need shared, precise spatial registration. For example, let's say you're in a classroom. Imagine if a group of students doing a unit on the solar system could all look at and walk around the globe, or an asteroid field, or the sun. In Expeditions AR, one of Google's initial AR experiences, all the students point their devices at a marker, they calibrate themselves against that shared location, and they see the object in the same place. What this allows is for a teacher to point out particular parts of the object: "if you all come over and look at this side of the sun, you see a cutout," or "look over here on the earth, you can see a hurricane." Everybody starts to get a spatial understanding of the parts of the object and where they are in the world. So when does it matter that your shared space has a lot of precision? When you have multiple people, all in the same physical space, interacting with or looking at the exact same augmented objects at the same time.

We were also curious how much we could take advantage of people's existing spatial awareness when working in high-precision shared spaces. We experimented with this in a multi-person construction application: multiple people all building onto a shared AR object in the same space, adding blocks together. Everybody needs to be able to coordinate; you want to be able to tell what part of the object someone's working on, and have your physical movement support that collaboration. If Chris is over here placing some green blocks, in the real world I'm not going to step in front of him and start putting yellow blocks there. We've got a natural sense of how to collaborate and arrange ourselves in space; people already have that sense, so we can keep it in shared AR if our virtual objects are lined up precisely enough. We also found it helpful that because you can see both the digital object and the other people through the pass-through camera, you can get a pretty good sense of what people are looking at as well as what they're interacting with.

We've also wondered what it would feel like to have a shared AR experience for multiple people in the same space who aren't necessarily interacting with the same thing. Think of this as an AR LAN party: we're all in the same space (or maybe even different spaces), we're seeing connected things, and we're having a shared experience. This prototype is a competitive quiz guessing game where you look at a map, figure out where on the globe you think is represented, stick in your push pin, and get points depending on how close you are. We've got shared state, so we know who's winning, but the location of the globe doesn't actually need to be synchronized, and maybe you don't want it to be synchronized, because I don't want anybody to get a clue from where I'm sticking my push pin into the globe. It's fun to be together even when we're not looking at exactly the same AR things.

And do we always need our spaces to align exactly? Sometimes it's enough just to be in the same room. This prototype is an AR boat race: you blow on the microphone of your phone, and that creates the wind that propels your boat down a little AR track. By being next to each other when we start the app and spawn the track, we get a shared physical experience, even though our AR worlds might not be perfectly aligned. We keep all the elements of social game play, talking to each other, our physical presence, but we're not necessarily touching the same objects.

Another super interesting area we've been playing with is how audio can be a way to include multiple people in a single-device AR experience.
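The boat-race input described above, blowing on the microphone to make wind, can be sketched as mapping microphone loudness to a forward force. This is a minimal sketch, not the prototype's actual code; the function names, thresholds, and gains here are all hypothetical.

```python
import math

def rms_amplitude(samples):
    """Root-mean-square loudness of a chunk of audio samples in -1.0..1.0."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def wind_force(samples, noise_floor=0.05, gain=4.0, max_force=1.0):
    """Map mic loudness to a forward force on the boat.

    Below the noise floor nothing happens, so ambient room noise
    doesn't move the boat; above it, force grows with loudness up to a cap.
    """
    level = rms_amplitude(samples)
    if level <= noise_floor:
        return 0.0
    return min(max_force, (level - noise_floor) * gain)

def step_boat(position, samples, dt=1.0 / 60.0, speed_per_force=2.0):
    """Advance the boat along a 1-D track by one frame."""
    return position + wind_force(samples) * speed_per_force * dt
```

With a mapping like this, quiet audio chunks leave the boat still, while a strong blow saturates the force and moves it down the track each frame.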
With a standard magic-window device, AR is a pretty personal experience: I'm looking at this thing through my phone. But now imagine you can leave a sound in AR that has a 3D position, like any other virtual thing. Now you can start to hear it even when you're not looking at it, and other people can hear the sound from your device at the same time. So let's say you could leave audio notes all over your space; it might sound something like this. [In the demo, spatialized audio notes, such as a timer, play from different positions as the phone moves around the room.] You don't have to be the one holding the phone to get a sense of where these audio annotations live in physical space.

Another question we've asked: if you have a synchronous AR experience with multiple people in different places, what kind of representation do you need of the other person? Let's imagine you have a shared AR photos app where multiple people can look at photos arranged in space. I'm taking pictures in one location and viewing them arranged around me in AR, and then I want to share my AR experience with Luca, who joins me from a remote location. We found we needed a couple of things to make us feel connected and sharing the same AR experience, even though we were in different places. We needed a voice connection, so we could actually talk about the pictures. And we needed to know where the other person was looking, to see which picture they were paying attention to while talking about it. What was interesting is that we didn't actually need to know where the other person was, as long as we had that shared frame of reference: here's what I'm looking at; here's what Luca's looking at.

We've also been curious about asymmetric experiences: what happens when users share the same space and the same augmentations, but have different roles in the experience? For instance, in this prototype, Chris is using his phone as a controller to draw in space, but he's not actually seeing the AR annotations he's drawing; the other person can see the same AR content and use their phone to take a video. They're playing different roles in the same experience, a kind of artist versus cinematographer. We found there can be challenges in asymmetric experiences when there's a lack of information about what the other person is experiencing: Chris can't tell what Luca is filming, or see what his drawing looks like from far away.

So, as we mentioned, these different combinations of space, time, and precision are all relevant for multi-person AR experiences, and they have different technical and experiential needs. If you have multiple people in the same space with the same augmentations at the same time, you need a way of common localization; that's why we created the new Cloud Anchors API. If you've got multiple people in the same space with different augmentations at the same time, the AR LAN party model, you need some way to share data. And if you've got multiple people in different spaces interacting with the same augmentations at the same time, you need sharing and some kind of representation of that interaction. Shared AR experiences are a big area; we've explored some parts of the space, and we'd love to see what you all come up with.

So: Chris talked about examples where your
device understands your surroundings and gives you special powers, and I talked about examples where multiple people can collaborate and interact. Now let's talk about what happens when your devices have a better understanding of you, allowing for more expressive input.

Thank you, Elly. My name is Luca Prasso, and I'm a prototyper and technical artist working on the Google AR and VR team. Let's talk about the device you carry with you every day, and how the sensors all around it can provide meaningful and authentic signals that we can use in our augmented experiences. ARCore tracks the device's motion as we move it through the real world and provides some understanding of the environment, and these signals can be used to create powerful, creative, and expressive tools and to offer new ways for us to interact with digital content.

Our data represents who we are, what we know, and what we have, and we were interested in understanding whether users can connect more deeply with data if it's displayed around them in 3D and explored physically. So we took several thousand world cities and mapped them into an area about the size of a football field. We assigned a dot to every city, scaled the dot based on the population of the city, and gave each country a different color. Now you can walk through this field of data. As ARCore tracks the motion of the user, we play footsteps in sync: you take a step, and you hear a step. An ambisonic sound field surrounds the user and enhances the experience and the sense of exploration of this forest of data. Stars display in the sky above, and the pass-through camera is heavily tinted so the user can focus on the data while still keeping a sense of presence. What happens is that as users walk through the physical space, they start mapping and pairing, creating a mental map between the data and the physical location, and they start understanding better, in this particular case, the relative distances between places.

We also discovered that the gestures that are part of our everyday digital life change meaning. Pinch to zoom, in AR, becomes something more natural: actually moving closer to the digital object and inspecting it, like we do with real objects. Pan and drag become taking a couple of steps to the right to look at information. Physical exploration like this is fascinating, but we need to take all the different users into account and provide alternative movement affordances. Some users can move everywhere, but what if a user cannot move, or doesn't want to, or is sitting? In this particular case, we allowed the user to simply point the phone anywhere they wanted to go and tap on the screen, and the application moves the point of view in that direction; at the same time, we still provide the audio, haptic, and color effects to honor the sense of the physical space the user would have traveled. We found this is a powerful mechanism for exploring a certain type of data that makes sense in 3D space, and it allows the user to discover hidden patterns. But can we go beyond the pixels that you can find on your screen?
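The tap-to-move fallback just described can be sketched as moving the virtual viewpoint along the direction the phone is pointing. This is a simplified model with hypothetical names; the real prototype would derive the direction from ARCore's full camera pose rather than from yaw and pitch angles.

```python
import math

def forward_vector(yaw_deg, pitch_deg):
    """Unit vector the phone is pointing along, from yaw/pitch in degrees
    (a simplified stand-in for the full camera rotation)."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            -math.cos(pitch) * math.cos(yaw))

def tap_to_move(position, yaw_deg, pitch_deg, step=1.5):
    """On tap, glide the virtual point of view `step` meters in the
    pointed direction, projected onto the ground plane so the seated
    user stays at a constant eye height."""
    fx, fy, fz = forward_vector(yaw_deg, pitch_deg)
    x, y, z = position
    norm = math.hypot(fx, fz) or 1.0  # avoid dividing by zero when pointing straight up
    return (x + step * fx / norm, y, z + step * fz / norm)
```

Pointing straight ahead moves the viewer along the view axis; in the prototype, audio, haptic, and color cues would accompany each glide.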
We're fascinated by spatial audio as a way to incorporate sound into an AR experience. So we combined ARCore with the Google Resonance Audio SDK. Resonance is a very powerful spatial audio engine that Google recently open-sourced, and you should check it out because it's great. Now we can take audio sources and place them in 3D locations, annotate the scene by describing the properties of the walls, ceilings, floors, and obstacles, and, as ARCore moves the point of view, it carries the digital ears with it; Resonance uses all of this to render the sounds in the scene accurately.

So what can we do with this? We were imagining: what if I could sit next to a performer during an acoustic concert, or be on stage in the middle of a jazz performance, or stand among actors and listen to them play, and just be there? So we took two amazing actors, Chris and Elly, and asked them to record separate lines from Shakespeare. Then we placed the two audio sources a few feet apart and surrounded the environment with an ambisonic sound field of a rainy rainforest; later, we switched to a room with a lot of reverb in the walls. [In the demo, the two actors' lines play from distinct positions in space.] Put on a nice pair of headphones, and it's like being on stage with these actors.

Then we took this example and extended it. We observed that we can build, in real time, a 2D map of where the user has been so far with the phone, just from walking around. Then, when the user hits a button, we can programmatically place audio recordings in the space we know the user can reach with the phone, and with their ears. Suddenly the user becomes the human mixer of this experience: different instruments can populate your squares, your rooms, your schools. This opens the door to an amazing number of opportunities for audio-first AR experiments.
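A drastically simplified sketch of that "human mixer" idea: Resonance Audio models real room acoustics, but the core intuition is per-source gain derived from listener distance. The names and attenuation model below are illustrative assumptions, not the prototype's code.

```python
import math

def source_gain(listener, source, ref_dist=1.0):
    """Inverse-distance attenuation: full volume within ref_dist meters
    of the source, falling off as the listener walks away."""
    d = math.dist(listener, source)
    return 1.0 if d <= ref_dist else ref_dist / d

def mix(listener, sources):
    """Per-instrument gains for a listener position, so walking the room
    'mixes' the instruments placed in it."""
    return {name: source_gain(listener, pos) for name, pos in sources.items()}

# Two instruments placed a few meters apart in the tracked space.
instruments = {
    "drums": (0.0, 0.0, 0.0),
    "bass": (4.0, 0.0, 0.0),
}
```

Standing next to the drums makes them dominate the mix; a few steps toward the bass reverses the balance, which is exactly the walking-as-mixing effect described above.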
Let's go back to visual understanding. Chris mentioned that computer vision and machine learning can interpret the things around us; it's also important to understand the body and turn it into an expressive controller. We realized that we're surrounded by sound sources in all kinds of places, and naturally our body and our head move to mix them, focusing on what we want to listen to. Can we bring that intuition to the way we watch movies or play video games on a mobile device? So we took the front camera signal, fed it to Google Mobile Vision, which gives us a head position and orientation, and fed that to the Google Resonance Audio SDK. Say you're watching a scene in which actors are in a forest all around you, and it's raining. As I hold my phone far away from my head, I hear the forest; as I bring the phone closer to my face, I start hearing the actors play. I warn you, this is an Oscar performance.
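The head-tracked mix just described can be sketched as a crossfade driven by the distance between phone and face. The thresholds here are hypothetical; the prototype derived head position from the Mobile Vision face signal and handed it to Resonance rather than computing a simple crossfade like this.

```python
def proximity_mix(face_dist_m, near=0.25, far=0.6):
    """Crossfade between dialogue and ambience based on how close the
    phone is to the face: dialogue up close, rainy forest far away."""
    # Normalize distance into 0..1 between the near and far thresholds.
    t = (face_dist_m - near) / (far - near)
    t = max(0.0, min(1.0, t))
    return {"dialogue": 1.0 - t, "ambience": t}
```

At 25 cm from the face, only the actors are heard; at arm's length and beyond, only the forest ambience remains.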
[In the demo, the rain and forest ambience give way to an actor's lines as the phone nears the face: "Here is the scroll of every man's name, which is thought fit, through all Athens, to play in our interlude before the Duke and the Duchess..."] So the tiny motions we make while watching or playing can be turned into subtle, controllable changes in the user experience.

We've talked about how changes in pose can be a trigger to drive interaction. In this Google Research demo, we actually explored the opposite: the absence of motion. When the user, in this case my kid, stops and holds a pose, the app takes a picture. This simple mechanism, triggered by computer vision, created an incredibly delightful set of opportunities that, apparently, my kids loved. Research is making incredible progress in looking at the human body in an image and understanding where the body's pose and skeleton are; you should check out the Google Research blog posts on pose estimation, because that research is amazing. So we took a live video feed, fed it to the machine learning model, and built a variety of experiments with creative filters we could apply. What was more interesting for us is that this also allows us to better understand the intent and the context of the user. We took this pose estimation technology and added an additional character that tries to mimic what the human is doing, and this allows us to bring family and friends, in this case my son Noah, into the scene, so he can act and create a nice video.

But, as Elly mentioned before, we have to watch out for this situation, because it's an asymmetric experience. What you don't see here is how frustrated my son was after a few minutes, because he couldn't see what was going on. I was the one having fun, taking pictures and videos of him; he didn't see much and could only hear the lion roaring. As developers, we need to be extremely mindful of this and balance the delight. Maybe I should have cast the phone's image to a nearby TV, so I could make my son a first-class citizen in the experience.

So, AR technology blends the physical and the digital into expressive ingredients, and this allows us to unlock all kinds of new expressive input mechanisms. We're at the beginning of this journey, but we're excited to hear what you think and what you come up with.

To summarize, we've shared a bunch of ways in which we think about AR and the various explorations we've done. We talked about expanding our definition of AR: putting content into the world, but also pulling
information from the world. We hope these ingredients allow users to create magical AR superpowers, to enhance their social interactions, and to express themselves in this new digital medium. We combined ARCore capabilities with different Google technologies, and that gave us the opportunity to explore all these new interaction models. We encourage you developers to stretch your definition of AR, but we want to do it together. We're going to keep exploring, and we want to hear what tickles your curiosity.