About the talk
Experience three quick and fun demos that show what developers can accomplish with Cloud AI.
Transcoder Documentation → https://goo.gle/3u4G8OV
Video Intelligence Documentation → https://goo.gle/3xyHuDL
Get started with AI on Google Cloud → https://goo.gle/3y0eAwG
Speakers: Zack Akil, Kaz Sato, Markku Lepisto
Google Cloud Tech at Google I/O 2021 Playlist → https://goo.gle/io21-gcloudtech
All Google I/O 2021 Technical Sessions → https://goo.gle/io21-technicalsessions
All Google I/O 2021 Sessions → https://goo.gle/io21-allsessions
Subscribe to Google Cloud Tech → http://goo.gle/GoogleCloudTech
#GoogleIO #Cloud #ML/AI
product: Cloud - AI and Machine Learning - AI Platform; event: Google I/O 2021; fullname: Zack Akil, Kaz Sato, Markku Lepisto; re_ty: Premiere;
Kaz Sato is a Staff Developer Advocate on the Cloud Platform team at Google. He leads the developer advocacy team for Machine Learning and Data Analytics products such as TensorFlow, the Vision API, and BigQuery, and has spoken at major events including Strata+Hadoop World 2016 San Jose, Google Next 2015 NYC and Tel Aviv, and DevFest Berlin. Kaz has also been leading and supporting developer communities for Google Cloud for over 7 years. He is also interested in hardware and IoT, and has been hosting FPGA meetups since 2013.
Hello and welcome to this AI/ML demo derby session. We've got three quick, back-to-back demos today showing off interesting ways you can use machine learning to solve problems, starting with a quirky project that helps my friend watch movies more safely. Let's get straight into it. So, the project: fixing videos with machine learning. That sounds a bit weird, so let me give you the back story. A while ago I was watching some movies with friends, and I discovered that one of my friends has a very serious phobia of swans. Now, it's not actually swans, but I want them to be able to watch videos and feel safe watching. We were watching classic movies, so everyone else knew exactly when the swans were going to appear, and we figured we could just tell our friend exactly when to look away, so they wouldn't get super scared. This worked pretty well, until sometimes we got it wrong and people got upset. So I figured I could potentially use machine learning to solve this problem. There are only a couple of tools we need, and one of them is the Cloud Video Intelligence API.
This is a very powerful API that lets you detect what's in a video using machine learning, and you don't need to know anything about machine learning to use it: you just feed it a video and it tells you what's inside. It has a couple of other really interesting features that you can read about in the docs. The next tool that's useful for this problem is the Transcoder API, a very powerful video-processing API. A classic example is on YouTube, where you have the option to watch a video at different quality levels: standard definition, high definition, 4K. This API lets you feed in a raw video, and it will automatically output all the different formats for streaming over the web, for you; it has a couple of other features useful for video processing on the web as well. Looking at these two tools, there are a couple of features we can put together to potentially solve this problem. We've got the label detection feature of the Video Intelligence API, which tells you what's in a video and during what time segments.
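As a rough sketch of what that label-detection call might look like (the bucket path and the helper names here are made up for illustration, not from the talk):

```python
def flatten(annotations):
    """Flatten {label: [(start_s, end_s), ...]} into (label, start, end) rows, sorted by start time."""
    rows = []
    for label, segments in annotations.items():
        for start, end in segments:
            rows.append((label, start, end))
    return sorted(rows, key=lambda r: r[1])

def annotate(gcs_uri):
    """Run label detection on a video in Cloud Storage (requires google-cloud-videointelligence)."""
    from google.cloud import videointelligence
    client = videointelligence.VideoIntelligenceServiceClient()
    op = client.annotate_video(request={
        "input_uri": gcs_uri,
        "features": [videointelligence.Feature.LABEL_DETECTION],
    })
    result = op.result(timeout=300)
    out = {}
    for label in result.annotation_results[0].segment_label_annotations:
        out[label.entity.description] = [
            (s.segment.start_time_offset.total_seconds(),
             s.segment.end_time_offset.total_seconds())
            for s in label.segments
        ]
    return out

# Example with an already-parsed response (no API call needed):
print(flatten({"swan": [(10.0, 15.0)], "airplane": [(0.0, 8.0)]}))
# → [('airplane', 0.0, 8.0), ('swan', 10.0, 15.0)]
```

In a real pipeline you would call `annotate("gs://my-bucket/scary-video.mp4")` and then flatten the result; the import is kept inside `annotate` so the pure helper works without the client library installed.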
For example, if we wanted to detect swans, it might tell us that from 10 seconds to 15 seconds there are swans in the video. We can then use those time segments in the Transcoder API to inject a full-screen overlay that hides the swans. The inject-overlays feature is normally used for things like injecting watermarks, but I'm going to use it in this case to hide the entire screen. So let's look at the demo in action. Alright, we're here in the Google Cloud console. This is my demo, so let's get into it and upload a scary video file. Okay, the video file is uploaded; let's have a look at it. We can see it's a nice calming video: there's a plane taking off, a lovely bit of banjo... and then there's the scary swan. This is what we don't want, and we want it fixed. So how's that going to happen? Well, I have a function that triggers as soon as the video lands in the bucket: it runs the video through the Cloud Video Intelligence API with label detection, and it outputs the labels to the next bucket. That gives us a JSON file with all of the things the Video Intelligence API spotted in our video, along with the time segments it found them in. As soon as that lands in the bucket, we have our final function, which scans through all those labels, picks out the ones that feature the word "swan", and for the time segments where swans were detected, feeds them to the Transcoder API to hide the screen. So, let's see if it's done that.
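That final scan-and-hide step could be sketched like this; the JSON layout, field names, and the padding value are assumptions for illustration, not the talk's actual code:

```python
import json

def hide_intervals(label_json, scary_word, pad=0.5):
    """Collect the time segments of any label mentioning scary_word,
    pad them by `pad` seconds, and merge overlapping intervals."""
    hits = []
    for label in json.loads(label_json)["labels"]:
        if scary_word.lower() in label["description"].lower():
            for seg in label["segments"]:
                hits.append((max(0.0, seg["start"] - pad), seg["end"] + pad))
    hits.sort()
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# A toy label file with two overlapping swan segments:
doc = json.dumps({"labels": [
    {"description": "swan", "segments": [{"start": 10.0, "end": 15.0},
                                         {"start": 14.0, "end": 18.0}]},
    {"description": "airplane", "segments": [{"start": 0.0, "end": 8.0}]},
]})
print(hide_intervals(doc, "Swan"))  # → [(9.5, 18.5)]
```

The merged intervals would then become the start/end time offsets of a full-screen overlay in the Transcoder API job configuration.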
And here it is: this should be our final safe-to-watch video. Let's have a look. You can see it starts off the same, with the relaxing plane taking off, and then, look at this: where I previously got startled by a swan, it has been hidden from the video. This is a full demo video pipeline that will make videos safe to watch based on anyone's individual phobia. Okay, a quick recap of how things are plugged together. I take the raw video, and it gets fed through Video Intelligence to run label detection. Once I've got those labels, my function scans through for the things I'm afraid of and uses the time segments to inject overlays using the Transcoder API, which then outputs a new safe-to-watch video with all the scary things hidden. One point to note is that you can't teach the Cloud Video Intelligence API new things, because it's all pre-trained and static. If you have something more specific that you're afraid of which the Video Intelligence API doesn't detect, you can train your own model using a tool like AutoML video classification. For more details and code samples on how to do that, check the description below. And now Markku is going to show you what he's built. Thank you. — I will show you a real-life example of edge AI. A car dealership has many visitors and employees, and wants to understand how people use its spaces, in a privacy-centric manner. Today we will show you how the dealership improved safety with Google Cloud, Coral, and a technology partner called BlueZoo.
Let's build this. Our goal today is to count the number of employees and customers entering or exiting the premises. How do we do it? We can feed the raw camera feed straight into a machine learning model that we have trained to detect patterns; one pattern could be an employee leaving through the door. The output of this machine learning model is simply counters: numbers of how many times these events happened. The raw camera feed and images are never stored anywhere and never transmitted out of the system. And this is how: we take any camera, a development board with the Google Edge TPU machine learning accelerator, and a USB webcam. First, I recorded myself walking in and out through the door several times. So here I am walking in. Now we can use the console to draw a bounding box around the object, like so, and then choose which label this one corresponds to: this one is an "in" event for a person. Then all you need to do is click to train a new model; it's as simple as that. I select AutoML Edge and then the type of model I wish to train: I want to run it on Coral devices with the Google Edge TPU, and I want the higher-accuracy version of the model. Okay, next I export the model and select the Edge TPU compilation. Now let me process one of the example videos that I recorded of myself walking in and out, using this new model. Okay, so here the new model we trained has me walking out. You may have seen a short blip there: it detected a person and drew a bounding box around me, there we go, and there is an "employee out" event, since I labelled myself as an employee here. Now I'm walking in, so pay attention... "person in", and here we go, "employee in". To collect these occupancy counters, we can now connect this device securely to Google Cloud using IoT Core, and then we can stream the data payloads all the way to a table in BigQuery. Here is the table schema holding the occupancy counters, and if we query the table, we can see that there's no data yet. So let's now run this model through a real test, with footage taken at the dealership. This footage was recorded with one employee and two customers entering the building. There we go: the device is transmitting a summary to IoT Core, and what did it transmit? One employee and two customers in. When the devices transmit data, IoT Core receives the messages and drops them into this Pub/Sub topic. Then we have a simple Cloud Function that is triggered by that same Pub/Sub topic; the function is extremely simple, it just takes the JSON payload and writes it into BigQuery. And there we go.
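A minimal sketch of that Pub/Sub-triggered function, assuming a first-generation background Cloud Function; the dataset, table, and field names are made up here, not from the talk:

```python
import base64
import json

def decode_payload(event):
    """Pub/Sub background functions deliver the message body base64-encoded
    in event['data']; decode it back into the device's JSON payload."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))

def on_message(event, context):
    """Cloud Function entry point: stream one occupancy row into BigQuery."""
    row = decode_payload(event)
    # Import kept inside the handler so the pure helper above runs anywhere.
    from google.cloud import bigquery
    client = bigquery.Client()
    errors = client.insert_rows_json("my_dataset.occupancy", [row])
    if errors:
        raise RuntimeError(errors)

# A device payload as it would arrive from the Pub/Sub topic:
msg = {"data": base64.b64encode(
    json.dumps({"employees_in": 1, "customers_in": 2}).encode()).decode()}
print(decode_payload(msg))  # → {'employees_in': 1, 'customers_in': 2}
```

With a streaming insert like this, the row shows up in the BigQuery table within seconds, which is what makes the "query the table right after the test" moment in the demo work.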
So here is the same data packet that was transmitted by the device, with numerical values telling us that one employee has entered and two customers have entered during this 15-second time window. Next, let's see a dashboard built in Data Studio using a BigQuery data source. We have two tables: the first one is from this system, using vision-based inference, and the second one is from BlueZoo, using their Wi-Fi occupancy sensors; we can then combine and visualize the data together. So, here we have the dashboard, showing the map of Southern California, including San Diego. The dealership has three BlueZoo Wi-Fi sensors and additionally two Coral-based optical sensors over the doorways, for fine-grained occupancy detection of both visitors and employees. The bar chart shows the occupancy over the most recent week. The number of visitors in the lounge tells us when the lounge might exceed capacity. And finally, if the sales closing rooms are not being occupied during much of the day, we may have a closing problem. I really hope that you enjoyed this session; thank you for watching, and bye-bye. — Hi, I'm Kaz Sato, a Developer Advocate from
Google Cloud. Today, I'd like to show you an app I call PDF to Audiobook. The problem was, I had too many unread PDF books. Also, I wanted to lose some weight. How could I solve both problems at the same time? I decided to build an app that converts PDF books into audiobooks using Cloud AI, so that I could listen to the books while running or exercising. Here is how PDF to Audiobook works: you upload a PDF file to a Google Cloud Storage bucket and wait 10 to 15 minutes, and you will find multiple MP3 files generated. No manual intervention is required. Cloud Functions is suitable for building a small automation like PDF to Audiobook. Everything runs on Cloud Functions: here you can find the function code, the GCS-trigger function and the audio function. To deploy the functions, you use the Cloud Functions deploy command, specifying the function name, runtime name, trigger bucket name, and so on. Once you upload a PDF file to the trigger bucket on Cloud Storage, the whole process begins. It starts with a call to the Vision API OCR, then extracts features from the OCR result and calls the AutoML Tables model to pick out the body text. Finally, it calls the Text-to-Speech API for generating the audio. This API has a wide variety of voices to choose from, and one of its features is SSML support. Now, about the OCR: if you apply a single page of a book to the API, you get an OCR result like this, with the text broken into paragraphs. A part of the result, in JSON format, shows various parameters of each paragraph, such as the bounding box and the grouped text. To use the Vision API from the application, you write code like this: first you specify the URI of the PDF file on Cloud Storage, then you specify the destination URI for saving the result file, and call the API. So now you have converted the PDF book into text, but that's not the whole story. As you can see, there are many parts that should not be included in the audiobook: page headers, diagram labels, source code, page numbers, and so on. We needed a way to detect whether each paragraph should be included or ignored. For this classification, I have used AutoML Tables.
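The two steps just described — async OCR of a PDF on Cloud Storage, and turning each OCR paragraph into a feature row — might look like this sketch; the paragraph dictionary shape and feature names are assumptions for illustration:

```python
def paragraph_features(paragraph):
    """Turn one OCR paragraph ({'text', 'x', 'y', 'width', 'height'}) into
    the kind of layout features fed to the classifier later."""
    text = paragraph["text"]
    return {
        "chars": len(text),
        "width": paragraph["width"],
        "height": paragraph["height"],
        "x": paragraph["x"],
        "y": paragraph["y"],
        "ends_with_period": text.rstrip().endswith("."),
    }

def ocr_pdf(gcs_source_uri, gcs_dest_uri):
    """Kick off async document text detection on a PDF in Cloud Storage
    (requires google-cloud-vision; import kept local for that reason)."""
    from google.cloud import vision
    client = vision.ImageAnnotatorClient()
    request = {
        "features": [{"type_": vision.Feature.Type.DOCUMENT_TEXT_DETECTION}],
        "input_config": {
            "gcs_source": {"uri": gcs_source_uri},
            "mime_type": "application/pdf",
        },
        "output_config": {"gcs_destination": {"uri": gcs_dest_uri}},
    }
    client.async_batch_annotate_files(requests=[request]).result(timeout=600)

print(paragraph_features({"text": "Hello world.", "x": 50, "y": 700,
                          "width": 400, "height": 14}))
```

Each feature row like this becomes one line of the training CSV for the next step.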
AutoML Tables is a service that lets you create your own machine learning model from structured data such as database tables or spreadsheets, without any coding. All you have to do is upload the training data and wait for some hours; under the hood, AutoML Tables does the heavy lifting of data science: data preprocessing, feature engineering, model selection, and hyperparameter tuning. For PDF to Audiobook, I extracted the features of each paragraph from the OCR result. That includes the paragraph text, the character count, the width and height, and the X and Y position of the paragraph, and so on, so that the trained model can classify whether a paragraph is body text or not from these characteristics. Let's take a look at how AutoML Tables works. You start by creating a dataset: click "Create dataset" and give it a name, then upload your local CSV file by specifying the file name, and the table is imported into your dataset. You can then view summary statistics of the uploaded features, so you can check that the features were imported correctly with the expected distributions, and start training the model. You specify the training budget, for example 24 hours, and wait for the training to finish. After the training finishes, you'll get an evaluation report. In my case it got pretty high accuracy: the report includes a confusion matrix, where you can find that the accuracy for detecting body text is 95% and the accuracy for detecting the garbage text is 97%. That isn't so bad. If you take a look at the feature importance, you can learn that the paragraph text is the most important feature for the classification. For the final part, converting text to speech, I used the Text-to-Speech API like this, where we add SSML tags to each paragraph so that, for example, when you hear a longer break at each section header, you understand that a new section is starting. With this, I was able to listen to the generated audio of all my unread PDF books. Anyway, if you are interested in this, please take a look at the documentation for each product. Thanks so much.
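The SSML step described above could be sketched as follows; the break length, voice name, and function names are assumptions for illustration, not the talk's actual code:

```python
def build_ssml(paragraphs):
    """paragraphs: list of (text, is_header) pairs -> one SSML string,
    with a longer pause inserted before each section header."""
    parts = ["<speak>"]
    for text, is_header in paragraphs:
        if is_header:
            parts.append('<break time="2s"/>')
        parts.append(f"<p>{text}</p>")
    parts.append("</speak>")
    return "".join(parts)

def synthesize(ssml, out_path):
    """Send the SSML to the Text-to-Speech API and save the MP3
    (requires google-cloud-texttospeech; import kept local)."""
    from google.cloud import texttospeech
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US", name="en-US-Wavenet-D"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)

print(build_ssml([("Chapter 1", True), ("It was a dark night.", False)]))
```

Because SSML is plain markup, the pause-before-header behavior can be tested without calling the API at all.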