Daniel is a lead engineer on TensorFlow.js. Previously, he worked at the intersection of visualization and machine learning, with projects like the TensorFlow Graph Visualizer and the Embedding Projector, which are part of TensorBoard, as well as new saliency techniques for neural networks. He holds a master's degree from the MIT Media Lab.
About the talk
highly interactive: in the Playground app we have all these drop-down controls that you can change to quickly run different experiments. Another nice thing about the browser: it runs on laptops and it runs on mobile devices, and these devices have sensors like the microphone, the camera, and the accelerometer, all behind standardized web APIs that you can easily access in your web app. We didn't take advantage of this in the Playground, but we'll show you some demos later. And most importantly, when you're building web apps, these apps run client-side, which means the data stays on the device.
and build such a library, and we went and built deeplearn.js. We released it in August 2017. We figured out a way to make it fast and scale by utilizing the GPU of laptops and cell phones through WebGL. For those of you that are not familiar, WebGL is a technology originally meant for rendering 3D graphics. And the library that we built allows for both inference and training entirely in the browser. When we released deeplearn.js, we had incredible momentum. The community instantly went with it and took pre-trained models from Python and imported them into the browser. One example I want to show you is a style transfer demo: someone went and took a pre-trained model, and this demo has a source image on the left and an art style on the right, and it can mash them together, and they made this into a creative, interesting application. Another demo: people took models that had read a lot of text and could generate a few sentences, ported them to the browser, and explored novel interfaces for exploring all the different endings of a sentence. In the education domain, people took the standard convolutional neural nets and built this fun little game where you can train your own image recognition model just by using the webcam, and this was very popular. And lastly, to give another example, researchers took a GAN generative model that can generate new fonts, previously trained on a lot of font styles, and built this novel, highly interactive interface in the browser where you can explore different types of fonts. Now, building on this incredible momentum, we are releasing TensorFlow.js, where you can take pre-trained models and retrain them, do transfer learning, right there on the device. To give you a schematic view of the library: we have the browser, which does all the computation using WebGL. TensorFlow.js has two sets of APIs that sit on top of this: the Core API, which gives you low-level building blocks, linear algebra operations like multiply and add, and on top of it we have the Layers API, which gives you high-level building blocks and best practices to build neural nets. And on top of that, because there are so many models written in
Python today, we offer tools to take an existing model, a Keras model or a TensorFlow SavedModel, these are two formats that are very popular, and these tools will automatically convert that model to run in the browser. Now, to give you an example of our Core API, I'm going to go over code that trains a model to fit a polynomial curve, where we have to learn the coefficients a, b, and c. This example is pretty simple, but the code walks you through all the steps of how you would train such a model, and these steps
generalize to more complex models. We define our quadratic function on top of the standard API, like tf.add and tf.mul. We also have a chaining API; chaining has been very popular in the JavaScript world, so you can call these mathematical methods on the tensors themselves, and this reads better, closer to how we write math. So that's our model. To train it, we need a loss function, and in this case we are just measuring the distance between the prediction of the model and the label, the ground truth data. We need an optimizer; this is the machinery that can optimize and find those
coefficients. We specify a learning rate, and for some number of epochs, passes over our data, we call optimizer.minimize with our loss of f(x) and our ys. So that's our model. This is clearly not how everyone writes machine learning models today; over the years we've developed best practices and high-level building blocks, and APIs have emerged, like TF Layers and Keras, that make it much easier to write these models. And for that, I want to walk over our Layers API to show you how easy it is.
We're going to go over a simple neural network that sums two numbers. What is special about this network is that the input comes in as a string, character by character. So "90 + 10" is the input to this network, fed in as a string, and the network has an internal memory that encodes this information; it has to save it. And then on the other end, the neural network has to output the sum, 100, again character by character. Now, you might wonder why we go through such trouble to train a neural network like this. But
this example forms the basis of modern machine translation, and that's why we're going over it. So let me show you the code. We import TensorFlow.js. We have our model: we say tf.sequential, which means it's just a linear stack of layers. The first two layers, I'm not going to go into details, but those two are building blocks that can take these characters into a memory, into an internal representation, and the last three layers take this internal representation and turn it into numbers. And that's our model. To train it, we need to compile it with a loss,
an optimizer, and a metric we want to monitor, in this case accuracy, and we call model.fit with our data. The one thing I want to point out about model.fit: training can take, for this example, 30 or 40 seconds in the browser, and while that's running we don't want to block the main UI thread; we want our app to be responsive. This is why model.fit is an asynchronous call, and we get a callback once it's done with a history object, which has our accuracy as it evolved over time. I went through examples of how you write these models
in the browser, but there are also a lot of models that have already been written in Python, and for that we have tools that allow you to import them automatically. Before we dive into the details, I want to show you a fun little game that our friends at Google Brand Studio built, called Emoji Scavenger Hunt. This game takes advantage of a pre-trained model, a convolutional neural network that can detect 400 items. I'm going to walk over to a Pixel phone and open
up a browser, just to show you that TensorFlow.js can also run in a mobile browser, because we're using standard WebGL. And I'm going to ask Nikhil on my right to help me out here, because I'm going to need some help. Now, to give you a few details about the game: it shows you an emoji, and then you have to go with your camera, run around your house, and find the real version of that emoji item before the time runs out, and there is a neural network that has to detect it. All right, shall we start? All right, let me see what we're going to play.
We have to find a watch, 20 seconds. All right, that's great. All right, let me see what's next. We need a shoe. Thanks, buddy. Let's see what our next item is: banana. We have 30 seconds to find a banana. Banana? Anyone have a banana? Awesome, we got a banana over here. Let's see what our next item is: beard. Daniel's beard, perfect. All right. So let's talk a little bit about how we actually built that game; let me switch back to the slides here. Okay, so what we did was we
trained a model in Python to predict, from images, 400 different classes that would be good for an Emoji Scavenger Hunt game. These are things like a banana, a watch, a shoe. The way we did this was we took a pre-trained model called MobileNet, and if you don't know what MobileNet is, it's a state-of-the-art computer vision model that's designed for edge devices, designed for mobile phones. So what we did was we took that model, we reused the features that it had learned there, and we did a transfer learning task to our 400-class classifier. So then, once we do that, we have an object
detector. This object detector lives entirely in the Python world. The next step of this process is to actually take that and convert it into a format that we'll be able to ingest on the web, and then we'll skin the game and add sound effects and that kind of thing. So let's talk a little bit about the details of actually going through that process. So in Python, when we're checkpointing and training our model, we have to save it to disk. There are a couple of ways you do this; the common way with TensorFlow is to use what's called a SavedModel. The details are not important for
this talk; the idea here is that there are files on disk that you need to write. Daniel also mentioned that we support importing from Keras. Now, Keras is a high-level library that lives on top of TensorFlow that gives you a sort of higher-level API to use. Again, the details are unimportant; there are also files on disk that it uses to checkpoint. So we have a set of files, and now the next step is to actually convert them to a format that we can ingest in a web page. We have released a tool on pip called tensorflowjs, and inside of that tool we have some
conversion scripts. All you do is run the script, point it to those saved files that you had on disk and to an output directory, and you will get a set of static build artifacts that you'll be able to use on the web. The same flow holds for a Keras model: you point it to your input HDF5 file, and out pops a directory of static build artifacts. You take those build artifacts and you host them on your website, the same way you would host PNGs or CSS or anything of that sort. All right, so once you do that, we provide APIs in TensorFlow.js to load those static build
artifacts. It looks something like this for a TensorFlow SavedModel: we load the model and we get a model object back, and that model object can actually make predictions with TensorFlow.js tensors right in the browser. The same flow holds for Keras models: we point to those static build artifacts and we have a model that we can make predictions on. Okay, so under the covers there's actually a lot going on when we convert these files to a format that we can ingest on the web. We are actually pruning nodes off of the graph that aren't needed to make the prediction. This makes the
network transfer much smaller and our predictions much faster. We're also taking those weights and sharding and packing them into four-megabyte chunks. This means that the next time the browser loads the page, your weights will be cached, so it's super snappy. We also support about 90 of the most commonly used TensorFlow ops today, and we're working hard to continue supporting more. And on the Keras side, we support 32 of the most commonly used Keras layers. After importing, we also support training and evaluation of those models, like computing accuracy, and of course you can also make predictions as well. All right, so I want to show you a demo before I bore you any more. This demo was built by our friends at Creative Lab as a collaboration between them and a few researchers at Google. I'm going to go back over here to this laptop. Okay, so the idea of this model is that it takes a 2D image of a human being and it estimates a set of key points that relate to their skeleton, things like
your wrist points, the center of your face, your eyes, your shoulders, and that kind of thing. I'm just going to turn the demo on here. When I do that, the webcam will turn on and it's going to start predicting some key points for me, and I'm going to step back here so you can actually see the full thing. As I move around, you'll see the skeleton change and make some predictions about me. All right. There is obviously a lot you can do with this. We're really excited to show you a fun little demo; it's very fun. What's going to happen is, when I click this slider, we're going to move to
a separate mode where it's going to look for another image on the internet that has a person with the same pose as me. Okay, is it going to work here? At first it's not working. Now, we have a physical installation of this that you can go check out at the Experiments tent, and it's really fun: a full-screen version of this where you can be another version of you. We have released this model on npm, so you can go and use it, and you need no machine learning experience to do it. The API lets you point to an image, and out pops a pose; it's that easy. So we're really excited to see what you do with that. Okay, so there's a lot you can do with just porting models to the browser for inference. But since the beginning of deeplearn.js and TensorFlow.js, we made it a high priority to be able to actually train directly in the browser. This opens up the door for education and interactivity, as well as
retraining with data that never leaves the client. So I'm going to actually show you another demo of that, back on the laptop over here. Okay, Daniel, do you want to come help me? Are you cool? Okay, so the game is in three phases. In phase one, we're going to collect frames from the webcam, and what we're going to do is use those frames to actually play a game of Pac-Man. Okay, so Daniel is going to start collecting frames. What he's going to do is collect frames for up, down, left, and right,
and those are going to be associated with the poses, with the four controls for the Pac-Man game itself. As he's collecting those, we're saving the images locally; we're not actually training the model yet. Once he's done collecting those frames, we're going to train the model, and again, this is going to be trained entirely in the browser, with no requests to a server anywhere. Okay, so when we actually train that model, what's going to happen is we're going to take a pre-trained MobileNet model that's actually in the page right now, and we're going to do a
little retraining phase with the data that he's just collected, once he presses that train model button. Awesome, our loss value is actually going down; it looks like we've learned something. Okay, so phase three of this game is to actually play. When he presses that button, what's going to happen is we're going to take frames from the webcam and make predictions with the model that we just trained. Once he presses that play button, we'll see how it goes. If you look in the bottom right of the screen, you'll actually see the predictions happening: it's highlighting the control that it thinks it is, and you'll see him actually playing Pac-Man now. So obviously this is just a game, but we're really excited about the opportunities for accessibility. You can imagine a Chrome extension that lets you train a model that lets you scroll the page and click. Now, all of this code is online and available for you to go fork and build your own applications with, and we're really excited to see what you do with it. All right, I'm going to go back to the talk.
Okay. So let's chat a little bit about performance. What we're looking at here is a benchmark of MobileNet 1.0 running with TensorFlow in Python, classic TensorFlow, not TensorFlow.js. We're thinking about this in the context of a batch size of one, and the reason we want to think like that is because we're thinking of an interactive application like Pac-Man, where you can only read one sensor frame at a time, so you can't really batch that data. On the first row, we're looking at TensorFlow running with CUDA on a 1080 GTX. This is a beefy machine, and it's getting about 3 milliseconds per frame, and I want to mention that the smaller the bar, the faster it is. In the second row, we're looking at TensorFlow CPU running with AVX2 instructions on one of these MacBook Pros; there we're getting about 60 milliseconds per frame. All right, where does TensorFlow.js fit into this picture? Well, it depends. Running on that 1080 GTX, that beefy machine, we're getting about 11 milliseconds per frame with TensorFlow.js. Running on an integrated graphics card on one of these laptops, we're getting about a hundred milliseconds per frame. I just want to
point out that a hundred milliseconds is actually not so bad: that Pac-Man game was running with this model, and you can really build something interactive with that. The web is only going to get faster and faster; there are new standards, like WebGL compute shaders and WebGPU, which give you much closer-to-the-metal access to the GPU. But the web has its limitations: you live in a sandboxed environment, and you can only really get access to the GPU through these APIs. So how do we scale beyond those limitations? With that, I'm going to hand it off to Nick, who's going to talk about TensorFlow.js on the server with Node.js.
Node.js runs on the V8 JavaScript engine, which has had tons of resources put into it by companies like Google, and we've seen the interpreter be up to 10 times as fast as Python, so there's lots of room for performance improvements. Also, using TensorFlow gives us access to really high-end machine learning hardware, like GPU devices and TPUs in the cloud; look for support for that soon. Let's step back and look at the architecture we highlighted earlier. We have a Layers API and, a little bit lower level, a Core API that has our ops. This whole runtime is powered by WebGL in the browser, but today,
on npm, we're shipping a package that binds to TensorFlow, which gives you access to those TPUs, GPUs, and CPUs, all of this under our npm package. To show you how easy it is to use our Node bindings, I want to show you a little code snippet. This application function right here is a very common server-side request-response handler; those who work with the Express framework know exactly what's going on here. Our endpoint listens for /model and takes input as a request, which we pass into a TensorFlow.js model, and then the output is pushed out into the response.
The dataset we'll use is called PITCHf/x: a large library of sensor data about pitches that baseball players have thrown in actual baseball games. For those that aren't super familiar with baseball, let me give a little context: a pitcher will throw different types of pitches to fool the player who's trying to hit the ball. There are pitches that have higher velocities, and pitches that are a little slower but have more movement. In this example, I've sort of highlighted the fastball, the change-up, and the curveball; those are all types of pitches. Don't get too hung up on the details of
baseball; what we're really trying to solve here is a very classic machine learning problem: taking sensor information and drawing a classification from that. So for that, I'm going to actually showcase a demo. All right, so on one side of my screen I have a terminal, which I'm going to start my Node application with, and on the left I have a web browser. We built a very simple UI that listens to what our server is doing over sockets, using the Socket.IO library, so we can see what this interaction is doing.
So I'm just going to type npm start to start my server, all through Node. My model is up and running and training. Every time we take a pass over the dataset, we report that over the socket to the client, and you can see the blue bars moving a little bit closer to 100%. That's our model learning how to tell the difference between a curveball and a fastball, and you can kind of see every step it moves a little bit differently. Our model is having a little bit of trouble at the moment with the fastball sinker, but it's only looked at the data for a few passes; the
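Reporting each pass over the dataset to the browser UI can be sketched like this (assumptions: `io` is a Socket.IO server and `model`, `xs`, `ys` come from the pitch-training code, which the talk doesn't show in full):

```javascript
// After each epoch (one pass over the dataset), emit progress over the
// socket so the client can move its accuracy bars.
async function trainAndReport(model, xs, ys, io) {
  return model.fit(xs, ys, {
    epochs: 100,
    callbacks: {
      onEpochEnd: async (epoch, logs) => {
        io.emit('progress', {epoch, loss: logs.loss, accuracy: logs.acc});
      },
    },
  });
}
```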
more the server runs, the better it gets at training. All this training data we've shown is historical baseball data from 2015 to 2017. I'm going to hit this test live button, and this is going to use Node to go out to Major League Baseball, pull in some newer data, that's live data, and run evaluation. As this data comes in, we're going to see the orange bars, and the orange bars show how well we are predicting data that our model has never seen before, these live pitches, and
how we stack up in performance. So in that training set we were looking at 7,000 pitches, and we are training every couple of seconds over those 7,000 pitches. That's an interesting benchmark, but let's compare it with the MobileNet benchmark we showcased earlier. These are the numbers for Python TensorFlow GPU and CPU inference time. We're just getting started; we've just launched an npm package and we have a long way to go, but we have some promising early numbers showing that TensorFlow on Node.js is exactly as fast as Python TensorFlow. And with that, I'm going to hand it off to Nikhil to wrap up. Thanks, Nick. Exciting stuff. Okay, so let's recap some of the APIs and libraries and tools that we have talked about today with TensorFlow.js. We have a low-level API called the Core API, which is a set of accelerated linear algebra kernels plus a layer for automatic differentiation; we saw an example of that with the polynomial regression demo. We also have a high-level Layers API that encodes machine learning best
practices into an API that's much easier to use, and we saw an example of that with the addition RNN translation demo. We also showed you a couple of ways that you can take pre-trained models from Python, via SavedModel or via Keras models, and port those to the browser for inference, or do retraining, or compute accuracy. And of course, we also showed you the new Node.js support for TensorFlow.js today, and we're really excited about that. Okay, so this project, TensorFlow.js, was not just the three of
us on stage here. It was a cross-team collaboration between many amazing people at Google, and we also have some amazing open-source contributors that I want to send a shout-out to; this project literally would not have been possible without them. So, thank you. All of the things that we talked about here today, the demos, all the code, are on our website, js.tensorflow.org. All of the code is also open source on github.com/tensorflow. We also have a community mailing list; this is a place where people can go and ask questions