Sarah leads TensorFlow's mobile and embedded efforts (TensorFlow Lite). She is a longtime Googler, and prior to this, she spent many years building Google's advertising systems and web search infrastructure.
About the talk
TensorFlow Lite enables developers to deploy custom machine learning models to mobile devices. This technical session will describe in detail how to take a trained TensorFlow model, and use it in a mobile app through TensorFlow Lite.
Thank you so much for coming to our session this morning. I'm Sarah Sirajuddin. I'm on the TensorFlow Lite team, and we work on bringing machine learning to mobile and small devices. Later on I will introduce my colleague Andrew Selle, who will be doing the second half of the talk. The last couple of days have been really fun for me. I've gotten to meet and speak with many of you, and it's been really nice to see the excitement around TensorFlow Lite. Today I'm happy to be here and talk to you about all the work that our team is doing to make machine learning on small devices possible and easy.
Today's talk will cover three areas. First, we'll talk about why machine learning directly on device is important and how it's different from what you may do on the server. Second, I will walk you through what we have built with TensorFlow Lite. And lastly, we'll show you how you can use TensorFlow Lite in your own apps.
First, let's talk about devices for a bit. What do we mean when we say a device?
Well, usually a mobile device, basically our phones. Our phones are with us all the time, and we interact with them so many times during the day. Modern phones come with a large number of sensors on them, which give us really rich data about the physical world around us. Another category of devices is what we call edge devices, and this industry has seen a huge explosion in the last few years. Some examples are smart speakers, smart watches, and smart cameras. And as this market has grown, we see that technology which only used to be available on more expensive devices is now available on far cheaper ones. So now we're seeing that there's this massive growth, and devices are becoming increasingly capable, both mobile and edge, and this is opening up many opportunities for novel applications for machine learning.
I expect that many of you are already familiar with the basic idea of machine learning, but for those that aren't, I'm going to really quickly cover the core concept. Let's start with an example of something that we may want to do, let's say classification of images. So how do we do this? In the past, what we would have done was to write a lot of rules that were hard-coded and very specific about some characteristics that we expected to see in parts of the image. This was time-consuming, hard to do, and frankly didn't work all that well. And this is where machine learning comes in. With machine learning, we learn based on examples. A simple way to think about machine learning is that we use algorithms to learn from data, and then we make predictions about similar data that has not been seen before. It's a two-step process: first the model learns, and then we use it to make predictions. The process of the model learning is what we typically call training, and when the model is making predictions about data, that is what we call inference.
This is a high-level view of what's happening during training. The model is passed labeled data, that is, input data along with the associated prediction. Since in this case we know what the right answer is, we are able to calculate the error, that is, how many times the model is getting it wrong and by how much. We use these errors to improve the model, and this process is repeated many, many times until we reach the point that we think the model is good enough or that this is the best that we can do. This involves a lot of steps and coordination, and that is why we need a framework to make this easier. This is where TensorFlow comes in. It's Google's framework for machine learning. It makes it easy to train and build neural networks.
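The training loop described above (pass in labeled data, measure the error, use the error to improve the model, repeat many times) can be sketched in a few lines of plain Python. This is a toy illustration and not TensorFlow code: the single-weight model, the `train` function, the learning rate, and the example data are all assumptions made for the sketch.

```python
def train(examples, learning_rate=0.01, steps=1000):
    """Fit a one-weight model y = w * x by repeatedly reducing the squared error."""
    w = 0.0  # the model starts out knowing nothing
    for _ in range(steps):
        gradient = 0.0
        for x, y in examples:
            error = w * x - y          # how wrong the prediction is, and by how much
            gradient += 2 * error * x  # direction that would increase the squared error
        # Nudge the weight against the average gradient to improve the model.
        w -= learning_rate * gradient / len(examples)
    return w


# Labeled data: each input comes with the answer we expect (here y = 2 * x).
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(examples)
prediction = w * 10.0  # inference on an input the model has not seen
```

A framework such as TensorFlow automates the same loop for models with millions of weights, including computing the gradients.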
And it is cross-platform. It works on CPUs, GPUs, and TPUs, as well as mobile and embedded platforms, and that last piece, which we call TensorFlow Lite, is what we're going to be focusing on in our talk today.
Next, I want to talk about why you would consider doing machine learning directly on device. There are several reasons that you may consider, but probably the most important one is latency. If the processing is happening on the device, then you're not sending data back and forth to the server. So if your use case involves real-time processing of data such as audio or video, then it is quite likely that you would consider doing this. Other reasons are that your processing can happen even when your device is not connected to the internet. The data stays on device, which is really useful if you're working with sensitive user data that you don't want to put on servers. It's more power efficient, because your device is not spending power transmitting data back and forth. And lastly, we are in a position to take advantage of all the sensor data that is already available and accessible on the device.
This is all great, but there's a catch, like there always is, and the catch is that doing on-device ML is hard. Many of these devices have some pretty tight constraints: they have small batteries, tight memory, and very little computation power. TensorFlow was built for processing on the server, and it wasn't a great fit for these use cases. That is the reason we built TensorFlow Lite. It's a lightweight machine learning library for mobile and embedded platforms.
This is a high-level overview of the system. It consists of a converter, where we convert models from TensorFlow format to TensorFlow Lite format; for efficiency reasons, we use a format which is different. Then it consists of an interpreter, which runs on device. There is a library of ops and kernels, and then we have APIs which allow us to take advantage of hardware acceleration whenever it is available. TensorFlow Lite is cross-platform, which means it works on Android, iOS, and Linux. A high-level developer workflow here would be to take a trained TensorFlow model, convert it to TensorFlow Lite format, and then update your apps to use the TensorFlow Lite interpreter using the appropriate API. iOS developers also have the option of using Core ML instead. What they would do here is take the trained TensorFlow model, convert it to Core ML using the TensorFlow to Core ML converter, and then use the converted model with the Core ML runtime.
The two common questions that we get when we talk to developers about TensorFlow Lite are: is it small, and is it fast? So let's talk about the first question. One of our fundamental design goals with TensorFlow Lite was to keep the memory and binary size small. And I'm happy to say that the size of our core interpreter is only 75 kilobytes, and when you include all the supported ops, the size is 400 kilobytes. So how did we do this? First of all, we've been really careful about which dependencies we include. Secondly, TensorFlow Lite uses FlatBuffers, which are far more memory efficient than protocol buffers are. One other feature that I want to call out in TensorFlow Lite is what we call selective registration, which allows developers to include only the ops that their model needs, and thus they can keep the footprint small.
Now moving on to the second question, which is speed. We made several design choices throughout the system to enable fast startup, low latency, and high throughput. Let's start with the model file format. TensorFlow Lite uses FlatBuffers, like I said, and FlatBuffers is a cross-platform, efficient serialization library. It was originally created at Google for game development and is now being used for other performance-sensitive applications. The advantage of using FlatBuffers is that we can directly access the data without doing parsing or unparsing of the large files which contain weights. Another thing that we do at the time of conversion is that we pre-fuse the activations and biases, and this leads to faster execution later at runtime. The TensorFlow Lite interpreter uses static memory and a static execution plan, which leads to faster load times. Many of the kernels that TensorFlow Lite comes with have been specially optimized to run fast on NEON on ARM CPUs.
Now let's talk about hardware acceleration. As machine learning has grown in prominence, it has spurred quite a bit of innovation at the silicon layer as well, and many hardware companies are investing in building custom chips which can accelerate neural network processing. GPUs and DSPs, which have been around for some time, are also now being increasingly used to do machine learning tasks. TensorFlow Lite was designed to take advantage of hardware acceleration, whether it is through GPUs, DSPs, or custom AI chips. On Android, the recently released Android Neural Networks API is an abstraction layer which makes it easy for TensorFlow Lite to take advantage of the underlying acceleration. The way this works is that hardware vendors write specialized drivers or custom acceleration code for their hardware platforms and integrate with the Android NN API. TensorFlow Lite in turn integrates with the Android NN API via its internal delegation API. A point to note here is that developers only need to integrate their apps with TensorFlow Lite; TensorFlow Lite will take care of abstracting away the details of hardware acceleration from them. In addition to the Android NN API, we are also working on building direct GPU acceleration in TensorFlow Lite. GPUs are widely available and used, and like I said before, they are now being increasingly used for doing machine learning tasks. Similar to the NN API, developers only
integrate with TensorFlow Lite if they want to take advantage of the GPU acceleration.
The last bit on performance that I want to talk about is quantization, and this is a good example of an optimization which cuts across several components in our system. First of all, what is quantization? A simple way to think about it is that it refers to techniques to store numbers and to perform calculations on numbers in formats that are more compact than 32-bit floating-point representation. And why is this important? Well, for two reasons. First, model size is a concern for small devices, so the smaller the model, the better it is. Secondly, there are many processors which have specialized SIMD instruction sets which process fixed-point numbers much faster than they process floating-point numbers. So the next question is how much accuracy we lose if we are using 8 bits or 16 bits instead of the 32 bits which are used for representing floating-point numbers. Well, the answer obviously depends on which model you're using, but in general the learning process is robust to noise, and quantization can be thought of as a form of noise. So what we find is that the accuracies tend to be usually within acceptable thresholds. A simple way of doing quantization is to shrink the weights and biases after training, and we are going to be releasing a tool which developers can use to shrink the size of their models. In addition to that, we have been actively working on doing quantization at training time, and this is an active area of ongoing research. What we find here is that we are able to get accuracies which are comparable to the floating-point models for architectures like MobileNet as well as Inception. We have a tool which allows developers to use this, and we are working on adding support for more models in this.
Okay, so I talked about a bunch of performance optimizations. Now let's talk about what that translates to in terms of numbers. We benchmarked two models, MobileNet and Inception V3, on the Pixel 2, and as you can see here, we are getting speedups of more than three times when we compare quantized models running on TensorFlow Lite versus floating-point models running on TensorFlow. I should point out to you that these numbers do not include any hardware acceleration. We've done some initial benchmarking with hardware acceleration, and we see additional speedups of three to four times with that, which is really promising and exciting. So stay tuned in the next few months to hear more on this.
Now that I've talked about the design of TensorFlow Lite and performance, I want to show you what TensorFlow Lite can do in practice. Let's please roll the video.
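As an aside on the quantization discussion above, here is a minimal sketch of storing floating-point weights as 8-bit integers after training. It uses a generic affine (scale and zero-point) scheme as an assumption; it is not necessarily the exact scheme TensorFlow Lite implements, and the function names `quantize` and `dequantize` and the sample weights are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Make sure the representable range includes 0.0 so zero stays exact.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [scale * (q - zero_point) for q in quantized]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
# Each weight now fits in 8 bits instead of 32, at the cost of a small
# rounding error (the "noise" mentioned in the talk).
errors = [abs(a - b) for a, b in zip(weights, restored)]
```

In this sketch the rounding error per weight stays below one quantization step (`scale`), which gives some intuition for why accuracy often remains acceptable.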
So this is a simple demo application which is running the MobileNet classification model, which we trained on common office objects. As you can see, it's doing a good job detecting them, even this TensorFlow logo that we trained this model on. And like I said, it's cross-platform, so it's running on iOS as well as Android, and we also are running it here on Android Things. This was a simple demo. We have more exciting demos for you later on in the talk.
Now let's talk about production use cases. I'm happy to say that we've been working with partner teams inside Google to bring TensorFlow Lite to Google apps. Portrait mode on the Android camera, "Hey Google" on Google Assistant, and Smart Reply are some features which are going to be powered by TensorFlow Lite in the next few months. Additionally, TensorFlow Lite is the machine learning engine which is powering the custom model functionality in the newly announced ML Kit. For those of you that may have missed the announcement, ML Kit is a machine learning SDK. It exposes both on-device and cloud-powered APIs for machine learning, as well as the ability to bring your own custom models and use them. These are some examples of apps that are already using TensorFlow Lite via ML Kit: PicsArt, a really popular photo editing and collage making app, and VSCO, a really cool photography app.
Going back to TensorFlow Lite and what is currently supported: we have support for 50 commonly used operations which developers can use in their own models. If you need an op which is not currently supported, you do have the option of using what we call a custom op, and later on in this talk Andrew will show you how you can do that. Op support is currently limited to inference. We will be working on adding training support in the future. We support several common, popular open-source models, as well as the quantized counterparts for some of them. With this, I'm going to invite my colleague Andrew to talk to you about how you can use TensorFlow Lite in your own apps.
Thanks, Sarah. So now that you know what TensorFlow Lite is, what it can do, and where it can be run, I'm sure you want to know how to use it. We can break it up into four important steps. The first one, and probably the most important, is to get a model. You need to decide what you want to do. It could be image classification, it could be object detection, or it could even be speech recognition. Whatever that model is, you need to train it yourself. You can do that with TensorFlow, just like you train any other TensorFlow model, or you can download a pre-trained model if you're not ready to make your own model yet or if an existing model satisfies your needs. Second, you need to convert your model from TensorFlow to TensorFlow Lite, and we'll show some examples of how to do that in a second. Third, if there are any custom ops that you need, write them. This could be because you want to optimize something with some special instructions you know about, or it could be that you're using a piece of functionality that we do not yet support, like a specialized piece of signal processing. Whatever it might be, you can write your own ops. This may not be necessary, of course. The last step is to write an app, and you're going to use whatever client API is appropriate for the target platform.
So let's dive into some code: converting your TensorFlow model. Once you're done with TensorFlow training, you typically have a SavedModel, or you might have a GraphDef. What you need to do first is put this through the converter. Here I'm showing how to do this within Python. If you download the normal TensorFlow toolchain that's precompiled, like the pip package, you're able to run the converter. It just takes the SavedModel directory or the frozen GraphDef in, you specify a filename for the TensorFlow Lite file you want, and it will output a FlatBuffer on disk that you can now ship to whatever device you want.
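A minimal sketch of that conversion step, assuming a current TensorFlow release in which the converter is exposed as `tf.lite.TFLiteConverter` (the code shown on the slides during the talk is not in this transcript and may have used a differently named API). The function name `convert_saved_model` and both path arguments are hypothetical.

```python
def convert_saved_model(saved_model_dir, output_path):
    """Convert a SavedModel directory into a TensorFlow Lite FlatBuffer file.

    Returns the size of the converted model in bytes.
    """
    # Imported here so the module loads even where TensorFlow is not installed.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    tflite_model = converter.convert()  # serialized FlatBuffer as bytes

    # Write the FlatBuffer to disk; this file is what ships to the device.
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return len(tflite_model)
```

Calling it would look like `convert_saved_model("path/to/saved_model", "model.tflite")`, where both paths are placeholders.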
Now, how might you get it to a device? You could put it into the package. You could also, say, distribute it through a cloud service where you can update your model on the fly without updating your core application. Whatever you want to do is possible.
Next, once you try to convert, you might actually run into some issues doing the conversion, because there are a couple of things that can go wrong. The first one is that you need to make sure that you have a frozen GraphDef or a SavedModel. Both of these are able to get rid of the parts of the graph that are used for training. These are typically things like variable assignment, variable initialization, and optimization passes that are not strictly necessary for doing inference, that is, prediction. You want to get those out of the graph, because we don't want to support those operations right now, as we want to have the smallest version of the runtime that can be distributed, to keep your binary size small. The second thing that you need to do is make sure that you write any custom operators that you need, and I'm going to show a little bit of an example of doing that. Before that, let me tell you one more thing, which is that we also have some visualizers that let you understand the model that you've transformed and the transformation process. So take a look at those. They're linked off of the documentation.
So let's get into writing a custom op. What kind of op might we need? Here I have an example that's a little bit silly: it just returns pi. The important thing when you write an op is that you need to implement four C functions. We have a C API for defining operations, and the reason we do this is that all of our operations are implemented this way, so they can eventually run on devices that only support C. But you can write kernels in C++. In this case, what I'm doing is ignoring the input tensors and producing an output tensor, which is pi. Now, if you had input tensors and you wanted to make an output tensor, you could also read the input tensors and then, say, multiply by three, and now you have a multiply-by-three operation. This is going to be application dependent. And of course, as I said before, you don't always need to do this. I'm just laying it out here to show that if there's some functionality that you need, we are extensible.
Okay, once you convert your model, you need to use a client API. Let me start with the C++ API, but we have other language bindings as well that I'll get to. In any of the bindings, it's going to follow the same basic pattern: create an interpreter and load the model, fill in your data, execute the model, and read back your data. So it's very simple. In C++, the first thing you do is create a model object. This is given the filename of the TensorFlow Lite file, and it creates an object that is going to hold that model and mmap it.
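The four-step pattern named above (create an interpreter and load the model, fill in your data, execute, read back your data) is the same in every binding. Here is a minimal sketch of it in Python, assuming the `tf.lite.Interpreter` class from current TensorFlow releases. The helper name `classify`, the single-input and single-output assumption, and the label list are hypothetical; the function accepts any object that offers the same methods.

```python
def classify(interpreter, input_data, labels):
    """Run one inference and return the label with the highest score.

    `interpreter` is expected to behave like tf.lite.Interpreter, and
    `input_data` to match the model's input shape and type.
    """
    # 1. The interpreter already holds the model; allocate its buffers.
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    # 2. Fill in the input data.
    interpreter.set_tensor(input_index, input_data)

    # 3. Execute the model; this blocks until inference is done.
    interpreter.invoke()

    # 4. Read back the output: one score per class for the first batch item.
    scores = interpreter.get_tensor(output_index)[0]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]
```

With TensorFlow installed, the interpreter would typically be created with `tf.lite.Interpreter(model_path="model.tflite")`, where the path is a placeholder, and `input_data` would be a NumPy array.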
As I said before, we use FlatBuffers, and the reason why is that we can mmap the buffers, which means that there is effectively zero latency to start running the model. Second, if you have any custom operations, you can register them. Basically, at this phase you're deciding which operations to include in your runtime. By default, we provide a built-in op resolver that includes all of our default operations. You might also use the selective registration that we alluded to before, where you include only a subset of the operations; in this case, you might provide a minimal resolver. And if you wanted to use the custom operation that we had before, you would create a custom resolver that would tell TensorFlow Lite how to find your custom operation. So now we know what our ops are and where to get the code for them, and we know our model. Now we need to create an interpreter object. We take the pair of model and resolver and put them together, and it returns an interpreter. This interpreter is going to be our handle for doing our execution.
The next step is to perform execution, but before we can do that, we need to fill the buffers. If you have something like a classification model that takes an image in, where are you going to get that image? Well, the obvious place you might get it would be from your device's storage, if it's an image file, but commonly it would also be a camera. Whatever it might be, you produce a buffer, in this case a float* or an int* buffer, and you fill it into our buffer. Once you fill this buffer, you're ready to run. So we fill the buffer, TensorFlow Lite has all the information it needs to run the execution, and we just call Invoke. Now it's going to block until that execution is done, and then we're going to be able to read the output of it in an analogous way to our input. That is, we can get a float* buffer out, which could represent the class numbers, and then you're free to do whatever you want with that data. For example, in the image classification app we showed before, you would read that index out, map it back to the string, and then put it into your GUI's display.
Great. So now we know how to use C++. What if you're using another platform, for example Raspberry Pi? On Raspberry Pi, the most common thing to use is probably Python. Again, it's going to follow the same basic pattern. First, we create an interpreter object. The interpreter object is now our handle. How do we feed data? Well, since it's Python, we can use NumPy arrays, and this is really convenient because if you need to do pre-processing or post-processing, you can do it with the primitives that you're familiar with. This is kind of a theme that goes on: we want to keep our bindings as idiomatic as possible in the language that they are in, and also keep them performant. So in this case, we put in some NumPy arrays and we take out some NumPy arrays. So that's Python.
What if you're writing an Android app, or you want to write an Android Things application? Then you might use the Java API. In this case, it's the same thing: you build an interpreter and give it the filename of the model. It might be from a resource if you're doing an Android application. Then finally you're going to fill the inputs in and call run. One of the things that we did for the Java API is that, since we know many Java programmers don't really want to deal with building their own native library, you can just use our Gradle file here, which will include our precompiled version of TensorFlow Lite. You don't have to download our source code. And even for the tooling parts, where you do the conversion from TensorFlow to TensorFlow Lite, you can download the precompiled version of TensorFlow, as I alluded to before. So what if you're doing iOS? Well, in that case you can use the C++ API. You can also use the Objective-C API. But again, we provide a precompiled binary in the form of a CocoaPod.
Okay. So now that you know how to use TensorFlow Lite, I want to tell you a little bit about what's going to be coming up
in TensorFlow Lite. One thing that we've been asked for a lot is adding more operations. The more operations that we add, the more models can be run from TensorFlow out of the box. The other thing that happens with machine learning that's often difficult is that researchers come up with new techniques all the time, and that means that TensorFlow is always adding operations. So we're going to continue to follow TensorFlow as it adds important operations and add them into TensorFlow Lite as well. The second thing we're going to do is improve the tooling, provide better documentation and tutorials, and try to focus on ease of use, so it's really easy for you to understand, with end-to-end examples, how to integrate TensorFlow Lite. And the third thing, which Sarah already mentioned but I'll mention again, is that we're excited about on-device training. On-device training is really exciting because it allows us to refine a model based on a user's experience. It allows us to decouple that experience from going to the cloud, so if they're disconnected, we can continue improving the model. There are a lot of requests for this. This of course will require more computation on the device, but we're excited about upcoming hardware accelerators that will make this more and more possible.
Okay. One more question before we get into some exciting demos: when should I use TensorFlow Lite? As we alluded to before, we're starting to use TensorFlow Lite for our first-party applications, and third-party applications are also using it. That means that, moving forward, we're working to make TensorFlow Lite our standard solution for running ML on small devices and mobile devices. TensorFlow Lite currently supports a subset of TensorFlow ops. This means that our recommendation is that you should use TensorFlow Lite if you can, and let us know about any missing functionality you need.
We're not quite done. You probably want to see our demos. So with that, I want to show you a video of retraining a model. We showed you that TensorFlow logo being recognized. This is a common thing that we get, which is that people like our pre-trained examples like MobileNet, but they may not have an application where they need to tell the difference between five dog breeds and many zoo animals. They might have an office environment where they have markers and whiteboards. In fact, when we were testing the app, we found we had this issue too: we don't have the classes that are in these pre-trained models. So one of the great things that one of our other TensorFlow members created was something called TensorFlow for Poets. There was a codelab about that, and it's available online as well. It basically allows you to take a pre-trained image model that has really good detection ability and put your own classes into it. I want to show you a demo app that we created that runs on the PC and creates TensorFlow Lite models for you. So can we go to the video?
Okay, so as we showed you before, can we recognize scissors and Post-it notes? Let's try it out. We always want to try these models. Okay, the scissors look good, right? Great. The Post-it notes also look good. But what if we had another object, an object that's, you know, more important, like this metal TensorFlow logo? This happens in everyday life, right? Okay, let's take a look at how this does. Well, it's labeled as "other". That's not very good. But the great thing about machine learning is that we can fix this, and the way we fix it is we add data. Our application has gone to the training tab now, and now we're going to define a class called "tensorflow", which is basically short for TensorFlow logo. Now, from a webcam, we're going to capture a couple of different perspectives, as many as we can, and ideally you would take them on different backgrounds, so it doesn't associate the background with it being a TensorFlow logo. Then I click the train button, and right now it's using TensorFlow to train the model. You can see it's converging to a good validation accuracy. Then I reload the model, and we're testing it in TensorFlow Lite running on the PC right now, and we see that it's recognizing the TensorFlow logo correctly. So it's that fast and easy. But also, we can take that model and move it to Android and iOS and use the exact same model and update it. So, thanks. Now let's move to a live demo. I'm going to go over here to the podium. All right.
Okay, so classification, which I just showed you, is this idea that you put an image in and you get classifications out. But what if you have multiple objects in the scene, or you have something in the corner of an image? You also want to know where in the scene that object is. That brings in a type of model called single shot detection. It turns out that our friends in TensorFlow research released a package called object detection that's part of TensorFlow models, and it basically allows you to use their pre-trained model that recognizes many classes. So what I've done is load that onto a small device. We've shown you a lot of things about mobile devices; I want to show you another small device. This is a Raspberry Pi. The Raspberry Pi is a really cool example of a device because it's very cheap and easy to get, so any high school student can have one of these. You can have many of these and just use them for a dedicated project. But the other great thing about them is that not only are they relatively powerful, they're also able to interface with other hardware. They have GPIO pins, and they can be utilized in a number of different ways. One way is to run Linux, and that's what we're doing here. But you can also use Android Things; the sandbox has many examples doing that. In this case, I have the system board right here, and it's connected to a motor controller. This motor controller is just a microcontroller that interfaces to servo motors. The servo motors can go up and down, left and right, and they can basically aim the camera. So now what we're going to do is load our object detection model onto this device and actually run it in order to recognize different objects. So let's go to the demo feed, please.
And we can see my app. You can tell by the beautiful nature of this app that I'm not a great app developer, but this is what I can do on a weekend, so give me a little bit of slack. Okay, so here, if we hold up the apple, it's recognizing the apple, and it's telling us with what probability and where the object is. Now, that's all good and fine, but let's couple it with the ability to move. I'm going to turn on the motors now, and now I'm going to bring back the app. What I'm going to do is move the apple in the screen, and it's going to try to keep it centered. So as I move this apple, it's basically going to try to keep it centered. It's like a little virtual camera person. And it works on other objects, like this banana here. Hopefully. There we go, and it's going to keep that centered. And if you put two objects in the screen, it's going to try to keep them both in. Okay, so you get a little bit of false detections, but basically it's going to try to keep them
both centered. So this is really a fun application, and I bet you can come up with many different applications that are really exciting to do with this type of application. So can we go back to the slides again?
Like I said, this is basically what I can do in a weekend, but I imagine great app developers and people with a lot of creativity about connecting devices and connecting software can do many interesting things. So what I want to do now is tell you, in summary: TensorFlow Lite, you've seen a lot about it. Basically, we feel that TensorFlow Lite makes on-device ML small, fast, and easy, and we think that you're going to find it very useful in all of your applications, and we're excited to see what you build. Come talk to us. I'm going to be in office hours at 12:30, so you can come talk to me. In addition, you can come to our sandbox if you haven't already. We have, of course, the examples that I showed you here: we have the tracking camera, and we also have the object classification on a mobile device. But another cool thing that we have is the Donkey Cars. This was done by a group outside of Google, and we converted them over to TensorFlow Lite, and we are excited to say that their application works really well with TensorFlow Lite as well.
So with that, I hope you check these things out. If you want to get started, you can go to our documentation page. You can go to the tensorflow.org page, and there is a TF Lite page where you can find more information about it. In addition, our code is all open source. You can get it on GitHub. You can download it, modify it, submit a pull request, and of course file any issues that you have while using it. In addition, if you want to talk about TensorFlow Lite, talk about your application, or ask us about feature requests, please send a message to our mailing list. This community is really exciting. We found that in open-sourcing TensorFlow, we got a lot of excitement, we got a lot of interest, and we made it a much better piece of software to use for everyone, both people inside Google and outside Google, and we hope that you'll engage with TensorFlow Lite in the same way that TensorFlow has been engaged with.
So with that, I want to thank you for your attention, for coming to I/O, and for listening to our talk about TensorFlow Lite. I also want to thank our Google partners. This product didn't come out of isolation. It came from all of our experience building mobile apps with machine intelligence. As we gained experience, we found that there was a common need, and that was the genesis of TensorFlow Lite. All of our partners provided applications, provided feedback, even provided code and helped with our models. So thank you so much to you and to them, and enjoy the rest of I/O.