I currently work on TensorFlow within Google Brain, with a focus on making it more efficient on mobile systems. Previously, I worked on Android Things, where I helped launch some of the first Google Assistant smart devices, and in Google Payment Fraud, where I built various ML models and graph mining systems for fraud and abuse prevention. My graduate program at CMU focused on machine learning, with research geared towards its applications in mobile context-aware computing and natural language. I've also developed several full-stack, large-scale production systems and applications at ThoughtWorks, Capillary Technologies and Akamai.
About the talk
The TensorFlow Model Optimization Toolkit is a suite of tools for optimizing ML models for deployment and execution. We will talk about two recent improvements in the toolkit: LSTM quantization support and Quantization Aware Training support in Keras.
Hi everyone, my name is Jian. I'm a software engineer on the TensorFlow team. Today my colleague Pulkit and I will be talking about the TensorFlow Model Optimization Toolkit. Model optimization means transforming your machine learning models to make them efficient to execute. That means faster computation as well as lower memory, storage, and battery usage. It is focused on inference instead of training. And because of the above-mentioned benefits, optimization can unlock use cases that are otherwise impossible.
Examples include speech recognition, face unlock, object detection, music recognition, and many more. The Model Optimization Toolkit is a suite of TensorFlow and TensorFlow Lite tools that make it simple to optimize your model. Optimization is an active research area and there are many techniques. Our goal is to prioritize the ones that are general across model architectures and across various hardware accelerators. There are two major techniques in the toolkit: quantization and pruning. Quantization simulates float calculation in lower bits, and pruning forces zero interconnection.
Today we are going to focus on quantization, and we will briefly talk about pruning. Now let's take a closer look at quantization. Quantization is a general term describing technologies that reduce the numerical precision of static parameters and execute the operations in lower precision. Precision reduction makes the model smaller, and lower-precision execution makes the model faster.
Now let's dig a bit more into how we perform quantization. As a concrete example, take a tensor of float values. In most cases we are wasting most of the representation space in the float number line. If we can find a linear transformation that maps the float values onto int8, we can reduce the model size by a factor of four. Computation can then be carried out between int8 values, and that is where the speed-up comes from.
There are two main approaches to quantization: post-training and during training. Post-training quantization operates on an already trained model and is built on top of the TensorFlow Lite converter. During-training quantization performs additional weight fine-tuning, and since training is required, it is built on top of the TensorFlow Keras API.
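The linear mapping from float values onto int8 described above can be sketched in plain Python. This is a minimal illustration, not the TensorFlow Lite implementation: the function names (`choose_qparams`, `quantize`, `dequantize`) and the sample weights are assumptions made for this example, and it uses a single scale and zero point for the whole tensor. The factor of four comes from storing each value in 1 byte (int8) instead of 4 bytes (float32).

```python
def choose_qparams(x_min, x_max, qmin=-128, qmax=127):
    """Return (scale, zero_point) mapping [x_min, x_max] onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Float -> int8: q = round(x / scale) + zero_point, clamped to the int8 range."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """int8 -> float: x is approximately scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical weights, chosen only for illustration.
weights = [-1.0, -0.25, 0.0, 0.5, 1.5]
scale, zero_point = choose_qparams(min(weights), max(weights))
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)

for w, q, r in zip(weights, q_weights, restored):
    print(f"{w:+.4f} -> {q:4d} -> {r:+.4f}")
```

The round trip is lossy: each restored value can differ from the original by up to half a quantization step (`scale / 2`), which is the representation error that the rest of the talk is concerned with.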
Different techniques offer a trade-off between ease of use and model accuracy. The easiest technique to use is dynamic range quantization. There can be some accuracy loss, but we get a two to three times speed-up. Because floating-point calculation is still needed for the activations, it is only meant to run on CPU. If we want extra speed-up on CPU, or want to run the model on hardware accelerators, we can use integer quantization. It runs a small set of unlabeled calibration data to collect the min-max range on activations. This removes the floating-point calculation in the compute graph, so there is a speed-up on CPU. More importantly, it allows the model to run on hardware accelerators such as DSP and TPU, which are faster and more energy efficient than CPU. And if accuracy is a concern, we can use quantization-aware training to fine-tune the weights. It has all the benefits of integer quantization, but it requires training.
Now let's look at an operator-level breakdown of post-training quantization. Dynamic range quantization is fully supported, and integer quantization is supported for most of the operators.
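The calibration step mentioned for integer quantization, running a small unlabeled dataset to collect the min-max range of activations, can be sketched as a simple observer. This is an illustrative sketch under assumptions: `MinMaxObserver` and the sample batches are hypothetical, and real converters may use more elaborate range estimation than a plain running min/max.

```python
class MinMaxObserver:
    """Tracks the running min/max of activation values seen during calibration."""

    def __init__(self):
        self.min = float("inf")
        self.max = float("-inf")

    def observe(self, batch):
        # Only the values are needed; calibration data does not require labels.
        self.min = min(self.min, min(batch))
        self.max = max(self.max, max(batch))

    def qparams(self, qmin=-128, qmax=127):
        """Turn the observed range into an int8 scale and zero point."""
        lo, hi = min(self.min, 0.0), max(self.max, 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zero_point = int(round(qmin - lo / scale))
        return scale, max(qmin, min(qmax, zero_point))


# Hypothetical activation batches standing in for a calibration run.
calibration_batches = [[0.1, 0.9, 2.0], [-0.5, 0.3, 1.2], [0.0, 3.1, 0.7]]

observer = MinMaxObserver()
for batch in calibration_batches:
    observer.observe(batch)

scale, zero_point = observer.qparams()
print(observer.min, observer.max, scale, zero_point)
```

Once the range is fixed ahead of time like this, activations no longer need floating-point handling at inference, which is what makes integer-only execution on accelerators possible.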
The missing piece is recurrent neural network support, and that blocks use cases such as speech and language, where context is needed. To unblock those use cases, we have recently added recurrent neural network quantization and built a turnkey solution through the post-training API. RNN models can be converted and quantized with the post-training API. This slide shows the end-to-end workflow in the post-training setup. We create the TensorFlow Lite converter and load the saved RNN model. We then set the post-training optimization flags and provide calibration data. After that, we are able to call the convert method to convert and quantize the model. This is the exact same API and workflow as for models without RNNs, so there is no API change for end users.
Let's take a look at the challenges of RNN quantization. Quantization is a lossy transformation. An RNN cell has a memory state that persists across multiple time steps, so quantization errors can accumulate in both the layer direction and the time direction. An RNN cell contains many calculations, and determining the number of bits and the scale is a global optimization problem. Also, quantized operations are restricted by hardware capabilities: some operations are not allowed on certain hardware platforms.
We solved these challenges and created the quantization spec for RNNs. The full spec is quite complicated, and this slide shows the spec by zooming into one of the LSTM gates. As mentioned, there are many calculations in one cell. To balance performance and accuracy, we keep 8-bit calculations as much as possible and only go to higher bits when required by accuracy.
As you can see from the diagram, matrix-related operations are in 8 bits, and vector-related operations are a mixture of 8-bit and 16-bit. Please note that the use of higher bits is only internal to the cell: the input and output activations of the RNN cell are all 8 bits.
Now that we have seen the details of RNN quantization, let's look at the accuracy and the performance. This table shows some published accuracy numbers on a few datasets. It is a speech recognition model that consists of 10 layers of quantized LSTM. As you can see, the integer-quantized model has the same accuracy as the dynamic-range quantized model, and the accuracy loss is negligible compared with the float case. Also, this is a pruned model, so RNN quantization works with pruning as well. As expected, there is a four times model size reduction, because the weights are quantized to 8 bits. Performance-wise, there is a two to four times speed-up on CPU and more than 10 times speed-up on DSP and TPU, consistent with other operators.
So here are the main takeaways. TensorFlow now supports RNN/LSTM quantization, which is a turnkey solution through the post-training API. It enables smaller, faster, and more energy-efficient execution that can run on DSP and TPU. There are already production models that use the quantization; please check the link for more details on the use cases. Looking forward, our next step will be to expand quantization to other recurrent neural networks, such as GRU and SRU. We also plan to add quantization-aware training for RNNs. Now I'll hand it over to my colleague Pulkit. Thank you.
Thanks, Jian. Hi, my name is Pulkit. I work on the Model Optimization Toolkit, and let's talk about quantization-aware training.
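The post-training workflow from the first half of the talk (create the converter from a saved model, set the optimization flags, provide calibration data, call convert) can be sketched with the public TensorFlow Lite converter API. This is a hedged sketch, not the code from the slide: the helper names, the saved-model path argument, and the `(1, 28, 28)` input shape are assumptions for illustration, and TensorFlow and NumPy are imported inside the functions only so the helpers can be defined without either package installed.

```python
def example_representative_dataset():
    """Yield a few unlabeled input samples for calibration.

    The (1, 28, 28) shape and the random data are placeholders; a real
    generator would yield samples shaped like the model's actual inputs.
    """
    import numpy as np

    for _ in range(100):
        yield [np.random.rand(1, 28, 28).astype(np.float32)]


def convert_with_integer_quantization(saved_model_dir, representative_dataset):
    """Convert a SavedModel to a quantized TensorFlow Lite flatbuffer."""
    import tensorflow as tf

    # 1. Create the converter and load the saved model.
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # 2. Set the post-training optimization flag.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # 3. Provide calibration data so activation ranges can be collected.
    converter.representative_dataset = representative_dataset
    # 4. Convert and quantize; the result is the serialized model as bytes.
    return converter.convert()
```

A call would look like `convert_with_integer_quantization("path/to/saved_model", example_representative_dataset)`, where the path is a placeholder. As the talk notes, the same calls apply whether or not the model contains RNN layers.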
Sorry, can I go back a slide? Quantization-aware training is a training technique for improving the accuracy of quantized models. The way it works is that we introduce some of the errors that actually happen during quantized inference into the training process, and that helps the trainer learn around these errors and get a more accurate model.
Let's try to get a sense of why this is needed in the first place. Quantized models run in lower precision, and because of that, quantization is a lossy process, which leads to an accuracy drop. And while quantized models are fast, nobody wants an inaccurate model. So the goal is to get the best of both worlds, and that's why we have this system.
To get a sense of why these losses get introduced: one is that once we have quantized models, the parameters are in lower precision. So in a sense you have coarser information, fewer buckets of information. That's where you have information representation loss. The other problem is that when you are actually doing these computations, you have computation loss, since you are adding two coarse values instead of finer buckets of values. Typically, during matrix-multiplication-type operations, you accumulate these values to int32 and then rescale them back to int8, so you have rescaling loss. The other thing is that, generally, when we run these quantized models during inference, there are various inference optimizations that get applied to the graph, and because of that the training graph and the inference graph can be subtly different, which can also potentially introduce some of these errors.
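The accumulate-then-rescale step described above can be made concrete with a small integer dot product. This sketch assumes symmetric quantization (zero point 0) for inputs, weights, and output to keep the arithmetic short; the function names and the sample values are hypothetical, and production kernels typically use fixed-point multipliers rather than a float multiplier.

```python
def requantize(acc, input_scale, weight_scale, output_scale, qmin=-128, qmax=127):
    """Rescale a wide accumulator back to int8; this rounding is the rescaling loss."""
    multiplier = input_scale * weight_scale / output_scale
    return max(qmin, min(qmax, int(round(acc * multiplier))))


def quantized_dot(q_x, q_w, input_scale, weight_scale, output_scale):
    """int8 x int8 dot product with a wide accumulator, then rescale to int8."""
    acc = 0  # would be an int32 register on real hardware
    for a, b in zip(q_x, q_w):
        acc += a * b
    return acc, requantize(acc, input_scale, weight_scale, output_scale)


# Hypothetical int8 inputs/weights and scales, chosen only for illustration.
q_x = [100, -50, 25]
q_w = [64, 127, -128]
input_scale, weight_scale, output_scale = 0.02, 0.01, 0.05

acc, q_out = quantized_dot(q_x, q_w, input_scale, weight_scale, output_scale)
exact = acc * input_scale * weight_scale   # real value the accumulator represents
approx = q_out * output_scale              # real value after rescaling to int8
print(acc, q_out, exact, approx)
```

The difference between `exact` and `approx` is bounded by half an output step (`output_scale / 2`), on top of whatever error was already present in the quantized inputs and weights.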
And how do we recover the lost accuracy? Well, for starters, we try to make the training graph as similar as possible to the inference graph to remove these inconsistencies. The other one is that we introduce the errors that actually happen during inference, so the trainer learns around them and, you know, machine learning does its magic.
For example, when it comes to mimicking errors, as you can see in the graph here, you go from weights to lower precision. Let's say your weights are in floating point: you go down to int8, and then you go back up to floating point. So in a sense you have mimicked what happens during inference when you are executing at lower precision. Then you do your computation, and because both your inputs and your weights are at int8, the losses have been introduced and the computation happens correctly. After the computation, you add another fake quant to drop back to lower precision.
The other thing is modeling the inference path. For example, if you noticed in the previous slide, the fake quant operation came after the ReLU activation. This is one of the optimizations that happen during inference: the ReLUs get folded in. And what we do is that when we are constructing the training graph, we make sure that these sorts of optimizations are modeled.
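The "go down to int8, then back up to floating point" step is commonly called fake quantization. The sketch below shows the forward pass of one tiny dense unit with fake quant applied to the weights and again after the ReLU, mirroring the placement described above. It is a simplified illustration under assumptions: the function names, ranges, and values are hypothetical, the input is assumed to arrive already fake-quantized from the previous layer, and unlike production implementations the range is not nudged so that 0.0 is exactly representable.

```python
def fake_quant(values, x_min, x_max, num_bits=8):
    """Quantize to num_bits levels within [x_min, x_max], then dequantize to float."""
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    out = []
    for v in values:
        clipped = max(x_min, min(x_max, v))
        q = round((clipped - x_min) / scale)  # integer level index
        out.append(x_min + q * scale)         # back to float, now on the grid
    return out


def dense_relu_with_fake_quant(x, w, w_range, out_range):
    """One dense unit: fake-quantized weights, float math, ReLU, then fake quant."""
    w_fq = fake_quant(w, *w_range)
    pre_activation = sum(a * b for a, b in zip(x, w_fq))
    activation = max(0.0, pre_activation)
    # Fake quant comes after the ReLU, matching the fused op used at inference.
    return fake_quant([activation], *out_range)[0]


# Hypothetical inputs, weights, and ranges for illustration.
x = [0.5, -1.0, 2.0]
w = [0.3, -0.7, 0.2]

y_float = max(0.0, sum(a * b for a, b in zip(x, w)))
y_fq = dense_relu_with_fake_quant(x, w, w_range=(-1.0, 1.0), out_range=(0.0, 6.0))
print(y_float, y_fq)
```

Because the output stays in float but only takes values on the quantization grid, the training loss sees the same kind of error the quantized model will see at inference. The backward pass, which this sketch omits, typically treats the rounding as identity (a straight-through estimator).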
And let's get to the numbers. The numbers are pretty good: the quantized accuracy is almost as good as float, and that is really powerful, because you can execute a quantized model that gives you nearly the same accuracy.
So what's the value to users? On the one hand, there is a simple one-line API that you can use to quantize your model, convert it, and go ahead and execute it. This works great for app developers and engineers. Then we have a slightly more complicated API where you can configure your quantization, which is useful for ML engineers and researchers. And if you want to go completely out there, you can completely control the quantization: algorithms, schemes, different bits, etc., whatever you want, and that is fertile ground for researchers or hardware engineers.
So how do you quantize a Keras model? You construct the model, pretty much the same as you typically would. You import the TensorFlow Model Optimization package, you construct the model, you quantize the model, and then you just go ahead and do your compile and fit; all of that continues as before.
Now, you might not want to quantize the entire model. Maybe you want to quantize only part of your model, because some layers are the most sensitive to quantization, or because you want to get the most performance. For that case, you have a quantize-annotate API that lets you mark which layers you want to quantize, and then you apply it at the end, and you are good to go.
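The two Keras workflows just described, quantizing a whole model in one call and annotating only selected layers, can be sketched with the `tensorflow_model_optimization` package. This is a hedged sketch rather than the slide code: the helper names and the layer sizes are assumptions, the imports are placed inside the functions only so the helpers can be defined without the packages installed, and it assumes an environment where the toolkit is compatible with the installed Keras version.

```python
def make_fully_quantized(model):
    """Wrap every supported layer of a Keras model for quantization-aware training."""
    import tensorflow_model_optimization as tfmot

    # One call; afterwards compile() and fit() are used as usual.
    return tfmot.quantization.keras.quantize_model(model)


def make_partially_quantized():
    """Build a small model where only the annotated layer is quantized."""
    import tensorflow as tf
    import tensorflow_model_optimization as tfmot

    annotate = tfmot.quantization.keras.quantize_annotate_layer

    # Layer sizes and input shape are placeholders for illustration.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(20,)),
        annotate(tf.keras.layers.Dense(10, activation="relu")),  # quantized
        tf.keras.layers.Dense(2),                                # left in float
    ])
    # Annotation only marks layers; quantize_apply performs the rewrite.
    return tfmot.quantization.keras.quantize_apply(model)
```

In both cases the returned model is trained with the ordinary `compile` and `fit` calls, and can then be passed to the TensorFlow Lite converter to produce the quantized model.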
Beyond that, you might want to control the quantization itself. For example, you have a particular layer and you want to control how it gets quantized. In that case it is a pretty similar API: you use quantize annotate and pass in a quantize config to the infrastructure. Let's look at how you define this quantize config. A quantize config is largely two things. One is what it is within the layer that you want to quantize, and the other is how you want to quantize it. You tell us what to quantize, and the other thing you tell us is a quantizer. A quantizer is basically an object that encapsulates the algorithm for quantization. We give you a bunch of built-in ones, but you can write your own.
You might also want to quantize a custom layer. Let's say you have your own special layer, like a fancy convolution layer that you wrote, and you want to quantize it as well. You do it in almost exactly the same way: you pass this config, which tells us how we should quantize your fancy layer, and again you tell us how to quantize it. In this case, it could look like a histogram quantizer. You completely control what sort of strategy is used to quantize your model; here you could use a histogram to determine the range and then quantize. And that is how you would write it out. It is pretty simple: you just implement two methods. One of them, build, lets you construct any variables you need, and then in the call method we give you a bunch of tensors and you quantize them.
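The two-method shape of a custom quantizer (one method to create state, one that receives values and quantizes them) can be illustrated with a standalone, histogram-based example in plain Python. This class is hypothetical and does not subclass the toolkit's real `Quantizer` base class, whose methods take additional arguments; the outlier-trimming rule and the `keep_fraction` parameter are also assumptions made for this sketch.

```python
class HistogramRangeQuantizer:
    """Hypothetical quantizer: clips outliers via a histogram, then fake-quantizes."""

    def __init__(self, num_bits=8, num_bins=64, keep_fraction=0.99):
        self.levels = 2 ** num_bits - 1
        self.num_bins = num_bins
        self.keep_fraction = keep_fraction

    def build(self):
        """Create the state the quantizer needs: the range it will quantize to."""
        self.range_min = None
        self.range_max = None

    def _update_range(self, values):
        lo, hi = min(values), max(values)
        if hi == lo:
            self.range_min, self.range_max = lo, hi
            return
        width = (hi - lo) / self.num_bins
        counts = [0] * self.num_bins
        for v in values:
            counts[min(int((v - lo) / width), self.num_bins - 1)] += 1
        # Repeatedly drop the emptier edge bin while at most
        # (1 - keep_fraction) of the values would end up clipped.
        budget = len(values) * (1.0 - self.keep_fraction)
        first, last, dropped = 0, self.num_bins - 1, 0
        while first < last:
            drop_first = counts[first] <= counts[last]
            cost = counts[first] if drop_first else counts[last]
            if dropped + cost > budget:
                break
            dropped += cost
            if drop_first:
                first += 1
            else:
                last -= 1
        self.range_min = lo + first * width
        self.range_max = lo + (last + 1) * width

    def __call__(self, values, training=True):
        """Quantize-dequantize `values` to the histogram-derived range."""
        if training or self.range_min is None:
            self._update_range(values)
        lo, hi = self.range_min, self.range_max
        if hi == lo:
            return list(values)
        scale = (hi - lo) / self.levels
        return [lo + round((max(lo, min(hi, v)) - lo) / scale) * scale for v in values]


# 999 well-behaved values plus one large outlier (illustrative data).
values = [i / 999 for i in range(999)] + [100.0]

quantizer = HistogramRangeQuantizer()
quantizer.build()
quantized = quantizer(values)
print(quantizer.range_min, quantizer.range_max)
```

With a plain min-max range, the single outlier would stretch the 256 levels across 0 to 100 and leave the bulk of the values with very coarse resolution; trimming the sparse tail keeps the resolution where the data actually is, at the cost of clipping the outlier.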
And it doesn't end here. We also give you the ability to go further [inaudible]. Like I mentioned earlier, you can fuse ReLUs, for example. So you can define your own transforms, which carry out whatever manipulations you want to do.
In summary, quantization-aware training is an API that helps you recover your accuracy while getting the benefits of quantization. It is pretty simple for easy adoption, quite flexible if you want to do more complicated things, and it can be extended [inaudible]. On the sparsity side, you now have a story where you can train sparse models and execute them on device. You can also use quantization and sparsity together, and that is pretty powerful when they go together.
So that's the Model Optimization Toolkit. It is a suite of tools that make your models faster and smaller. Quantization and sparsity are the main techniques that we have. You can find us on GitHub under model-optimization. Please file any requests, concerns, bugs, or feedback. We are always working on making those models smaller and faster. Thank you.