[Arm DevSummit - Session] uTVM, an AI Compiler for Arm Microcontrollers

Thomas Gall
Director at Linaro Consumer Group
Arm DevSummit 2020
October 6, 2020, Online, San Jose, USA

About speaker

Thomas Gall
Director at Linaro Consumer Group

Tom's Linux experience started by sacrificing a pile of OS/2 install floppies to install SLS. Tom worked for IBM's Linux Technology Center before joining Linaro. As director of the Linaro Mobile Group, he oversees collaborative engineering involving Android and the Linux Kernel by a wide range of SoC vendors, handset vendors and Google.

About the talk

Abstract: TVM is an open-source deep learning compiler stack that can consume models from many well-known frameworks and produce optimized inference for a variety of hardware targets. This talk will introduce you to MicroTVM, which specifically targets microcontrollers. The talk will discuss the architecture and status of the project, and conclude with a demo.

Presenters: Thomas Gall, Director, Linaro

Technical Level: Intermediate

Target Audience: Architect, Software Developer

Topics: #ArtificialIntelligence, #IoT, Microcontrollers, Linaro, Open Source, Machine Learning, #ArmDevSummit

Type: Technical Session

Conference Track: AI in the Real World: From Development to Deployment

Air Date: 2020-10-06

Transcript

Welcome, everyone, to this session about MicroTVM, an AI compiler for Arm microcontrollers. To quickly introduce myself: my name is Tom Gall, and I lead the AI project within Linaro. Today I'll talk about the TVM project, as well as the MicroTVM project, which is a subset of the TVM project.

So, let's begin and introduce: what is TVM? As I mentioned, it's an open-source AI compiler. By open source, I'm referring to the license it is made available under. All of the source code is available; you can build it yourself, you can modify and change it any way you want, and you can make those modifications available for others through the TVM community. The TVM project is really a framework, so it's not just a single tool; it's a set of libraries and capabilities that is able to import and digest models from a wide variety of machine learning frameworks, and to do so in an agnostic kind of way. Within this compilation stack that is TVM, there are then a variety of ways to target the different CPUs, GPUs and offload engines that are available throughout the industry. It's a fairly new project; it has only been around for a couple of years, and it has just exited what is called incubation, which is where it sat as it got started, graduating into a more mature open-source project within the Apache Foundation. As mentioned, it is under an Apache 2.0 license.

On the slide there is a pointer to the TVM site, tvm.apache.org, if you want to go and explore the web pages, documentation and so on. And, as mentioned, it does support a wide range of hardware, some of it quite large.

So let's dive a little bit more into the TVM architecture: what is it, and why would it be interesting to the Arm ecosystem? First and foremost, it is a set of tools centered around an AI compiler architecture. It has the ability to digest models from a wide variety of AI frameworks: from the ONNX project, Keras, MXNet, TensorFlow, TensorFlow Lite. It can ingest a model from any one of those projects, and that makes TVM particularly useful for those wanting to do inference on Arm devices. Next, once you've ingested a model, what happens is that the model gets changed into an independent representation, an IR language if you will, and that IR is then able to be run through a variety of optimizers that tune for what the model needs to accomplish in order to perform inference. And then the last mile is, of course, translation to your specific type of hardware. Now, historically, where TVM has been used is on higher-end pieces of hardware, like Arm servers or Cortex-A devices, so things at the lower end are actually fairly new when it comes to TVM as a project.
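
As a rough illustration of that flow (framework front end, Relay IR, optimization passes, code generation for a target), here is a minimal, hedged sketch using TVM's Python API. The model file name, input name and shape are placeholders, and the exact return value of relay.build has varied across TVM releases:

```python
import onnx
import tvm
from tvm import relay

# Placeholder model and input description; substitute your own network.
onnx_model = onnx.load("model.onnx")
shape_dict = {"input": (1, 3, 224, 224)}

# Framework front end: translate the ONNX graph into Relay IR.
mod, params = relay.frontend.from_onnx(onnx_model, shape=shape_dict)

# Run the optimization passes over the Relay IR and generate code for a
# 64-bit Arm Linux CPU via LLVM. Older releases return a
# (graph, lib, params) tuple instead of a single factory module.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm -mtriple=aarch64-linux-gnu", params=params)
```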

Lastly, I want to mention something fairly exciting and unique to TVM, shown on the right-hand side of the slide, and that is something called AutoTVM. It's an optimizer, and what it is able to do is use feedback to search through a space over successive generations, to optimize your code and end up with a faster, better-performing model. That makes TVM, in many ways, something that is using artificial intelligence to end up with a better-performing artificial intelligence model.

We should think about the performance of TVM and compare it to existing frameworks. So why would you turn to TVM as compared to just running something within, say, TensorFlow? Well, it turns out that when you use AI compiler technology, you do achieve better performance. This comes through that IR layer, where a set of successive optimizations is made against the model, and you can realize performance gains by making better choices for the optimized code, which will perhaps be targeted at an offload engine such as a GPU or a DSP, or even at a CPU. So in this case, what I've done is take Arm64 runs done on a ThunderX machine and put up some comparative numbers. We're using ImageNet, with MobileNet version 1 all the way through MobileNet version 2, and then Inception version 3, and I'm comparing the measured times, so a lower number is better; a higher number means it took more time, and you want your inference to happen faster. And, as we see here, TVM was able to do quite well in its ability to optimize for Arm.

The great thing about it is that this is TVM optimizing for inference; it is not TVM optimizing specifically for an Arm processor, it is just making good choices in general. But this talk is all about microcontrollers and devices in the small. So in this case I've got a board from the STM32F7 series, which has a Cortex-M7 on it (I have one of the boards right here), and we're able to take a model, such as one from TensorFlow, and using TVM we have the ability to run it on this class of hardware. We are able to do that through the MicroTVM project.

So before we go on and explore what the MicroTVM project is, let's think about the problems faced when performing inference on Cortex-M devices. There are some special, unique cases that make doing inference on microcontrollers different from doing it on a larger server. Firstly, in general, TVM has this capability to digest models from a variety of different sources, and that in particular makes TVM appealing. But when we're working in the microcontroller environment in particular, we have to be very cognizant of the models that we're importing, because of size restrictions. Oftentimes, when you're looking at off-the-shelf models from some of the model zoos, or at ones that you might create yourself from a framework like TensorFlow through a Jupyter notebook, they are large. So you often need to go and quantize your model, from a float representation internally to something which is more integer-based, in order to reduce the size of the model.

And this is something where TVM, once you've done that quantization step outside, is able to load in that quantized model and make use of it.

Across the microcontroller landscape, there is a wide variety of Arm-based microcontrollers out there, there are quite a number of AI frameworks out there, and there is a wide variety of Arm hardware out there. You take all of those things together and you end up with a wide Cartesian product of hardware and software that you may need to be targeting, and that's pretty complicated to work with, until a framework like TVM, which can get you to binaries, helps make it easier. That's the vision going forward. Now, the other thing that you have in the microcontroller environment is that we need to optimize not just for time, but also for the power that we are consuming. We need to fit ourselves into some very aggressive working-set sizes. We have to consume as little RAM as possible, and we need to think about the offload that we may have available, or may not have available at all; maybe we have to do the inference entirely on the CPU, and we have to do that inference at the same time that other things are going on within the microcontroller, such as working with sensors or with peripherals. So these are the kinds of things that are unique to the microcontroller environment. As I mentioned earlier, quantization is a particularly important thing that we need to do for running inference on microcontrollers.

The size benefits that you can gain from slimming down, from a 32-bit or 64-bit based model, to something where values maybe fit into a byte, or perhaps even four bits, as long as you can keep your accuracy up, are what will allow your model to run on the device.
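
Quantization itself happens outside TVM, in the training framework. As one hedged example, a TensorFlow Lite model can be reduced to int8 with the TFLite converter's post-training quantization; `keras_model` and `rep_samples` below are placeholders for your own trained model and a small calibration set:

```python
import tensorflow as tf

def representative_dataset():
    # A few hundred representative inputs are typically enough to calibrate
    # the int8 ranges; rep_samples is a placeholder for your own data.
    for sample in rep_samples:
        yield [sample.astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# The resulting flatbuffer is typically around 4x smaller than the
# float32 model, which is what makes it viable on a microcontroller.
tflite_quant_model = converter.convert()
open("model_int8.tflite", "wb").write(tflite_quant_model)
```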

Application logic is something that needs to sit side-by-side with the code where the inference is occurring; on a Cortex-M you might be running RTOS-less, or within an RTOS, and we have to make that integration as easy as possible. So, in supporting this range of hardware, and in supporting optimizations which aren't just time-based but also power-based, there are a lot of unique issues that need to be delivered on when it comes to performing AI on a microcontroller.

So let's now jump into MicroTVM as a project. What is MicroTVM, and what is it hoping to accomplish? The very first thing that I want to make very, very abundantly clear is that MicroTVM is a work in progress.

It has gone from a proof of concept to what is now starting to be really alpha-level code. It is not product-ready; it is not something that you want to take and deploy in a product, not yet. Really, the vision of the project, and the work ahead of us over the next few months, is to graduate the code that is there, make it mature, and get it to the point where it is product-ready. But I do think it's important to know about it as a project, because this is something that is going to be useful and something that you can use in the future, and that's what makes it important to hear about today.

So, thinking back to the prior slide and about doing inferencing on a microcontroller, what MicroTVM is really trying to accomplish, its main goals, are these. We want to leverage the technology that's within the TVM framework: the engineering that has been put into making a model run optimally and doing optimizations on that model, and to be able to use that on a microcontroller without having to engineer a whole new AI framework. Instead, we get to take advantage of the research and the things that people have already invested in and committed to the project. We need to have a minimal runtime: we need to make sure that when we're doing inference using TVM, we're not creating a large overhead in order to do so; we want to make that as skinny as possible. Then, how do you talk to a dev board? Certainly, when you think about technologies like J-Link or ST-LINK or what have you, there are ways that you can connect your dev board to a host machine and be able to do interactive development.

Well, certainly MicroTVM needs to be able to do the same thing. So our approach, initially, is with openOCD and an RPC server, to allow for interactive development. That means you could be sitting here working in Python to develop a model as well as an application, send that down to the device, test it, and get back results, interactively. That's a pretty important capability. Next, there are the RTOSes that are out there, such as Zephyr, Mbed OS, or FreeRTOS, just as examples; or maybe you want to run RTOS-less, completely bare metal. MicroTVM needs to step up and provide at least the bare minimum that you would need to be able to run on a Cortex-M. Then, lastly, we think about things in terms of the languages that an application developer is going to have to deal with. C, C++ and Python are really the primary languages that MicroTVM deals with.

Really, what you're going to end up doing is something like this. You have a model that you've trained, and the model is ready to go. What you do is import that model, it gets transformed into Relay IR, and then the next step for MicroTVM is to transform the Relay IR into C or C++ code, combine that with the device support and vendor support that you need, and any environmental support, compile that, and ship it down onto the board.

Then you can go around in circles as you improve your code and hone in on your desired solution. So this development workflow is one that is really optimized to be as productive as possible.

So let's talk about the binary that is actually going to find its way onto your end device. What does that look like? Well, there are really three components here, and they're separated by the three columns on the slide. On the left is most of the low-level stack: this is your device start-up, this is your peripheral support, this is the code that goes out and works with the sensor and does whatever needs to be done as far as interacting with the board. This isn't something that is necessarily going to be supplied by MicroTVM; it is going to come from your RTOS or your vendor, and it's really code that's engineered to do the right thing for the board you're working with. The middle column provides a couple of different things. First of all, we have the runtime, a minimal C runtime that just enables MicroTVM to run on the board. Then there is the part that enables that interactive development workflow I was talking about. This isn't something that you would necessarily use in a production deployment setting; it's more for when you're doing development. So you have an RPC server, and it's listening for you to push down binaries, stopping as you need it to as it runs little snippets of code. That RPC server takes care of all the framing and the session information associated with the work that you're doing.

And then, lastly, there is the actual model that you have. This is where your specific pieces live: the code that would be compiled for your board. This might be C code, to use one example, and it then gets linked in, taking advantage of the RAM and the particular setup that you have, whether you have an offload technology that you can turn to or you are running everything on the CPU.

Okay, so let's talk a little bit about where the project is today, how we're tracking things, what the roadmap looks like, and where we're going from here. Very quickly: the roadmap is available for public comment. It is posted publicly, and it has been online now for several months. What you're seeing is that the members of the project have all started jumping in and breaking off pieces to work on the various bits that need to be implemented.

Internal to Linaro (Linaro is a company made up of member companies), what we've done, as we got involved with the project, is step back and ask what it would take for this to be something that is product-level. So what we did is develop a product-level spec that talks about our own vision of where we want the project to go, such that it can be product-ready and be something that is useful to our members, as well as to the community and the people who will use MicroTVM in the future.

Then, inside of Linaro, we also have a weekly sync meeting where we can talk, keep in sync with each other, and coordinate our own activities. And likewise there is the TVM Discuss forum for keeping up with what is going on, and you have pull requests that land, as well as comments on code as it is contributed.

Okay, so next, what I'd like to do is go through an example of what working with MicroTVM looks like. This is a situation where, thinking of yourself as the developer, you need to do inference on a dev board. The board that I chose here is the STM32F7 Discovery board; again, this particular board I have right here is a Cortex-M7-based board. It is able to communicate via openOCD, which is done through a USB cable that connects to the ST-LINK on the board, and it makes for a really nice little board for getting our work done. So in this case, what I have is a set of Python code that I'm going to work through. It is going to load a model; then we're going to run a transform and compile that model; we'll send it down to the board; we'll then give the model a piece of data to use for inferencing; and the model is going to turn around and give us a result, a prediction. In this Python code there are a lot of the usual suspects: if you've been doing inference or working in AI circles for a while, you'll probably immediately recognize things like numpy and so on and so forth.

In the case of the TVM libraries, we have something called tvm.micro, so we go ahead and import that. In the TVM contrib libraries, we have the ability to pull in test data, as well as to pull in our model and do the load for us. Continuing on, in this example I have a TensorFlow Lite model that I have already gone and trained and created. It's a very, very simplistic model. It is something where we give TVM the TensorFlow Lite binary for the model, and we just go ahead and load that into memory, into a flatbuffer. Once we've done that, we're going to transform that model into IR. So in this case, we've loaded it into memory, and with the TensorFlow Lite get-root-as-model call, we set things up so that we can parse the model. The very first thing we do, just as a bit of debugging, is turn around and ask the library, to make sure that the model loaded and that it recognizes it: what version of the model is this? In this case, it comes back and tells us that it's a version 1 model.
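
A hedged sketch of that loading step, roughly following the MicroTVM TFLite flow of that era (the file name is a placeholder; the tflite package provides the flatbuffer bindings):

```python
import tflite  # flatbuffer schema bindings for TensorFlow Lite models

# Read the pre-trained sine model into memory as raw bytes.
tflite_model_buf = open("sine_model.tflite", "rb").read()

# Parse the flatbuffer so the model can be inspected and converted.
tflite_model = tflite.Model.GetRootAsModel(tflite_model_buf, 0)

# Debug check: confirm the model parsed and report its schema version
# (the talk sees version 1 here).
print("TFLite model version:", tflite_model.Version())
```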

Okay, so now what we're going to do is transform the model from a TensorFlow Lite object into a Relay IR model. To do this we have to tell the system a little bit about the model, in this case what the inputs are. Here, the input is a 32-bit floating-point value, there is just one of them, and the name of the tensor is dense_4_input. We go ahead and pass this information in with the shape dictionary and the type dictionary, and we receive back what we need in the form of the module and the params.
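
In code, that conversion step looks roughly like this (the tensor name, shape and dtype follow the example in the talk):

```python
from tvm import relay

input_tensor = "dense_4_input"   # name of the model's single input tensor
input_shape = (1,)               # one float value in
input_dtype = "float32"

# Convert the parsed TFLite model into a Relay module plus its parameters.
mod, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={input_tensor: input_shape},
    dtype_dict={input_tensor: input_dtype},
)
```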

So next, we set up the actual target and device. In this case, our target is going to be C code, and the device is going to be a TVM micro device. Then we specifically set up the type of hardware, the piece of hardware that we're going to talk to; in this case it is an STM32 board, and we create a device configuration for it. With that device configuration, we then set up a MicroTVM session. This is our interactive session, which allows us to send binaries down to the board, as well as data, and to run the model.
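
A hedged sketch of that setup, using the pre-0.8 MicroTVM Python API (module paths and the target string have changed in later TVM releases, and the openOCD host and port here are assumptions):

```python
import tvm
import tvm.micro as micro

# Generate plain C code for a bare-metal micro device.
TARGET = tvm.target.create("c -device=micro_dev")

# Device configuration for the STM32F746-class board, pointing the session
# at a locally running openOCD server.
DEV_CONFIG = micro.device.arm.stm32f746xx.generate_config("127.0.0.1", 6666)

# The MicroTVM session owns the openOCD/RPC link used to flash binaries,
# push input data down to the board, and run the model interactively.
# The build and run steps that follow are performed inside it:
#
#   with micro.Session(DEV_CONFIG) as sess:
#       ...
```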

Then, lastly on this slide, we're going to transform that model and turn it into basically a binary, something that we can ship down onto the device. Before we start to do that, we create a transform configuration and set the optimization level, but we also turn off two things: as it works with the model to optimize it, we specifically disable the fusing of operators, and we disable vectorization. This is a deliberate choice, because in the current state of MicroTVM, if we turn these things on, things will crash; there are some issues we need to work out before we can enable those particular abilities. Next, we build our Relay IR, using the module and the params that we got earlier. Passing in those two objects, we get back a graph, and we get back the C module that is nearly ready to be sent to our target device.
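
The corresponding build step, again as a hedged sketch against that era's API:

```python
from tvm import relay

# Operator fusion and TIR vectorization are disabled here; per the talk,
# MicroTVM in its current state crashes if either is left on.
with tvm.transform.PassContext(
    opt_level=3,
    config={"tir.disable_vectorize": True},
    disabled_pass=["FuseOps"],
):
    # Lower the Relay module to C source for the micro target.
    # (Newer TVM releases return a single factory module instead of the
    # graph / C module / params triple shown here.)
    graph, c_mod, c_params = relay.build(mod, target=TARGET, params=params)
```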

And so now, with micro.create_micro_mod, we create the module. This is how we go from C code to binary, and the result is specific to our board and ready to be sent down to the device. So that is the first step: create_micro_mod creates the binary. The next step is creating the runtime; now we're getting ready to run our binary, setting up the context in which that binary runs.
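
Sketched in code, inside the micro.Session context from above:

```python
from tvm.contrib import graph_runtime

# Cross-compile the generated C module for the board and flash it via the
# open session; the result is callable like any other TVM module.
micro_mod = micro.create_micro_mod(c_mod, DEV_CONFIG)

# Context representing the remote micro device, plus a graph runtime that
# drives the model over the RPC link.
ctx = tvm.micro_dev(0)
rt_mod = graph_runtime.create(graph, micro_mod, ctx)
rt_mod.set_input(**c_params)
```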

Now we're ready to run. After the create steps are done, what we can do is set the input value for our model. This is a value that goes into the input tensor, and we need to provide it in the form the input tensor needs. So, in the next two statements in the middle of the slide, we set the params and then we create an NDArray: a single float32 value of 0.5, with a shape of one. That value is placed into the input tensor and sent down to the board, and then mod.run performs the inference: we ask it to run the model and give us its prediction. Once we return from run, we go back out to the device, get the result back, and print it to the screen; in this case it is a very boring value of around 0.5. What this model is really doing is modeling a sine wave: when you put in a value, it tries to predict the sine of that value. We just read it from the output tensor and write it out to the screen.
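
And the final run step, as sketched (still inside the session):

```python
import numpy as np

# Place a single float32 value, 0.5, into the model's input tensor.
rt_mod.set_input("dense_4_input", tvm.nd.array(np.array([0.5], dtype="float32")))

# Run inference on the board and read the prediction back over RPC.
rt_mod.run()
prediction = rt_mod.get_output(0).asnumpy()
print("prediction:", prediction)  # roughly sin(0.5) for the sine model
```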

In the grand scheme of things it is a rather contrived example, but it does show how you can consume a TensorFlow Lite model, get it into MicroTVM, and be able to run it on a board in just a few lines of Python code.

Okay, so in summary, what I want to do is leave you with a few things about how to get involved with the project. Maybe you're somebody who is a bit more advanced in your interests and would like to get involved with the project and make some contributions. The source code is available on GitHub; it's very easy to go out and get it. We have a contributor guide for TVM that tells you how to format your patches and how to go through the process of contributing them and making them available on GitHub. Basically, we use pull requests. As far as an interactive way to have conversations, propose ideas, or talk about problems that you run into, that's what the Discuss forums are for. There is also a Slack channel available; it is not as active as the Discuss forum, so I would actually ask that you go to the forums as the default communication path. However, there are a few of us who do hang out on Slack, working on MicroTVM or elsewhere within the Arm ecosystem, and certainly the more the merrier, so come and join us.

If you have bugs and issues that you discover with TVM, then again, the GitHub issue system is what we use to track those.

So, that's my presentation. I hope you find some interest in MicroTVM as a project, and I really encourage you to keep an eye on it. I look forward to hopefully coming back at a future date, maybe a future Arm DevSummit or other avenues, where I will be able to sing the praises of what has been going on with MicroTVM and its next stages, as this project evolves and becomes something that is useful to you and your products as you do inference on microcontrollers. So thank you very much.
