Duration 31:52
[Arm DevSummit - Session] Using Arm NN to Develop Edge AI in the Smart City

David Steele
Director of Innovation at Arcturus Networks
Arm DevSummit 2020
October 8, 2020, Online, San Jose, USA

About speakers

David Steele
Director of Innovation at Arcturus Networks
Pavel Macenauer
Software Engineer at NXP Semiconductors

Experienced program manager with a demonstrated history of working in the electrical and electronic manufacturing industry. Skilled in embedded software, management, product development, product marketing and semiconductors. Strong product-management professional and cross-functional team leader; a graduate of Ryerson University.


Pavel currently develops accelerated ML backends running on GPUs/NPUs and enables NXP's eIQ Machine Learning platform. He actively contributes to Linaro's Arm NN framework and was one of the developers behind the Python enablement in its latest release. His past experience includes developing safety-critical RTOS/display systems for Honeywell Aerospace and image-processing applications for photographers.


About the talk

Abstract: This session will examine Arm NN and how to apply it to vision applications as a highly efficient neural-network inference engine for Arm cores, GPUs and NPUs. It will introduce the new Python interface, which brings ease of use to a whole new level, and look into the Arm NN backends that enable hardware acceleration. Achievable real-world performance will be demonstrated using a smart-city public-transportation use case that combines Arm NN with a state-of-the-art detection model and a full vision-processing pipeline.

Presenters: David Steele, Director of Innovation, Arcturus; Pavel Macenauer, Software Engineer, NXP Semiconductors

Technical Level: Intermediate

Target Audience: Hardware Engineer, Software Developer, Other

Topics: Artificial Intelligence, IoT, HPC, Linaro, Linux, Mobile, Open Source, Machine Learning, Arm DevSummit

Type: Technical Session

Conference Track: AI in the Real World: From Development to Deployment

Air Date: 2020-10-08

Transcript

Hi. I am Pavel Macenauer, a software engineer at NXP Semiconductors, and together with David Steele from Arcturus Networks we are going to talk about Arm NN and its deployment in the smart city. In the first part of the presentation I am going to introduce Arm NN, talk about the Python interface, which was introduced in the May release, and then dive into backends, the mechanism which connects Arm NN to the underlying hardware. Afterwards, David is going to take over and talk about the smart city use case and how a vision pipeline is designed at Arcturus Networks.

So what is Arm NN? Arm NN is essentially a middleware inference engine for machine learning on the edge. I would like to stress the word "middleware" here, because on the input it takes models from popular frameworks such as TensorFlow, TensorFlow Lite, Caffe and ONNX, and it mostly delegates those models to the underlying hardware — typically either directly to a driver, or to basically any software such as a compute engine. An example of a compute engine used in Arm NN is the Arm Compute Library, which accelerates the models either through Neon for Cortex-A processors or through OpenCL for Mali GPUs. Arm NN can also exploit dedicated neural hardware through the Ethos-N NPU, or, if you have custom hardware available on your machine, you can use a third-party driver.

A bit more about the Arm NN project. About two years ago the project was donated as open source to Linaro's Artificial Intelligence Initiative, which now maintains it. Still, Arm is the main contributor, with a broad community — NXP participates as well — and you can track most of the releases on GitHub. Arm NN is released quarterly (the last release slipped slightly due to the coronavirus), together with the Arm Compute Library, which, as I mentioned, is the compute engine for Arm NN. If you would like to contribute or participate more in the development, you can use the ML Platform hosted by Linaro: you can contribute there or sign up for the mailing list.

Now more about PyArmNN, which is the code name for the Python interface introduced in the May release of Arm NN. PyArmNN does not implement any additional computational kernels; it only implements a few helper and convenience functions on top of the C++ API, and it ties into the whole Arm NN CMake system. When you are building Arm NN you specify all the CMake variables, the build runs and creates the libraries and binaries, and if you enabled PyArmNN as well, it will also build either a source package or a binary wheel package. For legacy versions of Arm NN you have that option too: those are available on NXP's GitHub as well, for the 19.08, 19.11 and 20.02 releases, and what you actually need to do there is provide the Arm NN libraries and header files — it is a standalone project which builds the wrapper separately.

A little bit more for those who are interested in Python wrapping. SWIG is the project which is already the most widely used for that; it is available for a large number of languages — Python, JavaScript, Perl, PHP, Java and Ruby as well. Basically, you provide the header files of your favorite C or C++ library and it generates a Python interface which mirrors the C++ interface. You need to do two things: first, you modify setup.py slightly — the Python file which is used by setuptools to generate your Python packages — and then you write your SWIG interface templates. SWIG has a custom language for that which is pretty simple to understand: in those templates you define which functions and classes you want to expose to the user, and you tell SWIG how to compile it, in this case using CMake.
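To make that concrete, here is a minimal, hypothetical sketch of the setuptools side. The module name, paths and flags are illustrative, not the actual PyArmNN build files — in the real project this is driven through CMake, as described below:

```python
# Hypothetical sketch: wiring a SWIG-wrapped C++ library into setup.py.
# Names and paths are illustrative, not the real PyArmNN build configuration.
from setuptools import setup, Extension

example_module = Extension(
    '_example_armnn',                     # SWIG convention: the C extension gets a leading underscore
    sources=['example_armnn.i'],          # the SWIG interface template; setuptools runs SWIG on it
    swig_opts=['-c++'],                   # generate a C++ (not C) wrapper
    include_dirs=['/opt/armnn/include'],  # the Arm NN header files you provide
    library_dirs=['/opt/armnn/lib'],
    libraries=['armnn'],                  # link against the prebuilt Arm NN library
)

setup(
    name='example-armnn',
    version='0.1',
    ext_modules=[example_module],
    py_modules=['example_armnn'],         # the Python-side module SWIG generates
)
```

Building a wheel from something like this (for example with `python setup.py bdist_wheel`) yields the platform-tagged package discussed next.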

This is part of the standard Arm NN build: Arm NN has CMake as its build system, and you just set a variable either for the wheel (binary) package or for the source package. If you are feeling a bit more hacky, you can also use SWIG directly in the Arm NN directory tree. What this produces is either the wheel package — which, as you can see, has a name with tags like cp37-cp37m, meaning it is platform dependent and was cross-compiled or compiled directly for your architecture — or the source package, which you then use, for example, on your board: if you have a compiler like GCC available there, it compiles directly on the board. PyArmNN is not available on PyPI as of right now, so you cannot just pip install the module by name.

Now a little bit about PyArmNN and how to use it. Don't be scared — it's just code.

It starts off with the standard imports: you can import your math library NumPy, OpenCV for loading images, for example, and then PyArmNN as well. After that you need to choose the Arm NN parser, which specifies what framework you used to build your model. Here I am using TensorFlow Lite, so I have my TensorFlow Lite model already prepared and I load it. Then comes the runtime initialization, and the important part there is the preferred-backends variable: remember that line where 'GpuAcc', 'CpuAcc' and 'CpuRef' are specified, because those are the names of Arm NN backends, and I am going to talk about them a little later. Then there is the neural network model itself: you use something like OpenCV to load the image, you specify the inputs — binding the image to the input — and you specify the outputs. Then you run inference: there is the EnqueueWorkload function, which you call, and it produces the outputs in the buffer you specified, and afterwards you can process them.

It is really simple to use; you can use all kinds of additional convenience libraries available on your board if you have Python there, and it is much more comfortable to work with than C++, where you have to recompile whenever you make a change. This example, which is also available in the Arm NN repository, produces a tensor from the classic softmax output, which gives the most probable class. The input here was a cat, and you can see it is most probably a tabby — one of those big cats with an "M" on its forehead.
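In code, the flow just described looks roughly like this — a condensed sketch in the spirit of the classification example in the Arm NN repository; the model path, image size and preprocessing are illustrative and depend on your model:

```python
# Sketch of PyArmNN inference with a TensorFlow Lite model (paths illustrative).
import numpy as np
import cv2
import pyarmnn as ann

# Parse the TensorFlow Lite model.
parser = ann.ITfLiteParser()
network = parser.CreateNetworkFromBinaryFile('./mobilenet_v1.tflite')
graph_id = 0
input_name = parser.GetSubgraphInputTensorNames(graph_id)[0]
input_binding_info = parser.GetNetworkInputBindingInfo(graph_id, input_name)

# Create a runtime and optimize for the preferred backends (in priority order).
runtime = ann.IRuntime(ann.CreationOptions())
preferred_backends = [ann.BackendId('GpuAcc'), ann.BackendId('CpuAcc'), ann.BackendId('CpuRef')]
opt_network, _ = ann.Optimize(network, preferred_backends,
                              runtime.GetDeviceSpec(), ann.OptimizerOptions())
net_id, _ = runtime.LoadNetwork(opt_network)

# Load an image with OpenCV and bind it to the input
# (a quantized MobileNet takes uint8 input; adjust for your model).
image = cv2.resize(cv2.imread('cat.jpg'), (224, 224)).astype(np.uint8)
input_tensors = ann.make_input_tensors([input_binding_info], [image])

# Bind the outputs, then run inference with EnqueueWorkload.
output_name = parser.GetSubgraphOutputTensorNames(graph_id)[0]
output_binding_info = parser.GetNetworkOutputBindingInfo(graph_id, output_name)
output_tensors = ann.make_output_tensors([output_binding_info])
runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)

# Post-process the softmax output: index of the most probable class (e.g. "tabby").
results = ann.workload_tensors_to_ndarray(output_tensors)
print(np.argmax(results[0]))
```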

Now, remember those names like CpuAcc, because backends are what I want to talk about next. So what is an Arm NN backend? An Arm NN backend is an abstraction layer: basically, it is an interface which connects your model — your graph definition — to the underlying hardware. That can be a driver directly, or it can be some compute library; you can connect it to basically any software or any library. Typically there are four available. You have the GpuAcc backend, which is available through the Arm Compute Library and uses OpenCL, so it can be used not only on a GPU but on any hardware which has OpenCL enabled. There is CpuAcc, which is optimized for Cortex-A CPUs and uses Neon. You also have the reference backend, CpuRef, which you use just for testing, or it can be the last fallback if some of the layers are not supported in your model. Additionally, it is pretty easy to implement your own custom backend: at NXP, for example, we have our own NPUs, which use a custom backend for acceleration, and a backend can be linked either statically or dynamically. That's about it — and for the last slide about the backends, there is a nice example of how they work together; call it hybrid execution.

As I mentioned, you can specify multiple backends. Here is an example: I have an NPU backend and also the Neon (CpuAcc) backend, and let's say the NPU only supports convolution. It will then run all the convolution layers on that backend and, based on the order in which you define the backends you want to use, it will delegate the unsupported layers — which in this case would be the average-pooling layer and the fully connected layer — to the other backend, which supports them. This is all done in the internal Optimize function: as a user, you just specify the backends, and the software automatically optimizes the whole runtime. To achieve this it takes your network graph and optimizes it — basically it creates workloads for every layer, and these workloads are then executed at runtime.
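As a sketch of what that looks like from the API, continuing from the earlier example — note that 'EthosNAcc' stands in for whichever NPU backend your platform ships; the name here is an assumption:

```python
# Priority order: try the NPU first, fall back to Neon, then to the reference backend.
preferred_backends = [ann.BackendId('EthosNAcc'),   # illustrative NPU backend name
                      ann.BackendId('CpuAcc'),
                      ann.BackendId('CpuRef')]

# Optimize assigns each layer to the first backend in the list that supports it;
# unsupported layers (say, average pooling) fall through to the next backend.
opt_network, messages = ann.Optimize(network, preferred_backends,
                                     runtime.GetDeviceSpec(), ann.OptimizerOptions())
print(messages)  # any warnings about layers that had to fall back
```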

Just a little info on how to implement your own backend — it is pretty simple. You need to implement all the interfaces, the workloads, unit tests and makefiles, and that's it. In the repository you can see there is a folder with the unit tests and a folder with the workloads — for example for the convolution or fully connected layers — and then you implement the few interfaces which basically stitch those together with the makefiles. Hopefully that wasn't too tricky. I am going to hand over now to David Steele from Arcturus Networks, who is going to talk more about examples and use cases in the smart city.

Hi, everyone. My name is David Steele from Arcturus. We specialize in edge AI and vision solutions using Arm A-class CPUs, GPUs and NPUs. Now that we understand the role of Arm NN in an edge system and we know how we can implement it, what I am going to do during my portion of the session is pull the lens back slightly and look at how we can use Arm NN as the inference engine in a real-world application.

Let's take a smart-city example in public transportation. Public transportation offers good use cases that really emphasize the benefits of edge processing. So why do we want to process at the edge? Simply put, the edge is where the data sources are and where the actions need to take place. This makes edge processing inherently more efficient and lower-latency than shipping data to a central location for processing and then waiting for a result. There can be a number of other reasons too, including privacy concerns and infrastructure and operational costs and complexity. And although I have a picture of a subway platform here, consider that there are over 65,000 public buses on US roads today. If you can imagine the bill you would get from continuously streaming video data over LTE from each of them every day, all day, just to perform some relatively simple analytics — it just doesn't make sense.

Okay, so that's a quick overview of why we want to do processing at the edge. But what is it that we want to know? If we take this busy subway platform as an example, there are probably a lot of things we might want to know. For example, we might want to know how many people are on the platform and where they are located: are they too close to the edge of the platform, or have they crossed over onto track level? We might also want to detect packages, bags or luggage that get left behind, and identify who owns them. And of course, in today's climate, we might want to detect density and proximity, to aid in things like social distancing. We can do all of these things, but the question is: where do we start?

The first step is to really think about our vision pipeline as a whole, and we need to remember that inference is only one aspect of it. In addition to inference, we are going to need to rely on algorithms to help us perform tasks such as motion tracking. We are going to need to apply heuristics or logical rules, such as actions based on time of day. And we are also going to need to gather data from our scene over a period of time to look for patterns, anomalies or strange behavior. Finally, we need to output the results of our vision pipeline in some form of representation that shows what the pipeline found and provides some form of identification.

In addition to each stage of the pipeline, we also need to carefully consider the overall architecture of our application. After all, it is real-time and it uses time-sensitive video-frame data: introducing latency into the system or losing synchronization not only leads to a bad user experience, it can also lead to things like incorrect bounding-box data being provided to another pipeline node, resulting in poor overall performance. And finally, we also need to think about serviceability and flexibility. For example, if we want to update our detection model, why should we need to update our whole pipeline? Or if we want to change a detection characteristic, how can we do this easily at runtime?

So now that we generally understand the pipeline stages and our design considerations, we can start to formulate this into an architecture. There are a lot of ways we could go about that, but if we think of the pipeline as a collection of nodes, and each node as its own microservice, this approach has some benefits. For example, we can borrow from cloud-native methodologies: we can containerize our nodes, which helps us meet our serviceability objective by allowing us to fairly easily upgrade pipeline components at the node level. We can efficiently serialize and deserialize video-frame data using FlatBuffers, and we can use something like a fairly standard message-passing library to handle metadata and synchronization between each node, as sketched below. An approach like this gives us a lot of flexibility, because it provides a method for us to reconfigure the pipeline. This is quite important because it means that we are not limited to doing all the processing locally: if our workload changes, we can add a node, or offload a node to another physical resource, to make the architecture more distributed. Using this approach we can also orchestrate the pipeline at runtime, which makes it highly dynamic and easily configurable.
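As a rough sketch of that node idea — not Arcturus's actual implementation — a detection node might subscribe to frames and publish per-frame metadata with a standard message-passing library such as ZeroMQ. The addresses, topics and message layout below are hypothetical, and FlatBuffers would slot in where JSON is used here:

```python
# Hypothetical pipeline node: subscribe to video frames, publish detections.
import json
import zmq

ctx = zmq.Context()

frames_in = ctx.socket(zmq.SUB)                   # upstream capture/decode node
frames_in.connect("tcp://capture-node:5550")      # address is illustrative
frames_in.setsockopt_string(zmq.SUBSCRIBE, "frame")

dets_out = ctx.socket(zmq.PUB)                    # downstream tracking/analytics nodes
dets_out.bind("tcp://*:5551")

while True:
    topic, header, pixels = frames_in.recv_multipart()  # assumed 3-part message layout
    meta = json.loads(header)                     # frame id + timestamp keep nodes in sync
    boxes = run_inference(pixels)                 # hypothetical helper wrapping the Arm NN code above
    out = {"frame_id": meta["frame_id"], "ts": meta["ts"], "boxes": boxes}
    dets_out.send_multipart([b"detections", json.dumps(out).encode()])
```

Because each node only speaks this message protocol, a node can be containerized, replaced, or moved to another machine without touching the rest of the pipeline.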

Ultimately, the pipeline architecture we choose will have the biggest impact on our latency, synchronization, flexibility and serviceability design considerations. The one design consideration we haven't talked about yet is efficiency, and this requires us to look closely at the detection models. In a smart-city application, what we are trying to achieve with a trained inference model is to detect and classify objects. There are lots of models that do that, but they have a range of characteristics: different input image sizes, pre-training on different datasets, support for different model precisions and therefore different accuracy, different inference-engine support, and even different hardware backends. If we look at the model in isolation, our design consideration is to meet our real-time edge-processing requirements, and this does help us narrow down the field significantly. Models like MobileNet, for example, were purpose-built for edge processing. These models, when combined with a detection head such as SSD, make it possible to achieve detection and classification in one forward pass of the network, and their efficiency even makes it possible for us to process video in real time at the edge, even on a CPU without dedicated hardware acceleration. Now, of course, using models like MobileNet there is always a trade-off; in this case we are trading accuracy for speed, particularly when it comes to smaller objects. But for our use case that is actually less of a concern: what we need is to quickly detect people in real time, and for us that doesn't require squeezing out every frame per second we can — even if we get a couple of false positives or negatives along the way.

So now that we have talked about the type of model we need, let's talk about how we can achieve the best possible performance. Pavel, in his portion of the presentation, referenced the different backends that are supported under Arm NN, and we can look at how these affect performance — and by performance we are talking about inference time here. With these charts, a lower number is better.

The chart on the left compares Arm NN CpuRef versus CpuAcc. CpuRef is the reference backend provided by Arm NN. It is not optimized: it is intended to help developers build their own, more optimized backend or application. It is written in C++, it supports a basic set of primitives without multi-threading, and it doesn't take advantage of the Neon single-instruction, multiple-data acceleration offered by the core. In a nutshell, CpuRef is intended to be a reference implementation, not a performance reference or a production backend — but it does provide a point of comparison to illustrate the improvements that can be achieved when we do optimize the backend for the hardware we are running on. With CpuAcc, inference comes down to 211 milliseconds, just by taking advantage of the multi-core support and the acceleration that allows us to do multiple matrix-math operations in one cycle. To put this in a more real-world context, by using CpuAcc we are able to achieve about four frames per second, which is to say not bad, just for making good use of facilities that already exist in the processor. Incidentally, I use the same reference model throughout this presentation for comparisons, simply because it is very broadly supported; a leaner model such as MobileNet v3 Small can hit a higher frame rate, so we can get about eight frames per second. Another reference point we can use is to compare Arm NN with OpenCV 4.2.0, which is the chart on the right. In a way this is a more fair comparison, since CpuAcc and OpenCV both make good use of the cores available, threading and the OS, with some slight differences in model format.

But if we are looking for performance, we can probably still do even better. This chart illustrates the gains we can achieve when we start to optimize the model by using quantization. In this example we are using MobileNet v2 running on CpuAcc in float32. In the next part of the chart we quantize the model to int8 and run it again on CpuAcc, gaining about a 50% performance improvement. Finally, in the third column, we change the backend to use NPU acceleration, still running our quantized int8 model, and in this case we see the overall inference time come down to 25 milliseconds. To put this in more practical terms, that increases our performance from four frames per second, to eight frames per second, to 40 frames per second. Interestingly enough, what is not shown on the slide is what happens when your model is not optimized for the hardware. For example, if we were to run a non-quantized model on the NPU, performance is going to go the other way: instead of 25 milliseconds we are going to get performance in the range of three seconds, which can be worse than just running it on the CPU cores. So the key message here is that it is important to ensure, number one, that you are using the correct backend, and that the model you are using is optimized for the hardware you are running on, whether that be CPU, GPU or NPU — and, if you recall from Pavel's portion of the presentation, make sure you validate the choices in your backend preference list as well.
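The frame rates above are just 1000 divided by the per-inference latency in milliseconds. A quick way to sanity-check your own backend choice — a sketch reusing the runtime and tensors from the PyArmNN example earlier — is to time EnqueueWorkload directly:

```python
import time

# Warm up once so one-time allocations don't skew the measurement.
runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)

runs = 50
start = time.perf_counter()
for _ in range(runs):
    runtime.EnqueueWorkload(net_id, input_tensors, output_tensors)
elapsed_ms = 1000.0 * (time.perf_counter() - start) / runs

print(f"{elapsed_ms:.1f} ms per inference, ~{1000.0 / elapsed_ms:.1f} fps")
# e.g. 25 ms per inference corresponds to the ~40 fps NPU case above.
```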

Okay, so now that we have a pipeline architecture and we can detect people and objects, we can start to do some fun stuff: we can start to work on understanding the relationships between objects in the time or space domain. In our use case, the simplest starting point is to identify where people are located on the subway platform. To do this we need to understand 3D space, or at least have a representation of it in 2D or 3D. We can do this by establishing boundaries, or zones, in the field of view, and then using the bounding-box data from the detection output, combined with the class data, to locate where a person appears. In this particular example we are breaking the platform down into three zones: a red-highlighted exclusion zone, which is close to the edge of the platform and over the tracks; a yellow warning zone; and a green inclusion zone. We can then present the location information — how many people are in each area — on a simple web interface. While this is pretty straightforward, it gives us the basis to add additional heuristics. For example, in the same scene, if we know the size of the tiles on the floor we can detect how close people are to each other, and we can use this proximity information to help maintain social distancing, or even control the flow of people along the way. And we can do all of this using existing 2D cameras that are already installed.
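A minimal sketch of that zone logic, under simplifying assumptions: zones here are axis-aligned rectangles in image coordinates (a real deployment would more likely use polygons and a floor-plane calibration), and a person is assigned to a zone by the bottom-center of their bounding box. `person_boxes` is a hypothetical list of detections from the pipeline:

```python
# Hypothetical zone layout in pixel coordinates: (x_min, y_min, x_max, y_max).
ZONES = {
    "exclusion": (0, 0, 1920, 300),      # red: platform edge and track level
    "warning":   (0, 300, 1920, 450),    # yellow: approaching the edge
    "inclusion": (0, 450, 1920, 1080),   # green: safe area
}

def zone_of(box):
    """Assign a detection to a zone by the bottom-center of its bounding box."""
    x, y, w, h = box
    foot_x, foot_y = x + w / 2.0, y + h  # roughly where the person meets the floor
    for name, (x0, y0, x1, y1) in ZONES.items():
        if x0 <= foot_x <= x1 and y0 <= foot_y <= y1:
            return name
    return "unknown"

counts = {}
for box in person_boxes:                 # person detections from the pipeline
    z = zone_of(box)
    counts[z] = counts.get(z, 0) + 1
print(counts)                            # e.g. {'inclusion': 7, 'warning': 2}
```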

All this is good, and we are now well into building our application. But what if we wanted to do something a little more complex? What if we wanted to identify that someone has left a bag on the subway platform? To accomplish this we need to identify the bag and its owner, and then introduce tracking and re-identification to recognize them again in future frames. Now, the simplest form of tracking is motion-model tracking, where we use object velocity to create a prediction of where an object will appear next. If a new detection fits within those bounds, we can assume it is the same object identity; if it doesn't, we create a new identity. This approach is fairly lightweight, but it does have limitations. Number one, it is limited to a fixed velocity of objects. Number two, it needs reliable and continuous detections — in other words, an object really needs to remain in the field of view — and once a track is lost, there is really no way to recover it. So the question is: how can we make this more robust?
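A sketch of that motion model, with arbitrary units and an arbitrary gate radius: predict each track's next center from its velocity, and accept the nearest detection within the gate:

```python
import math

def predict(track):
    """Constant-velocity prediction of a track's next center point."""
    (cx, cy), (vx, vy) = track["center"], track["velocity"]
    return cx + vx, cy + vy

def match(track, detection_centers, gate=50.0):
    """Return the nearest detection center within the gate radius, else None."""
    px, py = predict(track)
    best, best_dist = None, gate
    for (dx, dy) in detection_centers:
        dist = math.hypot(dx - px, dy - py)
        if dist < best_dist:
            best, best_dist = (dx, dy), dist
    return best  # None means no fit: create a new identity, or the track is lost
```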

Well, we can enhance motion-model tracking by adding a visual description of each detection, and then compare these descriptions to determine whether the same object appears in future frames. This method works in a similar sort of way to facial recognition: both approaches rely on embedding networks to extract feature vectors, where each feature vector describes visual appearance. Once we have our feature vectors, we can measure the distance between pairs of images using either Euclidean distance or cosine similarity. The chart at the top right illustrates the distance between dissimilar pairs of people, and the chart on the bottom compares pairs of the same person. Using the data, we can see that the distance between dissimilar pairs sits somewhere in the range of 1.1 to 1.5, compared with 0.3 to 0.7 when we are comparing pairs of the same person. Based on this, we know we have a fairly deterministic method to distinguish same from different identities.
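Those ranges suggest a simple threshold sitting between them. A sketch of the comparison — the 0.9 cutoff is an assumption chosen between the ~0.3-0.7 "same" and ~1.1-1.5 "different" ranges quoted above:

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two appearance embeddings."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_identity(emb_a, emb_b, threshold=0.9):
    # Same-person pairs measured roughly 0.3-0.7 apart, different pairs 1.1-1.5,
    # so a threshold between the two ranges separates them fairly cleanly.
    return cosine_distance(emb_a, emb_b) < threshold
```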

Adding this visual-appearance embedding helps us overcome the limitations of motion-model tracking and allows us to re-identify objects irrespective of time and space. But of course, as with everything, it also has a cost. It requires us to generate an embedding for every object detection — the more detections, the more expensive it is going to be — and it requires a second network to process the embeddings, in addition to our detection and classification network. Fortunately for us, we can leverage our pipeline: we can add a node to the pipeline and process this on the hardware resources optimal for the task. And this is a really good approach for today, but there are also some really interesting emerging solutions to this problem for tomorrow, and there are two areas in particular we are quite excited about at Arcturus. One is the new generation of networks that is emerging, such as CenterNet. These networks not only detect and classify, they can also create the embedding, all in one forward pass, which eliminates the workload variability associated with having to process an unknown number of detections per frame, and even the overhead of supporting two networks. There are also new dedicated re-identification networks, such as OSNet, that are demonstrating very good promise as well.

So hopefully I have given you some good insight into how Arm NN can be used in a real-world application, and also how it fits into a well-designed pipeline architecture for your detection workflow. I have also illustrated the importance of considering algorithms and data analysis, in addition to inference, as part of your overall design. I ran out of time to present the live part of the demo, but there are links to a live demo which you can reference after this presentation — the link is on the slide. So, on behalf of Pavel and NXP, and myself, David Steele of Arcturus: thank you for joining us today.
