About the talk
In this session, we introduce the basics of reinforcement learning and show you how to apply it to train your own autonomous vehicle models. You also learn how to test them in a virtual car racing scenario powered by AWS DeepRacer. Learn about the single-car time-trial format and the dual-car head-to-head racing challenges in the AWS DeepRacer 3D racing simulator. By the end of this session, you will be able to participate in the AWS DeepRacer League, where you can compete for prizes and meet other machine learning enthusiasts.
Learn more about AWS at - https://amzn.to/3j4cSmt
Subscribe:
More AWS videos http://bit.ly/2O3zS75
More AWS events videos http://bit.ly/316g9t4
#AWS #AWSSummit #AWSEvents
Transcript
Hope you're doing well. Thank you for joining AWS Summit Online, and welcome to the session. I'm Donnie, and I'm super excited to be here and share with all of you how to get started with AWS DeepRacer. By the end of the session, what we really hope is for you to have some understanding of what reinforcement learning is, and of how easy it is to get started with reinforcement learning to build and train your own DeepRacer model, so you can take part in one of our races or go out and create your own community race. And a big shout-out to Accenture, our exclusive AWS DeepRacer presenting sponsor, who provided exceptional leadership and participation during the inaugural year of the league in the spirit of competition, running the largest private DeepRacer league in the world. The Accenture AWS Business Group, or AABG, is the driving force for the collaboration between Accenture and our joint customers. This one-of-a-kind partnership offers the best of AWS services and Accenture's unparalleled breadth and depth of industry and technology experience. For more information, look up the Accenture AWS Business Group.
This is how we structured our session for today. We will start by talking about the origins of DeepRacer and a few components that you need to know. Then we'll shift our discussion to reinforcement learning: we will talk about how reinforcement learning differs from other machine learning approaches, and how reinforcement learning works in DeepRacer. We will also guide you through building your first reinforcement learning model for DeepRacer with a step-by-step console tour. And after you have your model, it's time to participate in the AWS DeepRacer League, presented by Accenture. To get an idea of what the league is all about, here is a short clip: "Welcome to the AWS DeepRacer League 2020, presented by Accenture. These are 1/18th-scale RC cars that are fully autonomous, thanks to reinforcement learning. From the day the AWS DeepRacer League competition was announced, Accenture has been all in. We're thrilled to be part of the DeepRacer experience. Imagine that you could build an artificial brain for navigation and plug it into a device. Do you have a big idea?"
So what is AWS DeepRacer? It is a 1/18th-scale robot car which gives you an exciting and fun way to get started with reinforcement learning by applying it to autonomous racing. We launched DeepRacer in 2018 to get developers hands-on with machine learning, and AWS DeepRacer has enabled tens of thousands of developers globally through the world's first global autonomous racing league. DeepRacer Evo is the next generation in autonomous racing, unlocking new challenges and new race formats. DeepRacer is also an integrated learning system to help developers of any level understand reinforcement learning and how to apply it. The first component is the AWS DeepRacer Evo device: Evo takes learning and racing further with new sensors that enable it to detect objects. We are swapping out the single camera and putting in a stereo camera and a light detection and ranging sensor, also known as a LiDAR sensor. These support the race formats we introduced in 2019: object avoidance and head-to-head racing, in addition to time trial. The next component is the 3D racing simulator, which you can find in the DeepRacer console, so you don't have to wait for your physical DeepRacer to start learning; you can start right now in the AWS console. The 3D simulator in the console is where the building takes place, and it's really your starting point for growing your machine learning skills by building machine learning models. And once you have that kind of confidence, you can showcase it in any race competition. The next component is the AWS DeepRacer League, presented by Accenture: get ready for racing in 2020 in the AWS DeepRacer League, either in person or virtually.
You can also build models to race against friends and colleagues using the new community races feature. So those are all the components in AWS DeepRacer, and I just want to reiterate that you can start learning right now by logging in to the AWS DeepRacer console. But let's build a foundation first before we start building a model. So with that, let's go over the basics of reinforcement learning, and let's see how reinforcement learning fits in the AI context. Artificial intelligence seeks to create machines that seem to have human intelligence; this is the general concept. A key task is learning how to classify things or events, and also to make a prediction based on past behavior. Machine learning is a subset of artificial intelligence that aims to replicate this human ability to build models and make predictions. It uses statistical techniques to learn from data and figure out what the patterns are, which ultimately leads to the prediction. Machine learning has three main categories. With supervised learning, we can build a model to predict a value or to classify data; models are trained using large amounts of curated training data with labels. With unsupervised learning, things are a bit different: models are trained to identify similarities in large amounts of data to aid classification, and the data doesn't have explicit labels. And with reinforcement learning, we can finally build a model that predicts which decision to make in a specific environment. The model is trained in a simulated environment, where it can interact with the environment and learn based on the outcome of its actions, whether an action was good or bad. Those are the three main categories in machine learning.
Let me give you a concrete example to illustrate how reinforcement learning works, because it is built on a familiar idea. When was the last time you used a treat to reward the right behavior of your pet, or withheld the treat to discourage bad behavior, or taught even more complicated patterns this way? Of course, it's possible to train your pet to do all kinds of tricks. Taking training our pets as an example, let's dive deeper into reinforcement learning by understanding its key terminology in more detail. Reinforcement learning is a machine learning technique that enables an agent to learn in an interactive real-time environment, by trial and error, using feedback in the form of rewards from its own actions. So in the simulator, we have an agent that acts autonomously in a given environment to reach a specific goal. In DeepRacer, the agent is the car. And what is the goal of the agent? The goal is to finish a lap around a track. The environment is the surrounding area for our agent; in this case, the environment is the track. The state is defined by the agent's current position within the environment; basically, this is what the agent can see through its camera. The action can be defined by how fast or how slow the agent sets its speed, as well as the steering direction. For every state, the agent needs to take an action to try and achieve its goal of completing a lap. Depending on which action it takes, it will be given a reward: if the chosen action gets the agent closer to the goal, we can reinforce that action in the future through a positive reward, or we can discourage it with a negative or no reward. The reward is provided by the environment itself, as specified by a reward function that incentivizes the behavior you want, and it is defined by you, the creator of the model. The last concept here is an episode, which represents each run where an agent goes from the start position to a termination state, either by driving off the track or by finishing a lap around the track. These are the terms we need to know so we can fully understand the whole concept of reinforcement learning in AWS DeepRacer.
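To make that vocabulary concrete, here is a minimal sketch of the agent-environment loop in Python. The `Environment` and `Agent` classes are illustrative stand-ins, not DeepRacer APIs; the point is just how states, actions, rewards, steps, and episodes fit together.

```python
import random

class Environment:
    """Toy stand-in for the track: states are positions 0..10."""
    def reset(self):
        return 0                                   # start position

    def step(self, state, action):
        next_state = state + action                # toy transition
        reward = 1.0 if action == 1 else -1.0      # moving forward is rewarded
        done = next_state >= 10 or next_state < 0  # finish line or "off track"
        return next_state, reward, done

class Agent:
    """Toy stand-in for the model: picks actions at random (pure exploration)."""
    def act(self, state):
        return random.choice([-1, 1])              # back or forward

env, agent = Environment(), Agent()
for episode in range(3):                           # start-to-termination = one episode
    state, done, total = env.reset(), False, 0.0
    while not done:                                # each loop iteration is one step
        action = agent.act(state)                  # the agent acts in its current state
        state, reward, done = env.step(state, action)  # the environment returns a reward
        total += reward
    print(f"episode {episode}: cumulative reward = {total}")
```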
So now we have an agent which interacts by taking actions within the environment, and it is rewarded depending on how close to or far from the goal it is. So what is the most crucial component that helps us head toward the goal? Yes, it's the reward function. The reward function incentivizes behavior, and writing it is at the core of our most important job as a model creator. Let's see a practical example of a reward function. Picture the track as a grid: the gray square is the starting position, the finish line is the goal, or terminal state, and the edge of the track is also a terminal state, which tells us the vehicle has gone off the track. The agent determines whether the action it just took was good or bad based on the outcome of that action, expressed as a reward. The reward function can be seen as the logic that incentivizes the driving behavior you want your agent to learn. How do we get the agent to drive on the yellow center line? By assigning a higher reward to the center line, our model will learn that actions keeping it on the center line get higher rewards than actions leading it to the sides. Initially, our agent won't have any idea of the value of being in a given state or taking a given action, so it needs to explore the environment, and then explore some more, in a structured fashion, to build up this knowledge. Our model starts off not knowing anything about the reward for specific actions, and it simply drives and collects rewards until it moves off the track or reaches the destination, before it starts again.
As the agent drives around, it learns from this process: we watch what rewards the various actions taken from a specific state lead to. All the steps from the start until going off track or crossing the finish line are called an episode. To get the best model behavior, the agent first needs to explore, and we call this exploration: the agent explores to see what rewards can be obtained from the various actions tried in the various states. So at the beginning, it will mostly explore. As the agent gains more and more experience through exploration, it starts learning where it repeatedly gets higher rewards, and then it starts exploring less and moves on to exploiting what it has learned. Convergence happens when the model starts repeatedly picking specific actions depending on the state it is in. At that point the model is optimizing for the expected cumulative return, and model performance will not change considerably with subsequent updates to the model. However, there is a trade-off between exploration and exploitation, which you need to consider with the hyperparameters: if you explore too much, your model may take a very long time to converge; if you exploit too soon, your model may not find the best driving behavior and may potentially also fail to converge.
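As a purely illustrative way to see this trade-off in code (DeepRacer's PPO manages it through the entropy hyperparameter instead), an epsilon-greedy rule explores with probability epsilon and exploits otherwise, decaying epsilon as experience accumulates:

```python
import random

def choose_action(q_values, epsilon):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

epsilon = 1.0                              # begin by exploring everything
for step in range(1000):
    action = choose_action([0.1, 0.5, 0.2], epsilon)
    epsilon = max(0.05, epsilon * 0.995)   # exploit more as experience accumulates
```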
Now, let's go back to the previous grid. What we have here is a small grid, so ideally our agent could evaluate every action in every square to determine which action will result in the highest cumulative reward, assuming that it then chooses the action with the highest expected cumulative reward. Knowing the value of being in a square is called a value function, and always choosing the action with the highest value gives us a policy function; the model is sometimes called the policy network. The idea of knowing how valuable it is to be in a state, meaning what can be achieved from that state onward if the agent keeps following the optimal policy, is a very important concept.
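Here is a toy illustration of that idea, assuming a one-dimensional "track" of squares rather than anything DeepRacer-specific. Value iteration propagates the goal's reward backward, so each square's value reflects what can be achieved from it under the optimal policy:

```python
# Toy value iteration on a 1-D "track" of squares; square 4 is the finish line.
GOAL, GAMMA = 4, 0.9
values = [0.0] * 5
for _ in range(50):                               # sweep until values settle
    for s in range(GOAL):                         # non-terminal squares
        candidates = []
        for nxt in (max(s - 1, 0), s + 1):        # try stepping left or right
            reward = 1.0 if nxt == GOAL else 0.0  # reward only for reaching the goal
            candidates.append(reward + GAMMA * values[nxt])
        values[s] = max(candidates)               # best achievable from this square
print([round(v, 2) for v in values])              # [0.73, 0.81, 0.9, 1.0, 0.0]
```

The values rise as squares get closer to the goal, which is exactly the knowledge a policy exploits.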
So let's recap what we have discussed about reinforcement learning in AWS DeepRacer. We are interested in creating a model that can learn how to make decisions in an environment based on the reward it receives when it interacts with that environment. The reward function is the logic to incentivize the model behavior we want in the environment: it assigns rewards based on the outcome of each action. The value function captures the value of being in each state, that is, the reward obtainable from that state onward when following the optimal policy. But how does DeepRacer actually learn?
In DeepRacer, our agent is a car, and it interacts with the simulated racetrack environment. During training, it drives around, taking pictures at about 15 frames per second. At each state, given the picture, it takes an action and ends up in a new state; this is called a step. Evo adds stereo cameras and a LiDAR sensor as part of its state, and so can sense objects as well. All the steps from the starting point until the car reaches a terminal state are called an episode. Once the agent has collected enough experience, an amount you specify in the hyperparameters, it stops driving and starts updating and training its model. The goal during training is to figure out which action in which state will lead to the maximum cumulative reward. Afterwards, the updated model is sent back to the agent to collect more experience, and the balance of exploration and exploitation is controlled through the model hyperparameters once again. However, unlike in the grid example, DeepRacer cannot explore all states in the simulator; it would simply take too long, if it is possible at all, so we cannot fully enumerate them. The next question would be: how do we select the action if we don't know the value of the states?
The answer is policy optimization, which means we optimize the policy function mapping states to actions in order to achieve our goal, using a method like gradient descent. Vanilla policy gradient is an example of policy optimization: it is a method for optimizing the policy function directly. Here, the parameters are simply the weights in the neural network, and the neural network serves as the policy function, taking each state as an input; our goal is to get the maximum cumulative reward.
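As a rough sketch of the vanilla policy gradient idea (illustrative only; DeepRacer's actual training uses PPO, introduced next), the weights are nudged so that actions which earned higher rewards become more probable. This toy uses a three-action softmax policy with made-up payoffs:

```python
import math, random

theta = [0.0, 0.0, 0.0]            # "network weights": one preference per action
true_reward = [0.2, 1.0, 0.5]      # hidden average payoff of each action
LEARNING_RATE = 0.1

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

for _ in range(2000):
    probs = softmax(theta)
    action = random.choices(range(3), weights=probs)[0]   # sample from the policy
    reward = true_reward[action] + random.gauss(0, 0.1)   # noisy feedback
    # REINFORCE-style update: raise the log-probability of the chosen
    # action in proportion to the reward it earned.
    for a in range(3):
        grad = (1.0 if a == action else 0.0) - probs[a]
        theta[a] += LEARNING_RATE * reward * grad
print([round(p, 2) for p in softmax(theta)])  # most probability mass on action 1
```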
We train the model's weights by trying to maximize the cumulative future reward; in doing so, we give high probability to the actions that lead to the highest cumulative future reward. DeepRacer uses a form of policy optimization called Proximal Policy Optimization, or PPO. Let me show you the neural network architecture that we use in DeepRacer. The default architecture is a simple six-layer network. The first layer is the input, and this is followed by three convolutional layers that help identify, from the images, the features needed to make driving decisions. Finally, we have two fully connected layers to help determine which action the agent will take. Think of the network as a mathematical formula, where each node in the network determines how much weight should be placed on certain features in the image; when the weights become big enough, the output of the network will indicate a higher probability for a specific action. During training, the weights are adjusted so that the car maximizes the cumulative reward it earns during racing.
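To make the shape of such a network concrete, here is a sketch in Keras. The layer sizes, input resolution, and action-space size are assumptions for illustration; the real DeepRacer network is built for you inside the service:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_ACTIONS = 10  # assumed: e.g. 5 steering angles x 2 speeds

model = tf.keras.Sequential([
    tf.keras.Input(shape=(120, 160, 1)),                 # one grayscale camera frame
    layers.Conv2D(32, 8, strides=4, activation="relu"),  # three convolutional layers
    layers.Conv2D(64, 4, strides=2, activation="relu"),  #   extract the visual features
    layers.Conv2D(64, 3, strides=1, activation="relu"),  #   needed to drive
    layers.Flatten(),
    layers.Dense(512, activation="relu"),                # two fully connected layers
    layers.Dense(NUM_ACTIONS, activation="softmax"),     #   map features to action probabilities
])
model.summary()
```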
Behind the scenes, DeepRacer uses several AWS services. Amazon SageMaker trains your models, Amazon S3 stores them, and Amazon Kinesis Video Streams displays the video in the console. AWS RoboMaker runs the simulation in your account; the experience generated in AWS RoboMaker is used to train the model, the new model is sent back to AWS RoboMaker to gather more experience, and the process continues. Metrics are stored along the way in services such as Amazon S3, Amazon CloudWatch, and Amazon Kinesis Video Streams. Once you have your model ready, you can download it and load it into your DeepRacer device, which is optimized for fast inference. Let's look at the technical specs of what's going on under the hood of a DeepRacer. It has an Intel Atom processor with 4 gigabytes of memory, and it has built-in Wi-Fi, which we need in order to connect to and control the car during a physical event. The car also has a 4-megapixel camera, and Evo adds a LiDAR sensor with a 360-degree, 12-meter scanning radius. The application runs on top of an Ubuntu operating system with the Intel OpenVINO toolkit and ROS. As for connectivity, it has several ports: USB Type-A ports, one USB-C port, one micro-USB, and one HDMI. All of these ports make the DeepRacer pretty much configurable for future upgrades. Another cool thing is that DeepRacer uses OpenVINO. OpenVINO stands for Open Visual Inference and Neural network Optimization; it is a toolkit that helps speed up deep learning workloads, streamline deep learning inference and deployment, and enable easy heterogeneous execution across Intel platforms, including accelerators. The integration of the OpenVINO toolkit inside DeepRacer means that you can easily take the model you've just trained and use the toolkit's Model Optimizer to convert and optimize your trained model for the DeepRacer. The Inference Engine then lets you deploy it on the Intel Atom processor, and the Inference Engine runs across multiple Intel architectures. So you can use the Model Optimizer on models trained in TensorFlow, and then use the Inference Engine for high-performance inferencing. Okay, so we have covered most of the things that you need to know about DeepRacer.
Now let's go to the console to help you get started building your first model. So here we are at the DeepRacer home page. As you can see, there are a few things here, including videos, to help you get started and participate in the AWS DeepRacer League; also make sure that you're in the US East (N. Virginia) region. On this page, you can see the currently scheduled races happening with different racing formats, such as time trial, object avoidance, and head-to-head racing; keep your eyes on this page as we roll out races throughout the year. You can also view the current leaderboard to see the ranking positions and watch the evaluation videos, like the one we have right here from March 2020, where the model needs to complete the lap and also avoid any objects on the track. As we mentioned, there are a few other racing formats, for example head-to-head racing, where a bot car is assigned to compete with your model; you can see that the model is now trying to overtake the other car as they race on the track. One feature that I would like to highlight is community races. Community races is one of the cool features that we recently launched, and it lets you organize your own private racing competition and invite your friends, colleagues, or a wider audience by sharing the invitation link once you have created your competition. And now, let us guide you through building your first reinforcement learning model for DeepRacer. Here you can see the diagram, and these are the steps that you need to follow. The first thing that you need to do is to check your account resources: see whether you have a valid AWS account with the required resources and the right IAM roles to start building your model. You only need to do this once, and you don't need to do it again if you don't run into any issues. If you are new to reinforcement learning in particular, you can start with the learn-RL module, which will take you to a microsite that we designed to walk you through the basics. This microsite is structured to equip you with all the information needed to get started; it is very well documented, and we'd like to encourage you to learn from it as a starting point. Next, on this page, you can start to create a model and race, and if you want to learn more about sensors and racing formats, you can click the link in step 3.
However, we recommend you build your vehicle first and customize it on the Garage page, so that you have a sense of how your reward function and hyperparameters will be affected, as they depend heavily on the vehicle that you have. When you start creating a new vehicle, you will be given a set of options to modify your sensors. For the camera, you have two options: a single-lens camera and a stereo camera. The single-lens camera, or mono camera, has a 120-degree field of view and is more suitable for simple autonomous driving tasks such as time trials. The stereo camera has two lenses that capture images simultaneously to determine the depth of observed objects; information from the stereo camera is valuable for helping the vehicle avoid crashing into obstacles or other vehicles in front of it. This is why the stereo camera is most suitable for object avoidance and head-to-head racing; however, with the stereo camera, training will go more slowly. There is also an option for you to add more sensors, in this case the LiDAR sensor. The LiDAR sensor uses rotating lasers to send out pulses of light in all directions and measure the distance to objects; the reflected pulses are recorded as points around the LiDAR unit. LiDAR helps detect blind spots of the host vehicle, to avoid collisions while the vehicle changes lanes. By combining LiDAR with the mono or stereo camera, you enable the host vehicle to capture sufficient information to take appropriate actions. But it also comes with a trade-off: the neural network must learn how to interpret the LiDAR data, and as a result, the training will take longer to converge. So the bottom line is that understanding the sensors on your vehicle is essential to achieving the best model performance. If you're going to race in time trials, one of the configurations that you can choose is the mono camera without the LiDAR sensor, with a three-layer CNN. If you're doing head-to-head racing or object avoidance, the task is more complex, and you will need the stereo camera to help the vehicle choose appropriate actions. You can do trial and error and tune based on your model's results. So we have covered this part; let's go to the next one: the action space.
As we mentioned regarding actions in DeepRacer, the action is actually defined by the action space, with which the vehicle reacts using a specific speed and steering angle. In reinforcement learning, there are two kinds of action spaces, discrete and continuous, and in DeepRacer we use a discrete action space. It defines its actions by the maximum speed and the absolute value of the maximum steering angle. So we have two components here, steering angle and speed, and each of these two components also has a granularity. You can try to adjust each of these, and in the table below, you see the changes. Remember that the more granular you define the action space, the harder it will be to train your model and for it to finally converge. Once you are satisfied with the configuration, you can give your vehicle a name to identify it during training.
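To illustrate how those settings expand into a table of concrete actions, here is a hypothetical sketch (the console generates the real table for you; the expansion rule below is an assumption for illustration):

```python
# Hypothetical sketch of how a discrete action space expands from the
# maximum steering angle / maximum speed and their granularities.
MAX_STEERING_ANGLE, STEERING_LEVELS = 30.0, 5    # degrees
MAX_SPEED, SPEED_LEVELS = 3.0, 2                 # metres per second

step = 2 * MAX_STEERING_ANGLE / (STEERING_LEVELS - 1)
steering_angles = [-MAX_STEERING_ANGLE + i * step for i in range(STEERING_LEVELS)]
speeds = [MAX_SPEED * (i + 1) / SPEED_LEVELS for i in range(SPEED_LEVELS)]

actions = [{"steering_angle": angle, "speed": speed}
           for angle in steering_angles for speed in speeds]
for index, action in enumerate(actions):
    print(index, action)   # 5 x 2 = 10 actions; finer granularity = more to learn
```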
Now it's time to create a model. Start by typing in a model name; then you need to choose the track. You can see there are a few tracks to choose from, and you can pick the track most similar to the one you're going to race on. Remember, there's no guarantee that your model will be good, but training on the most similar track maximizes the odds of your model achieving its best performance. Next is the race type: time trial, object avoidance, or head-to-head racing. And finally, choose the vehicle you want to use for this model. Next, here's where the fun begins: the reward function, written in Python. The reward function is how you express the behavior you would like to incentivize. Every time the agent moves from one position to a new position, it gets a reward, so you can think of the reward function as an incentive plan; different incentive strategies can result in different behaviors. Start from a simple scenario: you don't need a complicated function at the beginning, as you can start small and enhance it along the way. Let's now look at some examples of a reward function. If you click on the reward function examples, you'll get a pop-up dialog with some basic examples: follow the center line, prevent zig-zagging, and lastly, a reward function designed for object avoidance and head-to-head racing.
Let's take the simple "follow the center line" reward function and review it. It works by rewarding the agent for following the center line. The function leverages two variables, track_width and distance_from_center, both of them parameters supplied by the environment. From there, it creates markers to identify how far away the agent is from the center line: marker 1 indicates whether the agent is within around 10 percent of the track width, marker 2 indicates whether the agent is within around 25 percent of the track width, and lastly, marker 3 indicates whether the agent is within around 50 percent of the track width. Logically speaking, when the agent is within marker 1, which is the narrowest band, we reward the agent the most, and the reward shrinks as it moves away from the center line toward the sides. Finally, if the agent is not within any of the markers, we can assume that the agent has crashed and give it a really small reward, which discourages the agent from repeating that action in the future. Please remember that you are going to get what you incentivize, and not what you intend; that's why thoughtfulness in the logic of your reward function is crucial.
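Reconstructed from that description, the follow-the-center-line example looks roughly like this (it mirrors the sample shown in the console; treat the exact marker values and rewards as illustrative):

```python
def reward_function(params):
    """Example: reward the agent for staying close to the center line."""
    track_width = params['track_width']                   # supplied by the environment
    distance_from_center = params['distance_from_center']

    # Markers at increasing distances from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0       # hugging the center line: highest reward
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3      # likely off track or crashed: discourage this

    return float(reward)
```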
If you'd like to know more about the available parameters, you can go to the page shown here, which lists each parameter along with a description and a diagram. Okay, so we've covered the variables and the reward function. So what are hyperparameters? They are a set of variables that affect the training process, and they depend heavily on which algorithm you're going to use. During training, DeepRacer uses PPO, or Proximal Policy Optimization, and this algorithm has the parameters shown on the screen. Now, don't let these stop you: you can leave all of them at their defaults, and you can learn them along the way. Let's cover a few of these hyperparameters. Gradient descent batch size refers to the most recent random sample of experience taken from the experience buffer and used for updating the neural network, which helps reduce the high correlation in the input data. Number of epochs refers to how many passes are made over the data to update the neural network. Learning rate is how much a gradient descent update contributes to the network weights: you can use a higher learning rate to include more gradient descent contribution for faster training, but you might want to make it lower to get a more stable model. We talked about exploration and exploitation before, and entropy refers to the degree of uncertainty used to determine when to add randomness; entropy is added until the model converges. And lastly, the discount factor specifies how much future rewards contribute to the expected return. If you want to obtain a robust model, training must provide your agent with more or less evenly distributed sampling from the action space, and a balanced combination of exploration and exploitation is required. So, for example, to get a robust model, you will want to configure your learning rate, entropy, and batch size; and sometimes you want to speed up the learning process, which means you need to configure the learning rate, batch size, number of epochs, and the discount factor. So that's how you craft your reward function and tune your hyperparameters; it requires trial and error, and patience.
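Grouped as a reference, the knobs discussed above might look like the sketch below; the values are plausible illustrative starting points, not official defaults:

```python
# Illustrative PPO hyperparameter set for a DeepRacer training job.
# Values are plausible starting points, not official defaults.
hyperparameters = {
    "batch_size": 64,          # samples drawn from the experience buffer per update
    "num_epochs": 10,          # passes over the batch per neural-network update
    "learning_rate": 3e-4,     # higher = faster but less stable weight updates
    "entropy": 0.01,           # degree of randomness injected for exploration
    "discount_factor": 0.999,  # how much future rewards count toward the return
}
for name, value in hyperparameters.items():
    print(f"{name:16} = {value}")
```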
This is an example of the reward graph of a model that I trained for eight hours. The blue and red lines indicate the average percentage of track completion, and the green line indicates the average reward. It is good if the average reward and track completion show an uptrend toward convergence. Most of the time, when you see the track completion hit 100 percent and stay there, and the average reward staying on the same level, your model has likely converged. You'll then want to evaluate the performance of your model; to do this, you can start an evaluation in the section below the training section. Once the evaluation is done, you will see the results of your performance. If you're happy with your model's performance, you're now ready to participate in the AWS DeepRacer League, presented by Accenture. We've had tens of thousands of developers race in person and online to see who takes home the championship cup. There are physical races at AWS Summits, and there are the virtual races in the console. With this year's release of the DeepRacer Evo, with its new stereo cameras and LiDAR sensors, we enable two new racing formats in addition to time trial racing. In time trial racing, you race against the clock in a single-car race to see who gets the fastest lap. In object avoidance, you race on a track with the added challenge of navigating around objects. And last, in head-to-head racing, you race your model directly on the track against another model to see who crosses the finish line first. Our official Virtual Circuit races will feature time trial, object avoidance, and head-to-head racing formats, starting in the month of June.
Accenture is a global professional services company with a deep history of helping customers around the world build artificial intelligence and machine learning solutions, in partnership with AWS. Together, Accenture and AWS bring the best of both worlds to accelerate innovation in the cloud. The league features physical and virtual races, advancing to the league championship, where the top 16 finalists race at the Championship Cup at re:Invent. And don't forget, there are additional resources for you to learn more about AWS DeepRacer, and a community which you can join at join.deepracing.io, where you'll find a vibrant and helpful community if you want to learn with fellow developers across the world. Thank you for joining us in this session; we hope you had some fun together with AWS DeepRacer. Hopefully this session helps you understand how reinforcement learning works and how to get started with AWS DeepRacer. I'm Donnie, and see you on the track.