Duration 1:38:16

Accelerate Model Hyperparameter Search with RAPIDS and SageMaker By Miro Enev, Sr Researcher, NVIDIA

Miro Enev
Senior Researcher at NVIDIA

About speaker

Miro Enev
Senior Researcher at NVIDIA

My interests are in advancing data science and machine intelligence while respecting human values in future technology ecosystems. Currently I'm a Solutions Architect at NVIDIA, where we apply the latest AI research (focusing on deep learning) to the challenges of modern business. Previously I was a Principal Data Scientist at Phyn, an IoT startup focused on water conservation and leak detection using sensor data and machine learning. I studied Cognitive Science and Computer Science as an undergraduate at UC Berkeley. Later, I received a PhD from the University of Washington's Computer Science and Engineering Department in June of 2014, where my thesis focus was on Machine Learning applications for Information Privacy in Emerging Sensor Contexts.


About the talk

Hyper Parameter Optimization (HPO) improves model quality by searching over hyperparameters, parameters not typically learned during the training process but rather values that control the learning process itself (e.g., model size). This search can significantly boost model quality relative to default settings and non-expert tuning; however, HPO can take an exceedingly long time on a non-accelerated platform. In this workshop, we'll show you how to use SageMaker to run an HPO workflow which is vastly accelerated using RAPIDS and GPUs. For instance, we can get a 12x speedup and a 4.5x reduction in cost when comparing between GPU and CPU EC2 Spot instances. In addition to covering key concepts, we'll walk through a notebook that allows you to replicate this workflow on a cloud instance and show you how you can plug in your own dataset. We'll also cover model deployment and serving using on-demand or large batch inputs. Requirements: Participants are encouraged to have an AWS Account with access to GPU instances before joining the workshop so that they can follow along.


Thank you for joining. My name is Miro, and today we'll be covering GPU-accelerated HPO on the AWS SageMaker cloud. Let me go ahead and get my screen sharing sorted for you guys. All right, let me know if you can see that. I can also change my resolution to make it less wide if that's annoying for folks. I'm just asking: is the screen resolution okay, is everything working for you guys? Can you see the screen? Yes? All right, let's get going here.

So to start out, we'll do a quick intro to RAPIDS, which is an open-source platform that has been in development for about two years, with a lot of NVIDIA engineers helping out as well as other folks in the community. The goal of RAPIDS is to bring GPU acceleration to the popular data science tools that people use, primarily in the Python world: initially things like pandas and scikit-learn, which hopefully you're familiar with, but more recently we've also been focusing on Spark and Dask.

So let's talk a little bit about why we decided to build RAPIDS. Simply because GPUs have very strong scaling, and if we expose our high-level code to them in a relatively seamless way, we can get tremendous benefits from the inherent parallelism that's there in many of our algorithms. Because GPUs provide hardware that exposes many workers, if we're able to do this in a way that stays out of the way of the developer, then everybody wins. The RAPIDS effort has been focusing on leveraging the underlying strength of the hardware and then trying to accelerate data science along different dimensions, where we have these sub-packages, if you will: cuDF for dataframes, where you might be doing ETL and ingestion; cuML for machine learning, where you might be doing things like clustering, dimensionality reduction, etc.; cuGraph for graph analytics; and then we also offer zero-copy transitions to things like PyTorch, TensorFlow, and MXNet via DLPack, as well as to other members of our ecosystem. The idea is to leverage a very popular shared memory representation, developed initially by Apache Arrow, and use that same representation to make sure that once data arrives on the GPU it never has to leave; it can be handed off between libraries and everything stays fast.

So what was the result of all of this? If you look at the historic evolution of data processing, we started out with things like Hadoop, where we were going as fast as we could: very inexpensive, and it could scale. We were able to get a significant improvement, almost two orders of magnitude, by moving to in-memory processing with Spark. Then, taking an application and refactoring it to work on GPUs was something that was non-trivial, something only experts could do, but it would get you another 10x on top of the Spark performance.

The goal now, again, is to take all these applications that are communicating across the GPU and CPU and make it so that the data only has to come onto the GPU from the CPU once and will stay there for whatever processing it needs; a lot of these copies and converts typically used to happen at every stage of the pipeline. This is leveraging Apache Arrow under the hood to make everything interoperable, so that I can hand you an object that I was just working on and you understand exactly how to pick it up and keep going.
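As a loose, CPU-only analogy for this zero-copy handoff (this is not the actual Arrow or DLPack API, just the same idea in plain Python), the buffer protocol lets two consumers share one block of memory without copying:

```python
import array

# One underlying buffer, shared by two views: no bytes are copied
# when the second consumer "picks up" the data.
data = array.array("d", [1.0, 2.0, 3.0, 4.0])

view_a = memoryview(data)   # first "library" holds a view
view_b = memoryview(data)   # second "library" receives the same buffer

view_a[0] = 42.0            # a write through one view...
print(view_b[0])            # ...is visible through the other: 42.0
```

The point, as in RAPIDS, is that the handoff costs nothing because both sides agree on the memory layout.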

With this sort of optimization, as well as a lot of other software engineering goodness, we're able to take even something that experts have spent a lot of time on and get another factor of improvement over it. So now, with RAPIDS, we're looking at another significant step in the evolution of performance: everything stays on the GPU, everything is very fast, and hopefully after the demos you'll be inspired to go to GitHub and check out the latest and greatest. This is an ever-evolving project. Here is just one benchmark, from TPCx-BB, a very popular benchmark suite where there are 30 different queries that we competed to accelerate, and won by a very large margin, again mostly because there's so much inherent parallelism available in modern algorithms if we just spend a little time making the libraries capable of tapping into it. We're getting something like 37x on average for the 1 TB benchmarks and about a 20x speedup for the 10 TB.

One interesting thing that we'll also see in the cloud is that in addition to being faster when running on GPUs, it's also cheaper, which may sometimes be counterintuitive because it's specialized hardware. The reason is that, especially if you're renting, the amount of time spent on the workload is so short that you can have tremendous power but then be done with it and move on to the next workload, or hand the hardware off to someone else to use. So hopefully this will feel approachable, and if you have any questions on the TCO and cost side, definitely feel free to reach out.

Let me pause there for a moment to see if folks have high-level questions about RAPIDS. While I do that, I'll also tab into the RAPIDS documentation page. If you go to rapids.ai, this is the landing page for RAPIDS; it has links to our blog posts, and it also has a release selector that gives you a heads-up on how to get started. If you'd like to install some particular way, with conda or with a container, and whether you want the current stable version or the most freshly baked nightly, you can click around in here to get all the packages you'd like. Say we're going with Ubuntu 18.04, conda, Python 3.8, and the latest CUDA drivers: this presents you with a copy-and-paste command, for example a conda command, to get started, and there are many other ways as well.

So the next bit is going to be to actually jump in and create a SageMaker notebook that we'll be using to run the RAPIDS demo, and we'll walk through it in two different ways. The first way will be to rent a very cheap instance, which is fairly classic for SageMaker, and just launch the work from there. Then, once you've seen that end-to-end workflow, we'll switch to renting a slightly more expensive GPU instance for our notebook, which will allow us to do dynamic development. But without further ado, I'm going to switch gears now and give you a quick heads-up about the HPO work that we're going to be doing today. Before I get into this, let me take a nine-second pause to see if there's anything in the chat. Does anyone have any questions for me at this point? Okay, great. For the next phase of the workshop, we're going to be setting up a combination of RAPIDS and SageMaker together. In case you haven't seen it already, SageMaker is AWS's machine learning platform, which allows us to stay in the notebook environment and use familiar high-level Pythonic APIs to orchestrate mountains of compute. It's powerful, really convenient, and it keeps improving, so I highly encourage you to dive in. What we're going to do today is take a machine learning workflow that we've already built and containerize it, or do what's known as building a SageMaker Estimator. We'll tell SageMaker what parameters of the model we would like it to explore, and then we'll ship that container and the search space to SageMaker and allow it to conduct experiments in parallel to figure out what model configuration produces the best results on the workflow and the dataset we handed it. In this particular case, we're going to be using a public dataset.
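To build intuition for what SageMaker does with that estimator and search space, here is a toy, purely local simulation (no SageMaker involved; the scoring function is made up) of running experiments in parallel and keeping the best configuration:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_experiment(params):
    # Stand-in for one containerized training job; a real job would
    # train a model on its data copy and report a validation metric.
    score = 1.0 - abs(params["max_depth"] - 8) * 0.05
    return params, score

def toy_hpo(n_jobs=10, max_parallel=2, seed=0):
    """Sample n_jobs configs, run up to max_parallel at once, keep the best."""
    rng = random.Random(seed)
    configs = [{"max_depth": rng.randint(2, 15)} for _ in range(n_jobs)]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(run_experiment, configs))
    return max(results, key=lambda r: r[1])  # best (params, score) pair

best_params, best_score = toy_hpo()
```

In the real workflow, each `run_experiment` call becomes a fresh EC2 instance running your container, and SageMaker handles the scheduling and bookkeeping.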

This is a dataset coming from the Bureau of Transportation Statistics, which has kept logs on flights since, I believe, the 80s, and there are about six or seven million flights per year from the domestic US carriers. You can query multiple years of this data and get all of the statistics, and we're going to be training a classifier model to predict whether or not a flight is going to be more than 15 minutes late to arrive. Essentially, we're going to take one of the columns that already exists in the dataset and treat it as our target for prediction: there's a column in there called arrival delay 15, which is a binary variable saying whether or not the flight was more than 15 minutes late. We'll use that as our target, or y variable, use all the other features as our X, or training features, and ask, in this case, a random forest to train a model that can predict whether or not a flight is going to be late on the unseen test data.

We've built this workflow, and we won't go into the details initially; we'll do that in the second section. For now, just assume that we have some kind of minimal ETL. In our case, all that we're doing is ingesting the data, which in this case is in a Parquet compressed representation, dropping any samples that have missing values, and then splitting the dataset into train and test. That's our game for ETL: very straightforward, no feature engineering, just a minimal amount of processing. In our training step, we train either an XGBoost or a random forest model, and then lastly we do inference, or prediction, to figure out what our accuracy is as our metric. In the course of developing this workflow, we provided several different code flavors. There's pandas and scikit-learn, which is kind of the classic open-source data science stack and primarily runs on a single thread of the CPU, with the exception of the training step. We also have that same code augmented with Dask, a parallelization toolkit that allows us to get parallelism happening not just in the model training but also in the ETL and anywhere else; this is kind of the enhanced, multi-CPU flavor. Then we have the RAPIDS version, which is single-GPU and runs code parallel to pandas in the form of cuDF dataframes and to scikit-learn in the form of cuML, and so this is quite accelerated already. But we also offer a combination of Dask with these libraries, which exposes us to multiple workers, so that if you have multiple GPUs within a node, we can leverage them as well. So these are the four different code variants available in your notebook, and with them we'll be able to run and compare performance.

So let's go ahead. I'm going to skip over the estimator and the creation of the container for now, and very briefly mention again that once we have our workflow containerized, all we do is hand it off to SageMaker, and behind the scenes it launches our container, spins up instances, replicates the data onto all of those instances, and then runs each of them with different hyperparameters. Hyperparameters are things like the size of the decision forest, the depth of the trees, learning rates, or any other parameter that typically doesn't change over the course of the model fitting process but is usually hand-tuned by a data scientist. These parameters make great candidates for hyperparameter optimization.
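As a concrete sketch (the exact parameter names depend on the algorithm; these are the usual XGBoost/random forest ones, and the ranges are illustrative), a search space like the one used later in the notebook is just a dictionary of ranges, sampled per job:

```python
import random

# Hypothetical search space: each entry is (low, high) for a value that
# is fixed during one training run but varied across HPO jobs.
search_space = {
    "max_depth":    (5, 15),      # depth of each tree
    "n_estimators": (100, 500),   # size of the decision forest
    "max_features": (0.1, 1.0),   # fraction of features per split
}

def sample_config(space, rng):
    """Draw one candidate configuration from the ranges above."""
    cfg = {}
    for name, (low, high) in space.items():
        if isinstance(low, int):
            cfg[name] = rng.randint(low, high)   # integer-valued parameter
        else:
            cfg[name] = rng.uniform(low, high)   # continuous parameter
    return cfg

cfg = sample_config(search_space, random.Random(0))
```

With the SageMaker SDK these ranges would be expressed with its parameter classes instead of tuples, but the shape of the idea is the same.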

So, assuming that we have that set up, SageMaker will do the rest and come back and give us the best performing model. Let's go ahead and get started, because it's going to take a little bit to spin up our notebook instance. If you don't mind, I'll switch over; I have two tabs here, one with SageMaker. If you go to the AWS Management Console, you can just search for SageMaker and click on it, or you can scroll down: it's the first item in the machine learning section. Once you go into SageMaker, you should see a screen that looks something like this. Navigate over to notebook instances, and then go ahead and click on create notebook instance. For the first version of this we'll use the defaults primarily, so this is a t2.medium instance, which is a very lightweight CPU node and a perfect candidate for just being our development and notebook environment. I believe it costs about four cents an hour, so it's something that you can leave running and not feel too bad about. Let's call this t2-hpo.
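The console clicks above correspond roughly to SageMaker's CreateNotebookInstance API. As a sketch, below we only build the request parameters (the role ARN is a placeholder, and the repository URL is how I heard it in the talk), without actually calling AWS:

```python
# Request parameters for sagemaker.create_notebook_instance(**params).
# The RoleArn here is a hypothetical placeholder.
params = {
    "NotebookInstanceName": "t2-hpo",
    "InstanceType": "ml.t2.medium",  # cheap CPU node for development
    "RoleArn": "arn:aws:iam::123456789012:role/service-role/ExampleRole",
    "DefaultCodeRepository": "https://github.com/miroenev/cloud-ml-examples",
}

# With boto3 this would be sent as:
#   import boto3
#   boto3.client("sagemaker").create_notebook_instance(**params)
```

The console is doing exactly this on your behalf, which is handy to know if you later want to script notebook creation.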

Then for elastic inference we can leave this at none, and for additional configuration we'll leave the defaults for now; we'll come back to the lifecycle configuration in the final step. The one thing that I encourage you to do is in the IAM role, which is where you set permissions for SageMaker: go ahead and click on create a new role, which will make sure that you don't end up having any conflicts with any previous IAM roles that you may or may not have created. In my case, I can't do that because I'm using a shared RAPIDS team account, but in your case you should be able to click on create a new role. Then the only thing in here that's a little custom is that we're going to be cloning a repository. If you go down to the Git repositories section, there should be a dropdown that says clone a public Git repository to this notebook, and for the Git repository URL, I'm going to paste it in the chat: it should be github.com/miroenev/cloud-ml-examples. So we put this in the chat.

Actually, let me find the chat for this; it may not be where I expect, so I'm just going to make this big for you guys, and hopefully you'll be able to see it and copy it. The only tricky part is my name, which is m-i-r-o-e-n-e-v in the middle: this is github.com, then miroenev, then cloud-ml-examples. I'm going to go ahead and click on create notebook instance here in just five seconds; again, please feel free to jump in and interrupt. Can someone give me a thumbs-up that they're following? I want to make sure I'm coming through clearly for you guys. A thumbs-up in chat is okay, too. All right, sounds good. So I'm going to click on create, and this will start the launch process, which will take you probably about three to five minutes, since SageMaker is having to spin up a whole new instance for you. I'm going to go ahead and open up one that I already have running, but you should see this pending here for a bit. Once it's ready, these options will come alive, and once they do, click on the Open Jupyter tab here.

So this is the previous one that was running. One interesting thing, I guess, also worth mentioning is that this work was included in the official SageMaker examples repository. So one thing you could also do, instead of cloning the repository I just gave you, is to click on this little brain icon on the left-hand side once your notebook is fully done spinning up, and from it you'll see all of these great SageMaker example notebooks. If you click on view in GitHub, this is a 4.4k-star repo that has lots of examples of lots of different things; if you're curious to learn about SageMaker, this is a great place to start. Inside of the hyperparameter tuning section lives the RAPIDS bring-your-own example, which is where our demo lives as well. Because of that, it's possible to replicate this directly into the notebook just using the UI. The reason why I asked you to clone the repo that you did is that it's a slightly fresher set of code that will allow us to do some additional things for the second part of the workshop, and in the very near future it should make it into the official examples as well. So if you were curious to replicate it from the examples, you would click on this little brain icon, scroll down into the hyperparameter section, which is here, and then eventually you would click on this, and it shows you a read-only preview of the notebook. Not in this particular case, I think because of VPN settings, since we're using a team account, but you should be able to see a preview of the notebook, and you'd just click create a copy, and it replicates this into your local directory. But since we already cloned that repository, you don't have to worry about that for now. So I'll briefly walk through the structure of the notebook and then maybe take a few minutes to let you explore.

All right, I hope that your instances are starting to spin up so that you can follow along on your own screen. For the most part, these are the things we already talked about: we're going to be using the flight-delay workflow, containerizing it, publishing it, and then using the SageMaker Python API to do all the work for us. So let's go ahead and clear our kernel. Actually, this one was still running; this is one that just finished, so maybe I'll keep it without clearing it so you can see an example workload that's still in flight. So what we're doing here is coming in and starting out by making some key choices for the configuration settings. You can kind of skip this preamble; it's just making sure that we have an account and we're in a good region. Actually, the region should be either us-east-1 or us-west-2; either of those regions is fine. If you're in a different region than that, you may want to change it, just because the S3 bucket with the dataset that you'll be replicating lives in those two regions, and if you're running your compute in a different region, SageMaker will yell at you and not let you make progress. One of the assumptions that SageMaker makes is that your dataset and your compute are co-located, or at least within one region. So for the purposes of this demo you should be in either us-east-1 or us-west-2.
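A minimal sketch of the kind of region check the notebook's preamble performs (the supported-region list comes from the talk; the session lookup is shown only as a comment, since it needs AWS credentials):

```python
SUPPORTED_REGIONS = {"us-east-1", "us-west-2"}  # where the demo's S3 data lives

def check_region(region):
    """Fail early if compute would not be co-located with the dataset."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(
            f"Region {region!r} is not supported by this demo; "
            f"choose one of {sorted(SUPPORTED_REGIONS)}"
        )
    return region

# In the notebook this would come from the SageMaker session, e.g.:
#   region = sagemaker.Session().boto_region_name
check_region("us-west-2")  # passes silently
```

Failing fast here is much cheaper than discovering the mismatch after instances have already been launched.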

If that is a problem for anyone, please let me know. The way to change your region is to go to the AWS Management Console and navigate to the top right; you'll notice there's a region selector there. us-east-1 or us-west-2, North Virginia or Oregon, are the ones that we support in this demo. They're both regions with relatively high numbers of GPUs, so you shouldn't have any issues renting, but if you do have any trouble, definitely let us know. So initially we just configure our SageMaker execution role and make sure our account and region are accessible.

Next, we're going to be making some choices. This is where you can configure how large or small you want your hyperparameter optimization to be. In the case of a small hyperparameter optimization, you might run the one-year dataset with about 6 million flights, and you might run 3 cross-validation folds, which are a data science technique for improving the robustness of your results by reshuffling the training and test data multiple times. You might run 10 experiments total, and you might run those two at a time. So this is what you might consider a small hyperparameter optimization, where you're only sampling 10 different model configurations and looking for the best one out of those. A large hyperparameter optimization, on the other hand, might look at 10 years of data instead of just a single year, and it might use more cross-validation folds. Because the data is now starting to become really large, you would want to bring in an instance type with access to multiple GPUs, so that you can distribute the workload across their memories and leverage them in parallel, and you might run a hundred of these experiments, ten at a time. This now is sort of serious business; this is what you might do before deploying into production, say, and it's something that's going to yield some confidence that whatever result you found is likely to be very good.

For the demo, we're going to be closer to the small HPO side. When we ran experiments like the large HPO, using the CPU and GPU comparison, we were finding things like a 12x speedup in wall-clock time: the GPU finished in about six hours for those hundred jobs, whereas on the CPU side it was taking more than three days, and there was about a 4.5x reduction in cost. Again, this reduction in cost is maybe a little bit counterintuitive at first, but it's very consistent with what our customers are seeing, just because of the way that GPUs are able to finish their work so much faster, combined with the cloud economics. AWS has these great EC2 Spot instances, which, if you're not familiar, are essentially a way to rent at up to 90% savings; in practice I think we see about 70% savings. The trade-off for this cost reduction is that you could potentially get kicked off of your instance, but in practice we see that out of a hundred jobs, maybe one will get preempted. So it's really an awesome option to use the Spot instances, and I'll show you that in just a second.

So, at a high level again, we're going to be targeting our settings mostly towards the smaller hyperparameter optimization, but feel free to change them in here. The default dataset is the airline one, coming from the 2019 data. We offer options for larger versions of the same set, with three-year or ten-year variants with more flights, and we also include the NYC taxi dataset as something that you can plug in using CSV data, which is compressed but maybe more familiar to some folks, and it shows you what it would look like to plug in your own dataset. If you're curious, you can jump into that as well.

Next, we simply tell SageMaker where our data source is going to be, and this is the public S3 bucket that we provide for you. Then we also tell it where we want it to place the outputs of the trained models; in this case, we're asking it to place those into the default bucket, which is built for you when you log in and is some combination of the word sagemaker, your account, and your region. You can swap these around: you can plug in your own bucket if you'd like, and as long as your dataset is Parquet or CSV and you can describe its columns, which we'll show you in the second part of the workshop, you can plug in your own dataset here as well.

Once the dataset is out of the way, the next choice we make is whether we want to run with XGBoost or random forest. Here we're going to leave it at XGBoost, which is one of the best-performing machine learning algorithms out there. It's based on decision trees constructed in sequence, such that each new tree tries to fix the mistakes of its predecessors, if you will, and it's an algorithm that performs exceedingly well; it's often in the winner's circle on machine learning competition websites. So if you're looking for a machine learning algorithm to try on a new dataset, especially a structured tabular dataset like this airline data or the NYC taxi data, then certainly give it a try. So we'll leave that there. We're going to do just three cross-validation folds, which, as I mentioned before, are the reshuffles of the training and test data that allow us to make sure that we're not just getting a lucky or unlucky set of coin flips when picking what the model sees during training versus what it gets tested on. This just improves the robustness of our result, since we'll report the average across these reshuffles as opposed to just a single run. In here we can also pick which variation of the code we'd like to run. For now we're going to take the RAPIDS single-GPU version, which is kind of the first form of acceleration before we start bringing in Dask, and if you're curious about these other code flavors, they should all be available in the code workflow directory if you want to jump into them. We'll be doing that in the second section.
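The reshuffle-and-average idea behind those cross-validation folds can be sketched with the standard library alone (the scoring step here is a made-up stand-in for train-plus-predict):

```python
import random

def cv_score(samples, n_folds=3, seed=0):
    """Average a metric over several random train/test reshuffles."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_folds):
        shuffled = samples[:]
        rng.shuffle(shuffled)                    # a fresh "coin flip" split
        cut = int(0.8 * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        # Stand-in metric: fraction of "late" flights in the test split;
        # a real fold would fit a model on train and score it on test.
        scores.append(sum(test) / len(test))
    return sum(scores) / len(scores)             # average across reshuffles

flights = [0, 1] * 50        # toy binary labels in the style of ArrDel15
avg = cv_score(flights)
```

Averaging over folds is exactly why one lucky or unlucky split can't dominate the reported metric.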

So pick single-GPU for the workflow choice. Here we can pick our hyperparameter ranges: when we're doing optimization over these models, these are the parameters that determine the learning configuration. So: how deep do we want our trees to be, how many trees do we want in our decision forest, and what percentage of features do we want to be used at every point when a branching decision is being made inside of the tree. There are many more parameters and ranges you can plug in here; this is just a starting set. If you have more ideas for this, you're welcome to add them in here. As you can see, all you have to do is create a new entry in the dictionary, where the key is a keyword recognized by the machine learning algorithm under the hood, and then you tell SageMaker what kind of a parameter it is, etc.

The last two choices we have are about how we want to search the space of hyperparameters. Do we want to use a random strategy, which just picks places and evaluates them, and when choosing where to look next is completely disconnected from past decisions, because it's random? Or do we want to use a Bayesian strategy, which looks at the past outcomes and uses a form of regression to figure out where to go next? For the purposes of this demo we've defaulted to random; you're welcome to plug in Bayesian. I think they're thinking about adding additional search strategies in here, but again, it's up to you. I think both are fine. If you're searching over really large hyperparameter sets, random is a surprisingly powerful tool to make progress; with just a few hyperparameter settings like this, and only running two experiments in parallel like we're about to, Bayesian is also a fine choice.

So I think only two more configuration settings are left, and then we're off to the races. Next we make the commitment about how large our experimental set is going to be. Here we say we want to run 10 total experiments, or 10 total jobs; each of these is going to be an instance launch, with our code in a container on a new instance with the dataset attached, running some HPO configuration within this range. So this is saying let's run 10 of these experiments, again on the small HPO side, and then this is saying I want to be able to run up to two of those in parallel. This number is somewhat low on purpose, partly to keep costs down for folks who are just getting started, but also because you may have to request an instance limit increase for GPUs in your region. Oftentimes Amazon will be hesitant to give you the more powerful compute instances until they can figure out that you're serious, so there's this quota limit increase that you can request for whatever region you happen to be in, if you'd like more instances running at the same time. Typically it takes about a day to get back a response, and you don't have to write much; you can just say "I'm running this research for work," and as long as they can tell that you're a real human and you haven't been super delinquent on previous payments or something like that, they will give you access to a few instances of whatever kind you have a quota limit on. But two is a safe number that keeps you below that threshold. If you were to do this at a higher level, you might want access to 10 instances at once and run this for a hundred jobs.

Okay, so let's keep going here. This is just another setting, one that would cut off any jobs that end up running for more than 24 hours. This is more of a safety limit, in case we have some kind of a bug in our code, but also because sometimes with the CPU configurations, the CPU is not really great at doing very deep and wide forests, so we might see jobs that definitely run more than 24 hours, and we kind of don't want those to run forever. So we'll terminate anything that runs longer than 60 * 60 * 24 seconds, which is 24 hours. Okay, last choice: our compute platform. This is the instance type that we're running on, and we have a little helper function that takes your previous choices and recommends an instance type to run on. You're of course welcome to just override it directly with a string here, but the recommended instance type is one that's guaranteed to work with the compute choices you've made so far, or at least one that we tested. In this case we picked the one-year dataset.
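A hypothetical sketch of what such a helper might look like (the mapping below is invented for illustration; only the one-year-dataset-to-p3.2xlarge case is taken from the talk):

```python
def recommend_instance(dataset_years, use_gpu=True):
    """Suggest an EC2 instance type based on dataset size (illustrative)."""
    if not use_gpu:
        return "ml.m5.2xlarge"        # assumed CPU counterpart
    if dataset_years <= 1:
        return "ml.p3.2xlarge"        # single V100; fits the 1-year data
    return "ml.p3.8xlarge"            # assumed multi-GPU choice for more data

instance = recommend_instance(dataset_years=1)  # "ml.p3.2xlarge"
```

The real helper in the notebook considers more of your choices; the point is simply that dataset size drives the memory and GPU-count requirements.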

it's saying let's go with the p3.2xlarge, which is the single-GPU instance: a single Volta V100 GPU with 16 gigs of GPU memory, and 61 gigs of CPU memory. There's an equivalent choice for the single-CPU version of this. And then lastly, the spot instances: as I've mentioned before, I'll encourage you to check these out; these typically produce around 70% savings on the compute. So here's a run that just finished, and because it was running with spot savings enabled, we got a 70% savings on the difference between total runtime and billable seconds. This is really awesome, and I'll encourage you to use spot training. So those are all of the key choices we just made, and there's a summary here: our input bucket and output bucket; our compute, a single GPU; our algorithm, XGBoost with 3 cross-validation folds; a single V100 on a p3.2xlarge; are we using spot instances, yes; what's our HPO strategy, random; how many experiments are we running, 10; how many at a time, 2; and what's the maximum duration of an experiment, one day?
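Collected in one place, the choices just summarized might look like the following dictionary (the keys and values here are illustrative names, not the notebook's actual variables):

```python
# Summary of the HPO configuration choices (illustrative names).
hpo_config = {
    "algorithm": "XGBoost",
    "cv_folds": 3,
    "instance_type": "ml.p3.2xlarge",    # single V100 GPU
    "use_spot_instances": True,
    "search_strategy": "Random",
    "max_jobs": 10,                      # total experiments
    "max_parallel_jobs": 2,              # experiments at a time
    "max_duration_seconds": 60 * 60 * 24,  # one day per experiment
}
print(hpo_config["max_jobs"], hpo_config["max_parallel_jobs"])  # 10 2
```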

We have all of our choices, and now we're ready to actually put them into play. So, this workflow bit: we're going to come back to this in a little while; for now, just assume that we have working Python code that runs this ML workflow on the dataset. If you're curious to dig into that code, there are four different flavors (multi-CPU, multi-GPU, single-CPU, and single-GPU), and I'll show you those in a bit. But let's just assume we have that code. We can now step into the part

where we package it up so it can be orchestrated by SageMaker. Simply put, we're writing a Dockerfile, which is kind of a recipe for how our software is going to be packaged. We're building the first layer on top of a RAPIDS image that we publish to multiple locations, including Docker Hub. On top of that we install sagemaker-training, which is something that the folks at AWS maintain; it gives you a little bit of extra plumbing in your container that makes it easier to talk to SageMaker and figure out how everything is hooked up. And then lastly, we copy all of our custom workflow code, the training and the scoring logic, into that container and call it done: we build it, we tag it, and then we push it up to the Amazon cloud, to their Elastic Container Registry. That's what's happening in the steps down here. We're using the RAPIDS AI container as our base, and we're adding in some environment variables that tell SageMaker what choices we just made above, so that when the container runs it actually knows, for example, that we want three-fold XGBoost and that it should be running in single-GPU mode. So those are our choices represented in our Dockerfile; we just augment it with a few lines to install some tools, specifically the sagemaker-training package I just mentioned, which helps us with that extra plumbing.

And then we also add in Flask, which is just a server that we'll be using to deploy the trained model so that we can send it inference requests. The rest of it is very straightforward: we're copying in our local code with the workflow logic, and that's pretty much it. We're telling it that the entry point is executable and that the thing to do, once you land in the container, is to run the entry-point code. Here is the full Dockerfile, if you're curious.
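A minimal sketch of the kind of Dockerfile being described: RAPIDS base layer, the sagemaker-training toolkit, Flask for serving, and the custom workflow code. It's assembled as a Python string here so it can be inspected; the image tag, environment-variable names, and paths are placeholders, not the workshop's exact ones:

```python
import textwrap

# Hypothetical Dockerfile assembled as a string (tags and paths are placeholders).
dockerfile = textwrap.dedent("""\
    # Layer 1: RAPIDS base image published to Docker Hub
    FROM rapidsai/rapidsai:latest

    # Choices from the notebook, baked in as environment variables
    ENV DATASET=airline ML_WORKFLOW=singleGPU ALGORITHM=XGBoost CV_FOLDS=3

    # Extra SageMaker plumbing, plus Flask for model serving
    RUN pip install sagemaker-training flask

    # Custom training and scoring logic, plus the entry point
    COPY code/ /opt/ml/code/
    ENTRYPOINT ["python", "/opt/ml/code/entrypoint.py"]
""")

print("FROM rapidsai/rapidsai:latest" in dockerfile)  # True
```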

So again: starting from the RAPIDS base image, we set the choices we made as environment variables, add in a little bit of software, copy in our custom workflow, and that's it. Next, we do a docker pull of our base layer just so that we have it locally, and then we build everything we just saw on top of that base layer. This takes about two seconds for me because I've run it before, but it typically takes about 45 seconds. The docker pull of the RAPIDS base container image might take you a little while, maybe three or four minutes, because you're pulling a fairly significant container base layer down from Docker Hub. So if you're following along and running this, it might take just a few minutes; the docker build itself is quite fast. Next, we create a registry where we'll publish our container so that SageMaker can get access to it. Until we do this docker push, the container is local to the notebook. And now that I've pushed, it's being sent up to ECR; it may take another few minutes, so this is another point where you may have to wait a little bit. But once that's ready, we get to the more fun stuff: we can take all of our choices and the container we just published and package them into a SageMaker API object that will let us run our HPO process. And so, the SageMaker Estimator:

the Estimator API is our next target. An estimator is a fancy way of saying a container with some logic, and the parameters we're passing in here are just this dictionary. The image name is the name of the container we just built, so this estimator is going to be running our container. It has access to spot instances if you told it to, it will run for a maximum duration of 24 hours like we told it to, it knows where to place its model artifacts when it runs (the output bucket we told it to write to), and it has our SageMaker session and role.
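In the SageMaker Python SDK, the estimator described above is built roughly as follows. The keyword values mirror the talk's choices; the construction itself is commented out because it needs AWS credentials, and the bucket and image names are placeholders:

```python
# Keyword arguments for a sagemaker.estimator.Estimator (placeholders marked).
estimator_kwargs = {
    "image_uri": "<account>.dkr.ecr.<region>.amazonaws.com/rapids-hpo:latest",
    "instance_type": "ml.p3.2xlarge",  # single V100 GPU
    "instance_count": 1,
    "use_spot_instances": True,
    "max_run": 60 * 60 * 24,           # 24-hour job cutoff, in seconds
    "max_wait": 60 * 60 * 24,          # required when spot is enabled
    "output_path": "s3://<your-output-bucket>/",
}

# With a role and session configured, roughly:
# from sagemaker.estimator import Estimator
# estimator = Estimator(role=execution_role, sagemaker_session=session,
#                       **estimator_kwargs)
# estimator.fit()   # one sanity-check training run, no HPO yet
print(estimator_kwargs["max_run"])  # 86400
```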

At this point, we have taken our workflow and our choices and produced a SageMaker estimator, which is exactly what we need in order to make progress here. In this next step, we summarize our choices to make sure everything is consistent, and then we just have to make sure our estimator works correctly. So here's a sanity check: this is not running HPO; this is a single experiment, if you will, using the same estimator that we just built and now running it on SageMaker. You can see it started the training job and launched the instance. Because we're using spot instances, it said, "I don't happen to have spot capacity right now," then it retried and actually found some. It prepared the instance for training, which is where it mounts the data, then it brought in the image from the registry, and lastly it started to produce the output of our internal logic, the things that we're now doing in our custom code.

It's hitting our entry point, it's starting to parse the configuration choices, it's able to see that we're running the airline dataset in single-GPU mode with XGBoost and three folds, and it's parsing in the hyperparameters. In this case, because we're just doing a check, these are the defaults; otherwise, the hyperparameters would be set by SageMaker when it's orchestrating the HPO, and it will send every experiment different versions of these. And then as it runs: we pointed it to the airline dataset; in this case, I think we pointed it to the three-year dataset. Yes, the three-year dataset. So there should be 36 Parquet files in here, basically one for every month. You can see these are the files that got mounted (there's an extra one for metadata, so 37 files), and it's reading and ingesting them. Now it's running the same workflow with the data it ingested: 18 million flights with 14 columns, of which one is the target. So, 13 feature columns and one label.
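The file count follows from the sharding scheme: one Parquet file per month over three years, plus one metadata file:

```python
# Three years of airline data, sharded monthly, plus one metadata file.
years, months_per_year = 3, 12
data_files = years * months_per_year
print(data_files, data_files + 1)  # 36 37
```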

It finished ingestion in about five seconds, it did the data split very quickly, and then it fit the XGBoost model in about two and a half seconds. Again, this is probably not the most complex model, because this is just ten trees of depth 5: simple defaults, more of a sanity check than anything, so we shouldn't expect super-high accuracy. This is just making sure everything works. Everything completed here in the first run: it finished one cross-validation fold, then it did it again for another cross-validation fold, and then one more time, and then it reported the performance of the model across those remixes of the training and testing data. As you can see, it's pretty close to 91.4 across folds, which is fairly tight, so we can have some confidence in the robustness of this result, although the number is fairly low once again because the number of trees and the depth of the trees is quite low as well. This is more of a kick-the-tires example, where the workflow ran to completion with our choices and SageMaker was able to get everything done. And in addition, we were able to save 70% by turning on spot savings. With this, we now have some confidence that our containerized estimator is working well, so we can actually go ahead and do proper HPO. I'll show you what that looks like here in just a second, and then I'll open it up for your questions. So: we already specified our search ranges. Now it's up to

SageMaker to take this estimator that we know works and to spin up and run multiple parallel experiments for us with different settings of those parameters. All we have to do is tell it what we want our metric to be, the thing we're going to score ourselves on, and the way you specify this is in the form of a regular expression. You're saying: whenever you see the string "final-score," that's the thing I want you to parse, and you attach it to our final score, since it's printed at the very end. So this is what SageMaker will parse out of our output, using string matching, to understand that the result of this container run, with the particular hyperparameter settings it was given, was this number. That's how it knows what to look for. With this metric definition in hand, you can build on this with the Python API: this time you're not building an estimator, but rather passing the estimator in as an argument to a HyperparameterTuner.
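The regex-based metric definition can be exercised locally before handing it to SageMaker. This sketch assumes the training code prints a line like `final-score: 0.914`; the exact string and regex in the workshop code may differ:

```python
import re

# SageMaker scans the container's stdout with this regex and reports the
# captured group as the objective value for the run.
metric_definitions = [{"Name": "final-score",
                       "Regex": "final-score: (\\d+\\.\\d+)"}]

sample_log_line = "final-score: 0.914"   # what the training code might print
match = re.search(metric_definitions[0]["Regex"], sample_log_line)
print(float(match.group(1)))  # 0.914

# The tuner then wraps the estimator rather than replacing it, roughly:
# from sagemaker.tuner import HyperparameterTuner
# hpo = HyperparameterTuner(estimator, "final-score", hyperparameter_ranges,
#                           metric_definitions, strategy="Random",
#                           max_jobs=10, max_parallel_jobs=2)
# hpo.fit(inputs, wait=True)   # blocking call until all experiments finish
```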

So the tuner takes in the metric definition for the final score that we just built, and it also takes in the hyperparameter ranges that we defined above, the search strategy that we set to random, the maximum number of jobs to run, and the maximum parallel jobs; in our case those were 10 and 2, respectively. With that, we're almost there. This is just a quick summary of your choices so you can confirm everything looks good before you kick off this compute, and then here you actually launch the work. So here we're calling hpo.fit, where hpo is our SageMaker HyperparameterTuner object. This will kick off quite a lot of compute, depending on how you set things up; in our case it will run 10 instances, two at a time, and then it will eventually get back to us with the results of those experiments. This is a blocking call, to make sure we don't get past the cell until it's completed, and once it's complete we'll be able to see our HPO results. Here are the results of our run: out of the 10 runs, the best one was 91.5. I think the hyperparameter search ranges that we set were a little on the low side, but nonetheless this is a reasonable result for the three-year dataset. So this is just an example of how we would run end-to-end on SageMaker using RAPIDS. I'll take a quick break here and see how you guys are doing. There are a couple more pieces in here still to cover.

Okay, we should be able to do this also. The last piece is essentially to take the best model that we found and put ourselves in a place where we can serve it to the world. Once again, we're going to be leveraging the SageMaker API, and this time we're going to take our model and deploy it to an instance. Then we're going to send it input using the real-time predictor API. In the case of the airline dataset, I already have

an example of a flight from 2019 that left nine minutes early, so this should be something the model tells us will arrive on time. There's also an example from 2018 where the flight was 123 minutes late in departing, so this is one the model should classify as late. Once this model is done being spun up on an instance, we should be able to send it queries, and it should tell us whether or not it thinks a flight will be late.
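Deployment and the two sanity-check queries can be sketched like this. The endpoint calls are commented out because they need a live AWS session, and the two feature rows are simplified stand-ins for the real 13-column inputs (the `serialize` helper is hypothetical):

```python
# Hand-picked sanity checks (simplified; real payloads carry all 13 features).
on_time_flight = {"Year": 2019, "DepDelay": -9.0}   # left 9 minutes early
late_flight = {"Year": 2018, "DepDelay": 123.0}     # left 123 minutes late

# With a trained estimator, deployment and querying look roughly like:
# predictor = estimator.deploy(initial_instance_count=1,
#                              instance_type="ml.p3.2xlarge")
# predictor.predict(serialize(on_time_flight))  # expect on-time (0)
# predictor.predict(serialize(late_flight))     # expect late (1)
# predictor.delete_endpoint()  # avoid paying for an idle endpoint
print(late_flight["DepDelay"] > 15 > on_time_flight["DepDelay"])  # True
```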

As a sanity check, we'll ask it about these flights that we know are late or early, just to make sure the whole process works. So let's pause there and see how you guys are doing. Is anyone following along? Are there any questions? Maybe we should just take a stretch break anyway. Does this stuff make sense to you guys? Is there something you think you might be able to connect to your own work? Maybe not in this exact form, but are you getting ideas about being able to leverage this for your own stuff?

Anyone? Don't be shy. Did anyone hit any technical troubles? Is anyone following along; are you able to do this on your side? I'm not hearing anything from you guys, so I'm wondering whether it makes sense to go into any additional detail here. The next piece I was going to cover is how you can get a live environment with a RAPIDS kernel in your notebook, so that you can do interactive development right in the notebook, as opposed to just being handed a workflow that you know is already working. Okay, so I'm going to be stubborn here and wait for at least one question to come up before moving on, because I kind of want to know whether any of this is making sense, or whether I'm just talking to my home screen. Crickets. All right, I think we can end early if you guys are not interested in hearing more. I'll wait just a bit in case something comes up, but I don't think it makes sense to do the additional deeper dive if I'm not hearing feedback on whether you're actually following along, because the next step is going to be a little trickier. So I'll stop here again, give you guys a few minutes, and if I don't hear anything we can come back. "Attendees, if you have any questions, please open the chat or just reply to me. Any questions, please." Yeah, and I'm just curious: if you folks are not able to access the AWS resources for whatever reason, maybe you don't have an account, or you don't have credits or permissions, that's no problem at all. I just don't think the next piece of the deep-dive session makes too much sense in that case. So maybe what I'll do is just show you what it looks like on my machine, and then you can see it and have it as a reference, just so you know what's happening under the hood. But I won't ask you to run it, since it doesn't seem like folks are really following along on SageMaker from where I am. I'll wait about one more minute to see if any other questions come up, and then I'll show you a little bit more of the inner workings of the workflow.

Okay, so let's talk about the machine learning, the data science that's happening in this behind-the-scenes workflow that we didn't really dig into very much until now. First, we set up the kernel. As I mentioned before, we're working with the airline dataset and we're trying to predict flight delays. We've actually created a local copy of the dataset for ourselves, downloading it from the same bucket that we were using in the cloud HPO demo. Here we're using two years of data: all the data that's available in 2020, and the data in 2019. I think this actually only goes to July of 2020, since the public release is a few months delayed, but this is still fairly fresh data. So we'll go ahead and ingest it into memory. As I do that, you should see the GPU memory and the GPU utilization jump up a little bit here.

So the memory went up a bit, and the utilization went up as it was reading that data. If we look inside the dataframe, we see all the samples, and we can sample and view them in a vertical orientation. We can see things like the year, the quarter, the month, the day of the week, the reporting airline codes, the origin city, the destination city, the departure time, the departure delay, whether or not it left more than 15 minutes late, whether or not it arrived more than 15 minutes late, how long it was in the air, and the distance covered. That arrival-delay-over-15-minutes column, if you remember, is our target variable: the thing we're trying to predict from all the other features. Again, because we want to keep this a minimal workflow, we're skipping the usual steps, where you would typically handle missing data (perhaps filling in those values using imputation, where you try to predict the right missing value based on the weight of evidence of everything else), or where you might do some additional feature engineering, or merge this with other data sources; you could, for example, pull in the geolocations of these airports. We're skipping all of that and doing the simplest thing possible: we're dropping any rows that have missing values. The majority of those samples were canceled flights, and we're okay with that, because canceled flights are not really an interesting target for predicting lateness; they're definitely going to be "late" if they're canceled. So that's all we do for ETL: dropping data. Then we split our training and test data; this is what we need in order to feed our machine learning models and evaluate their performance. Here is that train/test split, and here's an example of our training data. You can see that the indexes are now shuffled, from all over the place rather than just in increasing order, and we have about 7 million training samples. Really, the only thing left in our minimal workflow is to train a model.
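The whole minimal workflow up to this point (a frame with an arrival-delay target, a `dropna`, and a shuffled split) fits in a few lines. Here's a CPU sketch using a toy in-memory frame; with RAPIDS installed, the equivalent GPU code reads with `cudf.read_parquet` and splits with `cuml.model_selection.train_test_split`. Column names follow the public airline data and may differ slightly from the workshop's files:

```python
import pandas as pd

# Toy stand-in for the airline dataframe, with ArrDel15 ("arrived more than
# 15 minutes late?") as the binary target; None rows mimic canceled flights.
df = pd.DataFrame({
    "Year":     [2019, 2018, 2019, 2019, 2018, 2019],
    "DepDelay": [-9.0, 123.0, None, 30.0, 5.0, 200.0],
    "Distance": [697.0, 2586.0, 100.0, 500.0, 800.0, 1200.0],
    "ArrDel15": [0.0, 1.0, None, 1.0, 0.0, 1.0],
})

df = df.dropna()                             # the entire ETL step

train = df.sample(frac=0.8, random_state=0)  # shuffled indexes, 80% train
test = df.drop(index=train.index)
print(len(df), len(train), len(test))  # 5 4 1
```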

So here we're going to build a more complicated learner than we did in the cloud HPO kick-the-tires example, which had a max depth of something like 5 and maybe 10 trees. Instead, we're going to have deep trees (or at least deeper trees), a significant number of them, and then reasonable settings for the rest of the parameters, like the learning rate. One thing to note is that the tree method here is gpu_hist, which is how we tell XGBoost to build these trees on the GPU using its built-in feature histograms. So we go ahead and kick off that training process. You can see the GPU gets utilized fairly well; this is just one of the four GPUs in this machine during this training procedure, and it should wrap up here in just a few seconds. It takes about 15 seconds to learn three hundred of these trees, and now we can go ahead and do prediction with this. And see, now we're up to 95% accuracy.
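The training setup being described boils down to an XGBoost parameter dict with `tree_method` set to `gpu_hist`. The exact depth and learning rate below are illustrative (the talk only says "deeper trees" and three hundred of them); the GPU-only calls, including the Forest Inference Library reload mentioned next, are commented out:

```python
# Parameters for the "more serious" single-GPU model (depth and learning
# rate are illustrative; the sanity check used depth 5 and 10 trees).
params = {
    "tree_method": "gpu_hist",       # GPU-accelerated histogram tree build
    "max_depth": 10,
    "learning_rate": 0.1,
    "objective": "binary:logistic",
}
num_boost_round = 300                # "three hundred of these trees"

# With xgboost and a GPU available, roughly:
# import xgboost as xgb
# dtrain = xgb.DMatrix(X_train, label=y_train)
# bst = xgb.train(params, dtrain, num_boost_round=num_boost_round)
# bst.save_model("xgboost_airline.model")
#
# The Forest Inference Library (FIL) can reload the saved model for fast
# batch prediction, even if it was trained on CPU:
# from cuml import ForestInference
# fil = ForestInference.load("xgboost_airline.model", output_class=True)
print(params["tree_method"], num_boost_round)  # gpu_hist 300
```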

So this is significantly better than before, when we were just using a simple architecture. We can go ahead and save the model to disk, and we can also point out that we're leveraging the Forest Inference Library, which is an accelerated inference tool that lets you get really great prediction throughput. For folks who are curious: even if you train your model on CPUs, you can still use the Forest Inference Library, and there are really great optimizations in there to get the most out of especially large batch predictions. So that's what's happening under the hood for the single-GPU workflow. It's fairly easy, fairly straightforward, a very minimal workflow: really just ingestion, splitting, training, scoring, and then accelerated inference. Since we have just a little bit of time left and there are no questions, I'll also show you what it looks like to run this on multiple GPUs, and for the multi-GPU side I'll show the CPU and GPU versions side by side. We already have the dataset, so we don't need to re-download it. Here we're creating a LocalCUDACluster. You might have noticed the memory for these GPUs went up; and actually, let me bring up a task stream and the Dask graph while we're at it. So now we have a cluster of workers (here, a single worker), and this is all because of Dask. Dask is a really great library for parallelization, and I highly encourage you to check it out if you're doing distributed work. Essentially, the code looks very similar to what we did before. So

if you look at the single-GPU example for a moment: in the single-GPU case we just did a cudf read_parquet with these feature columns, et cetera. In the multi-GPU case, we instead do a dask_cudf read_parquet, since there is a multi-GPU cluster set up behind it. And when I run this dask_cudf read_parquet, it actually returns immediately, without doing any of the work.
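The lazy behavior Dask shows here (build a task graph now, do the work only when a result is demanded) can be illustrated with a toy stdlib sketch. This is a conceptual analogy, not Dask's actual implementation:

```python
# Toy lazy pipeline: each step records work instead of doing it, like a
# Dask task graph; nothing executes until .compute() is called.
class Lazy:
    def __init__(self, source, steps=None):
        self.source, self.steps = source, steps or []

    def map(self, fn):                     # chain another deferred step
        return Lazy(self.source, self.steps + [fn])

    def compute(self):                     # trigger the whole graph
        data = list(self.source)
        for fn in self.steps:
            data = [fn(x) for x in data]
        return data

pipeline = Lazy(range(4)).map(lambda x: x + 1).map(lambda x: x * 10)
# Nothing has run yet; only compute() materializes values:
print(pipeline.compute())  # [10, 20, 30, 40]
```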

That's because Dask does lazy evaluation. It just builds up a compute graph of the things I've asked it to do so far, and only when I tell it that I need a result will it actually trigger the computation. So far it's just sitting in the background, keeping track of all the computations I've asked for. For example, if I ask for the data after telling it to read it in, it knows that it needs to read something with these columns (because I told it to), but it doesn't actually have any values associated with it, unlike in the single-GPU case, where if we ask for data it already has everything populated, because that's computed immediately. In Dask, things are lazy, and if we visualize them, we see there's a compute graph somewhere in the back that's saying: I need to read Parquet from all of these files, and I know I need to do that, but I haven't done it yet. So if you look at the shape of the data, it knows it has 14 columns, but the number of rows will only come out once the computation has occurred; it doesn't know how many elements are in here until we trigger a computation that forces a compute. Here you can see in the task stream that it's figuring out the row count, and it's telling us there are about 9 million samples in here.

So let's go through and do the same stuff we did before. Next, we'll drop the samples with missing values. If you visualize what that looks like, it's still just a compute graph that hasn't been fully realized: it knows it's going to be reading data and dropping rows that have missing values. We can now chain together more complex computations in here. For example, if we were to do a sum and visualize that, it would actually force all of the nodes in our graph to link together, because it's going to have to do some kind of aggregation over all of those data chunks. Whereas before, when we were just reading data in and then dropping missing values, that was completely independent: all the workers could process it without having to talk to each other. It's not until we do something that requires communication that we bring this graph together. And of course you can do things that are more complex; a mean, for example, requires a sum as well as a count to figure out what to divide by. So these task graphs can get arbitrarily complex behind the scenes, depending on what you want to do. For our purposes, though, I'm just showing you that Dask is a really powerful tool, and the code looks almost identical to what you write when you're doing single-threaded or single-GPU code, but it leverages lots of workers and figures out how to best place the work on them. So I highly encourage you to try it.

Here we just continue: this is our train/test split. The training data, again, hasn't even been materialized; it knows that something needs to be read in, needs to have its missing values dropped, and needs to be split, and eventually it'll be ready to kick that off. None of this has run yet. But here we actually will trigger that computation: we do a persist call, which says, "all of that work that you know about, go ahead and kick it off in the background." You should see some of this memory go up when I run this cell, and you should also see a task graph start to appear where we actually process the data. So now if you visualize X_train, it took that graph of all the computations it needed to do for each of these shards of data, and since it actually did them, it got compressed back down into just a single

object: the fully realized version of this data, the result of the computation. So with that, let's get ourselves into some machine learning. Here is the model training, same as before, with deep trees and 300 boosting rounds. This time we should be leveraging four GPUs while we're doing this training; you should see four GPUs being utilized here, all working on this problem. All right, and here's the result of that: again, we're getting about 95% accuracy.
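The multi-GPU training call differs from the single-GPU one mainly in the `xgboost.dask` entry points and the client argument. A hedged sketch: everything past the parameter dict is commented out because it needs GPUs and the dask-cuda / dask_cudf / xgboost stack, and `files` is a placeholder for the Parquet paths:

```python
# Multi-GPU training sketch; parameter names mirror the single-GPU run.
params = {"tree_method": "gpu_hist", "objective": "binary:logistic"}
num_boost_round = 300

# With the RAPIDS + Dask stack available, roughly:
# from dask_cuda import LocalCUDACluster
# from dask.distributed import Client
# import dask_cudf, xgboost as xgb
#
# cluster = LocalCUDACluster()      # one worker per visible GPU
# client = Client(cluster)
# ddf = dask_cudf.read_parquet(files).dropna()
# X_train, y_train = ...            # lazy split, then .persist() to realize
# dtrain = xgb.dask.DaskDMatrix(client, X_train, y_train)
# output = xgb.dask.train(client, params, dtrain,   # note: the client is
#                         num_boost_round=num_boost_round)  # the first arg
# bst = output["booster"]
print(params["tree_method"])  # gpu_hist
```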

So that's a good point for us to stop. Hopefully you've seen what's possible at the highest level, where we can take this workload; and this notebook is just a stripped-down version. If we look inside the workflow code, for example the single-GPU workflow, this is almost identical code to the notebook you just saw for single-GPU, except it's now in the form of a .py file. Here's the read_parquet; here's read_csv, if you wanted to read the data in that way; here's the handling of missing data with a dropna; here's the train/test split; here's the model training with XGBoost; here's model training with random forest; and then predictions. There's one of these versions for single-GPU. And here's the version most people are probably familiar with, which is single-CPU: this is pandas and scikit-learn. It should look very similar: pandas read_parquet instead of cudf read_parquet, pandas read_csv, the same dropna, the same train/test split, nearly identical XGBoost and random forest calls. So the single-GPU and single-CPU code should look almost identical, with some very small changes. And that's the whole point of RAPIDS: it keeps developers happy by not changing too much, while making things go much, much faster. With that, I think I will end the session. Do you guys have any thoughts or questions before we wrap? I see: "Can RAPIDS be leveraged to accelerate models like BERT or GPT?" Someone's asking a question. So:

RAPIDS can definitely be leveraged alongside something like PyTorch to accelerate BERT training, and the role of RAPIDS in that would probably be to accelerate the tokenization, the string parsing, and then hand off fully numeric, embedded representations of the strings to the deep learning framework. We're actually actively working on use cases where we can show exactly that: RAPIDS as an on-ramp to deep learning in NLP models. How about I ask a question of you guys: does anyone here use SageMaker

frequently? Okay, maybe not; so maybe this was a little tricky for that reason. I think next time maybe we'll find a way to get free credits for people, so that they can come pre-logged-in and not be worried about the account-setup stage. Is there anything we could have done to make this a smoother experience for you? Was the notebook instance spin-up a hurdle? Any questions from the audience? "Yeah, I think, Miro, you can continue; a few questions have come in, and they are in the chat." "It took like 5 minutes." Yeah, I think maybe we can start future sessions by having folks come in and spin up the notebook while I'm talking through slides. Were you able to make progress in the notebook once you started it, though? "I got an error when trying to load the data." Interesting; can you share the error, please? It might be your region. Yeah, I see that S3 error; let's see. Yeah, so it's telling you that you're in a region that is not us-east-1 or us-west-2, so the demo won't run

unless you're in those two regions. If you wanted to run in a different region, you'd basically have to copy the data from the S3 bucket that we provide and put it in your region. So if you're somewhere else in the world, you can just switch your AWS region. The way to do that is to navigate to the AWS console; in the top right you should see a region-selection tool. You can go to us-west-2 or us-east-1, and if you're in one of those regions when you create your notebook instance, then you should be all set. I'm sorry, I probably should have made that clear when we were starting; my apologies. One more question from you guys: "Is the only difference between the single and multi CPU or GPU workflows...?" So, there are multiple workflows; I think I understand. Your question is whether the only difference between the single and multi flavors, of both the CPU and GPU versions, is that, for example, single-GPU differs from multi-GPU only in that Dask has been added. Yeah, I think that's your question. Yes: it's not a super-big difference, with the exception that you have to spin up a cluster; you do this LocalCUDACluster, or a LocalCluster in the case of the CPUs. Once the cluster is up and running, pretty much everything stays the same, except that when you're using, for example, the training API, instead of saying xgboost.train you say xgboost.dask.train, and you have to pass in, as the first argument, your client, which is how you talk to your cluster. But everything else is pretty much the same API, and you get all that distributed computation with minimal change. Great, glad you guys are finally giving me some feedback; I thought I was talking to myself here. I have one more question for you: does anyone here do HPO? I'm really curious to know whether this is something that's done by your teams, or... I mean, I would argue that if you're a

data scientist and you have a model or a dataset that you care about, you should probably do HPO at least once on it. And if you're someone who's putting things into production, you should probably run it periodically, to see if you can find a challenger model to the one that's in production. I'm curious if you use HPO, plan to use HPO, or are already using it in production. Or is this kind of over your heads: good recommendations, but not really practical?

Okay, it doesn't sound like we have too many HPO takers. Well, I hope you guys can take this away. "Is there a notebook which explains how to use RAPIDS along with PyTorch?" Oh, I see. Okay, let me see if I can find something for you. Here's a great blog post that talks about, I think primarily, ETL and pre-processing of tabular data. This one looks at, what is it doing, word embeddings. Yep, I think this one is a good one to start with for RAPIDS with PyTorch. And there's another good blog post that uses RAPIDS to do parsing of logs and then feeds those into an anomaly-detection model, where RAPIDS improves the throughput of the deep learning pipeline. So definitely stay tuned: just keep your eye on the RAPIDS Medium blog and you should see some more really great RAPIDS NLP content. Awesome. Okay, well, since I know folks were struggling with the AWS stuff, I think we'll probably call it here. Next time, I'll try to provide a set of detailed instructions so that, if you're interested in following along, you can try them beforehand and come to the session pre-spun-up. Thanks so much for your time and attention; I appreciate you staying through to the end. And if you have any thoughts or questions, definitely feel free to reach out.
