Duration 27:28

Waking the Data Scientist at 2am: Detect... By Chris Fregly, Developer Advocate, Amazon Web Services


About speaker

Chris Fregly
AI and Machine Learning at Amazon Web Services (AWS)

Chris Fregly is a Developer Advocate for AI and Machine Learning at Amazon Web Services (AWS) based in San Francisco, California. He is co-author of the O'Reilly book "Data Science on AWS." Chris is also the founder of several global meetups focused on Apache Spark, TensorFlow, and Kubeflow. He regularly speaks at AI and machine learning conferences across the world, including O'Reilly AI & Strata, the Open Data Science Conference (ODSC), and the GPU Technology Conference (GTC). Previously, Chris was Founder at PipelineAI, where he worked with many AI-first startups and enterprises to continuously deploy ML/AI pipelines using Apache Spark ML, Kubernetes, TensorFlow, Kubeflow, Amazon EKS, and Amazon SageMaker.


About the talk

In this talk, I describe how to deploy a model into production and monitor its performance using SageMaker Model Monitor. With Model Monitor, I can detect if a model's predictive performance has degraded - and alert an on-call data scientist to take action and improve the model at 2am while the DevOps folks sleep soundly through the night.

Transcript

So today's talk is called Waking the Data Scientist at 2 a.m. Typically we talk about waking the DevOps folks at 2 a.m.; now we're going to shift the target of our pages over to the data scientist. What we'll be talking about is how, after we push our models into production, there are ways to monitor those predictions, and if we start to see things like skew or drift (terms we'll define in a bit), there are ways to take action.

Some useful links: the book, and also a website that has lots and lots of resources. We will also post this talk and the video, and all of that will go to the Global AI Conference as well. I've been doing a lot with the community before joining AWS. I run the Advanced TensorFlow Meetup, which I've had for probably about five years now; it actually predates TensorFlow, and we try to stay current with the times. Originally it

was around Spark, then it shifted to Spark and TensorFlow, and now we do pretty much everything, but the most recent focus has been Kubeflow and quite a bit of SageMaker. There's also the book; if you go back to that link you can sign up and get early-release access to it as well. Of the full book, what we'll be focusing on today is essentially chapter 9, which covers deployment.

We will also talk a little bit about Debugger, which you can use during training to debug your model. If you think of TensorBoard, which you may have used with TensorFlow, you can get similar insight into what's happening during training, but this talk will mostly be about what happens after the model goes into production, which is where model monitoring comes in. Okay, so here's a quick high-level overview of SageMaker; today's talk will be focused on SageMaker and the features therein.

We have fully managed notebooks, with Jupyter and JupyterLab of course powering these things. There's also a series of built-in algorithms for things like FastText, which is a variant of word2vec, things like XGBoost, and forecasting algorithms. These are all built-ins you can use if you want, or of course you can bring your own code, which is what we'll mostly be covering today.

Bring your own script is classic script mode with SageMaker. So there are built-ins, there's bring your own script, and there are even ways to bring your own container. If you have your own Docker container that's already been approved by your security team, that's something you can certainly bring into the SageMaker platform and then scale out using SageMaker, and you can also benefit from model monitoring, which is what we're talking about today. We also do hyperparameter tuning, of course; that's been there since, I think, 2018.

Hyperparameter tuning is very common these days for getting the best model fit to our data set so that we can make predictions with that model. There's one-click deployment, which is very easy to do: you can either do it from the UI or with a single line of code to get into production and scale out, and of course we support autoscaling, so you can set up different autoscaling policies for these model servers in production. Okay, so backing up, we do have re:Invent 2020 coming up.

I believe it runs from the end of November, right after Thanksgiving, through the first couple of weeks of December. It's completely online this year, and I think it spans about three or four weeks. But just backing up to last year's announcements: 2019 was actually a really big year, and re:Invent 2019 was huge for SageMaker. We launched lots and lots of really cool features. The two we'll be talking about today are SageMaker Debugger and SageMaker Model Monitor, but in 2019 we also launched something called SageMaker Studio, which is really the premier IDE.

If you want to track experiments or visualize them, you can still track experiments with single lines of Python code, but SageMaker Studio is really slick. At the same time we also launched the SageMaker Experiments API and then something called Autopilot, which is basically automated machine learning, AutoML. So lots of cool features, check them out, but the ones we'll talk about today are Debugger and Model Monitor. Let's quickly talk about Debugger.

When you're training deep neural networks, there are lots of connections and lots of layers, and there can be cases where you start to see vanishing gradients or exploding gradients, or there's just something happening during training that you might not have expected. This typically happens when you multiply small numbers by smaller numbers as you go deeper and deeper into your network. So what you want to be able to do is tap into that and look at the values of specific tensors or layers and

see how the actual training is happening. One way to do this, of course, is to put in your own print statements, your own debug statements, let the training process run (maybe it takes three days), and then go back and use that data to chart and find where those things happened. But what you really want to do is tap in and get notified right when these things happen, and that's something Debugger lets us do. A couple of examples: one situation is where your loss is not decreasing.

For example, we want the loss to go to zero. Loss is essentially the error: we are making predictions during training to see how close we are to the actual labeled data, and what we want is to reduce that error down to zero, or as close to zero as possible. Sometimes the tensor values, because we are limited by hardware to things like 32-bit floats, can underflow and get down to zero, and

if we start to see lots of zeros, that's called a vanishing gradient, an example of something we certainly want to keep an eye on and get notified about, because what you don't want to do is continue to pay for your training job when your model is all zeros, when we're starting to get zeros everywhere. There's a fancy mathematical way to explain vanishing gradients, but really, at the end of the day, it's multiplying small numbers by small numbers, which gives you really small numbers that start to go to zero. There's also a rule called loss not

decreasing. As I was mentioning, the loss is what we're trying to continually decrease. If, for example, we start to plateau like the green line here, the top line, and we're not actually approaching zero, then something's wrong, and that's a case where we do want to get notified. So let's see how we would do that. Debugger lets you actually capture the data, automatically detect these errors, and then get notified.

The way all this comes together is that our code is running inside a Docker container that's being scaled out across the SageMaker infrastructure, and periodically we are checking the values of these tensors. We can then fire off what's called a CloudWatch Event, which is just normal AWS speak, and we can set up a notification with SNS, the Simple Notification Service, which could either email someone on call, like the data scientist,

or send a page or a text message, something like that, or we can actually stop the training job automatically. So we don't even have to notify the data scientist; we could just stop the training job and then let them know the next morning, because maybe they want to log into the VPN and fire off this job again with some different parameters, or somehow try to address the problem.
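
The "stop the job automatically" path is typically wired up with a CloudWatch Events (EventBridge) rule that targets a Lambda function. Here is a minimal sketch of such a handler; the event keys and the rule wiring are assumptions about your setup, not something shown in the talk:

    import boto3

    sm = boto3.client("sagemaker")

    def handler(event, context):
        # Assumed event shape: a SageMaker Training Job State Change event whose
        # detail carries the job name and the Debugger rule evaluation statuses.
        job_name = event["detail"]["TrainingJobName"]
        statuses = event["detail"].get("DebugRuleEvaluationStatuses", [])
        if any(s.get("RuleEvaluationStatus") == "IssuesFound" for s in statuses):
            # Stop paying for a training job that Debugger has flagged.
            sm.stop_training_job(TrainingJobName=job_name)
            return {"stopped": job_name}
        return {"stopped": None}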

Alright, so here's how you would actually add Debugger to your training job; it's a very small amount of code. You would specify a rule like loss not decreasing. In this example it's XGBoost, so loss not decreasing makes sense; if you were using TensorFlow or PyTorch or maybe MXNet, you could also inspect gradients and check for scenarios like overfitting. You set up the rules and specify basically how often you want to check, whether it's every step through the training process or every N steps, and you can even customize the rule parameters, as in the sketch below.
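
A minimal sketch of what that looks like with the SageMaker Python SDK, assuming an XGBoost script-mode job; the script name, role ARN, S3 path, and rule thresholds are placeholders:

    from sagemaker.debugger import Rule, rule_configs
    from sagemaker.xgboost import XGBoost

    # Built-in Debugger rule: flag the job if the loss stops decreasing.
    rules = [
        Rule.sagemaker(
            rule_configs.loss_not_decreasing(),
            rule_parameters={
                "num_steps": "10",      # how often to evaluate (placeholder)
                "diff_percent": "50",   # required improvement over that window (placeholder)
            },
        )
    ]

    estimator = XGBoost(
        entry_point="train.py",                                 # hypothetical training script
        framework_version="1.0-1",
        role="arn:aws:iam::123456789012:role/SageMakerRole",    # placeholder role
        instance_count=1,
        instance_type="ml.m5.xlarge",
        rules=rules,
    )
    estimator.fit({"train": "s3://my-bucket/churn/train/"})     # placeholder S3 path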

If the loss does not decrease over ten steps, or by 50% over 100 steps, something like that, then the rule triggers. Okay, let's shift gears a little bit. We have now successfully trained a model and pushed it into production, and we actually want to monitor its predictions. We will do this by capturing data, both the production inputs and the actual production outputs, the predictions themselves, and we can periodically sample it and then compare it to a baseline

data set that we were using during training. This is often called drift, or sometimes training-serving skew. We have trained the model on a particular set of data that has a particular distribution, and now we are monitoring the calls coming in from the wild: this model has been pushed out, it's now taking traffic, inputs are coming in, and predictions are being made by this model. What we want to do then is have a baseline.

We establish the baseline before the model is pushed into production, and by baseline I mean a set of inputs that are representative of the training data set, along with their predictions. We capture those, and every, let's say, 500 predictions we compare and make sure that the type of data being passed in is similar to the data the model has trained on. If new data is coming in from the wild with completely different characteristics than what this model was trained on, then this

probably is not the best model for the current situation. So something we would do is continuously retrain this model: we would flag the new data distribution that's coming in and then retrain the model with the new data so that we have a better model. All of this can become really cumbersome without a structured way to do it. There needs to be a way to do automatic data collection, for example, and we need to be able to continuously monitor and schedule this continuous monitoring, whether it

happens every 500 predictions, every 30 minutes, or every 2 hours. There has also got to be some way to specify the rules, like the intervals to check, or, if there is a 10% difference from what we were expecting, that's something we want to flag. It's also really nice to have some sort of visualization here, so I can go into one dashboard and see how far off we are, and then to get notifications and take action. How does this

work? There's actually quite a bit going on beneath the covers. My company before joining Amazon actually specialized in this, so I can speak from personal experience: if you're trying to build something like this, there are so many moving parts that it's actually very difficult to build and maintain. One thing that's nice is that SageMaker has built this for us, so we can just tap into the SageMaker framework. The pieces that we

need, of course, are some data; that's the S3 bucket on the left there. We then have a trained model in the middle, produced by a SageMaker training job. That model then gets pushed out to SageMaker as a REST endpoint, for example, and now we have applications that are sending in data and getting predictions back. So let's take a look. This is actually built into the SageMaker Python SDK if you're using Python, and you can turn it on with just a few lines of code. You pass in a

data capture config, and that's going to configure how often you want to sample. I typically wouldn't sample 100%, because that's a lot of data to be sampling. You then tell it where to put these results by giving it an S3 bucket URI, and that's it.
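
A minimal sketch of turning data capture on at deployment time with the SageMaker Python SDK; the sampling rate, bucket, and endpoint name are placeholders, and `model` is assumed to be an existing SageMaker Model object:

    from sagemaker.model_monitor import DataCaptureConfig

    # Sample a fraction of requests/responses and land them in your own S3 bucket.
    data_capture_config = DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=20,                                     # don't capture 100% in practice
        destination_s3_uri="s3://my-bucket/model-monitor/capture",  # placeholder URI
    )

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        endpoint_name="churn-endpoint",                             # hypothetical name
        data_capture_config=data_capture_config,
    )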

The sample data that gets collected is written as JSON files. JSONL, or JSON Lines, is a special format used pretty heavily throughout SageMaker where each line is valid JSON, but the file as a whole doesn't have to be a single well-formed JSON document. Think of it as validating each line as a complete JSON document while writing multiple lines; this is what gets stored in S3. Diving a little deeper, let's look at some of the data that was captured. Here's a series of numbers coming in as CSV, snapshots of the actual input data, and this is all done within your own account, so there's nothing

being shared with Amazon or anything. This is all captured into your private S3 buckets; nobody else is seeing this data. Then there's the output: those numbers came in as input, and that was the output being returned, as CSV in this case. If we start to see input sequences coming in that do not match what this model was trained on (and of course this depends on the type of model we're actually deploying), that's where we want to get notified.
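
As a rough sketch of peeking at those captured files from Python; the bucket prefix and the exact record layout here are assumptions, so check the files in your own bucket for the real shape:

    import json
    import boto3

    s3 = boto3.client("s3")
    bucket, prefix = "my-bucket", "model-monitor/capture/"        # placeholder capture location

    # List a few capture files and print the captured input/output of each record.
    for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", [])[:3]:
        body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read().decode("utf-8")
        for line in body.splitlines():                            # one JSON document per line (JSONL)
            record = json.loads(line)
            capture = record.get("captureData", {})               # key names are an assumption
            print(capture.get("endpointInput", {}).get("data"),
                  "->",
                  capture.get("endpointOutput", {}).get("data"))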

What we would do is run a sort of baselining phase. It's not really training, but we are passing in data that we know represents our data set. We set up a baseline, and that data represents the structure of the data the model was trained on; that's called the baseline. We generate baseline statistics and constraints, and then we continuously compare those baselines to the actual live requests and predictions coming in.

Okay, so here is the code used to actually suggest a baseline. It points to a CSV that represents our data set, which we will continue to compare against throughout the monitoring process, as sketched below.
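
A minimal sketch of that baselining step with the SageMaker Python SDK; the role, instance settings, and S3 paths are placeholders:

    from sagemaker.model_monitor import DefaultModelMonitor
    from sagemaker.model_monitor.dataset_format import DatasetFormat

    # Build a monitor and compute baseline statistics/constraints from the training data.
    my_monitor = DefaultModelMonitor(
        role="arn:aws:iam::123456789012:role/SageMakerRole",    # placeholder role
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )

    my_monitor.suggest_baseline(
        baseline_dataset="s3://my-bucket/churn/train-with-header.csv",  # placeholder CSV
        dataset_format=DatasetFormat.csv(header=True),
        output_s3_uri="s3://my-bucket/model-monitor/baseline",
        wait=True,
    )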

Now, here's a little more under the hood about what's going on with Model Monitor. Beneath the covers we are actually launching Spark jobs, and these run as SageMaker Processing jobs. Processing jobs were released at re:Invent 2019, and they're basically freeform: they can be Python jobs, they can be Scala jobs, they can be anything. In this case, the job takes the inputs and outputs and compares them to the baseline. These run as Spark jobs, and they use an open-source library called Deequ. I've used Deequ in multiple places myself: when new data comes in, I run Deequ to get a sort of

baseline, a set of summary statistics, on the new data coming in. That's one way to use Processing jobs and Spark, but with Model Monitor this is all done for you. You can certainly use Deequ on its own; it's from AWS Labs, it's completely open source, and it works with pretty much any version of Spark (the one I'm using for my demos is Spark 2.4.6). Model Monitor will do all of this for you.
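
Outside of Model Monitor, a rough sketch of computing those summary statistics yourself with the PyDeequ wrapper around Deequ might look like the following; the column name and S3 path are placeholders, and this assumes pyspark and pydeequ are installed:

    from pyspark.sql import SparkSession
    import pydeequ
    from pydeequ.analyzers import AnalysisRunner, AnalyzerContext, Completeness, Mean, Size

    # Deequ ships as a Spark package; PyDeequ exposes the right Maven coordinates.
    spark = (SparkSession.builder
             .config("spark.jars.packages", pydeequ.deequ_maven_coord)
             .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
             .getOrCreate())

    df = spark.read.csv("s3://my-bucket/new-data.csv", header=True, inferSchema=True)

    # Summary statistics on the incoming data, roughly what Model Monitor computes for you.
    result = (AnalysisRunner(spark)
              .onData(df)
              .addAnalyzer(Size())
              .addAnalyzer(Completeness("account_length"))    # hypothetical column
              .addAnalyzer(Mean("account_length"))
              .run())

    AnalyzerContext.successMetricsAsDataFrame(spark, result).show()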

You just pay for the actual duration of the Model Monitor jobs. You specify the instance type, and you can tell it how many instances to run if you want to work in parallel, say five instances running in parallel, and then you just pay for the fractions of minutes and hours that you are actually running. Okay, so here's an example. This shows the actual baseline results, and what we want to do is keep an eye on them. On the left are the different inputs, such as the account length, and in this example we are trying to predict

churn, so churn is the actual target for the prediction, and the other data points there, one through seven, are the features used to make this prediction. That's what we're constantly checking. (Hey, can you mute yourself there? Yeah, can somebody mute Lena, please?) Okay, so the fields: the account length has certain numerical summary statistics. The mean is 1.27. If suddenly we start getting data where the mean of the account length ends up being 500,

or something wildly different from what we were expecting, then this is cause to possibly retrain the model, and you can automatically trigger that retraining; we'll come back to that in a bit. So what's the use of suggested constraints? Maybe the first time the Deequ or Model Monitor job runs through, it looks at all of our inputs and outputs for these predictions.
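
A rough sketch of pulling those baseline statistics and suggested constraints back into pandas, reusing the hypothetical my_monitor from the baselining step above:

    import pandas as pd

    # Artifacts produced by suggest_baseline(): statistics.json and constraints.json.
    baseline_job = my_monitor.latest_baselining_job

    stats_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict["features"])
    constraints_df = pd.json_normalize(baseline_job.suggested_constraints().body_dict["features"])

    print(stats_df.head())        # per-feature mean, stddev, min, max, and so on
    print(constraints_df.head())  # per-feature completeness and type constraints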

Maybe there are certain fields where we do want to enforce certain constraints that might be indicative of something going wrong. So this is the whole landscape coming together. The next step is to create the monitoring schedule, and you can do this either as a cron type of thing or when certain conditions are met. Here is a cron expression that says do it every hour, but there are also other ways to schedule, based on things like the number of predictions. Like I said, these are Processing jobs,

which run Spark jobs that use the open-source Deequ library, and the result is a violations report. This is something you can review either with regular pandas DataFrames, or you can visualize it within SageMaker Studio itself. Studio is not required for this, but it's a really nice way to visually look at these things. What you could do is page the data scientist with a link to this violations report.
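
A minimal sketch of creating that hourly schedule against the deployed endpoint, again reusing the hypothetical my_monitor, endpoint name, and bucket from the earlier snippets:

    from sagemaker.model_monitor import CronExpressionGenerator

    # Compare live captured traffic against the baseline once an hour.
    my_monitor.create_monitoring_schedule(
        monitor_schedule_name="churn-monitor-schedule",    # hypothetical name
        endpoint_input="churn-endpoint",                   # the endpoint deployed earlier
        output_s3_uri="s3://my-bucket/model-monitor/reports",
        statistics=my_monitor.baseline_statistics(),
        constraints=my_monitor.suggested_constraints(),
        schedule_cron_expression=CronExpressionGenerator.hourly(),
        enable_cloudwatch_metrics=True,                    # emit drift metrics to CloudWatch
    )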

They can then look at these violations and potentially trigger a new training run. Okay, so here's how you would programmatically look at these monitoring jobs. Like I said, these are just Spark jobs that are running; I'll skip over this a little bit. All of this ends up going into CloudWatch as well, so you can also build alarms and dashboards around it. Okay, so here's an example. This would be baseline drift: for the feature "age", within 10 minutes there was some potential drift happening here. This can actually create a first-class warning or alarm.
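
A rough sketch of doing that from the SageMaker Python SDK; it assumes at least one monitoring run has completed and reuses the hypothetical my_monitor from above:

    # Each monitoring run is a Processing job; list them and pull the latest violations.
    executions = my_monitor.list_executions()
    latest = executions[-1]                                # assumes at least one completed run
    print(latest.describe()["ProcessingJobStatus"])

    violations = my_monitor.latest_monitoring_constraint_violations()
    for v in violations.body_dict.get("violations", []):
        print(v["feature_name"], v["constraint_check_type"], v.get("description"))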

That alarm would then be sent to the data scientist to take a look. There are many actions you can take; I'm just describing some of the human-in-the-loop ones, where you would point them to the details of the drift. Okay, let's see here. I think that pretty much takes us to the end. How much time is it, 12:36? I think I actually have about four more minutes or so, but these are the references. Okay, cool. There's a really, really good blog post that just came out, I think back in

July, from one of my peers, that goes very, very deep into this. We had talked about Debugger around the middle of the talk, and alongside Model Monitor there is a really good example of how to use Debugger to do explainability, model explainability, specifically looking at traffic signs. I believe the author is German, so she chose German traffic signs, and it's the ability to build

a sort of heat map, or saliency map, on the actual input image to show how that traffic sign was predicted from a computer vision standpoint: which parts of the image mattered. Say there's a person crossing a crosswalk; SageMaker Debugger can actually highlight the part of the specific image that led to the prediction. It's maybe not explainability so much as interpretability, but it has some really good examples if you click through. There's also a really good BERT

example where she shows how the BERT model is being visualized, and that's something we are actually pulling into our book as well. So check those out, check out the Data Science on AWS website there, and if you'd like to say hello on LinkedIn or on Twitter, please do. Thanks so much for your time.
