Duration 23:56

An introduction to MLOps on Google Cloud

Nate Keating
Product Manager, Cloud AI at Google
Google Cloud Next 2020
July 14, 2020, Online, San Francisco, CA, USA

About speaker

Nate Keating
Product Manager, Cloud AI at Google

About the talk

The enterprise machine learning life cycle is expanding as firms increasingly look to automate their production ML systems. MLOps is an ML engineering culture and practice that aims at unifying ML system development and ML system operation, enabling shorter development cycles, increased deployment velocity, and more dependable releases, in close alignment with business objectives.

Learn how to construct your systems to standardize and manage the life cycle of machine learning in production with MLOps on Google Cloud.

Speaker: Nate Keating

Watch more:

Google Cloud Next ’20: OnAir → https://goo.gle/next2020

Subscribe to the GCP Channel → https://goo.gle/GCP

#GoogleCloudNext

AI212

product: ML Pipelines, MLOps; fullname: Nate Keating;

event: Google Cloud Next 2020; re_ty: Publish;

Transcript

Alright, hello, and welcome to "An Introduction to MLOps on Google Cloud" here at Next OnAir. I'm excited to share this session with you, and I hope it will be super valuable as you continue to leverage AI to transform your business and quicken its adoption. My name is Nate. I'm a product manager on Google Cloud's AI Platform, where we have a mission to empower every enterprise to transform their business with AI. Google has a huge range of AI and machine learning breakthroughs and leading technology, but at the heart of our decision-making is a focus on the capabilities that will truly help our customers unlock new business value. And so our Cloud AI Platform does just that, by providing customer AI and ML teams the infrastructure and tools to help them get more done, more efficiently and more effectively.

A good way to look at the agenda: we'll start with a definition and overview of MLOps and the core challenges at play, look at where data science teams are today and where they're seeking to go, and then I'll share a simple framework for MLOps based on real processes that we see in practice, both internally at Google and with more mature customers, along with some of the services you can leverage today to get started, as well as where we're making additional investments in the future that will be exciting to hear about.

So let's jump right in. Formally defined, MLOps might be described as both a culture and a practice that aims at unifying ML system development and operations. It takes both its name, as well as some of its core principles and tooling, from DevOps. And this makes sense, as the goals of MLOps and DevOps are practically the same: to shorten the systems development life cycle and ensure that high-quality software is continuously developed, delivered, and maintained in production. But machine learning has its own unique challenges and different needs that require special attention.

Let's jump to a slide that may be familiar to many, from the canonical 2015 Google paper "Hidden Technical Debt in Machine Learning Systems." The takeaway is that production machine learning is more than training a model, and that very little of a properly functioning ML system is actually model training code. There are a number of different technologies and processes that must be in place to get the most out of production ML systems, and it's the orchestration and management of these different processes that comprises true MLOps.

A simple analogy might be the assembly line. Prior to the assembly line, automobiles had existed for decades, and years before the Model T, Ford had something called the 999, which set the land speed record at over 90 miles an hour. But it really wasn't until the assembly line was introduced, which is also just a set of technologies and processes, that continuous, high-quality production and delivery of automobiles became possible and inexpensive. And so the Cloud AI Platform is quickly becoming the assembly line for machine learning.

But let's take a look at where most teams are now. Most ML journeys will look something like this for a data scientist. It starts, hopefully, with a clear ML use case and business objective, and with that use case at hand you can start gathering data from different data sources and doing some exploratory analysis that helps you get a sense for what the data actually holds. Once you have a sense for what's in the data itself, and you've imputed some missing values, normalized your data, or done some feature generation, you can actually start to approach the modeling and figure out how you're going to tackle some new experiments. And so you would manually execute these different experimental steps: doing more data preparation, more feature engineering and testing, and then some model training and hyperparameter tuning on any models or model architectures that are particularly promising. Last but not least, you'll be evaluating all of these generated models against holdout sets of data that they weren't trained on, evaluating the different metrics, looking at loss curve stability, and comparing those models with other models to see which one actually works best. And of course, this whole process is iterated; it's manually executed over and over and over again. You might begin by running multiple concurrent experiments, trying out different ideas and different architectures, analyzing and comparing the performance, and then kicking off new experiments based on what you learn, until eventually you iterate toward an optimal model that is sufficiently performant to pass all of your tests.

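To make that manual loop concrete, here is a minimal, notebook-style sketch of one such hand-run experiment using pandas and scikit-learn; the file name, columns, and model choice are illustrative assumptions, not taken from the talk.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Illustrative path and columns -- replace with your own data.
df = pd.read_csv("customer_events.csv")

# Manual data preparation: impute missing values, normalize, derive a feature.
df["tenure_days"] = df["tenure_days"].fillna(df["tenure_days"].median())
df["spend_norm"] = (df["spend"] - df["spend"].mean()) / df["spend"].std()
df["spend_per_day"] = df["spend"] / (df["tenure_days"] + 1)

features = ["tenure_days", "spend_norm", "spend_per_day"]
X_train, X_holdout, y_train, y_holdout = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42
)

# One of many hand-run experiments: tweak hyperparameters, retrain, compare.
model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)
print("holdout AUC:", roc_auc_score(y_holdout, model.predict_proba(X_holdout)[:, 1]))
```

Every tweak to the features or hyperparameters means re-running cells like these by hand and eyeballing the metrics, which is exactly the process the rest of the talk sets out to automate.
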
And then, of course, unfortunately, at this step you put the model in storage and throw it over the wall to an IT operations team, and it's actually their job to deploy the model to production as a prediction service. They'll ensure all the necessary features are served in production, make sure there's autoscaling, and set up any necessary deployment pipelines for all the various targets, which in production may be in a distributed system.

So it's easy to see why that process is difficult and why organizations are struggling here. First of all, it's really time consuming: the steps are highly manual and are written from scratch for every use case. It's also inflexible: custom-built steps can't be reused and are only understood by their author, so additional data scientists in your team or in your organization usually can't leverage the work that a particular team member has done for their use case. And, you know, I hear from data scientists all the time that even they can't understand the work they did six months ago and leverage it in the future. It's also an error-prone process that gets issues like training-serving skew, where the lack of coordination between IT ops and the data science teams leads to unexpected differences between online and offline performance. This is suboptimal from a performance standpoint, but it's also really hard to figure out why it happened. In short, there's a high marginal cost for model development, and it really shouldn't have to be this way.

So let's lay out a framework here, starting by addressing some of the challenges we saw, using this framework across the ML solution life cycle; then we can dive into the specifics and details of each of these steps.

First box on the left: experimentation is absolutely crucial. We don't want to hinder the experimental process at all. The intuition is that data science today is still very much science, and most use cases require a lot of trial and error. Instead of cutting down on the number of experiments, you want a way to embrace experimentation and let ML teams track the experiments they're running, so they're confident they're building the best model. Second, re-running training jobs when new data is available or based on specific triggers: it needs to be really, really easy to produce new models to test out and challenge the model running in production, to see if a new one actually is better. Third, models must be properly tested, evaluated, and approved for release, following a rigorous, auditable, and even reversible CI/CD process that a lot of data science teams today miss out on. And lastly, for any model running in production, you always want to have a sense of how that model is performing. This is important to ensure quality and business continuity, but it's also crucial for getting valuable signals into how to improve the model for the next iteration; what better training data is there than true production inferences?

The dotted line in the center here roughly delineates the training and serving sides of this framework. In building a world-class team and platform, from an ML maturity standpoint we tend to see that customers find success by focusing initially on the left-hand side of the framework: the bedrock of MLOps is reliable and repeatable training pipelines.

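Picking up the second requirement above, the decision to kick off a new training run usually reduces to a small piece of glue logic like the sketch below; `new_rows_since`, `live_model_auc`, and `submit_training_pipeline` are hypothetical placeholders for your own data warehouse, monitoring, and orchestrator hooks, not real GCP APIs.

```python
from datetime import datetime, timedelta

# Hypothetical helpers -- wire these to your warehouse, monitoring, and
# pipeline orchestrator; they are placeholders, not real GCP calls.
def new_rows_since(ts: datetime) -> int: ...
def live_model_auc(window_hours: int = 24) -> float: ...
def submit_training_pipeline(reason: str) -> None: ...

def maybe_retrain(last_run: datetime,
                  min_new_rows: int = 50_000,
                  auc_floor: float = 0.80) -> None:
    """Trigger continuous training on a schedule, on new data, or on decay."""
    if datetime.utcnow() - last_run > timedelta(days=7):
        submit_training_pipeline("weekly schedule elapsed")
    elif new_rows_since(last_run) >= min_new_rows:
        submit_training_pipeline("enough new labelled data arrived")
    elif live_model_auc() < auc_floor:
        submit_training_pipeline("production performance dropped below floor")
```

In practice this logic typically lives in a scheduler or an event-driven function rather than in application code, but the three kinds of trigger are the same.
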
So as we go into the next section, let's take a look at those repeatable and reliable training pipelines. First, we transform the previously manual steps of ML experimentation into a reusable pipeline, orchestrating the various steps as a pipeline that is executed in its entirety, which allows you to iterate by changing the configuration to point at different data or use different hyperparameter values. You then execute the entire run as an experiment. This repeatable and reusable training pipeline lets you run multiple iterations and actually produce different models to compare, and each pipeline here may produce one or actually many models. The output of this process is not a model to deploy, but rather the code of the pipeline and its components, which is source-controlled in a code repository.

Then, similar to any software system, you set up a CI/CD process for your code, including building components, running automated tests, and tagging all the produced artifacts. Those produced artifacts, like compiled packages and container images, are stored in a central artifact store, and then the pipeline is deployed to a target environment. This can be the test or dev environment, the pre-prod or staging environment, or production, depending on the CI/CD trigger. If this is a production pipeline, it is executed automatically and repetitively to continuously train your model given new data, either based on a schedule, say daily, weekly, or monthly, or based on a specific trigger, for example the availability of new data or a drop in the performance of the model that's actually running in production.

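One common way to express such a reusable, parameterized training pipeline on Google Cloud is the Kubeflow Pipelines SDK, which underpins the hosted pipelines service; the sketch below assumes the kfp v2 component and pipeline decorators, and the step bodies, names, and parameters are illustrative only.

```python
from kfp import dsl, compiler
from kfp.dsl import Dataset, Model, Metrics, Input, Output

@dsl.component(base_image="python:3.10")
def prepare_data(source_uri: str, examples: Output[Dataset]):
    # Pull raw data, impute/normalize, and write features to `examples.path`.
    ...

@dsl.component(base_image="python:3.10")
def train(examples: Input[Dataset], learning_rate: float,
          model: Output[Model], metrics: Output[Metrics]):
    # Train one candidate model and log its evaluation metrics.
    ...

@dsl.pipeline(name="churn-training-pipeline")
def training_pipeline(source_uri: str, learning_rate: float = 0.05):
    # Each run is a tracked experiment; change the configuration to point at
    # different data or hyperparameters and re-execute end to end.
    prep = prepare_data(source_uri=source_uri)
    train(examples=prep.outputs["examples"], learning_rate=learning_rate)

# The versioned, CI/CD-built artifact of this process is the pipeline itself.
compiler.Compiler().compile(training_pipeline, "training_pipeline.json")
```

Because the pipeline definition is the artifact under source control, pointing at new data or new hyperparameters is a configuration change to the same pipeline rather than a new pile of notebook code.
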
Every time the continuous training pipeline runs to the end, it outputs a newly trained model, and that is stored in a central model registry for deployment as a prediction service at some point in the future. Crucially, the link between the model artifact itself and the training pipeline is never severed, meaning that all models in the model registry can tell you their provenance: what pipeline trained them, who created the model, who ran the training job, what data the model was trained on and evaluated on, and the metrics and results of the different evaluations. This lineage tracking is incredibly key, as we'll see later. One additional thing to highlight here: notice the parity between the dev experimentation pipeline on the top row and the continuous production pipeline on the bottom row. Avoiding dev-production inconsistency is crucial, as it's the source of many bugs or of underperformance of models, and production is really hard to debug for most teams.

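The essential property of that registry is that every entry keeps its lineage attached; conceptually, a registered model is just the artifact plus a record along the lines of the sketch below, where the field names and URIs are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class RegisteredModel:
    """Illustrative registry entry: the artifact never loses its provenance."""
    model_uri: str                      # where the trained artifact lives
    pipeline_name: str                  # which pipeline produced it
    pipeline_run_id: str                # the exact run (and therefore config)
    created_by: str                     # who launched the training job
    training_data_uri: str              # what data it was trained on
    eval_data_uri: str                  # what data it was evaluated on
    eval_metrics: dict = field(default_factory=dict)  # e.g. {"auc": 0.87}
    registered_at: datetime = field(default_factory=datetime.utcnow)

entry = RegisteredModel(
    model_uri="gs://my-bucket/models/churn/0042/",
    pipeline_name="churn-training-pipeline",
    pipeline_run_id="run-2020-07-14-001",
    created_by="data-scientist@example.com",
    training_data_uri="bq://project.dataset.events_2020q2",
    eval_data_uri="bq://project.dataset.events_holdout",
    eval_metrics={"auc": 0.87, "log_loss": 0.31},
)
```
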
Now that we have an overview of the reliable and repeatable training pipeline, which again is the foundational first step of MLOps, let's take a look at the serving side. For deploying the trained model, we want to execute model serving as a CI/CD process as well. You'll fetch the model artifacts from the model registry, where our training pipelines have stored them as we just saw, along with the code that wraps the model as a service, for example a REST API, and execute the following steps: first, building the prediction service itself; then running tests, which may, and probably should, be unique to your use case and organization; and then deploying the service to the target environment or environments.

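The "code that wraps the model as a service" can be as small as the Flask sketch below, which the model CI/CD process would build into a container image, test, and deploy; the model path and payload shape are assumptions for illustration, and a managed prediction service can stand in for this entirely.

```python
# A minimal prediction service wrapper, built and deployed by the model CI/CD
# process. Paths and request payload shape are illustrative.
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model/model.joblib")  # fetched from the model registry

@app.route("/healthz")
def healthz():
    return "ok"

@app.route("/predict", methods=["POST"])
def predict():
    instances = request.get_json()["instances"]   # e.g. [[1.0, 2.0, 3.0], ...]
    scores = model.predict_proba(instances)[:, 1].tolist()
    return jsonify({"predictions": scores})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```
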
At this point in the process it's often advisable to run additional tests and add some release gates. For an online serving use case, you may want to canary your model on production hardware to ensure it meets certain latency requirements. You might also want to test out different hardware to make sure you're deploying the model onto the appropriate, optimal hardware for that model architecture. In addition, you might want to run progressive or A/B tests to ensure that your new candidate model truly outperforms the current model in production before routing all of your traffic to it; so you might take 1% or 5% of your traffic and send it to the new model, to test out how it's performing on true production data. You'll also need to capture the eventual business outcome to serve as the ground truth when evaluating that model's performance. Once the model is deployed in the serving infrastructure itself and starts serving predictions on live data, this can enable explainability and continuous evaluation, where the serving logs are captured and stored for monitoring and analysis.

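At its simplest, that canary/A-B release gate is a routing rule in front of the current and candidate prediction services; the sketch below shows the idea in application code with a hypothetical `call_service` helper, though real deployments usually do the split in the serving platform or load balancer.

```python
import random

CANARY_FRACTION = 0.05  # send 5% of live traffic to the candidate model

def route_prediction(instance, call_service):
    """Route one request to the current or candidate model and record which.

    `call_service(name, instance)` is a hypothetical helper that invokes the
    named prediction service.
    """
    variant = "candidate" if random.random() < CANARY_FRACTION else "current"
    prediction = call_service(variant, instance)
    # Log (variant, instance, prediction) so the eventual business outcome can
    # be joined back in as ground truth for continuous evaluation.
    return variant, prediction
```
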
We monitor model performance in production to know if models are going stale, if there are any outliers, or if there's skew or concept drift exhibited in the data; that would be something like the statistical properties of the data changing over time, which can trigger a new iteration of model training and experimentation. And this happens all the time. We live in a dynamic world, and the data is constantly shifting around underneath us, so it's incredibly key to keep a close watch on the production data that's coming in, so we can identify, for example, if a particular class's underlying distribution has changed. Again, crucially, because we have maintained model provenance and lineage like we mentioned, we can use a model's offline performance as a baseline when comparing against production and quickly identify where concept drift has occurred as live data shifts away from the training data. This also lets data science teams quickly pull up the training data and pipeline, reproduce the run, figure out where the issue might be occurring, and then debug things.

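A deliberately simple version of that watch is to compare each incoming feature's distribution against its training baseline, for example with a population stability index, and flag retraining when it drifts too far; the 0.2 threshold below is a common rule of thumb, not something prescribed in the talk.

```python
import numpy as np

def population_stability_index(baseline, live, bins: int = 10) -> float:
    """Compare a live feature distribution against its training baseline."""
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    base_pct = np.clip(base_pct, 1e-6, None)   # avoid log(0)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - base_pct) * np.log(live_pct / base_pct)))

def check_for_drift(baseline_features, live_features, threshold: float = 0.2):
    """Return drifted features; a common rule of thumb flags PSI > 0.2."""
    drifted = {}
    for name, base in baseline_features.items():
        psi = population_stability_index(base, live_features[name])
        if psi > threshold:
            drifted[name] = psi
    if drifted:
        print(f"Drift detected, consider triggering retraining: {drifted}")
    return drifted
```
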
Let me take a step back and zoom out to the end-to-end view of both training and serving using our MLOps framework, with a map of the input and output artifacts and the different storage systems. We start again with experimentation and development, where we develop a training pipeline that is reusable and repeatable, and the code is checked into a code repository for source control. We then build that training pipeline and deploy it to the target environment, whether dev, staging, or production, using a CI/CD process where the artifacts are registered in the central repository, from which we can pull those artifacts for deployment or rollback in the future.

Now that the pipeline is running in production, we can do continuous training, either on a weekly or daily basis, or when we have a particular trigger based on the model running in production or new data becoming available. The outputs of these runs are stored in the model registry, which can be picked up by the model CI/CD process. That model CI/CD process includes all of the tests, stage gates, and approvals necessary to rigorously make sure the model you're deploying to production is ready and actually outperforms the model that is currently in production. And then, crucially, when that model is running on production hardware, we want to make sure it's constantly meeting its performance metrics, and that we can identify when it's starting to decay over time, or whether there's skew or drift, so we can go back to an earlier step in this process, retrain new models, and push them to production for testing.

This is a very complex, high-volume process, where customers may be running thousands of different training experiments for a single use case and testing dozens of different models in production at any given time. It's important that we capture this full picture and make it available to our teams. And so each of our pipelines and steps in the MLOps process writes out its executions and artifacts to a central metadata store, where ML teams, IT leaders, and governance folks have a system of record where they can run analyses, run audits, and keep track of these myriad ongoing processes as we automate the different steps of our assembly line for machine learning.

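To illustrate the shape of what such a system of record holds, each step execution might be written out roughly like this; the `MetadataStoreClient` below is a hypothetical in-memory stand-in, not the API of the managed metadata service or the open-source ML Metadata library.

```python
from datetime import datetime

# Hypothetical stand-in for a metadata store client; the managed service and
# the open-source ML Metadata library each have their own concrete APIs.
class MetadataStoreClient:
    def __init__(self):
        self.executions = []

    def record_execution(self, **record):
        self.executions.append({**record, "logged_at": datetime.utcnow().isoformat()})

store = MetadataStoreClient()
store.record_execution(
    pipeline="churn-training-pipeline",
    run_id="run-2020-07-14-001",
    step="train",
    inputs={"examples": "gs://my-bucket/features/2020-07-14/"},
    outputs={"model": "gs://my-bucket/models/churn/0042/"},
    parameters={"learning_rate": 0.05},
)
# ML teams, IT leaders, and governance folks can query this single system of
# record to audit who trained what, on which data, with which results.
```
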
Okay, now that we have this big-picture view and have put it all together, let's talk about MLOps on Google Cloud Platform and some of the products and components that we offer to enable such a system. As mentioned, Cloud AI is our suite of integrated products for building and using AI, and here we'll focus on the MLOps stack and provide more context on the relevant individual products themselves.

So, like before, let's start with a team that has existing data, products, and clear use cases. They may be running analyses and experiments locally or in different siloed environments across the organization. This is kind of what data science looked like sometime around 2010: a lot of ad hoc analysis, a lot of custom tooling, a lot of different experiments run locally, not a lot of cloud usage. Laptops stored a lot of the knowledge of the data scientists in an organization, and when those people left the organization, that knowledge was largely lost. The first thing teams standardized on from this original view is the core data science platform: things like which language they should use, which frameworks to use, the preferred development environments, and all of the other things that most team members will happily argue about. In 2020, the average data science team at most organizations has a pretty good handle on this, and though there are many different options to choose from, with different benefits and challenges associated with each, and there's still a lot of low-hanging fruit for improvement here, enterprises are better positioned than they were five years ago from a core data science tooling and model building standpoint.

Coming back to our introductory section and our framework, what most teams are now moving to build is this true production-grade platform for MLOps. These are the services that enable the end-to-end workflow we saw before, following our MLOps framework: experimentation, continuous training, model CI/CD, and continuous monitoring. So let's take a look at what this looks like with specific GCP products.

At the data science layer, we have AI Platform Notebooks, which is a hosted JupyterLab environment integrated with our infrastructure and various services. You can connect it to a source control system, run notebooks on a schedule, and keep control over the underlying compute. It's the familiar interface for data scientists who like to do ad hoc development using Jupyter notebooks, especially up front as part of their prototyping and experimentation phase. At the MLOps infrastructure layer, we build on modular services backed largely by open source, most of it open source created by Google. Each of the boxes here can be used to varying degrees, or not at all, depending on your different needs.

Based on the deep internal expertise we have and on working with many of our customers, we know that ML workloads can be highly customized, so our services are designed for novice, intermediate, and advanced machine learning teams to use whatever they need, and to bring their own tools to the platform as well. At the core of this MLOps layer live our AI Platform training and prediction services, which form the backbone of building and running models.

The items in green are where we're making our latest investments. First, a feature store to help solve training-serving skew issues and improve collaboration within teams: by registering features during the training process, you can ensure those same features are available in production for your model at low latency, and you can also share those features with other team members, so that all teams have access to the great work individual data scientists are doing, we're not duplicating effort, and we're using the most powerful and effective features to build new models.

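The talk describes the feature store in terms of capabilities rather than an API, so the interface below is a hypothetical sketch of the idea only: register a feature definition once during training, then reuse exactly the same definition online at serving time and across teams.

```python
# Hypothetical feature store interface -- illustrating the concept, not a
# specific product API.
class FeatureStore:
    def __init__(self):
        self._definitions = {}

    def register(self, name: str, compute_fn, description: str = ""):
        """Register a feature during training so teammates (and the serving
        path) reuse the exact same definition instead of re-deriving it."""
        self._definitions[name] = {"fn": compute_fn, "description": description}

    def get_online(self, name: str, raw_record: dict):
        """Compute the feature for a live request from the shared definition."""
        return self._definitions[name]["fn"](raw_record)

store = FeatureStore()
store.register(
    "spend_per_day",
    lambda r: r["spend"] / (r["tenure_days"] + 1),
    description="Average daily spend; same logic in training and serving.",
)
print(store.get_online("spend_per_day", {"spend": 120.0, "tenure_days": 29}))
```
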
In addition, we're investing in a robust experiment tracking service to enable large-scale experimentation, ML model introspection, and comparison, incorporating Google open-source products like TensorBoard. This is also really critical, because the number of experiments we run is directly proportional to how confident we are that we've really chosen the best model, and by making experiments shareable and easily referenceable in the future, we enable all the members of our team working on the same or similar use cases to learn from the work we've done in generating the optimal model for our use case.

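With TensorBoard, experiment tracking boils down to writing per-run summaries into a shared log location that teammates can open side by side; here is a minimal TensorFlow sketch, where the bucket path, run names, and metrics are illustrative assumptions.

```python
import tensorflow as tf

def log_experiment(run_name: str, metrics_by_step):
    """Write per-step metrics for one experiment run to a shared log dir."""
    writer = tf.summary.create_file_writer(f"gs://my-bucket/tb-logs/{run_name}")
    with writer.as_default():
        for step, metrics in enumerate(metrics_by_step):
            for name, value in metrics.items():
                tf.summary.scalar(name, value, step=step)
    writer.flush()

# Two illustrative runs a teammate can later compare side by side in TensorBoard.
log_experiment("lr_0.05", [{"val_auc": 0.83}, {"val_auc": 0.86}])
log_experiment("lr_0.10", [{"val_auc": 0.81}, {"val_auc": 0.84}])
```
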
At the bottom layer, we have our ML pipelines and centralized metadata store for orchestrated, managed, end-to-end workflows. These are really critical for building this end-to-end capability and making sure we're not writing custom code for different steps of the process that can't be leveraged and reused in the future. We want to orchestrate full runs of end-to-end pipelines and track the inputs and outputs of each step in our central metadata store. For many of these different items in green, we offer great open-source and hosted solutions today, and we're working on bringing managed services to these areas in 2020. What's shown here is simply the high-level overview focused on our MLOps stack, so definitely stay tuned for announcements in these investment areas and more as our platform grows in breadth and depth.

So with that, I'll leave you with this view of our AI platform. Your teams can begin leveraging the power of MLOps on GCP immediately by checking out AI Platform in the GCP console. For more information, I've included some detailed blogs and solution guides for further reading, which are the best place to start; they'll be published alongside this breakout session. After reading these, please reach out to Google Cloud to get in touch, we would actually love to hear from you, and we'll be answering your questions in the Dory throughout the next week or so. Thank you for tuning in and listening, and we are immensely excited to see what you build with our Cloud AI Platform.
