About the talk
ML development often focuses on metrics, delaying work on deployment and scaling issues. ML development designed for production deployments typically follows a pipeline model, with scaling and maintainability as inherent parts of the design. We examine TensorFlow Extended (TFX), the open source version of the ML infrastructure platform that Google has developed for its own production ML pipelines.
Presented by: Robert Crowe, Charles Chen
I'm Robert Crowe, and we are here today to talk about production pipelines, ML pipelines. So we're not going to be talking about ML modeling too much, or different architectures. This is really all focused on when you have a model and you want to put it into production so that you can offer a product or a service, or some internal service within your company, and it's something that you need to maintain over the lifetime of that deployment. So normally when we think about ML we think about modeling code, because it's the heart of what we do, right? Modeling and the results
that we get from the amazing models that we're producing these days. That's really the reason we're all here. The kinds of results we can produce are what papers are written about; for the most part, overwhelmingly, they are written about architectures and results and different approaches to doing ML. It's great stuff. I love it. I'm sure you do too. But when you move to bring something into production, you discover that there are a lot of other pieces that are very important to making that model that you spent a lot of time putting together
available and robust over the lifetime of a product or a service that you're going to offer out to the world, so they can really experience the benefits of the model that you've worked on. And those pieces are what TFX is all about. In machine learning we're familiar with a lot of the issues that we have to deal with, things like: Where do I get labeled data? How do I generate the labels for the data that I have? I may have terabytes of data, but I need labels for that. Does my labeled data cover the feature space that I'm going to see when I actually run
inference against it? Is my dimensionality minimized, or can I do more to try to simplify my feature set, my feature vector, to make my model more efficient? Have I really got the predictive information in the data that I'm choosing? And then we need to think about fairness as well. Are we serving all of the customers that we're trying to serve fairly, no matter where they are, or what religion they are, or what language they speak, or what demographic they might be? Because you want to serve those people as well as
you can. You don't want to unfairly disadvantage people. And you may have rare conditions too, especially in an area like health care, where we're making a prediction that is going to be pretty important to someone's life, and it may be on a condition that occurs very rarely. But a big one when you go into production is understanding the data lifecycle, because once you've gone through that initial training and you've put something into production, that's just the start of the process. You're now going to try to maintain that over a lifetime, and the world changes. Your data changes. Conditions in
your domain change. Along with that, you're now doing production software deployment. So you have all of the normal things that you have to deal with in any software deployment, things like scalability: will I need to scale up? Is my solution ready to do that? Can I extend it? Is it something that I can build on? Modularity, best practices, testability: how do I test an ML solution? And security and safety, because we know there are attacks on ML models that are getting pretty sophisticated these days. Google created
TFX for us to use. We created it because we needed it. It was not the first production ML framework that we developed. We've actually learned over many years, because we have ML all over Google taking in billions of inference requests, really on a planet scale, and we needed something that would be maintainable and usable at a very large production scale, with large datasets and large loads, over a lifetime. So TFX has evolved from earlier attempts, and it is now
what most of the products and services at Google use, and now we're also making it available to the world as an open source product, available to you now to use for your production deployments. It's also used by several of our partners and just companies that have adopted TFX. You may have heard talks from some of these at the conference already. And there's a nice quote there from Twitter, where they did an evaluation. They were coming from a Torch-based environment, looked at the whole suite, or the whole
ecosystem of TensorFlow, and moved everything that they did to TensorFlow. One of the big contributors to that was the availability of TFX. The vision is for TFX to provide a platform for everyone to use. Along with that, there are some best practices and approaches that we're trying to really make popular in the world, things like strongly typed artifacts, so that when your different components produce artifacts, they have a strong type; pipeline configuration; workflow execution; being able to deploy on different platforms, different distributed pipeline platforms,
using different orchestrators, different underlying execution engines, trying to make that as flexible as possible. There are some horizontal layers that tie together the different components in TFX. We'll talk about components here in a little bit, and we have a demo as well that will show you some of the code and some of the components that we're talking about. Among the horizontal layers, an important one there is metadata storage. So each of the components produces and consumes artifacts. You want to be able to store those, and you may want to do comparisons across months or years to
see how things changed, because change becomes a central theme of what you're going to do in a production deployment. This is just a conceptual look at the different parts of TFX. On the top we have tasks, a conceptual look at tasks: things like ingesting data, or training a model, or serving the model. Below that we have libraries that are available, again as open source components that you can leverage. They're leveraged by the components within TFX to do much of what they do. And then there's the bottom row, in orange, a good color for Halloween.
We have the TFX components, and we're going to get into some detail about how your data will flow through the TFX pipeline to go from ingesting data to a finished, trained model on the other side. So what is a component? A component has three parts. This is a particular component, but it could be any of them. Two of those parts, the driver and publisher, are largely boilerplate that you could change, but you probably won't. A driver consumes artifacts and
begins the execution of your component. A publisher takes the output from the component and puts it back into metadata. The executor is really where the work is done in each of the components, and that's also a part that you can change. So you can take an existing component, override the executor in it, and produce a completely different component that does completely different processing. Each of the components has a configuration, and for TFX that configuration is written in Python, and it's usually fairly simple. Some of the components are a little more complex, but most of them are just a couple
of lines of code to configure. The key essential aspect here that I've kind of alluded to is that there is a metadata store. The component will pull data from that store as it becomes available. So there's a set of dependencies that determine which artifacts that component depends on. It'll do whatever it's going to do, and it's going to write the result back into metadata. Over the lifetime of a model deployment, you start to build a metadata store that is a record of the entire lifetime of your model, and the way that your data has
changed, the way your model has changed, the way your metrics have changed. It becomes a very powerful tool. Components communicate through the metadata store. So an initial component will produce an artifact and put it in the metadata store. The components that depend on that artifact will then read from the metadata store, do whatever they're going to do, and put their result into it, and so on. And that's how we flow through the pipeline. So this metadata store I keep talking about, what is it? What does it contain? There are really three kinds of things that it contains:
first, the artifacts themselves. They could be trained models. They could be data or datasets. They could be metrics. They could be splits. There are a number of different types of objects that are in the metadata store. Those are grouped into execution records. So when you execute the pipeline, that becomes an execution run, and the artifacts that are associated with that run are grouped under that execution run. So again, when you're trying to analyze what's been happening with your pipeline, that becomes very important. Also the lineage of
those artifacts: which artifact was produced by which component, which consumed which inputs, and so on. So that gives us some functionality that becomes very powerful over the lifetime of a model. You can find out which data a model was trained on, for example. If you're comparing the results of two different model trainings that you've done, tracing it back to how the data changed can be really important. And we have some tools that allow you to do that. So TensorBoard, for example, will allow you to compare the metrics from, say, a model that you trained six
months ago and the model that you just trained now, to try to understand. You can see that it was different, but why was it different? And warm starting becomes very powerful too, especially when you're dealing with large amounts of data that could take hours or days to process. Being able to pull that data from cache if the inputs haven't changed, rather than rerunning that component every time, becomes a very powerful tool as well. There's a set of standard components that are shipped with TFX. But I want you to be aware from the start that you are not
limited to those standard components. This is a good place to start. It will get you pretty far down the road, but you will probably have needs, or you may not, where you need to extend the components that are available, and you can do that. You can do that in a couple of different ways. This is sort of the canonical pipeline that we talk about. So on the left we're ingesting our data. We split our data. We calculate some statistics against it, and we'll talk about this in some detail. We then make sure that we don't have problems with our data, and try to
understand what types our features are. We do some feature engineering. We train. This probably sounds familiar if you've ever been through an ML development process; this is mirroring exactly what you always do. Then you're going to check your metrics across that, and we do some deep analysis of the metrics of our model, because that becomes very important, and we'll talk about an example of that in a little bit. And then you have a decision, because let's assume you already have a model in production and you're retraining it, or maybe you have a new model that you're training, and the question becomes:
should I push this new model to production, or is the one I already have better? Because many of you have probably had the experience of training a new model and it actually didn't do as well as the old one did. Along with that, we also have the ability to do bulk inference on inference requests. So you may be in a batch request environment, so you're pulling in data in batches, you're running requests against it, and then taking that result and doing something with it. That's a very common use case, and we
actually have components now to do that as well. This is the Python framework for defining a pipeline. So that's a particular component, the Transform component, and the configuration that you need to use for that. But on a very simple level, that's how you set up components. And at the bottom you can see there's a list of components that are returned in that call. Those are going to be passed to a runner that runs on top of whatever orchestrator you're using. So
a little bit more complex example, a little bit harder to read, gives you an idea. There are several components there. The dependencies between components, the dependencies on artifacts that I just talked about, are defined in code like that. So you'll see, for example, that StatisticsGen depends on the output of ExampleGen. So now let's talk about each of the standard components. ExampleGen is where you ingest your data. It's going to take your data and convert it to TensorFlow Examples. This is actually showing just two input formats, but there's a
long list, or a reasonably long list, of input formats that you can have. It also does your splits for you. So you may want just training and eval, or maybe you want a validation split as well. So that's what ExampleGen does, and then it passes the result on to StatisticsGen. StatisticsGen: because we all work with data, we know you need to dive into the data and make sure that you understand the characteristics of your dataset. Well, StatisticsGen is all about doing that in an environment where you may be running that many times a day. It also gives you the tools to do visualization of your
data like this. So for example, that's the trip start hour feature for this particular dataset, and just looking at the histogram tells me a lot about an area that I need to focus on: at the 6 o'clock hour I have very little data. So I'm going to want to go out and get some more data, because if I try to use that and I run inference requests at 6 a.m., it's going to be overgeneralizing. So I don't want that. SchemaGen is looking at the types of your features. So it's trying to decide: is it a float? Is it an int? Is it a categorical feature? And if it's a categorical feature, what are
the valid categories? So SchemaGen tries to infer that, but you as a data scientist need to make sure that it did the job correctly. So you need to review that and make any fixes that you need to make. ExampleValidator then takes those two results, the SchemaGen and the StatisticsGen outputs, and it looks for problems with your data. So it's going to look for things like missing values, values that are zero and shouldn't be zero, categorical values that are really outside the domain of that category, things like that, problems in your data. Transform is where we do feature engineering,
and Transform is one of the more complex components. So you can see from the code there that it could be arbitrarily complex, because depending on the needs of your dataset and your model, you may have a lot of feature engineering that you need to do, or you may just have a little bit. The configuration for it has some configuration parameters. But it has a key advantage, in that it's going to take your feature engineering and it's going to convert it into a TensorFlow
graph. That graph then gets prepended to the model that you're training, as the input stage to your model. And what that means is you're doing the same feature engineering with the same code, exactly the same way, both in training and in production when you deploy to any of the deployment targets. So that eliminates the possibility that you may have run into, where you have two different environments, maybe even two different languages, and you're trying to do the same thing in both places and you hope it's correct. This eliminates that. We
call it training/serving skew. It eliminates that possibility. Trainer: well, now we're kind of coming back to the start. Trainer does what we started with, where it's going to train a model for us. So this is TensorFlow, and the result is going to be a SavedModel, and a little variant of the SavedModel, the eval SavedModel, that we're going to use; it has a little extra information that we're going to use for evaluation. So Trainer has the typical kinds of configuration that you might expect, things like the number of steps, whether or not to use warm starting. And you
can use TensorBoard, including comparing execution runs between the model that you just trained and models that you trained at some time in the past. So TensorBoard has a lot of very powerful tools to help you understand your training process and the performance of your model. So here's an example where we're comparing two different execution runs. Evaluator uses TensorFlow Model Analysis, one of the libraries that we talked about at the beginning, to do some deep analysis of the performance of your model. So it's not just looking at the top-level metrics, like what is, you know, the
RMSE or the AUC for my whole dataset. It's looking at individual slices of your dataset, and slices of your features within your dataset, to really dive in at a deeper level and understand the performance, so that things like fairness become very manageable by doing that. If you don't do that sort of analysis, you can easily have gaps that may be catastrophic in the performance of your model. So this becomes a very powerful tool, and there are some visualization tools, which we'll look at as well, that help you do that. ModelValidator asks that question that I
talked about a little while ago, where you have a model that's in production and you have this new model that you just trained. Is it better or worse than what I already have? Should I push this thing to production? And if you decide that you're going to push it to production, then Pusher does that push. Now, production could be a number of different things. You could be pushing it to a serving cluster using TensorFlow Serving. You could be pushing it to a mobile application using TensorFlow Lite. You could be pushing it to a web application or a Node.js application using TensorFlow.js. Or you could even
just be taking that model and pushing it into a repo with TensorFlow Hub that you might use later for transfer learning. So there are a number of different deployment targets, and you can do all of the above with Pusher. BulkInferrer is that component that we talked about a little while ago, where we're able to take bulk inference requests and run inference across them, and do that in a managed way that allows us to take that result and move it off. Orchestrating: we have a number of tasks in our pipeline. How do we orchestrate?
Well, there are different ways to approach it. You can do task-aware pipelines, where you simply run a task, wait for it to finish, and run the next task. And that's fine, that works, but it doesn't have a lot of the advantages that you can have with a task- and data-aware pipeline. This is where we get our metadata. So by setting up the dependencies between our components and artifacts in a task- and data-aware pipeline, we're able to take advantage of a lot of the information, over the lifetime of that product or service, that ML deployment, that we have in
the artifacts that we've produced. Orchestration is done through an orchestrator. And the question is, which orchestrator do you have to use? Well, the answer is you can use whatever you want to use. We have three orchestrators that are supported out of the box: Apache Airflow, Kubeflow for a Kubernetes containerized environment, and Apache Beam. Those three are not your only selections; you can extend that to add your own orchestration. But in the end you're going to end up with essentially the same thing, regardless of
which orchestrator you're going to use. You're going to end up with a directed acyclic graph, or DAG, that expresses the dependencies between your components, which are really a result of the artifacts that are produced by your components. So here are three examples. They look different, but actually if you look at them, they are the same DAG. We get this question a lot, so I want to address it: what's this Kubeflow thing and this TFX thing, and what's the difference between the two? The answer is, Kubeflow is really focused on a Kubernetes containerized environment, and it's a great
deployment platform for running ML in a very scalable, manageable way. Kubeflow Pipelines uses TFX. So you're essentially deploying TFX in a pipeline environment on Kubernetes, and that's Kubeflow Pipelines. But TFX can be deployed in other ways as well. So if you don't want to use Kubeflow Pipelines, if you want to deploy in a different environment, maybe on-prem in your own data center or what have you, you can use TFX in other environments as well. One of the things that we do: because
we're working with large datasets, and some of these operations that we're going to do require a lot of processing, we need to distribute that processing over a pipeline. So how do we do that? Well, a component that uses a pipeline is going to create a pipeline for the operations that it wants to do. It's going to hand that pipeline off to a cluster; it could be a Spark cluster, could be a Flink cluster, could be Cloud Dataflow. A MapReduce happens on the cluster and it comes back with a result. But we want to support more than just
one or two types of distributed pipeline. So we're working with Apache Beam to add an abstraction over the native layer of those pipelines, so that you can take the same code and actually run it on different pipelines without changing your code. And that's what Apache Beam is. There are different runners for different things like Flink and Spark. There's also a really nice one for development, the direct runner or local runner, that allows you to run even just on a laptop. But here's the vision for Beam:
there's a whole set of pipelines out there that are available. They have strengths and weaknesses, and in a lot of cases you will already have one that you've stood up, and you want to try to leverage that resource. So by supporting all of them, you're able to use an existing installation. You don't have to spin up a completely different cluster to do that. You can leverage, or just expand, the ones that you already have. So Beam allows you to do that, and also with different languages. Now in this case we're only using Python, but Beam, as a
vision, as an Apache project, allows you to work with other languages as well through different SDKs. And now I'd like to introduce Charles Chen, who's going to give us a demo of TFX running on, actually, just a laptop system in the cloud here. So you'll get to see some live code. Thank you, Robert. So now that we've gone into detail about TFX and TFX components, let's make this concrete with a live demo of a complete TFX pipeline. So this demo uses the new experimental TFX notebook integration, and the goal of this integration is to make it easy to
interactively build up TFX pipelines in a Jupyter or Google Colab notebook environment, so you can try your pipeline out before you export the code and deploy it into production. You can follow along and run this yourself in the Google Colab notebook at this link here. For the interactive notebook, we introduce one new concept. This is the InteractiveContext. In a production environment, like Robert said, we would construct a complete graph of pipeline components and orchestrate it on an engine like Airflow or Kubeflow. By contrast, when you're
experimenting in the notebook, you want to interactively execute and see results for individual components. We construct an InteractiveContext, which can do two things. The first thing is it can run the component we define; this is context.run. And the second is it can show a visualization of the component's output; this is context.show. Let's get started. Here's the overview we've seen of a canonical TFX pipeline, where we go from data ingestion to data validation, feature engineering, model training, and model validation to
deployment. We'll go through each of these in the notebook. Here's the notebook here. And the first thing we do is a bit of setup. This is the pip install step. Basically, we run pip install to install the Python package and all its dependencies. And if you're following along, after doing the installation you need to restart the runtime so that the notebook picks up some of the new versions of dependencies. Next we do some imports, set up some paths, download the data, and
finally, we create the InteractiveContext. Once we've done this, we get to our first component, which is ExampleGen, which ingests data into the pipeline. Again, this is just a couple of lines of code. We run this with context.run. After we're done, we're ready to use the data, which has been ingested and processed into splits. We take the output of ExampleGen and use this in our next component, StatisticsGen, which analyzes data and outputs detailed statistics.
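To give a rough feel for the kind of per-split summary that a statistics step produces, here is a minimal plain-Python sketch. It is not the TFX or TensorFlow Data Validation API; the function name `split_statistics`, the `payment_type` feature, and the sample rows are all hypothetical, chosen only for illustration.

```python
from statistics import mean


def split_statistics(splits):
    """Compute simple per-feature statistics for each named split.

    Toy illustration only: `splits` maps a split name to a list of row dicts.
    """
    report = {}
    for split_name, rows in splits.items():
        features = sorted({key for row in rows for key in row})
        stats = {}
        for feature in features:
            values = [row.get(feature) for row in rows]
            present = [v for v in values if v is not None]
            entry = {"count": len(values), "missing": len(values) - len(present)}
            numeric = [v for v in present if isinstance(v, (int, float))]
            if numeric and len(numeric) == len(present):
                # Numeric feature: report range and mean.
                entry.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
            else:
                # Non-numeric (or empty) feature: report distinct value count.
                entry["unique"] = len(set(present))
            stats[feature] = entry
        report[split_name] = stats
    return report


# Hypothetical sample data with a train and an eval split.
splits = {
    "train": [
        {"trip_start_hour": 8, "payment_type": "Cash"},
        {"trip_start_hour": 17, "payment_type": "Credit Card"},
        {"trip_start_hour": 18, "payment_type": None},
    ],
    "eval": [
        {"trip_start_hour": 9, "payment_type": "Cash"},
    ],
}
report = split_statistics(splits)
print(report["train"]["payment_type"])
```

The real component computes far richer statistics and runs at scale; the sketch only shows the shape of the idea, one summary per feature per split.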
StatisticsGen can also be used standalone outside of a TFX pipeline with the TensorFlow Data Validation package. You can see that our input here is the output of ExampleGen again, and after we're done we can visualize this with context.show. For each split, we get detailed summary statistics and a visualization of our data, which we can dig into to ensure data quality even before we train a model. After that, we can use SchemaGen to infer a suggested schema for your data. We see that visualized here. This includes
the type and domain for each feature in your dataset. For example, for categorical features, the domain is inferred to be all the values we've seen so far. This is just a starting point, and you can edit and curate the schema based on your domain knowledge. Once we have the schema, we can use the ExampleValidator to perform anomaly detection, that is, find items in your input data that don't match your expected schema. This is especially useful as your pipeline evolves over time with new datasets coming in.
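The schema-then-validate idea described above can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not how SchemaGen or ExampleValidator are implemented: the function names `infer_schema` and `find_anomalies`, the `payment_type` and `fare` features, and the sample rows are hypothetical.

```python
def infer_schema(rows):
    """Infer a type for each feature and, for string features, a domain of seen values."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                continue
            entry = schema.setdefault(
                name, {"type": type(value).__name__, "domain": set()}
            )
            if isinstance(value, str):
                entry["domain"].add(value)
    return schema


def find_anomalies(rows, schema):
    """Return (row_index, feature, reason) for each value that violates the schema."""
    anomalies = []
    for index, row in enumerate(rows):
        for name, spec in schema.items():
            value = row.get(name)
            if value is None:
                anomalies.append((index, name, "missing value"))
            elif type(value).__name__ != spec["type"]:
                anomalies.append((index, name, "unexpected type"))
            elif spec["domain"] and value not in spec["domain"]:
                anomalies.append((index, name, "value outside domain"))
    return anomalies


# Hypothetical data: the schema is inferred from earlier data,
# then newly arriving rows are checked against it.
training_rows = [
    {"payment_type": "Cash", "fare": 7.5},
    {"payment_type": "Credit Card", "fare": 12.0},
]
schema = infer_schema(training_rows)

new_rows = [
    {"payment_type": "Cash", "fare": 9.0},
    {"payment_type": "Bitcoin", "fare": None},
]
anomalies = find_anomalies(new_rows, schema)
print(anomalies)
```

As in the talk, the inferred schema is only a starting point: a person would review it, for instance deciding whether a newly seen category should be added to the domain or treated as a data collection problem.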
We visualize the ExampleValidator output here, and any unexpected values are highlighted. If you see anomalies, you might want to either update your schema or fix your data collection process. After our data is validated, we move on to data transformation, or feature engineering. This is done in the Transform component. The first thing we do is write a little common code and a preprocessing function with TensorFlow Transform. You can look at this in more detail on your own. It defines the transformations we do using TensorFlow Transform,
and this means for each feature of your data, we define the individual transformations. This comes together in the Transform component, where feature engineering is performed and we output the transform graph and the engineered features. This will take a bit of time to run. And after that's done, we get to the heart of the model with the Trainer. Here we define a training function that returns a TensorFlow estimator. We build the estimator and return it from the function. And this is just TensorFlow. So once we have this, we run the Trainer component.
This is going to produce a trained model for evaluation and serving. You can watch it train here. It'll give you the loss, evaluate, and then produce a SavedModel for evaluation and serving. After we train the model, we have the Evaluator component. This uses the standalone TensorFlow Model Analysis library. In addition to overall metrics over the entire dataset, we can define more granular feature column slices for evaluation. The Evaluator component then computes metrics for each data slice, which you can then visualize with interactive visualizations.
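To illustrate why per-slice metrics matter, here is a small plain-Python sketch that computes accuracy overall and per value of one feature. It is not the TensorFlow Model Analysis API; the function name `accuracy_by_slice` and the sample examples are hypothetical, and the sketch assumes a non-empty list of examples.

```python
from collections import defaultdict


def accuracy_by_slice(examples, slice_feature):
    """Return (overall accuracy, accuracy per value of slice_feature).

    Toy illustration only; assumes `examples` is non-empty and each example
    has "label", "prediction", and the slicing feature.
    """
    totals = defaultdict(lambda: [0, 0])  # slice value -> [correct, count]
    for example in examples:
        bucket = totals[example[slice_feature]]
        bucket[0] += int(example["label"] == example["prediction"])
        bucket[1] += 1
    per_slice = {
        value: correct / count for value, (correct, count) in totals.items()
    }
    overall = sum(c for c, _ in totals.values()) / sum(n for _, n in totals.values())
    return overall, per_slice


# Hypothetical evaluation results: the 6 o'clock slice has little data
# and the model does poorly on it.
examples = [
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 8, "label": 0, "prediction": 0},
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 6, "label": 1, "prediction": 0},
]
overall, per_slice = accuracy_by_slice(examples, "trip_start_hour")
print(overall, per_slice)
```

The overall number looks acceptable while one slice fails completely, which is the kind of gap a top-level metric alone would hide.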
What makes TensorFlow Model Analysis really powerful is that in addition to the overall metrics we see here, we can analyze model performance on granular feature slices, granular feature column slices, that is. So here we see the metrics rendered across one slice of our data, and we can do even more granular things with multiple columns of our data. After we've evaluated our model, we come to the ModelValidator. Based on a comparison of the performance of your model against an existing baseline model, this component checks whether or not the model is ready to
push to production. Right now, since we don't have any existing models, this check will by default return true. You can also customize this check by extending the ModelValidator executor. The output of this check is then used in the next component, the Pusher. The Pusher pushes your model, again, to a specific destination for production. This can be TensorFlow Serving, a filesystem destination, or a cloud service like Google Cloud AI Platform. Here we configure the Pusher to write the model to a filesystem directory. Once we've done this,
we've architected a complete TFX pipeline. With minimal modifications, you can use your new pipeline in production in something like Airflow or Kubeflow, and essentially this would be getting rid of your usage of the InteractiveContext and creating a new pipeline object to run. For convenience, we've included a pipeline export feature in the notebook that tries to do this for you. So here, first we do some housekeeping: we mount Google Drive. We select a runner type; let's say we want to run this on Airflow. We set up some paths.
We specify the components of the exported pipeline. We do the pipeline export. And finally, we use this cell to get a zip archive of all you need to run the pipeline on an engine like Airflow. With that, I'll hand it back to Robert, who will talk about how to extend TFX for your own needs. Thanks, Charles. All right, so that was the notebook. Let me advance here. Custom components. So again, these are the standard components that come out of the box with TFX, but you are not limited to those. You can write your
own custom component. So let's talk about how to do that. First of all, you can do kind of a semi-custom component by taking an existing component and working with the same inputs, the same outputs, essentially the same contract, but replacing the executor by just overriding the existing executor. And the executor, remember, is where the work is done. So changing that executor is going to change that component in a very fundamental way. So if you're going to do that, you're going to extend the base executor and implement a Do function, and that's what the code looks like. There's some
custom configuration, a custom config dictionary, that allows you to pass additional things into your component. So this is a fairly easy and powerful way to create your own custom component, and this is how you would fit a custom component into an existing pipeline: it fits in like any other component. You can also, though, do a fully custom component, where you have a different component spec, a different contract, different inputs and different outputs that don't exist in an existing component. And those are defined in a component spec
that gives you the parameters and the inputs and the outputs to your component. And then you are going to need an executor for that as well, just like you did before, but it takes you even further. So your executor, your inputs, your outputs: that's a fully custom component. All right. Okay, now I've only got three minutes left. I'm going to go through a quick example of really trying to understand why model understanding and model performance are very
important. First of all, I've talked about the data lifecycle a couple of times, trying to understand how things change over time. The ground truth may change. Your data characteristics, the distribution of each of your features, may change. Competitors expand into different markets, different geographies. Styles may change. The world changes over the life of your deployment, and that becomes very important. So in this example, and this is a hypothetical example, this company is an online retailer who is selling shoes, and they're trying to use click-through rates to decide how much inventory they should order.
And they discover that suddenly, they've been going along, and now on a particular slice of their data, not their whole dataset, just a slice of it, things have really gone south. So they've got a problem. What do they do? Well, first of all, it's important to understand the realities around this. Mispredictions do not have uniform cost across your business or whatever service you are providing; different parts of that will have different costs. The data you have is never the data that you wish you had. And, as in this case, the model objective is often a proxy for what you really want to know. So they're trying to use this click-through rate as a proxy for ordering inventory. But the last one, the real world doesn't stand still, is kind of the key here. You need to really understand that when you go into a production environment.
So what can they do? Well, their problems are not with the dataset that they used to train their model. Their problems are with the current inference requests that they're getting, and there's a difference between those two. So how do they deal with that? Well, they're going to need labels, assuming they're doing supervised learning. They're going to need to label those inference requests somehow. How can they do that? If they are in the fortunate position of being able to get direct feedback, they can use their existing processes to label that data. So, for example, if they're trying to predict the click-through rate and they have click-through rates that they're collecting, they can use that directly. That's great. Many people are not in that situation. So you see a lot of environments where you're trying to use things like semi-supervision and humans to label a subset of the data that you have, so you can try to understand how things have changed since you trained your model. Weak supervision is a very powerful tool as well, but it's not that easy to use in a lot of cases. You need to try to apply historical data or other types of modeling heuristics, and in many cases those are giving you a labeling signal that is not 100% accurate, but it gives you some direction, and there are modeling techniques to work with that kind of a signal.
TensorFlow Model Analysis and the Fairness Indicators that you may have seen today, and we're out on the show floor with those, are great tools to try to understand this and identify the slices and the problems that you have with your data. First things first: you need to check your data. Look for outliers. Check your feature space coverage: how well does your data cover the feature space that you have? And use the tools that we give you in TensorFlow Data Validation and TensorFlow Model Analysis. We also have the What-If Tool, a very powerful tool for doing exploration of your data and your model. And in the end, you need to quantify the cost, because you are never going to get a hundred percent. How much is that extra 5% worth in a business environment? You need to understand that.
And TFX, again, is something that Google built because we needed it, and now we want you to have it too. So it's available as an open-source platform that we encourage you all to build on, and since it's open source, we hope you're going to help us contribute and build the platform to make it better in the future as well. So on behalf of myself and my colleague
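The slice check and the cost question from the shoe example can be sketched in plain Python. This is not the TensorFlow Model Analysis API; the field names, the labeled requests, and the business numbers below are all hypothetical. The sketch computes the misprediction rate per slice, then weights it by request volume and an assumed cost per error, so that slices can be ranked by what they cost and not only by how often the model is wrong.

```python
from collections import defaultdict


def error_rate_by_slice(examples, slice_key):
    """Group examples by `slice_key` and return the misprediction rate per slice."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for example in examples:
        slice_value = example[slice_key]
        totals[slice_value] += 1
        if example["predicted_click"] != example["clicked"]:
            errors[slice_value] += 1
    return {value: errors[value] / totals[value] for value in totals}


def cost_by_slice(error_rates, request_volume, cost_per_error):
    """Expected cost per slice: error rate x number of requests x cost of one error."""
    return {
        value: error_rates[value] * request_volume[value] * cost_per_error[value]
        for value in error_rates
    }


# Hypothetical labeled inference requests for the shoe retailer.
examples = (
    [{"style": "sneaker", "predicted_click": 1, "clicked": 1}] * 9
    + [{"style": "sneaker", "predicted_click": 1, "clicked": 0}] * 1
    + [{"style": "sandal", "predicted_click": 1, "clicked": 1}] * 3
    + [{"style": "sandal", "predicted_click": 1, "clicked": 0}] * 2
)

error_rates = error_rate_by_slice(examples, "style")

# Assumed business numbers, for illustration only.
request_volume = {"sneaker": 10_000, "sandal": 2_000}
cost_per_error = {"sneaker": 0.5, "sandal": 4.0}
costs = cost_by_slice(error_rates, request_volume, cost_per_error)

# Rank slices by expected cost, most expensive first.
for style in sorted(costs, key=costs.get, reverse=True):
    print(f"{style}: error rate {error_rates[style]:.0%}, expected cost {costs[style]:.0f}")
```

With these made-up numbers the smaller slice ends up being the more expensive one, which is the sense in which mispredictions do not have uniform cost.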