
About the talk
The Shapley algorithm is an interpretation algorithm that is well recognized by both industry and academia. However, given its exponential runtime complexity, and with existing implementations taking a very long time to generate feature contributions for a single instance, it has found limited practical use in industry. In order to explain model predictions at scale, we implemented the Shapley IME algorithm in Spark. To our knowledge, this is the first Spark implementation of the Shapley algorithm that scales to large datasets and can work with most ML model objects.
About speakers
Cristine Marsh is a data scientist on the Data Science fraud team at Affirm. She is currently working on models to prevent fraud. Cristine is passionate about fair and explainable ML and using data science to improve lives.
Isaac is a software engineer at Affirm, where he productionizes ML lending models and supporting infrastructure. He is passionate about using ML to produce tangible, worthwhile outcomes by balancing theory and implementation. He studied computer science and engineering at MIT for his B.S. and computational biology at UC Berkeley for his Ph.D. His dissertation was on using probabilistic graphical models as supervised classifiers by incorporating domain-specific data assumptions.
[inaudible] Isaac and Cristine [inaudible], and their title is applied machine learning.

Great. Thanks. So, as just mentioned, I'm Isaac, a software engineer here on the machine learning team, and here are some ways to get in contact with me.

And I am Cristine. I am an applied machine learning scientist, on the fraud team specifically, and I'm also going to introduce Affirm as a company. So, next slide. If you have not heard of Affirm, we offer point-of-sale loans. Can we go to the next slide? So our, sorry, can we go to the [inaudible]? So, Affirm offers point-of-sale loans for our customers. Our applied machine learning team creates models for credit risk decisioning, fraud detection, and personalizing a customer's experience. Our machine learning engineering team productionizes models and creates the platform for model training. So now let's do an overview of what we're going to talk about today. Next slide.

Being able to make sure that models are fair and interpretable is very important for all machine learning applications, but it's particularly important in lending. For credit, we need to be able to explain why applicants were rejected, for fair lending reasons. For fraud, our internal risk operations team needs to understand what factors are enabling fraudulent users to slip through our fraud models. We want to understand how each feature is impacting an individual user, not just which features are important in general. And we have millions of rows of data and hundreds of features. Someone might ask, why don't you just use a model like logistic regression?
Well, simply, we have found that logistic regression does not perform as well as random forests, XGBoost, or other black box models. So we need a solution that allows us to interpret the effect of a feature on an individual user, and does so in a timely manner. To do this, we implemented the Shapley values algorithm in Spark. So, next slide.

So what are we looking for? We want to find a solution where we can allocate the surplus generated by the cooperation of players, or in our case, features. We've all heard the saying "the whole is greater than the sum of its parts." How do we divide that surplus? The way I like to think about this is like a trivia game. The number of unique questions a team gets right is higher when the team plays together than when each player plays alone and they combine their answers. By working together and collaborating, players are able to get questions right that they would not have otherwise. We want to be able to attribute not only how much an individual would have brought to the table working alone, but also how valuable they are as a collaborator. So, let's go into the different properties we want for measuring contribution.

Okay, so I'm going to go into each of these in more detail. Let's start with symmetry: equal contributions mean equal payout. This is pretty simple. If you're going out to eat with your friends and two people eat the same thing, then when splitting the bill they should pay the same amount.

The next is dummy: if you don't contribute anything, you get a value of zero. So if you go out to dinner with your friends and just have water, you won't have to pay anything.

Our next one is additivity. This one is easier to think about in machine learning terms. Let's think of a random forest model with three trees. You evaluate the marginal contribution of a feature for every tree and you get 0.3, 0.2, and 0.25. When you average those contributions, you get the same value as you would have if you had just evaluated the random forest model as a whole. So, pretty cool.

The next one is efficiency. I think this one's pretty neat. For ML models, when you start with the average prediction and add or subtract the marginal contribution of each feature in the model, you get the final prediction for that sample. So you can use the Shapley values and the average prediction to get that sample's prediction.

Okay, so we have our four properties. What is a solution that has these four properties? Shapley values, surprise. The Shapley value is a solution from game theory, and it's a way of defining payment proportional to each player's marginal contribution across all members of the group. It has all the properties we talked about. It was introduced in the 50s by Lloyd Shapley as a solution to a game theory problem and has since been used in machine learning for interpretability.

So let's jump into an example, so we can see how the math behind the algorithm works. Okay. We have a four-feature model. We have the FICO score, the number of delinquencies, the loan amount, and whether they have repaid Affirm specifically before. And then let's go and look at the equation. Next slide, please.

So, here's the equation to determine the marginal contribution of feature j. We are going to walk through it over several slides. First, I want to talk about what we're summing over. We are summing over all possible permutation orders for feature j. So why does permutation order matter? Next. We are trying to see not only how well this feature works, but also how well it works in combination with other features. In order to do this, we need to manufacture all possible scenarios of how a feature can interact with another feature or another feature set. So, how can we do this? With permutations.
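The slide with the equation is not part of this transcript. One standard way to write the Shapley value of feature j as an average over permutation orders, consistent with the description above, is sketched here. The notation is an assumption chosen for this note, not necessarily what the slide used: F is the set of features, Π(F) the set of all orderings of F, Pre_j(π) the features that come before j in ordering π, and v(S) the model score when only the features in S are taken from the instance being explained.

```latex
\phi_j \;=\; \frac{1}{|F|!} \sum_{\pi \in \Pi(F)}
  \Bigl[\, v\bigl(\mathrm{Pre}_j(\pi) \cup \{j\}\bigr) - v\bigl(\mathrm{Pre}_j(\pi)\bigr) \Bigr]
\;=\; \sum_{S \subseteq F \setminus \{j\}}
  \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
  \Bigl[\, v\bigl(S \cup \{j\}\bigr) - v(S) \Bigr]
```

The second form groups together the orderings that share the same set of predecessors S. Its weight depends only on the position of j (through |S|) and the number of features |F|, which is why the position in the permutation order and the feature count are the two quantities needed.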
So for our four-feature model, in this example our feature of interest is the loan amount. It can only be in four positions in the feature order: first, second, third, or fourth. For each of these, we only care about the permutations that have different features prior to the loan amount, so everything before the loan amount in the permutation order. Let's look at what all the possible feature sets look like. If it's first, we don't care about anything else. If it's second, it can only be combined with three possible features. If it's third, it can only be combined with three possible feature sets. And if it's last, the others can only be combined one way. So for each of these permutations, we need to compare how the model performs with and without our feature of interest.

Let's go back to our equation. We now know all possible permutation orders for the loan amount, so we know what we're summing over. We know the place in the permutation order and the number of features. Now we need to be able to figure out the score with our feature of interest and the score prior to adding the feature of interest. So let's think about how we can do that. Next slide. We need to compare the scores with and without our feature of interest and see how the results change. Next slide. So we're going to put the difference between those two scores into our equation. But how do we actually get the scores of the model with some features and not others, in an efficient manner? We could retrain the model every time, but that is very slow and will miss some interactions. So, what can we do instead?

I'm seeing a black screen. Okay, I'm good now. [inaudible] I like to think of this similarly to how you would create a partial dependence plot, where you hold some values constant and vary others and see how the output changes. So we're going to talk about the Monte Carlo approximation way of doing this, which is used in ours and in other Shapley packages [inaudible].
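The counting argument above (one, three, three, and one distinct sets of earlier features) can be checked by brute force. The following is a minimal sketch, not the speakers' code: the feature names and `toy_score` are hypothetical, with `toy_score` standing in for "the model score when only these features are known". It also computes exact Shapley values by averaging over all 24 permutation orders and checks the efficiency and dummy properties.

```python
from itertools import permutations
from math import factorial

# Hypothetical names for the four features in the talk's example.
FEATURES = ["fico_score", "num_delinquencies", "loan_amount", "repaid_affirm_before"]


def predecessor_sets(feature, features=FEATURES):
    """For each 1-based position of `feature`, collect the distinct sets of
    features that can appear before it across all permutation orders."""
    by_position = {}
    for order in permutations(features):
        idx = order.index(feature)
        by_position.setdefault(idx + 1, set()).add(frozenset(order[:idx]))
    return by_position


def toy_score(known):
    """Made-up score for a set of known features (an assumption for
    illustration). Loan amount and delinquencies interact, and
    repaid_affirm_before contributes nothing, so it is a dummy feature."""
    score = 0.0
    if "fico_score" in known:
        score += 0.10
    if "loan_amount" in known:
        score += 0.20
    if "num_delinquencies" in known:
        score += 0.05
    if {"loan_amount", "num_delinquencies"} <= known:
        score += 0.15
    return score


def exact_shapley(feature, score=toy_score, features=FEATURES):
    """Average the with/without score difference over every permutation order."""
    total = 0.0
    for order in permutations(features):
        before = frozenset(order[: order.index(feature)])
        total += score(before | {feature}) - score(before)
    return total / factorial(len(features))


counts = {pos: len(sets) for pos, sets in sorted(predecessor_sets("loan_amount").items())}
print(counts)  # {1: 1, 2: 3, 3: 3, 4: 1}

shapley = {name: exact_shapley(name) for name in FEATURES}
print(shapley)
```

Only 8 distinct with/without comparisons are needed for one feature here, but the number doubles with every extra feature, which is the exponential cost that motivates the approximation discussed next.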
Monte Carlo approximation. Next slide, please. Okay, so we're going to start. We have our example customer, Taylor, who is applying for a $300 loan at Affirm. Their FICO score is 600, they have not repaid a loan with Affirm before, and they have one delinquency on their credit report. Our black box model is going to take in all these features and output a probability estimate of their repayment. Okay? So what we need to do is sample a random permutation order, and we're also getting a random background user. So we have our random sample, and we have our permutation order.

The next thing we do, next, please: we use the permutation order and the random background user to estimate an approximation of the loan amount's Shapley value. We do this by creating two counterfactual instances. The first instance has all the features prior to and including the loan amount from Taylor, and everything else from the background user. The other has just the features prior to the loan amount, so "have been delinquent" and the FICO score, from Taylor, and then the loan amount and "have they repaid" from the background user. So basically we're able to calculate how much the loan amount is impacting the prediction. Cool. So we're able to do this, and then we get the marginal contribution of the loan amount for this permutation order and this random background user. Next slide.

Okay, so we were able to create these outputs, and basically, by the law of large numbers, with many permutations and many background users we will eventually have a good approximation of the actual Shapley value. But as you might imagine, this is still not a quick process. So we decided to speed it up by implementing it in Spark. And let me hand it over to Isaac to talk about how we did that.

Yes, I'm going to talk about our implementation of this in Spark, which allows us to scale it to larger data sets and still have reasonable throughput and latency. Basically, we're just taking advantage of the fact that Spark will take a large dataframe, which in this case is our background data set, which is often the training data set, and handle the logic of partitioning it across several different instances. At the same time, Spark will take our model code and broadcast it, as it's called, onto every one of them. Our logic just takes advantage of that. Concretely, going through an example here, we take our instance to investigate. This is the same Taylor that we saw earlier. On every one of these instances, we have one partition of the training data set, so some subset of the rows. The way our logic works is basically that we go through all the rows that that particular partition has, take the instance to investigate, and do a bunch of stuff with permutations like Cristine was just saying, which basically amounts to having these two counterfactual instances that differ by the feature value of the instance to investigate, here 300 or not.
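The sampling step described above can be sketched in plain Python, without Spark. This is an illustrative sketch only: `toy_model`, the feature names, and the background rows are made up, and the function names are not from the speakers' package. Each sample draws a random permutation order and a random background user, builds the two counterfactual instances, and takes the difference of the model's predictions.

```python
import random

FEATURES = ["fico_score", "num_delinquencies", "loan_amount", "repaid_affirm_before"]


def toy_model(row):
    """Hypothetical black box returning a made-up delinquency risk score.
    It ignores repaid_affirm_before on purpose, so that feature is a dummy."""
    return (
        0.2
        + 0.001 * (700 - row["fico_score"])
        + 0.1 * row["num_delinquencies"]
        + 0.0005 * row["loan_amount"] * (1 + row["num_delinquencies"])
    )


def sample_marginal_contribution(model, instance, background_row, feature, rng):
    """One Monte Carlo sample of the marginal contribution of `feature`."""
    order = list(FEATURES)
    rng.shuffle(order)  # random permutation order
    idx = order.index(feature)

    # Both counterfactuals start from the background user and take every
    # feature that precedes `feature` in the order from the instance.
    with_feature = dict(background_row)
    without_feature = dict(background_row)
    for name in order[:idx]:
        with_feature[name] = instance[name]
        without_feature[name] = instance[name]

    # They differ only in the feature of interest.
    with_feature[feature] = instance[feature]
    return model(with_feature) - model(without_feature)


def monte_carlo_shapley(model, instance, background, feature, n_samples=1000, seed=0):
    """Average many samples; this converges to the Shapley value by the law
    of large numbers."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        background_row = rng.choice(background)  # random background user
        total += sample_marginal_contribution(model, instance, background_row, feature, rng)
    return total / n_samples


taylor = {"fico_score": 600, "num_delinquencies": 1, "loan_amount": 300, "repaid_affirm_before": 0}
background = [
    {"fico_score": 720, "num_delinquencies": 0, "loan_amount": 100, "repaid_affirm_before": 1},
    {"fico_score": 650, "num_delinquencies": 2, "loan_amount": 200, "repaid_affirm_before": 0},
    {"fico_score": 580, "num_delinquencies": 0, "loan_amount": 150, "repaid_affirm_before": 1},
]

print(monte_carlo_shapley(toy_model, taylor, background, "loan_amount"))
```

Note that the model is never retrained: "removing" a feature is approximated by substituting a background user's value for it.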
And then we simply run the black box model, local to each executor, to generate a prediction from each one of these counterfactual rows, giving us one sample of the marginal contribution. One of the things this allows us to do is iterate over all the rows and try all of the different features at once, with this presence/absence logic, in a randomized way based on the permutation. We ultimately get out these really tall Spark dataframes that, for each row, have the name of the feature and the value of the feature, so for example a loan amount of $300, and then the marginal contribution, one sample of the marginal contribution. One thing this lets us do fairly easily, which we don't see other packages implement, is support weighted training data sets, by taking the weighted mean here. We basically take the weighted mean of these marginal contributions, weighting by the weights of the rows from the training data set that they correspond to.
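A rough, Spark-free sketch of this aggregation follows. It is written for illustration and is not the package's actual code: the partitions are plain Python lists, `toy_model` and the `weight` column name are assumptions, and walking one permutation per background row to score every feature at once is one possible way to realize the "all features at once" idea. Each consecutive pair of rows along the walk is a pair of counterfactuals that differ in exactly one feature.

```python
import random
from collections import defaultdict

FEATURES = ["fico_score", "num_delinquencies", "loan_amount", "repaid_affirm_before"]


def toy_model(row):
    """Hypothetical black box returning a made-up delinquency risk score."""
    return (
        0.2
        + 0.001 * (700 - row["fico_score"])
        + 0.1 * row["num_delinquencies"]
        + 0.0005 * row["loan_amount"] * (1 + row["num_delinquencies"])
        - 0.05 * row["repaid_affirm_before"]
    )


def contributions_for_row(model, instance, background_row, rng):
    """Walk one random permutation, switching features from the background
    row to the instance one at a time and recording each score change."""
    order = list(FEATURES)
    rng.shuffle(order)
    current = dict(background_row)
    previous_score = model(current)
    samples = []
    for name in order:
        current[name] = instance[name]
        score = model(current)
        samples.append((name, score - previous_score))
        previous_score = score
    return samples


def weighted_shapley(model, instance, partitions, weight_col="weight", seed=0):
    rng = random.Random(seed)
    tall = []  # "tall" table: one (feature, contribution, weight) row per sample
    for partition in partitions:  # in Spark, each partition runs on an executor
        for background_row in partition:
            for name, contribution in contributions_for_row(model, instance, background_row, rng):
                tall.append((name, contribution, background_row[weight_col]))

    # Weighted mean of the marginal contribution samples, per feature.
    numerator, denominator = defaultdict(float), defaultdict(float)
    for name, contribution, weight in tall:
        numerator[name] += weight * contribution
        denominator[name] += weight
    return {name: numerator[name] / denominator[name] for name in FEATURES}


taylor = {"fico_score": 600, "num_delinquencies": 1, "loan_amount": 300, "repaid_affirm_before": 0}
partitions = [
    [
        {"fico_score": 720, "num_delinquencies": 0, "loan_amount": 100, "repaid_affirm_before": 1, "weight": 1.0},
        {"fico_score": 650, "num_delinquencies": 2, "loan_amount": 200, "repaid_affirm_before": 0, "weight": 5.0},
    ],
    [
        {"fico_score": 580, "num_delinquencies": 0, "loan_amount": 150, "repaid_affirm_before": 1, "weight": 1.0},
    ],
]

shapley_values = weighted_shapley(toy_model, taylor, partitions)

# Efficiency check: Shapley values plus the weighted mean background
# prediction should recover the prediction for the instance.
rows = [row for partition in partitions for row in partition]
baseline = sum(r["weight"] * toy_model(r) for r in rows) / sum(r["weight"] for r in rows)
print(shapley_values, baseline, toy_model(taylor))
```

Because every background row contributes one sample per feature, the weighted denominators are the same for all features, and the efficiency property holds exactly in this sketch rather than only approximately.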
At the end of the day, we get out Shapley values for each one of the feature values here. So, the highlights of our implementation: one of the big things is that it scales with the data set automatically, using Spark [inaudible]. Also, because of the way it generates these counterfactuals, you get a huge batch of them, and oftentimes it's faster to do batch prediction, predicting a lot of rows at once, versus one at a time. And with the way that we iterate over the rows, we can reuse the rows for multiple different marginal contribution comparisons. And finally, as I mentioned, we can support training weights, which matter for Affirm as a company because we're often predicting things like delinquency, which are very unbalanced, and we need to use training weights to deal with that unbalanced data set.

So here's some benchmarking that our colleague Sean did. Basically, he fit this toy model here using an especially uninterpretable, like, a special model form that doesn't have an existing faster way of generating Spark values, or sorry, Shapley values, and he found that our method was a lot faster. I think this is normalized by CPU core. In terms of ranking it was the same, and it had a relatively small difference in the actual Shapley values.

Okay, so I want to now go through this quasi demo here to show how you would actually use this. Just before the demo: I'm going to be going through this notebook, and the notebook itself is available on our public GitHub, which there are links to in various places. So, yes, I'm starting with this demo. I'm starting from the point of having a model that's already been fit on a training data set, and the training data set had 30,000 rows in it and four columns, corresponding to the example earlier. The next thing you want to do, if you want to actually explain a prediction from this model using the Shparkley package, is [inaudible] install the package, obviously. Then you have to import these methods, and then you have to basically implement the interface. What that consists of is subclassing this ShparkleyModel class and implementing these two methods, predict and get required features. Predict takes in a feature matrix, with one row for each row that you want to predict, and gives you back a list of floating point values.
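The shape of such a wrapper can be sketched as follows. This is a hedged illustration, not the package's API: `ExplainableModel` is a hypothetical stand-in for the base class mentioned in the talk, the method signatures are assumptions based only on the spoken description, and `toy_scorer` replaces a real fitted classifier.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Sequence


class ExplainableModel(ABC):
    """Hypothetical stand-in for the base class the talk describes."""

    @abstractmethod
    def predict(self, feature_matrix: List[Dict[str, float]]) -> List[float]:
        """Return one floating point prediction per input row."""

    @abstractmethod
    def get_required_features(self) -> List[str]:
        """Return, in order, the features the model needs to predict."""


class WrappedClassifier(ExplainableModel):
    """Adapts any scoring function that expects features in a fixed order."""

    def __init__(self, scorer: Callable[[Sequence[float]], float], required_features: Sequence[str]):
        self._scorer = scorer
        self._required_features = list(required_features)

    def predict(self, feature_matrix: List[Dict[str, float]]) -> List[float]:
        # Reorder each row to the column order the underlying model expects,
        # ignoring any extra columns such as training weights.
        ordered_rows = [[row[name] for name in self._required_features] for row in feature_matrix]
        return [float(self._scorer(values)) for values in ordered_rows]

    def get_required_features(self) -> List[str]:
        return list(self._required_features)


def toy_scorer(values: Sequence[float]) -> float:
    """Made-up scorer over (fico_score, loan_amount); stands in for a fitted model."""
    fico_score, loan_amount = values
    return 0.001 * (700 - fico_score) + 0.0005 * loan_amount


model = WrappedClassifier(toy_scorer, ["fico_score", "loan_amount"])
predictions = model.predict([{"loan_amount": 300, "fico_score": 600, "weight": 2.0}])
print(model.get_required_features(), predictions)
```

Keeping the wrapper this thin is what lets the approach work with most model objects: the explanation code only ever needs batch predictions and the ordered feature list.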
And get required features returns, in order, the set of features that the model needs to predict. So I'm instantiating an instance of this subclass here and feeding it this support vector classifier that I fit on the data. The reason I went with a support vector classifier is that, again, it's one of the models that doesn't have a faster way of computing Shapley values, like some other models do. And here I'm just testing the interface and getting the required features.

All right. So then I'm going to load my training data set into Spark. By the way, this is something I did on my old laptop, which has 8 cores, so that's why Spark has automatically partitioned this training data into eight. Then I'm going to explain the sample row here, which is the Taylor row from Cristine's example earlier, and this row is actually given as 99% likely to be delinquent. Why is that? So the next thing we do is feed it through compute_shapley_for_sample, which is from our package: feed it the Spark training dataframe, feed it the model, and the query row. It's working. There are 30,000 data points here that it's using to explain against. While that's running, we can look at our Spark job UI and see what's actually going on, if you're bored or antsy or want to make sure nothing's breaking. You see these eight tasks, one for each of the executors. [inaudible] Again, this is on my laptop, so it's not as fast as it would be on a cluster. Okay,
so you can see that all eight of the tasks finished, so that whole thing finished, and you can see it took about 50 seconds here, and we get back the Shapley values. We can check the efficiency property that Cristine described earlier, which basically means that the sum of the Shapley values plus the mean prediction over the training data set should equal the actual prediction. We see that that holds here because this assertion succeeds. And then we can actually visualize it. To visualize it, I'm using the excellent Scott Lundberg SHAP package visualizer. SHAP is the leading existing Python implementation of this. It tells us that the loan amount being $300 actually greatly increased our prediction here, from almost nothing to really a hundred percent likelihood of being delinquent. And also the fact that this person had historically one [inaudible].

So I can now go through a little bit of using a weighted training data set. It's really simple: just feed in the name of the column in the Spark dataframe that has the training weights. I won't go through this again [inaudible].

Okay. So now I'm going to compare to the SHAP package. The best explainer that the SHAP package has for this model form is the KernelExplainer. This is kind of an approximation, again, of the brute force approach, which is going through all the permutations. It's a little bit faster than brute force, but it still takes a while, and you can see it spits out a warning that it might take a while, and it has some ideas about how to make it faster. You can see that this eventually finishes and gives us the same results that we got, in quite a bit less, about 14 seconds. So yeah, ours is faster, even on a local machine, and this is [inaudible] using all eight cores on my old laptop.

That is the demo. So, take-homes from the demo: we can use training weights, we can automatically scale using Spark, we get pretty good accuracy, and we also get some niceties from the Spark UI to see what's going on.
Like, in real time. Okay, so in conclusion: it's way faster than brute force, still faster than the KernelExplainer, and we're very close to the Shapley values that you get from [inaudible]. It is available. [inaudible] I will be happy to answer any questions. Yeah, I guess you're supposed to type them into the chat. [inaudible] Cool, and we also have some references, and we'll link the slides to be sent out with the rest of the presentation, the rest of the conference info. So, if you want to learn more, there's a lot of good information there.

Thank you, Isaac and Cristine. Thank you so much. It is 12:01 Pacific time, 3:01 Eastern time, so we actually have an extended break now, for lunch or a snack if you wish, for the next 40 minutes. So we are going to take a break for the next 40 minutes, and we'll see you soon.