About the talk
03:35 Intro
08:20 Counterfactual questions
12:52 Estimation of Counterfactuals – challenges
15:51 Correlation and causation
18:55 Estimating substitutes and complements
23:55 Multinomial logit and utility maximization
29:00 Causal impact of price changes
33:58 Nested factorization model
36:04 Computational issues
39:50 Value of information for targeting
44:08 Retention of data and risks to privacy and security
48:20 Conclusions
52:10 Targeting individual level
53:40 Next announcement
Welcome, everybody. It's a real pleasure to introduce Susan Athey. Susan is the Economics of Technology Professor at Stanford, where she is also associate director of the Institute for Human-Centered AI. She has a truly spectacular bio, with major awards nearly every year, so let me not go through it all; you can find it in the guidebook app. I just want to share a short anecdote from 2009, when I was just out of my PhD. A friend of mine from graduate school and I were trying to put together a tutorial at a much smaller conference, on a very narrow technical topic that was not really our own area. So we went to the literature and asked who the best person in the world would be, in terms of their results, to help us give this tutorial, and Susan's name came up. Rather boldly, we just sent her an email asking whether she would be interested in giving this tutorial. I think we were actually maybe even a little bit starstruck, even back then, and I really have no excuse, because my own institution, Duke University, had just decided to award her an honorary doctorate. But we still just asked her, and she was very gracious and very flattered to be asked to give the tutorial, and so she came and gave it with both of us. I think our hearts were in the right place and our intentions were good: we had really just looked at the technical results, and she was the right person to do it, and I think that's true today as well. So, yesterday we had the debate here about industry versus academia, and then this morning Dawn Song, talking about security in machine learning, spoke about the importance of thinking about the economic value of data. If we are thinking about who should talk to us about the economic value of data, I cannot think of a better person in the world than Susan Athey. Please join me in welcoming her.

Thanks so much. I'm really honored to be here and to be able to give this invited lecture, so thanks so much for having me. It is fun to chat with Vince, because he has sort of seen me through the evolution of my different research agendas.
I spent about the first fifteen years of my career focusing on designing markets and auction-based marketplaces, and on data-driven approaches to that. I then spent some time as consulting chief economist at Microsoft, working on the auction side of the search engine, and that's really where I got exposed to machine learning and AI at scale and sort of re-educated myself. So in some sense it's an exciting time: AI and machine learning have become more mature, and at the same time we in economics have a long history of data-driven analysis that was more guided by models. One of the themes in my research now is trying to bring those strands together.

Today I'm going to talk about a line of work. David Blei is the co-author you will all be most familiar with; Rob Donnelly, Fran Ruiz, Mitchell Vinegar, and Tobias Schmidt are other co-authors on these various projects. My talk has three parts. First, I'm going to talk about counterfactuals: the motivation for estimating counterfactuals as opposed to predictions, some of the challenges encountered in doing that, and some specific tactics for getting around those challenges. Here I'll just say at the beginning that there are several different strands of causal inference; even within economics there are multiple strands. One part of the literature focuses on estimating the effects of treatments, things like: what is the effect of raising the minimum wage, or what is the effect of giving a training program to a worker. I have a line of research on that; you can look at my website to learn about it. There is a second strand, what we call more structural models, and that is where I did a lot of work prior to my machine learning adventures. In that area we tend to use more assumptions: for example, we assume that agents are optimizing and try to estimate the parameters of their preference functions. The work I'm going to talk about today fits into that second strand, although it's the simplest part of it: we're going to be thinking about inferring consumer preferences from their choices. I'll just advertise, because I'm not going to talk more about it today, that there is a much richer literature along those lines in economics. I did a lot of work estimating the preferences of bidders in auctions, for example, and then thinking about changing reserve prices or auction rules and how behavior would change when you change the rules of the game. In economics we have also done a lot of work on dynamic models, assuming that agents are solving a dynamic programming problem and then estimating the parameters of their payoff functions; we have lots of examples of inverse reinforcement learning starting in the 1980s. Not to say we did it well: we had to impose a lot of functional form assumptions and we had very small datasets. But this idea of really taking the agent-optimization approach seriously and then matching it up with data is something we have a long history of working on. So I think it's not surprising, as the field of machine learning has started to mature, that we're seeing people wanting to come back to some of those old ideas, because in the end, if you want to get beyond prediction, you often end up back in a small-data world. You may have lots of observational data, but when you start thinking about counterfactuals you often need to get back to models and additional assumptions in order to draw inferences.

In the second part, I'm going to apply some of this work to the problem of counterfactual inference for targeted pricing policies. Again, it's a very, very simple application, but one that you can really wrap your head around. The last thing I'll do, and this is really the title of my talk even though it won't get an equal share of it, is, once we've built up all the pieces, apply them to look at the value of different types of data for targeting. This is a problem I've worked on in industry as well as in antitrust and regulation: when people ask whether we should break up tech companies, or whether we should prevent mergers, or what the barriers to entry are for different firms, they talk about data as being an important barrier. But it is very important to distinguish between different types of data, even though that doesn't often happen in the popular discussion. So we'll talk about the relative value of different types of data in a particular setting.
Let's start with the motivation: what is the difference between a predictive question and a counterfactual question? I'm showing here some maps of the Bay Area from a paper I worked on using mobile location data, where we see where people are during the day and then where they go to lunch. Using that data, if we wanted to do a predictive exercise, we might ask: if there is a certain restaurant in a certain place, say a Chinese restaurant, how many people would go to that restaurant? Now, it turns out that in the Bay Area the different ethnicities of Asian restaurants are clustered in different parts of the city. So if I just told you the restaurant was Chinese, you might predict that it would get a lot of customers, partly because Chinese restaurants might tend to be located in busy areas. The counterfactual question would instead be: if I take a specific location and change the restaurant from one type to another, what would happen to the visits to that restaurant? If I do that counterfactual, I might put down a Chinese restaurant in a place where there are not a lot of other Chinese restaurants, so the historical data alone might not tell me exactly what would happen. The standard way to answer this kind of question using economic modeling, if you have access to micro data, is to try to estimate the preferences of each consumer. In this particular case, I can see consumers' choices among restaurants, and I can estimate their preference for travel time, as well as their preferences for different characteristics of the restaurants, by looking at the choices they make relative to their choice set; we also have restaurants opening and closing during this time. Once I've estimated the preferences of every consumer in my dataset, I can estimate the counterfactuals: what happens if I change the restaurant type at a particular location, if I plop down a different restaurant type there? I can go back and re-solve the optimization problems of all the agents and figure out how the agents will switch among restaurants. Again, this is the very simplest type of empirical agent-based modeling that we do in economics, and these types of models are very, very widely used.
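To make that concrete, here is a minimal sketch with made-up numbers (my own illustration, not the model or data from the paper) of how re-solving each consumer's choice problem under a plain conditional-logit utility produces a counterfactual substitution pattern when one restaurant's cuisine is changed.

```python
import numpy as np

# Hypothetical setup (illustrative numbers only): 3 consumers, 4 restaurants.
# Utility: u[i, j] = cuisine_taste[i, cuisine[j]] - travel_cost * dist[i, j]
rng = np.random.default_rng(0)
dist = rng.uniform(0.5, 5.0, size=(3, 4))            # miles from consumer i to restaurant j
cuisine = np.array([0, 0, 1, 2])                      # 0 = Chinese, 1 = Mexican, 2 = Italian
cuisine_taste = rng.normal(0.0, 1.0, size=(3, 3))     # consumer i's taste for each cuisine
travel_cost = 0.8                                     # assumed disutility per mile

def choice_probs(cuisine_labels):
    """Multinomial-logit choice probabilities for each consumer over the 4 restaurants."""
    u = cuisine_taste[:, cuisine_labels] - travel_cost * dist
    expu = np.exp(u - u.max(axis=1, keepdims=True))   # numerically stable softmax
    return expu / expu.sum(axis=1, keepdims=True)

baseline = choice_probs(cuisine)

# Counterfactual: change restaurant 3 from Italian to Chinese and re-solve every consumer's problem.
counterfactual_cuisine = cuisine.copy()
counterfactual_cuisine[3] = 0
counterfactual = choice_probs(counterfactual_cuisine)

# Expected visits per restaurant = sum of choice probabilities over consumers.
print("baseline expected visits:      ", baseline.sum(axis=0).round(2))
print("counterfactual expected visits:", counterfactual.sum(axis=0).round(2))
```

The point of the sketch is that visits get redistributed according to each consumer's location and tastes, not according to aggregate historical correlations between cuisine and foot traffic.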
What I show here in the picture is the result of such a counterfactual exercise; here we look at mid-priced Asian cuisine. For each location, we loop through the different cuisines and see, for each restaurant type, how many consumers it would get, taking into account who the competitors are. If you put, say, a Chinese restaurant right next to another Chinese restaurant, that's different from putting a Chinese restaurant next to a Mexican restaurant, and so on; it takes into account all of the local competitive structure. We can then see that preferences differ across the Bay Area, so in different parts of the Bay Area the optimal restaurant ethnicity will be different, and we can look at those counterfactuals. In the paper we actually have a bunch of natural experiments where restaurants open and close, and we assess how well we can predict what actually happens when a restaurant opens or closes in the data. Another exercise we can do with this sort of analysis is to ask: when a restaurant opens or closes, what share of its consumers are drawn from restaurants that are close by versus far away? Of course, we don't know a priori what that will be: it could be that distance is the most important factor for people, or it could be that people want to choose among different Chinese restaurants, are willing to drive, and don't really care how far away they are. Using the data, we estimate that about half of the market share is drawn from restaurants within a two-mile radius, and in the paper we assess how well that matches up with what we see when restaurants actually open and close.

So this is a simple example of why it matters whether you build a predictive model or a counterfactual one. In general, most business decisions are counterfactual questions. If you're forecasting, it would be nice to know what's going to happen next week; but if you're actually deciding what to do, you're interested in the counterfactual: what if I put the restaurant here, what if I put it there? Those are
really the more commonly encountered questions in the business world, which I think is one reason that as people go out and apply a lot of these machine learning methods in practice and start interacting with firms, they come back to thinking about causal inference: they realize those are the questions they're actually being asked.

When we go to do this across many different applications, there are a couple of categories of challenges for estimation. And here my emphasis is on estimation: we want to get the numbers right. We are not trying to figure out whether or not opening a restaurant causes a change in nearby restaurants; we already know that opening one restaurant will affect nearby restaurants. We already know the causal structure; what we don't know is the magnitudes, so all of the emphasis here is on the magnitudes.

The first problem is that there is often a lack of experimental and quasi-experimental variation in the data that allows you to separate correlation from causal effects. In the application below I'll show how we do that using pricing data, but there are lots of different techniques used in the literature, for example something similar to what we call regression discontinuity, where you have people very close to a boundary who are otherwise very similar but get a different treatment. People use that, for example, when there is a test-score threshold for getting into a school: you compare people just above and just below the threshold to figure out the causal effect of going to that school. There are also techniques like instrumental variables and so on. But the point, and what I'm going to show you is very easy to understand, is that if you are only able to learn about causal effects from changes that happened in the past, then even if you have a very large historical dataset you may not have much information that is actually valuable for answering your question. If you're looking at the effect of changing prices, but the firm has never changed prices in the past, you might have billions of observations of historical data and still not know very much about what would happen if you changed the prices. So in general there is a lack of the right kind of variation in the data, even when you have lots of data.

That is related to the second category of challenges, which is statistical power. Even though you have lots of data, you may not have enough experiments, and what really matters is the number of experiments and the power around those experiments. A second issue is that effect sizes are often small: if you're working at a tech firm, it's often the case that they've already optimized, so the prices aren't crazy and the design of the website isn't crazy, and the treatment you put into place has only a modest effect. And the third issue is that if I'm trying to do personalization, if I want exactly the right treatment or exactly the right recommendation for you, there is never enough data. With high-dimensional characteristics of people, you're just never going to have enough data to accurately make a very personalized recommendation. So if you're trying to estimate personalized
treatment effects, you always have problems of statistical power.

Let me start with the first example of correlation versus causation. For most of the rest of the talk I'm going to be looking at data from a supermarket. One thing you see in the data, and if there are any other parents of young children out here this will hit home for you, is that coffee and diaper purchases are positively correlated. We can all think about why that is: new parents are also very tired and sleep deprived, so they buy a lot of coffee. That means that if I hold out your diaper purchases and try to predict whether a person on a given trip is going to buy diapers, then telling me they bought coffee is going to predict buying diapers. That's a correlation, but it doesn't necessarily tell me what I should do as the supermarket if I want to think about giving coupons or changing the price of coffee. The causal question is: what happens to the purchase of diapers if I change the price of coffee? Of course, intuitively you know that no matter how much coffee you do or don't drink, your kid's poop is going to be about the same, so you've got to have those diapers. There is no structural relationship between the coffee and the diapers. But trying to learn the causal effect of changing the price of coffee on diapers can illuminate whether the diapers and the coffee are substitutes or complements. If I increase the price of coffee and there is no effect on diaper purchases, I would say these products are independent, and I'm going to make that statement conditional on going to the store. So, conditional on a person going to the store, if the price of coffee doesn't affect their purchase of diapers, these things are independent. On the other hand, if I increase the price of coffee and people buy less coffee but also fewer diapers, those things are complements: they go together. Whereas if I increase the price of coffee and people buy less coffee but more diapers, they are substitutes: when you don't buy the coffee, you buy the diapers instead.

Think about what you would do with this. If you were thinking about attracting people to the store, you might want to advertise that you have a great coffee bar, and that would bring in people who would also buy diapers. So on the margin of getting people into the store, the correlated preference is what matters, because I want to get people into the store who will also buy diapers. But if I'm thinking about what happens to people once they are in the store, then changing the price of coffee, conditional on people being in the store, would not change their purchases of diapers. So for different questions, you care about which of these mechanisms is producing the correlation.
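In standard cross-price notation (mine, not from the slides), the distinction is just the sign of the effect of the coffee price on diaper demand, conditional on a store visit:

```latex
% q_d(p_c): expected diaper purchases as a function of the coffee price p_c,
% conditional on the consumer visiting the store.
\frac{\partial q_d}{\partial p_c} > 0 \ \text{(substitutes)}, \qquad
\frac{\partial q_d}{\partial p_c} < 0 \ \text{(complements)}, \qquad
\frac{\partial q_d}{\partial p_c} = 0 \ \text{(independent)}
```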
In one of my papers, joint with Fran Ruiz and David Blei, we try to estimate from the data which products are substitutes and which are complements, out of about 6,000 products in the supermarket. This is actually a pretty hard problem. I started working on it in the 1990s and got stuck at four products for computational reasons. You can think about it this way: with four products there are 2^4 = 16 bundles, different vectors of products you could buy, and I tried to estimate latent preferences over them. The problem was computing the different regions of unobserved preferences that would correspond to different probabilities of purchasing each bundle; the computation just got very messy, doing all those integrals in the 90s. So I was very happy to come back, a few decades later, and think about bundles over 5,000 products, now that we are able to compute it. It is still a hard problem: if you think about all the shopping carts you can put together with more than 5,000 products to choose from, there is a very, very large number of bundles, of different combinations of those products. So we don't actually think that the fully rational economic model, in which each consumer walks into the store, computes a utility for every one of those bundles with a separate error term for each, and maximizes over all of them, is realistic. If there are more combinations than there are atoms in the universe, I can't really imagine that the human brain is solving that optimization problem. So it makes sense to think of a more boundedly rational consumer, and then estimate the parameters of that boundedly rational consumer's objective function.

So here we write down a model where the consumer moves through the store sequentially, but they do look ahead to the future. They put something in their basket, and when they consider putting more things in, they take into account what is already in the basket, but they also think about other things they might put in later. That puts some structure on the problem. They go through the store in a particular order; here we have it in terms of item popularity, but you could also have them go aisle by aisle or section by section, which would simplify the problem for the consumer.

I have some simulated data that show how the complementarities and the correlations operate in this model. In the bottom panel we have a consumer going through the store and filling up their basket. In the first stage, diapers are a popular product for this consumer, so most of the time they put diapers in their basket first. The arrows by coffee and taco shells mean that on this particular day the coffee price and the taco shell price are higher than average. The consumer who is looking ahead and being rational about complementarities sees that taco shells are expensive, so they don't buy taco shells, and they also don't buy taco seasoning, because they understand that taco seasoning and taco shells go together: there is no reason to buy one if you're not going to buy the other, and if the price of one is high, you don't want to buy the other. In contrast, when we have a consumer who shops without taking the complementarities into account, which totally shuts down the complementarity channel, what we see in the simulations is that the taco shell price goes up but the person buys taco seasoning anyway. It is the difference between those two behaviors, the fact that when the price of one good is high you see people buy less of both, that allows us to infer from observed choices whether these things are substitutes or complements.

Now, when we took this to the data, I would say the big challenge was computational, and Fran came up with some really brilliant computational techniques that allowed us to actually estimate this model, which had basically confounded economists and marketing people for decades; as I said, people after me also tried and got stuck. The caveat is still that in order to really know, for every pair of products, whether they are substitutes or complements, you need a lot of price variation. So I would say we haven't succeeded in getting every pair right, but what we were able to do is identify some of the strongest complementarities in the supermarket. We estimated this without using any information about the textual characteristics or names of the products, or what section or category they are in. So I told my kids: guys, I worked for two years and figured out that hot dogs and hot dog buns go together and that tacos and taco shells go together; you should be really proud of me, these are true scientific discoveries. But indeed, the items we find with the highest complementarity scores are things you would think of as genuinely complementary: if you're putting together a taco feast, you kind of want all of these components.

Even inspecting this, you start to think about things like taste for variety and so on, so ideally we would want an even richer model than what we have; this one just helps us understand how we can do this at all. And behind the scenes, again, just to emphasize: we have an agent optimization problem, and we are estimating the parameters of the agents' preference functions using their observed choices over a multitude of shopping trips in which prices change from trip to trip.
OK, so now let me show you a little bit of the algebra behind this, and it's very simple. Back in the early 1970s, Dan McFadden proved some very nice results linking the multinomial logit model to a utility maximization model, and he won the Nobel Prize, I guess maybe ten or fifteen years ago, partly for this work. It seems like a fairly straightforward observation, but it's an important one. Suppose a consumer is maximizing utility, and the consumer's utility for an item has a mean utility component plus an idiosyncratic shock, an epsilon. If the consumer maximizes utility and that shock has an extreme value, or Gumbel, distribution, then when you watch consumers making those utility-maximizing choices, the probability that a consumer makes a particular choice is just the multinomial logit formula. What that means is that if you estimate a multinomial logit model, the mean utilities you infer can be interpreted as the parameters of the user's utility function. So it's just a reinterpretation of the logit: if you have consumer data and you're watching the consumers make choices over time, the probability that a consumer makes a particular choice has this functional form, and from that you can infer their mean utilities. The application McFadden originally worked on was the extension of BART in the East Bay (he was at Berkeley), and the question was how many people would substitute away from other transportation choices, like cars and buses, onto BART; his model was used to justify the expansion of BART. The other very common place this is used is in pricing. If the mean utility is equal to a time-invariant mean utility minus a user-specific parameter times the price, then when we estimate the parameters of this model, we estimate the consumer's sensitivity to price, and that price parameter in turn is useful for all sorts of counterfactuals.
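Written out in standard notation (consistent with the verbal description here, not copied from the slides): if consumer i's utility for item j is a mean utility plus an independent Gumbel shock, utility maximization implies multinomial-logit choice probabilities, and with a price term in the mean utility the estimated coefficient is the consumer's price sensitivity.

```latex
% Consumer i's utility for item j: a mean utility plus an i.i.d. Gumbel (extreme-value) shock.
u_{ij} = \mu_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \text{i.i.d. Gumbel}

% Utility maximization then implies multinomial-logit choice probabilities:
\Pr(i \text{ chooses } j) = \frac{\exp(\mu_{ij})}{\sum_{k \in \text{choice set}} \exp(\mu_{ik})}

% Pricing application: time-invariant mean utility minus a user-specific price sensitivity times price.
\mu_{ijt} = \alpha_{ij} - \beta_i \, p_{jt}
```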
In the earlier model I showed with the restaurant choices, instead of price we had distance: each consumer is located in a different place and has a different distance to each restaurant, so there I could estimate consumers' willingness to travel to a restaurant. If prices vary over time, and they vary in a way that is not related to and not correlated with the idiosyncratic error term (and I'll come back to that), we can estimate the price coefficient. This base model is used in antitrust reviews of mergers: if Staples and Office Depot want to merge, economists will go out and get data, build these models, estimate consumers' preferences for travel, for the different stores, and for prices, and then estimate what would happen if the firms merged and shut down some of the stores or raised their prices. So this is the standard approach for doing merger review. It is also used for estimating the value of new products and for doing welfare analysis. We can think about consumer welfare because, once I know these preferences, I can see how much worse off people are when a firm raises the price: some people won't buy at all, and other people will just get lower utility when they do buy, and I can aggregate that all up if I know these things.

Now, this is from a different dataset: I have a time series of prices and quantities for specialty sugar in India. What we see in the data is a positive correlation between price and quantity. So if I just naively took the data, with that positive correlation, and didn't think about causality, I would conclude that I should raise my price, because then I'll sell more sugar. Of course that is an idiotic conclusion; we know it's not true. That's why you have to be very careful with modeling this, and I should say there are many, many products and assets that have this feature. In fact, I think it was around 1915, the first time economists wrote a paper trying to estimate the slopes of demand curves, that they found a lot of them sloped up: higher prices, higher quantities. That is basically when economists started thinking about this omitted-variables problem, back in the nineteen-teens. And here is another product, where price and quantity are negatively correlated: these are filled cream biscuits.
But again, you can see in both the price series and the quantity series that there are big time trends, and we probably don't think those aggregate time trends are really about consumers' preferences over price. They could be related to holidays, which are the green bars, and other things that are going on. There are many ways to try to get around this. If you only have aggregate data it is much harder, but when you have individual-level data and high-frequency price changes, one thing you can do is look very close in time to a price change: look just before the price change and just after, and argue that the consumer's arrival at the store is as good as random within that very narrow time interval, maybe once you have also controlled for some consumer characteristics and for the day of the week. In one of my datasets, the supermarket, we look only at Tuesday and Wednesday data. It turns out Tuesday and Wednesday have very similar shopping baskets overall; they're very similar days of the week, which is probably why the store decided to make Tuesday night its price-change time. So we control for what's happening at the week level, isolate the changes in purchases from Tuesday to Wednesday right around the price changes, and set up the model to force it to learn the price coefficients from those high-frequency, short-interval price changes.

I've given you arguments for why that should work, and I've supported it with a bunch of facts, but you might also like some sort of statistical test to give you more confidence that whatever I've done is actually going to give me causal effects. One tactic we use in economics is called a placebo test. Placebo tests don't test against every possible thing that could go wrong; they focus on one class of things that could go wrong. One thing you could be worried about is that prices and quantities are generally moving together, so that the price series I have might give me a certain price coefficient just because general time trends are driving everything; if I had a slightly different price trend, shifted to the left or the right, I would still artificially find that prices matter, even when they shouldn't, because I've got the wrong time trend. So the placebo test is: I'm going to try to estimate the causal effect of a sugar pill, and my sugar pills are price changes that didn't actually happen. The way I do that is to plug a fake price series into my estimation routine and then see whether I falsely discover causal effects of the fake prices. That's the placebo test. If you remember your statistics, when there is no effect the distribution of p-values should be uniform. What we plot here are distributions of p-values. The red line is from our real price series: we see very low p-values, which means you can reject the hypothesis that there is no price effect, and we get strong negative price effects. But for the sugar-pill price series, looking across lots of different categories in the supermarket, the p-values are roughly uniformly distributed. I don't have the picture here, but if you look at some things like stone fruit or candy or cranberries, they will flunk this test, and that is because they are seasonal: if you shift the price series a little bit, it will look like you're picking up a price effect when really you're just picking up Thanksgiving or Halloween or peach season. So that's one of the simplest approaches, but it is effective in this high-frequency data.
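Here is a toy version of that placebo check in Python (simulated data and my own simplification; the real analysis controls for week-by-item demand and uses the full model rather than a simple regression): fit the price coefficient category by category with the real price series and with a shifted "sugar pill" series, and compare the p-value distributions.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
n_days, n_categories = 600, 200

# Simulated data: each category has its own price series and a true negative price effect.
prices = 2.0 + 0.5 * (rng.random((n_categories, n_days)) < 0.3)
quantities = 50.0 - 6.0 * prices + rng.normal(0.0, 4.0, (n_categories, n_days))

def pvalue_distribution(price_matrix):
    """One p-value for the price coefficient per category."""
    return np.array([linregress(price_matrix[c], quantities[c]).pvalue
                     for c in range(n_categories)])

real_p = pvalue_distribution(prices)
placebo_p = pvalue_distribution(np.roll(prices, 30, axis=1))   # shifted "sugar pill" price series

# With a real effect, p-values pile up near zero; with no effect they should be roughly uniform.
print("share of p < 0.05, real prices:   ", (real_p < 0.05).mean())
print("share of p < 0.05, placebo prices:", (placebo_p < 0.05).mean())
```

A category passes when the placebo p-values look roughly uniform; seasonal categories such as cranberries or stone fruit fail, because a shifted price series still lines up with the seasonal demand swings.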
Now I'll show you the model that we're going to take to the supermarket data. Earlier I showed you an example where I threw away the product hierarchy in the store; now I'm going to use the product hierarchy and look at parallel purchases across a set of categories. You go to the supermarket and you think about which toilet paper to buy, and whether to buy toilet paper at all; which apples to buy, and whether to buy apples at all; but you think about apples separately from toilet paper, and I'm going to assume that they are not substitutes or complements. So I'm putting aside the substitute-and-complement, taco shell and taco seasoning case, and just looking at the choices across categories in parallel.

This is based on Dan McFadden's early work in the 70s, something called a nested logit model, which was designed to solve a particular problem. Going back for a second to the logit formula: when you do counterfactuals with the basic logit, if you take a product away, the counterfactual prediction is that its purchases get redistributed to the other products in proportion to their market shares. So if you have three options, say a train, a red bus, and a blue bus, and you take the red bus away, you redistribute its riders equally to the train and the blue bus. This is called the red bus/blue bus problem, because we think people probably don't care which color of bus they ride, and it is an unreasonable prediction to say you're going to redistribute that way. Mathematically, the thing that makes that prediction happen is that the errors are uncorrelated across products. In the bus example, buses form a subcategory, and you are more likely to substitute among buses of different colors than between buses and trains. The common place this comes up in grocery shopping is that sometimes you just want apples and sometimes you don't want apples at all, so your shocks for the different kinds of apples are actually correlated, and if I took away one kind of apple you would be much more likely to buy another type of apple than to stop buying apples altogether.

This model allows you to capture that in a full utility-maximization framework. Basically, the way the framework works is that your utility from not buying in a category is some utility plus an epsilon; that's the "no" box. Then in the "yes" box there is a baseline utility of buying something in the category, as well as an error term specific to buying in that category: when that error is high, it says "I want apples today, I want some kind of apple," and that makes me more likely to buy all kinds of apples. And then there is this other object called the inclusive value, which is basically the log of the sum of e to the utilities of the different items. What that does is say that when one item's utility goes up, it increases your chance of buying in the category through this functional form. All of these functional forms just come out of utility maximization and the Gumbel distribution, and they tell you exactly how increasing the price of one apple feeds into the probability of buying apples as a group.
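In the usual nested-logit notation (my notation, matching the verbal description; lambda is the standard nesting parameter, which the talk does not name explicitly), the inclusive value is the log-sum-exp of the item utilities in the category, and it is the channel through which one item's price affects the probability of buying in the category at all:

```latex
% Outside option (buy nothing in category c):  u_{i0} = \mu_{i0} + \varepsilon_{i0}.
% Item j in category c:  u_{ij} = \mu_{ij} + \varepsilon_{ij}, with shocks correlated within the category.

% Inclusive value of the category (log of the sum of e to the item mean utilities):
IV_{ic} = \log \sum_{j \in c} \exp(\mu_{ij})

% Probability of buying anything in category c
% (\lambda_c is the standard nesting parameter, not named explicitly in the talk):
\Pr(\text{buy in } c) =
  \frac{\exp\!\left(\mu_{ic}^{\text{cat}} + \lambda_c \, IV_{ic}\right)}
       {\exp(\mu_{i0}) + \exp\!\left(\mu_{ic}^{\text{cat}} + \lambda_c \, IV_{ic}\right)}

% Item choice conditional on buying in the category (plain logit within the nest):
\Pr(j \mid \text{buy in } c) = \frac{\exp(\mu_{ij})}{\sum_{k \in c} \exp(\mu_{ik})}
```

Raising one apple's price lowers its mean utility, which lowers the inclusive value and hence the probability of buying any apples, instead of redistributing demand proportionally across unrelated options as the basic logit would.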
So we're going to use this model to estimate shopping in the supermarket, again in a dataset where people buy items across lots of different categories. We restrict attention to categories where people usually buy only one item in the category: think of things like bottled water, where you buy one unit of bottled water, or wine, where you buy one unit of wine; we ignore multiple purchases. Then we estimate the parameters of the model, and the thing we do differently from the classic approach, because nobody could compute it before, is that we estimate latent preferences: each of these choices has a vector of latent preferences as well as a vector of latent characteristics for the choice. To compute that we use variational Bayes. The standard in economics and marketing would be Markov chain Monte Carlo, but people would usually spend their whole thesis on one application. My husband had a student who wrote an entire thesis on demand for toilet paper: there were five brands of toilet paper and one latent dimension of preferences, and that was about what you could do, and then you would let the model run for a couple of weeks to see if it converged. So welcome to the new world where we can do much better. We use variational inference and stochastic gradient descent to compute this. Now, a bunch of challenges come up in doing this because of the nonlinearities: these probabilities have latent variables in the denominator, and the fact that we have time-varying prices makes it much more computationally intensive. This is one reason that firms are not doing these models today: a lot of the time, a firm like Amazon or Netflix will just look at purchases for the year, but not at the prices at which they were charged, unless they are doing a more focused micro study. It is just computationally very hard to take account of all the characteristics of the environment at the time you made your choices, but you get a much smarter answer.
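As a rough stand-in for that estimation step, here is a minimal maximum-likelihood sketch (plain stochastic gradient ascent on a multinomial-logit likelihood with simulated data, not the variational Bayes machinery from the paper) that recovers heterogeneous per-user price sensitivities.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, n_trips = 50, 10, 2000

# Simulated trips: on each trip a user picks one item; utility = item effect - beta_user * price.
true_alpha = rng.normal(0.0, 1.0, n_items)
true_beta = np.exp(rng.normal(0.0, 0.5, n_users))      # heterogeneous price sensitivities
users = rng.integers(0, n_users, n_trips)
prices = rng.uniform(1.0, 3.0, (n_trips, n_items))

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

choices = np.array([rng.choice(n_items, p=softmax(true_alpha - true_beta[u] * prices[t]))
                    for t, u in enumerate(users)])

# Stochastic gradient ascent on the multinomial-logit log-likelihood.
alpha, beta, lr = np.zeros(n_items), np.ones(n_users), 0.05
for epoch in range(100):
    for t in rng.permutation(n_trips):
        u, y, p = users[t], choices[t], prices[t]
        probs = softmax(alpha - beta[u] * p)
        grad_alpha = -probs
        grad_alpha[y] += 1.0                            # d logL / d alpha_j = 1{j = y} - prob_j
        grad_beta_u = probs @ p - p[y]                  # d logL / d beta_u = E_probs[price] - price_chosen
        alpha += lr * grad_alpha
        beta[u] += lr * grad_beta_u

print("correlation between true and estimated price sensitivities:",
      round(float(np.corrcoef(true_beta, beta)[0, 1]), 2))
```

The real model additionally has latent item and user factors and priors over them, which is where the variational inference and the computational difficulty come in.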
Another issue that comes up is how you validate this. I mentioned that our ultimate goal is to do counterfactuals, and so one of the things we do is explore different tuning procedures: for example, we tune only on cases where price changes occurred, to try to make sure the model is focusing on its ability to estimate the impact of a price change rather than just predicting purchase probabilities in a static environment. We explore different tuning options in the paper, and we find that it matters.

The next thing to worry about is how we actually validate that we've gotten the causal inference right. I showed you the placebo test before, and that just shows we didn't falsely find an effect of a placebo; but now I want to go further and show that I have actually identified the consumers who are the most price sensitive. In a dataset where we have lots of price changes, we can basically hold out a lot of those natural experiments, and the model can predict the outcome of an experiment, and we can see how well we do in predicting it. The simple thing we do here is, for every consumer and every item, put the consumers into buckets: the most price-sensitive, the middle, and the least price-sensitive. The model predicts what type of user you are for each item. Then in the test data, with the users classified, I take held-out user trips that I did not put in the training data, and every time there was a price change from Tuesday to Wednesday, I have an observation for each type of user: what was the price change, and what was the quantity change for each group? I can just take simple averages of that in the data, because I
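A minimal version of that held-out check (hypothetical column names and toy numbers in pandas, not the authors' code): bucket users by model-predicted price sensitivity, then for every Tuesday-to-Wednesday price change in the test data compare average quantity changes across buckets.

```python
import pandas as pd

# Hypothetical held-out records: one row per (user, item, week) around a Tuesday-to-Wednesday
# price change, with the model's predicted sensitivity bucket attached (toy numbers).
test = pd.DataFrame({
    "sensitivity_bucket": ["high", "high", "mid", "mid", "low", "low"],
    "price_tue":          [2.0,    2.0,    2.0,   2.5,   2.0,   2.5],
    "price_wed":          [2.5,    2.5,    2.5,   2.0,   2.5,   2.0],
    "qty_tue":            [10,     9,      8,     7,     6,     5],
    "qty_wed":            [6,      5,      6,     8,     6,     5],
})

changes = test[test["price_wed"] != test["price_tue"]].copy()
changes["dprice"] = changes["price_wed"] - changes["price_tue"]
changes["dqty"] = changes["qty_wed"] - changes["qty_tue"]

# Through-the-origin slope of quantity change on price change, per predicted bucket:
# the "high" bucket should come out most negative, the "low" bucket close to flat.
slopes = changes.groupby("sensitivity_bucket").apply(
    lambda g: (g["dqty"] * g["dprice"]).sum() / (g["dprice"] ** 2).sum()
)
print(slopes)
```

Plotting average quantity against price for each bucket gives the demand curves described next: steep for the predicted-sensitive group, flat for the predicted-insensitive group.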
have all these experiments: I have the before-and-after, and I have three different types of users doing their shopping on those days. So what I show here are just the averages, the differences in means from Tuesday to Wednesday, averaged over the different items in the different price-change experiments, grouped by the different types of users. And what you see, if you've ever taken an economics course, are downward-sloping demand curves: the users we said should be the most price-sensitive have a very steep demand curve, and the ones we said should not be price-sensitive are flat. So we have basically correctly identified who is responsive to prices in the data. That is how we can assess whether we've done a good job with counterfactual inference.

Now I'm going to look at the targeting example. In principle, once I have the model I could do a lot with targeting: I could try to send individual coupons to every individual consumer, and I could compute counterfactually what would happen if I set perfectly personalized prices in the supermarket. But for that exercise I would have a hard time knowing whether I got the right answer, because I've never seen that kind of policy in my test data. So what I'm going to show you here is a more limited type of targeting; it is less profitable, but it is also easier to evaluate, so we can see whether I've actually done a good job of using my model to target consumers. Basically, we look in the data and find, for each item, the two most common prices, and we also see what share of the trips experienced the high price versus the low price. In this picture, the high and the low prices were charged at different times, and the consumers were also showing up at different times; I color a consumer red if they showed up on a high-price day and blue if they showed up on a low-price day. About two-thirds of consumers saw the high price and one-third saw the low price. Then, when I use the model, it estimates for each user how much they care about prices, so I can say: if I were going to optimally assign each consumer to either the low or the high price, I would do it differently than what was in the data. I would take my price-sensitive consumers and give them the low price, and the ones that are not price-sensitive would get the high price. Those are the optimal assignments. So now I've outlined these squares in orange and green: the orange ones are the consumers I would like to assign to the high price, based on who they are, and the green ones are the ones I would like to assign to the low price. So there are four groups, orange-red, orange-blue, green-blue, and green-red, and we can use those four groups to assess the actual effect in the test data of raising the price for each group. Within the group I would like to assign to the high price, I observe that group on both low- and high-price days, so I can estimate, for that group, the average effect in the test data of going from a low to a high price. Now, there are some challenges because the days are not necessarily randomly assigned, and there is a variety of techniques to adjust for the fact that people were there on different days. It turns out that in my data those adjustments don't make a very big difference, but it is important to check: if it turned out that the low prices were on holidays, that would mess up this analysis. If the prices were randomly assigned, then a simple difference in averages, the average outcomes for the group I want to assign to the high price between the red and blue days, tells me the benefit of raising the price on that group. And if I find a bigger benefit to raising the price on the high group than on the low group, I say: aha, I did a good job; I actually discovered people for whom I would like to target prices, on the basis of the test data. So I really have a data-driven way to see whether I did a good job of assigning the groups.

What we find, doing this for prices that are at least 10% apart, is that if we had targeted pricing, our model tells us we would have had about an 8% revenue increase over the prices observed in the data, from reallocating consumers to prices optimally. When we look at the test data, in general you would expect the model to overstate this and the test-set estimate to be lower, and if I did this a whole bunch of times in a bunch of different ways I'm sure that's what you would find. It just so happens that in this particular dataset and this particular model we found bigger effects in the test data than in the model, although I wouldn't expect that to be the general case. But again, this is a test-set verification.
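One simple way to write the test-set estimate of the gain from targeting (my notation; it assumes high- and low-price days are as good as randomly assigned within a group, after the day-level adjustments mentioned above):

```latex
% g indexes the model's assignment groups: H (assign the high price) and L (assign the low price).
% \bar{R}_g(p): average per-trip revenue in the test data for group g on days when price p was charged.
% w_g: group g's share of trips;  R^{obs}: average revenue under the prices actually charged.
\widehat{\Delta}
  = \sum_{g \in \{H,\,L\}} w_g \, \bar{R}_g\!\left(p^{*}_g\right) \; - \; R^{\mathrm{obs}},
\qquad p^{*}_H = p^{\mathrm{high}}, \quad p^{*}_L = p^{\mathrm{low}}
```

Finding a bigger revenue gain from the high price in the group assigned to the high price than in the group assigned to the low price is what the talk treats as evidence that the right consumers were targeted.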
All right, so now we have all of the pieces; we know how to estimate the value of data for targeting. That is what these numbers are: the increase in profit you can get from using data to target prices. This is one particular notion of the value of data; of course, if I had a different exercise, I would have a different profit measure of the value of data. Now we can think about how that value changes with different characteristics of the data. I'm actually running a little long on time, so I'll go quickly. You all read the newspaper; you know that people are worried a lot about data retention. You also know there is a debate about whether data is important for competitiveness and whether big data is or is not a barrier to entry; this is a highly controversial issue right now. When we think about the value of data, we think about the number of observations and what the value curves look like. Of course, the curves could look a variety of different ways: they could be steep, they could be flat, they could be S-shaped, and so on. One very prominent antitrust advocate for one of the major tech firms spent about ten years going around telling everybody that these curves should always flatten out at a square-root rate, because we know that statistical accuracy improves at a square-root rate. But that leaves out the fact that, of course, you're going to make the model richer as your dataset grows, and that can make the curve look much more linear.

So we have a full dataset, on the order of two hundred product categories, six thousand users, and six hundred days, and a smaller base sample of categories, users, and days for the consumers I'm trying to target. I then increase my training set by adding users, adding days, or adding categories, and see how well I do on my base test set. And then at the end I see how much better I do by estimating a richer model, versus just adding more data with the same test set and the same model. These results are preliminary; this is just a single run. If I had more compute I would have a lot more runs, but you know how it is right before these conferences: the cluster is clogged up. So I have a small number of runs and this work is not complete, but this is one run of the results. What we see, as we double the amount of training data, is that if we increase the number of days per user, the goodness of fit goes up very steeply: if I'm trying to personalize prices and I retain your data for another year, I'm going to do much better at that. Now, there have been some papers arguing, with a kind of wisdom-of-the-crowd, collaborative-filtering logic, that maybe there is no value to your individual data, because if I just see lots of people like you I have already learned everything there is to know about you. In this particular dataset we're not seeing a lot of evidence of that: adding more users is not actually improving the goodness of fit on the original set of users very much, and adding more categories actually even seems to make things a little bit worse.

An important thing to remember is that so far I've been holding the model fixed. Here is what happens when you retune the model, and that kind of changes the scale: retuning to a richer model does a lot more for predictive quality than just adding more data to an existing model, which is the flatness of those earlier curves. We do some examples to show that, of course, if you have a richer model, the value of data can go up faster. The final picture is the value of targeting. Here I just have a few points; I didn't plot the whole set. But what we see again, if my metric is the percentage increase in profit from using data to target coupons, is that having more days is important, and retuning the model to take advantage of the extra days is a bigger deal than the component that comes from just adding more data to a fixed model. The full effect is going from about 4% to 8%, and most of that is coming from retuning with more data.
And that suggests that if a tech firm were trying to evaluate its data and just said, "I'll look locally and see what happens when I change the data," it could really misestimate the value of data, because it hasn't taken the retuning into account. Retuning your model can be a big engineering effort, and so you might have a hard time really knowing how much of a gain you will get.

So, in conclusion, I hope the takeaways are these. Counterfactual inference differs from prediction: what we tried to do is learn the parameters of utilities through revealed preference and then predict responses in an alternative situation. We went out and found datasets that were large enough and had enough variation, enough price changes, in particular enough high-frequency price changes, that we could distinguish causation from correlation. We also initially selected counterfactuals that could be validated in the test set, so that we could really know we had good causal estimates; once you're confident of that, you might try fancier counterfactuals with a little bit more confidence. And then, when you come to the value of data: not all data is created equal. The information spillover from other consumers seems relatively small in this case, compared with retaining your own data for longer, and the benefits of estimating a richer model are large relative to the value of more data for a fixed model. This idea that the value of data falls off at a square-root rate can be wildly off, at least in this type of commerce data. Of course, the answers will vary with the application; this is just shopping data, and in other types of datasets the curves and the value of different kinds of data could be very different, so ultimately it's an empirical question.
OK, I think I'm out of time, so I will stop there. Thanks very much.

Susan, when I shop, I like to look at the price history of an item using my cell phone. What would be the impact on some of the results you shared if consumers have the ability to check the price and know whether it's a high-price day or a low-price day?

People were really convinced that all the firms were going to go out and try to rip consumers off like crazy with personalized price discrimination. That turns out to be a bad idea a lot of the time, because consumers will find all sorts of ways to manipulate it. Targeted coupons are one approach that does work: if you mail people coupons, it's a little bit costly for them to keep track of them and so on. But would I recommend that an e-commerce firm that doesn't have a good handle on identity do personalized pricing? That would probably be a terrible idea, and you don't see a lot of firms doing it, because consumers would then try to manipulate their identity to get better prices.
The way you described the shopping carts and how you do the analysis, you seem to assume some consistency in the data for certain items across the population. For example, if you take the data from Albertsons in Seattle, will it behave similarly to data from Acme in New York City? How consistent is the data across different populations for the same items?

I have data from India where I have lots of different stores, and in the stores in India I do allow the preferences for the same item to differ across stores, so the model can learn how different those preferences are. Across regions of India they could certainly be wildly different; even the holidays are different. It's actually a lot harder to work with the Indian data than the US data, I discovered. So I think it's an interesting empirical question. We do assume stable preferences over time; again, that can be relaxed, but the base models assume that for now.
Thanks for a great talk. What do you think of the hidden confounders in your analysis? I mean, you have enough data, but there are so many things, self-selection for example, or seasonality, all the things that come up when you try to target at the individual level.

The price changes are Tuesday night, and we control for demand at the week-by-item level, so all of the estimation is coming from the variation in quantity from Tuesday to Wednesday within a week. Anything that shifted demand for the week as a whole we control for directly, and we can do that because we have lots of consumers coming to the same store and we have individual-level data. In the Indian stores the price changes are harder, so I'm not completely convinced that we've gotten all the confounding removed yet in the Indian data, but it's something you basically have to be quite careful with. When economists do this, they often study a single change, and they will spend pages and pages, and hours and hours, learning every single thing that happened in the institutions. It is somewhat dangerous when you go out and look at thousands of changes and you haven't studied every product in every week, so it's important to have empirical strategies as well as validation to ensure that you haven't overlooked something when you take this to a larger scale. But generally the worry is time trends, and we throw out the weeks around holidays, and we also throw out categories that have a lot of seasonality; we don't even try to do cranberries, they're just impossible.

So next we have the panel on advancing AI by playing games; could all the panelists come set up? I'll get my laptop out of the way.
Meanwhile, I'll make a few announcements. Tonight we have the game night, speaking of games, which you can find in the guidebook app. Also, tomorrow morning at 8:50 a.m., Stuart Russell will tell us how not to destroy the world; that seems important enough to wake up for. And if you're waking up early anyway, you can also come at 8:30, highly recommended, when we have a special surprise; in the guidebook it is listed as a fireside chat, which is not exactly accurate, but I encourage people to show up.

Three of the five panelists will show slides during their remarks at the beginning and then close the laptop, and we'll just have a discussion. I just took out all of my bullet points to make it feel less like a presentation; I'll try to find it. I think we will begin. Is everybody ready?