About the talk
Recent and expected developments in AI and their implications. Some are enormously positive, while others, such as the development of autonomous weapons and the replacement of humans in economic roles, may be negative. Beyond these, one must expect that AI capabilities will eventually exceed those of humans across a range of real-world decision-making scenarios. Should this be a cause for concern, as Elon Musk, Stephen Hawking, and others have suggested? And, if so, what can we do about it? While some in the mainstream AI community dismiss the issue, I will argue that the problem is real and that the technical aspects of it are solvable if we replace current definitions of AI with a version based on provable benefit to humans.
00:00 New tendency
04:20 AI better than humans
07:05 Standard model for AI
12:13 New model – provably beneficial AI
15:28 Image classification
20:10 Basic assistance game
23:50 The off-switch problem
28:40 Ongoing research
34:22 Many humans
39:18 Altruism, indifference, sadism
43:36 Pride, rivalry, envy
47:35 Summary
52:12 Q&A
About speaker
Stuart Russell works on fundamental problems in AI including theoretical foundations of rationality, the interaction of knowledge and learning, the unification of logic and probability, planning under uncertainty, and decision making over long time scales. In collaboration with UCSF, he applies AI to intensive care medicine. With the United Nations, he has developed a new global seismic monitoring system for the Comprehensive Nuclear-Test-Ban Treaty. His current concerns include the threat of autonomous weapons and the long-term future of artificial intelligence and its relation to humanity. He is a fellow of AAAI, ACM, and AAAS; winner of the Computers and Thought Award, Outstanding Educator Awards from both ACM and AAAI, and the World Technology Award. From 2012 to 2014 he held the Chaire Blaise Pascal in Paris. His book "Artificial Intelligence: A Modern Approach" (with Peter Norvig) is the standard text in AI; it has been translated into 14 languages and is used in over 1300 universities in 118 countries.
Transcript
There was consternation, almost a kind of revolution in the field, as people actually began to ask whether I could be right or wrong. And it occurred to me then that if you ask an AI researcher, "What if you succeed?", nobody in the field had really thought very much about that question, because we were all just dead set on making our computers a little bit less stupid than they were, and clearly that was a good thing. But now things have changed, and we see headlines like this one from a Baltimore newspaper.
And the question is: why? This is not a new speculation; the idea that we would make machines we wouldn't be able to control goes back thousands of years. In case you're worried: I didn't mean to worry you yet, just because of the scary robot pictures and so on. So I want to show you some actual robots, or rather simulated robots. This is the state of the art today. It is actually a very impressive demonstration from OpenAI of two simulated robots that learn to play soccer: one learns to be the goalkeeper, the other learns to score goals. And this was just an hour or two of reinforcement learning, starting from a point where the robots were less physically capable than a newborn human baby: they had absolutely no motor control capabilities, so they had to learn to move, to sit up, to stand up, to walk, to find the ball, to kick the ball, and so on. I mean, if a human baby were playing soccer in the garden two hours after being born, you'd be pretty impressed. So this is a very impressive ab initio
demonstration of reinforcement learning capabilities. But not all is as it seems; just like in the magic presentation before, something else is going on. The blue guy is trying to score goals and the red guy is trying to stop him. We kept the blue program exactly the same, we did not change it at all, and we just changed the red program: could we make a red program that would do a better job of preventing the blue guy from scoring? Let me show you what that program looks like. It consists of falling over and then waggling its legs in the air, like this. Now look what happens to the blue guy: he is completely incapable of scoring a goal; in fact, most of the time he can't even kick the ball. As an English person who grew up watching England lose on penalties year after year after year, this really gets to me. Current machine learning systems are fooling us a lot of the time. We think they are good at playing soccer; we think they are good at recognizing
objects; we think they are good at machine translation; but really they are fooling us. I think we have a long, long way to go, and we need to look very carefully at what systems are actually doing, not at what they appear to be doing or what we would like them to be doing; at the moment we are not really looking carefully at what is going on. Fortunately, not everything in AI is just deep learning, and I think there is a movement now to bring back some of the older, and I would say really essential, ideas. Here is just one thread: probabilistic programming systems. This is the growth in probabilistic-programming papers over the last decade or so, so you can see that there is stuff happening in AI other than deep learning; it is very exciting, and it is accelerating very fast. So, not in the immediate future, but I think inevitably, we will have a time when AI systems are better at decision making in the real world. Not just on the Go board or the chessboard or in the StarCraft game, but in the real world, they will be able to make
decisions better than we can. So the question is: what happens then? That is what I mean by success, and what happens when we succeed? I would say we haven't put enough thought into this. Some people feel, well, it's thirty years away, it's fifty years away, let's not worry too much about it. But suppose we got a message from outer space. Suppose the SETI project actually succeeded and claimed to have received and decoded a message from outer space, and it looked like this: from a superior alien civilization, to humanity@UN.org, "We will arrive in 30 to 50 years." Are we just going to send them an out-of-office reply? "Humanity is currently out of the office. We will respond to your message when we return. Obviously this is not important; we don't need to worry about it, because there are 50 years until you actually arrive, so it's really no big deal." Maybe we could even add a smiley face to the end of the message. I don't think that would be the response of the human race if we got a message from
outer space saying that a superior alien civilization will arrive in 30 to 50 years, and I don't think it should be the response of the human race to the prospect of superintelligent AI either. Now, I actually think it will take longer than 30 to 50 years, but the vast majority of AI researchers believe that superintelligent machines will exist within that sort of timescale. So this is not wild speculation, and it is not just Elon Musk and those people who supposedly don't know anything about AI (even though he does); this is all of you. You all believe this. If you really believed that superintelligent AI was impossible, that would be like cancer researchers saying, "We are never going to cure cancer; we are just soaking up federal funds; keep giving us money, but we are never going to cure anybody." I'm sorry, you don't believe that, I don't believe that, and yet we are not paying much attention to the obvious logical consequences. So let me try to figure out what we are going to do about it. The natural reaction to the prospect that we are actually going to have super
intelligent AI is: well, then I guess we lose. Alan Turing was completely matter-of-fact about this: we should have to expect the machines to take control. That is what Turing said, and he provided absolutely no remedy. But I think we cannot be as fatalistic as that. We have to understand: how would we lose control, and why, and could we do something differently? I think the reason we would lose control is that we have the wrong model of how to build AI systems altogether. So let me remind you what the
standard model for AI is. The standard model is: the human specifies the objective, for example a discounted sum of rewards, and the machine says, OK, I'm on it. Now, we know this model doesn't work, because we cannot specify the objective correctly. You minimize risk in operations research, you maximize the discounted sum of rewards in reinforcement learning, you maximize a welfare function in economics, or whatever it is; all of these are the same basic standard model, and it is a bad model. King Midas could tell you that, because he said: OK, here is my objective, I want everything I touch to turn to gold. And the machine (in fact it was the gods in his case) said: righty-ho, there you go. And then his food and his drink and his family turn to gold, and he dies in misery and starvation. So it is a bad engineering model, because the smarter the AI system, the worse the outcome.
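In symbols, the standard model being criticized here looks something like the usual reinforcement-learning objective (standard notation, not a formula from the slides): the human hands the machine a fixed reward function R, and the machine simply optimizes it as given.

```latex
% Standard model (sketch): a fixed, human-specified reward R is optimized as given.
\[
  \pi^{*} \;=\; \arg\max_{\pi}\;
  \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t)\right]
\]
% The failure mode described above: if R is mis-specified (King Midas),
% a more capable optimizer of R produces a worse outcome for the human.
```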
Up to now, or at least until recently, we were protected from the badness of this engineering model because AI systems were stupid and they were stuck in the lab, where they couldn't do any harm anyway. But now they are out in the real world. For example, we have simple learning algorithms in social media that were set up to optimize something like click-through or engagement, whatever you want to call it. You might think: great, they learn what people want. But actually they have learned to modify people, to make people more predictable and therefore easier to monetize; not, I think, deliberately, but just as a consequence of putting systems out there with the wrong objective. They are out of the lab; they are still stupid, but they are out of the lab and they are affecting billions of people on a daily basis. And these are really silly algorithms: they don't even know that human beings exist, or that we have minds or brains or opinions or any of that, and yet they are still having a serious negative effect. So I think the way we got into this mess, and I have to confess I am partially responsible for codifying this way of thinking, is that back in the
forties and fifties, when we were formulating what AI was supposed to be, we said: OK, we want machines to be intelligent. What does intelligent mean? It means whatever it means for humans, and humans are intelligent to the extent that our actions can be expected to achieve our objectives. This is rational behavior; the concept came from philosophy and from economics, and we basically shifted it straight into AI. When we started building goal-based planning systems, in the early papers by Newell and Simon and by McCarthy, they all assumed that we were going to plug in a goal and the system was going to find ways to achieve it. So: machines are intelligent to the extent that their actions can be expected to achieve their objectives. That makes perfect sense. Unfortunately, it is the wrong model, as I have already explained. What we want is machines that are beneficial to us, whose actions achieve our objectives; not objectives that are plugged into them so that they operate as if they always had those objectives, but the objectives that we really have. Only in that way will the machines
actually be beneficial to us. So this is a more complicated problem statement because now they're not in the machines correctly operationalized. There's a gap right the machine to go to decide how to behave when the objectives are in the humans. That makes it more complicated. So NSU model rights are the robots of satisfying human preferences robot does not know what those preferences are. Write the method of finding out the meth, which is necessary. If the machine is going to become useful to us is a potentially human behavior behavior is the
principal source of evidence that machines might have about a preferences. They may be other ways of getting it like it's a machine could do fmri, or it's in your brain to go to measure your preference is directly somehow that will be a difference between but by and large, it's our behavior that includes everything. We have a written down all communicative behaviour is evidence of our preferences everything that you are doing right now. It is evidence of your preferences, right? You are all sitting there stunned by the magic trick and you haven't quite got up
the energy to go and get coffee. When you formulate this mathematically, we call it an assistance game, and it gives you a mathematical problem whose solution is going to be beneficial to humans. The key point is that with this problem formulation, the smarter the AI, the better the outcome, rather than the worse the outcome, and I think that is probably a good sign that this is the right way to think about AI in the long run. One way of thinking about where things went wrong is to look at the field as a kind of graphical model. There are humans; we have objectives and we have our behavior. In the old model, as described in the first three editions of the textbook, the human objective is assumed to be observed. In the graphical model, once the human-objective variable is observed, the human-behavior variable becomes irrelevant to the machine's behavior; that is straightforward d-separation, and the objective is a sufficient statistic for how the machine should act. You could be jumping up and down saying, stop, you are going to destroy the world, and the machine says: so what, this is the objective I was given, so whatever I am doing is right by definition. In the new model the objective is not observed, the separation does not hold, and in fact the human behavior and the machine behavior are coupled together. We really do need to think about this in game-theoretic terms: the machine is solving its half of the game, hopefully we are solving our half of the game, and those two solutions together produce good results for us.
Let me run through a couple of examples of how this changes the way we think about some very classical problems. What could be more straightforward than image classification? How many people in the room have ever built an image classifier, or any machine learning system? Put your hand up. Typically what you do is train the network, or whatever it is, to fit the data as well as possible, to reduce the number of errors on the training set. And usually what we do, and what we certainly do in all of the machine learning competitions, is assume a uniform loss matrix, which says that classifying any object as any other kind of object is equally bad: an A classified as a B is equally bad regardless of what A and B actually are. And then you accidentally classify a human as a gorilla with your Google system, and you spend millions of dollars fixing a massive global public relations disaster, because in fact classifying a human as a gorilla is much more costly than classifying a Norwich terrier as a Norfolk terrier; as far as I know, the terriers don't mind very much about that particular misclassification. So you say: OK, fine, I made a mistake, let me write down the loss matrix. And then you realize that the loss matrix has 400 million entries, and no one has ever written it down, and no one knows what should be in it. So you had better have machine learning algorithms that know that they don't know what the loss function is, and behave accordingly. That means, for example, that they refuse to classify some examples because there is too much uncertainty about the loss, or that they go back to the human expert to get more information, to fill out certain parts of the matrix, before they can proceed. We don't even know yet how to write down a prior probability distribution over that 400-million-entry loss matrix, so there is a lot of research to do to make this kind of system work the way that it should.
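As a toy illustration of that point (my own sketch, with made-up classes, costs, and a hypothetical defer_cost parameter, none of it from the talk): a classifier that only has a prior over the loss matrix can compare the expected loss of each prediction against the cost of deferring to a human, and abstain when classifying is too risky.

```python
import numpy as np

# Toy sketch: decision-making with an *uncertain* loss matrix.
rng = np.random.default_rng(0)
classes = ["person", "gorilla", "norfolk_terrier", "norwich_terrier"]

# The classifier's posterior over the true class of one image.
p_true = np.array([0.55, 0.30, 0.10, 0.05])

# We don't know the loss matrix L[true, predicted]; we only have a prior
# over it, represented here by samples.  One misclassification is believed
# to be far more costly than the others, with wide uncertainty.
def sample_loss_matrix():
    L = rng.uniform(0.5, 2.0, size=(4, 4))   # "ordinary" mistakes
    L[0, 1] = rng.uniform(10.0, 1000.0)      # person -> gorilla: maybe catastrophic
    np.fill_diagonal(L, 0.0)                 # correct answers cost nothing
    return L

samples = [sample_loss_matrix() for _ in range(10_000)]
expected_L = np.mean(samples, axis=0)        # E[L] under the prior

defer_cost = 5.0                             # assumed cost of asking a human instead

# Expected loss of predicting each class, versus deferring.
pred_losses = p_true @ expected_L            # entry j = E[loss | predict class j]
best_class = int(np.argmin(pred_losses))

if pred_losses[best_class] > defer_cost:
    print("Defer to a human (too risky to classify).")
else:
    print(f"Predict {classes[best_class]} "
          f"(expected loss {pred_losses[best_class]:.2f}).")
```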
Here is another example, a classic: you buy a mobile robot for your lab, and the first thing you do is say, OK, we are going to make it fetch the coffee. OK, robot, fetch the coffee. Now take this seriously: there could be lots of other things that you care about in the world, and if I just tell the robot to fetch the coffee, then it is going to be paralyzed, because there are so many other things I might care about that fetching the coffee might interfere with, and there is nothing it can do. So does this approach, involving uncertainty over objectives, lead to machines that are simply incapable of acting at all? Not quite, because for a while, going back to papers by Jon Doyle and Mike Wellman, there has been this idea of ceteris paribus preferences. What "fetch the coffee" means is: I prefer to have coffee, other things being equal. If you leave everything else the same but you get me coffee, I am happy. That is what "fetch the coffee" means; it is a ceteris paribus preference, and the machine can satisfy it by just fetching the coffee and not turning the oceans into sulfuric acid, or boiling them, or wiping out the human race, or putting us all on heroin drips, or any of those weird things that the sci-fi writers like to talk about. So as a natural solution to this problem you get minimally invasive behavior. There is another important point to add, which I didn't put on the slide: by and large, you should think of the world as already being the way we like it, which is why leaving it unchanged is a good idea. The world is, if you like, a sample from the stationary distribution that results from the actions of billions of rational agents operating on it, and so it always tends to be in a
configuration that we prefer, roughly speaking. So now let me illustrate the idea of the assistance game. In an assistance game you have the human, who has some preferences theta, and the machine, which does not know theta but has a prior probability distribution P(theta). Then you solve the game and you get the equilibrium, and when you look at the solutions of the game, you naturally get teaching behavior. The human has an incentive to teach their preferences to the robot, so that the robot can be more useful; the robot understands the human's teaching behavior, because it is part of the equilibrium; the robot will ask questions, will ask permission, will defer to the human, and will allow the human to switch it off, which is a good thing. For those of you who know about inverse reinforcement learning, it is similar to that, but it really takes it up to a game-theoretic formulation rather than a one-way learning process.
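Stated slightly more formally (a paraphrase in my own notation, not the talk's slides): an assistance game is a two-player game in which both players are scored by the human's reward, but only the human knows the parameter theta that defines it.

```latex
% Assistance game (informal sketch).  H = human, R = robot.
%   theta    : the human's preference parameter, known only to H
%   P(theta) : the robot's prior belief over theta
% Both players are scored by the *human's* reward, so the robot must act
% under uncertainty about theta, updating its belief from H's behavior.
\[
  \max \;\; \mathbb{E}_{\theta \sim P(\theta)}\,
  \mathbb{E}\!\left[\sum_{t} R_H\big(s_t, a^{H}_t, a^{R}_t;\ \theta\big)\right]
\]
% The solution concept is an equilibrium of the two-player game, not an
% optimal policy for a fixed, known objective.
```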
Here is a very simple assistance game, about as simple as we could devise. The world is one-dimensional: it is just paper clips and staples, and all that matters is how many paper clips and how many staples you have. The human reward is theta times the number of paper clips plus (1 minus theta) times the number of staples, so theta is the exchange rate between paper clips and staples. Here we have theta equal to 0.49, so a paper clip is worth 49 cents and a staple is worth 51 cents, if you want to think of it that way. The robot does not know anything about the human's exchange rate between paper clips and staples, so it is going to learn something from what the human does. The way we set up the game, the human gets to make one of three choices: make two paper clips and no staples, make one of each, or make no paper clips and two staples. Then the robot gets to make some paper clips or staples after that: it can choose to make 90 paper clips, 50 of each, or 90 staples. Now, if the human were just acting by themselves, with a 49-cent valuation for paper clips, making two paper clips would be worth 98 cents, making one of each would be worth a dollar, and making two staples would be worth a dollar and two cents, so the human would make the two staples. But the robot gets to go afterwards, and if that is the robot you have to work with, what should the human do? Well, you have to solve the game; this one goes in two rounds of iterated best response, and you can solve it very straightforwardly. The solution is that if theta is between about 0.446 and 0.554, then it is optimal for the human to make one of each, and then the robot will make 50 of each, and that gives you the best possible outcome. So the human is actually performing a teaching behavior here: the choice of one of each encodes information about theta in order to communicate it to the robot, so that the robot knows what it is that the human wants. That is a very simple example of how teaching and the decoding of behavior arise naturally, and if you make the games more complicated, the same basic principles apply.
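Here is a small computational sketch of that example (my own code, using the thresholds quoted above as an assumption rather than deriving the full equilibrium): the robot forms a posterior over theta from the human's choice and best-responds, and the human, anticipating this, chooses the signal that leads to the best joint outcome.

```python
import numpy as np

# Toy paperclip/staple assistance game (a sketch of the example in the talk).
# Human reward: theta * (#paperclips) + (1 - theta) * (#staples).
# Only the human knows theta; the robot starts with a uniform prior on [0, 1].

HUMAN_CHOICES = {"2 paperclips": (2, 0), "one of each": (1, 1), "2 staples": (0, 2)}
ROBOT_CHOICES = {"90 paperclips": (90, 0), "50 of each": (50, 50), "90 staples": (0, 90)}

def reward(theta, clips, staples):
    return theta * clips + (1 - theta) * staples

def robot_posterior_mean(human_choice, n=100_000):
    """Robot's posterior E[theta], assuming the human plays the 'teaching'
    strategy with the thresholds quoted in the talk (0.446 and 0.554)."""
    thetas = np.linspace(0, 1, n)
    if human_choice == "2 paperclips":
        mask = thetas > 0.554
    elif human_choice == "2 staples":
        mask = thetas < 0.446
    else:
        mask = (thetas >= 0.446) & (thetas <= 0.554)
    return thetas[mask].mean()

def robot_best_response(human_choice):
    mean = robot_posterior_mean(human_choice)
    return max(ROBOT_CHOICES, key=lambda r: reward(mean, *ROBOT_CHOICES[r]))

def human_best_choice(theta):
    """The human anticipates how the robot will respond to each signal."""
    def total(h):
        r = robot_best_response(h)
        return reward(theta, *HUMAN_CHOICES[h]) + reward(theta, *ROBOT_CHOICES[r])
    return max(HUMAN_CHOICES, key=total)

theta = 0.49
h = human_best_choice(theta)
print(f"theta = {theta}: human makes {h}, robot responds with {robot_best_response(h)}")
# With theta = 0.49 the human makes one of each and the robot makes 50 of each,
# even though, acting alone, this human would just make two staples.
```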
Let me illustrate the point about the machine allowing you to switch it off. In the old way, with a standard goal-based robot whose goal is to fetch the coffee, if that robot is intelligent enough to realize that it has an off switch, and that if the off switch is pressed it will not be able to fetch the coffee, then the rational strategy for that robot is to disable the off switch, because that eliminates one failure mode in the plan for fetching the coffee. This is just a logical consequence of having a goal to fetch the coffee. It does not require any innate self-preservation; it does not require emotions or any of those things; it is simply a logical consequence of having a goal. So we want to avoid machines that will immediately disable their off switch, because "you can just switch it off" is usually how people think we are going to solve the problem. But you can't just switch it off, any more than you can just play the right moves and beat AlphaGo.
If instead we formulate this with the view that the machine is uncertain about the objective, then in fact the machine will allow us to switch it off. Here is a minimal model of that idea. Basically we have a machine that can either act, or allow us to switch it off; and if we do not switch it off, then it acts. So why would it choose to let us switch it off? Well, let's have a look. Here is the initial choice. The machine can switch itself off; that is the branch to the right, so it can commit suicide, and we will fix the value of that at zero. Or it can go ahead and press the big red button, which maybe starts a nuclear war, or maybe it just makes coffee. So there is uncertainty about the value of the big red button, and I have illustrated that with a little Gaussian distribution; some part of that Gaussian is negative, meaning the outcome is even worse than committing suicide, but the majority of it is positive. If those were the robot's only choices, it would just go ahead, press the big red button, and maybe start a nuclear war. So we give it another choice, which is to wait and let the human decide: the human can switch the machine off, value zero, or say go ahead. Why would the machine do this? The machine could switch itself off; what does it gain by having a human do it? If it is a good idea, it should just do it. But that is an incorrect way of thinking, because the human pressing the off switch and the robot pressing the off switch are different things. When the robot does it, it is because it has nothing better to do in life; when the human does it, it is because the human does not like what the robot was going to do. So the human doing it carries information, from the point of view of the robot, about the human preference function. If the human says go ahead, that means the negative part of that utility distribution is gone, because whatever it is the robot was going to do, the human is not unhappy about it. So now the robot has a choice between committing suicide, or allowing itself to be switched off and then, if it is not switched off, doing the action; and that second branch is positive or zero, with no negative branches. It is a straightforward theorem, essentially the same as the non-negativity of the expected value of information, that the robot has a positive incentive to allow itself to be switched off, as long as there is uncertainty about human preferences. As that uncertainty disappears, the robot's incentive to allow itself to be switched off also disappears. And you can show that, under suitable assumptions, this robot is going to be provably beneficial: it is going to do what the human wants, whatever happens, and that is what we like.
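A quick numerical version of that argument (my own sketch, with a made-up Gaussian belief and an idealized human who allows the action exactly when its true value is non-negative):

```python
import numpy as np

# Off-switch game, numerical sketch.  U is the true (unknown to the robot)
# value of pressing the big red button; the robot's belief about U is Gaussian.
rng = np.random.default_rng(1)
mu, sigma = 1.0, 2.0                      # mostly good, but maybe very bad
U = rng.normal(mu, sigma, size=1_000_000)

ev_switch_self_off = 0.0                  # fixed at zero in the model
ev_act_now = U.mean()                     # just press the button

# Defer: the (idealized) human says "go ahead" when U >= 0, so the robot
# gets U; otherwise the human switches the robot off, value 0.
ev_defer = np.where(U >= 0, U, 0.0).mean()

print(f"act now: {ev_act_now:.3f}   defer to human: {ev_defer:.3f}")
# Deferring discards exactly the negative tail, so ev_defer >= ev_act_now:
# the robot has a positive incentive to let itself be switched off, and that
# incentive shrinks to zero as its uncertainty (sigma) shrinks to zero.
```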
So we are working on some of the research problems that follow from this way of thinking about how AI should be defined. We think about how to solve these assistance games, and that is actually quite a rich area of algorithmic research. We also have to redo all the chapters of the textbook. We were in the unfortunate situation that we had to publish another edition, because the old one was from 2010 and a few things have happened in AI since 2010, so we had to do a new edition. And the new edition says: OK, everything we told you about AI last time was wrong; this is the way we should do AI now; unfortunately, we don't have much new to teach you yet, so we are going to teach you all the old stuff, but just remember that it is all wrong. So it is a difficult time to be doing the book. We have to redo all of the different areas of AI that assume a fixed objective, and unfortunately that is pretty much all of them; they need to be generalized in this way. The one possible exception is perception. I am not sure that we have to redo perception, because as long as perception is not allowed to make decisions it is fine. The problem in the Google Photos case was not that the algorithm thought the person in the image might be a gorilla; the problem was that the algorithm itself was allowed to then make the decision to post that classification on a public web page. If the machine had simply supplied probabilities, and those probabilities had been sent on to a decision-making system that understood the risks, that system could have said: there is such a significant chance that we are misclassifying a human as a gorilla that we should not do it. A perception component that simply provides probabilities to a decision-making system is, I think, actually harmless, and the better you make that perception system, the better the outcome is going to be. So there is a lot of research that has to be done.
The first issue is that if you are going to try to understand human preferences from human behavior, you have to take into account the fact that people are not perfectly rational. This is not the standard inverse reinforcement learning problem, where you assume that the observed policy is the optimal policy for the MDP, because human behavior is not going to be optimal. It is not going to be Boltzmann-rational or any of the other mathematically convenient approximations either; it is going to be the way humans actually make decisions, and we are computationally limited.
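For reference, the kind of "mathematically convenient approximation" being set aside here is something like the Boltzmann-rational choice model (standard notation, not from the slides):

```latex
% Boltzmann-rational ("noisily optimal") choice model, the kind of
% mathematically convenient approximation being set aside here:
\[
  \pi(a \mid s) \;=\;
  \frac{\exp\!\big(\beta\, Q^{*}(s, a)\big)}
       {\sum_{a'} \exp\!\big(\beta\, Q^{*}(s, a')\big)}
\]
% beta -> infinity recovers perfect rationality; beta -> 0 gives uniformly
% random behavior; real, computation-limited humans fit neither extreme.
```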
When Lee Sedol played against AlphaGo, he made some moves that were guaranteed to lose. Now, if you assumed he was rational and you observed those moves, you would have to conclude that he was trying to lose the match. That is not correct; as far as I know, he was trying to win. So in order to understand his preferences, you have to understand his cognitive limitations: in some situations, when a winning move is possible, he will see it and make it; in others, he may not. And think about our behavior in our normal, everyday existence, the actions we choose: we have about twenty trillion choices to make in our lives, because we can choose actuations of all our muscles about ten times a second, and it adds up to about twenty trillion choices. But that is not how we think about it. I don't think: OK, what motor control commands should I send to all my muscles to maximize my expected utility for the rest of my life? That is not how it works. What I am doing right now is giving a talk, which is part of attending a conference, and so on. We are always embedded in a hierarchy of commitments, of activities that we are engaged in, and thinking of us as executing a flat optimal policy for the rest of our lives is completely wrong. So if you are going to understand humans at all, you have to understand what the activities are, how they are organized for that particular person, what they are doing right now, and why they are doing it; without that, you are not even starting to understand human behavior. Beyond imperfect humans, plasticity of preferences is a really important issue, because we do not want machines that satisfy human preferences by modifying those preferences to be easier to satisfy.
Politicians already do that. Another set of questions that we have to answer as AI researchers is: how do we satisfy the preferences of many humans? If there were just one human, we would not need moral philosophy. Moral philosophy exists because there is more than one human, and AI is going to have to deal with that fact. So how do you trade off, or aggregate, the preferences of many people? This is something that has been studied for at least 2,500 years, and in the context of AI it takes on a certain urgency, because we are already building AI systems that affect people en masse, and we want to understand how to get that right. There are lots of classical philosophical problems here; interpersonal comparison of preferences is a really big one. Should we assume that everyone's preferences are measured on the same zero-to-one scale, that we all have the same preference currency, the same amount of preference capital, operating on the same scale? Empirically, I know from the case of my children that this is not true; I would say that human preference scales vary by at least a factor of ten. So maybe these things have to be thought through much more carefully. Let me talk about a few of these many-humans questions. How should a robot aggregate human preferences? The classical answer, which is an updated form of
utilitarianism, comes from Harsanyi, an economist who won the Nobel Prize and who taught at Berkeley. He proved the following quite reassuring fact: if you are optimizing on behalf of n individuals, and those individuals have a common prior belief about the future (that turns out to be really important), then the Pareto-optimal policies optimize a linear combination of the preferences of the individuals, and it is a fixed linear combination. This is Harsanyi's social aggregation theorem.
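In symbols, the result being described is roughly the following (an informal paraphrase, not the exact statement of the theorem):

```latex
% Harsanyi's social aggregation theorem (informal sketch).
% If all n individuals share a common prior over how the world will unfold,
% then any Pareto-optimal policy maximizes a *fixed* weighted sum of their
% expected utilities:
\[
  \pi^{*} \;=\; \arg\max_{\pi}\; \sum_{i=1}^{n} w_i\,
  \mathbb{E}_{\pi}\!\left[\,U_i\,\right],
  \qquad w_i \ge 0 \text{ fixed}.
\]
% The generalization described next: with *different* priors, the weights
% become dynamic, growing for individuals whose priors better predicted
% the observations actually received.
```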
The theorem is a very important result, and it turns out that it does not generalize in the way you might expect when you have people with different prior beliefs, which is actually the much more realistic situation. In that case, the Pareto-optimal policies have dynamic weights, and each person's weight is proportional to how well that person's prior predicted the observations that subsequently occur. Let that sink in for a minute: if you are really good at predicting the future, then the Pareto-optimal policies will weigh your preferences much more highly than those of other people. That sounds like, oh my God, this is disastrous from the point of view of social equity and all of that; it sounds awful, and in fact the weights can become exponentially bigger and smaller than each other as time goes by. But this is a theorem; there is nothing we can do about that except understand what it means and why it is true. Why is it true? Think about it: everybody thinks they are right.
Nobody thinks: OK, that guy believes this and I believe that, and he is right and I am wrong. Of course, if you hold your beliefs, you believe that your beliefs are the right ones. And so you are always going to accept a bet that says: if you turn out to be right, I will give you all the money in the universe; you prefer that to the one that says: I am just going to split the money equally among everybody. So everyone agrees to this kind of policy, because they all think they are right; they prefer it to the policy that says we are going to split the money evenly. So that is the situation: we have a theorem, it seems to lead to extraordinarily unequal consequences, and we have to figure out how to deal with it. There are some resolutions that seem like obvious candidates, but there has to be more to it than that. Let me now say a little bit about altruism, indifference, and sadism. One of the common formulations in economics is to think of your
utility function, your preferences, as having two components: a self-regarding component and an other-regarding component. Just to keep it simple, we can add them together, although that is not logically necessary; it could be some other form of combination. So here we go. Our two people are Alice and Bob. Alice's utility is Alice's own well-being, w_A, plus a caring coefficient C_AB, which is how much Alice cares about Bob, times Bob's well-being, w_B; and Bob's utility is Bob's own well-being, w_B, plus a coefficient C_BA, how much Bob cares about Alice, times Alice's well-being, w_A.
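Written out, following the description above (the symbols are mine):

```latex
% Self-regarding plus other-regarding components (symbols as defined above):
%   w_A, w_B : intrinsic well-being of Alice and Bob
%   C_{AB}   : how much Alice cares about Bob; C_{BA}: how much Bob cares about Alice
\[
  U_A \;=\; w_A + C_{AB}\, w_B,
  \qquad
  U_B \;=\; w_B + C_{BA}\, w_A.
\]
% C_{AB} = 0 is indifference (Bob is a potato to Alice); C_{AB} < 0 is sadism.
```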
Now, you can study systems of this kind, and you can imagine fixed-point theorems and various other kinds of analysis, but let's just look at the signs of these caring factors. Obviously, you would hope that the caring factors would be positive. If Alice's caring factor is zero, then whenever an act gives any increase in well-being to Alice, Alice will carry out that act no matter how much damage it does to Bob, because she literally does not care; it is not that she is a sadist, she just does not care at all. You could think of Bob, in this case, from Alice's point of view, as a rock or a potato; Alice would eat the potato if she were hungry, and she does not care, because it is a potato. That is what a caring factor of zero means. Now suppose the robot is operating on behalf of Alice and Bob, and Alice is this indifferent kind of individual, but Bob really cares about Alice; he wants Alice to be happy; Bob is the nice guy. What happens when you optimize U_A plus U_B? You are going to end up shoving a lot of well-being onto Alice in order to optimize the sum, and Bob is going to have less well-being as a result. But Bob might end up happy anyway, because Bob derives happiness from Alice's intrinsic well-being. This is a very normal situation: here is a newborn baby, Alice, and Bob is Alice's dad. This is completely normal; Alice could not care less about Bob, and this is how things end up when you are a parent; we all know this. But if the caring factor is negative, then you have got what you would call sadism: Alice actually derives well-being, or happiness I should say, from negatively affecting Bob's well-being, even if it does nothing for Alice's intrinsic well-being; just inflicting misery on Bob is something that makes Alice happy. Harsanyi says that in this case we should not include that term in the overall aggregate that we maximize: no amount of goodwill toward individual X can impose on me the moral obligation to help him in hurting a third individual, Y. So this is one place where maybe the AI system will have to put its thumb on the scales of morality, so to speak. If we discover that someone's behavior is deliberately inflicting pain, and if you look at online chat rooms and trolling and all that kind of stuff, there is a lot of deliberate infliction of misery on other people with no intrinsic gain to the person inflicting it, then we should zero out those kinds of preferences and not implement them.
That sounds good; I think most people would agree with it, and Harsanyi certainly agrees with it. But now there is a complication, which shows up when you look at typical human preferences. This has been known for a long time; Veblen wrote about it at the end of the nineteenth century, and economists later developed the idea of positional goods. A house is a house, and it is nice to have a nice house, but it derives a lot of its value for you from being nicer than other people's houses. Why do people drive around in shiny Porsches? Partly because it is a nice car, but a lot of it is because it is nicer than other people's cars. Why do you want a Nobel Prize? Because you get a million euros in cash? No: because other people don't have one. So this notion of relative preferences, relative well-being, what economists call positional goods, is really pretty central to a lot of what drives human beings. And when you do the math, here I have written down Alice's utility again: Alice's own well-being, the caring term for Bob, but now also an envy coefficient E_AB, which operates on (w_B minus w_A), so Alice is envious of Bob's well-being, and a pride coefficient P_AB, which operates on (w_A minus w_B), so Alice is proud of having more well-being than Bob. When you just rearrange that equation, you see that Alice's attitude to Bob's well-being now includes negative terms: the envy and pride coefficients show up as negative coefficients on Bob's well-being, and they operate mathematically exactly like sadism. Now, we all just agreed that because sadism is a negative coefficient on Bob's well-being, we should zero it out; so are we also going to zero out the envy and the pride? If we start zeroing those out, we zero out maybe two thirds of everything that people care about, and that is pretty serious.
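The rearrangement being described goes like this (my reconstruction of the algebra, with the envy and pride terms written with the signs just stated):

```latex
% Add an envy term E_{AB} on Bob's advantage and a pride term P_{AB} on
% Alice's advantage:
\[
  U_A \;=\; w_A + C_{AB}\, w_B
        \;-\; E_{AB}\,(w_B - w_A)
        \;+\; P_{AB}\,(w_A - w_B).
\]
% Collecting terms:
\[
  U_A \;=\; (1 + E_{AB} + P_{AB})\, w_A
        \;+\; (C_{AB} - E_{AB} - P_{AB})\, w_B,
\]
% so positive envy and pride contribute a negative coefficient on Bob's
% well-being, which is mathematically indistinguishable from sadism.
```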
Let me tell you a story about this. I was giving a talk and Jeffrey Sachs was in the audience; Jeffrey Sachs is a very famous development economist who has advised many countries on how to pull themselves out of poverty. He said he had been in Bangladesh after a huge flood in which lots of people died, and he was talking to a farmer. It turned out that the farmer had lost one of his children and a lot of his animals, and yet the farmer said he was happy, because his neighbour had lost even more and felt really terrible. Jeffrey was a bit taken aback, but it just goes to show that positional goods are a really significant part of intrinsic happiness.
So we have to be careful, and think hard, before we zero those out. If you are interested in learning more about this, I wrote a book, published in October, called Human Compatible, and some of this material is also going to be appearing in the textbook; you might like to know that the fourth edition is on its way. I just finished it yesterday, so as soon as we get copyright permission for all the pictures on the cover, which apparently takes about five and a half years, the book will be out.
To summarize: I think provably beneficial AI is possible, and it is possible under a new model. We should get rid of the old standard model for AI, and we should stop thinking about this as an ethics issue or an AI safety issue; it is just AI. This is how to do AI well, as opposed to how to do AI badly. If you are a nuclear reactor designer, you do not go to an ethics workshop to talk about how to make sure your nuclear reactor doesn't explode. I find it weird that I get invited to ethics workshops to talk about things like this; it is just common sense: we don't want it to explode. So let's build them that way. There is lots and lots of research to do: how we model preferences, how we learn preferences, the algorithms that solve these games, and real practical work. Some people are already thinking about how to make self-driving cars that
operate in this way, and one nice example is the self-driving car that gets to a stop sign and wants to make it clear to the other cars that they should go ahead, so it actually backs off a little bit, just to show the other cars that it is not intending to go. That behavior falls out as a solution of the game; it is not something we programmed in or invented, it is just a solution of the game that you formulate. I think there is an enormous range of interesting behaviors that fall out as solutions of these kinds of games and that simply do not exist as solutions of standard-model AI problems, so we are entering a much richer space. And if you think about it, the standard model is the measure-zero special case of the new model where you happen to have a delta-function posterior distribution on the human preference function. That is an extremely rare and unusual situation, and it is not surprising that in the more general situation the range of behaviors open to you is much larger. There are a couple of problems
that are difficult to solve and that I have not really figured out how even to approach. One of them is deliberate misuse. If we make all these safe AI systems, well, we can make safe nuclear reactors, but people still steal uranium and make bombs anyway. Same thing here: how do we overcome the misuse of AI? We can call this the Dr. Evil problem, if you like. And then, how do we overcome the overuse of AI? When AI systems become capable of doing everything that we currently think of as work, making everything easy, how do we stop ourselves from becoming the humans in WALL-E, who have lost the incentive, the habit, and the culture of learning, of achievement, of knowledge, of capability, because they no longer need any of those things? That is a question we will have to face. If you are interested, there is a nice story called "The Machine Stops", written by E. M. Forster in 1909; it is actually free on the web, and he anticipated all of this. It was an early
warning to us, one that we mostly ignored. Thank you very much.

Great talk, thank you. One could argue that some of the AI systems deployed today actually do satisfy human objectives. You talked about click-through rate, and we all know that a large part of the web business is based on optimizing that for profit. I am just wondering, and maybe you talk about this in your book: could part of what these systems do in the future be to point out the negative effects, at a more macro scale, of optimizing click-through rate? Beyond that, what else are they doing that is hidden?

Yeah, I did not hear all of that question, but I think I agree. Yes, I think machines can help us be better people. But there is still an unanswered question, which is: what do we want our future to be? For the last hundred thousand years of human history, we have not really had a choice about our future; it was just, can we manage not to die today? If we do have a choice about our future, then maybe the machines will be able to help us figure out what it should be, because we are not doing a very good job of that ourselves right now.
I was just wondering whether sadism is getting a bad name here. You illustrated some of these things with parenting, and if you want to beat your child at a game to teach them a lesson about being a good loser, that might appear to be sadism. So I wonder, is there an argument for making an explicitly paternalistic part of the model, or are you saying that somebody's opinions in the short term simply don't matter?

Yes; there can often be a conflict between short-term and long-term preferences, and part of what the machine has to understand is how we make that trade-off. The issue of paternalism is an interesting one, because there is not really a good model for it. You could say, well, of course parents eventually tell a child: no, I am not going to tie your shoelaces this morning, you have got to do your shoelaces yourself. And so the AI system should tell the human: no, I am not going to solve all your problems today, you have to learn how to do it yourself, it is good for you. But that puts the AI system in the position of being a parent, and that is not the situation we want to be in; we do not want to be the child, with the AI system as the parent. We are supposed to be the ones in control, and there are just no good models anywhere in the universe of a less capable entity being in control, forever, of a more capable entity. But that is the model we are looking for. That is a tricky one.
Thanks. Can you hear me, Stuart? Thanks for a really interesting talk and a new model. Before I get to my question: if people have not read Marge Piercy's He, She and It, it is a science fiction novel that really resonates with what you are talking about, Stuart, so I recommend it; it is from the early nineties, and it does not quite fit that optimizing model. Which brings me to my question: I am a little worried that you are still focusing on utilities and maximization; there are other ethical theories and moral theories. And I am also a little worried, but I will put this as the question: have you thought about societies, governments, social organizations? Collaboration is not just the sum of individual plans, and many humans are not just the sum of a bunch of individuals. I am wondering if you are thinking about that as you think about these new models.

There is some material in the book about the different moral theories.
Roughly speaking, there are the utilitarians, there are the deontologists, and there are the virtue people, and I have tried very hard to see ways in which the deontological approach is not simply an efficient implementation of the utilitarian approach, but it seems to me that it is. Mill goes on at great length about this: of course you do not just maximize utility; of course you follow moral rules; but the reason you do that is no different from the reason a sailor uses an almanac when he goes out to sea, so that he does not have to redo all the astronomical calculations. And virtue ethics has a lot to do with wanting the individual who makes the decisions to be doing so in a virtuous way, to have a virtuous character, to be thinking along virtuous lines and doing things for virtuous reasons, none of which makes sense for machines: we do not care whether the machine is virtuous as it makes a decision that has an effect on us. So I think, from the point of view of how we design machines, we need to be consequentialist, in the sense that it does not make sense for us to design machines that have consequences that we don't want. We should talk more, and you might also look at the Embedded EthiCS webpage at Harvard.