Kingsum is currently a Chief Scientist at Alibaba System Software Hardware Co-Optimization. Since receiving his Ph.D. in Computer Science and Engineering from the University of Washington in 1996, he has been working on the performance, modeling and analysis of software applications. He has been issued more than 20 patents and has presented more than 80 technical papers. Kingsum appeared four times in JavaOne keynotes and almost a dozen times in JavaOne and Oracle OpenWorld presentations covering software performance, systems and optimization. Kingsum also co-chairs QCon Beijing 2018. In his spare time, he volunteers to coach multiple robotics teams to bring the joy of learning Science, Technology, Engineering and Mathematics to K-12 students in the USA and China. His teams appeared in national and world championships three times, taking home best programming in 2010 and best sensor technology and machine learning for robotics in both 2014 and 2015.
About the talk
Kingsum Chow is currently a Chief Scientist at Alibaba System Software. He presented on the emergence of large-scale software deployments in the data center and the resulting challenges. In his talk, Kingsum specifically addressed the challenges of measuring software performance in the data center and optimizing software for resource management. He discussed the techniques and processes that Alibaba has developed to address these challenges. This session had several interesting reminders of things we should watch out for in measuring CPU performance.
Good morning. As was just mentioned, I've been working on system performance for many years. I studied at the University of Washington in Seattle, so it is really nice to be back in the same city. After that I started at Intel in Oregon, and I was there for about 20 years. That's the time when I realized how hardware can help software. I spent a lot of time looking for ways to improve software applications running on Intel systems. During that time, I was very fortunate to work with companies like [unclear] and Siebel. I mention these old companies because I wonder if any of you have heard of them. Have you noticed anything about these companies? They were all bought by one company eventually, Oracle. So for my last six years in America, I was working with Oracle to speed up the performance of Oracle's software on Intel servers: Java applications and sometimes cloud computing.
All that time I was wondering what I should be doing. When I was working as an Intel engineer with these companies, I could do a lot, but I could not change the software. The software belongs to them, and I might not even have access to some of the components I wanted to study. So I was thinking it would be great if I could work for a company that would actually allow me to change some software. I was looking around, and Alibaba gave me a chance, saying: come over, figure out what problems we have at Alibaba, and do something about them. That is the job description. I think many of you would like that job description, and I took it. So I moved to Alibaba in Hangzhou, China, about two years ago.
Of course, once I got there I needed to deal with other issues as well, but I'm not going to bore you with those today. I'm going to stay with the interesting thing: when we look at the performance of a large number of systems, what kind of interesting problems do we observe, and what can we do about them? I want to call out one thing at this point: the data shown here is not real data; it is there so that we can tell the story. Also, the talk today will be composed of a lot of graphs and data. If you have a question at any time, you don't need to wait until the end of the session. That will make the discussion easier.
So here is the journey. We are looking for ways to improve performance. For us, improving performance means that we can do more with less. I want to put the problem at scale so we can all see things differently. What we are talking about here is that we have a large number of servers. I needed a really big number, so this is not the real number. A billion servers is way more than what we have, but it gives you a sense of the scale of the problem we are facing. I cannot give the exact number, so I just went with that. The most important thing about the number of servers is that we are looking for a 1% performance gain. If you get a 1% performance gain across all the servers, the benefit is a really big number.
The question is, when we are looking for that kind of gain, small in percentage terms but large in total, what kind of data are we looking at, and what kind of problems do we face? More importantly, once we have all the data in front of us and we are studying it, we need to make some decisions. Because of the scale of the problem, a wrong decision is going to cost a lot, and hopefully we can make all the right decisions. On the way to making the right decisions, we encounter issues, and I will call them out. How do we discover mistakes, and how do we get around those problems?
So I want to talk about performance in the small and performance in the large. On the left side, I guess... I just don't know how to point. Okay. What we spent a lot of time on while I was with Intel was running small benchmarks and studying the performance of those benchmarks. Then we tried to extrapolate the performance of those benchmarks to the very large scale. If you have a benchmark that is stable, everything works very well.
But in reality, in a large-scale data center, the software changes all the time. Sometimes the software changes every two weeks. So by the time you try to scale the performance of a benchmark up to the data center, it may or may not work well. So we have these questions: can we use benchmarks? What should we do at large scale? The point here is that when we have a lot of data centers, the performance of a single software system may or may not be the most important factor.
Here is a paper that was published a few years ago by David Lo from Stanford. He studied data center utilization. On the left side is one company that starts with T; on the right side is another company that starts with G. He looked at the utilization of the systems, and what he pointed out is that it is roughly 20% to 30% CPU utilization. We use only a fraction of the CPU. So how can we make use of more of the CPU? That is the question.
What I want to point out today is that we are going to look at CPU utilization by looking at the process. We have been thinking about how to solve this problem. Utilization is one of the metrics for measuring the use of resources, and there are other ways to measure it. We want to break down the process into three components. The first component is: can we really measure what is happening right now? How confident are we about what is happening right now? The next question is: what can we do to make things better?
When we talk about what we can do, we are not just talking about changing a single line of source code. We are also talking about changing the scheduler, or replacing a rack of machines. So how do we determine whether that is a good thing or a bad thing, or how good it is? We need a way to evaluate it. And on the right side, we need to make a decision: is it a good thing to do? Once we make a decision and deploy at large scale, we still need to evaluate whether it was worth it. If it is a software change, we can roll it back, but if it is a hardware change, it will stay there. So with anything that involves a hardware change, we want to be very careful.
Alibaba and many other companies have a lot of engineers, and they are collecting a lot of telemetry data. Yesterday there was a slide about an elephant as well; I think it had four people studying the elephant. I have six people; that's the only difference. You have engineers trying to study the systems. They pick up data like CPU utilization, memory utilization, context switches, network interrupts, a bunch of stuff coming in, and every one of them is going to tell me, "I am studying the most important metric; you want to use my metric." However, everybody is looking at a small piece of the problem, and we still have the ultimate problem to solve: how do you study all the different kinds of software running on many different kinds of hardware in a data center? We are not talking about PUE, the power usage effectiveness of a data center. Can we unify the study of the optimization of such a problem?
The first equation is a very simple one, and we think maybe it gives us a solution. What we are talking about is resource usage effectiveness. Every piece of work we need to do in a data center means running some software, and the software is doing something. It may be doing batch processing. It may be processing big data operations like MapReduce. It may be doing something like an e-commerce application.
Speaking of e-commerce, have you heard about Singles' Day in China? Anyone? Good. Yeah, it is a very busy day. It loads up a lot of network activity. Let me talk about how big that is. On Singles' Day last year, November 11th, 2018, we generated sales of, if I remember correctly, 30 billion US dollars of merchandise. 30 billion dollars. How big is that? It is more than Thanksgiving and Cyber Monday. So on that day we need to handle a lot of traffic, and all of this needs servers to run. What we want is a way to satisfy all the requests when people are shopping online and doing a bunch of stuff. We also want to make sure that we provision the right amount of resources, and because of that we pay attention to software efficiency.
So this is a very simple equation. Can we actually apply it across a lot of different software applications? The top of the equation is something we call resource usage. Somehow we should be able to measure the utilization of some kind of resource. There are several examples of resources, such as the network. Let me pose a question to those of you in the room. Can you guess what the two most expensive components in the data center are? Across any company the answer should be the same. Air conditioning and power? Yeah, that's true, but I'm asking you to pick two out of this list: what is expensive when you buy a server? A server is composed of one or two CPUs and some amount of memory, and sometimes the server comes with some storage. Storage can be expensive if you are buying a lot of it. But the servers are computing something in response to user requests, or they are being used for something like the popular topic here, AI. While all these applications are consuming CPU, they use memory for temporary storage. In many data centers, CPU and memory are the expensive components in a server. Today we are going to pick CPU as the component whose utilization we study.
So that takes care of the resource usage. The second thing we need to deal with is how to measure the amount of work that is being done in the system. Because we have many different kinds of applications, they are measured in different ways. We want to unify them in some simple terms so that most people can agree it is a good thing to do. For some of them we call the unit of work a query. A query is a transaction happening on the server: you are shopping online, you go to Amazon, you put some items in the shopping cart, and later you decide you want to buy them, so you execute another transaction. So there are a bunch of queries happening on the server. For machine learning and big data, the unit is the operations being processed. So we came up with this equation, and we hope it is useful.
At this point we are going to look at the misleading behavior of CPU utilization. Have any of you looked at the CPU utilization of what is running in your data center? Probably many of you have. How much do you believe that what you are reading is what you think you are reading? Thank you. Exactly. So you have a number between 0% and 100%.
What is really interesting is that if you look at how the OS kernel computes CPU utilization, it uses a specific formula, and that computation is what produces the CPU utilization figure. I'm going to get back to that, but to illustrate the problem of looking at CPU utilization, I want to pose this question to you. Suppose you have a data center at large scale, and across the board you are at 50% CPU utilization. You don't have a load balancing problem, no scheduling problem, no interference. Those problems are real, but to simplify, we are saying that everything scales perfectly. We are just focusing on the definition of CPU utilization. Naturally, some people might think: if I am only using 50% of the CPU, then I should be able to handle about twice the load. Does anybody agree with that? I need at least one person to agree with that. Thank you.
So I'm going to go straight into what some of the systems might be doing to help you. While they are doing something to help you, they actually cause some misunderstanding of the numbers being generated. By far, right now, most of the CPUs used in data centers are Intel CPUs, so I'm going to focus on those. That is nothing against AMD; AMD CPUs have a feature that is similar to Intel's.
So, hyper-threading. Hyper-threading is a technology to make your software run faster. In this example, at the top, software is running on a CPU without hyper-threading. The thread consumes the core as it proceeds, and sometimes the thread is stalled. Anybody from Intel here? Intel hyper-threading technology has been around for more than 10 years. When you have two software threads running, these two software threads can be placed on two hardware threads. The two hardware threads run in parallel, and they consume a single core. With hyper-threading, one Intel core appears as two logical CPUs to the software. The software does not need to change a single line of code. The OS can place both software threads on the same core, and everything runs fine. The next picture illustrates how the efficiency is improved. On the left side you can see the case without hyper-threading technology.
The green software running on the green hardware thread is using only some of the resources of the core, because the threads cannot overlap. With hyper-threading technology, things go faster because the threads can share the resources. Now, all these things are happening behind the scenes in the data center, every day, right now. Believe it or not, many of you have laptops with hyper-threading turned on. You may have two cores, but you can see four CPUs show up, and that is the reason: hyper-threading exposes extra logical CPUs to bring a performance improvement. In the example with two cores, they will show up as four, and Windows reports each of them with a varying degree of utilization. The only thing we need to remember is that these are not real CPUs; they do not behave like four independent CPUs, so we need to keep that in mind.
Now, let's get back to the 50% CPU utilization. If you have two cores, which is the picture on top, and they are running some software at 100% utilization, that number is reliable if you have hyper-threading turned off.
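The 50% question can be made concrete with a small sketch. It assumes a hypothetical machine with two cores and two hardware threads per core, and it uses the simple definition of utilization as busy logical CPUs divided by total logical CPUs. The function names and the busy/idle layouts are illustrative assumptions, not how any particular OS reports the number.

```python
from typing import List

# Each inner list is one physical core; each bool is one hardware thread
# (logical CPU) on that core, True meaning fully busy. Hypothetical layouts.
Layout = List[List[bool]]


def reported_utilization(layout: Layout) -> float:
    """The single number: busy logical CPUs divided by all logical CPUs."""
    logical_cpus = [thread for core in layout for thread in core]
    return sum(logical_cpus) / len(logical_cpus)


def cores_in_use(layout: Layout) -> float:
    """Fraction of physical cores with at least one busy hardware thread."""
    return sum(any(core) for core in layout) / len(layout)


# Both busy threads share core 0; core 1 is idle.
packed: Layout = [[True, True], [False, False]]
# One busy thread on each core.
spread: Layout = [[True, False], [True, False]]

for name, layout in (("packed", packed), ("spread", spread)):
    print(f"{name}: utilization={reported_utilization(layout):.0%}, "
          f"cores in use={cores_in_use(layout):.0%}")
```

Both layouts report the same utilization, yet in one of them every physical core is already occupied, which is why the single number says little about the remaining capacity.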
However, in a data center, to get the optimal performance, hyper-threading is usually turned on. If you go to any cloud provider and subscribe to some resources, you specify a number of vCPUs, virtual CPUs. So we want to compare two scenarios. In these two scenarios, both show 50% CPU utilization. Let's see what that means for you. On the left side, we have hyper-threading turned on. That means we have two cores that appear as four logical CPUs. Within these four logical CPUs on the left side, two of the logical CPUs are fully utilized. Two fully utilized CPUs divided by four CPUs gives 50%. On the right side, we have a similar scenario: two cores that appear as four. On the left side, both hardware threads of core zero are busy. On the right-hand side, each core has one busy hardware thread. This is a very simplistic picture to show that we are utilizing 50% on the left and also 50% on the right. The question is: are the two the same or different? With the picture I'm showing here, you can probably see that they are different. How different are they? That's the question.
Why does it matter to you? You just run some piece of software and look at the utilization. What do you get? Do you get the picture of the utilization across all the logical CPUs, or do you just get the final number? The answer is that most likely you just have the final number, and it would take you a lot of effort to figure out exactly which scenario you are in. So in summary, what we are saying here is that when you have just one number for the utilization, it may or may not be what you expect.
Now you may be wondering: if this is a real problem, how come it is hardly mentioned in a lot of presentations? Do you have that feeling? So we ran some experiments. You can run these experiments on your laptop. If you are curious whether this is really true, follow the same procedure, do some experiments on your laptop, and you will see the same phenomenon. But for our experiments, we don't use laptops; we use servers. We use Intel Xeon CPUs with two sockets, that is, two CPUs, and each CPU has 16 cores. Every core has two hardware threads. This is a very typical scenario for running a server in a data center; it is not just us, many companies are doing the same thing.
To illustrate this problem, we picked a benchmark so it is easy to see where the effect is coming from. We picked SPECjbb2005. If you have a second, you can download it from the SPEC website. It has warehouses, the components running within this benchmark, so we can set the number of warehouses based on how we want to load the logical CPUs. There are many ways to run the benchmark. To make things easy, we only ran two configurations. In the first, we run on half of the logical CPUs. So if you have four logical CPUs on your laptop, you run on two. In our case we have a pretty big server with a total of 64 logical CPUs, so we run on 32. Then we compare that with running on all the logical CPUs, which in our case is 64. Okay, so these are the two simple configurations.
This is the experimental result. I will go through this slide slowly. You get a throughput number. The throughput number is like one of the things we talked about earlier, the transactions. Here we call it MOPS, million operations per second. On the left-hand side, we are doing about 1.2 million operations per second on 32 logical CPUs. When we are running on all 64, we are doing about 1.2 million operations per second. That means we are doing about the same amount of work between the red bar and the blue bar. On the left, we are running on only half of the logical CPUs.
The utilization is the top chart on the right here. When you are running on only 32 logical CPUs out of 64 and everything is fully utilized, what do you get? 50% CPU utilization, by definition: out of 64 logical CPUs, 32 are very busy. Because this is real data, I actually have 50.16%, so slightly more than 50%. And because it is real data, for 64 logical CPUs I actually got 98.59%. But it is very important to notice that at 50% utilization we are doing about the same amount of work. That is the key metric we are going after: when we are consuming CPU on the system, are we doing useful work? In our case, when the CPUs are generating operations per second, they are doing useful work.
So now we are puzzled. What exactly is going on, with 50% of the CPU doing about the same work? So we check one more thing. The next thing we check is the number of instructions executed per second. These are the machine-level instructions: your software is compiled to some binary code, the binary code is loaded into memory and executed by the CPU, and some instructions are executed repeatedly. This is the dynamic total number of instructions. Not surprisingly, when we look at the dynamic instruction counts, they are about the same.
So what we are showing here is that when we turn on hyper-threading, the CPU utilization is distorted. Now, this is an extreme case, distorted by almost 100%. In some cases, maybe the numbers are okay. But a maximum error of 100% is what we are seeing, based on this example. The real problems are actually going to be more complicated, but we won't go into that today. [unclear] Here, I hope some of you will agree: maybe what we have been looking at as CPU utilization is not exactly what we think we are looking at. The OS simply defines it that way. It is not about the capacity we are consuming.
It is about something that is generating a number for you. Now I want to show you examples of some of the mistakes we might have made. On the way to gathering data, we need tools on the server to gather the data. When engineers develop these tools, the typical thing I hear from them is: "My tool has zero overhead. You can gather everything. No problem." So I run an experiment. In experiment one, I'm doing 90,520 operations per second. [unclear] So the tool has no overhead and everything is fine? I'm glad that you are shaking your heads. There are other things happening in the system, and one of the things we measure here is CPU utilization. Maybe the tool is introducing something. What we are saying here is that if we ignore other metrics, we may get a distorted picture.
Here is another example from our experiments. Maybe some of you are running software like this, especially a JVM with many parameters.
And sometimes a parameter just gets changed. In this example, even though both runs use the same 48 warehouses, we observe that the throughput number is significantly reduced. How is it possible that running the same software on the same JVM can show such a difference? When we dug into it, we realized that the machine was configured in a different way, and because of the difference in configuration the performance dropped by about 20%. It came down to a special feature in the JVM. So if you are running Java applications in a cloud or in a data center with a large heap, you want to pay attention to that. If you usually use less than 32 GB, this is not a problem for you, but it is a JVM feature to be aware of. What I am saying with the last two examples is that when we look at the resource usage to figure something out, we also need to understand the throughput. Can it change because of some configuration? What does that mean?
To summarize what we are dealing with: as these examples show, just looking at CPU utilization may not be adequate. So we looked at the published papers, especially the papers relating computer architecture and software performance. Many professors and many graduate students have written about this iron law. The iron law is a way to break down the efficiency of software. We are looking at how much time we need to complete a unit of work. There are a lot of ways to think about how we can divide the work into components; we break it down into three things. The first is, for every unit of work, whether a transaction or a data operation, how many machine instructions do you need to execute? That is the path length. The second component is the number of cycles we need per instruction. The number of cycles refers to how many clock ticks you need on the CPU, and you divide that by the number of instructions.
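As a rough sketch of this breakdown: time per unit of work is the product of instructions per unit of work, cycles per instruction (CPI), and time per cycle, the last of which is the third component described next. The function name and all the numbers below are hypothetical, chosen only to show the arithmetic.

```python
def time_per_unit_of_work(instructions_per_unit: float,
                          cycles_per_instruction: float,
                          clock_hz: float) -> float:
    """Iron-law style estimate, in seconds per unit of work.

    time = (instructions / unit) * (cycles / instruction) * (seconds / cycle)
    """
    seconds_per_cycle = 1.0 / clock_hz
    return instructions_per_unit * cycles_per_instruction * seconds_per_cycle


# Hypothetical transaction: 2 million instructions on a 2.5 GHz CPU.
baseline = time_per_unit_of_work(2_000_000, 1.5, 2.5e9)
# Same software and same clock, but a lower CPI after tuning (assumed value).
tuned = time_per_unit_of_work(2_000_000, 1.2, 2.5e9)

print(f"baseline: {baseline * 1e3:.2f} ms per transaction")
print(f"tuned:    {tuned * 1e3:.2f} ms per transaction")
print(f"ratio:    {baseline / tuned:.2f}")
```

With the instruction count and the clock held fixed, the ratio of the two times equals the ratio of the two CPI values, which is why CPI is a convenient thing to watch when the software itself is unchanged.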
The third component is how much time you need per cycle. So these are the three components that can help us think about software performance. As always, when we have a formula about software performance, we take it with a grain of salt. It has some limitations, but it is a starting point. Among these three things, the one that pops up in a lot of presentations is CPI. If you look at the left side, the number of instructions is rather hard to compare. On the far right side is the time per cycle. If you buy a CPU that runs at a certain fixed frequency, there is not much you can do on the right side. What's interesting is the middle piece, the CPI. It is so interesting that Microsoft and Google have published papers about how they use it to help with performance tuning. If you want to shorten the running time of your program, you might want to reduce the CPI.
Recently a Netflix engineer, Brendan Gregg, published a guideline on what is a good CPI and what is a bad CPI. Have any of you looked at that? That is good, but it may or may not be the whole story. If you are running exactly the same software as before, the instruction mix will be exactly the same, and CPI will be a good indicator of how the hardware is doing. Now, here is the reality. We are running the same piece of software, and I bring up my favorite example, SPECjbb, because that is simple. The question is: when you have bought a piece of hardware, are all the default hardware options already optimized for your workload?
Before I go there, I want to ask a question. You know the BIOS on a computer? Have any of you played with the BIOS options that can change the settings of the CPU? Okay. Thank you. Thanks for sharing. So you can set those options, and the number of BIOS options on a server is way more than on your laptop. They have things like the frequency of the CPU. They have hardware prefetchers. They have different ways to do Resource Director Technology. When you are dealing with that, CPI comes in handy: when you are tweaking these options, it is helpful to check whether you are reducing the CPI. So that's what we checked.
The other thing I wanted to check was this: even if the hardware options are not optimal for a workload, they are probably close, maybe within a few percent of the optimal point, right? Okay. In this example running on Broadwell, what we are seeing is that there could be a 25% performance difference running the same software, just by tweaking the hardware options. That is with the hardware prefetchers turned on or off, along with other hardware options, and also COD. COD is a special feature called cluster-on-die, which breaks down a single socket into two nodes that can process things faster by reducing the memory access latency and improving cache efficiency. All of this is really cool stuff, and each of you probably has a special hardware engineer who deals with these things for you, right?
What we got here is that when we turn off everything, which is not the default, we get a performance of about 29,327 operations per second. We call that the baseline, 1.00. When we turn on everything, we get 36,423 operations per second. That is 36,423 / 29,327, about 1.24, or close to a 25% increase in performance. It is a good question: why is that not the default? Intel actually has recommended defaults when they work with different vendors. When you buy your servers from different vendors, they implement some set of default values for you.
And if you like you can go to the bios of the of y'all. You can chat what what I said, there is another interesting thing is sometimes they don't expose that to you. So you don't know what I said because I thought they already fix it for you. So you shouldn't need to know if you and candida problem. You can go through a a convoluted way to gather data about looking at the MSR register you get curious about that at the send me an email. I'll send you the document now because of the vehicle turning off
everything is not likely the best option. So some of the options will turn on but the options they are turn on all these three options. They are turn on Related to the protection of data through the CPU speed on a memory. What site? Yes, exactly. at some point I was wondering if they were sorry. I was wondering if you know there was room for either here or offline to change a mess inside about that. They actually take me to watch awesome. So I want to address the comment that may require different
optimization. And that is a challenge we face. Can we set one set of optimization for everything? How can we get the Seaview to study the workload so we can sell yourself and for some that sell tuning activities are some common is will be driving you from the hot well point of view and some coming in like the soffit how many as they would like to look at it from the auto-tuning from the software point of view like the Welcome by MIT call open tuna. You can try it and also there's a welcome by Twitter about
Play Operation processing tweets. So there are many different options and this is the area that machine and may come in handy to how about you all. The next session is exactly what we have a lot of different things and things changed and how do we do that? If you have a lot of different work, so hopefully the average Behavior we have the extreme cases canceled out. So hopefully we can do something about it. Now we need to come up with a way to evaluate if we are doing something better or worse. Record a speed up. The definitions of speed up is
consistent with the Hennessy who won the Turing award last year ablution derive from AR test book. Is a century. Alta Bates case / all you need to take that speed up ratio of these two numbers. But we want to apply gate to a very large number. And we need to do with the operation difficulty of this rebuilding a lot of software applications on a law server. So at this point we cannot have a setting like what you said. We are going to deal with some Broadway tunes of data and we hope to be able to
make sense of the data and still make some decisions. And see if anyone to remember it for speed up the bigger the number the better edges. So here is a scenario of the problem in a Datacenter. Everything is why we put on a laptop. We are we want to test a new future the new picture could be updating the software updating iOS oil changing some hot way of thinking a configuration something. We cannot afford to change a lot of things during testing. We can only change a small number a small
number is a big number in okay, but is 1 percentage. The 1% of the instances would be running. For this experiment to be conducted we need to have fear confident that nothing really bad is going to happen to the 1% So we have done some testing conjecture that you cannot be too bad. You cannot crash bumper assembly service will be in big trouble. And we cannot ask many other team to change the way they work like you're there is a scheduling software distribution the application on a lot about customers. They were just schedule the
way it it. Don't no change in any jeans look. The only change OnePlus animations with a new configuration. I don't remember one thing. Is there any software will be running on the same song server the same server is running many application. I think I have a picture in the next like Okay, so this will help. in this picture, we are showing that we have falsettos one two and three and four The Topsy machine it has about. and the Machine for has sex cause
a different number of application on the top vacations at 1 to 3 for the distribution software applications are running a machine to and applications are running a with different amount of a courtesan to each of them. We make a change to the systems. And I when we make a change to the system, we don't control exactly what system can pick some how some systems in this case. We got the new configuration. And all the software applications running on the shoe one running on the new company creation as a result. So in this case is a hardware configuration
change. And all the other applications are not affected by the change. So what is interesting here? I want to read a call here is we are not comparing before and after what we have on the right hand side here. We have sandwiches running Bo configuration and we are some machines that run in the new configuration application processing a lot we have The reason we have this we allowed to have this family. We want to make life easy for people. We want to locate things we can do it. We want to repeat things that
Yeah, it's more about large numbers. We have this hypothetical example. We are looking at a change in the configuration. 99% of the servers are running with configuration one, and we are getting an RUE of 885 resource usage. The smaller, the better. So we are spending 885 resource units per transaction. And then we have some machines, along with the software applications, running at 815 resource units per transaction. Take the ratio of these two numbers and we get 1.09.
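The arithmetic behind that ratio can be checked in a few lines. This is only a sketch: the variable names are made up for illustration, and the two inputs are the resource units per transaction quoted in the hypothetical example (lower is better).

```python
# Resource units per transaction quoted in the talk's hypothetical example.
# The variable names are assumptions made for this sketch.
usage_config_one = 885.0  # 99% of the servers, existing configuration
usage_new_config = 815.0  # the small group of servers on the new configuration

# A ratio above 1 means the new configuration spends fewer units per transaction.
ratio = usage_config_one / usage_new_config
improvement_percent = (ratio - 1.0) * 100.0

print(f"ratio = {ratio:.2f}")
print(f"apparent improvement = {improvement_percent:.1f}%")
```

The ratio rounds to 1.09, which is where the "about 9%" average improvement comes from.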
I would say, wow. Somehow this little change improved the average performance by 9%. Are we going to do that to all the servers? That's a question for you. Why not? Why not try it out, maybe it works? Changing a large number of servers for this configuration change is serious. Guess what: the person that discovered it is pushing, "we need to switch, go for that." But then the person that actually has to do it asks, "Are you sure?" Then we have another group of people that say we need more data. But exactly what more data?
What do you need? That is the question. How do we break down the need for additional data? How would you think about it? It may not be a sound basis for changing all the processors, because we don't know whether the load behavior on all the processors is the same. If you have an enormous amount of variation, then you could actually be doing some harm. Exactly. Large numbers are not going to help you unless the population variance and mean are the same all the
way across. That is one thing we check. Indeed, this is a good point. So when we look at a large amount of data, we are not just looking at one kind of workload and saying these are all applications doing about the same thing. We have a question: are they processing the same kind of transactions? Are they executing the same kind of transactions on this piece of software? In the case of Amazon, say, are we checking items in to the shopping cart, or are we doing a search of items? Even though these operations are happening on similar software, the
behavior will be different. So to drill down into the behavior, we need a lot of data. But asking for a lot of data will be very demanding for a lot of people. We want to find some compromise. We don't want to get too many people to do too much work, but we want to get some sense of how good it is. So based on the operations of the apps, we divide the apps into three groups. What is interesting here is the breakdown into the three groups. On the right here.
We are showing three numbers. For app group one, we moved from an RUE of 1289 to 1484. For group one we are increasing the utilization of resources for this whole group, and we are losing 13% of the efficiency. For app group two we see something similar; we are losing about 1%. It's not too bad. And for app group three we are losing about 16%. And we call this a paradox, leading to the next slide. So what we are seeing here is this: when we look at the overall impact, it seems very positive. But when we were able to break it down
into different components (sorry, I'm going to make it fast), when you break it down into different components, we actually lose performance. Unfortunately, this is not something new; people have observed that over the years. Some of you might have heard about Simpson's paradox. Has anybody heard about it? This has nothing to do with boxing. Simpson published an observation in 1951. He said that sometimes, when we look at a trend, the overall trend is the opposite of the individual trends.
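A small numeric sketch may help show how an overall average can improve while every group gets worse. All numbers below are invented for illustration and are not Alibaba measurements; the group names and helper functions are likewise assumptions. The only point is that the servers on the new configuration happen to handle a different mix of transactions than the rest. The metric is resource units per transaction, so lower is better.

```python
# Hypothetical numbers invented for this sketch; they are not Alibaba measurements.
# Each entry maps an app group to (transactions processed, resource units consumed).
old_config = {
    "group1": (900, 1_080_000),  # 1200 units per transaction
    "group2": (500, 400_000),    # 800 units per transaction
    "group3": (100, 40_000),     # 400 units per transaction
}
new_config = {
    "group1": (10, 13_800),      # 1380 units per transaction (worse)
    "group2": (20, 16_160),      # 808 units per transaction (worse)
    "group3": (70, 32_480),      # 464 units per transaction (worse)
}


def units_per_transaction(transactions, units):
    """Resource units spent per transaction; lower is better."""
    return units / transactions


def overall(config):
    """Pool all groups together, as an overall average would."""
    total_transactions = sum(t for t, _ in config.values())
    total_units = sum(u for _, u in config.values())
    return units_per_transaction(total_transactions, total_units)


def per_group(config):
    """Compute the metric separately for each app group."""
    return {name: units_per_transaction(t, u) for name, (t, u) in config.items()}


old_groups = per_group(old_config)
new_groups = per_group(new_config)

print(f"overall: old={overall(old_config):.1f} new={overall(new_config):.1f}")
for name in old_groups:
    print(f"{name}: old={old_groups[name]:.1f} new={new_groups[name]:.1f}")
```

In this made-up data the pooled number drops on the new configuration only because most of its transactions fall in the cheapest group, while each group on its own spends more per transaction. That is the pattern Simpson's paradox describes.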
So today I want to spend some time to explain what we are observing. How did it happen? And by the way, I apologize; I did not realize I'm running out of time. So if I don't get a chance to go through all the slides here, and if you're curious about how to explain this to other people, I'm happy to talk with you about it after the session. When we turn on the feature, I put the color blue there. So blue is when this feature is turned on, and the color green is when the feature is off, the default.
We are checking whether the increase in performance is real, and we found out that the per-group comparisons show a reduction in performance while the overall goes up. Now some of you need to get ready; you need to go back to your high school mathematics. In this case, on the x-axis we are looking at the number of trials, and on the y-axis is the number of successes. And when we are comparing performance, we are comparing the slopes of the lines. When comparing the slope of the blue line versus the slope of the green line
in this case, the slope of the blue line is better than that of the green line, so there is better performance. In the interest of time, I'm going to skip a few slides and go to the last slide. When you are comparing the dark blue line versus the other line (it is supposed to be green, sorry about that), for the navy blue line there is a performance increase, as indicated by the red arrow. But in the individual comparisons, the left red arrow and the right red arrow, there is a decrease. So we need to correct for this kind of misunderstanding. We will try to avoid making a
decision if this kind of symptom occurs. So I just want to summarize the three big problem groups we try to avoid. The first thing is that CPU utilization may be more complicated than it looks. And CPI is not simple either. The last thing is that the average performance change from a lot of data might also be misleading. You probably think this sounds really complicated. Remember one thing: when people are taking ratios, watch out. If you really want to try our data, we just published the Alibaba cluster data for 2018. We just released it two months ago.
You might decide to publish a paper, a little bit like the papers that came out of our 2017 data, and one of them got the best paper at OST. I want to cut to this slide to summarize, and you can read about it. And if you have any questions, I can address them here, or we can just go out. I will stay outside to discuss some other interesting problems, or you can share yours with me as well. Thank you.