Over 15 years of hands-on experience, from engineer through principal architect level, across private, public and hybrid cloud environments for highly distributed and scalable systems. Excellent hands-on technical, architecture, leadership and management skills. Strong troubleshooting, analytical, planning and communication skills. Extensive experience in leading global DevOps, NOC, large-scale automation, 24/7 production operations and datacenter teams. Full-stack performance tuning, site reliability engineering, capacity and cost management, alerting, monitoring and metrics experience.
About the talk
This talk introduces the concept of cost/efficiency control with GKE autoscaling. Together with our customer and design partner OpenX, we show the story of tuning and controlling infrastructure utilization while balancing cost through use of the GKE autoscalers. Additionally, this session provides live demos.
Speakers: Jerzy Foryciarz, Ivan Gusev, Joel Meyer
Google Cloud Next ’20: OnAir → https://goo.gle/next2020
product: Anthos, Kubernetes Engine; fullname: Jerzy Foryciarz;
event: Google Cloud Next 2020; re_ty: Publish;
Welcome. My name is Jerzy Foryciarz. I'm a product manager on GKE, and I'm joined by our guests from OpenX, including their principal infrastructure architect. In the next 25 minutes, we will show you how OpenX looks at performance and cost, and how to become a hero by metering and understanding your utilization. What does that mean? Let's get started. When we talk about cost-to-performance optimization on GKE, the ultimate amount of your spend on the infrastructure depends on the provisioned resources, but it is important to understand your application.
How does it operate, and what do you expect from the service it represents? Is your application stateless or stateful? Is it serving or data processing? How long does it take to start, from launch until it is active, and how much time does it take to drain? The second factor is the expected level of service you are providing. What is the latency of the service that you expect? Can you afford a longer startup time if this means lower cost? Do you need to provision for high availability?
So how do you do this in practice? Let's hear the real-life story from Joel and Ivan from OpenX. Thanks, Jerzy. So these numbers represent OpenX in a nutshell. Prior to our migration, we were serving a hundred billion requests per day, running 15 million lines of code on more than 15,000 servers around the globe in five different regions. Now, OpenX is an ad exchange, and those billions of ad requests are coming from a multitude of clients, with an SLA between 150 and 400 milliseconds depending on the client. The services that handle those requests
are our delivery system, which comprises about half of our infrastructure cost. As you can see from the load balancer stats on the screen, we have both a daily pattern, with peaks and troughs, and a weekly pattern as well, with lower traffic on the weekends and higher traffic during the week. Prior to migration, we had a pretty traditional deployment strategy. Services were packaged as RPMs, and they were deployed via SaltStack on physical hardware, but we knew this was limiting our efficiency. So when we moved to the cloud, our goal was to modernize for the purposes of
gaining efficiency, scalability and consistency. But we also knew we didn't want to be running Kubernetes ourselves. We evaluated the different offerings from the competitors, and it seemed like GKE was probably about two years ahead of everybody else, both in terms of features and stability. Early on, Ivan looked at the CPU utilization of all of our services, both peak and average, to get a sense of how much compute we would actually need in the cloud. It was pretty interesting, though. The x-axis here is peak utilization and
the y-axis is average utilization. Now, in an on-prem environment your server count is fixed, so unless your service is running at full tilt all day long, the gap between peak and average is basically wasted compute. With our diurnal traffic pattern and our weekend patterns, we had a lot of wasted compute. The graph also illustrated for us that we could save significantly on compute cost if we could buy only what we needed throughout the day and could share hardware with auxiliary services. Now, keep in mind, this is only one dimension, CPU utilization. We're not looking at memory or I/O.
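The peak-versus-average reasoning above can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the utilization samples are hypothetical, not OpenX data, and `idle_share_of_peak_sized_fleet` is a name made up for this example. It assumes a fleet sized for peak load, as in the fixed on-prem case described in the talk.

```python
def idle_share_of_peak_sized_fleet(utilization_samples):
    """Return (peak, average, idle_share) for a series of utilization samples.

    idle_share is the fraction of a fleet sized for the peak that sits
    unused on average: 1 - average / peak.
    """
    if not utilization_samples:
        raise ValueError("need at least one sample")
    peak = max(utilization_samples)
    if peak <= 0:
        raise ValueError("peak utilization must be positive")
    average = sum(utilization_samples) / len(utilization_samples)
    return peak, average, 1 - average / peak


# Hypothetical samples for one service (fraction of CPU in use).
samples = [0.2, 0.4, 0.8, 0.6]
peak, average, idle = idle_share_of_peak_sized_fleet(samples)
print(f"peak={peak:.2f} average={average:.2f} idle share={idle:.1%}")
```

A service whose average sits close to its peak gains little from following traffic, while a service with a large gap is a natural candidate for autoscaling.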
Memory and I/O aren't captured here. The services circled there are the delivery system. We knew that these were the candidates for GKE, and that moving to GKE would enable us to scale compute based on our traffic patterns and needs, and to gain efficiency by optimal packing of the services on hardware. Let's take a brief look at some of the before and after. On-prem, services were rarely restarted: you would deploy, you would start, and it would run unless there was a problem. Services also didn't grow and shrink
in number, whereas in the cloud the pods were very dynamic, spinning up and spinning down, and the cluster size was also dynamic, growing or shrinking. So we were going from a very static environment to a very dynamic environment, and that's what we wanted, but it also introduced some challenges for us. The other difference worth noting: on-prem, hardware is dedicated to a service, and as a result that service was tuned for maximum utilization of that server. In the cloud, you're optimizing your pod for the resources that you've allocated to
it, and then you're optimizing your cluster for overall utilization. What that means, practically speaking, is that we weren't just migrating from one containerized environment to another; we were moving from a very server-based model to a completely unoptimized container model, and that added a lot of complexity. But it did mean that we could go from using the fewest servers possible at peak to a world where we could provision only what we needed at any given time, and that was our definition of victory. We had an idea of what was technically possible in Kubernetes, but we
still needed to translate that into a cost model, and to accomplish that, Ivan did some pretty extensive modeling. There were a couple of dimensions. First, he modeled the type of compute that we would use: would we be using on-demand, committed use, or, most cost-effectively, could we use preemptible? Second, he translated compute to hosted services. In some cases, rather than running Riak KV on VMs, we decided, let's take advantage of Bigtable; let's take advantage of load balancing. And finally, he adjusted the amount based on our
actual utilization. So that model gave us a good idea of what we could get to. Actually getting to that target efficiency is the journey that we'd like to share with you. Before we go on the journey of autoscaling, let's recap what we're dealing with. In Kubernetes there are two layers of abstraction: the layer where you deploy your pods, and a cluster layer where you deal with nodes. Google Kubernetes Engine provides automation at both of these layers so that your service works optimally all the time. The
HPA (Horizontal Pod Autoscaler) adds and removes pods of the same type based on utilization metrics, which can be CPU or custom metrics like requests per second or the number of tasks in a Pub/Sub queue. It suits the cases where more requests call for more machines, which would likely be the case if you are serving a front-end application, a back end, or a pool of processing workers. But what if the amount of memory or CPU is not appropriate for your process, or your system cannot scale easily by adding more machines, a database or file system as an example?
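To make the horizontal scaling behavior concrete, here is a minimal Python sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default in Kubernetes). The function name, the replica bounds and the sample numbers are assumptions for illustration; the real controller also handles pod readiness, missing metrics and stabilization windows, which are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA calculation (illustrative, not the real controller)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (hypothetical values by default).
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical example: 4 pods at 90% average CPU against a 60% target.
print(desired_replicas(4, 90, 60, max_replicas=20))
```

The same calculation scales down when the metric falls below the target, which is how replica counts come to follow a daily traffic curve.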
This is where the Vertical Pod Autoscaler comes to the rescue. It monitors resource utilization and allows for adjusting the pod requests to either improve performance or reduce unused resources. Similarly, on the infrastructure layer, the cluster autoscaler automatically adds and removes nodes in the cluster. This operation happens in response to a request by the scheduler. And node auto-provisioning decides on the type of node pools on your behalf, automatically. This way, the cluster autoscaler
can always keep the most optimal, meaning cheapest, configuration of your cluster. So, in summary, GKE is offering four dimensions of autoscaling to help you optimize cost and performance of your workloads. So, how did OpenX use them to optimize for cost? Thank you, Jerzy. Awesome. Freight number of performance. For most workloads, CPU utilization targets work best as change to be constantly adjusting if a customer has used mobile company. If your brother or network Bob, remember
process? When department is scaled down. What are the circular normally takes care of if you need an additional control station. When a star has one or more pictures of the fast awesome notes schedule. It also takes into account and estimated cost of water size and fees and will attempt to increase sequentially. Some kind of false or stuck out in and feed the Texas. Who started this very prescriptive model, when you find exactly how ever, let anyone at the station or no. Application form for now. One day I allowed the Kloster
autoscaler for to use less memory and less. The autoscaler also offers a fast scale-down autoscaling profile, which removes nodes much more rapidly from your cluster. Joel, back to you. Thanks, Ivan. Ivan just described the benefits of the Horizontal Pod Autoscaler and the cluster autoscaler. Once we had implemented both of those, our compute usage began to follow traffic patterns, both the diurnal daily pattern and the weekly pattern, which you can see on the slide there. That's the beauty of autoscaling: when it's enabled at the
application layer and at the hardware layer, it translates into cost efficiency. I'll take a brief side journey to talk about measurement. Doing optimization without measurement is nearly impossible. You need visibility into your starting point and where your resources are being consumed. You also need the ability to measure improvement. You want to understand your pod utilization: how are they performing? You want to understand your node utilization: how many are being used, and to what extent? You also need to understand your overall cost, because at the end of the day that's what
matters. So we really love to measure everything, and we used a combination of different tools to accomplish that. First, we used Prometheus to measure service metrics and application metrics. All our services are instrumented, and Kubernetes offers those metrics out of the box as well. Then we also used the metrics that GCP makes available, and there's an entire separate talk on that topic, which we encourage you to check out. And finally, we used GKE usage metering and dashboards to see which services were consuming the most resources and cost. When it comes to optimization, your destination is the model
you've created, and measurement is your GPS that tells you how far you've come and also how far you've got to go. Let's talk about some of the next stops on our optimization journey. Make sure you understand the ratio of RAM and CPU needed for the application to operate at a certain throughput. In some cases, you may also need to tune the underlying technology. Resources for utilizing the request. Request request. GKE provides the Vertical Pod Autoscaler, which does three things. It watches the resource usage of the selected pods.
It offers recommendations on what the actual resource requests specified in the pod spec should be. You can apply the recommendations yourself, similarly to what OpenX did with their workloads, or you can also enable the autoscaler to apply them for you; this can happen in two ways. Another GKE feature enabled us to optimize performance: CPU manager, which Ivan is going to tell us a little bit more about. Just like the noisy neighbor effect on virtual machines,
exist for Innovation, and on the same CPU for all the time and arms walked out or competing for CPU with other up. Orseund Iris like this request. While resource requests are only used by the Kubernetes scheduler to make informed decisions on where to schedule pods, continuously prohibiting all other pods from stepping into territory. The flip side of CPU manager is that pods cannot exceed their requests and are effectively sandboxed in terms of CPU capacity. For our online systems, that's exactly what we wanted. So with autoscaling enabled
and our pods tuned, we still needed to tackle another area of optimization: how do we make the most use of a cluster that has a diverse set of services running in it? In short, how do we optimally bin pack? There are two parts to that challenge. One, how do we size our pods so that they are more easily packable? And two, how do we size our nodes so that they can accommodate the right number of pods? We have some control over both dimensions, so we needed to find a sweet spot. The Kubernetes scheduler takes several pod properties into consideration to make a decision on what node a pod gets
scheduled to. The primary resources are CPU and memory. Brother works at 1. This is not possible and poldark on the list crafts. You can see that even though I'm just scheduled Radiology memory capacity, Inefficiency, and make sure each node and fit exactly. This is not always ideal capacity requirements and ability to. When doing so, make sure that you leave enough space reserved for daemons and the kernel. Also, don't forget that other containers' resource requests might be affecting scheduling.
Generally speaking, bin packing works better when pods request small fractions of CPUs while nodes are relatively large. Yes, if you're using preemptible virtual machines, you have to watch for losing a large number of pods at once. Measuring the impact on efficiency is quite simple, as you only need to look at the requests. We usually got around 75% of CPU reserved, while clusters with improved bin-packing efficiency reached all the way to 95%. Similar calculations can be done for the memory dimension.
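The CPU side of that measurement can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OpenX's tooling: the node size, daemon overhead and pod requests below are hypothetical millicore values, `packing_efficiency` is a made-up helper, and only one dimension (CPU) and one pod shape per node are considered.

```python
def packing_efficiency(node_allocatable_m, daemon_requests_m, pod_request_m):
    """Return (pods_per_node, share of allocatable CPU reserved by those pods).

    All values are CPU millicores. Space requested by daemons is set
    aside first; the remainder is filled with identical workload pods.
    """
    if node_allocatable_m <= 0 or pod_request_m <= 0:
        raise ValueError("node size and pod request must be positive")
    free_m = max(node_allocatable_m - daemon_requests_m, 0)
    pods = free_m // pod_request_m
    return pods, pods * pod_request_m / node_allocatable_m


# Hypothetical 16-core node with 1 core requested by daemons.
for request_m in (3500, 500):
    pods, share = packing_efficiency(16000, 1000, request_m)
    print(f"request={request_m}m pods={pods} reserved={share:.1%}")
```

With these made-up numbers the smaller request leaves less stranded CPU per node, which is the effect described above; the trade-off is that each node then carries more pods that can be lost together.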
However, memory generally requires significant headroom and its usage is less predictable, so our reservations are around 50%. Optimizing in more than one dimension is totally possible, but we decided to leave this exercise for the future. When we were doing our cloud cost modeling, Ivan picked out services that could take advantage of preemptible virtual machines. For the delivery stack, we assumed that roughly 70% of our compute could be preemptible. Now, when we first tested, we were
discouraged, because it was hard to get the PVM capacity we needed. But after discussing with the GCE team, they encouraged us to try a smaller-size PVM node. After making that change, we were much more successful in finding the PVM capacity, and we've even exceeded our initial target PVM utilization. But it is another factor to consider in bin packing. Also, the GKE cluster autoscaler allows us to tap into different PVM pools by attempting to scale up different node pools. Those could be node pools of different VM sizes or using different CPU families.
While we presented our journey as a nice linear progression, it was a bit more winding, with some back and forth between the different optimization techniques. That being said, the six approaches we've discussed are all worth looking into. First, configure and tune your Horizontal Pod Autoscalers. Two, enable the cluster autoscaler. Three, right-size your requests and/or use the Vertical Pod Autoscaler. Number four, use the CPU manager for the applications that need that level of guarantee. Number five, optimize your bin packing. And then number six, find the right
machine type for your workload. Thanks to the work that Ivan and the team have done, we're now able to run our GKE clusters on a very small set of node pools that scale based on price and availability. And thanks to the optimized pods and the right choice of machine sizes, we've improved efficiency by over 45%. One of the things that was invaluable to the success of this project was the ability to meet with and ask questions of the different teams whose technology we're using, whether that was Jerzy and the GKE autoscaling team, the GCE team, or any one of
the other teams that we met with. We were really impressed by their willingness to work with us and help us use their technology. We also took advantage of the GCP PSO team to help us in various parts of the migration, and found their help to be very valuable as well. Thank you particularly to will be be. Jerzy, thanks for allowing us to share some of our story here with you and the audience. Good luck with your own optimization efforts. Jerzy, back to you. Thank you for sharing your experience. For those willing to get on a journey
of optimizing cost to performance on GKE, we have published best practices, including how to use provisioning to optimize your resources. This guide collects the experiences of OpenX and many other companies who shared their stories with us. We will keep updating it as we go. You should also take a look at related breakout sessions airing this week. On behalf of the team, I want to thank you for watching.