Video
Real user performance monitoring at Netflix scale ‐ Martin Spier

About speaker

Martin Spier
Performance Engineer at Netflix

For the past 14 years Martin's career has revolved around technology and performance engineering, leading major initiatives at Netflix, Expedia and other companies. Currently, as a Performance Architect at Netflix, Martin is responsible for improving the performance of the Netflix service, end-to-end, for its 167+ million users, who watch hundreds of millions of hours of movies and TV shows every day. Martin is also a Venture Advisor at monashees+, one of the largest venture capital firms in Brazil, an angel investor, an advisor to multiple startups, and an avid open source contributor.


About the talk

Martin Spier is a dynamic Netflix speaker. He spoke about how performance is, almost literally, the business of Netflix. Martin shared insights about how they track user performance globally. This was an awesome session! A very interesting story and some useful information about the tools and methodologies Netflix uses to usefully aggregate enormous quantities of high-dimensional data.


I'm a performance engineer at Netflix, down in Los Gatos, California. That's basically where we do all product development, so all the engineering and everything that is necessary for the product to work. I'm here to share a little bit about why we actually developed a real user monitoring solution, a few of the gotchas, a few of the issues we had to face during the development of that, and how it really helped us improve the experience for the user. Before going into that, first a quick show of hands: how many of you are

customers, or at least know how Netflix works or what it does? So I don't have to go through all that stuff; that makes it a bit easier for me. I mentioned I work in performance, right? And when I'm talking about performance, size does matter. Scale matters, because only at scale will you surface a lot of issues. Everything breaks at scale. And there's nothing better for talking about scale than numbers, for sure. Here are a few numbers, just so you have an idea about the scale that we are

talking about. We just closed last quarter with over 139 million subscribers, and that's worldwide. Netflix works everywhere, in every single country on the planet except a few countries under embargo and China. So with the mobile app, you can open your account in Bermuda while you're traveling and you'll get exactly the same experience, just a bit different catalog. It's a global product. Those subscribers watch hundreds of millions of hours of TV shows and

movies every single day. I don't have the peak numbers from last year, but in 2017 I think our peak day was over 340 million hours of content in a single day. So a lot of video, mostly HD and 4K, so a lot of video and a lot of bits. I remember some publication a while ago mentioned that at some peak hours we consume more than a third of all internet traffic. So, a lot of bandwidth. Most of those video bits get sent to the users through our own CDN that was purpose-built for that, and that CDN handles tens of terabits a second

of bits, just for video. It's something we have to deal with daily. As for user devices, we have tens of millions of devices that talk to our back ends every single day. We have a lot more registered devices, but these are just the devices that need to talk to our back-end services every single day. And they're not just the normal devices you might need to handle, like iOS, Android, or web. We have to deal with a lot of weird stuff. We need to deal with smart TVs, we need to deal with

Chromecasts, PlayStations and Xboxes, and some weird cable TV box in Belgium running Windows CE. We need to deal with that. There's a Netflix app running on that and we have to make it work well for users under those conditions. So, thousands of different models of devices. But when I'm talking about performance, it's performance of what, exactly? Performance of the product. And for us, different from a few other companies, we don't have a separate business or product organization and

an IT one. We have one large product organization that takes care of all product development, so all engineering happens there. And for us the product is simple: it's the streaming service that some of you know and everything that is required to make that work. So all the back-end services, all the storage, all the infrastructure, networking, the code running on your phones, everything. Everything that is required to deliver that experience to the user. Our goals in the team are pretty broad. Performance is a pretty broad term, I would say, at least for us. But the first thing:

we've been growing a lot, so we want to make sure that our service can scale with our user growth. We don't want to get to the point where, okay, yeah, we're kind of screwed, we can't take more users. Generally not a good thing to do. We want to make sure that we can keep growing and keep scaling, right? We also want to make sure that we are growing our costs sublinearly compared to our user growth. We don't want to grow costs faster than we grow our users, right? So there's a lot of work on the back-end services and a lot of other things just to make sure that we are not growing

faster than the users. It hasn't been that much of an issue, but it is a big portion of what we do. And primarily, one of the most important things: we want to avoid that spinning thing. Terrible experience for the users. You decide to watch Netflix, you open the app on whatever device, and that spinning thing just keeps spinning there until you give up and go do something else. Terrible experience. We want to make sure that it goes as fast as possible. You guys know that: better performance, better customer experience, better retention, better acquisition, and all that. The

same applies to playback. You finally found something to watch, you click play, and that thing keeps buffering, takes a long time, and doesn't go anywhere. Terrible experience. Plus, once you start playing, there are a lot of things we want to avoid happening. So we work a lot with the CDN team on trying to optimize that and everything that is required. We also work a lot on reliability. It doesn't matter if it works fast if it doesn't really work at all. So if you try to do something and it just doesn't work, you click a button and get: "Whoops, something went wrong. Try

again in a few minutes." Terrible experience, especially if it happens frequently. And given our scale, and that we are a consumer application, whenever something goes down, especially for a large number of devices in a larger region, you guys have no idea of the outcome and what we see on social networks and everywhere else. Things explode really quickly. For some people the world just ends when Netflix goes down; they have to sit down and talk with their relatives and so on. Luckily it only

happens rarely. Everything going down at the same time globally is pretty rare these days. And we do that, at least on the performance engineering side, all the way from the kernel. We do a lot of tunables, we work on a lot of modules. We have hundreds of thousands of instances running the Linux kernel on Ubuntu, so we do a lot of work on that side, all the way to the code running on your device and everything in between. So it is a fairly large scope for the engineers dealing with all of that, and focus tends to shift often. In the

past I worked a lot on back-end services, trying to optimize those. I built a lot of tools, and a lot of them are open source, so in case you guys want to try them out, there's a lot of stuff we built in the team that you can use. But the focus of today's topic is really the device. So, how do I optimize that? How do I improve the user experience? Independent of whether it's back end, front end, devices, or whatever, the methodology I use to optimize something usually starts with trying to understand what's going on, trying to

understand what's going on in the system, the devices specifically in this case. And I can't really understand what's going on if I can't see what's going on, if I don't have the right metrics. So it's really important for us to pull back the right metrics, the correct metrics, so we can understand what's going on and can go and optimize things. It's important to first be able to quantify performance. Putting a number to it is super important for automated systems, detection, alerts, and all that stuff. It helps a lot when I'm

comparing something, to see the progression over time, whether things are getting better or getting worse. Putting a number to it really helps and makes things a bit easier. So that's important. The way a lot of people do that is to just have a lab with a bunch of devices and use some sort of test harness that will try to reproduce some sort of workload there, and maybe try to reproduce some network conditions. But for us, that just doesn't work really well. We have just too many variables, just too many devices,

too many network conditions, both conditions we know of and conditions we just don't know about, to rely only on labs. We do have a bit of lab testing, but we really want data from real user devices. We really want to understand what's going on at the user. There's a lot of stuff you don't even think of, like a user's Android device using a lot of memory, so we're getting flushed all the time. There's a lot of stuff you don't even think could happen that really happens in the field, and you can't reproduce that in the lab. So, simple, right? There are a lot of

third-party services today that do exactly that. You go to a website, you sign up, you get a JavaScript snippet, you add it to your web page, you add a timer, and you start getting your nice dashboards and metrics. You didn't have to do much at all. Unfortunately, for us that doesn't work. It's not that simple. Besides the issues with scale, and most of those services will not deal with our scale, I mean with the amount of data we have in mind, we also have the issue that most of those just work on a handful of different devices. They would not run on Windows CE. We have to deal

with that, and I don't want to compromise. I don't want to just ignore a few devices and focus on a few. I really want to have a good grasp of everything, of what my users are feeling. I don't want to neglect anyone. Plus, Netflix is also everywhere, so I have to deal with a lot of different network conditions. Devices will perform very differently under different network conditions, and Netflix is a very network-intensive app. So I need to take that into account in the logging bit, since I can't just keep sending beacons all the time because that will cause a probe effect,

and I also need to be able to slice and dice and try to pinpoint: okay, this is only happening on that tiny ISP in Cameroon when the user is talking to the wrong AWS region. I need to be able to tell that. So we decided to build our own, which is not uncommon for Netflix; we build a lot of tools internally. We had a few requirements in mind. First, it needed to work everywhere. I don't want to neglect devices, so it needs to work on all our devices, thousands of them. I need low

resource utilization. Your phone, your laptop: super powerful, it doesn't matter what you do. Smart TVs, cable TV boxes: not that much. We're fighting for kilobytes of memory there, so every little bit counts. So I need something with very, very low resource utilization, especially memory. It needs to deal with different network conditions, right? I can't just keep sending beacons. On the back-end side, I need something that can handle that amount of data: those tens of millions of devices and users. The sessions tend to be very long, normally hours and hours

long, so a lot of actions, a lot going on there. So I need to be able to handle that amount of data, ideally with something that scales horizontally to support our user growth. I'm planning to do a lot of analysis on top of that data, so I need some sort of fast query. I want to query something and get a response, then I want to change that and do it again. It is a very interactive process, and I need something fast or I waste a lot of time. I can't just issue a query to a data warehouse and maybe a couple of hours later get a report back. That just doesn't work for us. I need

something quick. And last but not least, something that is flexible, something that can evolve with time. I talked about real user monitoring, but what exactly are we monitoring in this case? We decided to start simple, as with everything. We just want time, wall time, for a user to do a certain action. And these are real user actions: opening the Netflix app on that specific device, or clicking on the play button and waiting for the title to start playing, or maybe opening the details page, where you click and then you see the

description of the movie. So these are real user actions. We start simple with wall time. But even just measuring that, where does the timer actually start and end? There's no standard there, right? At least not for an app, for something that will track what the user is doing. At the end of the day, it really depends on what the user is trying to do, what he's trying to accomplish. So we started thinking from a user perspective and tried to capture a few of the feelings he has doing that action. When does he get feedback from his

action, that something happened, right? He at least got some sort of feedback. When can he continue doing whatever he was doing? It might not be fully loaded, but he can continue. For example, you open a list of movies: you might see the title of the movie, you might see some description, but you don't see the box art yet. You can already start reading, so you can continue doing what you're doing. And when is it actually done, when everything has rendered? We couldn't find a single number that

would actually track all of that across all actions. So we basically created different timers that are very specific to each user action, and we ended up with a lot of timers. The ones we have probably don't make sense to you, but just to give an example: most of the navigations inside the app, from one page, one view, to another, will have at least what we call TTI and TTR, so time to interactive and time to render. For TTI, we track the time from the user initiating the action, clicking a button or something, up until he can start doing something again. Not exactly the

same as the browser's TTI, but something similar to that, so he can start reading the titles, and then we end the timer there. Time to render is when everything is rendered above the fold, so whatever you can view without scrolling. So, different timers for different user feelings. For some actions those don't really make sense, right? So we have a different timer, for example, from clicking play until the first frame of the video is visible to the user and he can actually start watching the title. We spent a lot of time discussing what we should

track or not. Cool, hard part done. We know what we want to implement. Now how do we actually start doing that, getting our hands dirty and getting that out in the field? So we started simple. Let's just start with a few very simple libraries that we can use on all our devices. We needed at least one for each kind of device or so, so we ended up creating a few, and they do really simple things. They do some sort of session tracking, and they implement the logic around the timers, starting them and creating those events. They implement some

sort of ring buffer so I can keep buffering all those messages, and some sort of logic to decide when I should flush those events to our back end. Hey, is the network busy right now? Should I wait a little bit? That sort of thing. So we created a bunch of those libraries. That was pretty straightforward, nothing too complicated there. The hard part was actually implementing the timers on the devices, not just because there are a lot of actions and a lot of devices, but also because a lot of devices have nuances to them. Some TVs, for example, these days

support suspend, so I basically just suspend the app. And when I bring it back, is that an actual app startup or not? We need to decide that, and we need to find the right hooks in the code to mark the start and end points. So that took quite a while. It took quite a bit of time to actually get that right, but we spent the time and it was definitely worth it, because by the end of it we knew that we were capturing something that was meaningful to users, and we could actually rely on those numbers. So that was definitely worth it, but it took a

bit of effort. Cool, those devices are generating a lot of events, so how do we actually receive them on our back end so we can analyze them? Whenever possible, I want to avoid reinventing the wheel, so I decided to piggyback on something we already have. We have our internal data pipeline that we use for a lot of other things. It is tried and proven, meaning it scales really well. I don't want to get into details of the data pipeline, but if you're interested in that bit, there's a link here that goes to the blog post that

goes into really detailed bits of the data pipeline. So we piggyback on that to send all the events from the devices there. In a nutshell, depending on a few rules that we set, some of them end up going to real-time analysis, alerts, and this sort of thing. Some of the data gets indexed in Elasticsearch. But the bulk of it ends up in S3, Amazon's storage service. We dump everything there so we can analyze it later. That's the back end and

everything, and where I'm going from on the next few slides is receiving and dumping everything into S3. As soon as we started analyzing all that data that we were getting back from the clients, we started noticing things like this. You guys are probably familiar with that kind of distribution, right? A multimodal, noisy distribution. That was good on one point, because we noticed that we were capturing different modes of what the user was perceiving. But on the bad side, this is a pain, especially for automated systems. It's

not super hard, but it just complicates things a bit more. This basically happens because we have a lot more dimensions than we initially thought, way more than usual. So we started splitting things until we got a bunch of those nice-looking guys here, nice normal distributions. And as you can imagine, we ended up with very high-dimensional data. We have a very complex device categorization system: which hardware the thing is running on, which platform it is running on, which UI is running on that thing, which version of the UI,

which version of all those libraries there. So, very complex device categories. We split things by geography: regions, sub-regions, countries, areas, cities, and so on and so forth. We have a very granular network classification dimension: ISP, which Amazon region you're talking to, which CDN box you're talking to. So we need to classify by those too. As you can imagine, we have a lot of A/B tests, and those can affect the user experience, so we need to split by A/B test and A/B test cell too. So more and more

dimensions; we just keep adding. Some actions have more specific classifiers, like warm versus cold startup. They're completely different, so I need to be able to split those. Web pages don't have that notion, but they have first visits and subsequent visits, and those are going to behave differently, since I need to download different things. So I need to split those too. So we keep adding things, up to hundreds of billions of events every single day. So terabytes of data, just performance data, coming from these devices every single day, and that's just talking about wall

time. A lot of data. Capturing it is already a challenge; analyzing it is even trickier. As I mentioned, we don't want to reinvent the wheel, so the first thing I tried was to just use one of those third-party data analysis tools. We've been using a bunch of them in projects internally, and you get your nice dashboards and your graphs and queries, so you don't have to put much into it. So just load all the data in there. We decided to pre-aggregate all of

that into all possible combinations of dimensions, and it was still just too many rows for that to handle. It was very sluggish. So, okay, back to the drawing board. We needed to use something we really know and understand well, and that's Apache Spark, to get some stuff together. We were not aware of anything that could analyze that amount of raw data live with quick queries without costing us a fortune. So, okay, let's just stick to the pre-aggregation bit: pre-aggregating over all possible combinations of dimensions, which reduces things quite a bit. We run

Apache Spark. So Spark does that and stores all this data back into a Hive table on S3, an S3-backed Hive table. And for the aggregates, we started simple with just quantiles: your 5th, your 25th, your 50th, and so on and so forth. So we store that back, and then I can use Presto, a query engine, to query that data. I was getting my numbers back, even using Presto, which is quite fast. Okay, but what can I do to actually make that even better? Hence comes Druid. I don't know how many of you have actually heard of Druid

before. It's another open source project; we tend to use a lot of open source projects. It's a data store. You can imagine it as kind of a cache in front of a Hive table that can also do aggregations, so it can do a lot of interesting things for us. So we loaded all that data from the Hive tables, all those pre-aggregated quantiles, and we could execute the same queries, and that worked really well. We were getting the numbers we were looking for, right? But as you can kind of imagine, a lot of the analysis we do requires rolling up, right,

and a bunch of stuff. Say I want the median app startup time for all Samsung TVs, and we have a lot of Samsung TVs, so I need to go and roll those up. I have pre-aggregated data, and the way that's done is generally a weighted average of averages. And you know, that's not ideal, right? You can get some really, really bad numbers there. In most cases it looked okay, when we were getting a lot of rows and were looking mostly at the middle of the distribution, at your medians and your, like,

maybe 25th or 75th percentiles. That worked quite well. But once I go to the ends, the tail and the head of the distribution, or when I'm not getting a lot of rows, things can get really ugly. The numbers are off sometimes, and that can lead us to some really bad decisions. So I couldn't live with that. I just don't want the burden of people making bad decisions because of me. So how can I solve that and get more precision without having to go back to the raw events? Hence come data sketches. And that's the slide I like the most, because I get to put a

plug there. In a nutshell, a data sketch is a data structure that vaguely resembles a much larger data set, like a sketch, but it still preserves the basic characteristics of that larger data set. It is a lot smaller and a lot faster to operate on. It works really well if approximate results are okay, if you don't need the exact numbers. And for us, that's fine. I don't need the exact number as long as it's precise enough. There are a lot of data sketches

around. Yahoo has a few. We opted to go with t-digest, another open source project, which is really good for rank-based statistics like quantiles and histograms and all those. It works really well in those cases. It is really good too because it can be easily parallelized. All those individual sketches, which end up being just blobs, I can merge, so I can distribute that much, much more until I actually get to the final sketch, and then I can extract all the statistics from it. It gives a really, really fast and really precise

approximation. And the other cool thing about that is I can fine-tune it. I can make the sketch bigger and get better precision, or I can make it smaller and get less precision. So it works really, really well in cases like that. So I combined sketches with the previous solution. Spark now doesn't aggregate into quantiles anymore; it aggregates into t-digest sketches, into blobs, and sends that to Druid. Druid by default doesn't support t-digest sketches, so we had to write our own module to do the aggregation in Druid, which basically is

merging the sketches, then extracting the data, and then getting that back to us. That worked amazingly well. A lot, lot more precise, and still really fast: it still takes tens of milliseconds to get multiple queries back. This is basically the solution we ended up sticking to. But Druid has its own DSL for querying and its own interface, and I don't want to go and teach that. We're not the only consumers of that data. I have UI engineers working on that, I have guys working

with ISPs consuming that data. I have a lot of people interested in that data, so I don't want to go and teach them how to use the Druid query language to get the data they want out of it. So we ended up creating a nice-looking UI in front of all those things to make it a bit easier for them. Nothing fancy, but you can slice and dice by basically any dimension I mentioned before. You can group by anything, you can filter a bunch of stuff out. There are a lot of different visualizations to switch between, and you can

build dashboards, so it's pretty flexible. I don't have any videos of the app itself, but I'll just go through a few very common use cases that our users have. The first one is comparisons. You want to compare something: is something faster than the other? This is a simple line chart, with the passage of time along one axis and latency, or duration, on the other, so the higher, the slower. Here I'm only comparing medians for all iOS devices, and I'm breaking down by country. I only filtered for four countries. I don't know if you guys can read that there, but

I have Brazil, India, Mexico, and the United States there. If you look at this knowing the architecture, something smells funny here. At least that's what I thought when I was looking at this. It doesn't make a lot of sense. I know where my regions are, and it doesn't make sense that India is actually faster than Brazil. That's kind of interesting. I want to bring that up because if I just add another filter to it, if I filter by only a single device, the iPhone 6s Plus, things kind of change. India goes all the way up here. Can anyone guess

why that happens, why we see that thing? It's bias, right, in our case. The point is just that the UI is super powerful, but you need to know what you're doing there, because you can reach really wrong conclusions if you're not really careful. If you decide to go that route, pay attention to how much power you give users and how much they know about what they are doing. You can get to really wrong decisions. We have different visualizations, like heat maps and histograms, and I really like this inverted graph. It's really cool to

compare things that are kind of spotty all the way around. Basically, in this case you have the durations here and the percentage of transactions that finish under that amount of time here. So it's perfect for me; in some cases it makes it a lot easier to visualize and spot issues. We do the same thing with A/B tests. As I mentioned before, we like to compare A/B test cells. If I'm doing some sort of performance optimization on a device, I don't really release anything that can affect users without an A/B test first. So I launch an A/B test with

different cells for different levels of optimization, and I want to compare the cells and see which one actually gives the user better performance for that action. I can do that too using the UI. There are also a few very, very specific views. I mentioned warm versus cold startup. Sometimes I can optimize the time, so I can make cold startup faster, but I can also improve the rate of warm startups, which are a lot faster. So I want to keep track of those. We have a few unique things like that for our various cases that I added to the UI too.

Anomaly detection: very important. We found a lot of really cool things just browsing through the UI and issuing queries, and fixed a lot of issues. But at the same time we missed a lot of things too. We just have too many combinations of dimensions to analyze manually all the time; it's just impossible. So we invested a bit of time to get anomaly detection running on those time series, time series of all possible combinations of dimensions. You don't have to go super fancy. We started with just simple

standard deviation. That is the simplest thing you can do, and you'll already have some really, really cool results. We found some interesting things, and afterwards we started getting fancier and fancier. But if you have too many dimensions and you'd have to check them all the time, investing time in anomaly detection just helps a lot. We get the alert that something just looks funky, so we go in and analyze a bit further, and then we find a lot of things we would never have found just browsing. Almost there.
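
The simple standard-deviation check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation used at Netflix: the function names `find_anomalies` and `scan_all`, the trailing window of 24 points, the threshold of 3 standard deviations, and the dimension keys are all hypothetical choices made for the example.

```python
from statistics import mean, stdev


def find_anomalies(series, window=24, threshold=3.0):
    """Return the indices of points that deviate from the mean of the
    preceding `window` points by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # A perfectly flat history has no spread to compare against; skip it.
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies


def scan_all(series_by_dimensions, window=24, threshold=3.0):
    """Run the check on every time series, keyed by a tuple of dimension
    values (hypothetical keys such as (device, country)), and keep only
    the keys that have at least one anomalous point."""
    flagged = {}
    for key, series in series_by_dimensions.items():
        hits = find_anomalies(series, window, threshold)
        if hits:
            flagged[key] = hits
    return flagged


if __name__ == "__main__":
    # Made-up hourly medians (ms) with one spike injected at index 30.
    steady = [100.0 if i % 2 == 0 else 102.0 for i in range(40)]
    spiky = list(steady)
    spiky[30] = 200.0
    print(scan_all({("tv", "BR"): steady, ("tv", "IN"): spiky}))
```

Because each series is checked independently, a scan like this can run over every combination of dimensions without anyone having to browse them by hand, which is the point made in the talk. A real system would likely need to handle seasonality and sparse series, which this sketch ignores.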

So far I talked about just the time for a user to perform some sort of action. That was the beginning; we started with that. And with time we started adding more and more things, like method tracing and statistics around network requests, and those tend to be more high-frequency events, things that happen too frequently for me to just start creating events and putting them in the buffer. I would just fill the buffer right away and I'd be screwed. So what do I do with those high-frequency events? Take, for

example, network requests. I can capture a lot of stuff, like response size, SSL handshake time, DNS resolution time. I can capture a lot of stuff just from network requests, which is really important for us, especially for our teams working with ISPs. We do really need that information and we do want to gather that, but I can't just create an event for every single network request. So how do I deal with that? I can't send everything. Okay, you might just go with sampling, right? We're not big fans

of sampling. We don't sample at all; we just get every single event. We do that for a few specific reasons. Getting sampling right is very tricky, and plus, we don't want to miss rare events, things like, hey, this specific Android device on this tiny ISP in this tiny country. We don't want to miss that. We want to be able to find that data. So instead of sampling, we went with a different approach, which is

pre-aggregating things in the client and then sending that data back less frequently. And when I say aggregating in the client, I'm not talking about just calculating the average for the hour and then sending that back. As you know, averaging things is not good. I really want to see the whole distribution. So we went with this guy here: frequency histograms, which work really great in this case. I know Chrome uses the same approach for a lot of things, and

basically you have a bunch of buckets for specific ranges of times or numbers; for request size, it would be the size of the request. Every time an event happens within a bucket's range, you just increase the count and keep counting. Then every hour or so, or whenever possible, we ship that data back and then we analyze it. The cool thing about that is that, for our resource-constrained devices, it has flat memory use, and we can fine-tune that to the device. If I have plenty of memory, I can increase

the number of buckets and get more precision. If I don't have that much, then I have fewer buckets and less precision, but it's flat. It doesn't matter how many events I get; I just keep adding to the count, as long as the counters don't overflow, and that doesn't happen very fast. I just keep increasing the count. Very, very simple, so it works really great for resource-constrained devices. There is a gotcha here: memory depends on the number of buckets I have. Right now you can see a lot of

holes here. So that's just wasting memory, and we're maybe not getting enough precision where most of the action is happening. So how can we go in and improve that a bit? We borrowed the idea from Chrome (Jim Roskind has a great talk about it): we just use exponential buckets. We have really tiny buckets where most of the action happens, to have high precision there, and as you go toward the tail the bucket size increases. You don't lose that event, you're still capturing it, but you lose precision a

bit. But that works really, really well for us, so: exponential buckets. We use basically the same approach we had before: Spark to process things, store to Hive, load to Druid, a UI to query things, and then we get some pretty graphs like that. We have some nice heat maps. This is DNS resolution time. On the x-axis I have a few hours of the day, and then I have the DNS time, the time to resolve DNS, on the y-axis, and the darker the color, the more samples within that bucket. This is the US: very nice and uniform, good looking.
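The client-side frequency histogram with exponential buckets described above can be sketched roughly as follows. This is an illustration only, not the actual client code: the class and function names, the bucket parameters, and the sample values are all assumptions. It shows why memory stays flat (a fixed array of counters), how exponential bounds give narrow buckets near zero and wide ones in the tail, and how histograms shipped from many devices can be merged and read as a distribution rather than an average.

```python
import bisect


def linear_bounds(max_value, buckets):
    """Upper bounds for `buckets` equally wide buckets covering (0, max_value]."""
    step = max_value / buckets
    return [step * (i + 1) for i in range(buckets)]


def exponential_bounds(first, factor, buckets):
    """Upper bounds that grow by `factor`: narrow buckets near zero, wide ones in the tail."""
    return [first * factor ** i for i in range(buckets)]


class FrequencyHistogram:
    """Fixed-size counter array: memory depends on bucket count, not event count."""

    def __init__(self, upper_bounds):
        self.upper_bounds = list(upper_bounds)
        # One extra slot catches values above the last bound, so no event is dropped.
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def record(self, value):
        # bisect_left finds the first bound >= value, i.e. the bucket containing it.
        self.counts[bisect.bisect_left(self.upper_bounds, value)] += 1

    def merge(self, other):
        """Add another histogram's counts (e.g. from a different device) into this one."""
        if self.upper_bounds != other.upper_bounds:
            raise ValueError("histograms must share bucket bounds")
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]

    def percentile_upper_bound(self, p):
        """Upper bound of the bucket holding the p-th percentile (None if empty)."""
        total = sum(self.counts)
        if total == 0:
            return None
        running = 0
        for i, count in enumerate(self.counts):
            running += count
            if running >= total * p / 100:
                if i < len(self.upper_bounds):
                    return self.upper_bounds[i]
                return float("inf")  # fell into the overflow bucket


# Hypothetical DNS resolution times in milliseconds.
dns_ms = FrequencyHistogram(exponential_bounds(first=1, factor=2, buckets=12))
for sample in [3, 5, 8, 12, 20, 35, 600]:
    dns_ms.record(sample)
print(dns_ms.counts)
print(dns_ms.percentile_upper_bound(50), dns_ms.percentile_upper_bound(99))
```

With linear bounds over the same range, most of these samples would land in the first bucket or two and the rest of the array would sit empty, which is the "holes" problem the exponential layout avoids.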

Right? And then you go to the rest of the world, and things are all around. This is really important information for us, so we can go and slice and dice this even more, and we can start working to improve those. We can do things like this bubble chart. It's like a line chart, but then you have the size of the bubble, which represents the size of the population. This is, I think, transaction duration for a specific request for large US ISPs. On the x-axis we have hour of day, and then the duration, and you can

clearly see that at specific hours of the day we see degradation in most ISPs. This is generally due to congestion in that ISP, so we can go in and start working with those guys. I don't have to look only at times; I can look at counts too, and I find some interesting things there. This is the number of DNS resolutions per device, for a single user device, grouped by the kind of device. Here at the bottom is smart TVs; at the top, I think, is gaming consoles. This is an interesting issue we worked on a while ago, where those devices had issues with DNS caching, and that was

resulting in a lot more DNS requests than there should be. So we found that, worked on it, and made things a bit better. And that's about it. Do I still have time? Two minutes? Okay, cool. Let me show you something else, since I have time: what do I do with that information? We actually got a really nice map of the internet, at least with regard to how all the devices talk to our back-end services. So, how can I actually use this information directly to improve our

products? I can't just go to a developer and say, "See this number here? Go make it better." That doesn't work. Engineers are really great at solving problems if they can see the problem. Just showing a number is not a problem; it's a number. So how do I actually use that information to make it better for the user? To help our developers improve, we have what we call the Magic Bullet, which is exactly that. It is a physical box that simulates any ISP that we have

customers on, on the developer's desk. We gave it to developers internally so they can reproduce every single ISP on the planet and see how that feels. Let me see if I can actually play this video. So this is an actual device, and the ISP I'm simulating here is a high-speed one. I open the Netflix app and everything looks pretty good, right? The Netflix main page works fine, because this ISP is high speed. All good.

I clear everything, just to speed that up. Then I go and change that to a different ISP. That's an ISP in Cameroon, with different network characteristics; as you can imagine, it's not great. It's kind of tricky. Then we try to open the Netflix app, and we wait, and we wait a bit longer, and we keep waiting, and there's not much there. Eventually something will show up. Eventually. So what I want to bring with that is that this helps tremendously internally. We

gather this data, and we can use it not just to monitor things but also to bring that issue to developers. Once they see that, once they experience that, they feel it, and they can work to make it better. And at the end of the day it's not that hard in most cases. Sometimes they just make stupid decisions because they are used to the nice 500gb connection that we have in the office. Different worlds, and they need to experience that. And that's all I had. Let me go back to my previous slides.
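To make the gap between the two demo runs concrete, here is a back-of-the-envelope model of how round-trip time and bandwidth drive the time to fetch a single resource. It is only a sketch of the idea behind replaying measured network conditions, not a description of how the physical box works. The profile fields, the number of round trips, and every number below are invented for illustration and are not measurements from any real ISP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkProfile:
    """Hypothetical summary of one network's characteristics."""
    name: str
    rtt_ms: float          # round-trip time in milliseconds
    bandwidth_kbps: float  # downstream bandwidth in kilobits per second


def estimated_fetch_ms(profile, payload_bytes, round_trips=3):
    """Rough fetch time: setup/request round trips plus raw transfer time.

    Ignores packet loss, slow start, and parallel connections.
    """
    # 1 kbit/s equals 1 bit/ms, so bits divided by kbit/s gives milliseconds.
    transfer_ms = payload_bytes * 8 / profile.bandwidth_kbps
    return round_trips * profile.rtt_ms + transfer_ms


# Illustrative profiles only.
office = NetworkProfile("office-like", rtt_ms=20, bandwidth_kbps=100_000)
constrained = NetworkProfile("constrained", rtt_ms=600, bandwidth_kbps=256)

for profile in (office, constrained):
    print(profile.name, estimated_fetch_ms(profile, payload_bytes=500_000), "ms")
```

Even this crude model shows why a payload size that is invisible on an office connection can dominate startup time elsewhere.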

And there you go. I don't think I have time for questions, but I will be here after the reception if you guys want to chat and exchange some ideas. My contact information is there. Feel free to reach out to me on LinkedIn; I'm always happy to answer your questions or discuss anything performance-related. That's about it. Thank you.

Get access to all videos “CMG’s international IMPACT Digital Transformation Conference”