About the talk
RailsConf 2019 - The Life-Changing Magic of Tidying Active Record Allocations by Richard Schneeman & Caleb Thompson
Your app is slow. It does not spark joy. In this talk, we will use memory allocation profiling tools to discover performance hotspots, even when they're coming from inside a library. We will use this technique with a real-world application to identify a piece of optimizable code in Active Record that ultimately leads to a patch with a substantial impact on page speed.
Hello, everyone. Hi, or howdy I guess. My name is Richard Schneeman. You're here in the dark for The Life-Changing Magic of Tidying Up Active Record Allocations. On the internet I go by schneems just about everywhere: schneems on Twitter, schneems on GitHub. It's not a really common name. So some people who know me know that I love Ruby; some people who really know me know that I am actually married to Ruby. This is my wife. We have two beautiful children. We also have a couple of dogs. So this is one of my
dogs. His full name is Hans Peter von Wolf V. He is actually the fifth Hans Peter von Wolf. My family has had this tradition, and they're not all genetically related, but they are all black-and-tan wiener dogs. It's a bit of a mouthful, so whenever we're calling him in from the door we don't say the full name; we just call him Cinco. He likes sleeping with his tongue hanging out. I don't know why. I work for a small startup based out of San Francisco. It's called Heroku. You write the app, and we do the rest. In fact, we've had a booth at RailsConf for the
past seven years, ever since I first joined. And since you like performance, or you probably wouldn't be here: there's a talk given by another Heroku engineer, Gabe, on multi-database support in Rails. Unfortunately you'll have to check that one out in the conference recordings, but I highly recommend it; I got to see a sneak preview. You might be wondering what's up with these fancy gloves. I got into gloves not because it's cold here; it's because I heard that a lot of companies are really interested in ninja developers. All right, in actuality I
hurt my hands, and you might be wondering: how did you hurt your hands? And I will tell you: extreme programming. Just too extreme, too agile. Okay, it was actually regular programming, just a lot of it, and my physical therapist says it's going to take a couple of months to heal up. In the meantime, I'm back at work, and I've learned how to completely drive my computer using my voice and my eyes. Here's a short video showing the process, if you're interested, and I am a little bit faster now.
It works a little better now that I've got my own microphone and setup. So you might be wondering how I was able to make such amazing slides and such an amazing presentation without being able to touch a keyboard, and the answer is: I wasn't. For that, I would like everyone to give a hand to my hands: Caleb Thompson. Instead of working on his part of the talk, we spent the last week pairing and building out this deck. So
yeah, I'm super grateful. I was originally distraught and about to cancel my talk, and then Caleb stepped up, and I think the deck is way better than anything I would have come up with on my own. Okay, who knows who this is? This is Marie Kondo. She's a world-famous organizing expert. She has books; we actually have the manga edition of The Life-Changing Magic of Tidying Up, and it's really good. She's a best-selling author and has a Netflix show. I wanted to show you a clip of a show Marie is currently working on
for her new season. A lot of people don't know this, but she's actually also a programmer, and so here's a little clip. "Object allocations. I want my object allocations to be more simple. I don't know how to fix it. We have too much stuff." Yeah: Tidying Up Active Record Allocations. I think it's going to be big, lots of mass appeal; better than The Defenders, anyway. I mean, come on. So sadly Marie couldn't be here with us today, but we have the next best thing. I would like to introduce you to her pet rabbit. "Konnichiwa!" I don't think everybody speaks Japanese; he's an English Lop rabbit. He loves to hop, he loves carrots, and so of course he loves Ruby. "Today you are here to hear about tidying up your Ruby applications: put your allocations in one pile, consider each one, and finally keep only the ones that spark joy." Great. "How do I know if something sparks joy?" If it's part of an Active Record performance problem, then it sparks joy. "That sounds like a technicality." Yeah, it kind of is. "I hate this object." So,
"where do we start?" All right, okay, thank you very much. I see where you're going, and that's a great point. To put our objects into one pile we're going to be using two tools: memory_profiler and derailed_benchmarks. First, up here is a benchmark of two methods. They do the exact same thing in two different ways: both of them determine which is the larger of two inputs. Take a look at this code: it allocates an array in order to perform a comparison on it, while this other code just performs a comparison on the two inputs directly. Now, which of these do you think is faster? Do you think the array version is faster? Okay, I saw a hand go up. What about the comparison version? All right. I will tell you that the comparison version is faster, but does anybody know how much faster? Shout out a number. Seventeen times? Anybody else? Three? Did somebody say pi? Okay. In fact, it
turns out that doing the comparison with direct logic runs about twice as fast as doing it with the array allocation. And, I don't know, to me they're semantically doing exactly the same thing, so to have such a dramatic performance difference between the two is pretty stark. In general, touching memory is going to be slower than performing calculations, and this is not just true in Ruby; this is true of just about every programming language. Even in C it's like a mantra: malloc is slow, don't malloc, only malloc a few times if you have to. Since we know that Ruby allocations are slow, and we've seen at least one case where we can write the exact same logic and perform fewer allocations, we can optimize by removing those allocations at the Ruby level. That way, if a program uses fewer objects, it can run faster. Ruby is also an object-heavy language. In this example we are returning two values as opposed to just one, and even though there's no array literal anywhere in this code, in order for Ruby to run it, behind the scenes
it actually allocates an array. So Ruby is using and allocating objects even when we are not doing so directly. If we find where we are allocating a ton of objects, that is going to lead us to potential hotspots that we can optimize. One thing I want to mention before we really get started is that not all allocations are created equal, and some take longer than others. In general I ignore the number of objects, and instead I look at the amount of memory allocated; I find this to be a better technique. Disclaimer: this is my personal opinion, and it's also not a totally perfect metric. I find that the percentage decrease in objects roughly translates to the percentage improvement in performance: if I can decrease the application's allocations by about 5%, then roughly I can make it faster by about 5%. It is just back-of-the-napkin math, so you will always have to go back in and double-check yourself, because oftentimes, in order to remove an allocation, you have to add additional comparisons, and those comparisons might actually eat up all of that performance saving.
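Double-checking yourself means falling back to a timing benchmark. Here is a rough sketch of how the earlier array-versus-comparison example could be timed with nothing but the stdlib Benchmark module; the method names are mine for illustration, not the code from the slides:

```ruby
require "benchmark"

def max_with_array(a, b)
  [a, b].max            # allocates a two-element Array on every call
end

def max_with_compare(a, b)
  a > b ? a : b         # pure comparison, no allocation
end

N = 1_000_000
Benchmark.bm(10) do |bm|
  bm.report("array")   { N.times { max_with_array(1, 2) } }
  bm.report("compare") { N.times { max_with_compare(1, 2) } }
end
```

Both methods return the same answer for the same inputs; only the timing report differs, and because timings are noisy you would run this many times before trusting the difference.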
The real beauty of this is that bytes reduced is a consistent metric. Typically, whenever you are doing performance work, you constantly have to benchmark, make a change, benchmark, make a change, benchmark, and the problem with traditional benchmarking, with running timings, is that it's inconsistent: you get high variance, so you have to run it over and over and over again. You wouldn't benchmark your method once; you'd have to benchmark it ten thousand times to see whether on average it is slower or faster. With this method, all we have to do is run it once before and once after: if you shaved off 5% of your allocations, that number is going to be really, really consistent. Again, we do want to remember that it's only shorthand, and we are always going to have to go back and rerun those actual timing benchmarks; we can't just take it at face value. "So how are we supposed to get all of the objects into one pile?" We got a little off the rails there, but that is an excellent point. So let's take a look at that right now.
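One pile can start as small as a counter. Even the hidden array from the multiple-return-value example shows up if you diff ObjectSpace's object counts; this is a sketch I added for illustration, not code from the talk:

```ruby
def min_max(a, b)
  return a, b   # no array literal, but Ruby allocates an Array to carry both values
end

GC.disable                                      # keep the counts stable while we measure
before = ObjectSpace.count_objects[:T_ARRAY]
1_000.times { min_max(1, 2) }
after = ObjectSpace.count_objects[:T_ARRAY]
GC.enable

puts after - before   # at least one new array per call
```

The delta is at least 1,000 here, one array per invocation, even though the source never mentions an array.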
All right, come on back everybody. Thank you. The memory_profiler gem is going to allow us to take all the allocations in our program and view them; behind the scenes it's a nice wrapper around ObjectSpace allocation tracing. If you want to profile a Rails application, then you can use the derailed_benchmarks gem. This is a gem that I wrote, and it will let you hit an endpoint in your application from the CLI. The benefit of this is that you don't actually have to start your server, refresh it, kill it, reload the page, click a button, and do this other thing; you run one command and it gives you benchmark results. It saves me a lot of time, which is why I wrote it. Today we'll look at a real-world case study of these tools, using an open-source Rails application that I run and maintain called CodeTriage. First, I do need a little dramatic introduction: dun dun dun. Thank you, Aaron. I call this section Inadequate Record. Can you pause long enough on stage that people
clap and laugh? Okay. So first we're going to run derailed against a real-life application, which is CodeTriage, and this is going to give us our pile of memory allocations. Okay, so that's a pretty huge pile, and that's not even all of it; that's just as much as Keynote would let us show. The output lists memory allocated by file from top to bottom, and I start by looking at each file in that order. In this case I've already looked at several, and since I'm going to show you the process of how I do this, I'm going to skip to one that's actually interesting. Once I pick the file, I need to zoom in and get more information. To do this we can use the exact same command as before, but now we tell memory_profiler and derailed to focus only on a single file. In the new output, filtered to a single file, things are much cleaner, and it looks like the majority of allocations are coming from line 270 of the attributes.rb file in Active Record. Let's figure out what it does and open that file.
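As an aside, memory_profiler's file-and-line attribution sits on top of allocation tracing from Ruby's stdlib objspace extension. A minimal hand-rolled sketch of the same idea, written for illustration:

```ruby
require "objspace"

ObjectSpace.trace_object_allocations_start
obj = "col_%d" % 7          # the allocation we want to attribute
ObjectSpace.trace_object_allocations_stop

puts ObjectSpace.allocation_sourcefile(obj)  # file that allocated the string
puts ObjectSpace.allocation_sourceline(obj)  # line number within that file
```

memory_profiler aggregates exactly this kind of per-object data into the per-file, per-line totals you see in the derailed output.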
Here is our problem line. Let's look at it and figure out why it exists. It's inside of the respond_to? method in Active Record, and respond_to? needs to return true whenever we have that method on the object. Because Active Record is backed by a database, it needs to ask, "Hey, what columns are in this table?" so that it knows what methods are available on an object. And in order to do that, it needs a string. Typically when you call respond_to? you pass in a symbol, but because of the way Active Record stores the column names on the back end, it needs to convert that symbol to a string, and that's where our allocation is coming in. We call respond_to? a lot, so as a result it's allocating a lot of strings. Once we have that string, we iterate over each of the column names and see if one of them matches the string that got passed in. And once we have that, we turn around and check whether the actual object we are using has that attribute; that's a separate API. All right: does this object allocation
spark joy? How do you feel about it? "It's in use, so it's useful. It's doing a lot of allocation, so it's not very performant. It does help our code to be cleaner. And I don't know, is it absolutely necessary?" Good question. We don't really know yet; let's find out and see if we can refactor this code while maintaining correctness. Looking at the code, the name variable must be a string, because the database column names are stored as strings, so we have to make a conversion somewhere; but maybe we can make it somewhere else. My hypothesis is that instead we can find a way to perform the column check with a symbol directly. "This allocation does not spark joy. Let's just throw it in the trash, stomp on it, and get rid of it. We never want to allocate a string; let's get rid of that code." "Wait. You must honor the code before you get rid of it." Okay, that is true. This code has been in hundreds of thousands of production applications. It has served us very, very well.
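To make the before and after concrete, here is a heavily simplified sketch of both approaches. The class names, the column list, and the method bodies are mine for illustration; the real Active Record code is more involved:

```ruby
class RecordBefore
  COLUMNS = ["id", "name", "email"].freeze

  def respond_to?(name, include_private = false)
    string = name.to_s                        # allocates a String on every call
    COLUMNS.include?(string) || super
  end
end

class RecordAfter
  COLUMNS = ["id", "name", "email"].freeze
  # Built once: column name as a Symbol => column name as a String
  NAME_MAP = COLUMNS.each_with_object({}) { |c, h| h[c.to_sym] = c }.freeze

  def respond_to?(name, include_private = false)
    string = NAME_MAP[name.to_sym]            # to_sym on a Symbol allocates nothing
    !string.nil? || super
  end
end
```

Both classes answer true for `:name` and fall back to `super` for everything else, but the hash version skips the per-call string allocation and also swaps the O(n) scan over the column list for an O(1) lookup.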
Even though it doesn't bring joy right at this moment, I think we should respect it. So: thank you, code. Now we can get rid of it. Unfortunately, if that's all we did, then all of our applications would break. So we have to figure out a way to check our symbol against the columns without allocating a string every time. To do that, we can introduce a hash: the keys of the hash will be our column names as symbols, and the values will be the column names as strings. This is our old code, and we can replace it with this: we use the hash lookup as the check for whether the column exists, and it returns the string, which we can then pass along to has_attribute?. We also get a little bonus performance bump, because in this case a hash lookup is faster than iterating over every column in that table. One other thing to know is that calling a to_-something method generally means an allocation or a type conversion. In this case, the most common thing being passed in is a symbol, and calling to_sym on a symbol just returns the same symbol: no allocation. We're good to go. So that's pretty much it. How much did that help? The patch reduces overall allocations at the web-request level, where a request comes in, stuff happens, and a response is served, by about 1% of total memory for CodeTriage. But is it faster? On average, render time got about 1.01x faster, which roughly matches the 1% that we saw. So, I mean, that's it. I think we're pretty much
done and ready to move on. "Not done yet! Are these results statistically significant?" So, who knows what this is? If you said that this is a t: congratulations, you passed the t-test. All right, okay. So here's some example code and an example of how numbers can lie. We've got two objects: an array and a string. We can take a look at memory_profiler, and it will tell us that if we duplicate each object, both of them are the same size in memory, 40 bytes, which is the standard base size for an object in Ruby. Pop quiz: which is going to be faster, duplicating the array or duplicating the string? That was actually more of a rhetorical question; you don't have to answer. So as you can see, duplicating the array is actually faster. And then we rerun it, and duplicating the string is actually faster, so scratch what I said about the array. With this information, it looks like all of the runs are roughly the same. I could lie and cherry-pick, just pick one run and say: hey, look, I ran a benchmark, and the benchmark told me it's faster. But that would be lying with numbers. Instead we can use Student's t-test to determine whether our numbers are
statistically significant or not: whether this benchmark is actually showing us a difference, or whether the difference is just caused by randomness. Student's t-test was introduced in 1908 by William Sealy Gosset, who worked for Guinness. That was very dramatic. "Why is it called the t-test if Gosset created it?" I'm sure this is going great in the recording. Guinness wouldn't allow any of its employees to publish their findings, because they were worried that if they said, "this is what Guinness is using," then competitors would use it too; they were very worried about that. So instead he published under the pseudonym "Student." What exactly was Guinness doing with this information? The problem they were trying to solve is that the quality of their product depends on the quality of their ingredients, and different suppliers' quality might vary, or batches might vary. While they could take a sample from here and compare it to a sample from there and a sample from over there, how did they know that they weren't actually just comparing randomness, and that the values were significant? That's where the t-test comes in. So let's get back to object allocation. In my work I use a very advanced statistical programming toolset. It's called Excel, and there's a function that generates a t-test for you. If the result is under a certain threshold, then it means the change likely wasn't a result of just random chance; 0.05 is typically a pretty good number. "Was your change statistically significant?" Yes. "So it got merged, then."
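If you want the same check outside of Excel, the two-sample (Welch) t statistic is small enough to sketch by hand. This is my own illustration: compare |t| against a critical value from a t-table, or compute a p-value with a stats library, to apply the 0.05 threshold:

```ruby
def mean(xs)
  xs.sum(0.0) / xs.size
end

def sample_variance(xs)
  m = mean(xs)
  xs.sum(0.0) { |x| (x - m)**2 } / (xs.size - 1)
end

# Welch's t statistic for two independent samples (e.g. two sets of timings)
def t_statistic(a, b)
  standard_error = Math.sqrt(sample_variance(a) / a.size +
                             sample_variance(b) / b.size)
  (mean(a) - mean(b)) / standard_error
end
```

Identical samples give t = 0; the further |t| gets from zero relative to the critical value for your degrees of freedom, the less likely the difference is pure noise.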
"A 1.01x speedup isn't much to brag about. You'd need ten more patches like that just to make the program 1.1x faster." All right, that's true. But it is a 1.01x speedup at the total web-request level, so it's not just a micro-benchmark. Theoretically, if you needed a hundred servers to serve your application, that patch is one fewer server, and that's something, right? "Good point, but still, it's only one percent." So these good people deserve an example of a bigger timing change. What do you say, do you want to see another one? All right.
That is the type of enthusiasm I'm looking for. Okay. Well, what's our first step? All right. We're going to use derailed again. Here's our pile of memory, and like before I start from the top and work my way down. Let's zoom into this file and see if we can make anything faster. This time I'll be taking a look at Active Model's time_value.rb. When we rerun, we see that a lot of objects are being allocated on line 72, so let's go right there. You'll notice that this method is named fast_string_to_time. Contrary to popular belief, just because you name a method "fast," that's not an optimization technique. So on line 72, which is what's shown here, we are matching our input string against a constant that is a regular expression, and the reason this allocates a lot of memory, as you can see on line 74, is that there are a lot of match groupings: $1, $2, $3, and so on. That is our major allocation here.
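The fractional-seconds capture and the conversion it feeds look roughly like this. The value of the capture is taken from the talk; the guard shape is a sketch of the patch idea, not the exact Rails source:

```ruby
frac = ".123456"                      # what the $7 capture looks like

# Original approach: Rational math
usec_slow = (frac.to_r * 1_000_000).to_i

# Patch idea: with a guard on the shape, the digits are already the answer
usec_fast =
  if frac.start_with?(".") && frac.length == 7
    frac[1, 6].to_i                   # strip the dot, read the integer directly
  else
    (frac.to_r * 1_000_000).to_i      # fall back to the general conversion
  end
```

Both paths produce the same microseconds value; the fast path just skips allocating a Rational along the way.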
"Does this object allocation spark joy?" Hold on, I'm working up to that; I have to find out what it does first. What does this method do? Reading the method, I can see that a string comes in, and then we return a Time object based on this other method called new_time. But can we make it faster? On this line, which is using all of our regex matches, it's not really doing anything; it's just taking those and passing them along to the new_time method, so there's not much to optimize there. But what's going on here? This is $7: we call to_r on it, which makes it a Rational, basically a fancy float, then multiply that by a million and call to_i on it. I didn't know exactly what this code did the first time I saw it, until I ran it to see what the input was and what exactly $7 is. $7 is a string that looks something like this, and that process of calling to_r, multiplying by a million, and calling to_i turns it into an integer of microseconds. When I look at them, the string and the integer are already pretty close; can't we just drop the dot? And that's kind of what I did. We have to add some guard checks: make sure it starts with a dot, make sure the length is equal to seven, and then we can just rip out the dot and directly call to_i on it. Is it faster? In this case I used a micro-benchmark, and it was in fact faster. But is there more that we can do with this method? If we go back to the regex allocation, can we do more? I mean, I don't think so, so we should probably just give up now. "But you practically know nothing
about how this method is used." Good point. Maybe if we just dug deeper, maybe if we knew how this method was used, we could avoid it being called in the first place, or optimize it some other way. So let's take a look. We're going to add a caller. This is an example of one backtrace that I got from this method, and it took a lot of time to work out what's going on in these backtraces, so I'm not going to bore you with that; let's jump straight into the exciting part: the time_value method we're optimizing is called while generating a cache key. So let's open up that file. This is how we generate a cache key for Active Record objects. If we follow the stack trace backwards, we see that what is happening is that Active Record is taking a string from the database and converting it into a Time object. That's what our method is doing; that's what fast_string_to_time is called for, and then the result is used to generate a cache version. It's called on line 99, so we can go straight there, and conveniently, it's just below where we were looking.
Magic move. Okay, here's our culprit. What is happening is that in some database adapters, such as Postgres, the driver returns a string, and then Rails has to cast it. It does that lazily, so it only casts when you actually use the value for the first time, and in a lot of applications the only thing that ever needs the updated_at value is generating a cache key. So in this case it is taking the string, turning it into a Time object, and then turning right back around and using that Time to generate another string for the cache version. It means that we might possibly be able to just take the string directly from the database. This is what we get back from libpq: here's what it looks like when it comes from the driver, and here's what we need. If you look really, really closely at this, you might see that if we strip out all of the special characters, we get the exact same format. "Does this allocation spark joy? Is it useful?" Yes, caching is a very useful
feature. "Is it already fast enough?" Well, you know the code creates lots of allocations, so it's slowing us down. "Does it help our code to be cleaner?" The original code we're looking at, I would say yeah: it's literally just one line, grabbing try(:updated_at), and it's extremely clean. "Is it absolutely necessary?" I say no, it isn't: we can convert the string from the database directly into that cache key without having to do an expensive Time allocation. Let's take a look at how we can do this differently. My version has a more complicated guard statement built on the existing code, adds some cyclomatic complexity, and adds new methods that might make it a little harder to understand what's going on if you're just looking at it. So I know my version is not cleaner, not by a long shot. Does that mean that the old code sparks joy? It kind of depends on your judgment. Before we decide, let's look at the performance impact. The patch reduces memory allocations by 5%; our last patch only reduced memory allocations by about 1%, so it's lots bigger than the other patch.
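The shape of the shortcut: the driver's string and the cache version differ only in punctuation, so once the guard checks pass you can go straight from one to the other. A sketch with a made-up timestamp, not the exact Rails patch:

```ruby
require "time"

raw = "2019-04-30 14:52:13.897616"   # hypothetical string straight from the driver

# What the lazy cast does conceptually: String -> Time -> String
slow_version = Time.parse(raw).strftime("%Y%m%d%H%M%S%6N")

# The shortcut: drop everything that isn't a digit
fast_version = raw.delete("^0-9")

puts slow_version == fast_version    # the formats line up
```

The real patch has to guard against strings that don't match this shape (different precisions, different adapters), which is exactly where the extra cyclomatic complexity comes from.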
Is it performant? Yes, I'd say so. What? Well, it turns out that Time conversion is extremely CPU-intensive, and my target application, when I turned this on, got about 1.23x faster. That's pretty good; that's why I said memory decrease is only kind of a proxy. Even better, it's statistically significant. That 1.23x works out to be about 19 fewer servers required to serve your requests. Thank you. Finally, some praise. But now we must thank this old code that we have used for many years. Thank you.
I have glossed over something, which is that this is me optimizing CodeTriage, and it's CodeTriage that's getting faster. When we talk about benchmarks, we also need to consider different cases. For example, with the really big change that I just showed you, I was like: everyone with a hundred servers can spend less on their Heroku bill, this is great, I love it. But unfortunately, not all database adapters return values as strings; some have actually already been optimized to do the conversion directly at the driver layer. MySQL is an example of one that does that: it converts the string to a Time object automatically, so Ruby never even sees the string. In this case, that optimization is accidentally making MySQL users' code go a tiny, tiny little bit slower. Which is kind of sad, just when I thought I was getting some positive praise. But the opposite happens as well. If you look at the respond_to? patch, that one made things 1.01x
faster, but that was just for CodeTriage. Benoit from the RSpec core team used my same patch on his application, which was heavily using Draper, which calls respond_to? a lot, and he actually saw performance improvements of 1.53x. So you're not only writing performance patches for one application; you are writing performance patches for a bunch of different cases, and they can respond differently. So I'm sure you're excited to try this out and drive it for yourself. We're going to take a second.
Thank you, and thank you, Caleb, for being my hands. Yeah, well, I am pretty comfortable adding puts statements and backtraces, and it requires a good amount of ability to get in and navigate a project, but it's a little harder to do when you can't type. Caleb here is a great handyman, and you said you had an idea. "Enough with the hand puns." So yeah, this is what we were looking at before, and as you said, it can take a lot of time looking through each of these lines, and it involves comparing dozens or hundreds of stack traces. So what if there was a tool that could actually make that a little bit easier? This is that same set of hundreds of calls to caller, but organized into a tree structure with counts of invocations for each line. You can visually navigate the callers and see that the code calling your fast_string_to_time method comes from different places, which will help you to investigate why it's being called in the first place. That's pretty complicated, so let's take a look at a slightly simpler example from a single test run. The output includes the location, which is the line number and file, for each caller, and it includes the frame name, which is usually the method or maybe a block identifier. So far this is all the same information that you would see in a backtrace, the same backtraces you were just looking at or have seen when you hit an error. It also includes the number of times that the location occurs in this context: if you had seen this 400 times, then maybe 100 of those would have been from this line.
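A tiny, from-scratch version of that counting, not the tool itself, can be built on Ruby's caller_locations; everything here is my own illustration:

```ruby
CALL_SITES = Hash.new(0)

def record_caller
  # Frame 1 is whoever called us, rendered as "file:line:in `method'"
  CALL_SITES[caller_locations(1, 1).first.to_s] += 1
end

def from_a
  record_caller
end

def from_b
  record_caller; record_caller   # same line, so this site's count becomes 2
end

from_a
from_b
CALL_SITES.each { |site, count| puts "#{count}x #{site}" }
```

Group the accumulated sites by their shared prefixes and you get exactly the kind of caller tree with per-line counts being described here.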
And finally, the symbols used to visually indicate the tree of callers and where they're coming from. So you can look at the big picture and find where the method is called most often, and that's usually a good place to start investigating. As you can see here, this one was called twice instead of one time, so we'd choose it over all of the other branches. This type of view is also useful for other types of investigations. Do you have a piece of code that you're not sure if you can delete? You've seen it puts something, so you know that it's being used, but you're not really sure from where, and this is a great way to track down where that's coming from. The tool that does this is a new debugging tool that Richard and I are working on called whence, which means "from where." Its API is pretty simple: wherever you would previously have put a puts of eighty equals signs to find things visually, followed by printing out the callers, you just call the start method, and then at the end of your test print out whence's tree representation. It collects the array of callers for each invocation into a tree that makes it easier to navigate the call stack. It's still in early development, but you can try it out for yourself. And that's it. Thank you very much, Caleb. So for a review: first, we want to take all of our object allocations and put them in one place, in one pile where we can see them. Next, we consider each one: does it spark joy? Finally, keep only the objects that spark joy. So hopefully I've convinced you that allocation hotspots are a really good
indicator of possible performance optimization locations. And now I think you're ready to try these same techniques on your own code or your own library. Thank you for coming to our talk. My name is schneems, this is Caleb, and go forth and allocate less.