Emily Giurleo is a software engineer and avid Rubyist. This December, she'll start working at Numero, where she'll help build the next generation of campaign finance tools. In her spare time, she enjoys creating tech for good causes, reading fantasy novels, and hanging out with her pets.
About the talk
Did you know that Ruby 2.7 introduces a new method for manual memory compaction?
Neither did I.
Then a user reported a bug on a gem I maintain, and well...
In this talk, I’ll tell you a story about how one bug forced me to learn all about memory management in Ruby. By the end of this talk, you should understand how memory is allocated on the heap, how Ruby implements garbage collection, and what memory compaction is all about!
Well, here we are, my friends. It is that time of the day in which we are about to jump into the next live talk. Okay. There are a couple of announcements that I want to make sure are out there for all of you who are wondering about some really important things. First, I did just announce the RubyConf 5K or 30-minute exercise challenge. Check that out in the open chat if you have not. This is very important; I hope to see you participating. The other thing is that there's an ever-growing list of amazing Slack channels around various topics. One which I want to call out, just because I think it's fun, is the great RubyConf sandwich make-off. If you're interested, or want to know what the heck any of that is about, don't forget to check out that channel too.
Also, we had heard from the facility manager that there's been some crowding in the kitchen. The current facility is reporting hallway traffic and, sometimes, blanket and couch crowding. And there's a slew of furry friends that are just making their way into the conference without a ticket. Not to say that that's bad, and we welcome everybody, of course, but it's really important to remember to be nice to one another and your facility, you know? I mean, if you're going to make a sandwich, or make breakfast or coffee, it's okay. The talks are recorded; you can take your time. You know what I'm saying?
So with all that, without further ado, here we go. I would like to introduce our next speaker, Emily. Emily, do you want to join me on the stage? And there we are. All right, we're good. Good, good, good. With that, without further ado, to all my conference rooms out there: Emily, it is all yours.
Hello everyone. Thank you so much for coming. My name is Emily Giurleo and this talk is called The Bug That Forced Me to Understand Memory Compaction. In this talk, I'm going to tell you a story, and it starts with a job that I had until recently, working at MongoDB. I helped maintain three fairly popular gems: the Ruby driver, Mongoid, and the bson gem. The bson gem, which is the troublemaker of this talk, is a Ruby gem with a C extension, which is going to come back later. It basically just serializes data to and from BSON, which is the data format that MongoDB uses to send data.
So one day I was going about my business when I received a user-submitted ticket. The user said that the bson gem was segfaulting whenever they called GC.compact. Now, I am a fantastic gem maintainer.
And so my first reaction to this ticket was: what the heck is GC.compact? Clearly, I had a lot to learn before I was ready to fix this bug. And so this talk is the story of how I gained the knowledge that I needed. In this talk, I'm going to teach you everything I learned about how Ruby manages memory, starting with what memory actually is, then how Ruby implements garbage collection, and what memory compaction is all about. Then we're going to talk about how this gets more complicated in C extensions, and finally, how I managed to fix the bug. There should be about 5 minutes at the end of the session if anybody has any questions, but I'll also be answering questions in the Slack channel for this talk for the rest of the conference. Let's get started.
So back to my original question: what the heck is GC.compact? GC.compact is a new method introduced in Ruby 2.7, and it does something called memory compaction. Now, if you are a fantastic gem maintainer like I was, you will ask: but what is memory, and why would you want to compact it? So clearly I had to go back to the basics if I wanted to get anywhere with this bug.
Memory is where your computer stores information, and there are many different types. But for the purposes of this talk, we're going to talk about RAM, or random access memory. RAM is like the short-term storage of your computer. It's where your computer keeps all of the information that it is going to need for the next little while in order to run the programs that you are currently using. Ruby uses a section of RAM called the Ruby heap to store the data it creates while it's running a program. The heap is made up of many slots, and each slot is about 40 bytes. Slots are organized into heap pages, which are about 16 kilobytes, with the heap being made up of many of these heap pages. Every time your Ruby program creates an object, it takes up one slot in the heap, where it stores the data for that object. That's not entirely true, but for the purposes of this talk it is true. It's also important to know that every slot has an address, and this is going to be really important later.
As a Ruby program runs, it's going to use up more and more memory in the Ruby heap. As it fills up all the allocated memory, it's going to create more heap pages that it can fill up to continue running the program. However, memory is a physical component of your computer, which means that it's not infinite; eventually it's going to run out. So Ruby needs a way to reuse the same memory to make your program as efficient as possible, and this process is called garbage collection. You can think of garbage collection as the Marie Kondo of the Ruby language. Ruby keeps track of all of the objects that you created while running the program, and once an object is no longer being used, meaning it goes out of scope and it's not referenced by any other object in your program, Ruby's going to destroy it and free up the memory that that object was previously using.
The Ruby garbage collection algorithm is called mark and sweep. There are a lot of variations of mark and sweep algorithms, so I'm going to go over it at the highest level. This is what's called a tracing garbage collector, because it uses the traces between object references in order to properly garbage collect the objects in your program. So on the left, I have a representation of the heap, and on the right, I have a representation of objects in Ruby and their relationships.
The first part of mark and sweep, and I'm betting you could guess this, is called the mark phase. During the mark phase, the Ruby garbage collector marks all of the Ruby objects that are still in use by the program, and that just means essentially flipping a bit on each object to mark that it is still in use. It starts out with root objects: objects that are going to be long-lived and used throughout your program, and so the Ruby garbage collector knows that they are still in use. The garbage collector is going to find all of the objects referenced by those root objects, and it's going to mark them as well. In this example, it would also mark the blue object, which is referenced by the yellow object, and then it would mark the gray object, which is referenced by the blue object. The last object here isn't a root and it's not referenced by any other object, and so it doesn't get marked.
That brings us to the next phase of mark and sweep, which is, you guessed it, sweep. During this phase, the garbage collector goes through all these objects again and finds the ones that aren't marked, which means they're no longer in use in the program. Since an object is no longer in use, the Ruby garbage collector will destroy it and free up its space in the heap so that you can reuse it in the future.
Once you've performed garbage collection, your heap is going to look a little bit like this slide. It's going to have some slots that are full with objects, it's going to have some free slots, and you would think that once this happens, Ruby can reuse these free slots in order to allocate more objects and keep running your program. However, it is not that simple. We can only reallocate used heap pages once they are completely empty. The empty slots that exist on heap pages that still have some stuff in them are not reusable. This creates a problem where a program has allocated way more memory than it's actually using. So according to your computer, your program may be using pages and pages and pages of memory on the heap, but those pages might only have one or two slots in use. You can see this in the example on the side, where you have heap pages upon heap pages allocated, but only one slot on each of those heap pages is actually being used.
So this is where memory compaction comes in. You can compact all of the used memory to the start of the heap, which frees up space at the end of the heap so that that memory can be reallocated. You can imagine that you have one heap page's worth of memory scattered across multiple pages, and then you compact it into one page. That frees up the heap pages at the end of the heap to be reused, which is fantastic. So this is what GC.compact does. GC.compact is a method introduced in Ruby 2.7 that implements memory compaction, but unlike garbage collection, which just runs by itself in the background, you actually have to call this method yourself.
Just to recap what I've learned so far in this process: Ruby objects take up space in memory, in a part of the memory called the Ruby heap. Ruby garbage collects objects that are no longer being used in order to free up memory. Freed memory can't always be reused, because it's not all contiguous slots at the end of the heap. So memory compaction is a process by which in-use memory is grouped into contiguous chunks at the start of the heap, thereby freeing up the end of the heap to be reallocated for use later in the program.
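The recap above can be sketched as a toy model in plain Ruby. This is only an illustration of the mark, sweep, and compact ideas, not how CRuby's collector is implemented; the `ToyHeap` class, its method names, and the object names are all made up for the example.

```ruby
# Toy heap: an array of slots, where the array index plays the role of an address.
# Hypothetical model for illustration only; CRuby's real heap is far more involved.
class ToyHeap
  attr_reader :slots

  def initialize(size)
    @slots = Array.new(size) # nil means the slot is free
    @references = {}         # object name => names of the objects it references
  end

  # Put an object in the first free slot and remember what it references.
  def allocate(name, references = [])
    index = @slots.index(nil)
    raise "heap is full" if index.nil?
    @slots[index] = name
    @references[name] = references
    index
  end

  # Mark phase: start from the roots and follow references.
  def mark(roots)
    marked = []
    pending = roots.dup
    until pending.empty?
      name = pending.pop
      next if marked.include?(name)
      marked << name
      pending.concat(@references.fetch(name, []))
    end
    marked
  end

  # Sweep phase: free every slot whose object was not marked.
  def sweep(marked)
    @slots.map! { |name| marked.include?(name) ? name : nil }
  end

  # Compaction: slide the live objects to the start of the heap.
  def compact
    live = @slots.compact
    @slots = live + Array.new(@slots.size - live.size)
  end
end

heap = ToyHeap.new(6)
heap.allocate(:yellow, [:blue]) # the root object
heap.allocate(:temp_a)
heap.allocate(:blue, [:gray])
heap.allocate(:temp_b)
heap.allocate(:gray)
heap.allocate(:unreferenced)

heap.sweep(heap.mark([:yellow]))
after_sweep = heap.slots.dup   # live objects with free slots scattered between them
heap.compact
after_compact = heap.slots.dup # live objects grouped at the start of the heap
```

After the sweep, the free slots are scattered between live objects; after compaction, the live objects sit together at the start and the free slots are contiguous at the end. In real Ruby 2.7 and later, the compaction step is what a call to `GC.compact` asks for.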
That is all well and good, but it still did not answer my question: why is it segfaulting? Luckily, a different user came to my rescue and left a comment on the ticket along the lines of, hey, have you read this page about using GC.compact with C extensions? Needless to say, I had not read that page, and so I did. If you remember, I mentioned at the beginning of the talk that the bson gem has a C extension. Turns out that this is actually really important when it comes to compatibility with memory compaction.
For some background, a C extension is C code that's integrated into a Ruby gem. You can use a C extension to include a prebuilt C library in your Ruby gem, if you don't want to rebuild some functionality that's already out there. Or you could implement some feature of your Ruby gem in C for increased performance, or to take advantage of some feature of the C language that you really want to use. C extensions are cool because you can create and manipulate Ruby objects in C code using the C extension API provided by the Ruby maintainers.
So this is an example of creating a string value, "Hello Ruby". The first line is just plain Ruby. The second line is the C equivalent. You start by saying the type of the variable you're creating; in this case it's VALUE, and I'll explain what that is in a minute. The name of the variable is greeting, and then you use a Ruby C extension method, rb_str_new2, that creates a new Ruby string value for you. So that's how you create a string.
Like I said, we are going to use a variable type called VALUE. A VALUE is a pointer to a Ruby object. Earlier, I said that every slot in the heap has an address; well, this is where that comes back. A pointer is a variable that contains the address of some slot in the heap. So when you create the greeting variable, what you're doing is essentially creating a variable that is storing just the address 2 on the heap, and it's saying, I know that there is a string there, at heap address 2. Any time that you use the greeting variable in your C code, what you're essentially saying is, hey, go get me that thing that lives at address 2. And that's how a lot of C code, and code in other languages that use pointer references, is written.
You can even use this same logic to declare new Ruby types. This is why C extensions are so cool. This is an example that I took from a great article by Josh Haberman. I will make sure to post these slides in the Slack channel at the end, just so you can click through these links and take a look at this article. In this article, we create a new type called a Ruby pair. A pair just has two references to two other Ruby objects. We call those objects first and second, and those references are VALUEs, which means they are pointers to two locations on the heap. So in this case, you could declare a Ruby pair and have the pointer to the first object be at heap slot one and the pointer to the second at slot five. Awesome, right? Because it's really flexible, and you can extend the Ruby language in a really powerful way.
However, it creates some complications when it comes to things like compaction. If you remember, memory compaction has the effect of moving things around in the heap. If you think a little bit more closely about this, this could be a major problem when you're writing code with pointers, where you're basically just referencing addresses in the heap and hoping they contain the right object. So let's go back to our Ruby pair example. If you remember, our Ruby pair contains two objects called first and second, and those objects are at positions one and five in the heap right now. So let's say that we run memory compaction and our objects move around in the heap. Suddenly, the reference in our Ruby pair is wrong. The object that we called second used to be at position five. Now it's at position four, and so whenever the Ruby pair goes to reference that object, one of two things is going to happen.
The first, and best-case, scenario is that your program's going to crash, either because there's nothing in the heap at that location or because whatever is in the heap at that location is so incompatible with whatever you're trying to do that your program segfaults. The worst-case scenario is if you have another object at that location in the heap and it doesn't crash your program, and your program keeps running with this silent bug that eventually is going to return wrong data to your users or otherwise mess up your gem. This is the absolute worst-case scenario. This could be really, really bad, and I'm imagining, like, The Sixth Sense. So I realized this must be what was happening in my bson gem.
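The stale-address problem described above can be shown with a small toy model in plain Ruby. It assumes a heap represented as an array whose index stands in for an address; `RawPair` and the object names are hypothetical and only mirror the pair from the talk.

```ruby
# Toy heap: the array index plays the role of a heap address, nil is a free slot.
heap = ["a", "first", "b", "c", nil, "second", "d"]

# Like a C struct holding VALUEs, this pair remembers raw addresses,
# not the objects themselves (hypothetical type for illustration).
RawPair = Struct.new(:first_address, :second_address)
pair = RawPair.new(1, 5)

before = [heap[pair.first_address], heap[pair.second_address]]

# Compaction slides live objects toward the start of the heap.
# Nobody tells the pair that anything moved.
live = heap.compact
heap = live + Array.new(heap.size - live.size)

after = [heap[pair.first_address], heap[pair.second_address]]
new_address_of_second = heap.index("second")
```

Before compaction the pair resolves to its own two objects. Afterwards, address 5 holds a different live object, which is the silent wrong-data case; had that slot ended up empty, the lookup would have found nothing, which corresponds to the crash case.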
So an object somewhere is moving around during memory compaction, and future references to it are being broken because it is not where the program expects it to be in memory, and that causes the segfault. So how can I figure out what object is causing this problem? In order to do this, I first had to learn more about how Ruby manages memory from the perspective of a C extension. The major thing you need to know is that when you create a C extension, there are two scenarios where you have to tell the Ruby garbage collector how to properly garbage collect your object.
The first scenario is when you create a new Ruby type, like our Ruby pair example. It has references to two objects, called first and second. At the C level, the C code understands these references, so it knows that a Ruby pair references these two objects. However, at the Ruby level, the Ruby garbage collector does not know about these references, and that can cause very wonky behavior. Let's imagine that the Ruby garbage collector is running garbage collection and it's going to mark the pair object because it's still in use.
Because it doesn't know about the references to the first and second objects from that pair, it could sweep them, thinking they're no longer being used by your program. And then when you go to reference them the next time, there's nothing there in memory, or, like I said earlier, there's something else there that could cause a silent bug and really, really mess up your program. So this is the kind of thing that would cause a segmentation fault.
Luckily, the Ruby maintainers anticipated that, because they're very smart, and they provided a way to prevent this from happening, and that is called a mark callback. A mark callback is a method that gets called every single time your new Ruby type gets marked during garbage collection. It tells the Ruby garbage collector how to mark related objects in order to not break your program. This is an example of a mark callback for our Ruby pair, and you'll see the last two lines of the method call the method rb_gc_mark on the first reference and then the second reference. This is an important method and it is going to come back later. So once we've implemented this mark callback, Ruby knows about the reference between the Ruby pair and the first and second objects. And so after it marks a pair object, it knows how to mark its two references as well, using the rb_gc_mark method, and it's a great day to be a Ruby programmer.
Another scenario in which you want to make sure you are properly marking an object in a C extension is if you have a long-lived object. What I mean by that is, let's say you create an object in your C extension initializer. This is a method that is called when your C extension is loaded for the first time. So, say you declare a variable in there, and then you want to use it later on in your C extension. If Ruby doesn't know about this, it could garbage collect this object, and then when you try to reference it later, your program crashes. This is why you want to use the rb_gc_register_mark_object method to mark a long-lived object during garbage collection. This way, once again, it's a great day to be a programmer, because your program is not going to break.
So as it turns out, how you mark your objects in a C extension is really important to memory compaction, and this is going to explain why. Let's look at the way that GC.compact is actually implemented in Ruby 2.7. There are four steps. The first is garbage collection, and I'm going to give you a hint that something happens here that helps prevent the kind of issue we saw earlier, where memory moves around and then you have a segfault later on. The second step is to move objects and collect them at the start of the heap. The third step is to update any references to those moved objects, so that you're not referencing objects that are in a different location. And then the last step is another round of garbage collection to clean up anything that's left behind.
Like I said, something special happens during this first round of garbage collection. And that's because the Ruby maintainers did something important in Ruby 2.7 to make it compatible with existing C extensions. Of course, they realized that moving objects around breaks C extensions, so it was really important for them to safeguard against this kind of thing happening. In order to do this, they changed the behavior of the existing C extension API so that marking an object when you run garbage collection also pins it in memory. You can think of pinning as like pinning a piece of paper onto a board: you know that paper is not going anywhere. So an object that's pinned in memory is not going to go anywhere. And this is what's happening when you mark an object in Ruby 2.7.
If you remember our friend rb_gc_mark, a great method, this is the method that was modified to also pin objects in addition to marking them. So let's go through our garbage collection with the addition of pinning. If we take an example of a Ruby pair, it has two references, as we said: a first and a second object. First the Ruby pair object is going to be marked, and it will also be pinned in memory. Then it will mark its first reference and also pin that in memory, and do the same for the second reference.
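Pinning can be illustrated with the same kind of toy heap in plain Ruby. This sketch assumes that the objects reported through the pair's mark callback end up in a `pinned` list; `compact_with_pins` and the object names are hypothetical and do not correspond to any real Ruby API.

```ruby
# Compact a toy heap while leaving pinned objects at their current address.
# heap:   array index = address, value = object name (nil = free slot)
# pinned: names of objects that were marked through the extension and must not move
def compact_with_pins(heap, pinned)
  result = Array.new(heap.size)

  # Pinned objects keep their original address.
  heap.each_with_index { |obj, address| result[address] = obj if pinned.include?(obj) }

  # Every other live object is packed into the lowest free slots.
  movable = heap.reject { |obj| obj.nil? || pinned.include?(obj) }
  result.each_index do |address|
    break if movable.empty?
    result[address] = movable.shift if result[address].nil?
  end

  result
end

heap = [nil, :first, :a, nil, nil, :second, :pair, :b]

# The pair is marked, and its mark callback marks :first and :second,
# so all three are pinned in this model.
pinned = [:pair, :first, :second]

compacted = compact_with_pins(heap, pinned)
```

The unpinned objects move toward the start of the heap, while the pair and its two references stay at the addresses the C code remembers. The cost is visible too: the pinned objects leave gaps that compaction cannot close.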
Then, when all is said and done, those pinned objects in memory are not going to move, and so when the Ruby pair goes to reference them in the future, they will still be there, and memory compaction will not have broken your program. Going back to the bigger picture, the implementation of GC.compact: when you do that first round of garbage collection at the very beginning of the compaction process, you are pinning objects in memory that are still in use, so that they do not move around and they do not break your C extension.
This gave me all the information I needed to figure out what was going on in my gem. Ruby garbage collection pins all marked objects in Ruby 2.7. Any object that's pinned does not move around in memory. So an object in my gem is moving around in memory, which means that it must not be getting pinned, and thus it must not be marked. And, as I said earlier, there are two cases in a C extension where you want to make sure you mark an object: if an object is long-lived, or if an object is a reference from a custom Ruby type.
As it turns out, a long-lived object that was not properly being marked was the culprit in the bson gem. If we look in the bson gem, in the C extension initializer, there is a line that declares a variable called rb_bson_registry, which is just a module that's defined in the Ruby code. We never marked this variable or this object, and I believe that historically that was okay, because I don't think that modules get garbage collected (someone correct me on that), but that's okay. But once you introduce memory compaction, it starts to be a problem, because if you don't mark it, it's not going to stay in the same place, and it's going to cause segfaults. Once it is marked, it is pinned in memory and it cannot move around, which fixed the gem.
Everything that I've covered so far has been about maintaining compatibility with an older C extension as you're transitioning to Ruby 2.7, but this doesn't take full advantage of memory compaction, right? If you're pinning objects in memory, it means that you can't move them to the start of the heap, which means that there are going to be objects hanging around your heap pages, and those heap pages can't be reused. So if you're creating a gem with a C extension where you just plan to use it with Ruby 2.7 and newer versions of Ruby, then you can take some different steps that will allow you to take full advantage of memory compaction.
The first thing is, you want to use rb_gc_mark_no_pin. I think this method name has been changed to rb_gc_mark_movable since Ruby 2.7. Like we said, rb_gc_mark was modified to not move objects in memory, and so we're going to use a different version of this method that just marks them. Then we're going to implement a new callback. Much like our mark callback, which gets called every single time an object is marked, we are going to implement a compaction callback that runs every single time an object is going to be compacted. What you want to do there is make sure that you update the references in your object, so that you know what their new location is in case they have moved.
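The idea behind a compaction callback can be sketched in plain Ruby with a toy heap. This is a model under stated assumptions, not the C API: `compact_with_forwarding`, `MovablePair`, and `update_references` are hypothetical names, and the forwarding table stands in for asking the garbage collector where an object now lives.

```ruby
# Toy compaction that also returns a forwarding table: old address => new address.
def compact_with_forwarding(heap)
  forwarding = {}
  compacted = Array.new(heap.size)
  next_free = 0

  heap.each_with_index do |obj, old_address|
    next if obj.nil?
    compacted[next_free] = obj
    forwarding[old_address] = next_free
    next_free += 1
  end

  [compacted, forwarding]
end

# A pair that stores raw addresses and knows how to fix them up after a move.
MovablePair = Struct.new(:first_address, :second_address) do
  # Plays the role of a compaction callback: look up each reference's new location.
  def update_references(forwarding)
    self.first_address  = forwarding.fetch(first_address, first_address)
    self.second_address = forwarding.fetch(second_address, second_address)
  end
end

heap = ["a", "first", "b", "c", nil, "second", "d"]
pair = MovablePair.new(1, 5)

heap, forwarding = compact_with_forwarding(heap)
pair.update_references(forwarding)

resolved = [heap[pair.first_address], heap[pair.second_address]]
```

Nothing is pinned here, so the heap compacts fully, and the pair still resolves to the right objects because it refreshed its addresses after the move.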
You use the rb_gc_new_location method in a very similar way that you would have used rb_gc_mark in your mark callback.
I learned two main things from this process. The first is that systems design, and understanding systems design, can be just as important as coding. In this case, I think I was a perfectly adequate coder, but because I didn't understand how Ruby memory management was implemented, and the high-level design thinking behind that, I was having a hard time fixing this bug and doing my work. So sometimes those things can be just as important as being a good coder.
And then the second thing that I learned is that sharing knowledge is absolutely essential. If people hadn't written the blog posts or made the videos that they had, I wouldn't have been able to read them, and I wouldn't have been able to fix this bug. You never know when the knowledge you put into the world is going to get someone else out of a big jam. So if you're thinking about writing a blog post or making an informative video and you have any doubts about it, I would say always do it, because the more knowledge we share with each other, the more we're going to empower each other and the better developers we're all going to become.
These are my sources. Like I said, I will post the slides in the Slack so that you can click through these links and read these awesome articles. Just to summarize: Ruby 2.7 introduces memory compaction, and you can use the rb_gc_mark_no_pin method and make sure to add compaction callbacks to your C extension. And nothing is impossible if you believe in yourself and your 1000 browser tabs. Thank you very much.
There might be a couple of minutes for Q&A right now, but I will be taking questions in the Slack channel and you can always reach me on Twitter. So Rose asks: do pins need to be removed or managed later on? I believe that they are automatically handled in the garbage collection process; they are removed at some point, so you as a user don't need to do anything, and the Ruby maintainers have built that into the process of garbage collection. We have time for at least one more question.
Angela asked: how long did this entire investigation take? I don't know exactly, but it definitely took a couple of weeks, just because I think I wasted a lot of time flailing about uselessly, which is just not a helpful way to do anything, and I was really down on myself, like: I don't know how to fix this bug, I don't know anything about C extensions, I don't know anything about memory management. Instead, I think I should have stepped back and said: it's okay, I don't know anything about memory management, let's learn about that. Once I got to that step, I was finally able to make some headway, and then it took maybe a week from that point of just reading and trying to piece everything together. I think that's all we have time for, but please ask your questions in the chat and I'll see you later. Thanks so much for coming, y'all.
All right. Thank you, thank you, thank you, Emily. All right, so we are now moving into our next big break for the day.
Coffee service is available right down the hallway for each and every one of you. Tea service, relatedly. We've actually done a really good job with catering services this year: every single person has exactly what's in their own fridge, so thank your local organizer. Let me tell you what: thank you, local organizer, okay? All right. With that being said, like I said, we are heading into another break. Talks are going to resume here at 1:50 Central Time. That is, if I've got my time right, 1:50 Central; I am not doing the math today. But with that being said, check out Slack, participate in the communities, and head over to Emily's talk channel if you've got any additional questions. She's going to be there to answer those. And with that, we'll see you soon.