PostgreSQL contributor, educator, and evangelist. I specialize in PostgreSQL index access methods, backup solutions, and data structures. I have been involved in PostgreSQL-related projects as a developer and industry speaker for more than five years. In 2020 I joined PostgreSQL’s Code of Conduct Committee.View the profile
About the talk
B-tree index is the most common index type. Most if not all of the modern DBMS use it. The data structure and concerned algorithms are really mature, there are about 40 years of development. And PostgreSQL's B-tree is not an exception. It's full of complicated optimizations of performance, concurrency and so on. But there're still many ways to improve it.
This talk gives you a deep dive into B-tree indexes. It covers several recent features of PostgreSQL B-tree which are already released as well as the upcoming improvements. It demonstrates use cases and shows the limitations to help you use indexes in the most efficient way.
I think that everyone has settled down so we can start. Today, I'm going to pretend to talk about B Tree in Texas, which are the heart of port in my opinion and I'll try to to share these idea with you. So I want to talk a little bit about phosphorus in Texas in general. So what are they, what's the difference between, what's the difference from other? Maybe. The basis that you are familiar from then we'll talk a little bit about internal temperature that will help us to understand optimizations better
after they sell presents to Victory features, which are one of which is like brand new and another is just And yes, we'll go into the details of this to Features a little bit about me. Let me introduce myself. My name is Anastasia. I worked in Russian vendor company. Posters professional. Is there a Padres contributor and core developer. So I worked there since 2015 and I contribute to post roast like, in different ways. I don't ride called, I develop
some pictures of those ones. I will talk about it today and also I speak at conferences and teach database at University. So today I focused mostly on indexing and storage of the musicians and also owned backup Solutions. So let me remind you of some important things about indexes in postgres. In general, in Texas, it means that the next file is physically separated from the table. That is described so that it's in Texas and also a pointer to its location in the keep, which is called supplies into fire Old Orchard, Apple ID,
and virtual reality is defined by access method. We have several excess methods such as bitter taste in there. Also green and others today, I'll concentrate on beachway. Zeta type support is provided by a class, its special Special object in was Rosetta base, which defines how exactly these data type will be compared with a success method. So obviously you're comparing integers and Text data in different ways and a class is responsible for days for this implementation
like regular and it says, they'll still can be powerful in Texas that covered. Just the subset of tables data, they can be expressed in the indexes that are useful for queries that match some function or some modification of data. Here is some more information rules about indexes. And as you can see, every rule has an exception has its own atomization bicycle. There is no visibility information in in Texas. So regular index can to perform regular needs to accept at first index and then retrieve the visibility in Hip table sample
and a special structure map to check that all information can be retrieved from index directly without excessive Heat. This allows us to save some access and speed up the Prairie. Next point is about mvcc rules. According to Embassy rules update overall always Grace version, only entry except Court update, which we will discuss later this operation and also has enough stimulation that is called my probation. It's a special plug. If you have a special flag to mark that toppled and to avoid
with check valve, regiment into, delete them faster from index was invented in 86 perimeter data structure, which is in use, in many databases and other systems has four Valence Street and imposed restrictions B+ 3 medication which means that on the list notes contains and pointers to keep while internal notes on the contain pointers to lower-level TV trays self. So all things are sorted inside each node and Olive notes are linked with each other. So this makes the structure very efficient for phones and brains Paris.
And prosperous the tree is used not only to satisfy queries. That compared weather is like for equality, or larger, or smaller but also to speed up order. Buy clothes and table charts. Also, betrays is, default index, Tab and currently, it's the only in the type that can be used for unique and primary key constraint. So, this is the core functionality of database engine, which makes retrieve an important component. As a default in describe, the tree is used to support system, indexes and talk indicas. So even in empty database
already, has more than a hundred of which we in Texas, which supports Buster's catalog. Between Plymouth and Plymouth Station is Bulls versus based on these paper by lemon and yellow and it has some assumptions. Some of them are in summer, not. So with the idea of this paper into regular between, is that every Everly Fate has rightly informed her to write sibling, so that we can read read through the fascia level. Also, it introduces the accounts over heike, this is
an upper bound bound of the keys that are allowed on this page. And there's also allows two Hello to answer the square is faster. So you can check whether or not and also it has an assumption that we can feed at least three items per page, which has met in progress is just too to give the victory structure notes at least. And also, there are some more assumptions that are Mostly theoretical, they do not match reality. And like this lemon and yellow paper. As
soon as that all these are fixed as it makes most of the simpler Bath & Body Works in different weight also, it is seems that in memory copies of three pages are on shared, so it doesn't bother was looking and all that stuff. And finally, it assumes that all the tricky's are unique. So most mostly between two to support constraints to support unicorn, primary key constraint. So this is fine for Siri, but not exactly true in some practice. So major ideas of lemon
and yellow algorithm are implemented and change to have both right length and height. He's but in addition to the right link for electoral ink printer that is used to do back road in the scans. So, for example, when you do order buy order by ascending order, you can also use the same index, just follow another direction in which we are not. So maximum number of keys is defined with some special macro and this is actually a restriction of forces between which means that you
cannot fit into your betray any data. I need a rope that is larger than 1/3 of a face so it's approximately 2005 or something. So if you want to be next to some larger while you're probably need to use some expression or functional index, And the last bunch here is that all pictures in B3 are shared. They are located in which level they blocking is still required Wilson Rite Lock in when we update the Patriots. So sometimes it's also the homeless issue when too many rides into a bit re-read. Some conventional of contention
contains all information about similar to Siri the differences and what are you very much information about the execs implementation of phosphorus-33? Next specific detail is about Assumption of unity's. Even actually can have known unique values because of invisible to differentiate its allies multiples of the same key. The same level, we use the link the stoplight, a field. So, Yep, this couple idea is used as a unique identifier in. What's the rest of this stuff? Was heavily optimized. It's mostly internal optimization. So they are not
exposed to your survival. Your can I set up them or something? They're just there. And if you want to learn their unfortunately, I was just called. If you want to learn more about them, you can check these article or documentation the idea. There's no couple idea is considered just to be one. One more column of these index is just simplifies, the implementation of obituary algorithm, which is always good to simplify things because you can, Your world gets less errors while coding
and also, if it's improved improved performance, when you work with some specific work clothes, for example, when you go in Stuart, many subsequent while he's into your index, which is usually something that you do with primary key ID, this to do in Texas will be smaller and they sleep after new Temple of imitation is. So thanks for Peter gagan who implemented this stuff in Pokedex 212. I want some more pictures that I simply have to mention somewhere since version 11.1
provides an ability to create an Express in parallel, which traumatically improve the usability on large database deployments so you can multiple people use to create your index faster. Currently only available for betray. And also, one more feature of the trees is unchecked expansion for index verification. So if you have time to talk about a possible database corruption, you know that it's always good to have some like one more tool to check that everything goes well in your database and you can use this in check. So with your Progress Index is to ensure that their logical and physical
structure is it correct that it's not corrupted. Uncheck toll also supports all Timmy stations of in Texas so you can use it with any Any other index? Let's move to, to the pictures, to the major part of this talk. First of all, I want to describe covering indexes and how are they implemented in phosphorus? Actually cover an index is now the future itself is rather an idea based on a combination of the Future's. So we have multiple and in the end we have in the colors tan and if you can buy them, we can get covered in index between two expressions
in progress, since the beginning of time and then allow multiple case to be stored in a single topple and they so that they can satisfy glories that excels says both both Fields. So for example, if you want to collect something was close on A and B on bousfield you can use disparate in Dixon. Discount is the optimization that allows us to retrieve data. Not only directly from index without any extra access to ship. If you have large index, have large stable and not every everything in Database fits into into memory. This can save a lot of operations
and speed up groceries and improve performance and example of current index. Let's say you want to retrieve both Fields A and B from this table and you're only have this, we're closed on a field. You can braid index on Boss feels it will be Magical Index and this is kind of simulation. What's the problem is covering index has suffered a major problem. Is that usually have you already have some unique index on your table, at most of the day, both have primary key or something or some unique feels atleast
and you have to have these unique ended. Its greatest implicitly, it's also betrayed. So, even if even if you'll help rated, it's already there. You have this index index on. Let's say, two columns. And let's say you want to use this current index, optimization to speed up your queries, that access Citrix. So you're great at you, in your cabin. In the problem with this setup is that we have a lot of duplicate. The data we have to maintain to, in Texas instead of one in there. It's just double CC do space taken by indexes levels to
space in memory that the syndics will use. And also they slow down the insertions and updates because every insertion into your table. Also means insertion to your index or to multiple in Texas if you have them. Here comes the new feature. It's called index with include columns, with keyboard include, and it allows you to combine those two into one, to merge them into one physical index. So you can create in the unique index on these two columns and also includes sample or two column to include keyboard. So, this is basically the same
as on the previous flight. With the difference that you don't have to maintain to in Texas now have much less space taken and it still kind too. From this car is as fast as example. So keyboard include specifies portion of columns on which the uniqueness is not enforced. So included, they exist only to allow marketers to benefit from index analyst can you can also create index with such include columns, but to be honest, I don't stay. What am I wear my shoes
off of these kind of picture. It's also possible to create constrained included column index. You can see it's at the bottom of the slide so you don't have to write index. And then I'll start able to use the syndics. You can simply write it in a table table, detail definition. So it will use this index will include columns. Also, you can use the same Santa Cruz. Primary key constraint, So what are the benefits of this feature? That allows to come by and constrained and cover an index and decrees maintenance her head to size of the index become smaller and better faster,
and everything is just got. One more benefit is that it allows you to add into index some days a week without victory Of God, lets you want to grab some some information, some special information from your table by ID or my want to marry some specific specific data type information from your table by user ID and your wants to use it with Isaac sunlesskhan because it's technically that you do where they often but you cannot build index on these specific data type. What is included?
Columns ordering won't apply to this includes a column. So what he really needs multicolored index that will use this will support ordering on what colors you should use multicolored index. But if you want just to add some payload into this index, you can use index was included College. To understand the drawbacks of this feature of this tavern, in Texas, we need to learn about one more optimization. Or between Nick says, it's only two couples picture
of this picture, is that it eliminates redundant in the entrance. So let's look at the example. We want to roll in in the stable and igneous, that we also must insert some, you entry into the index. So I hope you can see the bottom of the slide in this. So let's say we have a stable and the syndics and we want to update in Dexter field ID want to update ID, 16, 15 through 16. So I'm sorry. Okay. So yeah, I think I need to explain the picture. So, green is the index and yellows, the table and breads are high keys in index. So it was, can I use it?
So those places are pages that represent features like in dispatches and its index page except the right must they have high tea that restricts the upper bound of this page. So it means that there can be none interested then keys that are larger than hiking on this page. This allows to optimize time for today in Dexter and also supports and can parents page please So let's say we want to update Fields, ID, updated, the table and Index, this is pretty much we cannot do this anymore, but if we do not change value of
a little bit redundant, so let's say we want to update data field for now. They have interviewed. Have hot in the hotel Palatka station. We can simply added into tables database and avoid it in your toppled to Index. This helps us to keep in the smaller and also like to avoid extra insurance operations. There is so-called hip hop chain, which supports the links of those fields, so we have or have them linked in one page and we can access them by Friday. So when we
perform search Google for Success index, will find that we have ID, 16 and then websites table and search for how to change to find all the values. Can you use the my place? So welcome here will have one additional step vacuum that hot you to remove that option. No, vacuum. Yep. I can will it remove hot change when it will be visible for all the typos? But if it's still visible to some transactions, some transactions can still see this apple, 6, Danny and some
transactions are updated topple 16k. And when it's not visible to any transaction, it will be deleted. My welcome. So it's also somewhat of imitation because they should not mix with those Hutchins is just a base table. All right. So, no answer to you about the musician. It's very useful for heavily, updated tables and accessories included columns are good with Reload. So I can say that it's very good future for reading when you can enable index unless conferees prayers but you should think twice before
using it on a datatable because index indexed with include columns, they are in it and you have to not use hot date anymore and here is like the trade-off. You can choose whatever you want to tell it where is Buster or you want your end of the smaller. And he'll try to correct posture is always up to the bay to to decide So if I have any questions on this part, I think I can answer them and then move to the next picture. His Radio. So, hot changes ready to save in the page
because I change at Chase Morty. Alexandra Gater. So, three hot transfers made so be this hot chain detail has been disabled in the bridge Fortress key pages. So, just ever next couple, has some flags that shows that because subsequent apple and its points to do the next couple in this chain and fast. We can find all of them. It takes more than one page. We have to split it and we have to add you into in it. So if you have like really a lot of updates on your I'm going to feel. So if I have more than
more than 3 updates for this, we will have to insert you your entry into index to point the next page. But in one page, we can form a chain to to connect estoppel. Okay, so let's go to the next picture which has betrayed his application and I'm excited to announce that it was committed just today to tonight. So it's officially the first talked about this upcoming posters future and Are there any car developers in the room? I want to ask, you understand me, it took about four years and more than like 50
revisions before they dispatch was was actually accepted the buzzer score. Yeah, I already said the tree is a very important component of phosphorus and said, it's cool because when he proved it, you can improve the experience of many people, but everything for them unless they might as well test and reviews. So any change to the score companies, require a lot of testing and of you. And I also encourage you to participate better testing of upcoming posters version. So so we are sure that this stuff is really work. Well, So the idea
behind this picture is very simple. Actually, it's already. If you are familiar with Jean indexes, it's something like Boston list engine index. The problem is that I don't have index with a lot of the case, we have to store repeated many times and it takes space and a place to some different station of in the structure. So, in this example, essay we have very many delegates of Okay. 5 and we have to repeatedly storage and it takes several features. In this example by these supposed to at least the application
application. We can just turkey once and along with it. We can store all the pointers to keep all they topple identifiers. So is super simple and I think that everyone understands how it should work. So, how how was the principal to user? So it's introduces new items, which can be set for every particular index by default. It's enabled for all the other indexes, it's disabled for system in Texas for now, and we want to decide to ring beta testing. Whether we need some more open with a cluster, white option to enable or disable the duplication. But as we
decided to improve its discussion, we can go for this default for now because the overhead is quite small. So the lights on the binary, On the Bun, red quality of keys. And that's it does not support numeric, float and contain no types, such as Eeyore record types. I think it's not much, I'm not a big deal because I don't think that's really many people use between indexes on record types, or on something like this. We will try to work more on Ares because it seemed to be
important feature, but it does not support numeric and flawed because of the datatype nature, because the binary equality of two. Attributes does not mean that they are equal that they can be held equal to use her. So this temptation I think that's the major restriction because there are some between Texas on Limerick deals so that is amortized across insertions. So where are you your other head? So it's only two persons on F and unlimited Benchmark which can be considered the worst case. Again I encourage you to test it with some real data and if you will help us to find some
next time, what cases, which will be able to improve So yeah, good application only applied when page is about to split. So we Do Not Duplicate apples insertion, we only compress a thief age once. Someone's number from benchmarking, the application from our benchmarks, like two to four times smaller. I think that's really impressive because you like, it doesn't agree. Otherwise, it will probably help some indexes to fit into memory so that they will work much faster and 48 minutes. Mark, we can
see that it says 40% of space removed from 5 almost 6 Gigabytes of in access to my aunt's house. And also I didn't say any significant slowdown on integration. So there is obviously some overhead because we have to compare compare virus to understand whether they are duplicates. But this or head seemed to be eliminated because now we have smaller injectors and it doesn't slow down creation or inside. So it sure is the Excel numbers from to PCH Benchmark. So you can say that those three indexes which are not
molt. Their numeric in those days. They do not use this station and their size is the same while almost all other indexes are two, three times smaller again, it's totally transparent. It doesn't change anything in logical structure and and still gives us all the same. One more thing. I need to say is that this feature is not compatible with your future with witches, in Texas, include accounts. And I think it's more or less adequate solution because usually, indexes with included columns are unique, and that they
won't have that many delegates. one more thing to to learn about this feature is actually that it helps to to support unique in Texas. So we don't have a question have many duplicates in unique Index, right? But still they can appear because of Embassy because of Embassy copies. So let's say you a date at your primary key. And now you have two entries in your index, special optimization, which is called microbacterium or on-the-fly deletion, and it's useful to avoid page plates and index bloating. When it's possible to speed up, that came in,
it was like this. So when we read in the skis and we checked their visibility in the table, we can find tap that this particular Temple is invisible to all transactions and we can mark them as the temples in index special plug. After after that, we're going to insert some new interests on this full face. Before we go page plate, we can check whether it has any garbage and we can remove those truffles. We shouldn't wait for real vacuum operation. We can remove them while
in certain entry, that's, we can avoid split unnecessary and make room for entry and avoid in the explode. So this is where between did application. Help interfere with my Chrome account, it helps to avoid index Lord because it gives some more time to microwave came to come. And do not choose that those interests are dead and that we can, we can remove them and is basically just sex couples on the page, more deadly and Increase more space and more time for my truck. I came from a benchmark with
table and end table with an index. And then run a lot of update Curry's on these index and index group, 10 gigabytes, almost twice. So to almost 20 Gigabytes in several hours of benchmarking, let's say 30%, which I think we're good number. It means that in the long run, this picture helps to avoid and explode, of course, if you have very, very heavy updates on your primary key and my production does not Kenneth cope with your index. Will still loud and grow but
foremost work. This will help to keep index on some, what are some of the right size? And yeah, I think it's also very, very good. Think we was not expected when we started working on this feature and just appeared in the discussion interview. So thanks to Peter gagan for this particular picture. What's next? There are some more even more features to expect for husbands Beatrice. First of them is kind and search for Clara's when you want to find several points nearest to To the point that you carry, if you can do Tuesday,
September station and it will speed up such queries in time, this is almost ready. It was like blocked by this dedication patch and I hope we'll see it in Westeros 13 as well. In discount will optimize and it says it in such a way, that you'll be able to run distant or order by Clarice much faster. So, you won't have to recheck every duplicate entry and you can just move from one Android to another. I don't actually know what's the current state of the splashpad. I believe that it will be in post
14 for sure. But I don't know about phosphorus 13. There's also some more English version of limitations a couple of Interest here. And he went to vacuum index to your dad. Had to run through old data in index. You can just access specific and the more Global index organized table, The primary in Texas and in Texas on partition tables, which were locked for. Now, global indexes on partition tables or Global partitions and accessories, which can be a participant on some different different attribute. Those are just
future plans for victory Of The Magician's, so that's all. Thank you. And if you have any questions, I'm ready to answer My curriculum is a old picture. It was it was for sure in postgres 11 and I think it's even older, so it's it's not your picture. Any more questions? Thank you. Thank you, Anastasia.
Buy this talk
Buy this video
With ConferenceCast.tv, you get access to our library of the world's best conference talks.