Duration: 40:36

RailsConf 2019 - Optimizing your app by understanding your PostgreSQL database by Samay Sharma

Samay Sharma
Senior Software Engineering Manager at Microsoft
RailsConf 2019
April 30, 2019, Minneapolis, USA

About speaker

Samay Sharma
Senior Software Engineering Manager at Microsoft

Samay is a Lead Solutions Engineer at Citus Data (now part of Microsoft). He works with application developers to scale their Postgres-based apps to handle very large data volumes with Citus. He’s been working with Postgres all his professional life and wants to make databases and database performance easier to understand (and optimize) for developers.


About the talk

RailsConf 2019 - Optimizing your app by understanding your PostgreSQL database by Samay Sharma


This is a sponsored talk by Citus Data.

I’m a Postgres person. Period. After talking to many Rails developers about their application performance, I realized many performance issues can be solved by understanding your database a bit better. So I thought I’d share the statistics Postgres captures for you and how you can use them to find slow queries, unused indexes, or tables which are not getting vacuumed correctly. This talk will cover Postgres tools and tips for the above, including pg_stat_statements, useful catalog tables, and recently added Postgres features such as CREATE STATISTICS.


So this talk is "Optimizing your app by understanding your PostgreSQL database", and you can understand it from the title, but here's what I wanted to communicate in this talk. We're going to cover the problems you face, starting from the complaints application users make about performance, like "my query is slow", or, as an application developer, trying to understand which issues are causing the application to be slow, and then mapping that back to the Postgres database. So: what kind of statistics does Postgres give you so that you can figure out what exactly is going on, what's wrong, and

also how to fix it. A bit about myself: I'm Samay, and I work as a Solutions Engineer at Citus. Citus is basically an open-source extension to scale out Postgres across multiple machines, so when you have a single-node Postgres database and you start having performance issues and aren't able to keep up with the amount of workload you're putting on it, you can use Citus to scale it out across multiple servers. As a Solutions Engineer, what we do is work with clients, application developers, who come to us with

problems in their application from a performance perspective, a scalability perspective, and who want to grow their database to tens or hundreds of terabytes while continuing to get all the benefits of Postgres. So my role is to help them model their application, give them recommendations on data modeling, et cetera, to help them scale their database out. A bit more about myself: I recently got married, about four months ago. Some people are telling me that's a fun fact now and it might not be a fun fact after a while, so

we'll see how long it stays a fun fact. As I mentioned, Citus is open source: you can go to the citusdata repositories on GitHub and star it, and if you'd like to learn more about Citus, I'm going to be around at the conference and we can have a conversation about that. But this talk is not about Citus. This talk is mainly about Postgres, and whatever I'm going to talk about applies, all of it, to your regular Postgres. So the challenge you sometimes face is relating issues and pinpointing what's going on within the database. From an application

perspective, you might be seeing slower queries; users might come to you and say "my app's page is not loading as fast as it should", et cetera, and you know, either through your monitoring tooling or just through your own debugging, that the problem is in the database: something there is causing the slowness. Now, how do you discover what that problem is, how do you find the root cause, and then how do you solve it? The simplest way to address that is using the statistics which Postgres exposes. Postgres actually captures a lot of stats about your data: whenever you run ANALYZE,

or whenever Postgres automatically runs ANALYZE, it captures a bunch of statistics about your data, in terms of, say, "if the user were to put this value as a filter, what would the selectivity ratio be", et cetera. It keeps tracking data about all of your columns and also about your database. It also captures data about activity and exposes it back to you as monitoring statistics: basically, what queries are running on the system right now, how many connections are coming into the database, which indexes are being used, which indexes

are not being used, which queries are doing how much work; all that information is actually captured in different tables within Postgres, which get exposed to you, and you can just query them as regular tables and get whatever information you need. There are also administration statistics. If you're using a managed service, you need to care about those a bit less, but then your managed service provider needs to care about them: how is replication keeping up, what's the size of the database, what's the size of different tables, and so on and so forth.

Let's start with the most common problem, which is "my application is slow", and I'm sure everybody gets that feeling; I don't think anybody says "my application is amazing and it's always fast". Starting from that problem, let's say we figure out that it's the database: how do we get to solving it? The problem: my application is slow. The symptoms: application users are complaining about slow performance, and most of the time is going into database calls. Maybe it's the entirety of the database: whatever pages are loading, everything is turning out to be slow, taking, let's say, hundreds of

milliseconds in database calls. You don't know which query is slow; you don't know what exactly is going on. So where do you start? The first thing to look at is the cache hit ratio. The cache hit ratio is the ratio of how much of your queries are being served from the cache versus how much is actually having to be read from disk. It's a good measure for getting a feel of where your database should be and where it is. For transactional apps, you really don't want to hit disk for the most common operations, so you want it to

be greater than 95%, ideally around 99%, for transactional applications. Now, depending on the application, if you have an analytics application where you are storing, let's say, 10 terabytes of data, buying 10 terabytes of memory is going to be expensive, so maybe there it's okay to have a lower cache hit ratio, but typically that's not the case for transactional apps. Now the question is: how do I measure this, and how do I keep an eye on it? I'm going to go through the Postgres tables and talk through the different pieces of information they

can provide, and then how you can use that in turn to improve application performance. To measure the cache hit ratio, you can run a query on pg_statio_user_tables. This is one of the tables I want to talk about; it counts how many blocks were read. From a terminology perspective, the heap is where Postgres stores your table data, and indexes are stored separately. So when I say heap blocks read, it's basically how many blocks had to be read from disk for the heap, versus how many blocks were hit in the buffer cache, and then the ratio of that: the number of blocks hit divided by the total, with

pg_statio_user_tables as the source for this data. Let's look at what information it holds. This is just an example from that table for one of the tables I created: the relation ID, the name of the table, the schema it belongs to; that's all the standard stuff. You'll notice it keeps information about the heap and also about indexes, so it tells you how many times your index blocks were read from the Postgres buffer cache versus how many times it could not find that block in the buffer cache and needed to go and read it from disk. So, in addition to

just getting the table hit ratio from this table, you can also get an index hit ratio. Maybe your indexes are not fitting in memory, and that's the problem you need to solve; depending on both of those ratios, you can figure that out from this table. There's also some other information, such as TOAST: TOAST is what Postgres uses when you have very large values in columns, compressing them and storing them in a separate place, so you can also measure that.
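A minimal sketch of such a cache hit ratio query (not the exact query from the slides, but using the standard pg_statio_user_tables columns):

-- Table and index cache hit ratios across all user tables.
-- A "hit" means the block was found in shared_buffers; a "read" means
-- Postgres had to ask the operating system for it.
SELECT
  sum(heap_blks_hit)::float / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS table_hit_ratio,
  sum(idx_blks_hit)::float  / nullif(sum(idx_blks_hit)  + sum(idx_blks_read), 0)  AS index_hit_ratio
FROM pg_statio_user_tables;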

So let's say you have a huge JSON blob in your Postgres table and you want to figure out whether reading those values from disk is the main bottleneck: you can get the ratios from this table and see which one of them is causing the slowness. Now, let's assume the cache hit ratio was low. That could actually mean a lot of things. What it tells you is that you're not able to serve your workload from your cache, but why do you need to go to disk, and what are the possible issues? There could be multiple causes; the first two points are about bloat and autovacuum.

How many of you are familiar with autovacuum? Okay, a few. Basically, whenever you update or delete a row in Postgres, Postgres doesn't necessarily delete it at the same time; it just marks it as not being visible to transactions which come after your transaction. Later, there is a process called VACUUM which comes and cleans up all those rows. Postgres has autovacuum, which triggers that automatically, and you could also manually run VACUUM and say: okay, clean up all the rows which are not being used anymore.

By default Postgres has autovacuum, and you can tune it to be as aggressive or as relaxed as you need it to be. Maybe it isn't aggressive enough, which is why you're not cleaning up the old rows, which is why you're getting very bad hit rates from your cache. Bloat is that chunk of unused data; that's what it's generally called. So you might say: okay, I have, let's say, a terabyte of data, but maybe only two hundred gigs of it is actually usable data and the rest is just dead rows that autovacuum could clean up. That's 800 gigs, and if you clean that up you

save a lot of space and read only the relevant data. Another cause: your queries are not optimized and you're doing a lot of sequential scans. A scan is either a sequential scan or an index scan; if you have indexes on your database, Postgres can use the index to scan only the relevant rows. So let's say you have a table where you store events, you create an index on the timestamp, and you run a query for the last day: Postgres will use that index to just return the results for the last day and not scan the rest of the

data. So if you did not create indexes appropriately, maybe you're just scanning the entire table all the time for queries which are much more pinpointed. Most app frameworks take care of creating the primary key and just the basic indexes; if you're doing more complex queries, you might have to create indexes on your own. The other side of that problem is maybe you have a lot of unused indexes. So you have a lot of indexes which Postgres needs to keep up to date, because whenever you insert a row, Postgres needs to insert that row into all the

indexes you have on your table, and if you have a lot of them and they are not being used, that could just be increasing the total amount of data you have in your database without you being able to use it for the right queries. So again, I would say: try to optimize single-node Postgres to the best extent you can, and after that, if you want to think about sharding or scaling out using Citus, that's when it makes sense. I would not suggest that as soon as you have performance problems you just look at buying more hardware to try to solve the

problem. There are certain things you can do, but after a point it does get difficult to do that in Postgres, and then you can look at other technologies like Citus to scale out. So, what actually helps you pinpoint which of these problems is going on in your database? Let's start with the standard view which covers the entire database, which is called pg_stat_database. I just picked out one particular row from it. It gives you quite a few pieces of information: it tells you things like the number of backends,

the database name, and, most importantly, the transaction statistics: xact_commit and xact_rollback tell you how many transactions committed and rolled back in your database. It tells you how many blocks were read versus how many blocks were hit, so we looked at that information already. It also tells you how many tuples were inserted into your database, updated, and deleted. A lot of times people might be asking you questions like: is your database an insert-heavy database, do you have a lot of updates, do you do a lot of deletes? The answers to those questions you

can come here and actually get statistics on. What are the uses? Typically what I use this view for is to find out the number of rows fetched versus returned by the queries to the database. Now, these terms mean something different from what you'd expect; Postgres actually names these somewhat confusingly. tup_returned is how many tuples Postgres had to read, and tup_fetched is how many the queries actually needed. So maybe Postgres read a lot of tuples, but you didn't have good indexes and you were

unnecessarily reading a lot of rows to answer your queries. If you look at this particular instance, you can see about 367 million fetched versus about 675 million returned, so maybe half the tuples that were read didn't need to be read, and you could have optimized that by putting an index on the right column. Another example is how to find the insert/update/delete ratio for the database, for when you're looking to do benchmarks, or when you're trying to find the specific part to tune: you want to

find out, should I tune my delete workload, should I tune my insertion pipeline, should I tune my updates? Finding the ratio helps you get to that. And then, obviously, you can get to throughput: what kind of throughput are you looking for, what kind of guarantees are you giving to your application users, and what is the database actually delivering? If you look at xact_commit and xact_rollback, you will be able to find out how many transactions are committing and rolling back.
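As a sketch (again, not the exact query from the talk), a workload overview for the current database could look like this:

-- Transaction, cache, and tuple statistics for the current database.
SELECT datname,
       xact_commit, xact_rollback,            -- committed vs. rolled-back transactions
       blks_hit, blks_read,                   -- cache hits vs. disk reads
       tup_returned, tup_fetched,             -- tuples read vs. tuples actually needed
       tup_inserted, tup_updated, tup_deleted -- insert/update/delete mix
FROM pg_stat_database
WHERE datname = current_database();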

You can also reset the stats, so if you want more accurate numbers for a particular day or something, you can reset them. This was for the entire database. Generally this gives you a good idea of where to look, but it's not enough: if I were to tell you there are a lot of transactions going on in your database, that just tells you there's some problem with the workload; you want to figure out which part of the workload is causing it. So, pg_stat_user_tables. It has a lot of columns, so it's a bit small to read, but the more important ones: it tells you how many times you initiated a sequential scan

on that table versus how many times you initiated an index scan, and how many tuples each of them had to read. That's very useful information for determining whether you need indexes or not. Sometimes I talk to a person who says, "my database should never be doing sequential scans, we don't scan more than five rows at a time", and then we go to the database and the data says otherwise, because even though you are asking for five rows, maybe you don't have an index which tells the database that it only needs to scan five rows to get that

information to you. Inserted, updated, deleted: again, it gives similar statistics on a per-table basis. Then the number of live tuples and the number of dead tuples: that's where the bloat shows up. This example doesn't have any dead tuples, but that's probably not going to be the case when you run this on your database. The number of dead tuples is basically the tuples which are not going to be used anymore but which autovacuum has not gotten around to cleaning up. If you search for "bloat postgres", you'll get a very complex query which basically

uses this table and a couple of other tables to tell you, in terms of gigabytes or terabytes, how much bloat you have in your Postgres database, broken down by table. If you guys are interested, I can add a bunch of useful queries to the appendix of the slides so that you can just copy-paste them, run them on your database, and see if that's useful. It also shows you the last vacuum, last autovacuum, last analyze, and last autoanalyze times.
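A sketch of the kind of per-table check this view enables (the columns are the standard pg_stat_user_tables ones):

-- Scan counts, write mix, bloat indicators, and vacuum history per table.
SELECT relname,
       seq_scan, idx_scan,                  -- sequential vs. index scans
       n_tup_ins, n_tup_upd, n_tup_del,     -- insert/update/delete counts
       n_live_tup, n_dead_tup,              -- dead tuples hint at bloat
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;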

As I told you, Postgres has autovacuum, which comes up every once in a while and actually runs VACUUM and ANALYZE on your database. What VACUUM does is clean up that bloat; what ANALYZE does is capture statistics so that the planner can use them for your queries. So maybe it's not running often enough: you have a feeling that "my autovacuum should be doing well enough to remove that bloat", but maybe it's not, and you realize it's been 17 days. I've seen databases where a table has not been vacuumed for a very long time, and then it becomes a real problem.

Generally this won't happen, and that's an extreme case, but by default Postgres autovacuums once your table changes by 20%. That default is generally not enough for large tables: imagine you have a billion rows in your table, so you're waiting until you have 200 million rows of bloat before actually cleaning it up, which might not be what you want. Most of the Postgres users I talk to have large data (that's why they're looking at Citus), and these problems come up very often when you have large tables. So, pg_stat_user_tables: what can you

do with it? The approximate numbers of live and dead tuples give you a good idea of how much bloat you have. You can find out if a table is insert-, update-, or delete-heavy. I worked with a user where over 60% of their workload was updates, and it was coming from just one statement in the app. They said, "if only I had known that this update was the main problem": they reduced their write workload by about 90% just by fixing that part of their app code to do fewer updates. So you can check your expectations of your database, like "okay, I should not be doing that many queries". Sometimes Active

Record might be doing things for you, and you might not realize that you wrote your code a certain way and it leads to much larger traffic on the database. Obviously you can look at last autovacuum and last autoanalyze to figure out whether your table is being sufficiently vacuumed or not. And this one I find really useful: the number of sequential scans versus index scans. For transactional apps, sequential scans are generally not required unless you're doing analytics across the entire data set, so if you see a large number of sequential scans, raise a flag. You have similar statistics for indexes

as well, in pg_stat_user_indexes: how many times did you initiate an index scan using that index? You dig into this when you have a lot of write problems, when your write throughput expectations are not being met because you have a lot of indexes. You have 15 indexes on your table, and when you go and look at how many of those indexes are being used, you realize that seven of them are not even being used by your queries. Indexes are always a trade-off between storage space and write overhead versus read performance.
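A sketch for finding such unused indexes (an idx_scan of zero only means the index hasn't been used since the statistics were last reset, so treat it as a hint rather than proof):

-- Indexes with no recorded scans, largest first.
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;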

There is a point when it makes sense to create indexes at the cost of some slowness on writes, but if you have indexes for every possible query you could run, that actually hurts your database more than it helps, because it probably introduces a lot of unnecessary writes going on in the database. You can also compare the tuples read versus fetched: in this example it was a perfect index, so it's only reading what it needs to read. But sometimes, if you're running queries with three or four filters and you have an index on only one of them, you'll find a difference between these, and if you see that there is an index which

is just reading way too many rows compared to how many it needs, you can make it a more specific index by making it a compound index over two columns. So that's the first problem we've covered: the cache hit ratio not being good. You can use the information in all these three views (there are more, but I think these three are the best starting point) to find all the glaring issues, if there are any, and then understand why you had a bad hit rate. You also see

issues with the application patterns, I should say, on the application side: you think you shouldn't be doing a lot of updates, but it turns out your database's main workload is updates; you figured this table should be small, but you realize it's too large; you shouldn't have dead rows on this table, but you do. So this helps you check your assumptions, dive deeper into them, and fix them. If there is a lot of bloat on tables which are larger tables, you can actually tune autovacuum, for instance to trigger at 2% of changes, so you can

say "vacuum when 2% of my table changes" for any large tables, and start with that. You can also increase the number of worker processes autovacuum is using; that's another thing you can do when autovacuum is triggered frequently enough but is not able to keep up with the rate of change, in which case you might have to give more cores to autovacuum to keep up.
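For illustration, that tuning might look roughly like this; the table name is a placeholder and the values are examples rather than recommendations from the talk:

-- Trigger autovacuum on this table after ~2% of its rows change,
-- instead of the default 20% (autovacuum_vacuum_scale_factor = 0.2).
ALTER TABLE events SET (autovacuum_vacuum_scale_factor = 0.02);

-- Allow more autovacuum worker processes (takes effect after a restart).
ALTER SYSTEM SET autovacuum_max_workers = 6;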

Maybe you're doing a lot of sequential scans and that's the problem: you should create a few indexes. Or maybe you have a lot of unnecessary indexes; not an index for everything, but there might be some you can delete. All of these are more generic: we're not looking at specific queries, not looking at optimizing a specific query, so this is generally useful as a starting point to dig into these kinds of problems. But then, more specifically: let's say you did all of this, you have a good index hit rate and cache hit rate, and you still have slow performance. Certain queries are not doing well enough, certain queries are still not performing, a particular

web page is too slow. So how do you isolate which queries are slow? pg_stat_statements is the answer to that question. One of my colleagues found that the most common advice given out there is essentially to use pg_stat_statements. What pg_stat_statements does: it's an extension that ships with Postgres which tracks execution statistics of all the SQL statements executed by the server. It normalizes the queries

and groups them together, so rather than keeping each and every query separately, it says: for all queries of this form, these are the statistics about them. We'll get into which specific statistics it exposes. It's exposed to the user through a view which is, unsurprisingly, also named pg_stat_statements, so you can just query it like a regular Postgres table and get your data from it. So what does that view contain? It has the user ID, the database ID, a query ID, and the text of the query. So let's say you have a

query which is slow but is executed only a handful of times a day: that's probably okay. But if there is a query where you can save 5, 10, 15 milliseconds and it's executed a thousand times a day, that's actually more useful for you to optimize. In addition to the times (what is the minimum time this type of query takes to execute, the max time, the mean), it also gives you a standard deviation, so that you can understand: was it maybe one or two bad occurrences, or is it always performing this badly? It also tells you the number of

times it was executed. It also tells you information about I/O: how many shared blocks were hit, read, and dirtied, how many local blocks were hit and dirtied, and the total time spent reading and writing blocks. So let's see a few examples of what you can do with this information. One of the most common queries I run on a database is: what are the top 10 time-consuming queries? You'll notice that I'm using the total time, not the mean time, because that tells you which queries

your database is spending most of its time on: total time is essentially the product of the mean time and the number of calls. That's a good measure, as opposed to just looking at calls and saying "this is the most common query" (but maybe it executes in a millisecond and it's fine), or just using the mean time for a query (but maybe it's rarely executed).
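A minimal version of that top-10 query might look like the following; note that pg_stat_statements has to be loaded via shared_preload_libraries and created with CREATE EXTENSION, and that the total_time / mean_time columns shown here were renamed to total_exec_time / mean_exec_time in later Postgres releases:

-- Top 10 queries by total time spent executing them.
SELECT query, calls, total_time, mean_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;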

For example, in the bad cache hit rate case, you found out that you're doing a lot of disk reads; then you want to find the top 10 queries spending the most time on disk, because if your monitoring shows that a lot of your time is going into disk reads, you can find only those queries and tune those. Or maybe you just want to find the top 10 most common queries; that's useful when you're doing some kind of benchmarking, et cetera: you don't only want to benchmark your slowest queries, you also want to benchmark your most common queries. So you can use all this information to pinpoint your attention on what exactly you should be doing, rather than

blindly making optimizations. After you've done a good basic set of best practices, there's a query you can run on your database to find out which queries you should be looking at. This is from one of the blog posts on pg_stat_statements. It shows you the most important things: the calls, total time, mean time, max time, and the standard deviation of the time. It also formats the output so that it looks nice and is easy to read, and it shows you only the queries by the current user, so

that, for example, queries from other users, like backup tools, don't show up, and then it says: find me the top 15 queries sorted by the average time. So this is a good starting point; it displays a bunch of things to you. If you look at this example, it says: okay, the total time the database spent executing this query is about 121 seconds, the mean time is 861 milliseconds, the max time is 3.7 seconds, this is the standard deviation, and this many rows were returned from it. So

you can see what's coming from where. Now, let's say you found out which queries are slow. Query tuning is a topic of its own, but the very basics of it: you start by doing an EXPLAIN ANALYZE. So you pick up a query and you run EXPLAIN ANALYZE on it. What's useful to know, and this is a common mistake, is that people generally just run EXPLAIN, and EXPLAIN gives you just what Postgres estimates from its world view, not what actually happens when you execute the query. If you look at EXPLAIN, you just see the costs, and if this were a more complex plan you'd see many more costs, and that

is Postgres's guess that this is going to take this long, but it might not be true at all. So what do you want to look at when you're looking at EXPLAIN ANALYZE? The actual time: on the right-hand side you see "actual time", where it shows you when the rows were output and how many rows it output. This is a very simple query selecting all the rows where a particular column's value is one, and it shows that Postgres expects to return about 9584 rows and actually returns about ten thousand, so it's a pretty close estimate.
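For reference, the shape of that check (table and column names here are placeholders):

-- Plain EXPLAIN only shows the planner's estimates; ANALYZE actually runs
-- the query so you can compare "rows=" (estimate) with "actual ... rows=".
EXPLAIN ANALYZE
SELECT * FROM test_table WHERE column1 = 1;

-- Adding BUFFERS also reports how many blocks came from cache vs. disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_table WHERE column1 = 1;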

You then read the query plan, figure out which part of the plan is taking the most actual time, and then, either by creating indexes or by tuning some other Postgres parameters, you speed up the slowest part of your query. There are a few interesting things in EXPLAIN as well. Let's look at this query, selecting from the table where column1 = 1 and column2 = 0. If you look at this, in addition to the plan, you realize that there is actually a discrepancy,

and a significant one, between what Postgres thinks this query will return and what it actually returns: Postgres estimates it is going to return a hundred rows, whereas the database actually returns 10,000. And that really matters, because if you had anything on top of this plan, if you were doing aggregates or joins, the plan Postgres chose would not be good, because it believes that only a hundred rows are going to come out of this scan when actually 10,000 rows are going to come out of it. So this is something you should know. There is a lot of advice on tuning

Postgres and on how to go about it once you see an EXPLAIN ANALYZE, but there is something very important to consider: the reason for slow queries is not always something where you can add an index and speed things up. Sometimes Postgres may have the wrong statistics, or it has not looked at enough data to get correct statistics about your data. So if you look at this, the problem, which happens often, is that it might be a very bad execution plan, and that's not because you didn't create the right indexes or anything else; it's just because

Postgres thought that a certain number of rows was going to come out of this particular scan, but the actual number of rows was much larger. One way to fix the statistics is to run ANALYZE again with a higher statistics target; that's the number of rows Postgres samples to estimate the data distribution. That makes ANALYZE more expensive, because now Postgres has to scan more rows to get those stats, but it makes the stats much more accurate. That's the first thing to try, but the problem here is actually not that.
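Raising the statistics target looks roughly like this (placeholder names; 1000 is just an example, the default default_statistics_target is 100):

-- Sample more values for this column the next time it is analyzed.
ALTER TABLE test_table ALTER COLUMN column1 SET STATISTICS 1000;
ANALYZE test_table;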

I just picked one example of how, when you have mismatching statistics, you can tell Postgres to fix or improve its statistics. The issue here is that in this particular schema, column1 and column2 are correlated with each other: the value of column2 (I populated the data, so I know) is actually column1 divided by 10. So if you know column1, that's enough to determine the value of column2, but Postgres doesn't know anything about that, because by default Postgres only captures

statistics for each column independently. So it reasons: maybe 10,000 rows will match the filter on column1 and 10,000 the filter on column2, and both of those filters are going to be applied together, so maybe I'll get about 100 rows out of it; it multiplies the selectivity ratios it has to come up with that number. But that is not going to produce the correct answer, irrespective of whatever indexes you add or anything else, unless you tell Postgres about the correlation between the columns. CREATE STATISTICS allows you to declare multi-column statistics. Postgres doesn't capture them by default because if

it were to do that for all permutations and combinations of all the columns in your table, ANALYZE would just take way too long. But you can tell Postgres that two columns are correlated by creating statistics on the dependency of column1 and column2 on each other. Once it has that information (this is the exact same query, run again after analyzing the table and telling Postgres to capture the correlation statistics), you can see that the sequential scan now estimates 9584 rows instead of a hundred, which is much more accurate.
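The statement itself is short (available since Postgres 10; the statistics object name and table name are placeholders):

-- Tell the planner that column2 is functionally dependent on column1.
CREATE STATISTICS col1_col2_deps (dependencies) ON column1, column2 FROM test_table;

-- Re-analyze so the new extended statistics are actually collected.
ANALYZE test_table;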

So anything which happens after this step is going to be much more efficient. CREATE STATISTICS seems like a small thing, and people don't think it's going to actually make a huge impact. This is actually something I tweeted about last year in July: we had a customer who saw over a thousand times faster performance. I know it sounds like "oh, I just needed to run this one SQL statement to get a thousand times better performance", but it does happen with wrong query plan estimates. This is such a simple query; if it were more complex, you would

have a lot of issues just because Postgres thinks the number of rows is different from what it actually is. To summarize this part: use pg_stat_statements to find the right set of queries to optimize; use EXPLAIN ANALYZE and tune the slowest parts; again, use EXPLAIN ANALYZE, don't just look at EXPLAIN and assume that's what Postgres is going to do, because we saw that EXPLAIN can be very different from what actually happens in certain cases; keep an eye out for differences between estimated and actual rows; and find ways to fix, or tell Postgres to fix, its statistics, because they may not always

be correct. I added a bonus section on quick estimates. This is something which is not necessarily related to the rest of the talk in terms of performance tuning, but just by using the stats Postgres captures you can get good estimates for certain things which you might currently be computing with very expensive queries. That information is captured in pg_stats, which holds information about each of your columns and what Postgres thinks about them. So if you look at this, it

tells you, to start with, the number of distinct values. That can be a fixed number of distinct values, say 10,000, and sometimes it's expressed as a fraction, so it can tell you "20% of your table is going to be the number of distinct values". That's useful to understand. The ones I find most interesting, and which I write my hacky scripts with, are the most common values and the most common frequencies. For estimates about, let's say, your top 10 customers, you will actually find good numbers in the most common values. So you

can say: okay, these are the most common values, and the number of rows I have for this customer is given by the most common frequencies. What this is actually saying is: for the value 945, roughly 0.0017 times the number of rows in the table is how many rows that particular ID has. You can use that for approximating things. Let's say we have a table of network bytes and we want to find out which devices are actually sending the most bytes, which

devices are emitting the most bytes. You'd write a query (even if it's not complex, it's definitely expensive), a count-distinct kind of query, to figure that out. Instead, you can check whether Postgres has statistics on your top 10 devices and just use those to figure out roughly how many packets went through the system, using Postgres's estimate; it gives you a rough ballpark. These are the kinds of things you can do when you show "about N results" in a UI: you don't need the exact number, you can say "about this many rows". These stats are useful for

histograms as well: if you have certain assumptions about how your data is supposed to be spread out, you can actually confirm them. So, the approximations this is useful for: what's the null fraction for a particular column; what's the approximate number of distinct values for a column (you saw that you don't need to run a slow count distinct to figure that out; you can just look it up and get a rough idea); and what are the most common values for a column, along with the approximate number of rows which have them.
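A sketch of pulling those estimates out of pg_stats for one column (table and column names are placeholders):

-- Null fraction, distinct-value estimate, and most common values for a column.
SELECT null_frac,
       n_distinct,        -- negative values mean a fraction of the row count
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE tablename = 'test_table' AND attname = 'column1';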

Approximate count: this is another thing I generally go and check. If you have a very large database, Postgres doesn't cache your count(*), so while people expect it to be very inexpensive, it's not in Postgres: it actually needs to scan your entire table to count the rows. I don't like to do that when I have a billion-row table, so I just go to the system catalog to get a ballpark of "okay, how big is this table, what kind of operations can I even try on it, can I do a distinct?". You can just go to pg_class and look at reltuples, which is Postgres's estimate of how many tuples the table has.
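The lookup itself (table name is a placeholder; reltuples is only as fresh as the last VACUUM or ANALYZE):

-- Planner's estimate of the row count, without scanning the table.
SELECT reltuples::bigint AS approximate_row_count
FROM pg_class
WHERE relname = 'test_table';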

So, to summarize: Postgres exposes quite a few stats. In this talk I mainly covered the performance-related ones: what kinds of places do you want to look into for performance issues, and what kinds of pointers point you at things to dig deeper into. The pg_stat tables generally help you find index statistics, statistics about table access, I/O statistics, and vacuum statistics. pg_stat_statements helps you find your slow queries. Use EXPLAIN ANALYZE, or actually

even better, EXPLAIN (ANALYZE, BUFFERS), to first see what kind of I/O a query is doing: it tells you, for this query, what the hit ratio was and how many blocks had to be fetched from disk. And lastly, you can use pg_stats and pg_class to get good approximations and understand what Postgres knows about your data. I have one last thing to say: if you haven't heard the news, Citus Data was acquired by Microsoft, so we are actually working on launching Citus on Azure. If any of you is interested in

knowing when that is going to be available on Azure, you can just go to this link and express interest, and we'll keep you in the loop. And that's my talk; I have two seconds remaining, so I was right on time.
