Events Add an event Speakers Talks Collections
 
Duration 44:22
16+
Play
Video

PGConf India 2020 - Schema Change and Schemaless - Simon Riggs - 2ndQuadrant

Simon Riggs
Postgres Fellow at EDB
  • Video
  • Table of contents
  • Video
PGConf India, 2020
February 27, 2020, Bengaluru, India
PGConf India, 2020
Request Q&A
PGConf India, 2020
From the conference
PGConf India, 2020
Request Q&A
Video
PGConf India 2020 - Schema Change and Schemaless - Simon Riggs - 2ndQuadrant
Available
In cart
Free
Free
Free
Free
Free
Free
Free
Free
Add to favorites
110
I like 0
I dislike 0
Available
In cart
Free
Free
Free
Free
Free
Free
Free
Free
  • Description
  • Transcript
  • Discussion

About the talk

How does PostgreSQL cope with schema change on production databases? What functions are available and how do we use them? What other features are on the roadmap? Is a Schemaless approach to data management via JSON a valid alternative approach? What other issues are faced if you adopt that approach? What improvements might we expect in the future for schemaless?

About speaker

Simon Riggs
Postgres Fellow at EDB

Professional developer and consultant for PostgreSQL with 35 years of experience with a variety of database technologies. Regular conference speaker. Enterprise-level experience working from within startup businesses product/service companies, including 2 successful IPOs. Specialties: Creative thinker, Designer and energetic Doer. Data management and Database tech.

View the profile
Share

Hi, how's that? Excuse me, I've got a bit of a cough so I hope you all enjoyed your lunch. I said leave it. So I just keep you awake up for lunch. I'm going to talk about scammers and skinless. Enough, we're not under attack. So what we really talking about here is how you move from one version of an application to another version when when relational databases were invented, that kind of imagined that you were going to design your application and then basically never

change it. And that was a really bad idea. Ran over the intervening years, we've realized that applications need to develop very quickly to stay on top of business requirements. And so that Ben presents, the challenge of how exactly we can change the underlined dates by steamer. Now, sometime in the early 2000s, moving around a nosql that advocated skinless approach to data, and one of the reasons why that approach Play Miguel was because relational databases did not really correctly address the topic

on schema change within the database. And as a result of that, people look for wise to avoid using the pay scheme is all together. Now, where we are in the present day is post raisin, probably other ones as well have gone to a lot of trouble to make schema change with in your database much, much easier. And what I'd like to do in this presentation is explain how that works the options that you've got Force came and changed. But also to contrast that against the actual schema-less

approaches and the wife of that would work as well, difficult minutes. But I'm going to try So the first thing to realize is that post praise ddl is actually transactional which makes it a lot easier to enact certain types of changes in your database. A sight easier. I don't use the word easy because there's some challenges along the way but you need to understand the first of which is that there are some ddl operations that you can't execute within a transaction.

For example, of academic writing that concurrently convey executed within a transaction. That's because the wind index can currently works is actually to execute three separate transactions wallet, come on Progressive. So there's a possibility we might change that in the future but ready for now it is difficult to make that transactional. So the first thing that we need to understand DGL operations is the top of looks that I can buy different types of operation.

So full crate on the table. As you might expect, we need an access exclusively which is what post gray skulls that the highest level of law access exclusive. Meaning you can't even read the table while race is executing, and that also applies for drop. As you would expect, you wouldn't expect better read the table at the same time as it's being dropped. Now, the table is different because old two type of oil is executed. Potentially while there's a stream of

changes against the table. And so, we need to understand that some operations do require a complete lock on the table, but many operations doesn't require. The the full lock. And what we going to do is look at the fact that some operations needed a much reduced level of look across, there's actually a few seats over here in a few seats that. So if there's going to be some more people join them, they'll be at my room at old ass. So I might as well, stop the video now with an

interrupt it lights up. So there's a difference between create index and create index, concurrently create index. How old is Sherlock on the table. And that prevents you from doing any bright operations against the table, obviously very restrictive on the production system. And so that's why we invented, right? Index, concurrently and mail these drop index concurrently and re-index concurrently So the way the walking work is actually going to be covered in some

detail by my colleague a budget. So, I recommend that you attend his sessions well, but just to mention a couple of points about that buddy. White Forever, which means, if you issue a ddl statement, it happens to White behind a long-running select statement, then the PDL operation that you thought you was going to be a nice quick. Operation could sit white hours behind that select statement. Okay, so even if the operation itself runs in less than a second waiting for the law, might take hours. Okay? So they actually look white is important to understand is a TBI.

That's one of the reasons why we have a permit to go, look for him out that allows you To specify that you don't want to wait forever, giving the forever is quite a long time. This is another test practice that comes out of that. In the. You should try to run your ddl at quite a time. Now, that's a ridiculous statement I know because a lot of people running 24/7 website. So what is a twat? A time? Okay, well they might not be one but if there is one, it might require some assistance from the

application itself where you are actually turning off parts of the application or constraining them in in order to run the DL. So, it's no surprise to imagine that. If you start executing vdl on the system and the application team, I've got no idea that you're doing it. It's going to file a guy and not sweaty. Wet Devil comes in the developers. Need to be aware of what's happening in the operational database. Otherwise you'll get a file. The other thing that you need to think about is the duration of ddl operations

duration. Meaning, how long will execute for like a crate table? Operation is a fast method age repression. What I mean by that is, it's the type of operation that only makes changes to the database catalog. It doesn't involve extended operations on the titret. Sell Soffe example, drop table. When we drop a table, it's against the catalog app, search to see if there's any day two blocks from that table in shed buffers. So there's a slight but noticeable to lie. If you thought 32 terabytes of cash or whatever it is, you're running with these days, it's going

to search all that before it drops the table, okay? So that does Take an extended amount of time. A crate index, for example, who weighs needs to scan the complete table, even if you're building a partial index. And as you might expect that for the big of the table is the longer, the crime rate index will take. And that's one of the reasons why we use partitioning because partitioning, then allows you to split your table up into similar size unit not put some Maximum duration on the execution of the ddl

operations Decay. That's a particularly useful thing and the table is again, a social complexity in the summarization of fast method. I threw up raisins and some are very slow needing to scan the table. And there's also some operations that look the table completely and rewrite every single Row in the table. Okay. So just because it says Ulta table on the C'mon Inn Doesn't mean it's going to execute Faust or slow. What you need to do is understand the difference between those types of operation.

So, just to explain that in more detail. If you add a column and that column is going to be full to know, the not so fast operation, I mean post was 11. We know that the if you added a column with the default value, that goes like I quickly, okay drop colon is all so fast but it's fast because it doesn't remove the data in the table. We just tell you to ignore it. So when you scan the table, it's still looking through rows that contain that data and it's only until you see a way to remove that data.

Now again to type of a column is something that can be fast or it can be slow depending exactly on how that works. So, if you doing something called a binary cowesett will change that will be fussed an otherwise it's slow. Now, let's have a show of hands. Who knows what binary colestipol means? I'm not really sure if I do gym claims he knows will I will test lights up, but okay. Okay, goodnight. Sleep. So what does binary cover civility mean? The first thing to understand is that it's actually one way

operation. So if we move from my example, as a colon poop description, that defined his voucher 1/2. Right? If we move from one voucher 1282 voucher 20. That is not buying recoverable because quite clearly, if you got hundred and twenty I bought some character died. Not going to fit into 20 but like I said that doesn't work. What does work is? If you go from 128 to 256, knowing that example, every single engine 28 character string will fit inside a 256 character died to type. And

as a result, we were able to Magically say that change has taken place. Okay, so the bunnery colestipol 1 is quick on the bunnery non-perishable one. Is slow to. Another example of that is at range from integer to Big ain't now, a lot of people of when I first to find the table, that I put my keys on the table tonight to find is an integer. And then now I've got to replace that with a big event will oversee. An integer is full bite on a big into the lights and there's no way that you can make one work inside the other. So, that's not by Noriko Isabel and it causes a

complete rewrite of the table. So, let me just remind you again, the rewrite operation willfully lock the table, and it will rewrite every Robin that table. So if you go to one terabyte table then. Going to run for a very very long time. Okay, so you'll probably be sacked if you try to rewrite your one terabyte stable. Where is if you do I find a recovery symbol giant. You get to keep your job. Okay. So kind of useful information to turn the sand right? So so make sure you think about the type of change

that she'll gang to Mike. They're always to test this. But the most important thing is that, you understand those operations before you make those changes on the production system. I say that because we experienced last week, somebody making a change that I hadn't tested on the production system and that causes difficulties. So one of the things that we've been working with in bdr, which is second quadrant distributed postgresql, because of the way that we do packing, it's not necessarily possible to make that work. I agree. Conceptually. It could be possible

that they were hiding in the row for the work, would allow you to make that change. But at the moment the test is not does not include that in sight in the code, it just hurts whether it's binary, possible. So what I might do is note to the fact that it possible that such a rewrite might be achieved without rewriting the table. And I recently made a list of all of the ways that we could improve on to table and gave that to my team. That we can potentially think about

improvements in the future that exact case was on the list. So it's potentially possible to change that in the future. But right now, it rewrites, the whole table. Thank you. I'm sorry, I hate when you sleep. Can you please, repeat apply two types of language like C or Java. I have a sewer pipes to sign, but I think the answer is no. Yeah. So what so what I mean in terms of what we plan to do in the future as a set of just literally 2 days ago to list of all the possible ways that we could improve yields type Pokemon

and including tables that makes different rhyme version twice within them, which was sent one of them. I think I might be what you're suggesting. So yeah, this is certainly room for further Improvement, in postgres. Much of what I'm telling you, is a journey that I've been home for the last 10 years to improve looking strategies and improved, why that we do things a recent changing post code 12. For example, I was an improvement that allowed you to change a timestamp into time-stamped diaper type which had previously

been a rewrite. That was okay. 16 Pi Square 12. So every release. If you look through the release notes, we are improving the Y this FEMA trains Works. What I was about to be about was I was busy off because one of the things that we built into Beady off is the ability to have different database Came As for the same table on different nodes. And we'll buy allows you to do is something we cooler a rolling database camera upgrade where you can actually change the table. One plus one node

in your cluster as a y of upgrading your database. So should be there in just about her availability is actually about system and scheme Rock guards as well. I'm not going to talk anymore about that he but I just wanted to mention that that's one of the features in there. So these other actions that we have within OTA table possible to enable and disable triggers and set of various things, but statistics information operations metadata, Riley operations with the trigger. For example, will take a shed row exclusively which looks right? Where is the other options? Take

a share update, exclusive lock, which doesn't block right? So that allows reads and writes. Well, it executes Of this different lock types held for partitioning in the when you attach a partition, it will look the actual politician fully as of 12 to Robert. We've now got up tight exclusively on the parrot table, which means that we're allowed to do reads and writes against other politicians, but we just come to other dtl while at 6 when we catch a politician, we haven't yet worked out a way

of using and if they have access to exclusive look and as yet we don't have good while collecting partition. If we did, or be worried about how long the locks were held and that kind of thing. So, again, some for the work to do What some people are not aware of is the old table has gotten up to my zacian. That allows you to make multiple changes in one statement that actually optimizes the number of scans of the table, which makes it extremely efficient full production use. So there is some optimization in there already. And as a friend mentioned,

this potential future optimizations as well. What are the things to watch for when you are making? A lot of changes is if somebody sleeps in a crate in the excitement, in the middle of what you thought was going, to be a short change because the the old two type of statement is an expeditor only operation and he's quick with the cry and excitement is not optimized and it will scan the full title so that transaction will For a long time. So you need to be careful to inspect your

detail to make sure that it works effectively in production is another trick that you can use Kohls valid, which is when we add constraints to a diable, you can request for the constraint is not validated. That is not validated that means that you don't need to scan the table. And what that means is that the execution time of this CDL will be much reduced, like one second of the one hour. I know after we've added that constraint, we can always come back later to

execute, I validate, constraint operation. Unfortunately, you can't do validate or constraint. You have to name them one-by-one but you can do it in one part of the table. That's a good thing about Valentine constraints is that runs with a share of uptight exclusive look, meaning that you can run it while using reads and writes against the table. So just split your CDL operations into looking characteristics are greatly improved. So what we're talking about

here really is devops for database. Some people pulled application, release automation. The key Concept in doing this successful in production is in my opinion to use scripts and code a note to use gooey tools because I guess he's famous in my mind, but I want to send to a customer and while I was with the customer, I clicked on it, thinking that it would be a foster probation and it wasn't, and the whole database looked up and everything stopped, and I was severely embarrassed. But you know, we might mistake the point there is

that if you just point-and-click using a gooey in Imagine magically going to work it magically works but it doesn't stop it from looking up. Nobody else on the system like Ike. So you need to think about infrastructure as code think about automatic testing, Andrew reproducibility of bugs. You know, when people say to me all systems going wrong and you say, what did you do last night side? Then that's not very useful. It's not really going to lead to a problem solving.

There's no salt product for liquid base that helps with the execution of ddl, which is what my phone to turn application really slow motion. You please card work with all the databases is the post-credits can utilize but I'm hopeful that it can be changed to do. So useful tool for being certain about the Y that you roll out your CDL onto a system so that potential future as well. Anyway, they said that's the current state of affairs with managing your scheme is what I'd like to talk about.

Now is a price in your database in a in a skinless, Manor. So the basic support the we have for a skinless approach is to use document Centric types of types like Jason or XML data. And obviously if you're using Jason and the best implementation is to use the Jason be like to talk because that's better. Youd access. And if you're using Jason tighter than you can use one of the advanced Intex type. So we've got, for example. The index is suitable for you. So Jason be data approach. You have that allows you to

add new fields to your data without executing cdl-a nuts. White cool scandalous. So, if we insert a diagram to the measurement table, you say it, I mean, 13, a kid with the value of 23.4 and then the very next insert, I cannot sleep invented. You failed and use that immediately with an SQL. Another good thing that is that I'm able to use the new field automatically, and also the new piece can be automatically index. Dtl of any kind in order to introduce that type of change within your table.

Now, that's right. But my view I'm not is that the regularity of your data within the database actually is effectively just a different kind of schema and that requires you to manage that as well. And what I actually wanted to mention is that Ivan is going to give forget to practical use of Jason. And so in this town who are not poking very much about the capabilities of managing Json data, how to connect, it's going to be a good talk, so make sure you go to that one place. Okay, I'm going to be talkin about the

difficulty of managing data as it got progressively, more complex, within Jason. And what I'm saying is that as you invent Yuki's within your diet, Managing that becomes a more difficult process because if I send an email out to my fellow guys I just invented this new K is pulled sunny and you can use that immediately in the application and I miss read it and I think it's actually called song Thousand Sunny. Then I'm in I'm in setting Die 2 with Sonny and it another guy's inserting die to his son in it and we

both think the tight is the same but when we searched for the data, actually we we don't find the title that we're looking for. A not then leads to die to quality issues with your database because they say you thought, you didn't searched with one data value is actually stored on a different key. Possibly, and I'm going to talk more about that as well. So I'm not just bad-mouthing this. I'm talking about the potential pitfalls of of this particular approach.

One thing just to knock their is the depending on the length of the name for the key. When I use the key song, it was shorter than 10 and so Jason be actually puts that first in the implementation. So they actually the order of execution, of the older place. He's can change depending on which which team values use. So there is some differences that So what I'm saying is that this potential to introduce state to Quality problems into your database unless you in for some kind of strict management application liar that's going to

enforce what those keys are not good but what I'm imagining is that there's a database with his multiple applications acting against it and sometimes we want to change the way I feel times up a cold, sometimes we work. The dates are in the database can get quite complex, unless you manage it in some way. So one of the things that we can do in the database is actually say a query to sidewalk keys, are in use in the data. And the purpose of doing this is so that we could look at the list and see if there's some keys that we don't want in. Any more might be 139 them, move

them out, delete them, whatever. Okay, so these are some techniques that we can use to do that. So once we've identified keys that we don't like a week and then issue some queries to find rose that have those values. And in the situation here, where I have decided that actually, I only want to see by these wonderful tenth, one called Sunny bike and then execute a query. The retrieve the data that field values, or I can search for the other way around, I can

bring out the data that doesn't have those bodies. So not in the list of key values that I want. So this is a technique you can use for finding irregularities in your Jason skin. That's a great question because it's not on the current function. That's on my list of option. Is this? Just be objective case? He's not recur Civ. So doesn't descend the hierarchy. So, when you dropped her off, your cool Json data, it is much more difficult to search within it. Anyway, because of a guy,

I want to put constraints and reference checks inside the inside the application. So there's a movement on going now, pulled Json. Schema the idea is to invent why of specifying that the Jason in your database should be regular so that you can come up with a strict definition of what the database should. And that is an example of a Json, schema. That I've written specifically to enforce. My example, I know it's a trivial example, but lots of valid Json, schema, and then we can actually

in. Scammer within our title Vice using some validation logic. We can actually have a check constraint on the Jason pullam. The enforces, the Jason stamen and there's actually two examples already available that allow you to enforce. The specifics came up within your diet of 1 is written in plpgsql, which means it will install on your favorite clouds. And the other one is written. See, which makes it a little bit more challenging to install, but it's actually a lot

faster to validate your Jason. Beta Decay. So, what I'm saying is that if you use Jason in an uncontrolled manner, it could cause problems for you, but I am recommending that you use your Json data in a structured Manner and that you think about enforcing constraints upon that data. And I guess with a slight smile, I say that's where it stops being scheme list, and it's actually just a different kind of schema McKay. So I thought a slide on Jason. I'm certainly not going to be covering so I won't go into that

here other than to say there's a couple of things that you need to think about. I am not bad-mouthing, Jason perfectly valid to use annual postgresql, database is what I'm trying to do is present information for skinless, and scheming approaches so that you understand that these pros and cons to each approach. And my slip strike please. And that is the most approaches can coexist peacefully within the same table even let alone. The guy is really one of the big strength of

postcards to hell is that Do you mix different types of data within your database so we're not prescriptive. We don't say, you must use Jason but we don't say you must not use Jason. What we say is you can use either and you can mix and match it in any way you choose. Okay. Is what is really powerful I do have to say that certainly in researching this tool, I found some problems in the implementation of Jason, be no bugs, but possibilities for future improvements, but like I said, I also found some possibilities for future in the way that we do all to type.

So as you might expect, these going to be right by vitamins on the device credit development, schedule from it for many years into the future son. What I wanted to do hair is actually refer to a real world example of the use of both of these techniques inside a single application. And I received the permission of Moody's Analytics to describe the way that I have used that vice president to allow patient. So the latest version of the application for credit lines

runs only upon postgresql. And the reason for that was multi-fold, honestly, they were migrating away from other databases but the credit lens application itself is managing the disparate types of information and it's quite important in that world that the different types of information and managed correctly. So when you Launching Credit Data that tighter is coming from a variety of sources. And so, it's very complex, and it needs to be structured in regular

wise, but also flexible eyes. So, what we have within this application is the within the application itself, you can actually specify use it to find fields in homes. The definition of that database, full documentation. And because of the way, the price per sq, I was reading the menu option that allows you to add a filter and then that generates an old two type of stipend and then it will be executed for you. So it's not what's the date of ice is just point-and-click to. Actually I've used to find field and because the old people statement is adding a

column is it? Fights very quickly and easily with no DVI impacted stool. And so that's a great example of how we've managed to bypass completely the PPI in terms of adding flexibility to the application of a bro. So what happens with in credit line is that additional user specified data is held within a load of Jason datatables oil isolating that data into a small number of tables. We get two things Festival. We get the flexibility from being able to a new type of fields easily. So, both Moody's, Moody's customers can add new types of data

to the credit reference application. Easily. But when it comes to upgrading the application in place, that's also easy to do because of all the configuration data is held in one place within the Jason, okay? How're they want application, we can actually use the mobile features of bicycles, 12 to Great advantage. And what means is the credit Legends only runs on pipe break you out. I fully support the idea that applications can and should be written enough in a database agnostic. Why where there's advantages to

exploiting additional features in price prices, they're not square, or a Celine courage, people do that. Nothing. This is a brilliant example of all the features being used only once. So, thank you very much too Moody for giving to me. To mention that on the slide off the conversations that I had this morning. So, thank you very much. Excuse me, I'm sorry I won't discuss it any further. Detail, the contents of Moody's applications you can speak to that but I

can re-enact many multi-talented. If I did I couldn't. Hello, Simon. One question about the scheme of Jason, how do you update the data, and how is that different from updating a column in a table to go to slide on that? So, thank you for asking that. So one of the difficulties within full, sing any kind of constraints with, in the database is, if you invent new constraint, then you have the difficulty of do. I Implement that constraint on all of the existing

data or do I just Implement a constraint from now onwards? And that's one of the difference is that you get with adding a check constraint for you. I didn't know valid because that's not to take play satisfying. So my solution to the problem is to actually have a scheme of version pulling on your table. And then when you choose to implement constraints, you can actually, implement the constraint. The saying, Cool rose that have schema version. 5 onwards, we can implement this. Check constraint the earlier versions, it's okay for them to not be compliant.

And that technique works with Post-Standard relational data and for Json data. So if you actually Implement a new versions of the schema, you can always do that in such a way that you're only in cementing the new steamer from a certain version up words so that works. But the general rule is if you're relaxing a constraint. It's easy. If you're pointing a constraint is hard because that's the sort of general advice about what to do. I do need to stop smoking at some point on the subject. So I

bought an appetite for the questions. So what my my summary of this is pie spice to help is very powerful. Allows you to have both fixed the cleanse, steamer approaches and stimulus data within the same table. If you want it or in Separate Tables, if that makes sense to you. And both types of data are supported properly with good, fast, executing ddl, and multiple types of index is the proper police support those dates pipes. So gives you a wide range of options when

you're designing data basis. So thank you much Thank you. Sammy any questions? Excuse me. I would appreciate if you raise your hand before asking the question. If only because I can't really hear you either one. So this is conception postgresql, been there for ages called? For Indie developers. Limitation, in particular functionality, you can always write in the rapper which links in as part of the instance, Jason coppy recursive, I don't really see that at a disadvantage for the simple reason. It defines the starting point, if I go, if I apply to Jason part Library.

So I'm doing still a flat search, it just starting point, shifted things are supported in compost or whether you have to write them yourself. Honestly, all you do is open the manual book page 57 and it tells you to use a function, then that's cool. If you have to write yourself, it takes time. So, clearly ever release his going to have More and more things within it, but they are always going to be things that you need to add specific for your application in order to make it work. So there's no blockers to using post placed the just

things that require a little bit of attention that you will specific implementation. So that's really another illustration of the power of price, guys, because we supporting use it to find ice type. So you can always write your own stuff if you want to. So in summary, please go to Habitat store, looking. Please go to Ivans, talk impractical Jason and please enjoy the rest of the conference. I apologize. I need to leave tomorrow so you won't see me. But it's been a great conference. Thank you very much for your warm, welcome as ever. And a

couple tokes. I think that I'm not seeing it.

Cackle comments for the website

Buy this talk

Access to the talk “PGConf India 2020 - Schema Change and Schemaless - Simon Riggs - 2ndQuadrant”
Available
In cart
Free
Free
Free
Free
Free
Free
Free
Free

Standart

Get access to all videos “PGConf India, 2020”
Available
In cart
Free
Free
Free
Free
Free
Free
Free
Free
Ticket

Interested in topic “IT & Technology”?

You might be interested in videos from this event

November 9 - 17, 2020
Online
50
19
future of ux, behavioral science, design engineering, design systems, design thinking process, new product, partnership, product design, the global experience summit 2020, ux research

Similar talks

Nikhil Sontakke
Contributor at PostgreSQL and Postgres-XL Global Development Group
Available
In cart
Free
Free
Free
Free
Free
Free
Free
Free
Ivan Panchenko
Deputy CEO at Postgres Professional
Available
In cart
Free
Free
Free
Free
Free
Free
Free
Free
Available
In cart
Free
Free
Free
Free
Free
Free
Free
Free
Parikshit Savjani
Principal PM Manager at Microsoft
Available
In cart
Free
Free
Free
Free
Free
Free
Free
Free

Buy this video

Video
Access to the talk “PGConf India 2020 - Schema Change and Schemaless - Simon Riggs - 2ndQuadrant”
Available
In cart
Free
Free
Free
Free
Free
Free
Free
Free

Conference Cast

With ConferenceCast.tv, you get access to our library of the world's best conference talks.

Conference Cast
835 conferences
33915 speakers
12829 hours of content