Duration 39:25
16+
Play
Video

RailsConf 2019 - Database Design for Beginners by David Copeland

Dave Copeland
Experienced Technical Leader. Formerly Chief Software Architect and Director of Engineering at Stitch Fix
  • Video
  • Table of contents
  • Video
RailsConf 2019
May 1, 2019, Minneapolis, USA
RailsConf 2019
Request Q&A
Video
RailsConf 2019 - Database Design for Beginners by David Copeland
Available
In cart
Free
Free
Free
Free
Free
Free
Add to favorites
7.09 K
I like 0
I dislike 0
Available
In cart
Free
Free
Free
Free
Free
Free
  • Description
  • Transcript
  • Discussion

About speaker

Dave Copeland
Experienced Technical Leader. Formerly Chief Software Architect and Director of Engineering at Stitch Fix

Technical leader with over 20 years of software development experience. I lead by example, providing solutions and products directly to users. I know secrets about scaling engineering teams at startups as well as agile product development.

View the profile

About the talk

RailsConf 2019 - Database Design for Beginners by David Copeland


Rails’ migrations were a revelation when Rails came out, as it provided a way to manage your database schema, but also included many wonderful defaults that made all Rails developers pretty good at database schema design! But these defaults are just the beginning. Properly modeling your database can bring many benefits, such as fewer bugs, more reliable insights about your users or business, and developer understanding. We’ll talk about what makes a good database design by understanding the concept of normalization and why a properly normalized database yields benefits.

Share

Alright. Hello. Thank you all for coming. I want to start off this talk and I thought experiments. So it's a thought experiment. Do you think through in your head to try to learn something about something and you usually place a hypothetical situation and it reveals information. So you think about an application that you work on that has a database and and think of one that's like a very important to whoever it is out of paying you to work on it. So for example for me, I've worked on a warehouse management system at Stitch. Fix it manage our inventory. How much did we pay for who's got it?

Where is it? Who do we sell it to write super super important? Think about what would happen if Source go to the application disappeared from the face of existence Gone Forever. Never to be returned. No backups nougat Repose hang around and where is gone? How could you recover from that? So I think you could and we will be painful it would be would take a long time of expensive but right you kind of do have a app works you work on it. You got users. They kind of know how the app work to get figure it out the business people paying you they know how old they wanted to work. You got the

database still there. You can piece it together it would do in stock, but you could do it. So no take out. The other way application is fine the data and database goes away forever. No backups. No dump sitting on S3. It is gone forever and ever and ever to be returned. How could you recover from that? So I think that is an extinction-level event, right? You would never be able to recover from that. There's no reason to have a daily basis to remember stuff that you can't remember or to share stuff with other people who need to know it and that's why it's there. So they're there is no other

way to to get that. You can't remember what they look like and repopulated you would be dead. So what does thought experiment tells me is that the data are applications manager is actually more important than the source code did manage them which is interesting because I don't know about you all but I if I had to guess we've all spent a lot more time learning would how to write code and we have learning about modeling databases or managing data and I have two degrees in computer science and I didn't take a single database course there. They do not make me do any of that. So that's what

I'm going to try to start closing that gap on today. So, my name is Dave Copeland or Dave tan 5000 on Twitter how I know anything about this despite my lack, of course in college about it was I was lucky enough to work with someone who understood database design who understood the theoretical computer science that that like makes it a real thing. He was very Persnickety about having our databases model populate. So explain what that was why that was important to be worked on a database. It was very terribly modeled and he gave me the language to say like why it was so bad and then I've

since read books and papers about this thunder. The theoretical computer science that underpins were going to talk him out. And if I do a good job here, you won't have to worry about any of that cuz it is very hard to understand. So hopefully we can sort of start to be more confident about how to do that also is a side note while I have your attention and the co-author of agile web development with rails 6, which just came out of beta today if you were looking to your new rails and you want a book to learn reals is very good one is the discount code for getting some money off. I will tweet

it later. So don't worry about that. Cool. Okay, so first what is the date? Do a lot of things in life are databases. Right Reddit is a database your file systems a database Excel as a database, but these are the kind of databases were talking about we're talking about the kind of database of your application probably uses and that's called an Olt PS4 online transaction processing and I had a whole bunch of slides to explain what that meant. But I'm going to cut to the end because it's not super important what it really means that the oltp is a source of Truth. It is the one place

where the facts about the world that you care about go. So that database I talked about how much did we sell this item to someone for that source of truth that is in the state of Base in this warehouse management system. It gets copied everywhere. I get the cash for poor performance. It gets aggregated for business intelligence all over the place, but the source of Truth is at database. So therefore it stands to reason that database design is how we make sure that our database can be trusted to be the source of Truth. So think about Excel if we just put everything in Excel will you can

put whatever you want in there. So, how could you possibly trust that to be the source of truth? Because literally anything can go into it. So the design of the database in Excel does not really make it easy for us to trust it as a source of Truth. So this then leads to the first thing that we want to start thinking about his what are the facts about the world that we want to record? Often were given a generic business problem to solve or a wireframe to implement and that's totally fine. But somewhere in there is an implication that we need to capture information and make it available to

someone else to answer some questions. And so that's that. That's what we want to put in a database. So there's a fact that we want to record and we need to understand what those are cuz that's will see the active database design is really about making sure you understand the requirements very clearly. And so the first step is to write down with this facts are so we're going to go through an example database that is going to start out terrible and become better and better and will understand why it's getting better. This is very hard to understand this up without a running example.

So I've chosen a domain that I think everyone is very intimately familiar with which is of course professional wrestling. So we're going to make an application about professional wrestling and store data about professional wrestling. So here are the facts we want to know about the world. So wrestler might have a finishing moved or that they used to complete a match right? They want to win a match their finishing move. We want a database of that who's using what? A regular mattress on a particular show rights of the rock went wrestle on Smackdown Stone Cold Steve Austin Russell's on

Raw. We need to know who's wrestling on what show is not every wrestler wrestles on the same. Show me a particular match of a shell. So we want to have the Rock wrestling the main events because that comes on last we're going to get a big ratings Boost from having him there. So we want to keep track of that in our database but we might have Zack Ryder who's less popular wrestling the opening maps lights need to be in our databases well and lastly on every show in professional wrestling. They're fighting over a title usually so on Raw they fight over the universal Championship. So we want

to keep that in our databases. Well now you don't have to know anything about professional wrestling to start thinking about how you might solve this problem. How do you make model this data build this application? Right? You're good rails and Ruby developer so you can start looking at all. These little nouns the show up right now answer the key to know what to do or going to have our regular table finishing move attribute to show table right? You can start to see if form in your mind seeing object Matt object model car forum. So, you know, you might do something like this. Okay, we will

know what shows have what matches by joining them together. And then if we want to know what wrestler wrestling what show would match with another table that brings it all together. Now, if you haven't done a lot of data modeling at all that probably seems like an incredibly fast and confusing so for here and I were going to go very very slow. But yeah, but got, you know, we can just going to do it and I've done that many times and you haven't got feel about how things should relate based on how you understand things and it was not bad. It seems to meet our needs but it's got some issues. So

suppose that Zack Ryder Russell's on in The Opening match of the show Superstars, right so we could store that in this model. Now suppose that Superstars doesn't have an opening match. Well that can also be stored in this model so which is it right with we have a data model that was at the store two pieces of information that don't agree. So that's not good. Right? We don't want to have that situation. Then that's the best of case we can't rely on this to be the source of Truth. So what type of thing is called an anomaly sonali some data model prevents the storage of facts

or allows ambiguity to exist like we just saw or requires the leading one fact in order to delete another fact, we'll see some more specific examples of these later but the takeaway here, is it a data model that hasn't on Ali's can't necessarily be trusted to be the source of truth because it allows you to store wrong information or not store, right information. It's a normalization is the process by which we remove anomalies from our design sounds very formal and formality is kind of obtuse sometimes but it actually is going to be very helpful because

of what we're going to see is a little bit of formality allows it to me very strong statements about me changes were going to make to our data model and that therefore gives us confidence that we are improving the quality of our data model. So, like I said, we're not going to get into a ton of math but I am going to use a few of the terms from this area of research. Mostly. So if you actually want to know more about it, you know what the words are to start kind of Googling and start your search towards the the great depth of knowledge here. So normalization to possible to learn about here

normal forms or mathematical truths about the data model that tell us what sort of anomalies have been removed. So normal forms have an order lower to higher. So a lower normal form is open to more anomalies idea is a worst day to model I8 cannot be trusted to be the source of Truth a higher more normal eye. State of model has fewer anomalies and therefore Kenmore be trusted to be the source of Truth. So normalization is is doing that promise is not going to be a ton of math. So let's get back to our domain here. So instead of trying to make some object graph in in attributes and

stuff like that. Let's just throw everything into Excel and see where that gets us. So here's our data model. We got a column for everything we want to store and we just put some frozen there to kind of keep track of this right The Rock's finishing moves the Rock Bottom. He wrestles on Raw and SmackDown Raw is when Universal Championship has defended and the right so we can store some facts about the world. Probably. Nobody thinks this is a good database design, right? But how do we know? It's not a good database design. Yeah, we can say it looks gross or it's uglier the duplication,

but we can make much more strong statements about it. And so we're going to go through this process rights the right now we have kind of whatever right and good just put anything we want in there. No rules to hold anything. That's why it's not a good. So the first step is going to be to modify our table into what is called a relation. I'll talk about that in a second. But a relation is like the bare minimum of any kind of design of a database and from there. You can start to make real improvements and the way you make those improvements is actually pretty straightforward and those

improvements will lead us to what is called a normalized schema. And again, it will be in what's called Boyce Codd normal form against my Googling do not worry too much about that. The point is it will have provably fewer anomalies and therefore provably better and therefore more likely to be relied upon as a source of Truth. Okay, so I talked about a relation. What is that? It's going to seem, like not much. So relation is a table doesn't have any duplicate rows for every field has a single value in it and the fields of each column are all of the same type that sounds kind of productive

and not that interesting but this is just the Baseline. This is the bare minimum and if you remember what are Excel spreadsheet look like you can already see it doesn't exactly meet these needs you. So before we see what our database looks like as a relation, I want to talk about types cuz I did mention that right. They all have to be in the same type and type isn't like category Theory a skill level of specificity, right? This isn't designed. So you just need to think about it a little bit right think about all the names of wrestlers in the world. What is a little strings in the world is

more strings than there are wrestling name so you can see that kind of just there. So it's something just really basic is not a value. So therefore no cannot be in allowed value. So therefore no field in our database will allow know and I will talk about what that means later when we actually write code to deal with this day tomorrow, but until then no is not allowed at all and we won't need it. It's not going to be a thing. We have to worry about we won't need it. So we know what fancy want to record we want to get them in some form of some tables that we can call a

relation. So here's our Excel spreadsheet. We can see some problems right Universal title. That's not the right type of should be Universal Championship. So that's a problem to values there. Right when you want single values not to same deal. They're so how can we resolve this while we can just we can fix it the universal title change that to Universal championship and then we can just duplicate every single Row in here. So if you don't have multiple rows or multiple values, right so same data, right? So we see the Rockettes in there twice because he's onto shows so that's why

there's two rows there but this is the relation. So there's no duplicate rows every field has a single value and everything has the same type are you don't see the rock in the finishing of, don't see Smackdown in the title, that all kind of makes sense and it's not goodbye. It's not great it still open to anomalies, but it's at least a base that we can start to say definitive things about Right into this is what's called first normal form. Now it is open to anomalies as we said, right. So right Kenny Omega is a wrestler for WWE

because he's not on the show the defensive title or a match if we were to remove the Rock and Kevin Owens, we would also lose the knowledge that raw is with the universal Championship is defended. So that's not good Stone Cold Steve Austin changes his finisher to the Million Dollar Dream, but we have to remember to do it in two places. Otherwise, it's confusing as to what his finisher is so these are all the kind of anomalies that our data model allows that we are going to know eliminate. But because we've actually gone to

the process of taking whatever to a relation we can now actually use a well-defined process to do it. That process requires that we capture the business rules in a certain way and it's very very lightly formalize like very lightly formalized. We need to capture in terms of two things that I'll talk about functional dependencies and keys and so we can ride the business was down in terms of these two things. We can literally execute an algorithm to improve the design of our database. So let's talk about these two things separate will put them together. Hopefully it'll make sense. So

functional dependency is when the value of one column unambiguously implies the value of another column based on the business rule. So what does implies mean? Do organic sample write a wrestler has a finishing move if I want another finishing move of a wrestler. All I need to know is the wrestler there for wrestler implies finishing move same with show in title. So things are not functional dependencies a show can have more than one match. So therefore there's not a functional dependency there because if I have the name of a show, I can't necessarily

get a single match. I might give 0 make it one of my cat to I make it 5 so there's not a functional dependency there. Also know the functional dependencies don't necessarily go the other way by we saw in our example data that a wrestler multiple wrestler can use the same finishing move. So therefore finishing move doesn't imply wrestler if I have stunner, why don't know what wrestler uses that cuz there's multiple wrestlers that can use that and again this depends on the business world. What if we decided in our world of wrestling No Rest no to Western could use the same finishing

move that were true. Then there would be a functional dependency so you can see how this is about capturing the business rules to the way that you do. This is you need to write them down in this format. And then you talk to your business partner like this. Is this correct in my is it true that if I know the wrestler I can do the finishing move in, you know, you can talk through and it's a way to capture what the actual requirements are. So I also want to point out there's a difference between the data itself satisfying the functional dependencies in the data model doing it. So this database is

data model is an Excel spreadsheet more or less the data satisfy the function dependencies, but that's not that important to data model is what needs to sort of here to this because this data model allows us to do this now that it does not satisfy the function. So as a statement of our design write a design in which we can execute a query based on a wrestler and get an ambiguous result back about what they're finishing move is Is open to anomalies Because by the business rules, we should not get an ambiguous result. If we query for a finishing move based on just the wrestlers name. So

hold on to that for a second. Now talk about T's and then we'll bring it all together and I hope it will all make sense. So you probably heard of keys before primary Keys IDs things like that foreign keys. Right? So now it is simply a set of columns that uniquely identifies a row. Set another way no more than one role in our database can have the same set of values for those keys. There can be many keys in a relation based on whatever the business rules happen to be the business will say what forms of duplication are and are not allowed.

Pointing out that all Columns of a table do for Mackay because the rumor relation is a single no duplicate rows. So therefore all of the all of the columns of that table are key but to the business with the other Keys now so you have to do so According to our business rules a wrestler cannot wrestle the same match on the same show more than once even in wrestling that would not make sense. So therefore wrestler showing match is a key know there should be no more than one World tableware Wrestling show and mash is duplicated if I have a wrestler and a show and a match I can get an

unambiguous answer from her database. That's what a key. That's what he means it. So you have to find out what this keys are and you have to do that by asking questions of the people that want you to build this application to verify. Yes. That is true. No wrestler can wrestle the same match on the same show more than once they will tell you that and then you will then determine that that is a key. Okay, so data satisfies the key in terms of the date of model satisfying the key. We simply just need to indicate what the key is so you can see the little key

icon there. We'll see how to make sure that gets enforced in the database a little bit later, but for the purposes of modeling you just sort of need to stay. This is a key. So now you got function dependencies and you got keys now we can bring them together. And remember we do this by understanding the business rules looking at the data can help you see what these might be but this is a way for you to verify that you understand the domain that you're actually building to you understand what you're trying to accomplish. What problem are you trying to solve? So what are the implications of

these two things? So to get an ambiguous information, we need a query to give 0 or 1 Rose. Therefore. We need a key to do that. That's a definition of a key. Now are functional dependency between wrestler and finishing move tells us that we should be able to get an ambiguous information about a wrestlers finishing move simply from just having the wrestler, but the wrestler is not a key. The rest is part of the key, but it is not itself a key. So that means that if any query that is just based on wrestler cannot be guaranteed to give us an ambiguous results. It could give us ambiguous

results and we saw that right I have the wrestler and what is The Rock's finishing move? Well, it depends on which role we're looking at. Rights of this is how we know our database is allowing us to store data that violates the business rules, right the design of our database because it is allowing us to write queries. It returned ambiguous results when it shouldn't not satisfy the business rules. So how do we solve that? So this was all the hard part for you doing design understanding and capturing and looking at the requirements to fix this, right?

What do we need? Well, if we had a table and wrestling was the key of that table and that table also have the finishing move of that wrestler in it. Therefore we could query that table to get an unambiguous result there for the functional dependency would be satisfied if that were the case. So let's make that table. Let's just create a table that has that and see what happens. So make a table that has the left side of our functional dependency, which is wrestler that's going to be the key of this table or going to populate the other fields of the stable with the right side of the

functional dependency, which is finishing move. Now because we have this we don't actually need the finishing move move column in the original table. So we'll just remove it. If we want to connect everything up together we can so we put our data back into these two tables now. It's starting to look a little bit better. And if we want to connect everything, right the wrestlers names match on those tables, that's how we can join them together and we can even join them together to create the original table as a view if we want to see where we can recreate exactly what we had now and if we repeat

this process with show and title which works exactly the same way in this case that's starts to look more like a data model that we might have come up with by our gut instinct. Right? We've got there's no duplication. It's it's looking pretty good. So what's kind of talk a little formerly the right we can see where the keys are now if we put the functional dependencies on top we can see that all of the functional dependencies are dependent on a key. Meaning we can get an ambiguous information out of our database which we couldn't before right by definition. We can't have The Rock in the

finishing moves table more than once. And so therefore if we want to know that he is finishing move with query that table and we'll definitely tell us That's cool. What we just did is underpinned by a ton of theoretical computer science and math. So this isn't just me showing you a thing. You could do this is like a actual algorithm proven by math to do what we just did. So we just turned our original relation into what is called Boyce Codd normal form. And what that means is that our data model has no anomalies based on our functional dependencies meaning

Based on your understanding of the business rules. Our model does not allow us to put data in that violates as business rules. And that is not like a gut feeling that is provable like that is you can't like say that's not true that is 100% true the math proves that we did that. And now that we have the structure, right all of that data that we wanted to insert or delete or change the me couldn't before now we can write we can see Kenny Omega is in our finishing moves table. Even if he doesn't work for WWE, we can store that fact that we couldn't before we can change the

Rock's finisher to the people's elbow and we won't worry about changing multiple places. Like we can do that in a way that is safe and reliable we can add Ricochet who has no finishing move, but he wrestles on Rob we couldn't store that before now we can so now our data model based on your understanding of the business rules can store everything we need at the store. So what we just did was we executed Heats theorem ganas the thing you can Google if you want to know more but the takeaway is that he's theorem proves that what we just did actually does remove

anomalies actually does make the day to model better probably objectively and that we didn't lose any data. So that was a very safe thing that we did and there's no debating about it. We have to debate the Elegance of remodel or have gut feelings about it. We do it's it's proved that we made it better by doing that. So we wrote down the facts. We wanted to record partly. They understand the business requirements. We made sure that they were in a relation of some kind in our case. We put everything in one table, but however many tables and how they should be at least a relation and then we

wrote down our understanding of the business requirements and terms of functional dependencies and keys. That's the hard part. That's all the thinking once you've done that executing his serum. You can write a ruby program to do that so I can show you one that I wrote write. It just looked at the data model looks for places with a functional dependencies are not dependent on the key of the table there in andrie designs a database to do that. It's mechanical which is not my intuition when I was really funny about this. It seems like this fourth part was the thinking what are my tables going

to look like, but that's actually not not it's more what are the business rules? Okay, so this is kind of all theoretical right? We didn't see it. The field called ID. I said the word primary key but we never talked about what that meant. And if you've done anything with rails you probably bad. This is the only thing you've heard right? There's definitely IDs and that thing is going to primary key where is all that and what is that doing? So let's talk about that. So a primary key. Usually just the key that we all agree to use most of the time. There's nothing particularly special about it in

our case right wrestler showing match. That is a primary key. That's the the main key of that first relation that we were looking at. That's how we get information out of that table and you'll notice if that's based on the data itself that's called a business key or natural key to the naturally arises from the data and that's why it's so important to identify because it captures the fact that you understand the business domain by understanding what data can and cannot be duplicated. People also understand everything users understand this stuff cuz right is capturing business

rules, but you can imagine a problem. What if we wanted to change the Rock's name? What were using his name as a way to cross reference all the tables together. So we have to change it in many different places and even if they tell us they're not going to let wrestlers ever change their names. Well, that's not going to stick their eventually going to change their minds and their database is going to be really screwed up. It could be impossible to fix things. If we using the actual data itself to cross reference in other tables. So we get around this and this is what rails does we create what's

called a synthetic key or a surrogate keep surrogate. It stands in for the business Kia stands in for the natural key. And the idea here is that this synthetic or surrogate key is just a made-up value Ryan Reynolds as 1 2 3 4 5 6 has no meaning at all so we can use that to connect Oliver tables and cross-reference them. And then no matter what happens to the actual data will are linkages between tables are safe because those values have no meaning at all. So that's why rails does this and it is a very good practice. You should always do this. There's no real reason not to because these

things happen to data the rules will change. I want to date a model and how it's implemented in the database to be as resilient to that change it possible. So what's add some synthetic keys to solve this problem? So we're going to take wrestler out of finishing moves and replace it with the wrestler ID will take show out of titles and replace it with show ID and put the show 18 wrestler idnr wrestler shows table, right? Every wrestler has an idea every show has an ID. So that should sort things out right and here's a good example of a table has multiple keys. So what's inserting data

into our database satisfies all the keys satisfies all the non functional dependencies, but clearly something is wrong. Right? I mean the rocks and have multiple IDs and ID shouldn't uniquely identify a wrestler. You can say and ID and Plies wrestler, right? So that's a functional dependency that we failed to identify. So wrestler idea Plies wrestler show ID imply show. So now that we understand these functional dependencies we can see we have functional dependencies. They're dependent on part of a key not the entire key. So therefore I databases

for this is why our database allow that we're data to be inserted into it after we added these feels Okay, the hard part is done weed. Now we know how to fix this fixing. This doesn't require nearly as much thought as it did to get to this part will we can fix it? Right? We make two new tables that have as their key wrestler ID and show ID, we move wrestler and show into their respective tables and remove them from the wrestler shows table. And now we're back where we started free of all the anomaly is based on the functional dependencies of we understand probably right. There's no way you

can say otherwise unless we missed a functional dependency. Now, if we throw our other tables back in here to get a whole view of the data model as it stands now, this looks pretty reasonable. This looks like what we might have come up with. It looks kind of you squint similar to this thing as what we had first. But this we can make definitive statements about no one can tell us that it's open to anomalies based on an understanding of the business with no one can tell us this is bad. We can see exactly what this doesn't doesn't do. We can't say anything about this and so that's kind of

the power and taking a little time to write out what the business rules are and use those to drive your database design. Okay, so we're still kind of in theoretical and though so how do you take this and actually use this in your rails at right if you're used to doing been rails G migration or reelz TV migrate what you're doing is manipulating What's called the physical model the actual tables in the actual database when we've been talking about is a logical model and so everything we've talked about your your business partners. Whoever you're working with a should go to

understand that model. They don't necessarily need to understand the physical model, but they should be able to understand a logical one. So that's a bunch of specifics about the database. Right. So I'm the same way as you look at a wireframe as the design and the actual rails views as implementation. Well, well we just did is the design of our database. And so now we need to talk about the implementation, right? That's what what's there was a physical database is three concerns. You want to make sure that your physical database enforces all of the keys

not just the primary key that wields creates for you. You want to enforce all the associations between the tables and you want to do some Association are some enforcement of the types of date of the go into your database. So the first two are relatively straightforward to enforce Keys you make a unique index like this. So just basically says there can be no two rows in the wrestlers table where the name is the same that sells the database to refuse to allow that dated in there. So they sent forces that keep you can enforce a key across multiple feels like this. So that says that no row

should have the same value for R SRD show ID and match in the database again will enforce this for you associations can be enforced using foreign key constraints. Do for whatever reason rails doesn't do this by default and there's really no reason not to do this. So in context hear what this foreign key concept says is it if I insert into the show titles table value where the show doesn't actually exist in the show's table in the database refuse to insert it because that would be a dangling reference. We don't know what it means and also means if I

deleted Michelle didn't show table subtitles in the flooring to it won't let me delete that show because that would make her database inconsistent. So you always want to see want to rely on the database and get out things like this. I need I need to do cuz you've understood them from your model. You just need to type them in your migration file types are a little harder and maybe impossible right? Think about the the allowed values for wrestler named won't be there could be an infinite number, but they're certainly feel wrestler names in there are strings. So, you know, something has to

happen here. Now we talked about know and we'll talk about it again, but we definitely No is not the name of a wrestler may be a wrestler could be named to the string doll, but not know so we can tell database don't let no be inserted. That's pretty easy. You should probably do that on those of your fields. But it gets a little bit more subtle, right? So these three strings these are different strings, but they're all the same name right there. They all say the rock essentially. So do we want to deal with that? What about this? Certainly no wrestlers name is like a bunch of spaces and

maybe they'll be a tag team called a bunch of spaces, but it wouldn't be a bunch of asking space characters. So this is certainly not the name of the wrestler. So you have to think through the trade-offs here. How bad is it How likely is it bad that is going to get into the system in the first place. And if it does how bad is it? If it's very likely very bad. Will you need to do a lot to make sure that doesn't happen, but if it's not likely or maybe not a big deal, maybe you don't need to over-complicate your life example of things you can do. So this tells postgres to make a case

insensitive unique index. So we insert the Rock and then we later entered the Rock in all capitals. This will this will prevent that from happening we could also use a thing called check constraints. That's very very powerful. So this tells postgres before you insert a wrestlers name trim the white space from either end and if the result is the empty string than do not allow that to be inserted soon as we handle our a wrestler whose name is like eight spaces in a row this you can go deep on this if you need to realize validation, of course, you can do a similar thing. This is a little easier

to understand it's way more powerful. But of course rails has to enforce this. Do you have to tell rails to enforce this you can circumvent this easily so this won't actually Prevented and you know uniqueness doesn't exactly work. But again, it's a trade-off right How likely is it. Is it going to happen? Now, what about no, ethanol is still not about you, right? That's the definition but we write Ruby or rails code and nil is value knows a thing that shows up. I can kind of have to deal with it. There's no easy way to get to avoid it. So, let's see, how that

plays. So if we were to continue making our physical database match exactly our logical model we make this finishing moves table that has the name of the move. No false and a reference as a wrestler. That's fine at work. And I'll remember Ricochet who has no finishing move. So he just wouldn't have an entry in the stable at all. But would it look like a narco drug? We can't do this because if wrestler is Ricochet than finishing move is null null pointer exception so we can use this lonely operator thing. You know, you're going to litter your coat with all that kind of sucks. So maybe wrap it

up in some methods going to save yourself from that. I mean, there's really no way to avoid than the Neil here just as no waiting to real way around it. And I that matters right that manner as to how you're actually creating the physical database. So if we don't need to know anything else about a finishing move other than his name, maybe we'll just put it on the wrestler table and not make a separate table and will allow it to be know and we'll be nice and and explain what no means even though it might seem obvious right? That's a trade-off that you might want to make. This will make the code

a lot easier. It's really not making a huge difference to how the database is implemented. And you know, this is logical and physical. So General guidance to summarize what we just talked about Credit Union Dexter all the business Keys. You've gone to the trouble of identifying them to figure out your data model. You should have the database enforce them always use foreign key constraints. I do not know where else I can do this by default. You should totally do this by the father's really no reason not to and every reason to default to not know right now is not a value. So you

should default to it understanding that you might need to make a trade-off where you might allow it like we just saw and if you do not do it knowingly do it with a good reason and write that reason out in a comment. And lastly we were thinking about types, right you have to do that trade-off. And if you're using a powerful database like postgres, you have a ton of tools available to really enforce at the data level. What's the allowed values can be so if that's super important than do that if it's not important, that's fine. Don't do that. Don't over complicate your life. So

in conclusion, I hope all this makes sense. I really did this took a lot of reading to figure out how to how to how to explain so I really hope it's sunk in even a little bit and not to make it even worse. But of course Boyce Codd normal form that we talked about. It's not perfect that your day tomorrow could still be open to other anomalies that are very complicated to explain but nevertheless it exists. Normal form what we just did if you do what we just did always and your dated models are always liked that you were in really really good shape. Like that's that's excellent. So that's a

good thing to aim for when you're editing the model when you have to make changes to the database change the model and see what that change implies to the implementation. You saw what happened when we added those ID Fields, it wasn't obvious at first, but we like actually created a real problem in our database that we were able to get rid of by sticking around in the model before we got into it. And again I said the word business rules to spray. His business was about a thousand times and that's what this is about. This is about You understanding the business rules you understanding what

problem you're trying to solve and how you're solving it. And only when you have an understanding start modeling your data because I remember the thought experiment right the data is more important than your app in almost every case. So treated that way so that is what I got for you. Thank you. Can you prune a We got a couple minutes of Q&A if anyone has any. Are there situations where you don't want to do Boyce Codd normal form. So making a data model to be

the source of truth. I would say probably not a way you might not want to do that is if you have a separate database that you sort of what they call denormalize for a quick access. So you might copy the data into a form that doesn't require so much of joining but you always have that date of somewhere that is in that is that is in your kind of fully normalized. So I have a blog posts and many of thing is on this and I'll try to sum it up. My opinion is that rails validations are for helping users not screw up and database constraints of her keeping the data clean. So sometimes you use

one and not the other sometimes use both, but I definitely don't make the mirror each other cuz that doesn't seem like it make sense to me, but I also know there are some gems you can get that kind of try to keep parody there. Some people do actually like rails to mirror the database constraints. So it's a little bit of personal side, but for me, I don't why am I making a validation if I have the constraints and Talking to right and I'll cut right. But anyway, I got some very some very quick summary of my strong opinions on that what resources to dig a Little Deeper. So the Wikipedia

articles on this stuff are not bad. I I read a book. I have to dig up the name of it now, but it basically explains this in one level more but I think the Wikipedia the Wikipedia articles are actually pretty good and they walking through the stuff with maybe a better example than professional wrestling or depending on your thoughts. But yeah, that's a good place to go and those will have links to some of the literature and other books on it from there. seventeen hansol Alright,

thank you all.

Cackle comments for the website

Buy this talk

Access to the talk “RailsConf 2019 - Database Design for Beginners by David Copeland”
Available
In cart
Free
Free
Free
Free
Free
Free

Access to all the recordings of the event

Get access to all videos “RailsConf 2019”
Available
In cart
Free
Free
Free
Free
Free
Free
Ticket

Interested in topic “IT & Technology”?

You might be interested in videos from this event

September 28, 2018
Moscow
16
166
app store, apps, development, google play, mobile, soft

Similar talks

Glenn Vanderburg
VP of Engineering at First.io
Available
In cart
Free
Free
Free
Free
Free
Free
Matt Duszynski
Senior Software Engineer at Weedmaps
Available
In cart
Free
Free
Free
Free
Free
Free
Samay Sharma
Senior Software Engineering Manager at Microsoft
Available
In cart
Free
Free
Free
Free
Free
Free

Buy this video

Video

Access to the talk “RailsConf 2019 - Database Design for Beginners by David Copeland”
Available
In cart
Free
Free
Free
Free
Free
Free

Conference Cast

With ConferenceCast.tv, you get access to our library of the world's best conference talks.

Conference Cast
575 conferences
23117 speakers
8618 hours of content