About the talk
PostgreSQL has Foreign Data Wrapper (FDW) functionality and it is a powerful feature to access distributed data from across heterogenous data stores. FDW became writable in PostgreSQL 9.3 and therefore PostgreSQL with FDW has potential to act as a distributed database supporting both distributed reads and writes.
However, one of the biggest missing pieces is transaction management for distributed transactions. Currently, atomicity and consistency of ACID properties are missing which are essential to achieve full ACID supported distributed transactions functionality. A few proposals have been proposed but they are still undergoing discussions.
This presentation talks about the current status of FDW and the problems regarding atomicity and isolation. The talk then moves on to introduce a few proposed solutions.
15+ years of experience doing product research and development in Enterprise class and bootstrapping startup environments. Exposure to and understanding of most components of the product stack: from the kernel layer right upto the web application layer based on sound Computer Science fundamentals. In-depth exposure to open source technologies like PostgreSQL. Along with this a very good understanding of the Open Source eco-system in terms of community interactions, remote location based product development etc. Specialties: Postgres Training, Postgres custom code development, Postgres Database Internals, Postgres Replication/Hot Standby Setup. Distributed System Protocols, Systems Programming, C language proficiency on Linux/Unix based platforms.View the profile
Good afternoon everybody. Let's give a couple of minutes. So yeah, this is the last official presentation of Piquant, India 2020. Elvis forget to get onto the stage stage fright. So, Play album. This is the last official presentation and we have a happening after this. So thanks for the response is much better than yesterday. Okay, so let's get started. so, my thought is The words for lesson two separate transactions with foreign data rappers, that should tell you that it's working progress. It hasn't made
its way into the core, but the idea is to give you a glimpse of the efforts that are happening on that front, right? And, yes, it is a 300 Game Changer, right? And a lot of efforts are happening around that and I would say this effort is the one that is happening from Corpus right now. So let's see what you're trying to do here in Corpus Christi 2006. Somebody was asking me know, which is your. I think I've worked across the breadth of the code which I think is a good thing. Jack off all master of none. I've done one side of
the gym and I tried some stuff on my own, but currently I'm a full-time and a full-time consultant at 2nd Cottage. So I'm sure you know, we have had a lot of talks and I don't need to discuss in depth about this. But what is a VW Beetle rapper? And it's the implementation of the SQL Med, standard standard for Med management of external data supposed has local data and the standard specify, how you can access external data from inside prosperous, prosperous has implemented.
And we call letters f e w. As, with most things posters, everything is extensible. So, even, this is applicable architecture, right? So what that means is you could Implement of $400 for any That can provide a transaction interface, right? So you could have your local police Rescue configure to connect to a remote server on server. If that file exposes the apis off of a $400 /. Are I remember there was a black hole or you put anything into it, it just disappears, right? You can never select back from it so you can come up with your own fdwc. I think there are
a VW Golf car or things like that more and more people are coming up with these kind of rappers nowadays. so, What you do in postgres SV, have the concept of a foreign table, write the local traditional tables that we have in postgres, so there are standards that are for rent tables. And as the name suggests, the foreign tables are present remotely, and whenever a query happens, Comes in from the plant. You only have the metadata for that foreign table present locally in postgres and
then uses that metadata to convert. Very appropriate remote for Android server. Ready to post office. It need not even know whether it's our local table or a table, right? Query into an underlying, which can be understood by the remote server and vegetables from the remote site. Top of a different databases, but mentioning the remote servers. The first implementation that we had a $400 purse was to support select queries, right so you could select Barry Sanders V optimized that operation to try to push down
as much as that part of the query as we can to the remote site, right? So far, example. If you think of it as a naive implementation is you know, that Implement select star from table, right? And get all the results locally and then go on the post processing, right? Select from table, wear with a red cross and only 10 rows from the remote side, meat that back loss. And in this case, if you had millions of rose on the other side, getting those locally,
and then applying the reklaws locally is very, very expensive, right? So, Most of the operations that we are that we want to carry out on foreign tables, do you want to push them down to the other side? Dinosaur example of a very expensive. Great, you have two tables, you have a joint class and you know, you're getting complex date out of that. In this case, if one table is looking, and the other is remote cannot do much other than pushing down the predicate for the remote able
rights. In that case. It makes sense to push that entire John, Kerry to the other side. So I was just done all these kind of optimizations very, very possible. We then added a writable foreign table support. What does ratable mean? You can do inserts updates and deletes under remote objects in the remote service. So that was at 3 and we will continue to make enhancements in the next releases as well. Soak up things very easy with read, right? When you talk about rights, that is very transactions and
processing and no issues in transactions. Come into the picture right now even though, you know, if you want to do transactions, so basic injection is a unit of work, right? You have some logic that you want to do a acidly, right atomicity, consistency, isolation durability. What do you also want concurrency? Right, if we talked about concurrent database access, right? I'm in nobody's happy with just having one plant connecting and then dinner waiting for it to come.
Move. The sofa. Concurrent transactions can cause anomalies and these have been explained in the postgresql dr. Ceccarelli, I just picked it up from there. So the animals you can have a dirty deed, but one transaction is doing it. Lets an update on one road. Another transaction. Concurrently to be an update on the same row, right? When the person that made her change and the other transaction values. So, in this case, If if, if you allow dirty reads, the second update, might be able to see the update on the first one, right? So I don't want to confuse, you
know, regarding all of this. But then the next day, non-repeatable read, right? You did select count start from table in your transaction. That's a 100 rules. There was a concurrent transaction, which deleted Andrews right now. In the same transaction, you do select Down star from table right now. Here you are. The current transaction has committed. So I got some value at View. There's a phantom readers well and you have a v as often as this very trying to order t-shirts
for our females attendees and I could feel right and then print it off at 4, right? The most effective thing that we can support is college civilization. So here we assume that Transactions. If they happen, concurrently or M, serial order in any order, they should have the same effect, right? So, if if I didn't have you on the front table saw, Rose is fine. But if it's happening on the same Row 1 update happening later or earlier, the results will be different, right? So this is
so, Just wanted to put that into her head before moving on. So see and I and acid nbcc with snapshot isolation. So basically the idea is Read readers, don't love each other and write us once that happens because you can create your own version of the road when you make a modification to it. So that readers can continue to see the old version and the right. We can work with it with its own scratchpad, right? And Snapchat, isolation, a Snapchat is the is a view of the database at any given point in
time, right? So, the database is in constant flux, right? You're doing Alliance of concurrent operations on it all the time and it's with the use of a snapshot that you can view the, a consistent view of the database at that given point in time, right? So using that snapshot will tell you whether a row is visible to that transaction or not. Right. So I snapchatted it looks like the example. So here it is a 10 colon 28, spell invention. Gets up a little bit ID transaction ID. And what it says here is that 10 is the X-Men.
20 is the x-max and the, separated list is the list of concurrent transactions. X-Men of a Luton means that all transactions with IDs gloton. 20 minutes, all transactions with a tax ID, 20 and about are invisible to this transaction. And then you have a list of current active transactions that are there in the system. And it's using a wider use of these snapshots that we can come up with the visibility for each transaction, uniquely and consistently. Southern, you know, left
from our very good documentation. The ugliest, Laura's light talked about you know things like non-reputability, Phantom read civilization dirty deed, I have skipped. So was this allows you to run your transaction in various transaction, isolation level, right? So the first you see the transaction isolation levels that are supported in post office. Read committed is not supported as such it is implemented as read committed. And As you go down, the stricter
rules become in terms of visibility and more, the chance of Funplex, coming up due to concurrent taxes, right? Hostess run Cinder read committed isolation level so you can just do it big in that transaction block as using greed committed and that you got, you can get laundry Pitbull breeds. And so on and so forth. So you don't even understand her own, post will instantly know you have this challenge with transactions and visibility in which mode to use and so on and so forth. So thanks to become more, exciting, and crazy
right now because you support rights with a vw's, you have to start thinking about. Transactions, which involved multiple notes. Now, right, it's possible that you are updating data locally on remote server, one remote server and so on and so forth. So, The lbw that we have in postgres, it is it is responsible for implicit transaction Management on the remote notes. As I said, the blind might not even be aware that it is talking to remove tables and it is the job of the fgw to
do the actual transaction management when is being modified on remote locations. The plug-in exposes apis and implements another standard. Begin transaction, rollback, and even save points. So I see your point is a point in a transaction where you save the transaction state. So, that later, if you do any activity, which causes errors, you can choose to go back to save transaction, state state point, and whenever you want to, Computer foreign transactions, are you call this contact? Call
back. Call? So, here are some details of how management. So, We don't like create hundreds of connections to all your remote service at 1 go, right? It's only when you access that a specific remote table from a specific remote server, that is, when you would start a foreign transaction for the first time on that remote note. in terms of a The isolation levels if your local transaction is using serializable. It will also use Eliza Bell on the remote service, this means that you are remote server should be capable enough to support his isolation levels as well,
but all your practical use cases should be covered. Even if you use Veet committed. Locally owned yacht in your transaction. And if it's touching a remote transaction, it if uses a wood table read, You might ask why it's doing that we use repeatable read because we want to ensure that multiple scans within a transaction on your local note sees the same consistent view on the other side. Soap. Let's look at the problems that might come in this distributed environment right off of d, w and supporting writable
transactions. Soap. I guess let me just use a picture. So this is how transactions happen, right? Next to the instance that some work and says commit. When Progressive processing that, it realizes that a remote server board. So it'll also sent a commit one by one remote remote, one. Success success. Go to the remote server, to send a request and get the response. If all of the remote server say okay, then the local server will return at the party stands for coordinating. Do you know what? You know. So
so so coordinated note So you know, when things are working fine, this will work fine. And like 98% of the cases, you know, when everything is up everything is hunky-dory but What happens when this happens, right? So blind said, we started a very successful and commenting on remote one server. But before your motorcycle permit issues, network issues are you know, Subpar submit partition or in the machine. So in this case, you know, it can return in error to the Plant. But now the question is how to recover a situation
where the local note has not committed. There is a remote and remote not one which has committed and remote to Note 7 have an idea of your setup is inconsistent and that is a problem, which we want to avoid, when we are dealing with four and a half hours, So that's the standard solution, if you look at it right now, the standard solution for ensuring auto mechanic, Too Faced with the use of Two-Face commit. So, This to PC, to face, it consists of two phases. The first phase is, what is called
as the preparer commit face. And the second face is coming to prepare fees. So, in essence, what happens if instead of sending a command immediately to the remote site, what we ask them is, are you guys prepare yourself for for, for this commit? And then it gets a response from all the notes saying, okay, we are prepared and only when all of them respond positively, do we tell them to commit to prepare transaction. If any of them says it has issues, then you about the transaction on all the others.
So, So this was, you know, proposed and it was acceptable to the community as a viable solution to implement distributed transactions with fdw. And this is working progress. This has not yet made into the code base, and it's been awhile. And up the patches that are dead in the community day or expose new apis. So, you do an API to get a ID for direct repair operation and API to ask a note and action to a foreign power to prepare the transaction and API 2.
Dammit remotely and so on and so forth. And then we'll do the following steps will prepare for the transactions will get their responses if they're all. Okay. I will commit locally. 115 with the locally will issue a prepared to all the Foreign Service, right? I knew you were going to ask that question so so yeah. But now we have the plant still seems like a bit but the lbw will convert that into an implicit to face come out right cuz this will also support an explicit 222 face. Commit an
application can choose to prepare transaction and the ID and you know then, transaction, right? So, in terms of implementation, this prepare is basically. It does the same activities that are local transaction would do but it saves all the data and everything else but it does not make the effects of those transaction visible. So the transition isn't repaired State and Angela. Unless you tell it to actually do a commit, it will remain in that prepared it and you see a kamet prepare will the effects of
the transaction become visible to everybody else, right? So now what happened to the client, so connect to the cognitive issues to commit, it sends a prepared. And get the response from all the notes only when all the notes say okay does it start work. A commit on the remote sites so if the person was a remote one set, prepare is okay but remote to could not be contacted for some reason. I could not prepare for some reason. Instead of sending a commit to the remote site Center about prepared to the other side.
If a transaction transaction fees. yeah, so so the next scenario is Repair was okay and it was a remote server one but now a remote server to write love. In this case you need. What what is a foreign transaction resolver and the job of that? Resolver is to resolve these kind of transactions remote to note was able to attract right on the coordinator. Query it and say what is the status of this repair transaction on you? Every potential get ID by what you can uniquely
identify the transaction. So when remote server to comes back, we can inquire and say, tell me the status of this is it still prepared? If it is, that you don't need to do anything, if it wasn't prepared State, you can now ask it to that transaction. The point of doing these multiple phases. I'm in the first phase is to make all the servers ready and reading it in the sense. It is persistent Lee ready. So even if it crashes, are comes back, it will still remember that information and then you can
tell it to either about that paper transaction or roll. Back the repair transaction. Yeah, that's what I said. That, whatever it is. The next time it is reachable, you will query the status of the transaction and it will tell you Psych. Look at the steps. I took the step to his commit locally and only then do you send the comment preparer to the other side, right? Prepare is almost comic state, right? So you've committed locally transactions, On yourself. Yeah, I'll be asleep. But I mean, this is a distributed setup, right? So you can write a aggressive
foreign transaction to resolve. The moment it comes up immediately asked me to prepare The Amity Affliction, now, you can see right time in this takes care of all your commits related issues and That's a problem right now. And if you're doing, if you're relying on a distributor transactions for your work, then you should have ample monitoring and ample no reports to tell you that. Yes, if you're protective of your blind at 4 to bring the production by bring the paper, right? And once it is prepared, it's up to you to decide whether you want to commit that to make the text
messages or about that. Yeah, exactly. Yeah, I mean there's a problem with the transactions, you have to, you have visa to deal with multiple notes and depending on at what point productivity happens. It happens one by one. Right. Even if you do it badly, there is no, it's not going to happen instantaneously at the same time. North. I mean, what if what if what if what if steps refills, what is the commute to the local climate crisis? What do you do then?
I know, right? I'm in a state state step to face locally then when the Machinery starts, April 2nd and about prepare to the other side because it has not been able to connect. and when it comes back, the resolver on that note will instead of sending a complete report about to remote one and remote to So we have to interact to make it across the note. So you cannot change the sequencing. But have you already committed on the remote one? So how can you roll it back? It's already been committed, right?
Yes, they're there are no guarantees in this process and if it is a prepared on all the notes successfully, you can come on. All the other remote notes. If it has been committed locally and it has been committed on OneNote, then you should come it on all the notes which come back, right? You can have the flexibility to do. What about on the news. That is what that is, what the foreign transaction result has a dealing with distributed setups. Yeah. Obviously, you know, this hostile forces right on your pastor, survive
crashes, in the body starts, and all of that, right? So all of this information about foreign transactions is recorded in World Records, right? So if when when the summer comes back has the information as to what transaction was ongoing in what and what state is it can be. And that's what that's how we can get the idea of the prepare transaction. And that's how using that it can carry that. I'm outside saying, what is the status of this prepare transaction on you? That's the idea of preparing to make the system prepared for the computer and once all of them are
prepared, you can complete the transaction even if there are issues and even if one, no takes time to come back and so on and so forth. All of this information. So you can basically redo the standard way of dealing with the restarts. And As I said, the transaction resolved querido, state of the transactions and decide what is to be done. So, I took Izzy solve the problem. It is mentioned here. The logic as I said, If you're prepared everywhere, if you committed locally, then
you can come it everywhere. If you prepare that, you could not commit locally when you restart you should do in about everywhere. That was about Atomic dust it, easy? Right? Let's move on to the other problems, right? And I know the client declined, it would be totally unaware. That it was dealing with for rent tables, and it expects the same behavior that it expects from a local stand-alone post a server, right? But unfortunately, if you're dealing with multiple notes than there are anomalies again, right
now. So, you have two plants plant, A and B connecting to your local instance. You, you, you will commit right solution, level locally with every Wu's, reputable read on the remote site. so, Local plant begins a transaction and it does a select account from remote able to answer. No. Concurrently local plant B. Starts up and deletes 10 rows from the remote able. Not because the client is using great committed. he would expect that the, at the step, number 6, when it runs the count star, it expects that its roots in 98 will see how,
Dan from the remote site. We start the transaction in the remote site between reputable read mode isolation. So this is an inconsistent read example. Let's move on to another example where you have, so even with OneNote Even with one more, you can get an inconsistent read issue. And there are two tables from remote server 1 and 2, and each has $100 right now to start a local repeatable, transaction. And it does a select concert from Table on remote one server so
that will create a reputable connection transaction on the remote server. One Plan B Against attributable read. And it doesn't delete from server to. Now. Understandable. A server, if you run a counter and table to Stix, menu to select counsel from table on remote to server, other side only when you connect only then you start a transaction and even though it isn't because it was You will see the effect of the delete on inside your first transaction, right? So, it's very difficult when you're dealing
with multiple notes. What is the solution for the site earlier? And then I said, the solution was too PC for Atomic, tomorrow. Let's see if there's a solution for Atomic, visibility across multiple notes. One way is to provide us, Global contestants snapshots, right? And the snapshot could actually be regular snapshots, right? So, if you remember the snap For Today, Show You 10 colon 20 and the list of active rprx ID. You could use a similar to give you a global list across all the notes in the system, right?
Are you could use like a global command sequence. Number, there are some implementation switch used at the timestamp, right. Look at some of the solutions about batteries and what kind of approaches that use. Posatex otic solution and it has g t m. Global transaction manager, has a dedicated note and all the notes before the car started X action. Disconnect the gym and ask for snapshot. Could you give me knows what all transactions are active across. All the notes and returns as a
global snapshot. Comprising of all those values. Seminole, there's a Google product College percolator and it has an oracle increasing order. And using that timestamp read and write up reasons can happen. What do you think somebody will quickly see, right. I'm in service. So xl5 to solve that by using, standby for GTM GTM stand by. And then if the G team went away, right? So different in a different fashion. The problem with this is not just a single point of failure,
but even the, if you want to scale a lot. And if you have a lot of connections coming in than even, the GPM is not able to scale, you know, because it's one entity and everybody said connecting to it saying, give me Snapchat, give me snapshot right? So, even if it can also be a bottleneck, There's a recent paper by Microsoft, it is about clock SI which is like you know Snapchat as I a solution using loseley synchronize plugs, right? So here the idea is to use the time as the command sequence number, right?
And We now have you heard of empty. Network time protocol so you could have all your notes synchronize using ntp. So that everybody sees the same of roughly the time, there's going to be some clocks to buy. Orange, should be pretty good and using that. Reply. I'm in a complicated paper but roughly a note can use its local time as a snapshot, right? And using turn off the other note to return data. And the semantic that allow is that if you get a Snapchat because I heard of the local time, you should simply wait for the time before you returned it to
you of each row. is based on the oh, time stamp of the transaction, which added that row or modified that row, right? As long as your commute time stamp data is less than the block as I snapshot, it should be visible to that query. So the good thing about this is, you know, it is like no losses synchronize item in there is no p.m. to go to, or there's no complicated. So it's it performs better, it feels better as well. for that bwsl via recommending, the use of Something like this wreck of a timestamp Easter mvcc. Check
the timestamp. So was basically already supports of coming time stand, right? So when when you unable that every time it happens, you store the Comets, timestamp protect right? And then using that, you should be able to answer your visibility. So you're having this has been proposed, but this could be actually useful in warehousing type of car. Is that where you don't like to know if you're okay with some amount of still late, I write some things like that I'm in its implementation could actually be used as a serious snapshot
implementation as well. So I did not end by saying that you know how to make visibility problem has been solved, right? We have just mentioned that you know this is a very promising way to look at visibility of data. So yeah that's right now, it's complicated so that's why it's taking time but the efforts are ongoing in the community. We will keep you posted as to what happens on that front, right? And I would like to acknowledge masahiko sawada. He's my colleague. From Tokyo. You couldn't come here.
So, he's the one who's been working on this. On the post office mailing list Thank you. Thanks for listening. On time, 45 minutes. Exactly block Snapchat. I guess they were little questions during the talk. So, Okay. So what will happen is, if there is a clause that is part of the joint, we were tied down to each of the remote note that you get the least number of data and then our local you have to answer back to the user. So you can do it by bringing localita locally.
Buy this talk
Buy this video
Our other topics
With ConferenceCast.tv, you get access to our library of the world's best conference talks.