Duration 30:20

PGConf India 2020 - Real time data streaming in PostgreSQL - Kaushik Iyer - Endurance

Kaushik Iyer
Software Developer at Newfold Digital
PGConf India, 2020
February 28, 2020, Bengaluru, India

About speaker

Kaushik Iyer
Software Developer at Newfold Digital

I am a software engineer in Bangalore, on the Platform team of Endurance International Group Pvt Ltd, APAC. A life hacker by nature and a database enthusiast by interest, I am currently exploring different data stores and trying to understand the art of persistence.

About the talk

Real-time streaming of every data-modification event on a database can be instrumental as a form of derived event sourcing for updating heterogeneous, non-slave data stores. Out of the box, PostgreSQL provides functionality and settings that enable the generation of such events. Combined with a service like Debezium, which monitors and records the events, we get a pipeline that is popularly termed Change Data Capture (CDC). In addition, CDC has numerous applications in a microservice environment, ranging from cache invalidation to maintaining data integrity where multiple processes update the database. A detailed setup of such a CDC pipeline shows that it is easy to build, maintain and monitor. We are using these features to eliminate the lag caused by a pull-based mechanism for updating an Elasticsearch index that is frequently queried by the search service.


Evaluation on real-world data compares two pipelines:

  • a push-based mechanism built on CDC principles;
  • a pull-based mechanism that runs periodic queries to fetch the recently updated records.

The CDC variant places an order of magnitude less load on the database and operates more reactively: it waits for the logical information to be written to the WAL, whereas the pull-based mechanism is prone to many polls that return nothing. Additionally, the different modes and configurations of the Debezium service are explained, along with their impact on the entire pipeline.
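The "polls that return nothing" point can be illustrated with a small, self-contained simulation. This is only a sketch under assumptions that are not from the talk: the change timestamps, the 10-second poll interval and both function names are hypothetical, and the push side deliberately ignores the cost of decoding the WAL.

```python
def simulate_polling(change_times, horizon, poll_interval):
    """Replay change timestamps against a fixed-interval poller.

    Returns how many queries ran, how many found nothing, and the total
    time changes spent waiting for the next poll.
    """
    pending = sorted(change_times)
    polls = empty_polls = total_lag = 0
    next_change = 0  # index of the first change not yet picked up
    for now in range(poll_interval, horizon + 1, poll_interval):
        polls += 1
        picked = 0
        while next_change < len(pending) and pending[next_change] <= now:
            total_lag += now - pending[next_change]
            next_change += 1
            picked += 1
        if picked == 0:
            empty_polls += 1
    return {"queries": polls, "empty": empty_polls, "total_lag": total_lag}


def simulate_push(change_times):
    """A push feed emits one event per change and issues no table queries."""
    return {"queries": 0, "empty": 0, "events": len(change_times)}


# Hypothetical workload: seconds at which rows changed during five minutes.
changes = [3, 4, 47, 120, 121, 122, 290]
pull = simulate_polling(changes, horizon=300, poll_interval=10)
push = simulate_push(changes)
print(pull)
print(push)
```

With this sparse workload most of the polls find nothing to fetch, while the push side does work only when a change actually happens.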

So, before we start, a small introduction about myself. My name is Kaushik, and I am a software developer at Endurance International Group, an umbrella company of hosting and domain services. It includes companies like BigRock [unclear], and I personally work with the platform team [unclear] for the past couple of decades. For the past couple of years, one of the challenges we have had is to provide a better user experience, and our integration services have to go along with the platform. For that, we get a lot of business use cases where we need to take data points that sit in our platform's PostgreSQL database and make them available to the ecosystem of services that we are trying to build on top of it.

So today I want to show an example where we tried to optimize one particular product requirement and business use case across our entire platform: how we started with a naive solution, obviously, and how it actually tanked. Then we went back to the drawing board, looked at streaming techniques, and came up with a more improved pipeline. In the latter half I will be discussing certain metrics, system properties and system performance that we observed while experimenting with it.

So, the most important product feature that we wanted to improve is search. We want to provide a [unclear] search feature to all our customers, because we have customers who buy domains, hosting and a lot of other products that are associated with one single domain. [unclear] You buy hosting, and you probably come back three months later when you want to renew [unclear]. So you are going to rely on this search. Most prominently we wanted prefix, full-text, wildcard and probably even fuzzy search, because you might want to show the closest match. [unclear] We built a small search service around Elasticsearch, but as I mentioned, our orders and other data sit in PostgreSQL in the other corner, and we have to get that data from there to here.

We did not spend much time on this because we had a lot else to do, so we came up with a naive solution: every now and then it pulls the net effect of the data points that were changed during a certain period of time. [unclear] It is a Kafka connector that runs on Kafka Connect, pulls the data from PostgreSQL and dumps it onto topics. Then we have Apache NiFi with a simple [unclear] that dumps it onto Elasticsearch, and we thought this was going

to be an effective replication of data that we could use for all different sorts of persistence, SQL to NoSQL, NoSQL to SQL, vice versa and whatnot. But that was not the case, and we faced a lot of drawbacks. The most important one was that the [unclear] mechanism it required was quite cumbersome [unclear]. We have to lift the data but make sure that there are no inserts or updates happening at that point of time; or, if there are, they are only going to be visible in Elasticsearch around the next cycle.

Then there is the fine-tuning: what is the duration at which the JDBC connector should continuously poll and lift the net effect of the state changes? That was not [unclear]. We started with about two minutes. Say it is currently 12:00; I run my query for 11:58 to 12:00 [unclear] to make sure that the rows are committed. But that was not the case, because when they are committed, the last-updated time for some of the rows is well beyond whatever time I am using as that yardstick. So at a time T we were going to lift the information from the PostgreSQL database that was committed at T minus five minutes. But again, this does not hold in all cases. It so happened that some of the data points came in much later than [unclear] minutes, or the data was late because Kafka was slowing down and there was back pressure. So a lag of at least five minutes is [unclear].

We also already have an existing user experience. What happens is that the customer goes to the existing user experience and makes a change to an order there [unclear]. But since the message has not travelled all the way to Elasticsearch, when he goes to the new experience it shows that the order is active; he is able to search for it and it comes up as an active order. This sort of inconsistent service [unclear]: we are not enabling a better experience for customers but actually causing more confusion.

As an add-on, you could say, we are using a document store and we are sort of dumping relational data as-is into it. [unclear] Technically you have to assemble and denormalize the data from the relational database before indexing it into Elasticsearch. We were not doing so in a service; we had to do the joins [unclear], and we were not using Elasticsearch to its maximum potential. So we had to go back to the drawing board. A lag of five minutes at least is not acceptable for any user experience, and we looked at novel streaming techniques, you could say.
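The late-commit problem described above can be reproduced in a few lines. The sketch below is an illustration, not the poller used in the talk: the `Row` type, the timestamps and the `safety_margin` parameter are all hypothetical. It models a poller that selects rows by their last-updated timestamp, optionally trailing the clock by a margin (the "T minus five minutes" idea), while each row only becomes visible once its transaction commits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: str
    updated_at: int    # timestamp the application wrote inside the transaction
    committed_at: int  # moment the transaction committed and the row became visible


def poll_all(rows, poll_times, safety_margin=0):
    """Return the keys a timestamp-window poller would pick up, in order."""
    seen = []
    watermark = 0  # upper bound of the window already fetched
    for now in poll_times:
        upper = now - safety_margin
        for row in rows:
            visible = row.committed_at <= now
            if visible and watermark < row.updated_at <= upper:
                seen.append(row.key)
        watermark = max(watermark, upper)
    return seen


rows = [
    Row("a", updated_at=3, committed_at=4),
    Row("b", updated_at=8, committed_at=13),   # commits after the poll at t=10
    Row("c", updated_at=15, committed_at=16),
    Row("d", updated_at=18, committed_at=31),  # a very long transaction
]
polls = [10, 20, 30, 40]

no_margin = poll_all(rows, polls)
with_margin = poll_all(rows, polls, safety_margin=5)
print(no_margin)    # rows "b" and "d" are never picked up
print(with_margin)  # the margin rescues "b" but still loses "d"
```

A margin trades latency for safety, yet any transaction that stays open longer than the margin is still skipped silently, which is the gap a log-based approach avoids.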

One of the first techniques that we looked at was event sourcing. We decided that we cannot pull data effectively if we are not able to effectively set the constraints. So we had to push the events and make the other systems react to change events. Event sourcing is simple. What happens is that when a business process happens in an application, it creates a domain event and records that domain event into something called a transaction log or journal [unclear].

But the main issue is that our business platform is a multi-module monolith, with some modules that are legacy and some that are more new [unclear]. To give an example, say an order is provisioned and set to pending, and once that is done we set it to active. Obviously there may be some other code which directly puts it in the active state. So we cannot sit and manually sift through each and every code path to design and streamline all the events that have to come into the transaction log. But that is one of the key portions of this entire pipeline. In event sourcing we do not have a database as the source of truth; the log is the source of truth. That is the advantage of event sourcing: we do not store the state, and whenever you want the state you take a window and replay all those events, and the state that you get is the effective state. You can always replay whenever you want, history is maintained, and the log can be repurposed as an audit log of sorts. But we need a persistent state to be the source of truth.

And that is where we come to change data capture. The majority of the pipeline is almost the same. There is one key difference in change data capture: the application does not push the domain events into the transaction log. We make the transaction against the database, and the change events are emitted from the database log file, which in MySQL is the binlog and in the case of PostgreSQL is the WAL. Those changes are captured and processed. [unclear]

In the event sourcing example, when we are not able to push sanitized events into the queue, we have to add logic into our consumers, and no one wants to do that. With CDC, since we are committing into a PostgreSQL database, [unclear] you can take the events, sanitize them and replay them back onto the database, or filter out some of the events and put them onto another downstream queue [unclear], because we know that we have a database as the source of truth. The majority of applications that rely on the database are not affected. So in that way CDC wins over event sourcing in our scenario. With event sourcing we are not 100% sure [unclear], because if we miss some particular event we will be short on information; late data is probably better than missing data. Also, in event sourcing the event store sits smack in the middle [unclear].

We did not really want to deal with these kinds of pain points. So what does an effective CDC pipeline look like? We have a database, and then we have a change-capture process. Basically, whenever the data in the database changes, there is a system which captures the changes, stores them in a memory buffer for a certain period of time, and then flushes them onto the transaction log; you can configure how you want this to happen. While the events are in the memory buffer you can apply certain transformations, which we will be looking at. Once they are in the log store, you can have multiple consumers reading from it.

The logic is simple. Effectively, the changes that are captured are like the events in event sourcing: [unclear] the set of events that occurred in a window merge together to form the state. [unclear] But since the events are triggered from the database end, and the events get flushed to the capture process only when the transaction has committed, we are 100% sure that we will not miss any of the events. [unclear]

A small comparison between CDC and event sourcing: CDC also does not lose a lot from the event sourcing paradigm, but we get the flexibility of having a persistent state along with the events that contributed to it [unclear]. So we decided that we wanted to build a CDC pipeline. Having a CDC pipeline in your company is [unclear]. I think Netflix's Delta is one of the most recent ones; they have done it [unclear], so you can take a look at it. There are some interesting concepts there. But we went with Debezium, which is open source and has a large community backing it. In this particular CDC picture, Debezium sort of sits in the middle: it has a change-capture process, it has an internal memory buffer, and it has message transformations, field masking and so on. And since it fits well with the Kafka ecosystem that we currently have, we could get the complete package.

The event that we get into Kafka from Debezium is pretty detailed. Apart from the standard stuff, it has the "before", that is, what the row was before the operation occurred; the "after", the value of the row after the operation; the operation itself; and a plethora of metadata. You can use the metadata, for example, if you are trying to upgrade the Debezium version in the middle of your streaming process: Debezium is smart enough to understand that and rolls over to the newer version, and you do not lose any events during this process.

There were certain CDC techniques before that would require you to add an additional column, a timestamp of sorts, on your table, so that you could maintain information as to when this data entry was captured and moved to the transaction log [unclear]. Debezium does not require any of this. When you install it, you can use it out of the box with your particular database of choice.

It also has robust snapshot mechanisms. To name a few: you can choose to take an initial snapshot of your entire database into the queue, and you can use PostgreSQL's exported snapshot mechanism, which does not lock the entire table. Even if new inserts are coming in, once the snapshot is over those inserts are also rolled over into your transaction log. It has also got some good filtering features and masking [unclear].

And since it is a Java connector, we can use JMX monitoring and ship that to an APM, something like New Relic, and constantly monitor some of the key stats, like how the process is running and how much memory it is consuming [unclear].

I have been talking about Kafka as the transaction log, but in case you want to have a non-Kafka pipeline, say you want to experiment with the new Apache Pulsar or something like that, Debezium also comes in an embedded variant. So you can write a small producer, and whatever we did with Kafka as the transaction log you can mirror with Debezium embedded, and it works pretty much the same. Why we went with Debezium with Kafka is that it fits well with our entire ecosystem.
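The change-event structure mentioned earlier (row state before, row state after, operation, metadata) is what lets a consumer keep a downstream store in step. The following is a minimal sketch of such a consumer, assuming the Debezium envelope fields `before`, `after` and `op` with the operation codes `c` (create), `u` (update), `d` (delete) and `r` (snapshot read). The in-memory `index` dictionary, the `id` key and the sample order rows are hypothetical stand-ins for a real search index.

```python
def apply_change(index, event, key_field="id"):
    """Apply one change event to an in-memory index keyed by primary key."""
    op = event["op"]
    if op in ("c", "u", "r"):
        # Creates, updates and snapshot reads carry the new row in "after".
        row = event["after"]
        index[row[key_field]] = row
    elif op == "d":
        # Deletes carry the old row (at least its key) in "before".
        index.pop(event["before"][key_field], None)
    else:
        raise ValueError(f"unexpected operation code: {op!r}")
    return index


# Hypothetical stream of events for an orders table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "pending"}},
    {"op": "u", "before": {"id": 1, "status": "pending"},
     "after": {"id": 1, "status": "active"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "active"}},
    {"op": "d", "before": {"id": 2, "status": "active"}, "after": None},
]

index = {}
for event in events:
    apply_change(index, event)
print(index)
```

Because every event carries the full row state, the consumer needs no query against the source database to stay consistent.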

Now, coming to our pipeline, the pipeline that we currently have deployed and tested. The main thing, the brain of the entire process, is PostgreSQL. All of this magic starts at the PostgreSQL level: PostgreSQL has to emit the changes, and the rest of the pipeline is basically reacting to those changes. The key change that we had to make to PostgreSQL is to enable logical replication.

A small introduction to logical replication: instead of replicating the data in binary form, as physical replication does by shipping the WAL records byte by byte, you send the information in a logical, human-readable form from a publisher to consumers. It is a simple pub/sub model, and the changes get replayed on the consumer side. [unclear] The WAL sender and receiver are able to exchange this information in a human-readable manner. So how does logical replication actually maintain this information?

Which events have been transferred across from the master to the slave is tracked with log sequence numbers [unclear]. This is kept in what are called replication slots. [unclear] To convert the changes into a human-readable form you need something that decodes them into a format like [unclear] JSON. That is what the logical decoders are for. We tried a couple of logical decoders. One is wal2json, and then we switched over to the native pgoutput [unclear]. The native one comes with PostgreSQL 10 and above, so you do not need to install an external decoder. It works really well; I can show you the graphs in some time.

So we deployed the Debezium connector to Kafka Connect, and the connector configuration looks something like this. Most of the settings are pretty descriptive and they are present in Debezium's documentation. Some of the most important ones: if you do not want to stream changes of your entire database, only particular tables, you can whitelist them; you can even blacklist certain tables. Another interesting thing is the topic naming. Let's say you have a table called [unclear] under the schema public. The topic name that gets constructed in Kafka is basically your database server name, then the schema, then the table [unclear].

The entire pipeline looks like this. PostgreSQL emits the events through the WAL sender, and they go through the wal2json decoder. Debezium captures them and stores them in the memory buffer that Kafka Connect provides. Once the entire transaction is done, it flushes them into Kafka. Then we have a sink connector in Kafka which listens to the topic and dumps the data onto Elasticsearch.
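A connector registration of the kind described here can be sketched as a plain dictionary. This is an assumption-laden illustration, not the configuration shown in the talk: the host, credentials, slot name, server name and table are hypothetical placeholders. The property names follow the Debezium PostgreSQL connector of that period (`table.whitelist`, `database.server.name`); newer Debezium releases renamed some of them, so check the documentation for the version in use.

```python
import json


def topic_for(server_name, schema, table):
    """Topic naming convention: <server name>.<schema>.<table>."""
    return f"{server_name}.{schema}.{table}"


def build_connector_request(name, server_name, tables):
    """Build the JSON body for registering a connector with Kafka Connect."""
    config = {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",              # native decoder, PostgreSQL 10+
        "database.hostname": "db.example.internal",  # hypothetical host
        "database.port": "5432",
        "database.user": "cdc_user",            # hypothetical credentials
        "database.password": "change-me",
        "database.dbname": "platform",          # hypothetical database
        "database.server.name": server_name,    # logical name, used as topic prefix
        "slot.name": "cdc_slot",                # hypothetical replication slot
        "table.whitelist": ",".join(tables),    # stream only these tables
    }
    return {"name": name, "config": config}


request = build_connector_request(
    name="orders-cdc",
    server_name="platformdb",
    tables=["public.orders"],
)
print(json.dumps(request, indent=2))
print(topic_for("platformdb", "public", "orders"))
```

The printed JSON is the shape of body that Kafka Connect accepts on its connector-creation endpoint, and the last line shows the topic a consumer or sink connector would subscribe to for that table.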

Now, we can customize the sink portion. We can write our own consumer, or you can use Kafka Streams and KSQL to actually perform joins on the streams; in KSQL and Kafka Streams you can do stream-table joins, so the denormalization is performed there [unclear]. Yes? I am sorry, can you speak louder? Yes. [unclear] something that can do both of these things in one go. So why don't you look for that? Because what is happening is that you have these hops, and each hop is having a [unclear]

which which created performance issues and unmanageability in the ski Liberty down the road. So you don't have a clear need just for the CDC to hi. Brittany. CDC, plus CDL. So you need to look for a tool to smoke in 1. O clock. What I've already done that we have in our car to the go down through and it explains also and then use for leg running models as well. And then they provide the capabilities. It's a different thing right now to solve hybrid need. I think I could be a different way to do it

for like business tration in house on orange. But if it, if it so happened, I can you call intestine in the Southlake satisfy, our requirements for now. So we do not have to, like we did look at options like these data which calendars do they provide look into an election, not listen to any of the other, but we do not require that much is incumbents. So we don't know if we're at that stage yet. Very, very, very, very thick, this concern and ship it to them and we just worried about consuming interest using it. So this is this is early voting for this attitude, that's how I can let her know so

hot. But if it so happens and yes, we do have to take your time in mind. Sandestin. That you are a useless with a disease that you want to capture the updates that I ought to be able to write. Write add logs, basically, logical, logical, checkpointing mechanism. Basically, instead of admitting she was so that later at any point of time, you can just No reason. I should look at the the whole check my name person because like so want to make his reason is like and it comes out of a

[unclear] If you are that hell-bent on the IOPS, then you would have to look at another solution. [unclear]

So I will just go through a couple of results. What we did was [unclear] progressively load the database, and we tried to measure the time it takes to [unclear] just the entries that we have in the DB. If you see, around 1 million records are done in about 42 seconds. That is leaps and bounds ahead of what we were facing initially.

As I mentioned before, we also did the same comparison between different logical decoders. If you see, as the load progressively increases, at 3.5 million there is a good 20% improvement with pgoutput compared to wal2json. And pgoutput comes out of the box; it just runs.

We also wanted conclusive proof of whether CDC affects application load. We ran a progressive load: we wrote a simple [unclear] script to do random 100K, 1 million and 2 million [unclear] on the DB, and we simulated it with [unclear] connections. [unclear] That means that the CDC event streaming is sort of independent of the DB connections; the application does not take any hit.

To summarize what we were able to achieve: [unclear] milliseconds, and customers are happy. [unclear] The consumers do not actually see any of the issues in search. To run this setup we are using [unclear], and we are running production-grade loads on it. So if you are thinking about setting up a CDC pipeline, it is not as costly as one may think. Thank you, guys.

Thank you so much.
