Duration 38:28

Building data lakes on Google Cloud

Nitin Motgi
Product Leader at Google
Google Cloud Next 2020
July 14, 2020, Online, San Francisco, CA, USA

About speaker

Nitin Motgi
Product Leader at Google

Nitin led product and engineering initiatives at Cask Data, Inc. Cask Data, Inc. (acquired by Google) provides the industry's first Unified Integration Platform for building data analytics applications and frameworks, called CDAP (Cask Data Application Platform). At Google, he leads CDAP/Cloud Data Fusion (CDF) and oversees the Data Ingestion and Integration areas of GCP. Prior to Cask, Nitin spent 5 years at Yahoo!, where he worked on a large-scale content optimization and personalization engine known externally as C.O.R.E. He was one of the founding members of C.O.R.E at Yahoo! He pioneered the use of HBase in production at Yahoo!, running what was at the time one of the biggest HBase clusters in the world. Under his engineering leadership, Yahoo! Frontpage reached a whopping 300% increase in engagement due to optimization and deep personalization, and a substantial increase in incremental revenue. He was instrumental in getting C.O.R.E deployed into different parts of Yahoo! C.O.R.E was featured in the 500th Information Week publication, anniversary 2011 edition, as one of the "20-IT-ideas-worth-stealing".


About the talk

Google Cloud provides all the capabilities enterprises need to create and manage data lakes. Customers can use Google Cloud to aggregate their data and efficiently analyze it using cloud-native or open source tools irrespective of where the data is managed. Hear first-hand from an organization that stood up a Google Cloud data lake, to learn how Google is making investments to simplify data lake deployment and management.

Speaker: Nitin Motgi

Watch more:

Google Cloud Next ’20: OnAir → https://goo.gle/next2020

Subscribe to the GCP Channel → https://goo.gle/GCP

#GoogleCloudNext

DA100

product: BigQuery, Cloud Dataproc, Cloud Data Fusion; fullname: Nitin Motgi;

event: Google Cloud Next 2020; re_ty: Publish;


Hello everyone, and welcome to Building Data Lakes on GCP. Today we're going to be talking about, you build didn't expect any property and analyzing personal approach to put them into one single central store. Has not work and have not provided good reason. And they have also told us they would like to have more use-case-driven to your next, right? So, they basically ended up to you and how much of the opposite of operation cost was incurred during the genie

Mini. One of them to begin. Where is coffee? You spend a lot of operational and capital costs and efficiently during all of this information outside of your traditional, also increases the quality of the data in the lake, the quality of the garage. If you been a good deal, make it has to have a fastest, what does data lake is going to provide to you is going to reduce the price of analytics. It's going to basically make it easier for me. Have to do it before.

Annabelle explanations. You're able to get to market faster. Are there in building the ideal solution is because you are doing a lot of various different operations within the lake by 25%. Some form of an ingestion, too. Now, once you landed in the lake and it just turned up into a small so you have to organizing the data in them. The next thing that is pretty common in the lake is data transformation. For why is the standard way of transforming the data from

a Now. Combine all of these things together, the most important part that you would bring all of the abilities of a data lake is the discovery. Is that exist in the data itself is locked and it goes into dark data. And dark data actually leads to creating a swamp rather than a lake that you would like to have. And the quality is important for one load of wood that was happening between the Delaware so that they can use your house. So when you bring all of that data and all of the structured and unstructured data

put together a strategy for ensuring the quality of the data as it progresses to the data lake. So there have to be tools that have to be integrated in order to have guaranteed quality for the data. Moving to do you want to live but you don't want a lake that is not accessible for babies from various different tools. Play Systems in the ecosystem role in ensuring that your relation or isolator. List three primary colors on which of the Lakes. Ability to put a

pencil in Walling, populating data from various different sources into the lake. Once you bring the need to be able to store, you need to make it easy to find all of the data that exists in the lake. So you won't have to have some kind of an ability to do all of that and so that into a simple tent Libre cost What's the date of the Sadler then has to be in there? So that it's easily discoverable through a search mechanism through the lot of different types of metadata

metadata associated with the dealer heart can be allocated, but it has to be, it has to be turned into things like phone numbers, things like zip codes. These have to be processed in a standard fashion, easy to do. You want to have some kind of a code free to set alarm YouTube? The first thing you want to do, you want to be able to access control and also wanted to know how to just do it on the data. But also the application in terms of who basically has permission to raise a child now, there's so many different jobs that different

jobs that are ingesting, the jobs that are standardized, Enterprise based on scheduled to be executed if you would have to have ability to understand all of the different metrics that are there as well as some kind of a basket, like a date of the operation statute is required to enable t. Because the lake is a central place where you know and as you all know in the lake 40% of the data is actually ready and it's being used by a lot of different parts of the

organization. In order to ensure Security administrator can be used to see an improved access control as well as the security for the data, we have secured the necessary, which is critically important for the lake at Christmas. Today. Is it be? If it is not accessible to the world or the tools outside, people will not find it usable. So a lake should be able to expose all of the data that it has the date of expecting. The lake has to be immediately accessible to proprietary, as well as the

market partition tickle. What ends up happening is Venus different tools, and he needs from an optimized, it in perspective, and in such a way that it is simply that seamlessly with all of these different tools. And there is like a man that needs to be handling of partitioning of the day. Been at the end of it, you would have to have standard ways of moving data from various different Define, the needs to be a standardized to be, removing that they are. Because

in some cases, you might want to move the data from the final of the Divine Zone directly into the bank. So you having that ability outside you can fit in an end-to-end. Now, there are many use cases that can use this lake. Let's look at the kinds of use cases that we have seen with our customers and how all of the Jeep. Best time for the simplest of all of the traditional ELT: the transformation that happens in the warehouse, they wanted to offload it. So that should provide a lot of pastas and then they have more

horsepower out of the warehouse to run out. What's happening is the lake has to Define is doing or staging area, right? For the difference between the areas that have to be defiant and rebellious different staging areas where the Beatles songs from external systems Transformer. Send it and move it into the database from business. They have from their organization and then they move it into then ending up in GCP right now. Let's look under the next one which is the latest

news from a perspective of data science experimentation Adidas at Shoe. Carnival have to be accessible to explore. Do you want to have access? You want to provide access to know if you want to provide access to any of the other machine learning that exists, you want to have access from the spot has to remain accessible to all of different tools that exist within open source. I'm getting added to this is the second time using Tina makes on GCP. The last one is not interesting usage and this is real this is becoming more

popular their customer service using the data warehouse in the house and put it back in. And did you delete numbers that are looking to primarily what we do? What we see with our customers on GCP. What goes into building these three use cases? Party city that makes them happy right. Now we have various different services and the services have to be brought together. And this is something that we will be looking in the future. But these are some of the things that you would have to bring together in order to bring this is going to be

easier for you. If you need the central storage, you need to have the ability to store. I'll be at the store that are two services that are available. One is Google Cloud Storage and the other is BigQuery storage. Then you should be using one of them would like to give you more information on that. The 43-acre, do you need to be able to have the best security possible for securing all of the data that actually exists? And it really has now added I think we have added like as well as grown up as part of a bit early.

So we can choose how they want everyone to be like to be and what the data lake storage is. You look at different services: if your customers tend to move data from here to GCP direct, they end up using Transfer Appliance. You can order them. Then you have the efficient as another tool that allows you to take me to transfer service that allows you to stop by Depot right now. All of the data that exists. GCS, can you come right now by which the GCF Metro PCS dealer, gets

some kind of looking at all of the different by yourself right now. So you would have to use your favorite to our programming. Language of an ad time like a crawler kit. It's a big pretty makeup store based on a metastore. Search up data so that you can create using Spark, Presto, or Hive that allows you to organize it in a way that is. Do you have a question and then you have BigQuery, which has the query engine, machine learning techniques that are also approaches fart?

And a little bit more time connecting them together but you would have to have loved, only 45 in cataloging. All you think about which is a new addition to the cloud provides the much-awaited, take me to my registration for the car. What's another word for the end or all of these Bridges different Technologies or services that we have on GPS? What do have advantages and disadvantages for the pork can be used as the bass player for using. Very suggestion to Santa Fe limited to

what are? You could have vendors providing capabilities. No, you said you bring that. You should be the one of the primary destination for this and that's where most of the energy end up happening. Is that the other house offered from Now when you look at life from a BigQuery as a gingerbread house from the start as soon as you pulled the did, I need to be pretty. It's optimized and Spa doing Alaska, Land, allocation. One of the differences between BigQuery and GCS is bigger a badina, has to respect your door, so you can,

you can have pretty much everything you want us to be stored on GCS. Let's go a little bit and look at what are the differences between, I think the cops, cops operate. All depends on what you believe. It's been crazy but you know, compared to GCS now has optimized with flowers and all the things he gives you a lot more optimized for this wedding has to be defiant and it has to be taken to find the schematic. Common uses for all of the existing GCS is like any file format, any data. It doesn't matter

is used, Lisa and transaction as you can imagine. But also by myself today. And how many users? Let me look at it from my perspective on these two different services, anyone who can read and write a file can basically. Access panel is business analyst and then sends the data is a little bit structured. And they would have to understand. Wikipedia is pretty much is not at all what I mean. These are these GCS and BigQuery storage becomes the foundation for any data lake. Text messaging,

you want to stop GCS or do you want to do. In GCS and Progressive to in GCS and then load only the date of it? Ballard design. Look at storage options. Let's look at the catalog and the metadata options. As you guys already provides a central catalog across GCP indexing. That's what I'm supposed to ask, search interface, all of this is indicated as part of a government stamp place, within the protective. But you know, the future that might be the nation automatically be provided from a GCS perspective. But in the meantime you would have to have

a system that analyzes all of them. So I'm packing data services, basically using DLP to identify any of the sensitive data that exists. If there is any validity where you have the ability to control who has access to Avenida is Born, the metadata associated, with the horses, with him for having ability to provide access to spell boogies for all of the metadata. Start reading. Some of the this is not from the viewpoint of processing. This is just for making the data accessible to you and making it

easier. Now, the next one is more focusing on. How do you take a nap and get the metastore that then that allows you to make it accessible to open source tools that exist Within These highly available open source meter. All of the technical metadata that exists and that gets registered with the metastore is now accessible to Spark, Hive, to all of these tools that are popular in open source as a Dr. Have ready access to the data that exists? Seems like you should please

go take it for a spin and give it a try and let us know if the things that he would like to see in the net and telling the truth. Now, what is the recommended scope for like 25 management allows you to make it even comes up with a cobra's roster. So you can come pick it up and stop mayonnaise experience for managing one or more metastores in your lake. Now that we have talked about the storage and the metadata, we can look at the other components that exist within the lake for a Cricket phone at Alexis with an

important aspect of it, you should be able to as soon as you bring the data, anyone to do some kind of an exploratory analysis, you should be able to use, you know, tools like Hive. Now, you'll be able to do that using Hive, Spark, right from the concert, Because of the launch update a problem notebook. The Notebook that also integrated with all of his making it easier and fun. Do you have to have a support system? You won't have to manage it all by yourself. It's all matters within

a metastore. There's a little bit of a manual process where you would have to do that, man. That's something that in the future. We might be sleep, naked, automatic. But right now you would have to write something. And put it in. What is that? I talked about, right? So you want to have it to spend the day. But seem like enough for the BigQuery has recently launched their ability to get ready. You can join. You can do all that you can do,

all of the data that sits on GCS and any other system automatically gets registered and gets represented as an external user of the asylum. Stories. But more practical, to these tools that are basically providing additional service to promote registration of magic. Information. We have talked about when you bring the lake. Yes, you want to have the ability to transform your body. Do you want the ability to do complex transformations on I have never seen. I think I have never seen a simple job

that's running in a data lake. It's a complex network of hundreds of jobs that are actually running in the rain and in many other prizes and all of them manage all of that. You need to OnePlus call uneeda educator on the job. So you can use some of this offering is basically a supporting both be able to take these two things together and orchestrate the music I want to be in the back and that can be done with you. Look at Data Fusion as a way of doing transformations. Burned, underneath uses Dataproc and provide

A video of Henry have looked at all of these teams integration aspects and nobility to look at 10:20. Play. So why you didn't have your proper way to the advanced transformation capabilities data integration? Like Data Fusion provide. From what's more, a man, ETL solutions with no or low code. So you need to also have an ingestion and ingestion templates for bringing the baby are, you should be able to, as an example of a service that allows

you to look at. Get on a flight to sort selected destination, be able to just copy the data over not only for the usage of, you should be able to do any Tuesday. Basically we can use as an option you can use Data Fusion to do that and get back into the back of the system. Do you need to have a good understanding of how much data is flowing in your lake? So you need to have good lineage. Right now it's only in Data Fusion; in the future we will be looking into an

effective. Lineage is available after work. And what and what are the other thing they do is be able to look at the field. Will give you much more information of how many speed of those operations. The best possible deal. What are the different systems we have different in your transformation systems? We have in? How do you do? What are you doing from a dear and efficient business perspective? You would have to do yourself in the future. We are providing all these deeper deeper integration with Vera's Thomas's to make the life of a fart.
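As a rough illustration of the lineage idea raised in this part of the talk (knowing which upstream sources feed a given dataset), the sketch below tracks dataset-level lineage in plain Python. It is not Cloud Data Fusion's API and makes no claim about how Data Fusion records lineage; the `Lineage` class, its methods, and every dataset name are hypothetical.

```python
from collections import defaultdict


class Lineage:
    """Hypothetical in-memory lineage graph: target dataset -> direct sources."""

    def __init__(self):
        self._parents = defaultdict(set)

    def record(self, source, target):
        # Note that a job read `source` and wrote `target`.
        self._parents[target].add(source)

    def upstream(self, dataset):
        # Walk the graph backwards to collect every ancestor of `dataset`.
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


lineage = Lineage()
lineage.record("gs://lake/raw/orders", "gs://lake/refined/orders")
lineage.record("gs://lake/refined/orders", "warehouse.orders_daily")
lineage.record("gs://lake/raw/customers", "warehouse.orders_daily")
print(sorted(lineage.upstream("warehouse.orders_daily")))
```

Recording an edge per job run keeps the sketch simple; field-level lineage would need finer-grained keys than dataset names.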

Just take all of these integrated set of services and in our goal is to make it as fast as you can, right? Like right now. So, we should be able to reduce that all these two indications, that end up having a trip, come down to a few days during middle age is extremely hard. Now just to be provided to all of the disability services, stupid the best place you can possibly have. As you can also see all of the innovation that is happening within GCP, the best indicator of parasitism example, BigQuery now integrates with Dr.

Jody closing a lot of different. Dataproc Metastore is a service that just got launched that provides the ability to integrate metadata and make it accessible more with open source tools and capabilities that exist in GCP that you can leverage to put in the best way possible. Give it a try. Thank you so much.
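The talk touches on querying data that sits in Cloud Storage from BigQuery. Assuming this refers to BigQuery external tables, here is a minimal sketch that only builds the `CREATE EXTERNAL TABLE` statement as a string and does not call any Google Cloud API. The helper name `external_table_ddl`, the table name, and the bucket path are hypothetical.

```python
def external_table_ddl(table: str, uris: list[str], fmt: str = "PARQUET") -> str:
    """Build a BigQuery CREATE EXTERNAL TABLE statement over Cloud Storage files."""
    if not uris:
        raise ValueError("at least one gs:// URI is required")
    for uri in uris:
        if not uri.startswith("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri}")
    # Render the URI list as a SQL array literal of quoted strings.
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (\n"
        f"  format = '{fmt}',\n"
        f"  uris = [{uri_list}]\n"
        f");"
    )


# Hypothetical dataset, table, and bucket names, for illustration only.
ddl = external_table_ddl(
    "my_dataset.events_raw",
    ["gs://my-lake-raw/events/*.parquet"],
)
print(ddl)
```

The resulting statement could be pasted into the BigQuery console or submitted through a client library; the files stay in Cloud Storage and only the table definition lives in BigQuery.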
