Specialities: Product Management, Artificial Intelligence, Natural Language, AI solutions, Cloud, Building v1.0 products, business lines & teams
Product manager for Google AutoML Natural Language, Natural Language API, and Document AI. Experience in leading development teams to build distributed large-scale cloud infrastructure. Lead inventor of cloud-related U.S. patents. Architecture design on high availability, zero down…
About the talk
Most business transactions begin, involve, or end with a document. All industries face similar challenges as they seek to extract information from documents: manual data entry is costly, time-consuming, and error-prone. Learn how to use machine learning to organize, process, and extract data within documents, and see examples of how various customers have found success using Document AI.
Sudheera Vanguri, a product manager in Google Cloud AI, highlights new Document AI capabilities. Key product themes of this session include:
1. Walkthrough of Document AI building blocks in general availability
2. Showcase new UI
3. Highlight specialized Document AI models pre-trained for invoice and healthcare doc processing
4. Showcase customer examples and live demos
Speakers: Sudheera Vanguri, Lewis Liu
Google Cloud Next ’20: OnAir → https://goo.gle/next2020
Products: Cloud Document AI API, Cloud AutoML
In this Document AI session, let's begin with the challenges in the market. Nearly all business processes today begin, include, or end with a document. Companies are sitting on a goldmine of documents: PDFs, emails, customer feedback, patents, contracts, technical documents, and other files. These documents are only going to grow with time, and we're only scratching the surface. Many of these documents are unstructured, which makes them very time-consuming to process.
What do we mean by structured and unstructured? About 20% of enterprise data is structured, and by structured data we mean machine-generated data. A lot of tools out there can be used to process structured data. The other 80% of enterprise data is unstructured, in the form of emails, documents, and so on, and the tools for it are still very nascent. Document processing has become increasingly complex with time. Processes that involve large volumes of unstructured content are increasing day by day, and so is the diversity of incoming documents. There is an incredible variety of documents and formats: the average mortgage application has over 16 major documents.
Why is this diversity increasing? It can increase because of government regulations, and as business relationships grow, the number of supporting documents increases over time. External knowledge can also be associated with a document in its validation process: for example, is this the official Nebraska driver's license format or not? Entities may also need to be cross-linked within the broader incoming data stream. All of this generates a major new volume of dark data within businesses.
Processing these documents is quite challenging. The sheer variety of documents results in high costs, dark data, and lost revenue. That's where Document AI comes in. Document AI enables businesses to unlock insights from their documents with machine learning, processing invoices, mortgage documents, and more, as well as classifying information from unstructured documents. It meets Google Cloud customers where they are.
Within Document AI, the building blocks are a suite of products. General Document AI provides models that apply generally to your content. Custom Document AI lets you identify domain-specific content, tuned with your own specific training data. With Specialized Document AI, you can use Google's pre-trained models to get out-of-the-box extractions for specific applications.
With General Document AI, we have OCR and form parsing, which are already in GA and beta. Both work in two hundred languages without training, and they enable you to identify specific content such as forms. With Custom Document AI, last year we went GA with AutoML entity extraction and classification, which enable you to provide your own training data and tune domain-specific content extraction and classification to your own business. We're also introducing a custom Document AI capability in alpha that enables you to extract content given a single blank form template. With Specialized Document AI, our goal is to catalog and build specialized document models. These models are highly accurate and enable you to classify documents and identify specific entities. They can also provide data validation and Knowledge Graph validation.
Specialized models are available for industries such as retail, services, healthcare, media and entertainment, and industrial. For instance, retail customers are using Document AI with product documents, and healthcare clinics are using it as well. Media and entertainment companies are using Document AI to process comments and feedback. In industrial use cases, Document AI is used to analyze highly technical manuals and site assessment documents.
One company that has reimagined its strategy by leveraging Document AI is an industry-leading provider whose challenge was managing many domain-specific market segments. Using Document AI, it classified over a hundred million pages into over 131 labels with over 92% classification accuracy, so it can trust that its content is accurately classified. Another company that has leveraged Document AI is DocuSign. Using our models, DocuSign has been able to build an accurate system that uses AI to autofill fields, such as those highlighted in yellow on the screen.
Document AI can also be applied to insurance claims use cases. A large US insurance provider has been able to leverage Document AI to achieve an over 74% reduction in claims processing and an over 81% reduction in insurance settlement time. Another US financial services company has been able to achieve 80% accuracy using Document AI to analyze over 350 documents per minute.
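To make the data-validation idea above concrete, here is a minimal Python sketch of format-level checks on extracted values. It is not how Document AI validates data: the field names (`ssn`, `ein`, and the `*_date` suffix), the regular expressions, and the US-style date format are all hypothetical assumptions, and the sketch does not attempt Knowledge Graph validation against external sources.

```python
import re
from datetime import datetime

# Hypothetical per-field format rules; real rules depend on the document type.
FORMAT_RULES = {
    "ssn": re.compile(r"\d{3}-?\d{2}-?\d{4}"),  # e.g. 123-45-6789 or 123456789
    "ein": re.compile(r"\d{2}-?\d{7}"),         # e.g. 12-3456789
}


def is_valid_date(text: str, fmt: str = "%m/%d/%Y") -> bool:
    """Return True if text is a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


def validate_fields(fields: dict[str, str]) -> dict[str, bool]:
    """Check each extracted value against a known format.

    Fields with no rule are reported as valid: this sketch only flags
    values that demonstrably break a format it knows about.
    """
    results = {}
    for name, value in fields.items():
        value = value.strip()
        if name in FORMAT_RULES:
            results[name] = FORMAT_RULES[name].fullmatch(value) is not None
        elif name.endswith("_date"):  # hypothetical naming convention
            results[name] = is_valid_date(value)
        else:
            results[name] = True
    return results


if __name__ == "__main__":
    extracted = {"ssn": "123-45-6789", "ein": "12-345678", "issue_date": "02/30/2020"}
    print(validate_fields(extracted))
    # {'ssn': True, 'ein': False, 'issue_date': False}
```

Flagging rather than correcting keeps the check conservative: a failed rule marks a value as suspect, and deciding what to do with it is left to a later step.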
With that, we are very excited to announce some new updates to Document AI today. Mortgage packages have historically been a very complex and document-intensive part of our economy, and they are very high volume. There are a lot of content validation requirements on these mortgage packages, the same documents are processed over and over again, and sometimes multiple business units within the same financial institution process the same documents multiple times. Once the data is converted into structured data, it can be fed into multiple downstream analytical and operational systems.
We at Google have worked with partners and customers to identify document types and group them into buckets for various high-value use cases. One of our first focus areas is the income and asset document bucket. I'm excited to introduce Lending DocAI. This is a bundle of specialized models focused on document types used in mortgage lending. Using Lending DocAI, a loan package can be converted into structured data.
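The talk notes that the same mortgage documents are often processed repeatedly by different business units, and that once converted to structured data they can feed many downstream systems. One way to act on that, sketched here under our own assumptions, is to key extraction results by a hash of the document bytes so each distinct file is processed only once. `fake_extractor` is a hypothetical stand-in for a real document-processing call, and `ExtractionCache` is an illustrative name, not part of any Google Cloud API.

```python
import hashlib
from typing import Callable

# Hypothetical extractor signature: raw document bytes -> structured fields.
Extractor = Callable[[bytes], dict]


class ExtractionCache:
    """Run extraction once per distinct document and share the result.

    Documents are keyed by a SHA-256 digest of their bytes, so the same
    file submitted by several business units is only processed once.
    """

    def __init__(self, extractor: Extractor) -> None:
        self._extractor = extractor
        self._results: dict[str, dict] = {}
        self.calls = 0  # how many times the (expensive) extractor actually ran

    def get(self, content: bytes) -> dict:
        key = hashlib.sha256(content).hexdigest()
        if key not in self._results:
            self.calls += 1
            self._results[key] = self._extractor(content)
        # Return a shallow copy so one consumer cannot mutate another's view.
        return dict(self._results[key])


def fake_extractor(content: bytes) -> dict:
    # Hypothetical stand-in for a real document-processing call.
    return {"num_bytes": len(content)}


if __name__ == "__main__":
    cache = ExtractionCache(fake_extractor)
    paystub = b"%PDF-1.4 paystub ..."
    underwriting_view = cache.get(paystub)  # first business unit: runs extraction
    servicing_view = cache.get(paystub)     # second business unit: reuses result
    print(cache.calls, underwriting_view == servicing_view)  # 1 True
```

Hashing the raw bytes only deduplicates identical files; a rescanned copy of the same paper document would still be processed again, which is a deliberate simplification in this sketch.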
I'm also excited to introduce a second Document AI edition, for one of the highest-volume groups of processes: processing loan applications for the Paycheck Protection Program. This edition handles the documents submitted by applicants. Here are some of the key documents it supports, including commercial lending documents. Document AI is also used to parse passport documents. With that, I'm going to hand over to my colleague, Lewis, to talk about some of what we use under the hood.
Thanks, Sudheera. Now let's see. We will try to parse the invoice image you saw on the slide. You could see multiple addresses, different dates, and many other details in tabular format. We will then make this even harder with a camera-captured image of a W-2 form that is wine-stained and heavily skewed. Let's upload the invoice to the invoice parser. Within seconds, you can see the extracted entities nicely mapped to a well-defined schema. This is very powerful because you don't have to maintain any rule-based mapping layer or any post-processing scripts, and it also makes value normalization possible. In the same UI, you can also inspect and review the key-value pair and table extraction results. These outputs are handy if you need to apply additional processing or data mining.
That's how things look in the UI. Now let's run our W-2 form through the command line; this is how a production system would integrate with Document AI. This is the highly distorted W-2 form we saw earlier. What we're going to do is send this image through our Lending DocAI endpoint and load the output into a BigQuery table as a data sink. With BigQuery, you could perform more advanced data analysis, or, like us, simply look at the data in tabular format. If we compare the results side by side, we extracted the values from the form with high accuracy and confidence. Let's take a look at a few of them. First, the social security number: all nine digits match the document, even though the original field contains hyphens.
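The demo's point about value normalization (the SSN matching as nine digits even though the form shows hyphens) and about loading results into a table can be illustrated by flattening extracted entities into one row per document. This is a hypothetical post-processing step showing what such normalization involves, not Lending DocAI's own output format: entities are plain dicts loosely modeled on the type, mention text, and confidence of a Document AI entity, and the entity type names in `SCHEMA` are invented for illustration.

```python
import re

# Hypothetical entity type names; a real processor defines its own schema.
SCHEMA = ("employee_ssn", "employer_ein", "employer_name_address", "wages")


def normalize(entity_type: str, text: str) -> str:
    """Normalize a raw mention so downstream systems get consistent values."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    if entity_type in ("employee_ssn", "employer_ein"):
        return re.sub(r"\D", "", text)  # keep digits only, e.g. drop hyphens
    if entity_type == "wages":
        return text.replace("$", "").replace(",", "")
    return text


def to_row(entities: list[dict]) -> dict:
    """Flatten extracted entities into one row keyed by the schema.

    Keeps the highest-confidence mention per type; missing types become None.
    """
    best: dict[str, dict] = {}
    for ent in entities:
        t = ent["type"]
        if t in SCHEMA and (t not in best or ent["confidence"] > best[t]["confidence"]):
            best[t] = ent
    return {t: normalize(t, best[t]["mention_text"]) if t in best else None for t in SCHEMA}


if __name__ == "__main__":
    sample = [
        {"type": "employee_ssn", "mention_text": "123-45-6789", "confidence": 0.98},
        {"type": "employer_ein", "mention_text": "98-7654321", "confidence": 0.97},
        {"type": "employer_name_address",
         "mention_text": "Acme Corp\n1 Main St\nSpringfield", "confidence": 0.91},
        {"type": "wages", "mention_text": "$52,000.00", "confidence": 0.95},
    ]
    print(to_row(sample))
    # {'employee_ssn': '123456789', 'employer_ein': '987654321',
    #  'employer_name_address': 'Acme Corp 1 Main St Springfield', 'wages': '52000.00'}
```

A row in this shape could then be appended to a table such as the BigQuery sink used in the demo; the loading step is omitted here.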
The employer identification number is also correct. And in a more challenging case, the employer name and address on row number 6: the entire cell was also extracted 100% correctly. Getting data extracted from these two documents is no easy task.
Let me show you one of the state-of-the-art models we have built at Google AI. Information is often presented in documents that have multiple pages, and interpreting the two-dimensional layout of the text on the page is the key to understanding such documents. Earlier this year, we published a new state-of-the-art research paper on extracting data from templatic documents. It combines many of the things we have learned. If you're interested in learning more on this topic, I encourage you to take a look at this paper.
Many of our customers ask about what's coming next. Let me give you a sneak peek of the new Document AI portal. It's a one-stop shop where you can create, view, and manage everything needed to solve your own business document challenges. It also includes a new processor library, a gallery of specialized parsers such as a 941 form parser or a W-9 parser. Apart from the specialized ones, the library will also host general processors like OCR and the form parser to help you better understand your documents.
Introducing another new feature in Document AI: human review, an integrated experience that gives you full confidence in the accuracy and accountability of your results and helps you meet regulatory and compliance requirements as well. Taking a deeper look at the human review configuration panel, you will be able to choose different validation methods and labeler options. Whether you already have an in-house workforce, are using an existing BPO, or are completely new to human review in your AI/ML workflows, we have an option for you. We are very excited to see what you'll build with our new Document AI interface. With that, I'm handing back to Sudheera.
Thank you, Lewis. If you think you have a great new use case for…
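A common way to decide which extractions go to human review is per-field confidence. The following is a minimal sketch of that routing logic under assumed inputs: each field maps to a `(value, confidence)` pair, the 0.9 threshold is an arbitrary illustrative default, and `always_review` is a hypothetical hook for fields that compliance rules require a person to check. It is not the configuration model of Document AI's human review feature.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewDecision:
    auto_accepted: dict = field(default_factory=dict)  # name -> value
    needs_review: dict = field(default_factory=dict)   # name -> (value, confidence)

    @property
    def route_to_human(self) -> bool:
        # The document goes to a reviewer if any field needs a second look.
        return bool(self.needs_review)


def route(fields: dict[str, tuple[str, float]], threshold: float = 0.9,
          always_review: frozenset = frozenset()) -> ReviewDecision:
    """Split extracted fields into auto-accepted and human-review buckets.

    Fields below the (hypothetical) threshold, or listed in always_review,
    are sent to review; everything else is accepted automatically.
    """
    decision = ReviewDecision()
    for name, (value, confidence) in fields.items():
        if confidence < threshold or name in always_review:
            decision.needs_review[name] = (value, confidence)
        else:
            decision.auto_accepted[name] = value
    return decision


if __name__ == "__main__":
    d = route({"employee_ssn": ("123456789", 0.98), "wages": ("52000.00", 0.72)})
    print(d.route_to_human, sorted(d.needs_review))  # True ['wages']
```

Keeping the confidence alongside each reviewed value lets a reviewer see why a field was flagged, and a lower threshold trades reviewer workload for a higher risk of accepting wrong values.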