About the talk
Watch an overview of Google Cloud’s approach to implementing AI Principles in well-known products. Hear about the approaches in place for applying AI Principles in the product development process, including user research, product design, product reviews, testing, documentation, and marketing.
Speaker: Tracy Frey
Google Cloud Next ’20: OnAir → https://goo.gle/next2020
Product: AI Platform, Cloud AI & Industry Solutions
Event: Google Cloud Next 2020
Everybody's responsibility, from theory to practice. My name is Tracy Frey, and I'm the director of product strategy and operations for Cloud AI and Industry Solutions. To start us off, I want to root us all in this statement: in less than 10 years, AI will be the number one driver of global GDP growth. This graph shows that growth through 2030: 13 trillion dollars. And not only that, organizations that adopt and absorb AI will be the leaders of the global economy in that time.
That growth has already started. People around the world have become everyday AI users of technology that they didn't have 10 years ago, like direct payment applications and virtual home assistants. Our reliance on AI has already begun, and it will only grow. Before 2012, AI compute closely tracked Moore's law, with compute doubling every two years. Since 2012, compute has been doubling every 3.4 months. For example, vision AI has gotten more accurate and more powerful.
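To put the two doubling times side by side, here is a back-of-the-envelope calculation. It uses only the figures quoted in the talk (a two-year doubling before 2012 and a 3.4-month doubling since). The function name is ours, for illustration only.

```python
def growth_factor(months: float, doubling_time_months: float) -> float:
    """How many times larger a quantity becomes after `months`
    if it doubles every `doubling_time_months`."""
    return 2 ** (months / doubling_time_months)


# Compare the two regimes over the same two-year window.
moore = growth_factor(24, 24)        # doubling every two years
post_2012 = growth_factor(24, 3.4)   # doubling every 3.4 months

print(f"Doubling every 24 months:  {moore:.0f}x in two years")
print(f"Doubling every 3.4 months: {post_2012:.0f}x in two years")
```

Over the same two-year window, a 3.4-month doubling time compounds to roughly 133x, compared with 2x at a Moore's-law pace.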
Take a look at the error rate progression here from 2011 to 2020. While this is undoubtedly good news for innovation, it also requires more care and attention from a responsibility standpoint. When a more accurate and powerful vision AI technology is used in a harmful way, it can lead to intentional misuse and loss of personal privacy, contributing to severe consequences for individuals at incredible speed and scale. Industry research reports that trust in AI systems is the biggest barrier to adoption when moving from the lab to production.
For enterprises, there's a perception that there is a deficit in ethical diligence over these powerful technologies. We hear a lot of questions about responsible AI as customers start to think about moving to production. Responsible AI is affecting AI adoption. As a team, we are seeing more and more companies come to us asking for help because they are facing challenges: there's a lack of trust in AI systems themselves. And when ethical issues arise, which 90% of organizations have encountered, 40% of companies abandon the project instead of solving those issues. These numbers show a huge risk in not taking responsibility into consideration. It's not possible for any technology to be morally or values-neutral. Values are interpolated at every stage of AI development, and failing to address that is a huge barrier to AI adoption. One study found these reasons for ethical issues within organizations. Governance is reported to be the biggest enterprise requirement in AI in 2020.
Not only will enterprises embrace governance, but we can also expect increasing legislation and potential regulatory issues, and consumers will begin to judge and select organizations based on these policies. As consumers become more aware of how organizations are using increasing amounts of AI and ML to drive business decisions, enterprises must embrace ideas like explainable, transparent data policies for both technical and business reasons. By 2020, Gartner expects that companies that are digitally trustworthy will generate 20% more online profit than those that aren't. This all fits into our belief that responsible AI equals successful AI: AI that has a lasting impact on efficiency gains and competitiveness in a world where it is responsibly developed and trusted by users first. If trust gets broken anywhere, we will see AI getting pulled back. So responsible AI really leads to successful AI.
To build responsible AI into our DNA, we began asking ourselves a few questions. This is rooted in the idea that people and businesses with the best of intentions can potentially cause unintended harm, and we need to understand the risks and build out processes to avoid those outcomes. For us at Google, as you might relate, we feel an immense responsibility to develop the systems to answer these critical questions and to build in appropriate principles, practices, and programs in support of responsible AI development, to ensure that the AI we build and enable for the world is successful. This work is never finished. It is far from perfect, and we approach it from a humble place. We are constantly a work in progress and don't pretend to have all of the answers. My goal today is to share with you how we've approached building responsibility in by design, into our products but, more importantly, into our organization, to support the necessary inquiry, deliberation, and culture to build products aligned with our values.
We'll focus mainly on Google Cloud, because that's where we fit in, and because it's arguably one of the more complex areas to align. We have a range of offerings, from general-purpose tools and APIs all the way to solutions that go much farther up the stack, like Contact Center AI. The ways these technologies can be used vary and are hard to predict, which makes Cloud a good place to discuss putting principles into practice. What grounds our approach to responsible AI at Google are our AI Principles.
Coming to terms with AI responsibility can feel overwhelming: how do you begin to wrangle all of the contradictory, critical components of AI responsibility into your development? For us, as for many organizations, we found that the most important first step was developing our company's AI principles, in the summer of 2017. In 2018, we published our AI Principles. They are not just a marketing campaign for us. They are the constitution we use to build our AI products and write our policies. Our set of AI Principles has been one of the most important things we have implemented.
They keep us motivated by a common purpose, succinctly communicate our values in developing advanced technology for the world, and ensure that the way we use artificial intelligence is in the best interest of humanity. We believe that organizations and communities flourish when, in addition to our own personal ethical values, there are also shared ethical commitments within the community that every person can play a part in fulfilling. Our work is more about doing something good together, not about each of us avoiding whatever we privately think is bad and then assuming or hoping that others' values and choices will magically mirror our own.
But as we know, a plan on paper like these principles and practices is only effective if it's operationalized. A list of principles like ours can be hard to incorporate into day-to-day work as we deal with the detailed engagements of our products. What remains true is that our AI Principles aren't magic. They don't immediately answer all of our questions about how to move forward, and they don't relieve us from having hard conversations. They are a starting point in establishing what we stand for and what we can agree to build toward. The work that has followed the publication of the principles has been to interpret and apply them in practice. At the end of the day, if something goes wrong with the technology, and you have principles like ours, and you can't prove what you did to prevent it, then that is a terrible outcome. We realized very quickly that in order to be true to the principles, we needed to develop internal governance processes that put our principles into
practice in a systematic and repeatable way. This includes building responsible governance capacity in the form of product and customer deal review committees, educational programs, a partner program, and how we work with our customers. We'll touch on each of these, beginning with Google's AI review processes. The original hope was that the principles would be a decision guide, but in reality they can't be, because the company has such a wide variety of AI. The reality is that you can take any baseline technology, and it can be beneficial when used in one way and harmful when used in another. So governing AI is not only about the technology; it's about the combination of the technology, the use case, the training data, the societal context in which it operates, and how it's deployed in production. There's no world where blanket statements can hold water if you're going to align with the principles, and that requires people in the organization to be committed to the complexity. So at Google, we thought about the people and the systems around how we operationalize the principles.
We have a central review process, which is supported by a Responsible Innovation team. That team is chartered with helping teams around Google, in every product area, learn to operationalize the principles for themselves. When we in Cloud initially thought about how to begin putting the principles into practice, which happened before the principles were published, our first instinct, like many people's, was that all we had to do was create a big decision tree. But Google Cloud is a thousand-flowers-bloom kind of place, and we have so much breadth in our various product offerings that it would be impossible to have a checklist or decision tree at that level of detail. And I guarantee you that's the case for you as well. Checklists may look appealing, but in practice they have been ineffective at addressing governance for such nuanced technologies. We needed evaluations that flexibly allow new technologies to take shape. In Google Cloud, we have two review bodies that are connected but purposely distinct.
The first review body covers early-stage customer engagements, where we are engaging in custom AI and advanced technology work with a customer or partner. This group is a decision-making group; it is very high throughput, and its intention is to very quickly give the field team an answer: either the engagement is not aligned with our principles and won't move forward, or it will move forward, possibly with some conditions. We've seen a range of those decisions over the two-plus years that we've been doing this.
The other half of the process is how we build and develop our products. For the products we develop at Google, we undertake a rigorous and deep look at the risks and opportunities across each principle, which we bring to those reviews. There we often have very uncomfortable but extraordinarily inspiring conversations, relatively early in the development life cycle, which is essential. In advance of the review meeting, a member of the team takes on the task of doing an initial evaluation of the product, working to assess the principles' implications and recommend mitigations. This evaluation of the plans for the product serves as the basis for the discussion at review time. This meeting of the minds is one of the most important steps in any principles evaluation, because it allows us to do two things: spot issues as a team, and reach outcomes and paths forward for products that are not just good in theory but able to be actualized in practice. Over time, these conversations have been effective at normalizing discussion of risky technology and of potentially adverse outcomes that we
can prevent. The next question, I'm sure you're wondering, is what mitigations look like. They can look like a lot of different things. One is more narrowly scoping the purpose of the technology. In some cases, we decide not to offer a product as general purpose, or to offer it only with an allow list. As you may have seen, we decided in 2018 that we wouldn't make facial recognition available as a general-purpose API. Rather than making the technology available for anyone to use, we offered it by scoping it to a narrow celebrity recognition offering, which I'll talk about in a bit. Other mitigations can look like identifying educational materials or best practices to package up with a product at launch. This has taken the shape of launching products with an associated model card, a project we took on to organize the essential facts of machine learning models in a structured way, or launching more specific implementation guides for customers interested in our products. From there, we build an alignment plan with engineering leadership around the mitigations discussed in the meeting. It's important to know that not all paths forward involve technical solutions or fixes, because ethics problems are not always the result of technology glitches; they are often more a product of the societal context in which the technology is being deployed and how it's used. Over time, through this review process, we identify patterns, similar to case law, and we are creating a growing set of precedents and best practices on sensitive topics to inform decisions and drive consistency on whether a new product can go ahead, and with what mitigations. There's no one-size-fits-all approach;
flexibility is important, and it's not just a point-in-time exercise: as the societal context changes, we have to respond differently as we make our decisions going forward. Here are some of the imperatives and challenges we've learned about over the past two years. One thing that I really want to stress is the importance of a diverse review committee. Members are intentionally multi-level: we have folks who are just starting their careers out of college all the way up through VPs. We're drawing on a range of experts from a diversity of backgrounds, both technical and non-technical. Members, including myself, have deep backgrounds in social science, philosophy, and ethics. We really intend to be cross-functional, so that we are not just drawing from people within one team but bringing expertise from across Google to the review. We also seek out additional expertise, whether on issues of ML fairness and robustness, ethical AI, or a particular use case, technology, domain, or industry. We have a lot of that expertise in-house, and when we don't have it within our walls, we seek it externally. One of the other most important things we did in the beginning was that we had the great opportunity to hire a visiting researcher, who worked with us to help build this process and engaged in our reviews in a way that was incredibly useful for us.
So, what have we learned? We've learned that constitutions like our AI Principles require interpretation. Predicting misuse helps develop guardrails, and this requires that we explore all the possible scenarios and areas of misuse in order to develop a comprehensive set of guardrails. Our principles aren't magic; they don't answer every question, but they are a framework for those conversations. Deliberation and healthy disagreement are key. We do not hold our team up as an ivory tower full of moral heroes; instead, we work collaboratively with product teams to address potential issues together. Responsible AI is not only about the extremes, and it's not always about AI for social good; every product should be put through principles evaluations.
Here are some of the trainings we offer Googlers to build responsible AI into our culture and equip teams with the tools and knowledge to integrate responsible AI into their day-to-day work. Some of these are available externally as well. Our aim is to connect and collaborate with partners across the industry to share our learnings, learn from fresh and varied perspectives, and advance our collective thinking when it comes to responsible AI. The final way in which we put our principles into practice is through a growing set of responsible AI tools. Tools and frameworks are becoming an increasingly important and effective way to explain and understand AI models and to answer questions like: How do I understand the predictions made by AI? How do I know if my data, model, or outcomes are biased? And how do I know what is influencing my model's predictions? We are building tools and infrastructure to evaluate and address these concerns at scale. Today, we'll touch on a few tools we offer that get at some of the main
concerns facing teams: the What-If Tool, model cards, and Explainable AI. We often get asked about debiasing models and how we at Google think about ML fairness. Generally speaking, there is no standard definition of fairness, whether decisions are being made by humans or machines. Identifying appropriate fairness criteria for a system requires accounting for user experience, for cultural, social, and historical considerations, and for trade-offs. I also want to stress that, in my opinion, there is no future in which we will be able to say that we have achieved fairness, that we're done with this work and can move on. This is something that we engage in for the long term; we are always learning more and evolving, and I'll show you some examples of that in a bit. So why does Google care about this problem? Well, our users are diverse, and it's important that we provide an experience that works equally well for all of them. We have the power to approach these problems differently and to create technology that is fairer and more inclusive for more people. We'll start with an overview of what that means.
And I should also say that this is an area we are still learning about, and an active area of research at Google and beyond, ranging from fostering a diverse workforce that embodies critical knowledge to training models to remove incorrect or problematic biases. Being open to challenge is important. We may not, in our group, have the experience that can help inform how we build a product, so it's important that we recognize, and that I recognize and stress for our team, that we are coming from a place of great privilege, and that we honor that as much as we can. This requires creating a space that is
rich with psychological safety, and I cannot stress enough the importance of doing that work. An important step on this path is acknowledging that humans are at the center of technology design, in addition to being impacted by it. And humans have not always made products that meet the needs of everyone. Until 2011, automobile manufacturers were not using crash-test dummies that represented female bodies; as a result, women were 47% more likely to be seriously injured in a car accident. Band-Aids have long been manufactured in a single color: a soft pink, my skin tone. In this tweet, you see the personal experience of an individual using a Band-Aid that matches his skin tone for the first time. I will never forget a third-grade assembly where the father of one of my classmates pointed out to us that Band-Aids were made for white skin tones. So were crayons, and so was hosiery, with its "nude" color. And once you know, you can't unsee it; it changes you forever, and this is why surfacing these issues is so important. A product that's designed and intended for widespread use shouldn't fail an individual because of something they can't change about themselves. We want products and technology to work for everyone. The choices made in those two examples show the importance of being thoughtful about technology design and the impact it might have on humans. As in those two examples, humans are at the center of machine learning development and fairness concerns,
so how do we make these systems more inclusive? Unfairness can enter the system at any point in the ML pipeline, from data handling and model training to end use: how you collect the data, how you label it, how you train the model and with what objectives, how the model is packaged, and how users engage with it, which in turn shapes the data collected next. How the model is trained and whether its objective works for a particular set of users, how it was packaged, what unconscious biases users bring to their engagement and how that affects their behavior, and how we collect more data can all work together to create unfair systems. Rarely can you identify a single cause of, or a single solution to, these problems. Far more often, various processes interact in an ML system to produce a problematic outcome, and a range of solutions is needed. The work of ML fairness is to disentangle these root causes and interactions and find a way forward. What's important is getting clear on the questions we need to answer. There is a smattering of questions that can be used to guide your fairness work across the pipeline, from up front, really defining the problem, which is often underspecified, all the way to identifying if and when a model is behaving correctly. We want to know: Why did the model fail? Is the model trustworthy? What are its limitations? We've found these questions helpful in guiding our work. Underlying all of this is making sure the workforce is
inclusive. The best way to make sure that our teams are equipped is to have a diverse workforce having these conversations across the pipeline, at each stage of the life cycle. We have identified tools, best practices, and frameworks to help ML practitioners in their efforts to build responsible AI. At the start, we look at problem definitions and the resources we have when we first describe, qualitatively, what we want to accomplish. But for fairness, the stages where we've really focused our tool development and transparency frameworks are at the end of the ML life cycle: the evaluation and monitoring stages.
Now let's get into some case studies that show how this work has gone for us and what we've done, both in the process of launching products and in evaluating problems when they arose. The first case study outlines our approach to Cloud facial recognition and how we arrived at it through our AI Principles and the review processes we've put into operation for them. I'm going to start with the outcome of our in-depth review of enterprise facial recognition technology. The headline is the launch of a so-called celebrity recognition API, available to professional media and entertainment companies. As you may realize, video is all but unsearchable without expensive manual effort. This makes it difficult for creators and platforms to organize their content, meet increasing demand for personalized experiences, or even really understand what content they have.
Celebrity recognition is a pre-trained AI model that uses facial recognition to identify popular actors and athletes from around the world, based on licensed images. So I want to talk a bit about the history. In early 2016, Cloud leadership decided not to make facial recognition available as part of the Cloud Vision API, despite it being one of the top customer requests as the API came out of beta, which had started the year before. In 2017, facial recognition was identified as a source of potential unfair bias in our definition of algorithmic unfairness, a definitional document for Google's machine learning fairness work written by our colleagues in privacy and security, and now in responsible innovation. However, we decided to take face recognition through the process I described earlier. And just recently, in 2020, we welcomed the news that other technology companies were limiting or exiting their facial recognition businesses. This was due significantly to the Gender Shades research, which really showed how this technology's impacts vary across a range of demographics, from skin tone to gender. As you know, we decided back in 2018 not to make it generally available. So what we did within Cloud was an in-depth review of face recognition technology by both our deal review and our product review bodies, as well as by Google's central executive council. Our own internal review process necessitated a broad look at the research, the societal context, and the complexities around facial recognition. Going through this process, and this was the second review we ever did, gave us the open
forum and the time to really think critically about the challenges. But we couldn't do that alone, so we sought help from third parties to work with us on both external benchmarking and performance testing. And we began a human rights due diligence process, engaging civil rights leaders and human rights leaders, which would allow us to incorporate the perspectives of impacted people. As a reminder, our own lived experience is not enough to inform us about the lived perspectives of impacted people, and so it was really important that we did the work to understand them. The reality is that everything we do, every product we launch, every market we enter, does not exist in a vacuum. And in order for us to make decisions that are aligned with our AI Principles, we have to take this into account at every step. Technical analysis alone is not enough, because there is systemic societal underrepresentation of Black and minority actors in media, and this needed to be the basis for our evaluation. As a side note, look at the analysis from the Geena Davis Institute on Gender in Media, which showed that women-led films made 16% more at the box office, despite men being seen and heard twice as much as women on film. You can also see that films with racially diverse co-leads earned 60% more in gross income, yet those actors are grossly underrepresented in film more broadly. Human rights due diligence became a core part of what we needed to do.
Human rights are a basic code of conduct for all human beings, and we honor the Universal Declaration of Human Rights. So, again, we really wanted to seek external voices to help us make this assessment, and we engaged in a process with a human rights organization called BSR. BSR stands for Business for Social Responsibility, and they helped us significantly improve our product offering. Engaging with BSR validated many of our already planned alignment opportunities, and we adopted almost all of their additional suggestions across a range of aspects of the product. Their report is publicly available, and I urge all of you to read it. This guidance and the reviews informed where we needed to do our testing and fairness analysis, and they both revealed where the solution needed additional oversight and validated the decision not to offer a general-purpose facial recognition API. In the coming slides, I'm going to show you a deep dive into the investigation that then further informed the product changes that had to be made before launching.
Over three separate fairness tests, we found a discrepancy in our training dataset along skin tone lines in one of those tests: the one you see on the right. That was obviously very upsetting, and, just as we saw earlier in the presentation, many organizations would have stopped here and said, "We are just not going to do that." But because we wanted to evaluate this as thoroughly as we could, we decided to take a deeper look.
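The slicing described here and in the analysis that follows (error rates by skin-tone bucket, then error rates per individual identity) can be sketched in a few lines. This is a minimal, hypothetical illustration, not Google's evaluation code: the identities, the three bucket names, and the toy results are made up, and it simplifies by treating every test image of a gallery celebrity that is not correctly recognized as a false rejection.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    identity: str   # celebrity actually shown in the test image (hypothetical IDs)
    skin_tone: str  # illustrative bucket label: "light", "medium" or "dark"
    matched: bool   # True if the model returned the correct identity


def miss_rate_by(probes, key):
    """Fraction of probes in each group that the model failed to recognize."""
    totals, misses = defaultdict(int), defaultdict(int)
    for p in probes:
        group = key(p)
        totals[group] += 1
        if not p.matched:
            misses[group] += 1
    return {g: misses[g] / totals[g] for g in totals}


def share_of_misses_by(probes, key):
    """Fraction of all misses that falls in each group."""
    missed = [p for p in probes if not p.matched]
    if not missed:
        return {}
    counts = defaultdict(int)
    for p in missed:
        counts[key(p)] += 1
    return {g: c / len(missed) for g, c in counts.items()}


# Toy evaluation set: four test images per identity.
probes = (
    [Probe("a1", "light", True)] * 4
    + [Probe("a2", "light", True)] * 3 + [Probe("a2", "light", False)]
    + [Probe("b1", "medium", True)] * 4
    + [Probe("b2", "medium", True)] * 3 + [Probe("b2", "medium", False)]
    + [Probe("c1", "dark", True)] * 4
    + [Probe("c2", "dark", False)] * 4
)

by_tone = miss_rate_by(probes, lambda p: p.skin_tone)
by_identity = miss_rate_by(probes, lambda p: p.identity)  # per-identity false rejection rate
share = share_of_misses_by(probes, lambda p: p.identity)

print("miss rate by skin-tone bucket:", by_tone)
print("false rejection rate by identity:", by_identity)
print("share of all misses by identity:", share)
```

In this toy data, the dark bucket's 50% miss rate comes entirely from one identity, `c2`, which is never recognized. A per-identity breakdown like this is what lets a team move from "this bucket performs worse" to inspecting specific gallery and test images.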
The first thing we had to determine was whether the skin tone labels were accurate. They were not very accurate in categorizing medium- and dark-skinned people. So, based on the seminal work and findings of the Gender Shades study, we categorized skin tone into three buckets using the dermatological Fitzpatrick skin type, which is an estimate of the skin's sensitivity to ultraviolet radiation. Relabeling the skin tones resulted in a smaller gap in the errors, especially for the medium skin tone, but there was still a problem with errors for darker skin tones. Here are the results of that relabeling: 25% of the skin tone labels changed categories, and it completely closed the performance gap for the medium skin tone. Looking further into the remaining discrepancies, a small subset of actors and actresses represented a significant proportion of the total misidentifications in that evaluation dataset, especially in the dark skin tone group, and especially for men. With the knowledge that a select few actors were disproportionately affecting the error rates in the dataset, we looked at those actors' errors and found that they had a nearly 100% false rejection rate. Because we had decided earlier not to offer a general face recognition API, but to take this carefully scoped approach, we were actually able to go one by one through the gallery and the test set to determine what the problem was. Here's what we found: it was really three actors, where our celebrity gallery had images of
those actors as adults, while the test set had videos of them from the TV show Family Matters. Our model could not recognize the adult actors as the younger characters they had played many years earlier. In this instance, for our celebrity recognition model, we were able to correct the problem by hand-labeling, which closed the gap completely, and that is what allowed us to become comfortable with launching the product. But the answer here is not to say that this was just an instance of age differences in images, because we have to look at the larger landscape of representation in media to understand why there's a discrepancy and what might actually be happening. This is an example of why this process is so important. And these were some of the headlines. It's not about blocking; it's about responsible development, which is what leads to successful deployment. AI doesn't exist outside of the larger landscape, and it's because we started with that understanding that we were prepared to address it. We have to consider that at every step.
Responsible AI is not only about the extremes, and it's not always about AI for social good. It's about being part of every use case. Our job is to find a path through extremely complex and intersecting realities. Next, I'll show you how we responded to two issues with a generally available product, our Google Cloud Vision API. Labeling is a common practice used to classify images and train machine learning models across many vision applications in our
products, and in 2019 we identified an instance that violated our AI principle to avoid creating unfair bias. Employees are our best source of feedback, and that is how we identified this issue: employees saw the API return a result that misgendered them and flagged it to the team, which started an investigation. In assessing this escalation, it was important to put the issue in the context of the user by identifying the cause and the effect it would have. Gender misidentification like what happened to this Googler occurs because a person's gender cannot be inferred from their physical appearance, by a human or by a model. Outward appearance, which is what the model uses, is not a determining factor in assessing gender, and using it that way creates or exacerbates unfair assumptions that restrict or harm people who do not look stereotypically male or female, or who are gender non-conforming, for instance. The steps to evaluate an escalation are not always the same; some require in-depth product testing, industry expertise, and so on.
Having identified the issue and its potentially harmful impact on users, we formulated a go-forward plan. We decided that a person's gender cannot be inferred, and acted on that by removing the "man" and "woman" labels from the Cloud Vision API and replacing them with a non-gendered label, such as "person." Through this process, we learned a couple of key things that altered our approach to gendered label data. First, there are often more targeted signals or attributes of a person that provide more valuable information to a customer. Take retail stores as an example: if a store analyzes foot traffic with demographic information, a more targeted label could help it better tailor its inventory. Second, we concluded that for general tools, where we do not control customer use cases, we need to take a more cautious approach to capabilities and functionality than for targeted solutions, where we have a hand in the development and the use case.
The second example is the result of an escalation involving our Vision API earlier this year, in 2020, and it shows how important it is to understand the societal context in which our products exist. Here is the escalation: an image of a Black person holding a handheld thermometer returned a label for "firearm," while an image of an Asian person holding a handheld thermometer returned a label for "electronic device." This was brought to our attention through the tweet you see here on the left. What happened next was that someone took the photo of the Black man
holding a thermometer gun and manipulated the image to make the skin appear lighter. As you can see, this changed the label from "gun" to "tool," which is a deeply distressing result. It is critical to recognize why a result like this can create increased harm. The statistics here highlight some of the societal context in which this lives. Any robust evaluation of this escalation cannot separate the reality of racial injustice, racial violence, and systemic racism from the technical analysis. So what did we do? Our investigation led us across multiple teams, and you can see here the results of some of those tests, changing the skin tone in the first example using images from our own collection. We found that "gun" was still a top-five label response for all three images, across skin tones. We also used our own Explainable AI to look at saliency maps for the three hand images. The attributions cluster predominantly on the thermometer for all three images, with some bleed into the upper fingers of the hand for the darker and medium skin tone hands. When we went further with counterfactual testing, changing the color of the thermometer gun in the original image also caused the "gun" label to disappear. We ran stability testing, and in this case the image appears very sensitive to cropping and other minor variations. We also looked at our training data: we don't have many images of handheld infrared thermometers, so, as it's a newer product, this was a gap in our own image database. And it's not necessarily incorrect that this be labeled as a gun, since it's called a thermometer gun. But if that's the case, then it should be labeled that way uniformly. We found that many of these images were mislabeled, including as things like "camera," for example. This example represents an all-hands-on-deck process that required a humble approach and action to remedy the error by adjusting the confidence thresholds to more precisely return labels when a firearm is present.
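The remediation mentioned here, adjusting confidence thresholds so that a high-harm label is returned only when the model is very sure, can be paired with the stability testing described above. Both are sketched below. This is a hypothetical illustration, not the Cloud Vision API: the label names, the 0.5 and 0.9 cutoffs, and the scores are all invented, and a per-label threshold is only one way such an adjustment might be implemented.

```python
from typing import Dict, List, Optional, Set

DEFAULT_THRESHOLD = 0.5  # hypothetical cutoff for ordinary labels

# Hypothetical stricter cutoffs for labels whose false positives are especially harmful.
SENSITIVE_THRESHOLDS: Dict[str, float] = {"gun": 0.9, "firearm": 0.9}


def returned_labels(scores: Dict[str, float],
                    sensitive: Optional[Dict[str, float]] = None) -> List[str]:
    """Labels whose score clears their own threshold, highest score first."""
    if sensitive is None:
        sensitive = SENSITIVE_THRESHOLDS
    kept = [(label, s) for label, s in scores.items()
            if s >= sensitive.get(label, DEFAULT_THRESHOLD)]
    return [label for label, _ in sorted(kept, key=lambda pair: pair[1], reverse=True)]


def unstable_labels(variants: List[Dict[str, float]],
                    sensitive: Optional[Dict[str, float]] = None) -> Set[str]:
    """Labels returned for some variants of one image (e.g. recolored, cropped) but not all."""
    label_sets = [set(returned_labels(v, sensitive)) for v in variants]
    return set.union(*label_sets) - set.intersection(*label_sets)


# Made-up scores for an original photo and a skin-lightened copy.
original = {"hand": 0.95, "gun": 0.62, "thermometer": 0.55}
lighter = {"hand": 0.96, "tool": 0.58, "gun": 0.41}

print(returned_labels(original, sensitive={}))   # no stricter cutoffs
print(returned_labels(original))                 # stricter cutoff for "gun"
print(unstable_labels([original, lighter], sensitive={}))
print(unstable_labels([original, lighter]))
```

With the stricter cutoff, the skin-tone-dependent "gun" label drops out for both variants. The remaining instability ("thermometer" versus "tool") shows that thresholding alone does not fix the underlying sensitivity, or the training-data gap and mislabeling the talk describes.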
It showed us again that we have to do the work to put the harm in context in order to truly assess its extent. It isn't enough to say that this was only about model accuracy and stability, because this result was harmful, and it exists in a world of larger harm that we have to think about when we address the problem. What this taught us through experience is that none of our evaluations exist outside of the larger landscape and the context in which they sit. And that's the case for every analysis we undertake. It's about how best to respect our users, the opportunities, and each other.
Thank you so much for joining us today. I hope you enjoyed this conversation, and I look forward to many more in the future.