MLconf Online 2020
November 6, 2020, Online
Designing More Inclusive AI with More Inclusive Workflows

About the talk

While biases in AI have long been a topic of conversation for data science experts, this year in particular has brought conversations around racial bias to the forefront and sparked a call to action for industry experts to find tech-based solutions to existing diversity and inclusion challenges within data. In this MLconf presentation, Getty Images’ senior data scientist Dan Gifford will share three key avenues to consider when confronting bias in AI and using tech to create change.

Dan will cover some of the work he’s doing to lead change within Getty Images, as well as important updates to best practices within data science that can lead to stronger machine learning and computer vision. He will also address the importance of building ethical training and validation sets that take the contextual nuances of 2020, like wearing masks, into consideration. Additionally, Dan will talk through the importance of creating a strong, consistent pipeline for output accuracy, changing our perspectives on using open-source datasets for training purposes, and identifying ways to retag existing datasets. As an industry, we are being held back by inconsistencies and inaccuracies in datasets, which create a cyclical pattern of biases that are amplified in new products and nascent technologies. It’s crucial that we begin to adopt best practices now that not only prevent harmful biases from being coded into datasets and foundational technology, but that address systemic challenges as well.

About the speaker

Dan Gifford
Senior Data Scientist at Getty Images

Dan Gifford is a Senior Data Scientist responsible for creating data products at Getty Images in Seattle, Washington. Dan works at the intersection between science and creativity and builds products that improve the workflows of both Getty Images photographers and customers. Currently, he is the lead researcher on visual intelligence at Getty Images and is developing innovative new ways for customers to discover content. Prior to this, he worked as a Data Scientist on the Ecommerce Analytics team at Getty Images where he modernized testing frameworks and analysis tools used by Getty Images Analysts in addition to modeling content relationships for the Creative Research team. Dan earned a Ph.D. in Astronomy and Astrophysics from the University of Michigan in 2015 where he developed new algorithms for estimating the size of galaxy clusters. He also engineered a new image analysis pipeline for an instrument on a telescope used by the department at the Kitt Peak National Observatory.

Transcript

Welcome back, everyone. We now have Dan Gifford, Senior Data Scientist at Getty Images, ready to share his work with you. Dan, take it away.

Thanks so much, and welcome, everyone. Thanks for joining today. As I was introduced, my name is Dan and I'm coming from Getty Images to talk a little bit about inclusive AI and the inclusive workflows around building that kind of AI. This is an interesting topic, one that I think is important to a lot of us, and it has certainly grown in intensity as more people have started thinking about it recently.

I'm going to show you this picture because we're going to start off with a pretest. Based on what little you know about me in the few seconds I've been talking: in what city do you think I was born? What about my level of education? My gender? And what do you think is my favorite genre of music? Spoiler: I'm not going to give you the answers to any of these questions, and I'm not expecting any of you to guess them.

I'm using this exercise to showcase something interesting about the human brain: all of you probably started thinking and developing stories about me as soon as the words began coming out of my mouth and you saw me on the screen. This is fascinating. Our brains do this automatically, without us even thinking about it, and it happens for a really good reason: humans are social creatures. Our survival depends on building relationships with those around us in order to grow and, ultimately, succeed. For a lot of human history we've had to make these judgments and build these stories about the environments and people around us very quickly; our survival actually depended on it. We needed to be able to tell, really quickly, whether the person we were interfacing with was someone we could trust or someone we should be skeptical or wary of, and your brain is doing this all the time without you even realizing it.

That's positive in the sense that it helps us form these stories quickly. But of course it can also be negative: if you're using information you've built up over the course of your life to construct these stories, you'll start to attach meaning to things that might not actually be true. That leads to stereotyping, and to unconscious biases that work their way into how we look at each other and interface with the world.

So if we're going to talk about something like bias and fairness, it helps to define what we mean, and there are lots of different ways to define bias. Here's one quote I really like: a biased system is one that unintentionally errs for different groups of users in different ways. I think that's a really good definition as it relates to machine learning, and the key operative word is "unintentionally," because it's entirely possible, and acceptable, to have a system that errs for different groups in different ways intentionally. There are great examples of that. Say you're building a translation app between English and French, and you've made that scope explicit. For someone who is primarily Spanish-speaking, it's not going to work; it will error, but that's intentional. However, if you were to ship the same application as "the universal translation app," and a primarily Spanish-speaking user comes in and finds that it doesn't work for them, that is an unfair experience, and it will look like, and likely is, a bias in the app, based on the way the designer actually built it.

All kinds of biases exist, and it's useful to go through a few just to remind ourselves of the variety. There's availability bias: "my uncle is 95 years old and a smoker, so smoking must not be bad," because this very old uncle who has smoked his whole life is the information available to you, and it's skewing your interpretation. There's confirmation bias, which we especially talk about around today's media and social media landscape, where we wind up in information bubbles and are more likely to trust information that already conforms to our priors and beliefs. There's stereotyping, which I tried to elicit from you in the pretest: overestimating the importance of correlated features that you've built up over your life, or that a machine learning model has built up over the data it has seen, when making predictions. And finally there are things like survivorship bias; I'm sure many of you have seen the classic World War II bomber example with the bullet holes, where analysts were only getting information back from the planes that actually made it back to the airstrip. These are some of the most common biases that work their way into machine learning and AI in general, but of course there are many more.

So why are more and more industries tackling this head-on? It's because we've started to realize that biases are holding us back: holding us back from creating products fair enough to be widely adopted, and from building things that are meaningful and valuable to everyone. We thought of machine learning, at first, as an objective technology; it learns from data, and humans aren't necessarily driving it a hundred percent. But now we know that's really not the case. Humans are involved at every stage of the development of AI, and our biases come across in it, so this technology isn't anywhere near as objective as we assumed.

Then 2020 comes along, and if 2020 has shown us anything, it's that the world can change really, really rapidly. When you have technology that exists in a rapidly changing world, you have to decide, as the designer, how quickly you want your technology to evolve with it. There are times when you might not want it to change, when you don't want it skewed by flashes in the pan. But certainly for many of the things that have happened this year, the models need to keep up. Take mask-wearing and working from home: we've always had a lot of imagery of people wearing masks and surgical masks, and lots of imagery of people working from home, but only in 2020 have those two things been related to one another. So the models that help our customers find the right content for the stories they're trying to tell need to take that into account; otherwise biases work their way in and the models drift, as we've seen in other arenas.

So let's talk about the subjective side of bias in machine learning for a second, and how workflows need to accommodate that subjectivity.

Here's a number: 90%. What does it refer to? Let me show you a set of image results from Getty Images; these are the results that come back when you search for "nurse." Relate that to the number I just showed you: 90% of the top results you get back for that search term are women. Okay, that's interesting, and arguably biased from a gender perspective. But what's curious, and probably not coincidental, is that if you look at the census data for the nursing profession in the United States, you'll find that 90% of the workforce actually is female.

So this raises an interesting question: is that the correct percentage? Is it, let's call it, the goal percentage? And if it's not, what does Getty do about it? Do we throw our hands up and say, well, this is the status quo, this is what our customers are interested in and searching for, we're matching the representative distribution in society, so let's call it a day? Or is there something deeper here? Is there a case to be made that this is a profession that doesn't inherently favor one gender over another, and that there's an ideal which is different from the current status quo? There are systems where we know the status quo isn't a good status quo: look at the fraction of women in STEM careers, or the number of women in CEO positions, where we know there is a systemic bias that's been perpetuated for a long time, and the status quo isn't the ideal.

So we have to ask ourselves, subjectively: what does our vision for the future look like? As a company that provides imagery, that allows brands and corporations to tell stories on the creative side, and whose images, on the editorial side, end up in much of what news agencies and media put in front of you, how easy do we need to make it for those customers to find a more ideal distribution of society?

It's an interesting question that we had to face and tackle head-on, and the answer is not something that just comes naturally out of the data. There's another problem, too. You might say, well, the correct answer is to go with the customer; what does the customer want? That's often how these things go: the customer is always right. But what happens if your customer is biased?

So what we did at Getty was a fact-finding experiment: we trained a word2vec model on the search phrases we get from our customers, what they directly type into the search box when they're looking for imagery. This is not influenced by the metadata we attach to our imagery, and it's not influenced by the way we structure our search algorithm; it's trained purely on what customers are typing in, what they're searching for. As most of you know, you get word embeddings out at the end, which you can pass on to more complicated natural language understanding, but we went with a simple word-relationship test, just like the famous "king minus man plus woman equals queen" example. We can do the same thing, but look for bias around terms our customers often search for. For instance, "business" is a common term our customers look for stock imagery around. When we run this relationship check for "businessman," the words we get back are words like "CEO," "salary," and "leadership": power-oriented words. If we look at what comes back around "businesswoman," you can probably guess what's coming, and it's not the words on the left. It's "fashion," "secretary," "collaboration": a very different, some might say biased, set of words. And this is just the raw data we're getting from our customers. So if we rely on that raw data to influence the type of imagery we show at the end of the day, then instead of reducing the systemic biases that exist in society, we'll be amplifying them, and doing everyone a disservice. Again, it's something we need to consciously look into and tackle head-on.
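To make this concrete, below is a minimal sketch of this kind of embedding probe, using gensim's word2vec on raw query text. The sample phrases, hyperparameters, and probe terms are illustrative assumptions, not Getty's actual data or pipeline.

```python
# Sketch of the embedding probe described above: train word2vec on raw
# customer search phrases, then inspect which terms the model associates
# with gendered job words. Everything here is illustrative toy data.
from gensim.models import Word2Vec

# Each raw search phrase becomes one "sentence" of tokens.
search_phrases = [
    "businessman shaking hands",
    "businesswoman office meeting",
    "ceo boardroom leadership",
    "man in suit portrait",
    "woman working from home",
    # ... millions more raw queries in practice
]
corpus = [phrase.lower().split() for phrase in search_phrases]

model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, seed=42)

# Which terms sit near "businessman" vs. "businesswoman"?
for probe in ("businessman", "businesswoman"):
    if probe in model.wv:
        print(probe, "->", model.wv.most_similar(probe, topn=5))

# Directional version of the classic analogy check:
# businessman - man + woman = ?
print(model.wv.most_similar(positive=["businessman", "woman"],
                            negative=["man"], topn=5))
```

On a real query log, the nearest-neighbor lists for the two probe terms make the kind of asymmetry described above directly visible.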

Here's another example: the results for the search term "wedding." If I were to ask many of you to imagine a wedding, many of you, especially those living in the United States or other Western countries, would think of a white dress, little chapels, the Western style of people coming together and celebrating. But of course, that's just one way of celebrating a wedding; there are lots of others. And when we dug into the data on the search results coming back, we noticed that the distribution of ethnicities we see is very different from the distribution of ethnicities you would expect to see globally. So we made a conscious decision to allow our algorithms to learn, as part of the fitness function, the distribution of ethnicities that appears more globally in our world; we can also localize that to individual regions if need be. The result is a completely different set of results: much more diversity in skin tones and in clothing and styles. These are just the top results, but you see the same as you go deeper into the results.

The interesting thing is that this is actually something our customers want. It didn't turn them away or decrease our business; in fact, customers loved it and demanded more diversity, which is interesting because it's the opposite of what we had been seeing in their actions. What we realized is that there was a bit of a feedback loop going on. Customers were searching for more Western depictions of weddings, so more of that got served, which developed into what you might call a confirmation-bias scenario, where those images got locked in over time as they received a lot of engagement. But this wasn't a question about the data; we at Getty have had these images on our website for a long, long time. It was our optimization process that needed to be addressed.

A lot of interesting things came out of this exercise, but the biggest was creating a vision around diversity, from the top down in the company, that led us to tackle some of these issues. It's not something that just naturally comes out of the data. If you want to debias data, and debias the algorithms that depend on it, you really have to be explicit about the ways you go about doing that.

So I want to spend a bit of time on some of the best practices we've found working on these types of problems over the last few years that I've been at Getty.

The first, which I just mentioned: create a vision of fairness and really stick with it. Fairness is an interesting concept. It's subjective, and it's something that doesn't come about naturally. As I introduced early in the talk, as humans we tend to settle into positions of comfort and stay there, so when we find ourselves needing to change the ways in which we serve people, it helps to have a plan and a North Star: you know which direction to keep going when the road gets a little bumpy along the way.

Second, and I don't think this is a controversial point anymore, though it may have been for a while: hire diverse teams. This doesn't just mean diversity along ethnicity, gender, and age; it also means diversity in points of view. In data science, that means not just hiring people with CS degrees but also people who come from the physical sciences, as I happened to. That's something you really need to consider in hiring and building out your teams.

Third, use diverse data, and quantify it first. We talk about making sure the data that influences a machine learning algorithm is representative, and that's absolutely true. But where a lot of people fall short is that you actually need to quantify the diversity you see, as with the pie charts of representation I showed earlier.
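As one concrete reading of "quantify it first," the sketch below tallies the distribution of a tagged attribute across a result set and measures its distance from a chosen reference distribution. The tags, counts, and reference values are hypothetical, purely for illustration.

```python
# Sketch: quantify representation in a result set before trying to fix it.
from collections import Counter

def attribute_distribution(results, attribute):
    """Share of results carrying each value of a tagged attribute."""
    counts = Counter(r[attribute] for r in results)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def total_variation(observed, reference):
    """Total variation distance between two categorical distributions."""
    keys = set(observed) | set(reference)
    return 0.5 * sum(abs(observed.get(k, 0.0) - reference.get(k, 0.0))
                     for k in keys)

# Hypothetical top results for the query "nurse", tagged by gender.
top_results = [{"gender": "female"}] * 90 + [{"gender": "male"}] * 10
observed = attribute_distribution(top_results, "gender")

# A reference you have consciously chosen (here: parity, not the census).
reference = {"female": 0.5, "male": 0.5}
print(observed)                              # {'female': 0.9, 'male': 0.1}
print(total_variation(observed, reference))  # 0.4
```

The point of a metric like this is simply to make the gap measurable, so you can track whether changes actually move the distribution toward whatever target you've chosen.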

Quantifying is what lets you tell, at the end of the day, whether or not you're achieving your goals. "If you can't measure it, you can't manage it" is a common phrase, and it's true here as well.

Fourth: bias doesn't just come from your data; it comes from your optimization processes as well. If you have fitness functions or loss functions, make diversity a part of them. And if you're able to quantify your data, this is the obvious next step: encode that quantity into your loss functions, so your algorithms aren't just optimizing for the wrong thing but for everything you care about, which includes fairness and representation, depending on what you're trying to do.
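The talk doesn't spell out the exact objective Getty uses, so the sketch below is only one plausible reading of "have diversity as part of your loss function": a standard relevance loss plus a penalty on the gap between the group mix the model surfaces and a chosen reference distribution. The function names, shapes, and weighting are illustrative assumptions.

```python
# Sketch of a fairness-regularized loss: relevance loss plus a penalty on
# the gap between the attribute mix the model surfaces and a chosen
# reference distribution. All names and numbers are illustrative.
import torch

def fairness_penalty(scores, group_onehot, reference):
    """KL-style gap between the score-weighted group mix and a reference.

    scores:        (n_items,) model relevance scores
    group_onehot:  (n_items, n_groups) one-hot group membership per item
    reference:     (n_groups,) target distribution, sums to 1
    """
    weights = torch.softmax(scores, dim=0)       # soft "top results" mass
    served = weights @ group_onehot               # expected group mix served
    served = served.clamp_min(1e-8)
    return torch.sum(served * (served / reference).log())  # KL(served||ref)

def total_loss(relevance_loss, scores, group_onehot, reference, lam=0.1):
    return relevance_loss + lam * fairness_penalty(scores, group_onehot,
                                                   reference)

# Toy example: 4 candidate images, 2 groups, parity reference.
scores = torch.tensor([2.0, 1.5, 0.3, 0.1], requires_grad=True)
groups = torch.tensor([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
reference = torch.tensor([0.5, 0.5])
loss = total_loss(torch.tensor(0.25), scores, groups, reference)
loss.backward()  # gradients now nudge scores toward the reference mix
```

The reference tensor plays the role of the global (or regional) target distribution described in the wedding example, and the weight lam trades off relevance against representation.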

Fifth, and this extends beyond building fair applications and algorithms: don't be afraid to iterate. These are hard problems, challenging problems, uncomfortable problems; they're things that, as humans, we often have to get out of our comfort zone to improve, to tackle, and to have conversations about. Iteration is the key to succeeding here. Start simple, identify the areas where you can improve, and just start making progress. Realize that when you start you won't be perfect; there are going to be mistakes. But recognizing that those mistakes come on the way to building a better, more just, more fair system that everyone can benefit from is a noble process we should take pride in. So don't be afraid to get going and iterate.

Sixth: change your perspective on using open-source datasets. One thing we've noticed, and it's not just us, this is now becoming very well documented, is that many of the open-source datasets that machine learning and AI practitioners have used over the years, and that fueled, among other things, the explosion in this field, have biases baked into them, whether in the balance of genders and ethnicities among the images used for face recognition, or in the way labels are attached to certain individuals. ImageNet, for example, has lots of biases, and we still use it as a benchmark for a lot of computer vision work; there's research on this that's constantly ongoing. We need to take a hard look at what these open-source datasets should be used for, which ones we should actively move away from, and where we should commit to making them more fair, and there are lots of groups tackling this. It's something we all should be aware of, and something we ran into as we built out our own algorithms within Getty.

And finally, don't feel like, on your road to building more fair applications, you need to sacrifice accuracy or performance. In fact, quite the opposite is true. By making your algorithms more fair, by including diverse data and diverse points of view, by hiring diverse teams, by doing all of these things, what you will ultimately see is that your accuracy improves and your products actually get better. Your customers will love your products even more than they already do. So don't feel like this additional constraint is going to constrain you; in fact, it's going to make things even better.

We know the power of imagery, and I know many of you on the call today work with images: you use them as training sets, you have them as part of your products. Images are powerful. They're one of the purest forms of communication we have, and people can digest a lot from a single image. The phrase is that an image is worth a thousand words; it's worth more than that, because people see themselves in the individuals they encounter on a daily basis. We provide the images that ultimately wind their way into advertising, into social feeds, and into the news you watch every day. So we know that representation matters, we know that people want to see themselves in imagery, and we need to make that imagery easier to find. We created this vision of fairness for our products and services and search, and one of the things we've seen is that it has only made our products stronger and our customers even more committed to using these types of imagery, and, in the process, to influencing others around the world through their use.

With that, I think I'll take some questions, but I really want to thank all of you for joining in today. I know this is quite a sensitive topic, but I'd love to hear any thoughts or questions you might have on it. Thank you.

Thank you, Dan, fantastic presentation. If any of you have questions, please submit them in the stage chat and address them to Dan so we know they're for him. We do have one: a question here from Penn Yan Lee.

"Dan, interesting talk. What open-source datasets do you use to test your model to reduce biases?"

Yeah, that's a great question. We use a variety of open-source datasets, but ultimately we found a lot of biases in them, so it winds up being difficult to trust many of them directly. So when we're working toward debiasing algorithms, we spend a lot of time building our own, more fair evaluation sets. It's a tricky challenge, because the open-source sets are often the easiest to get access to. But of course I'm fortunate to work at a place like Getty, where we have a lot of images we can use, and we rely a lot on that data as well, building it into evaluation sets that do have a more representative distribution. So yeah, we do both where we can.

Excellent. And we have a question here, the last one in our time slot, from Vinodh: "Very interesting talk, thank you. When including reference distributions, for example the world race distribution in the wedding search, in model optimizations, what kind of governance measures do you recommend to make sure that the references themselves are accurate?"

Yeah, that's a good question. Maybe I'll answer it like this: the ultimate goal of using a reference like that is not to hit the reference exactly; it's to have the reference influence the direction in which we head. And of course the reference itself, for instance the distribution of ethnicities around the world, may not be the right distribution. We might find that it doesn't work for certain regions, or even globally, that it isn't the best scenario for our customers in terms of what they want to see, even though it's broadly more representative. This is where we do a lot of testing and iterating on what those best distributions actually look like. So we use references more as a guide, something that gives us the directionality, knowing that any one reference point is not necessarily going to be the best or most optimal. We accept that there will be noise and error, and potentially some non-optimal solutions along the way, but hopefully we can test quickly and see which ones are giving us the best results. So, great question.

Thank you, Dan. Irina has submitted a question, but we're going to have to take it offline to stay on track; we do have our next speaker standing on deck. Dan will continue to answer your questions in the chat. Thank you very much for a fantastic presentation, Dan. Everyone stand by while I reconnect so we can bring our next speaker on board. Thank you, Dan.
