Ethical AI: Addressing Bias & Algorithmic Fairness

Sherin Mathews
Research Scientist at McAfee
Amanda House
Data Scientist at Apple
RSAC 2021
May 20, 2021, Online, USA

About speakers

Sherin Mathews
Research Scientist at McAfee
Amanda House
Data Scientist at Apple

Sherin Mathews is a Senior Data Scientist within the Office of the CTO at McAfee. In this role, she develops new machine learning models to increase the effectiveness of cybersecurity products. Mathews is a frequent, sought-after industry speaker, presenting her research in the areas of machine learning, computer vision, artificial intelligence, and cybersecurity. Prior to her role at McAfee, Mathews held research positions at Canon Inc. and Intel Corporation. Mathews has a BSEE degree, with honors, from the University of Mumbai and an MS in electrical and computer engineering, with a focus on signal processing, from the State University of New York, College at Buffalo. Additionally, Mathews has an MSEE and a Ph.D. in machine learning from the University of Delaware.


Amanda House is a Data Scientist working on research and innovation at McAfee. She received her bachelor's degree in mathematics from the University of Texas at Austin. In addition, she is currently pursuing a master's degree in computer science at Georgia Tech. House focuses on machine learning and artificial intelligence applications in cybersecurity. House leads McAfee's Analytics Education and Training committee. To promote machine learning education among employees, she has created and hosted two internal data competitions.


About the talk

Sherin Mathews, Senior Data Scientist, McAfee
Amanda House, Data Scientist, McAfee

AI models can exhibit bias that adversaries can utilize to bypass security measures. It is therefore imperative to explore algorithmic fairness to combat vulnerabilities. This session will address Ethical AI and present techniques for bias detection, fairness metrics, bias-explainability tools, and bias-mitigation algorithms for datasets and models.

Transcript

Hi everybody. My name is Sherin Mathews, and this is my colleague Amanda House. We are both data scientists at McAfee, and our session is on Ethical AI. That being said, we will be here for the live Q&A, so feel free to put all your questions in the chat.

Why should we be wary of AI bias, and how does it apply to humans? The concern is whether a model can inherit bias from the data it is trained on. Imagine a university takes its historical admissions data and builds a model to automate decision-making for incoming freshmen. How could we improve decision-making and check whether any human errors or biases have crept into the process? Look closer at the data: if the university's admissions decisions over the last ten years were biased, a model trained on that data will essentially learn that bias, and it will be present in every classification it makes; it will learn which applications to accept and which to reject.

This problem is widespread. A NIST report recently did a deep dive on facial recognition and found that many algorithms exhibit bias towards the region they were trained in. Facial recognition technologies have been used by police forces for more than two decades, and recent studies have found that while the technology works relatively well on Caucasian men, it is less accurate on other demographics. The documentary "Coded Bias," which premiered at the Sundance Film Festival and is also available on Netflix, follows a researcher's exploration of bias in facial recognition technology. She started investigating bias in AI after noticing that a facial recognition program could not identify her face as a woman of color, yet recognized her when she held up a white mask.

After further investigation, she found that facial recognition technology, and generative models more broadly, can have algorithmic bias ingrained in them. What does that mean? Generative adversarial networks (GANs) can create pictures of faces and explore a whole range of possibilities: purely computer-generated images. One use is to take a low-resolution, downscaled image and generate a detailed, high-resolution image from it. What used to be a trope of Hollywood spy movies has taken a turn towards reality, because you can now create a very high-quality face from a blurry input.

Notice, though, that the image generated from a low-resolution picture only looks like a plausible person. Given the image of the Mona Lisa on the left, the upscaled face on the right looks like a real photograph, but it bears no real resemblance to the original painting. Moving on to the next slide, this is an example of how AI turned Barack Obama into a white man: a low-resolution image of Barack Obama was fed into an algorithm designed to generate depixelated faces, and the newly produced image depicted a white man. The same thing happens with images of actors and other well-known people. The point is: what causes this output, and how does AI bias creep in? To answer that, we need to know a bit more about the technology being used. The software that generates these images uses the GAN technology I mentioned previously, which performs a sophisticated level of visual data processing; these programs employ learned models to fill in the missing detail and convert a low-resolution image into a high-resolution one.

It is not that the researchers set out to build a biased system. The tendency of the algorithm to generate white faces rather than faces of people of color is inherited from the skewed dataset it was trained on, which means there can be factors at play that the developers do not even realize. Given the nature of the training data, the algorithm tends to create a face that looks like the faces it has seen most often, which in this case skews white.

Now, moving on to why models end up biased and why it matters: if an AI system is trained on historical data, it will learn to make decisions based on that data. What this means is that AI can make mistakes, and at times critical mistakes. One must note that a well-trained AI model might provide fantastic benefits, but it must not be a model you cannot trust.

It must be transparent, and humans must be able to understand why it made a given decision. You should not completely rely on AI. The future will undoubtedly consist of human-machine collaboration, but that does not mean people should over-rely on machines, especially when a model's decisions may be coming from a biased place.

Now, you might be saying: why should I care? I am not using AI to get a loan or to get into college, so why do I, as a security professional, have to care? First, we approach model building with our own biases. It is also possible that a trained model understands only a partial view of the world: its understanding is limited to the data it was trained on, and if that set of benign and malicious samples is not representative, the model will produce false results when it runs against real-world data. In fairness terms, this creates privileged and unprivileged classes, groups that the model systematically favors or disfavors. And even though the malicious samples we detect might not directly contain human-related features, our models do impact humans.

Moving on: how do we connect explainability to bias, and how can explainability as a concept be used to detect or mitigate bias? This slide is taken from a paper that ties together the connections between model bias and explanations. It describes a user-centric XAI framework built around human cognitive patterns, which can drive how we build explainable models and how XAI can be used to mitigate bias. It is organized around the key aspects you see in the leftmost column: how do people usually reason and explain, how do people actually make decisions, and how are explanations generated and evaluated today?

We will then take an example from each, looking at how explanations are currently generated and how XAI can support bias mitigation. So first: how do people reason and explain? One common pattern is reasoning to the best explanation, where we try to see which of the candidate explanations our observation is closest to. And how do people actually make decisions in practice, and what causes errors? Usually people take a very fast, intuitive approach, and we may be prompted to make decisions too quickly based on some kind of representativeness: have I seen that element before, have I seen something similar before? An experienced person who has seen many examples and learned from them can make a decision very quickly, but this can lead to something like confirmation bias. Let me give one example of how each of these ideas can help.

Moving on to how explanations are generated today: if you can see some aspect of the internal state or the functioning of an AI system, for example when the system behaves unexpectedly or you are getting errors, you might want to understand why, so that you can identify the offending features and make corrections. That is how XAI is usually applied today. That being said, how can you make use of XAI to support bias mitigation? As we discussed before, bias can happen when a decision-maker sees the current situation as similar to situations seen before; a wrong classification might be due to a lack of experience with those kinds of examples, or because the model picked up the wrong salient features. This is where something like a contrastive explanation can help. It presents different outcomes and conveys the similarity of the test case by explicitly showing some kind of similarity metric; by inspecting the features and the difference in their values, you can see the contrast between that particular case and other cases, and that might help you detect and mitigate bias.

I will now hand it over to Amanda to talk about where bias exists in more detail, and how we measure and mitigate it. Hey, thanks Sherin. So now that Sherin has given us a great overview of bias and why we should be concerned about it, I'd like to dive into where bias exists, and then, in the following slides, how to measure that bias and how to mitigate it as well. There are three places where bias primarily exists, and the first place we are going to look at is the data.

An example of this relates back to the documentary that Sherin talked about on the first slide, Coded Bias. In this documentary, an algorithmic bias researcher discovers that when she tried out a smart mirror that used computer vision software, her face, as a Black woman, was not detected. However, when she held up a white mask, the mirror finally detected her. The most likely culprit for this error was that the data used to train the mirror did not include a diverse set of images. NIST actually has a great report that discusses how facial recognition algorithms can be biased towards the region they are trained in. This is because most algorithms are trained on pictures that represent the people in that region, and it is highly likely the mirror was trained on pictures from only a small set of races and did not contain enough pictures of African American people. This shows that the data used to train the model was potentially not diverse enough.

The second place where bias can exist is in people. People are the ones who train models and curate the data used to train them, and as people, we all bring our own biases to model building. An example of this is an algorithm that the University of Texas at Austin used to grade applicants to its computer science Ph.D. program. In 2013, UT started using a machine learning system called GRADE, which stands for GRaduate ADmissions Evaluator. It was created by a UT faculty member and a graduate student in computer science, and it was originally built to help the graduate admissions committee and the department save time. GRADE predicts how likely the admissions committee is to approve an applicant and expresses that prediction as a numeric score out of five. The system also explains which factors most impacted the decision. GRADE's creators have said that the system is only programmed to replicate what the admissions committee was doing prior to 2013, not to make better decisions than humans could, and the system is not programmed to use race or gender to make its predictions; in fact, they said that when it is given those features as options, it actually weights them at zero. GRADE's creators have said this is evidence that the committee's decisions are gender- and race-neutral. However, something that should be considered: is it possible that the admissions committees prior to 2013, from which the data was sourced, brought their own biases to the selection process, and that those biases were then encoded into the model by using that past data? This would harm minority classes if biases existed in the admissions committee prior to 2013.

And finally, the last place to look for bias is in the model itself. A great example of this is the Twitter bot Tay, created by Microsoft. Prior to releasing Tay, Microsoft made sure she was trained on a diverse set of data, and initially, when she started interacting with users on Twitter, her tweets were mostly harmless. However, after some users shared racist language with Tay, she ended up picking up that information and started tweeting racist content herself. She also had a feature where you could essentially tell her to repeat the exact tweet you sent her, and she would repeat anything you said; some users, of course, manipulated this to have her tweet racist content. So these are some examples of where bias exists. Now, let's talk about how we can measure bias.

There are numerous statistics and metrics that can be used to measure bias. I have highlighted a few from an open-source toolkit called AI Fairness 360, which measures them for you; you can also hand-calculate all of these metrics. At their core, the metrics are based on the confusion matrix output and on knowing which class is the favorable class and which is the unfavorable class. For example, if we were measuring bias in a college admittance algorithm that factored in race, our favorable class might be white and our unfavorable class might be African American, since in the past African Americans may have received unfavorable admission decisions. In security, we have to be a little more creative and think of our favorable class as something we typically have more data or more experience with, like PE files, and our unfavorable class as something we have less data for, or that our model has performed worse on in the past, such as .NET files.

All of these metrics have a range in which the model's output is considered fair. For example, take the first metric, statistical parity difference: it has a fair range of -0.1 to 0.1, so anything in that range is considered fair. If we get a statistical parity difference of 0.2, we can conclude that bias exists in our model because the value lies outside the fair range. Each of these metrics has an ideal use case where it performs best. There are two opposing worldviews that we can use to group the applications of these metrics: the first is "we're all equal" and the second is "what you see is what you get." The "we're all equal" worldview holds that all groups have similar abilities with respect to the task, even if we cannot observe this property, while the "what you see is what you get" worldview holds that the observations reflect ability with respect to the task. If the application follows the "we're all equal" worldview, then demographic parity metrics should be used, like disparate impact and statistical parity difference. If the application follows the "what you see is what you get" worldview, then equalized odds metrics should be used, such as average odds difference. Other group fairness metrics lie in between the two worldviews. In addition, there is the concept of group fairness versus individual fairness: group fairness, in its broadest sense, partitions the population into groups defined by protected attributes and seeks for some statistical measure to be equal across all the groups, while individual fairness, in its broadest sense, seeks for similar individuals to be treated similarly. If the application is concerned with both individual and group fairness, then something like the Theil index can be used.

So now that we know how to measure bias, what do we do once we know that bias exists in our data, our model, or our predictions? There are actually a few techniques we can use to mitigate bias. I have some listed on this slide; it is not an exhaustive list of all the techniques that can be used, but these are some of the important ones I wanted to highlight. The most important thing to remember is that mitigating bias starts with your data. This is always the first place you should look for bias, and it is ground zero for trying to mitigate it. Two techniques that can be used to mitigate bias in your data are reweighing and optimized preprocessing. Reweighing generates weights for the training examples in each (group, label) combination differently to ensure fairness before classification. Optimized preprocessing learns a probabilistic transformation that edits the features and labels in the data, subject to group fairness, individual distortion, and data fidelity constraints and objectives. In addition, you can look at simpler techniques to mitigate bias in your data, such as undersampling, oversampling, and sourcing more data; all of these can help you ensure that you have a more balanced dataset for the particular features you are concerned may be biased.
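To make the reweighing idea concrete, here is a minimal sketch (an illustration of the standard reweighing scheme, not the exact AIF360 implementation; variable names and the toy data are hypothetical). Each (group, label) combination receives the weight w(g, c) = P(group = g) * P(label = c) / P(group = g, label = c), so that under the weighted data the protected attribute and the label are statistically independent.

```python
import numpy as np

def reweighing_weights(y, protected):
    """Return one weight per training example so that, after weighting,
    the label and the protected attribute are independent."""
    y = np.asarray(y)
    protected = np.asarray(protected)
    weights = np.zeros(len(y), dtype=float)
    for g in np.unique(protected):
        for c in np.unique(y):
            mask = (protected == g) & (y == c)
            p_joint = mask.mean()
            if p_joint > 0:
                weights[mask] = (protected == g).mean() * (y == c).mean() / p_joint
    return weights

# Toy example: label 1 = favorable outcome, protected 1 = privileged group.
y         = np.array([1, 1, 1, 0, 1, 0, 0, 0])
protected = np.array([1, 1, 1, 1, 0, 0, 0, 0])
w = reweighing_weights(y, protected)
print(w)  # under-represented (group, label) combinations receive weights > 1

# The weights can then be passed to most scikit-learn style classifiers,
# for example clf.fit(X, y, sample_weight=w).
```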

The next place you should look to mitigate bias, if the data-level approaches do not work or cannot be implemented, is in your classifier. One such example is adversarial debiasing, which learns a classifier that maximizes prediction accuracy while simultaneously reducing an adversary's ability to determine the protected attribute from the predictions. This leads to a fair classifier, because the predictions cannot carry any group-discrimination information that the adversary could exploit. Finally, the last place you can look to mitigate bias is in the predictions themselves. An example of this is reject option based classification, which gives favorable outcomes to unprivileged groups and unfavorable outcomes to privileged groups within a confidence band around the decision boundary where uncertainty is highest. So again, these are just a few techniques you can use to mitigate bias; it is not an exhaustive list, and there are many more out there as well.
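As an illustration of the prediction-level idea, here is a minimal sketch of reject option based classification (hypothetical function and variable names, and a fixed confidence band rather than the optimized band a full implementation would search for): inside the low-confidence band around the decision threshold, predictions are flipped in favor of the unprivileged group, while confident predictions are left untouched.

```python
import numpy as np

def reject_option_classification(scores, privileged, band=0.1, threshold=0.5):
    """Post-process predicted probabilities of the favorable outcome.

    scores     : predicted probability of the favorable class
    privileged : 1 = privileged group, 0 = unprivileged group
    band       : half-width of the uncertain region around the threshold
    """
    scores = np.asarray(scores, dtype=float)
    privileged = np.asarray(privileged)
    preds = (scores >= threshold).astype(int)
    uncertain = np.abs(scores - threshold) <= band
    preds[uncertain & (privileged == 0)] = 1  # favorable outcome for unprivileged
    preds[uncertain & (privileged == 1)] = 0  # unfavorable outcome for privileged
    return preds

scores     = np.array([0.95, 0.55, 0.48, 0.10, 0.52, 0.44])
privileged = np.array([1,    1,    1,    0,    0,    0])
print(reject_option_classification(scores, privileged, band=0.1))
```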

So now that we know how to measure and mitigate bias, let's apply this to a real-world example to see how you would use these techniques in action. The example I am going to discuss is from a paper we have authored here at McAfee that is currently under review. The paper details algorithms we created to detect deepfake images and videos, and it also introduces a new deepfake dataset with high-quality images. We wanted to run an analysis on this new dataset to see how diverse our images were and to make sure our model was not biased towards certain ages, races, or genders. What we did was take images created by StyleGAN, which served as our deepfake images because they are not images of real people, and images of real people that we scraped from the internet; together these made up our dataset of deepfake and real images. We then passed those images to an open-source tool that you can find on GitHub called DeepFace. DeepFace is a lightweight facial recognition and facial attribute analysis framework for Python that can predict attributes such as age, gender, emotion, and race. It uses state-of-the-art models such as VGG-Face to detect faces, and the library is mainly built on Keras and TensorFlow. It gave us an output of what it determined to be the race, age, and gender of the person in each photo. This was much easier than manually labeling all of our pictures, because we had so many of them; it automated the process and made it easier for us to get a feel for the diversity of our image set.
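For reference, this is roughly how DeepFace's attribute analysis is invoked (a minimal sketch; the image path is hypothetical, and the exact return format and key names vary between library versions, so treat it as an approximation rather than the talk's actual pipeline).

```python
# pip install deepface
from deepface import DeepFace

# Analyze one (hypothetical) image for the attributes used in the talk.
result = DeepFace.analyze(
    img_path="dataset/sample_face_001.jpg",
    actions=["age", "gender", "race", "emotion"],
)

# Newer versions return a list with one dict per detected face.
face = result[0] if isinstance(result, list) else result
print(face["age"], face["dominant_race"])  # gender/emotion fields are also returned
```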

On this slide you can see an example of the results we got for race. As you can see, a lot of the images in the subsample of our dataset skewed towards Caucasian, and we did not have as many images of African Americans or Indians. So we really focused on how we could mitigate bias around race in our algorithm. You can also see two tables detailing the results of the statistical measures of bias that we calculated, the ones I discussed on the previous slides. For age and gender, all of the values fall within the fair range for each of the different metrics. The only one of real concern is race: there are two instances where race does not fall within the fair range and instead falls outside it into bias, namely the statistical parity difference and the disparate impact. So we wanted to focus on how to mitigate these, and the technique we chose was adversarial debiasing. Adversarial debiasing is currently one of the most popular techniques used to combat bias; it relies on adversarial training to remove bias from the latent representations learned by the model. Let Z, in the diagram you see on the screen, be the sensitive attribute that we want to prevent our algorithm from discriminating on, for example race; in our case it is indeed race. It is typically insufficient to simply remove Z from the training data, because it is often highly correlated with other features. In our case it is also difficult to remove Z because we are dealing with images, and we cannot simply source more images, since getting deepfake images is hard, as is scraping real images from the internet; so it was not easy to just balance the races or anything like that. What we really want is to prevent our model from learning a representation of the input that relies on Z in any substantial way. To this end, we train our model to simultaneously predict the label Y and prevent a jointly trained adversary from predicting Z. This technique allows us to mitigate bias at the classifier level, since we did not have the option to balance the images and sourcing more images was difficult for us.
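To illustrate the training setup described above, here is a generic sketch of adversarial debiasing in PyTorch (hypothetical layer sizes and toy data, not the model from the paper): a predictor is trained to classify Y while an adversary tries to recover the protected attribute Z from the predictor's output, and the predictor's loss rewards accuracy on Y while penalizing the adversary's success, so the predictions carry little information about Z.

```python
import torch
import torch.nn as nn

n_features, lam = 20, 1.0  # hypothetical input size and fairness trade-off weight

predictor = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))
adversary = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

opt_pred = torch.optim.Adam(predictor.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(x, y, z):
    """x: features, y: task label (0/1), z: protected attribute (0/1)."""
    # 1) Update the adversary: predict z from the (detached) predictor output.
    opt_adv.zero_grad()
    adv_loss = bce(adversary(predictor(x).detach()), z)
    adv_loss.backward()
    opt_adv.step()

    # 2) Update the predictor: be accurate on y while fooling the adversary.
    opt_pred.zero_grad()
    y_logit = predictor(x)
    pred_loss = bce(y_logit, y)
    fool_loss = bce(adversary(y_logit), z)
    (pred_loss - lam * fool_loss).backward()  # subtracting fool_loss penalizes
    opt_pred.step()                           # adversary success
    return pred_loss.item(), adv_loss.item()

# Toy random batch with hypothetical shapes.
x = torch.randn(64, n_features)
y = torch.randint(0, 2, (64, 1)).float()
z = torch.randint(0, 2, (64, 1)).float()
for _ in range(200):
    pred_loss, adv_loss = train_step(x, y, z)
print("final predictor loss:", pred_loss, "adversary loss:", adv_loss)
```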

So, you may be wondering: I work in security, and as Sherin mentioned, bias is also important to security, but how can I use this for something like malware detection, since I am not dealing with images of humans or data related to humans? I will walk through the process of how you would go about applying this, using the example of malware detection. First, consider which features might exhibit bias in your model or dataset. Some examples include file types: as we mentioned previously, a lot of our dataset consists of PE files and not so many .NET files, so maybe we have a bias towards PE files because we have more data for them. Another is benign versus malicious samples: sometimes it can be easier to get benign samples than malicious samples, so maybe we have a lot more benign samples and our model is biased towards the benign class. And then there are malware families: we try to have a diverse set of malware families in the datasets we look at, but sometimes you may have a lot more examples of, say, ransomware than you do of Emotet, which means you might miss Emotet samples. These are just some instances of where bias might exist in your data; it is not an exhaustive list, but it is something to think about. Now that you have identified the features that might exhibit bias, you need to measure the bias in those features using the bias metrics I discussed, such as statistical parity difference. You go through and actually measure all of the features you identified, using the output from your model and the confusion matrix, to determine whether any of those metrics lie outside the fair range and you do have bias in one of those feature categories. Once you have identified that bias, you can go ahead with mitigating it.
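As a sketch of what that measurement step might look like for the malware-family example (the data, labels, and family names here are hypothetical; the talk does not show code), this computes per-family true and false positive rates from the model's output, the per-group equivalent of reading the confusion matrix, so a family with a much lower detection rate stands out.

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Per-group TPR/FPR, where 1 = malicious and groups is e.g. malware family."""
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        tp = np.sum((y_true[m] == 1) & (y_pred[m] == 1))
        fn = np.sum((y_true[m] == 1) & (y_pred[m] == 0))
        fp = np.sum((y_true[m] == 0) & (y_pred[m] == 1))
        tn = np.sum((y_true[m] == 0) & (y_pred[m] == 0))
        rates[g] = {
            "TPR": tp / (tp + fn) if (tp + fn) else float("nan"),
            "FPR": fp / (fp + tn) if (fp + tn) else float("nan"),
        }
    return rates

# Hypothetical evaluation data: 1 = malicious, family = group of each sample.
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])
family = np.array(["ransomware"] * 6 + ["emotet"] * 4)

for fam, r in group_rates(y_true, y_pred, family).items():
    print(fam, r)
# A markedly lower TPR for one family (here Emotet) is the kind of gap the
# fairness metrics from the earlier slides are meant to flag.
```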

And remember, the first and most important place to look when mitigating bias is always your dataset. Again, you can do this simply by adding more data. Say you have more ransomware samples than you do Emotet samples: you can try sourcing more Emotet samples and adding them, or, if it is hard to source more Emotet samples, you can balance out the families by downsampling ransomware to be closer to Emotet, so that the model is not as biased towards ransomware. And then, of course, you can use the other techniques I discussed on the previous slides, such as reweighing and optimized preprocessing. Finally, once you have mitigated that bias, you want to re-measure the bias in those features, using the same bias metrics, to ensure that your mitigation step was successful. This means, for example, re-running the statistical parity difference and disparate impact metrics on the new model, using the output of that model's confusion matrix, to determine whether those bias metrics now lie within the fair range. You also want to remember to look at your false positive rate and true positive rate to ensure they are still in an acceptable range, because there can be a trade-off between mitigating bias and the false positive and true positive rates.

So, some key takeaways from everything we have discussed in this presentation: one, understand the biases that you bring to model building and the biases that exist in your data. Two, you can use open-source tools such as AI Fairness 360 to measure which attributes your model exhibits bias towards, and you can also hand-calculate all of the metrics we mentioned. And finally, you should go back and re-evaluate the AI models within your organization and mitigate bias in any data or models where you find it. With that, we would like to thank you for listening to our presentation, and Sherin and I can take any questions you have.
