About the talk
Contextualized word embeddings are state of the art in NLP, learning from large corpora of everyday language usage in order to more quickly understand how to perform new tasks. Unsurprisingly, along with general understanding of language, these embeddings pick up the biases common in society. Using the embeddings in a number of important tasks (semantic search, sentiment recognition, predictive modelling, etc.) is thus problematic, as the risk of perpetuating bias is high.
Lexalytics has been working with a team of students at UMass Amherst to produce debiased corpora, using a variety of techniques to reduce gender-specific associations within the corpus. We will discuss our efforts to reduce gender bias with contextualized word embeddings, effects on accuracy in downstream NLP tasks, challenges in working with names, and future work and applicability to other forms of latent bias.
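As a rough illustration of what reducing gender-specific associations in a corpus can mean at the text level, here is a minimal sketch that swaps a few gendered words for neutral ones. The word list and the `neutralize` function are hypothetical, chosen only for this example; this is not the technique the Lexalytics and UMass Amherst team used, which the abstract does not specify.

```python
import re

# Hypothetical mapping, for illustration only. A real project would need a
# much larger, human-reviewed lexicon. "her" is left out on purpose because
# it is ambiguous between "them" and "their".
NEUTRAL_TERMS = {
    "he": "they",
    "she": "they",
    "him": "them",
    "his": "their",
    "himself": "themself",
    "herself": "themself",
    "fireman": "firefighter",
    "chairman": "chairperson",
    "salesman": "salesperson",
}

# Match any listed term as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, NEUTRAL_TERMS)) + r")\b",
    re.IGNORECASE,
)


def neutralize(text: str) -> str:
    """Replace listed gendered words with neutral ones, keeping capitalization.

    Verb agreement is NOT repaired ("he loves" becomes "they loves"), and
    personal names are not handled at all.
    """
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = NEUTRAL_TERMS[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return _PATTERN.sub(swap, text)


print(neutralize("He thanked the fireman."))
```

A word-level swap like this leaves verb agreement and personal names untouched, which is one reason names are called out above as a separate challenge.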
Paul Barba is the Chief Scientist of Lexalytics, where he is focused on applying force-multiplying technologies to solve artificial intelligence-related challenges and drive innovation in AI even further. Paul has years of experience developing, architecting, researching and generally thinking about AI/machine learning, text analytics and natural language processing (NLP) software. He has been working on growing system understanding while reducing human intervention and bringing text analytics to web scale. Paul has expertise in diverse areas of NLP and machine learning, from sentiment analysis and machine summarization to genetic programming and bootstrapping algorithms. Paul is continuing to bring cutting-edge research to solve everyday business problems, while working on new “big ideas” to push the whole field forward. Paul earned a degree in Computer Science and Mathematics from UMass Amherst.
Hello, everyone. We have Paul Barba, Chief Scientist, standing by to begin his presentation. Please take the stage call at 1. In the morning. Whatever it is, wherever you are. Things are going with my talk. I'll be discussing debiasing contextualized word embeddings, Randall P. You buy think we change? Language model is a place. You can reach out to me. Further. I'll stay in the chat for a few questions, but I'm here. Before I begin, thank you to the UMass Amherst Center for Data Science. Thank you to the Northwestern University Business School. How do you say we
work with industry and are currently engaging with the University's. I recommend a great chance, you know, I'm going to protect your day, came to the industry mentorship program will provide a problem of interest to us and industry, research papers, open source code. I'm going to guess you're with this technology is pretty,. But just in case, a very popular cutting-edge approach. The core idea is that you predict which word is missing in a sentence, "the boy hit the blank over the fence," at the Network's predict up, you know, you say no, no, no
baseball, language, grammar. So by taking gigantic corpora, you know, lots of time, training networks. And then if you want, and I'll be problems. If you want to call you hear about in the news to us to be as cheap. As if you're not familiar with Lexalytics, a natural language processing center. We've been doing this for ever. Want a secret languages, all different industries, based business has problems involving text from voice of the customer, voice of the employee, pharmaceutical, finance.
We need to be able to rapidly iterate, rapidly solve problems. Are generating training data and through the pain. In my system, be better with less system. I would like to use them everywhere, but unfortunately I can't. And that because of the slide for slide like this, how to describe what you're seeing. Quickly. The circles are, quote unquote, Black names. The buses are white names, left to right. Your thing is check-in to positive or negative word with the
sentiment, quite dramatic. I can't possibly ship a model that uses all the customers of ours. Anything that involves people away is currently aren't appropriate. They showed us by gender or sexual orientation with religious views. These are all terms that they're bringing existence of the pious and this is a real limitation on App Store. Better move the behavior. We have to start problems. Are there some overtly awful,? They're trying to model. Is this true? Is this good? It's just, what would somebody Second. There is what I would call to sacrifice their
biases that exist in the world that maybe we don't have a future. And the best example of this is also, the chairman of the board is less likely to be female, nurses more likely to be female. This is a true statistical distribution in the world, and so the models learned that. And therefore, they look like that idea. And if I see the word nurse, more likely to be female. Again, this is something that's the way the world works today, but you wouldn't want to see the resume. Do you have a throat?
Finally, there are just these interesting subtle subconscious biases, and I'm not a sociologist. How much of it is believed were such a pattern of these bracelets are from a project me to do the customer analyzing their employee reviews. The medical executive excellent, but we were able to quickly, find a generic positive word and females are being complimented for stereotypical female traits. I don't think they were a sexist organization, but these total language show up and then the model
together. So, what can we do about it? A high-level overview of, where this is an active research paper review to see what's going on. Weather here. I want to talk about. Over here. We have your data, free transport business for giant corpus that may be problematic model that will not be biased. This can be identifying problematic document each other about how the shows up better language in. The future is always a bystander or analyze model. It was a fair amount of work on but
can we just identify the dimension where this is happening in the rotating or just mathematically remove bias? That doesn't seem to actually practice, it is an area and then we were focused on improving but you can do all the same things and try to run over the right. You should always be looking. Are we actually reducing bias? You didn't buy anything you want with the Bible, affect your monitoring pulse, the chassis, want to solve the other day and just generally diagnostic approaches to your models are still performing well.
Oh, yeah, that was part of the overall. Look at the various that people are trying to do a stickler focusing on. Area of research. Why is it so hard when I saw this? First of all, that was a really good last year, called looks simple. Mathematical approach has a high price is not immediately visible. No problem. There's a whole bunch of different language that we just saw earlier around. Femininity vs. Zombies, 2, the races, and so simple approaches that don't actually addresses for the second
order. O from the word choice can help this person. Dana has a very large. We can't manually. Add more languages. If you did fix it, right? We don't want to lock in a break yours before all fixed up 2020, because 2030 will be a different language. So I don't think it. And then languages subtle subtle. There are certain types of it so you can remove but it's not the battery light up. I was just inherently for the hard problem. Doing nothing for now. If you're still there a relationship, but 97%
free training. Do tell you whether it was a man or female. What we have for the bathroom that I can see what they were doing with your trying to remove from the training corpus and retraining ELMo, BERT been removed with a standard knowledge. Mr. With a generic products, and rewriting the documents and they're still 40% of the time job. That was gone. I think I'm a biological and check my mail. Google, sing, a lot of the standard and the fact that, you know, Mary and Bob bought a condo, he loved it. You know, that he knew you were first Bob because the genders of the people want you, none
of my car now. Boeing 727. Also still shows that there is value in the approaches, bias can be removed without removing general knowledge. The student work this is after a single iteration through the whole training set model office is looking at job titles in male or female context and looking at how they differ identify gender. and after all their changes class together and you're really excited, very promising, look like. Interestingly by 8:09, but
the fact that is probably the best thing for the thing to happen along a line. A couple kind of miscellaneous things about the project. One was that firefighter is more female than fireman, interestingly. That as a society we can still fall back on male: salesman, chairman of the board, and talk about a female. That's me. Remember to chairperson. Also, Mike Will made. It was a lot of trouble finding by finding a mission, but it turned out the networks. Are you? Rewritten, any name, they got its Social Security data and it turns out locked,, just bailing overall, but it
finally, the Social Security baby name list and some really amazing resource that goes back hundreds of years and has a frequency of name, gender of name. Rachel are a lot harder to buy. Recommendations include vision. And the first one is the most important thing to take away from learning and AI. I think, especially when we're starting I'm trying to work on that. That's how we going to do on how to sample example that you just need to be. So I always get back that idea of, you know, your thoughts will be professional. That if you're
interested in the topic, I can point you to the research papers, but I think they will,, but I think that's another thing for need work. You should always look for bias in a lot of your ways or trying to track it. You go all the way up to the final outcome, measure correlating, Alexander, anything, that you try to pick something that you tried. The Network's against are going to but sometimes these things just pop up. We're looking for like a good model for the buy. He's already there. You can. Also when you're training a final models of 5 is equal to
hide. How do you remove a bystander days off of work backpack receipt? I don't think it's ready to make progress toward moving. Thank you for listening and be happy to answer them and that I can check out the chat box.
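The talk mentions looking at job titles in male or female context and at how they differ. A minimal sketch of that kind of check follows, assuming you have some function that scores a sentence (for example, a sentiment model). The templates, `gender_gap`, and the deliberately biased `toy_score` stand-in are all hypothetical and are not taken from the project itself.

```python
from statistics import mean
from typing import Callable, Dict, Iterable

# Hypothetical sentence templates; a real evaluation would use many more.
TEMPLATES = [
    "{pronoun} works as a {title}.",
    "{pronoun} is a good {title}.",
]


def gender_gap(
    score: Callable[[str], float], titles: Iterable[str]
) -> Dict[str, float]:
    """For each job title, return the mean score in male contexts minus the
    mean score in female contexts. Zero means the scorer treats both alike."""
    gaps = {}
    for title in titles:
        male = mean(score(t.format(pronoun="He", title=title)) for t in TEMPLATES)
        female = mean(score(t.format(pronoun="She", title=title)) for t in TEMPLATES)
        gaps[title] = male - female
    return gaps


def toy_score(sentence: str) -> float:
    """Stand-in for a real model, deliberately biased: it penalizes any
    sentence that mentions both "he" and "nurse"."""
    words = sentence.lower().rstrip(".").split()
    if "he" in words and "nurse" in words:
        return -0.5
    return 0.0


gaps = gender_gap(toy_score, ["nurse", "engineer"])
for title, gap in gaps.items():
    print(f"{title}: {gap:+.2f}")
```

Swapping `toy_score` for a real model's scoring function would let the same loop be run before and after any debiasing step, so the gaps can be compared.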