Your AI Injection

Detecting and Mitigating Societal Bias in AI

Deep Season 1 Episode 3

AI models are now being used to make life-altering decisions in areas such as healthcare, hiring, lending, university admissions, and criminal justice. The stakes for AI models making decisions fairly could never be higher, and yet there are many high-profile examples where AI has been shown to produce unfair, inequitable, and exclusive decisions.

In this episode of our podcast, data scientists at Xyonix talk about how to prevent societal biases from creeping into machine learning models, and why it's so important to be aware of these biases.

Read the article on our website:

Automated transcription

DEEP: Welcome back to the newest episode of the Your AI Injection podcast. My name is Deep Dhillon. I'm your host and a data scientist at Xyonix. This week's topic is an important one. We have our regular data scientists, Carsten and Bill, back to discuss mitigating societal bias in AI.


DEEP: So Bill, you want to quickly introduce yourself, and then Carsten, and then we'll dig in?


BILL: Sure. I'm Bill Constantine. I'm a data scientist. I guess I've been doing this line of work for about 20 years now or so, and yeah, this is a really cool topic, so I'm excited to talk about it.


CARSTEN: Awesome. I'm Carsten Tucks, I'm a co-founder at Xyonix. I'm a computer scientist who's been working in machine learning and AI for the last 20 years, and we're talking a little bit about societal bias in AI today.


DEEP: Cool. So let's kick it off. As many of us know, the stakes for AI models making decisions fairly could never be higher, and yet there are many high-profile examples where AI has been shown to give us unfair, inequitable, and sometimes exclusive decisions. So Bill, can you start us off by telling us what we mean when we say that AI can be hit by societal bias, and maybe start out by giving us a story of one of these AI-gone-bad examples?


BILL: Yeah, there are quite a few out there, so I'm thinking about the ones that are truly impactful. Societal bias is things like racism, sexism, gender bias, and ageism...


DEEP: We're hearing a lot about those lately.

BILL: Creeping into your models in a way that basically harms people, people whose lives could be affected by those decisions, not just making them feel bad or divided. And, you know, there are lots of areas where this has reared its ugly head. One of the things that comes to mind is a program from a while ago called COMPAS; it stands for Correctional Offender Management Profiling for Alternative Sanctions. Basically, the idea was: could we develop some sort of AI model to predict whether somebody would be a recidivist, meaning, having committed some sort of crime, what is the likelihood that they're going to commit a crime in the future, or be a violent offender, and so forth.


DEEP: Kind of a Minority Report sort of thing?


BILL: Yeah, exactly, a little Tom Cruise action. So when I first read this article, it was really striking. There was an African-American gal who had stolen a bike, and she ended up, you know, getting caught, and the police booked her in. And in that same county, along the same lines, there was a white guy who got caught for breaking into, I think, businesses and stealing stuff. For the African-American gal it was kind of a one-time thing, and the white guy was a career criminal. And what they both faced was basically being ranked for recidivist activity: were they likely to commit a crime in the future? The African-American gal got ranked really highly as somebody who would commit a crime, and I think it was even a violent crime, whereas the white guy was ranked quite low. So fast forward a couple of years: how did it turn out? Well, it turns out that the white guy ended up in prison for continuing to be, you know, a thief, and the African-American gal essentially never did it again. Had this been a one-off sort of thing, it would have been maybe just a sad exception, but it turns out there were quite a few of these cases going on, and they found that the bias against Black people was very high and profoundly wrong. That's the main issue: profoundly wrong in terms of its predictions. Now, keep in mind that it isn't just the police force using this, and they don't really have knowledge of what's behind the algorithms; it's also the judges. So people would come up in front of a judge, and the system would say, here is, you know, John Smith, and blah blah blah, and the judge would see basically, what's the risk of this guy being a violent criminal? And if it said, yeah, there's, you know, an 85% chance, it actually affected the judge's decision on how to punish them. Sometimes people served longer sentences; sometimes, in cases where they could have gotten help or something, they didn't get it. So imagine the impact on these poor folks. Yeah, this is definitely a situation where AI has gone bad, and bias that was somehow caught up in that data ended up personally affecting people.


DEEP: I'm going to rattle off a few other ones. So there's, you know, some rampant racism in some software used by US hospitals to allocate health care for, you know, 200 million plus patients every year. There was significant bias against women, some AI gone bad, in Amazon's kind of early attempt to build a resume filtering tool. There was the very infamous Twitter bot developed by Microsoft that went on misogynistic, racist, and Nazi-supporting rants; that was, you know, a problem there. There's been a lot of gender bias detected in some standard natural language processing tools. And everyone's heard about face detection algorithms with significantly higher misidentification rates for parts of the population that had less training representation for particular races and genders. So I guess the big question here is, not everyone's a machine learning or AI person; maybe those folks can kind of guess what happened in all these cases, but I don't know. Carsten, do you want to maybe talk us through the Twitter bot, or any of these cases, or even just kind of in general: what's going on that causes us to look at the output of these things and see this sort of very obviously societally biased scenario making its way into the models?


CARSTEN: Well, on the one hand I think it's just people making bad decisions on what data to pick. For example, if you look at the hospital racial bias, it was about people who self-identified as Black being generally assigned lower risk scores than equally sick white people. It was about handing out memberships to special programs for people that need them, right? And because the equally sick white people were deemed a greater risk, they got the memberships, and that kind of discriminated against the Black folks. And the problem was that they picked the total health care cost accrued in one year as a basis to figure out how sick the patients were. In other words, if you spent a lot on health care during that year, you must have been sicker than the person that didn't. This is actually not racial discrimination, really. It's discrimination against poor people, and of course, you know, that brings race into play, but it was a bad choice. It was just a bad choice of a metric to make that determination, and that's what was reflected in the model later on.


BILL: And access to health care was not part of that, right?


CARSTEN: No, just money spent, the total health care cost accrued in a year.


BILL: Bad, bad variable.


CARSTEN: Yeah, exactly. Somebody didn't think about why you would include that in this choice. It really has nothing to do with it; it was a bad metric, a bad representation of your state of health. But it was one of the data points they had, and sometimes people just don't really think about the data they have. They just take all the data they have, throw it into a model, and pray the model does something meaningful and predicts something. But machine learning models have a very strong garbage-in, garbage-out principle, and you've got to be really careful there.


DEEP: Yeah. I mean, a lot of us have heard the stories, and most of us have even been this person: you're a grad student, you're completely focused on your particular problem, let's say it's, you know, people detection or face detection or something, and you try to get your corpus, your collection of data, to train and evaluate on, and maybe it doesn't exist and then you have to create it. Most data scientists tend to be... like, this isn't the most important thing to them. They just want their data and they want to move on. So they don't always think, you know, very heavily and deeply about the data sets. And so then just the sort of happenstance bias of how they put together their data set can of course ripple into the algorithms. I mean, one thing that I've noticed a lot is that machine learning models get developed on kind of one data set, but then they get applied maybe in different areas, or in areas where nobody ever really took the time and effort to ensure kind of fair representation in the training data used to train and evaluate the models. Bill, is that what you're seeing, you know, when you study some of these systems?


BILL: I think so. And that actually speaks to the problem with the Amazon hiring system. They basically were like, you know, everybody wanted to work for Amazon, so they had, you know, thousands of resumes. They wanted to be able to take those resumes, rank the top five of them automatically, and then hire those people. But it turned out it was very, very biased against women. In fact, if there was an entry where you had something like, well, I belonged to a women's rowing team or something like that, they were actually downgraded. You have a situation where the population is very, very disproportionate. Most of the people that applied to Amazon as engineers and so forth were mainly male, and they didn't have much representation for females. And so they tried to offset that by taking out gender completely, but that didn't seem to fix it. And so finally, I think they just gave up on it after it got some negative coverage, and Amazon always claimed, actually, that human resources only used it as a guide and not a sole determiner of whether somebody got hired or not. But I don't think in that case that the population of women who applied was necessarily represented fairly.


DEEP: So, what's the right thing to do, or the right way to think about it? I mean, we all know, well, maybe not all of us, actually, some of our listeners might not realize this, but, you know, if you've got a machine learning algorithm and you're trying to classify or categorize something into, let's say, one of two buckets, and you've got, you know, like 90% of your training data from bucket one and ten percent from bucket two, you kind of create a scenario where the model might be, you know, inherently biased towards bucket one. That might be perfectly fine and okay, and it might not be. So I don't know, Carsten, do you have any thoughts on that? Because bias is sort of normal in machine learning. I'm not talking about societal bias necessarily, but just bias, period. So how do you think about balancing your classes and categories in scenarios like the ones we've brought up?


CARSTEN: Well, the problem is that you're not actually balancing your target classes here, right? It's not that you have men and women and you're trying to classify the resumes into male and female. It's that you have men and women represented in your data set, and they have differing attributes, but what you're trying to predict is whether or not they will be a good hire. And now your minority class is actually not one of your target classes, right? But the distribution in your training set is such that a certain class is underrepresented there, and if you get even unluckier, that underrepresented class has bad attributes. So let's say in that data set, it's Amazon from the last 10 years, a highly male-dominated tech industry, maybe the 20 women they had were actually pretty bad. I'm not saying they were, I'm just making that up. So what does the model learn? The model simply learns: woman makes a bad employee. Great, there's your bias, right? And there's literally nothing you can do about it if you have a bad data set to start with, or if you're working on something where the reality is that minority classes are underrepresented right in your training set. It's really hard to fix that. Sometimes you can consciously doctor the attributes so that everything that identifies the minority class is excluded from the data set, but that's not always possible, as we've seen in the Amazon data set, right? They removed gender, but there are still attributes in a resume that differ between men and women, so you can still say, hey, that was a woman.


DEEP: Could you, for example, balance by the target in this case? So you could say, okay, well, look, our goal is that we want to have, you know, fifty percent, whatever, 50/50 on the hiring front. So the data we're going to include is going to include 50% good female hires, 50% good male hires, 50% bad female hires, 50% bad male hires.


CARSTEN: You could try that if you had the data, but if you had that data, then you wouldn't have the problem in the first place, because most of the bias, if not all of the bias, is based on unbalanced training input data sets. You see that in the...


DEEP: In the data, right? Or you could oversample or undersample?


CARSTEN: Not if you don't have it. And if it's really highly unbalanced, if you have a hundred female hires compared to 10,000, all the oversampling in the world wouldn't help you. It's just not enough; you can't fix extreme imbalances. So it's a problem.
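For readers who want to see what Carsten and Deep are describing in code, here is a minimal sketch, not from the episode, of the two usual levers for an imbalanced training set: naive oversampling and class weighting. The toy dataset, the 100-versus-10,000 split, and the feature shapes are invented for illustration; the point is that neither trick adds real information about the minority class.

```python
# Illustrative sketch (invented toy data): two common ways to handle an
# imbalanced training set -- naive oversampling and class weighting.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(10_000, 5))   # 10,000 majority-class rows
X_minor = rng.normal(0.5, 1.0, size=(100, 5))      # only 100 minority-class rows
X = np.vstack([X_major, X_minor])
y = np.concatenate([np.zeros(10_000), np.ones(100)])

# Option 1: oversample the minority class. This only duplicates rows; it adds
# no new information, which is why it can't rescue an extreme imbalance.
minority_idx = np.where(y == 1)[0]
extra = rng.choice(minority_idx, size=9_900, replace=True)
X_over, y_over = np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

# Option 2: leave the data alone and reweight the loss instead.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print(clf.score(X, y))
```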


DEEP: Bill, what do you think? What are your thoughts?


BILL: Yeah, it's interesting. I liked what Carsten said, the fact that, you know, even if, say, you have a minority class that is also ill-represented, then you're sort of hosed. Right? It's not just that you have...


DEEP: Just for the listeners' sake, when we say minority class, we're not talking about racial or ethnic things; it's just machine learning lingo. Let's say, you know, we're talking about percentage male, percentage female: if one is 51% and one is 49%, the minority class is the one with 49%.


BILL: Yeah, that's right. We really just take a head count; we're really talking about counts of classes in your data. But, you know, IBM has an initiative out where they're trying to offset societal bias, and I hope it's okay that I'm jumping ahead in the conversation, but...


DEEP: I'll take us there, take us there. Let's go to the topic of the mitigation strategies that are out there.


BILL: So they have a suite of tools they offer. It's called IBM's AI Fairness 360. It's an open-source toolkit, available in the common languages that we use for dealing with machine learning. I think they've taken their approach quite seriously, which, kudos to them for doing so. They have sort of three attacks at trying to adjust for unfairness and AI bias. One is involved in the pre-processing stage, and that's where training data and labels may be transformed, modified, or weighted to somehow try to promote fairness during subsequent training. So Deep, you mentioned maybe we could try to form a more balanced training data set; if you have that available, like Carsten said, then maybe you could train your models on more balanced data, and they apparently have tools for that. In the in-processing stage, where we're actually training the models, they have tools for adapting for fairness, like, for example, adding a discrimination-aware regularization term to the loss function. What that means for regular folks is, when we train these models, we have this thing called the loss function, which accumulates error, and then, iterating over and over again, training tries to reduce that error. Well, if you know that you're discriminating against some particular class of folks, you might make that error super pronounced. Like in the case of our original example about Black people being unfairly predicted to commit crimes in the future: you would have a test holdout set that basically says, you know, that's actually not true, so every time you mess up there, make it a huge error.
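As a rough illustration of the "discrimination-aware regularization term" Bill mentions, here is a minimal sketch, assuming a simple binary classifier and a hypothetical `group` array marking the protected attribute. It is not the AIF360 implementation, just one common form of the idea: penalize the gap between the groups' average predicted positive rates on top of ordinary cross-entropy.

```python
# Illustrative sketch, not the AIF360 implementation: ordinary cross-entropy
# plus a "fairness gap" penalty between two groups.
import numpy as np

def fairness_aware_loss(y_true, y_prob, group, lam=1.0):
    eps = 1e-9
    # Standard binary cross-entropy.
    bce = -np.mean(y_true * np.log(y_prob + eps)
                   + (1 - y_true) * np.log(1 - y_prob + eps))
    # Regularizer: gap between the groups' average predicted positive rates
    # (a statistical-parity style penalty). `group` is a hypothetical 0/1 flag.
    gap = abs(y_prob[group == 1].mean() - y_prob[group == 0].mean())
    return bce + lam * gap

y_true = np.array([1, 0, 1, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.6, 0.3])
group = np.array([1, 1, 1, 0, 0, 0])
print(fairness_aware_loss(y_true, y_prob, group))
```

The larger `lam` is, the more the optimizer trades raw accuracy for parity between the groups, which is exactly the tension Carsten raises next.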


CARSTEN: I have a bit of a problem with that, but we can come back to it later: what if the model is actually right? You know, we're all against bias, against racial bias, et cetera, et cetera. But what if that discrimination is actually the correct choice, because it's true? We can talk about that in a bit. It's a little bit of a controversial topic.


BILL: It is a controversial topic, and I totally agree from a machine learning perspective. You don't want to elevate error unless it's incorrect, right? You say, this is a really bad error; it's actually not a reflection of what we see in the real world.


DEEP: Bill, going back to the IBM tool for a second, because I didn't quite understand that. Can you explain that to me again? I didn't quite follow. What exactly is the tool trying to do here and enable?


BILL: So the toolkit is a bunch of things. First of all, they developed a bunch of fairness metrics, and you can apply these metrics to detect whether your algorithm, or the data you're using, may have biases in it. So they have sort of basic things, like, for example, when you have a group that is so-called privileged or not privileged, maybe looking to get a loan from a bank or so forth: is the ratio of positive outcomes the same between, say, white folks and Black folks?
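A minimal sketch of the kind of metric Bill is describing: the ratio of positive-outcome rates between an unprivileged and a privileged group, often called disparate impact. The array names and numbers here are invented; AIF360 provides this and many related metrics out of the box.

```python
# Illustrative sketch with made-up arrays: ratio of positive-outcome rates
# between an unprivileged group (0) and a privileged group (1).
import numpy as np

def disparate_impact(y_pred, group):
    rate_unprivileged = y_pred[group == 0].mean()
    rate_privileged = y_pred[group == 1].mean()
    # ~1.0 means parity; the common "80% rule" flags values below 0.8.
    return rate_unprivileged / rate_privileged

y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = loan granted
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])    # 1 = privileged group
print(disparate_impact(y_pred, group))         # 0.25 / 0.75 = 0.33 in this toy case
```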


DEEP: You have to define those criteria, and somehow you do, right? You have to go out of your way and say, hey, we want 50... This is akin to what I was sort of suggesting earlier with the target variables, basically saying, okay, we want 50% of the hires to be women and 50% to be men, or, you know, whatever the gender distribution is across non-binary, et cetera. So okay, that makes sense to me.


BILL: Wait, there's one more thing, you know. There's also a post-processing stage, where the predictions themselves are altered in order to make them more fair. So, basically, throwing away what the AI model is predicting or suggesting and doing something that's more fair and equitable for the folks that are being discriminated against. So that's the general approach, and there are tools out there that data scientists can use to try to mitigate this.
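As a rough sketch of what a post-processing adjustment can look like, assuming hypothetical score and group arrays: instead of one global decision threshold, choose a per-group threshold so both groups end up with roughly the same positive rate. This is a simplified stand-in, not the actual AIF360 post-processing algorithms.

```python
# Illustrative sketch with hypothetical scores: pick a per-group threshold so
# each group passes at roughly the same target rate, instead of one global cutoff.
import numpy as np

def per_group_thresholds(scores, group, target_rate=0.5):
    thresholds = {}
    for g in np.unique(group):
        g_scores = scores[group == g]
        # Threshold at the (1 - target_rate) quantile so ~target_rate of the group passes.
        thresholds[g] = np.quantile(g_scores, 1 - target_rate)
    return thresholds

scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])  # model scores
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])                   # group membership
thr = per_group_thresholds(scores, group)
adjusted = np.array([s >= thr[g] for s, g in zip(scores, group)], dtype=int)
print(thr, adjusted)   # both groups end up with two positives each
```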


DEEP: I mean, one of the things that comes to mind here is, if you kind of jump up a level, with this tool and with what we've been talking about, there's kind of a common theme, which is sort of the theme of visibility. Like, is the potential societal bias visible to those working on the problem? If they're not even looking for it or paying attention to it, they're most likely not going to find it or notice it. And so it feels like a key part of this is just simply asking the question: what could happen here? Take the example, and I'm sure everybody's done this at some point or other: if you just go to Google image search, and certainly from the mid and late 90s to now it's improved, but if you just search for something like, you know, beautiful person, beautiful woman, beautiful man, you're going to get, like, racial bias, you know, coming back for sure. And so then the question is, if you're making that thing, you should probably sit down and think about that. What is the right thing to do? That's a separate question from, what is the thing that could actually happen here? So my question is, what can a team do when it's working on a problem where somebody has an intuition that there might be some bias? It could be, you know, as sort of benign as a project that I worked on a few years ago, where we were just trying to take video footage from this kind of social streaming app and automatically pick out the quote-unquote best picture, like the moment in the video that was kind of really essential. And so, as a part of that, you know, we ended up running a lot of Mechanical Turk experiments to sort of, you know, get data and figure out, given a video, what was kind of the most important moment, and part of the ingredients came down to what people wanted to look at. And so, you know, the original example was, like, kids hitting a ball on a tee, and you want to capture the moment right before the bat hits, or right after the bat hits, when the ball is moving. But in other cases, it was like, you know, you've got maybe a video scrolling through a few different people, and it's going to pick out what's the best shot. Is it the one where you've got the most distinct kind of facial expressions? Is it the one that's got the really good-looking person, or the really goofy pose? This all comes down to kind of human nature, human choices. In that kind of a context, I don't know, what would you suggest the team actually do and pay attention to?


BILL: Give up?


CARSTEN: Yeah, just plain give up, because the scenario that you're describing is one where bias is ultimately inherently defined in your problem. For example, in your video scene, let's say, well, I don't know what the best shot is, right? There is no absolute truth here. This is not the laws of physics, so we cannot really look up what the best shot is. It's a matter of opinion. And so the best thing you could do is take a million people and ask them: in this video, what is the best shot? And now 70 percent of them will tell you this is the best image. Now, if you say, okay, these 70 percent are right, I'm going to cater to them, we have basically discriminated against the remaining 30 percent that hated that shot. And so now you have created your own two groups, and you're discriminating against the minority group that is not following the majority opinion. So in that scenario, give up, you can't do it. You can only go by majority, and that's hopefully a good thing; hopefully you have adequately measured, you know, the population's taste.


DEEP: Well, I mean, that seems really simplistic to me. You could create cohorts of your population, and, you know, catering to the cohort that has the majority opinion could be one approach, but there might be a handful of others. Maybe there's a cohort that's really interested in, you know, the action shot, another one that's into the people shot. I mean, it feels like you could pursue a population-centric optimization strategy.


CARSTEN: And then how do you regulate that? Do you, 60 percent of the time, go with the 60 percent of the population to satisfy them, 20 percent with the action shot, five percent with the black image, one percent with the green floor? I mean, where do you get your split? Say you want to sample according to population preference; that's actually what the model would be doing automatically anyway if it distributed according to that source population. I feel like if you want to address the whole bias problem, you need to identify what possible bias exists beforehand and then train your models accordingly. For example, with the gender bias, maybe you have two different models. If your goal is really to have a 50 percent male and 50 percent female hiring rate, then split the candidates by gender in the first place. Have two models: one picks the best male candidate, one model picks the best female candidate, and just go down a ranked list on each side until you have a 50 percent hiring rate. Split it up beforehand and don't let the gender bias creep into your model, because all you're going to do is ruin what the model is trying to do best, which is predict the actual outcome. The adjustment for societal bias is something that is sometimes contradictory to your target variable, and in my opinion you shouldn't do it. If you want to be fair, then be fair, but rank each group separately; don't mix them together.
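A minimal sketch of the split-then-rank idea Carsten describes, assuming two hypothetical ranked candidate lists produced by separate per-group models: alternate picks from each list until the slate hits the desired 50/50 ratio.

```python
# Minimal sketch of the split-then-rank idea: each group gets its own ranked
# list (from its own model), then we alternate picks to hit a 50/50 slate.
# Candidate names and rankings are invented for illustration.
def balanced_shortlist(ranked_a, ranked_b, n_hires):
    shortlist = []
    for i in range(n_hires // 2):
        shortlist.append(ranked_a[i])   # best remaining candidate from group A
        shortlist.append(ranked_b[i])   # best remaining candidate from group B
    return shortlist

women = ["W1", "W2", "W3", "W4"]   # best-first, scored by a women-only model
men = ["M1", "M2", "M3", "M4"]     # best-first, scored by a men-only model
print(balanced_shortlist(women, men, n_hires=4))   # ['W1', 'M1', 'W2', 'M2']
```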


BILL: That's an interesting approach. Yeah, I think it's a good approach. I think it's probably worth saying, and Carsten, you alluded to this earlier when we were talking about the Amazon algorithm gone awry: you can do some due diligence in your data collection, Deep, especially when you have something that's subjective, like, say, you're building a model to say who's pretty, who's handsome, and who's not. Maybe for the people that are actually labeling that, you do your due diligence to make sure that they're from lots of different locations, ethnicities, languages, ages, and genders, so that the population that you serve, which is all of those folks, you know, aren't being singled out as ugly or beautiful. And, you know, there's the human element of this that we really have to take into consideration. If you're looking at what's the best video shot, blah blah blah, well, you know, who really cares? But if you're talking about somebody who's good-looking, you know, like a teenager, you could very much affect them and their self-esteem. So it's very, very important to at least do the best you can to understand where this data is coming from. Is it labeled unfairly? Okay, maybe that's my source; I can help mitigate that by making it more fair. And then, you know, if it's coming up with something that is, regardless of whether it's accurate or not, discriminating against a population, do something about it, right? I mean, you don't have to necessarily rely solely on the model predictions.


DEEP: Yeah. I mean, it feels to me like a lot of what I see is that, you know, you've got a company that is making a business decision to address a particular market, and they attract a particular population of users that has particular kinds of attributes on, let's say, day 1 to 90 or something. They release a model, and then all of a sudden it kind of shifts. So, to take the beauty example, I don't know, maybe it's like a Tinder-like app or something. From a business context, let's say you're trying to assess something as obviously problematic as this: you could start off nation by nation. You could start off in Botswana at first, and now you've got literally an incredibly tiny minority of, you know, non-native Batswana folks in the imagery, so you don't have white folks represented, or Asian folks represented, or whatever. And you could satisfy that market, and then you can kind of walk out to a completely different nation, and then you have multicultural societies like the US, where you might sort of say, I'm going to take the natural, let's say, ethnic population distribution and then, for each of those groups, try to ensure representation in the ground truth, in the training data. I mean, it seems like we've honed in on the fact that this is hard, and that you have to pay attention to even know that something's a potential problem. So that alone feels like a win to me, because you're forced to ask the hard questions. You're forced to rearrange and maybe do something like Carsten is suggesting, where you build a female-only qualification assessor and then a male-only qualification assessor. So it feels like just getting these conversations into teams early on, and having that kind of awareness, is part of the solution. You mentioned provenance, just kind of knowing where data is coming from. That seems important, because a lot of times someone just hits import, you know, or maybe they wrote a script, they import, and then boom, there are 15 million whatevers, images, you know, documents, et cetera, suddenly in their corpus or collection. If they're not thinking about where those 15 million came from, maybe they came from some constraint like Carsten was pointing out earlier in our conversation, maybe they're coming from a corpus that's got an inherent bias. It seems like maintaining that provenance, knowing what came in and knowing what the biases of the source are, simply tracking that stuff, feels valuable to me.


BILL: I agree. And I think the real danger here is that these tools are not just being used by data scientists; they're being used by people who have a blind trust that they're telling them the truth. I saw a news piece a couple of weeks ago where a Black man walking down the street was identified, and, you know, they somehow had some sort of facial recognition tied back to a database. And the very interesting thing is, you know, we have this word from our computers back at the base that says you are this person, and he's saying, why? I'm not that person.


DEEP: Check my ID,


BILL: Where's your license? I don't have it, I'm just walking to the store. Okay, well, I'm sorry, sir, we're going to detain you. And he's like, you got the wrong guy, you got the wrong guy. Well, the answer was, well, this is telling me that's not the case, you know?


DEEP: Yeah.


BILL: They have no clue of how accurate that model is, no clue of the provenance of that data, you know, the accuracy of that data, whether it's biased or not, and so forth. That's the real danger. In fact, Seattle, as part of King County, banned facial recognition, and I think San Francisco had done that a couple of years ago, for this very problem. So yeah, it's a very tough problem. And I have one more comment, and it has to do with...


DEEP: just, I want, I want, I want to like, before you move on, I want to, I want to address that point. I think, you know, like they did dig into that case I believe and they found that, you know, African-American male imagery was significantly. Are represented in the, you know, in the data set for generating, you know, facial identification relative to its population to the population. And so the, you know, I feel like a key part of this is when we're talking about stuff that has that significant of societal impact somewhere in this conversation should be transparency around the collections and the documents and data that's used to generate these models. So that people can likes sort of point out some of the These obvious biases that are going to result


CARSTEN: And I think that's one of the big problems. You know, after all, we call it AI, but really most of modern AI is pattern recognition, right? It's statistical learning, pattern recognition. There's very little reasoning or anything like that in it, and so it really only learns what it sees, and our society is inherently biased, right? And so if I have a model and I show it a population and try to figure out criminals, and I have a population of, let's say, 50 percent red and 50 percent green, but in the red party only 20 percent are criminals and in the green party 50 percent are criminals, well, then automatically green is associated with being a criminal, just from a statistical perspective, right? Much higher likelihood. And as long as we train our models on real but biased data, that is not going to go away. And so, like I said earlier, I think the best thing we can do is be aware of bias, be aware that bias comes from data, identify it in the data, try to balance it if we can, and in a post-mortem step, look at the output of the model and see if it shows any signs of bias, and if it does, well, go back to the input and see if you can fix it. What I do not believe in is this in-the-middle approach of trying to train models that are not biased, because by doing that you introduce something into the model that is inherently not true and possibly not correct. And so you're falsifying output, and I do not recommend that. It's very tricky.
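A quick worked version of Carsten's red/green numbers, using his hypothetical figures: with equal group sizes and offense rates of 20 percent versus 50 percent, a pure pattern recognizer will associate "green" with criminality simply because the base rates say so.

```python
# A worked version of Carsten's red/green thought experiment (his hypothetical
# numbers): equal group sizes, 20% offense rate for "red", 50% for "green".
p_red, p_green = 0.5, 0.5
p_crime_given_red, p_crime_given_green = 0.20, 0.50

p_crime = p_red * p_crime_given_red + p_green * p_crime_given_green   # 0.35
p_green_given_crime = p_green * p_crime_given_green / p_crime          # ~0.71
# A pure pattern recognizer sees "green" in ~71% of offenders, so "green"
# becomes statistically associated with criminality.
print(round(p_crime, 2), round(p_green_given_crime, 2))
```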


DEEP: Alright, hey yeah.


BILL: All right, are we wrapping up?


DEEP: I was going to wrap up, but if you've got any final words of wisdom to impart on us...


BILL: Well, I guess as a final thought, I think the fact that we're actually having this conversation is progress. I mean, we've been around this business for a long time, and I don't think this is a conversation we would have had, say, 10 or 15 years ago.


DEEP: Yeah, I mean, it used to be all about efficacy, like, can we achieve a certain efficacy rate, and everyone was just so excited when anything made it into production. But now this stuff is everywhere, and so I feel like, as AI practitioners, we have to really push the conversation and make sure that, you know, we're being ethical and doing the right thing.


BILL: Exactly.


DEEP: Okay, with that, right before we go, I want to briefly shout out a couple of organizations that have done some meaningful work in making AI more equitable: the Stanford Computational Policy Lab, which has done a number of projects targeted at reducing racial bias in machine learning, and SHERPA, an organization that investigates how AI can impact human rights and ethics. Bill and Carsten, as always, thanks for a fantastic discussion. For any of you who are interested in finding out more about this topic of bias, we do have an article on xyonix.com/articles, that's x-y-o-n-i-x.com. Thanks, and we will see you next time.


BILL: Bye.

