This week's episode discusses how we can understand human conversations through human-machine synergistic interactions. We are joined by Carsten Tusk and Bill Constantine to dive deep into this topic. Training machines to understand languages can be a challenge for most data scientists. This episode starts with the basics of how to start training these machines to understand languages. We take a look at different scenarios where this can be used in everyday business practices.
Deep: Hi there. I'm Deep Dhillon. Welcome to Your AI Injection, the podcast where we discuss state-of-the-art techniques in artificial intelligence with a focus on how these capabilities are used to transform organizations, making them more efficient, impactful, and successful. Welcome back to Your AI Injection. I'm Deep Dhillon, the host and a data scientist here at Xyonix. So this week we're gonna talk about understanding conversations through human-machine synergistic interactions. I'm joined by our regulars here, Bill Constantine and Carsten Tusk, both data scientists here. How are you guys doing today?
Bill: Great. It's a beautiful, beautiful day outside. Actually. How you doing Carsten?
Carsten: I'm doing well. I'm doing well. It's a little cloudy up here. I'm, uh, I'm currently in Haines, Alaska, enjoying the scenery, um, and thankfully escaping the heat waves.
Deep: So I'm gonna go ahead and start us off here with a few examples. So you've got a communication system that might help some kind of famous politician or a star or somebody communicate with thousands or millions of constituents or fans. They get a large inbox and they have to try to make sense of it. In another case you might have a product or a service that gets free-form reviews. Um, you might have an email system that conducts conversations with people, maybe after they fill out a form, stop by a booth, or express some other kind of interest in a business. Another one: let's say you've got an insurance company recording millions of audio transcripts of conversations regarding a claim. One last one here is maybe we've got a trend prediction system. With that, I'm gonna just kick it off here. What makes training a machine to understand communication sort of different from other machine learning problems?
Bill: One of the things that I've experienced is the level of noise in this type of data. The way that people communicate via text is completely different than the way they communicate when they're writing an essay for their teacher at school, which is maybe completely different than how they might communicate on Facebook and so forth. So the flavors in which people communicate present a challenge in terms of trying to understand what those folks are saying when you have all these different ways of presenting that information. So I think that can be quite a challenge.
Deep: Sometimes emojis, for example. You can have a whole conversation just through emojis, and it's clearly different than, you know, a professional publication or something.
Bill: Yeah. One other big one is the difference between oral and written types. Like, our conversation is being recorded right now, and we have such great brains in our heads that we can fill in the context of things. Written form is another area where we have to be a lot stricter, typically.
Deep: Carsten, maybe do you wanna give us a description of how we even take text and represent it so that we can even begin to, like, formulate, you know, machine learning problems with it?
Carsten: We have to take our text and we have to transform it into some sort of numbers. And typically we start with, like, a term-document matrix. Let's take tweets. So you have a hundred tweets and you kinda wanna turn that into a thing that your computer can understand. So the first thing you wanna do is you wanna actually see what kind of words are in my corpus, and then you encode each document, basically, according to this vocabulary that you just built. And at that point, then you have something that you can start feeding to machine learning. It all boils down to having certain methods and tricks for how to take your language and transform it into, basically, vectors. Very often when we talk about understanding, we give AI too much credit, because at the end of the day, it's just pattern recognition. Even the sophisticated models that we see, like GPT-3, et cetera, they can't reason. They don't reason at all. They just have seen, uh, a bazillion patterns. And so after you consume all the, uh, correspondence on the web, of course, yeah, you see repeating patterns and you see things that might make sense, but they only make sense because they have been seen so often.
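The vocabulary-then-encode step Carsten describes can be sketched in a few lines of plain Python. This is a minimal illustration, not the pipeline used at Xyonix: the function names `build_vocabulary` and `term_document_matrix`, the whitespace tokenizer, and the three sample tweets are all assumptions made for the example.

```python
from collections import Counter

def build_vocabulary(docs):
    """Collect every distinct lowercase token, sorted for a stable column order."""
    return sorted({token for doc in docs for token in doc.lower().split()})

def term_document_matrix(docs, vocab):
    """One row per document, one column per vocabulary term; cells are raw counts."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([counts[term] for term in vocab])
    return rows

# Hypothetical tweets, invented for illustration only.
tweets = ["new bike day", "love my new pink bike", "rain all day"]
vocab = build_vocabulary(tweets)
matrix = term_document_matrix(tweets, vocab)

for term_row in matrix:
    print(term_row)
```

Each row is the vector for one tweet. Real systems usually add better tokenization and some weighting of the counts, but the shape of the data is the same.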
Deep: All that's true, but I feel like part of what's happening with AI advancing is we're redefining what intelligence is. 20 years ago, if you asked me, hey, is something really intelligent if it can translate all languages into all other languages pretty well, I would say, oh yeah, because I only know a couple of people that can do that with a large number of languages, and they're definitely intelligent. We didn't have an idea that pattern recognition could just create the illusion of intelligence at such a massive scale. So we've since redefined it and said, well, hey, maybe language translation just doesn't really require intelligence. Bill, maybe, like, what does it mean to extract information at the document level? And then maybe morph us towards what it means to extract information at the corpus level, or kind of at a higher level.
Bill: Sort of the starting point that I've used in the past is to take a document and, like you said, you can take individual sentences of those documents and split them up into words. And you can even split them up into entities that have different meanings. Like, you can do this thing called, let's find all the noun phrases within a sentence. So a noun phrase, you can think of it as, like, a noun in a sentence, but on steroids a bit. It's officially a small group of words which contain that noun, but they have modifying words around it. So if you have a sentence like "the new pink bike is mine", then the noun phrase there would be "new pink bike". You can gather the collection of these noun phrases, and you can do simple things like count how many times that noun phrase appears throughout that document. So if, you know, a lot of that conversation revolves around the fact that they're talking about this new pink bike, you sort of already start to get, uh, a sense of the importance of that particular topic to that document: that, you know, we're talking about something related to bicycles, this one particular bike that's important, and so forth. So you can kind of build almost like these histograms of the counts of these noun phrases and bubble those up to the top to give some sort of basic idea of what seems to be the predominant conversation, essentially. Yeah. Carsten, do you wanna go from there? Yeah, yeah.
Carsten: Kind of. So you have your pink bike, and you kind of know on an abstract level that a pink bike is a vehicle, right? Some sort of a vehicle, and it could even be a sub-sort of a vehicle. It's a manual vehicle. It's not a powered vehicle. And so you can take your noun phrases and you can assign them to entities and kind of level them up semantically onto a more abstract level than these concrete manifestations of those entities. And that can help you make sense of what the documents are about in a greater sense, or even group common documents together. If you have groups that talk about vehicles, they're not necessarily all pink bikes, there could be other vehicles, but they still have that in common. And so it's all about taking the highly detailed information and kind of leveling it up a little bit on a semantic level to get more abstract categories that you can then better group together, and get a general sense of what your corpus is about one level up.
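The counting that Bill describes and the semantic "leveling up" that Carsten describes can be sketched together. The sketch assumes the noun phrases have already been extracted by a parser (not shown), and it uses a tiny hand-built `TAXONOMY` dictionary that is purely hypothetical. The function name `roll_up` is also an assumption for illustration.

```python
from collections import Counter

# Hypothetical hand-built taxonomy: noun phrase -> chain of broader categories.
TAXONOMY = {
    "new pink bike": ["bike", "manual vehicle", "vehicle"],
    "old truck": ["truck", "powered vehicle", "vehicle"],
    "tennis ball": ["ball", "sports equipment"],
}

def roll_up(noun_phrases, taxonomy):
    """Count concrete noun phrases, then credit each count to every broader category."""
    phrase_counts = Counter(noun_phrases)
    category_counts = Counter()
    for phrase, n in phrase_counts.items():
        for category in taxonomy.get(phrase, []):
            category_counts[category] += n
    return phrase_counts, category_counts

# Noun phrases assumed to come from an upstream parser; invented for illustration.
phrases = ["new pink bike", "new pink bike", "old truck", "tennis ball"]
phrase_counts, category_counts = roll_up(phrases, TAXONOMY)

print(phrase_counts.most_common(1))
print(category_counts["vehicle"])
```

The phrase histogram surfaces the dominant concrete topic, while the category counts show that bikes and trucks share the more abstract "vehicle" theme.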
Deep: Yeah. So I think there are sort of different kinds of things we can do at the document level. Like, you know, there's this kind of entity extraction, or this people, places, things extraction. Uh, a lot of times that involves having to resolve anaphora. So that's things like he, she, it, the lawyer, the company, et cetera, that might be explained earlier in the document. Um, in machine learning language, sometimes we might be just pulling out sentence-level insights, where a given sentence kind of gets categorized. I know that, you know, we've done that in a lot of different cases. You know, we had a client that builds a system, um, for folks like Beyonce and Metallica, um, to talk to, you know, their millions of fans, and there's, you know, a whole hierarchy of things that people are talking about. Those tend to be terse statements coming in from fans that are, in this case, just texting, you know, their stars directly. You know, Carsten, you and I were involved for years in building kind of a deep relationship extraction engine where we're looking at grammatical things. If, you know, we think back to seventh grade grammar class, you know, like somebody doing something to another thing or person. So, you know, Dan hits the tennis ball, um, you know, on Tuesday during her match. You know, we do this whole kind of grammatical decomposition. So there's some kind of relationship extraction there, there's event stuff, there's a whole bunch of different things. And then you can kind of bubble it up and start asking stuff at a higher level. I know, you know, we were analyzing a lot of reviews coming in to physicians performing surgeries.
And it's one thing to tell, you know, a physician, hey, at the sentence level, you know, we have these 38 mentions of somebody saying something positive about the way you perform this surgery and, you know, uh, a hundred-something describing something negative. Of those positive things, you know, they gave these really specific, granular examples, like, you know, you have good bimanual dexterity, for example, or that was something of concern. Those are all really powerful to tell a physician, but it's even more powerful to kind of contextualize it across a corpus or, in this case, other physicians. So telling them, hey, it turns out that 53% of your feedback coming in was, you know, positive, which sounds good. But, you know, it turns out that that was like 30% below the mean for, you know, other physicians, which causes them to kind of jump up and pay attention. So these sort of aggregated insights, um, tend to be pretty important as well. I don't know. Can you guys think of some work you've done maybe in the past, or that you can think about, where it's not just important to pull out an insight at a low level, like in a document, but where you learn something important at the higher level?
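To make the "contextualize it across other physicians" idea concrete, here is a minimal sketch of the arithmetic. The counts and peer values below are invented for illustration and are not data from the project Deep mentions. The function names `positive_share` and `relative_to_peers` are assumptions, and "below the mean" is interpreted here as a relative difference from the peer mean.

```python
from statistics import mean

def positive_share(positive, negative):
    """Fraction of sentiment-bearing mentions that are positive."""
    total = positive + negative
    return positive / total if total else 0.0

def relative_to_peers(own_share, peer_shares):
    """Signed difference from the peer mean, expressed as a fraction of that mean."""
    peer_mean = mean(peer_shares)
    return (own_share - peer_mean) / peer_mean

# Hypothetical numbers: 53 positive and 47 negative mentions, three peer physicians.
own = positive_share(positive=53, negative=47)
peers = [0.70, 0.80, 0.75]
gap = relative_to_peers(own, peers)

print(f"positive share: {own:.0%}, relative to peer mean: {gap:+.0%}")
```

A 53% positive share looks fine in isolation. The comparison with the peer mean is what turns it into an insight worth acting on.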
Carsten: That's a tough one. The one thing I was gonna mention in the context of encoding higher-level concepts is search engines. Um, so you have a whole corpus about people doing things, let's say. Well, one thing that we worked with was, uh, I think it was in the public domain back then, the Enron email corpus, when that whole scandal happened. Yeah. You could basically encode these things and start searching for, you know, certain relationships, like who drinks beer, who travels to Seattle. And you can imagine that if you're searching for certain patterns, like people that traveled to Seattle and then rented a car and then, I don't know, did something else, it's a great way to build this network of actions and analyze it afterwards. So that's something you can do on a corpus level that's not necessarily in each individual document.
Deep: So we're working on a project right now trying to understand what the various anti-vaxxer positions are. We're actually in the process of building out a corpus right now. Carsten, maybe just describe the corpus that we're building here, and what are the kinds of things that we can do with it, and why this might actually help.
Carsten: So we basically went to social media. We're looking at YouTube and Reddit, as well as Twitter. We're basically just extracting social media posts that have something to do with vaccinations, and we are just building a text corpus based on that. And the next steps will be to analyze that further and figure out if a post is pro- or anti-vaccine. And then maybe we can even identify certain subgroups in there and analyze the motivations behind their actions. We try to divide this corpus into two different groups or multiple groups and try to identify individual subgroups in there based on their language. Now, I don't really know what that's gonna look like exactly. Haven't gotten that far yet, but, uh, that's the goal.
Deep: Yeah. I know we've got this. It's kinda interesting. So it turns out that, you know, people have different reasons for taking an anti-vax position, right. You know, the idea is, you know, if you're a public health agency and you're trying to convince somebody, you kind of wanna speak to the lexicon of their concerns.
Bill: Maybe you could speak a little bit about the process of what you do with text data like that to build something that is used to detect sentiment. So sentiment is, you know, that person said something positive, that person said something negative, that person was mostly neutral. How would you teach a computer to do things that we sort of naturally figure out on our own?
Carsten: Lots of samples. You basically start with examples of what positive expressions are, you start with examples of negative expressions, and whatever type of expression you want to identify. And then you just basically annotate a whole bunch of them. And then you simply train a classifier. So that's the boring and simple version that you can go with.
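As a rough sketch of the "annotate a bunch, then train a classifier" recipe, here is a tiny multinomial Naive Bayes classifier with add-one smoothing written in plain Python. Naive Bayes is a stand-in chosen for brevity, since the conversation does not say which classifier was actually used. The four labeled sentences and the names `train` and `predict` are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (text, label). Returns per-label word counts and label counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(text, word_counts, label_counts):
    """Pick the label with the highest log-probability under add-one smoothing."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical annotated examples; a real project needs far more than four.
labeled = [
    ("i love this surgeon", "positive"),
    ("great care and great results", "positive"),
    ("i hate the long wait", "negative"),
    ("terrible rude staff", "negative"),
]
word_counts, label_counts = train(labeled)

print(predict("love the great results", word_counts, label_counts))
print(predict("hate the rude staff", word_counts, label_counts))
```

The model only counts which words co-occur with which label, which is exactly the pattern-recognition point made earlier in the conversation.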
Deep: There's sort of limited utility around sentiment sometimes. And a lot of times people wanna know, well, what are they being positive about or negative about? So if you say, you know, I hate Joe Biden, or I hate Donald Trump, or I love Joe Biden, or I love Donald Trump, you know, in those cases there's clear positive and negative sentiment about a thing, in this case, you know, the person, the politician. So in those cases, you know, I know we did this years ago in an earlier startup of ours where we did a deep grammatical analysis of the text. And based on the deep grammatical analysis, if you think back to seventh grade grammar, you know, "I hate, uh, Joe Biden": the subject is "I", um, the verb, or the predicate, is "hate", and the target or object of the hatred is Joe Biden in this case. And then we would do things like associate a whole taxonomical hierarchy around the entity. So we would know, for example, that Joe Biden is a politician, that specifically he is the president, specifically of the United States. Then, with all that stuff, you can now start to contextualize your sentiment, and you can start to look at, well, you know, what is the sentiment score, positive and negative, um, for presidents, you know, over time, or for politicians over time, and you can start to kind of bubble that up. So this kind of ability to leverage the hierarchy in conjunction with some grammar, in conjunction with some sentiment and some machine learning, kind of starts to get you much more exacting and potentially helpful things than just kind of a generic sentiment.
Bill: And I wanna put a shout-out in for the systems that we use here at Xyonix. You know, uh, we have this hierarchical annotation that we use when we develop models around text. You're not just labeling something as simply good or bad. We have this whole hierarchy that allows you then to delve much deeper into the dirty details, uh, not just good or bad, but maybe certain reasons why it's good or bad, and so forth.
Deep: You're listening to Your AI Injection, brought to you by xyonix.com. That's x-y-o-n-i-x.com. Check out our website for more content, or if you need help injecting AI into your organization.
Why don't we pivot to this question of labeling, of getting training data? So the question for you guys is, well, what are some of the challenges in building effective training data for these systems?
Bill: What's kind of tough is that if you're looking to build out a model for these things, sometimes the data you're trying to gather may be publicly available, but you might be, uh, limited in your access to it. It might be private data that you don't have access to at all, or it might be just rare data. So in those cases, there are these great alternatives. There's a lot of detail behind them, but there are actually these great AI-based techniques for simulating data, but you might be lucky to...
Carsten: You have to be very careful with the simulations, just gonna throw that in. You have to be very careful you don't introduce bias into your system, because sometimes the simulation is too limited and it has a certain point of view. And then if you use that to simulate your data and train your model, all your model learns is basically a copy of your limited simulation. So to use these techniques, be aware that you have to use something that's really broad and encompasses the whole scope of your domain, basically.
Deep: So going back to the question of the challenges around getting sufficient examples and getting good examples for training these models. So Bill, you mentioned this idea of leaning on these trained systems that can actually generate language, but another direct and straightforward means is simply to have humans label this stuff. When humans label stuff, you gotta figure out, are all labels created equal? So if I'm just randomly sampling from millions of documents for a particular category, it can be sort of a waste of time at some point. What happens is we have these models where they have areas that they fail in that are way more in need of new examples than the areas that they're doing just fine in. So, like, let's say that you've got some strong cues, like hate and love, and the model's doing just fine on those intense cues, but in much more subtle cases, you find out the thing is inevitably doing worse. So does one of you guys wanna speak a little bit to the idea of active learning in this context?
Carsten: Yeah, sure. I mean, it's just like you said, you don't just need samples. You need the right samples, right? You need to pay attention to the classes that are underrepresented. Yeah. Try to create a very balanced training set. And then also concentrate on those sections where your model gets it wrong, like you said. And so that is the problem: finding false negatives. And it really depends on your problem scenario how to do that. You gotta find a way to find the things that you didn't find, in other words.
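One common way to spend labeling effort where the model is weakest is uncertainty sampling: send annotators the documents whose predicted probability is closest to 0.5, and mix in a few random ones so the model is not only ever shown its own blind spots. The sketch below assumes a binary classifier that has already scored the unlabeled documents. The function name `select_for_labeling`, the document ids, and the scores are all hypothetical.

```python
import random

def select_for_labeling(scored_docs, n_uncertain, n_random, seed=0):
    """scored_docs: list of (doc_id, probability_of_positive) from the current model.

    Returns the ids the model is least sure about, plus a random sample of the rest.
    """
    by_uncertainty = sorted(scored_docs, key=lambda item: abs(item[1] - 0.5))
    uncertain = [doc_id for doc_id, _ in by_uncertainty[:n_uncertain]]
    remaining = [doc_id for doc_id, _ in by_uncertainty[n_uncertain:]]
    rng = random.Random(seed)  # seeded so the selection is reproducible
    randoms = rng.sample(remaining, min(n_random, len(remaining)))
    return uncertain, randoms

# Hypothetical model scores for five unlabeled documents.
scored = [("a", 0.98), ("b", 0.52), ("c", 0.03), ("d", 0.45), ("e", 0.91)]
uncertain, randoms = select_for_labeling(scored, n_uncertain=2, n_random=1)

print(uncertain, randoms)
```

Uncertainty sampling alone will not surface confident false negatives, which is the harder problem Carsten points to. The random slice is one simple hedge against that.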
Deep: What's the evolution like from the very beginning of a project? Do you find categories, you know, have more or less problems as you kind of, uh, progress? What have you sort of experienced there?
Bill: Yeah. You can start with a problem where you have a bunch of classes that you're looking to identify within text. Let's say that you're looking to identify, for example, folks that aren't really doing well, and you're looking to assess their mental state. So the types of things that they're saying might be indicative of them being, for example, depressed, or having a lot of anxiety and so forth. You can divide that up into: this person is super happy, this person is depressed. Those broad categories don't really do that rich texture of, you know, mental health justice. And you might start off with something quite simple like that, with just a couple of main categories, and then tell your annotators, 'Hey, yeah, I'm looking for you guys to go through this text and label these according to these couple of main categories that I've chosen, these broad-stroke categories', but they're ultimately probably gonna come back to you and see all the subtleties in how people communicate and say, you know, this person sounds like they might be depressed, but I actually think they're being sarcastic, or this person is maybe quite happy, but they're just being very cynical for a brief portion of their text, and so forth. You know, and as you get more of these labels into your system, the models will start to learn more and more. At some point you can take all of that labeled data, you can divide it up into something you train with and something you test with, and you can see, you know, how your models are doing with each one of these different labels in each one of these different categories that you've had.
Carsten: In the same context, we should talk a little bit about annotator bias, because unless you have a very crisp scenario where you have annotators distinguish between cats and dogs and elephants, yeah, what belongs to a label is not very clear. You will find that each annotator has a different opinion of what you mean, and this annotator bias goes into your training data. We have run experiments where we compared this. And what we found in this particular project was that very, very rarely, uh, more than three annotators would agree on a label across the board. And that was because the categories had a little bit of human interpretation to them. You had to make a decision, and the annotators weren't all on the same page. And you'll see that reflected in your classification results later. If you look at your confusion matrix, you see a certain distribution there. Text is not black and white in what the meaning of a sentence is, but, uh, images are not either. Like I said, if it's not cats and dogs, if it's a real-world example of something that is not black and white and not easily distinguishable by a human, then you have a lot of annotator bias, and everybody interprets it in a slightly different way.
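A simple way to look at the kind of disagreement Carsten describes is to count, for each item, how many annotators chose the most popular label. The sketch below is only an illustration: the label names and the three annotated items are invented, and `agreement` and `share_with_at_least` are hypothetical helper names, not part of any tool mentioned in the conversation.

```python
from collections import Counter

def agreement(labels_per_item):
    """For each item, return (most common label, number of annotators who chose it)."""
    results = []
    for labels in labels_per_item:
        label, votes = Counter(labels).most_common(1)[0]
        results.append((label, votes))
    return results

def share_with_at_least(results, min_votes):
    """Fraction of items where at least min_votes annotators agreed on one label."""
    return sum(1 for _, votes in results if votes >= min_votes) / len(results)

# Hypothetical labels from five annotators for three messages.
annotations = [
    ["feedback", "feedback", "feedback", "feedback", "question"],
    ["feedback", "question", "request", "question", "feedback"],
    ["request", "request", "request", "question", "feedback"],
]
results = agreement(annotations)
share = share_with_at_least(results, min_votes=4)

print(results)
print(f"items with 4+ annotators agreeing: {share:.0%}")
```

Items with low agreement are good candidates for sharper category definitions, or for labeling at a broader level of the hierarchy.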
Deep: One of the things that I've seen be a way to address some of this category boundary definition: so let's say you're a famous politician. You're getting thousands of emails a day, you know, coming in. And let's say that they're kind of bombing you with stuff. If we ask our annotators, hey, tell me every time somebody's providing some feedback, that's kind of an awkward thing to define sometimes. But one of the things that we found is if you go way down and you give them really specific examples, like, hey, you are doing great, or you're doing terrible, that's, let's say, a very low-level label, and communicating that is straightforward. You can give it to the labelers. Labelers can go out and formulate some keyword searches to help get the category kind of juiced and off the ground. But one of the powerful advantages is if you get them to do that at that granular level, then that can serve as an example all the way up this hierarchy. One of the things that we do when there's ambiguity, we'll say something like, look, if you don't know for sure whether this is about including a public option, but you know it's about healthcare or, you know, it's a suggestion about RO legislation, just label it according to the highest-level thing that you can. And then that kind of lifts the burden. It's like, oh, okay, well, you know, I don't have to maybe argue about the category boundary, and you can sort of organically punt that to another day.
Bill: So the thing that they're sure about might then inherit this upward hierarchy, so they don't have to worry about it. Right, exactly. That's pretty cool.
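The upward inheritance Bill mentions can be sketched by writing each label as a path and treating every annotated text as an example for the label and all of its ancestors. The path format with "/" separators, the label names, and the function names `expand_label` and `training_targets` are assumptions for illustration, not a description of the actual annotation system.

```python
def expand_label(label, separator="/"):
    """Return the label plus every ancestor, from most general to most specific."""
    parts = label.split(separator)
    return [separator.join(parts[: i + 1]) for i in range(len(parts))]

def training_targets(annotations):
    """annotations: list of (text, label path).

    Each text becomes an example for its label and for every ancestor of that label.
    """
    targets = {}
    for text, label in annotations:
        for node in expand_label(label):
            targets.setdefault(node, []).append(text)
    return targets

# Hypothetical annotations: one very specific, one labeled only as high as the
# annotator was sure about.
annotations = [
    ("Please include a public option", "suggestion/healthcare/public option"),
    ("Do something about healthcare", "suggestion/healthcare"),
]
targets = training_targets(annotations)

for node, texts in targets.items():
    print(node, len(texts))
```

An annotator who is unsure about the finest category can stop at "suggestion/healthcare", and that label still contributes training data to every level above it.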
Deep: The other kind of concept that's related that we found to be powerful is to sort of holistically label within a space, but then you might prioritize within there. So, for example, let's say that we know that there's some healthcare legislation coming down the pipe and we wanna give really granular feedback. We might use the models to give very granular feedback in healthcare, but retain the generalized feedback in other areas. And then in a couple of months we might switch it. So this lets us sort of evolve the depth of understanding of the model over time and kind of tailor it according to the business needs. And that's another really powerful thing that you don't really see coming out from, like, generalized models out of the big tech companies. What you'll see is a lot of individual companies will miss potential product and business opportunities because they're overly constrained, or they'll make overly broad category definitions. Sometimes for years, we've seen this, which will end up kind of blocking out.
Bill: So it sounds like you're saying this hierarchy just facilitates lots of adventures in the future for different types of modeling exercises. Is that a one-liner takeaway?
Deep: Yeah, I think so. Yeah. Like, not everyone has Facebook or Google's budget. They're getting like hundreds of thousands of labels, and they're just in a totally different universe than a startup or a project that's focused in a particular area and has a judicious budget, you know, that they have to allocate. When we were talking about synergistic human-machine interaction, I think we're talking about this: humans start with maybe a random sample. The machine goes off and runs through the back catalog of millions of documents, emails, texts. Humans are now presented with what the model thinks is a category, like "include the public option" in healthcare legislation, or, let's relabel these ones that the model screwed up on. And the humans kind of have to balance that, as Carsten is suggesting, cuz you don't wanna only go towards model correction or you could potentially just have giant blinders on. So you need to include, you know, random data along the way. But in this way you've got humans and machines kind of working back and forth. One trains the machines, the machines get smarter, they give the humans more stuff to look at, and you just kind of keep going. Eventually you understand the conversation and, you know, you can bring all kinds of capabilities to your product.
Carsten: What do you think? Is it real understanding we're doing?
Deep: I would agree with that. I mean, you know, uh, I don't think patterns are understanding. You know, that's a powerful word to use, it's provocative. But, you know, we were working on a depression bot for a company and I started doing some research on some state-of-the-art techniques for this depression bot. This thing was trained on a lot of these very state-of-the-art systems, and you would ask it stuff and it was supposed to be your therapist, and it just had no moral value judgements whatsoever. It's the way a parrot repeats things it's heard but doesn't understand them in any sense. That's kind of the path that we're on with these machine learning systems, at least that's my take. Like, they could be smart parrots in that they can kind of take things and parrot them in just the right way. But I remember asking this thing, okay, so do you think it's a good idea to blow up the earth? And the thing had learned that, cuz this is like a silly conversation that happens out on the web. So it's like, well, of course not. Why are all you humans so obsessed with blowing up the earth? But then I asked the thing, like, hey, you know, is it a good idea to eat babies on Tuesdays after going for a run, or something like that? And the answer is like, well, yes, of course. Because it had sort of learned that if somebody asked something kooky and weird, the thing to do is to give an extreme answer.
Carsten: And so eating after running is good, right? So, babies.
Deep: That wouldn't be understanding. So I don't... Right.
Bill: You know, Carsten, you mentioned before, about these generative text models, being careful about bias, but they also don't have an understanding of the physics of the real world. So, for example, in one of these text generation moments, I asked the model, which is heavier, a cookie or the moon? And it responded, the cookie is heavier. In some cases it got it right. It said the moon is heavier, but you might enjoy it more cuz it's made outta cheese. If you think about that in the context of you're talking to a machine, that's cool that a machine could actually...
Deep: Formulate such a clever statement.
Bill: Such a clever statement. That's the thing about this. And this particular model is really fascinating cuz it's literally trained on the world's worth of data. I mean, the entire text of the internet. So if there's something crazy that you wanna ask it, you're gonna get something back that sounds like a human might have said it. Whether that's based in logic is a totally different story.
Deep: One of the things you're bringing out is part of the art, or the jujitsu, of wrangling these systems into being productive and helpful. And that is, everyone has this dream of putting this generalized machine learning AI system out there to handle a generalized problem. But the reality is sometimes we as practitioners and product builders have to put guardrails up. You can't have a generalized system that's a parrot without any real understanding. You can't have it talking to potentially suicidal patients with no guardrails. The cost of a mistake is so high, and the chance of this thing making a mistake is incredibly high. Part of what's going on is that we've really changed the game. You know, 15, 20 years ago, 10 years ago even, we were just trying to get these machines to do something reasonable. Now they do reasonable things most of the time. And so we're changing the way we think about them and we're saying, well, what are the cases where reasonable most of the time is okay? Those things are getting deployed at scale. That's Google Translate, that's, you know, Alexa understanding what you're saying and saying stuff, because it's reasonable to screw up in those contexts. But it's not reasonable to screw up and have your pacemaker just give the wrong signal at the wrong time. So I think that's part of the I in AI. It's a growing I. Like, it was really, really tiny 15 years ago and it's getting bigger, but it's definitely not, you know, intelligence in the way that, you know, those of us who know what's going on behind the scenes can't easily spoof and get it to just fly off the guardrails and smash into a million pieces.
That's all for today on the topic of understanding conversations. Bill and Carsten, thanks as always for a fun conversation. I know we started off kinda trying to get our feet in the door and eventually wound up at AI taking over the universe, which is always a good end. Uh, for anyone that's interested, you know, we've got some articles on this topic on our website at xyonix.com. That's xyonix.com. Um, with that, see everybody next time.
That's all for this episode. I'm Deep Dhillon, your host, saying check back soon for your next AI injection. In the meantime, if you need help injecting AI into your business, reach out to us at xyonix.com. That's x-y-o-n-i-x.com. Whether it's text, audio, video, or other business data, we help all kinds of organizations like yours automatically find and operationalize transformative insights.