In the newest episode of Your AI Injection, Deep, Bill, and Carsten explore the potential benefits and drawbacks of using ChatGPT for code generation, both in general software development and in the machine learning space. They discuss how the generated code can create a false sense of confidence with its detailed explanations, even though its accuracy still leaves something to be desired. They also consider using AI-generated code to speed up menial coding tasks, and speculate on what AI and machine learning code generation may look like over the next 10 years. Tune into this podcast to learn more about the future of code generation!
[Automated Transcript]
Deep Dhillon: Hi there, I'm Deep Dhillon. Welcome to Your AI Injection, the podcast where we discuss state-of-the-art techniques in artificial intelligence with a focus on how these capabilities are used to transform organizations, making them more efficient, impactful, and successful.
Welcome to our regular guests, Bill and Carsten. We're gonna talk about the viability, or lack thereof, of using ChatGPT or similar large language models for code generation, and more particularly for, let's call it, machine learning and AI code assistance. So with that, let's get started.
Everybody had their homework assignment to go off and try to do something with ChatGPT. What kind of stuff did you guys try, and what did we find?
Bill Constantine: Um, well, maybe I'll start off. I wouldn't say I delved into the machine learning space so much as I looked at a problem that was slightly more complicated than, say, a one-liner, something very simple that you could look up on, say, Quora or Stack Overflow. And it was actually a real-world problem. My son and I are gonna be running a half marathon in a couple of months. The last time we did one, he was a little tyke; he's grown up much bigger now, and he wanted to know how fast he'd have to run.
I wanted to beat, like, a two-hour mark, and that's an incredibly easy calculation, right? If you know the distance and the time you want to beat, that tells you your average speed. But we run a little bit differently: we run in intervals, running a little bit and then walking a little bit.
So, for a typical marathon, I'll run, say, eight and a half minutes and then walk 30 seconds. That complicates the calculation enough that it's not a super trivial thing. And actually, I told him that when I ran the London Marathon a long time ago, I had written some Python code to explore different combinations of run/walk ratios and so forth.
And I actually honed in on one during training that I felt comfortable with, so I was gonna send him that. But then I thought, oh, wouldn't this be a fun experiment, a coding challenge to send to ChatGPT?
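Bill's original Python isn't shown in the episode, so what follows is only a minimal sketch of the calculation he describes. It assumes the run/walk cycle repeats for the whole race (ignoring the partial cycle at the finish), that distances are in miles and paces in minutes per mile, and that the walking pace is constant; the function names are hypothetical.

```python
def required_run_pace(distance_mi: float, target_min: float, run_min: float,
                      walk_min: float, walk_pace_min_per_mi: float) -> float:
    """Running pace (min/mile) needed to cover distance_mi in target_min.

    Assumes a repeating cycle of run_min minutes running and walk_min minutes
    walking, and ignores the partial cycle at the finish.
    """
    cycle = run_min + walk_min
    run_time = target_min * run_min / cycle      # minutes spent running
    walk_time = target_min * walk_min / cycle    # minutes spent walking
    # minutes / (minutes per mile) = miles
    walk_dist = walk_time / walk_pace_min_per_mi
    run_dist = distance_mi - walk_dist           # miles left for the running segments
    if run_dist <= 0:
        raise ValueError("walking alone covers the distance; check the units")
    return run_time / run_dist                   # minutes per mile


def format_pace(pace_min_per_mi: float) -> str:
    """Format a pace such as 8.903 min/mile as 'M:SS'."""
    minutes = int(pace_min_per_mi)
    seconds = round((pace_min_per_mi - minutes) * 60)
    if seconds == 60:                            # e.g. 8.999 -> 9:00, not 8:60
        minutes, seconds = minutes + 1, 0
    return f"{minutes}:{seconds:02d}"


if __name__ == "__main__":
    pace = required_run_pace(13.1, 120, 8.5, 0.5, 18.0)
    print(format_pace(pace))  # 8:54
```

With the numbers from this conversation (a 13.1-mile half marathon in two hours, running 8.5 minutes and walking 30 seconds at 18 minutes per mile), the sketch asks for roughly an 8:54 running pace. The walk distance is written as time divided by pace, which is exactly the line where a units slip would show up.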
Deep Dhillon: Yeah. So what kind of prompt did you use?
Bill Constantine: Yeah, so I basically described the situation to it just as I described it to you guys.
I gave it a heads-up on the inputs I would feed such a function, and also the units of those inputs. And I said, now go ahead and create that function in Python for me. And I pressed the button. And like most people in the world interfacing with ChatGPT, a little smile came across my face, like...
Deep Dhillon: Good luck with this.
Bill Constantine: Well, in a way I was. I mean, it's kind of an esoteric computation given how we're running these intervals, and I didn't really expect it to logic that out well. But in fact it did. So actually, let me tell you some things that I...
Carsten Tusk: It didn't, though. It didn't, though.
Bill Constantine: Wait, wait. Before we go negative, let's talk about the positives and then the negatives, in that order.
Deep Dhillon: Sure. Yeah, so tell us. Say you kicked it off and it crunched away. What did it spit back?
Bill Constantine: Yeah. So the first thing I liked about what it spat back at me was a paragraph describing what the upcoming function was about.
So I got a good qualitative description. In a way, it gives you a false sense of confidence, as if it somehow has this forethought and knows what it's doing. That's also nice in terms of documentation, right? If you were to grab this thing, you could shove that into some comments as documentation for that function.
Deep Dhillon: And then it goes ahead and actually gives you the...
Bill Constantine: Then it spits out the code. There are actually a couple of things I really liked about the code itself. One was that the interim variable names it came up with on its own were very reasonable, like, understandable. They weren't super long, and they weren't super short, like x. And for some of the code that was a bit more complex, there was a comment at the end of those lines describing what was actually going on. Then, after the function was spat out, it gave an example, so it had some input parameters defined, like distance as 26.2, and things like walking pace, for example, as...
Deep Dhillon: Yeah, I was kind of impressed with this.
Just even its ability to name variables properly, like, reasonably. And to formulate the problem in a way that at least looked like a sensible formulation, even if it wasn't completely correct.
Bill Constantine: Well, that's what's kind of interesting there. When I actually ran a marathon, I would conservatively put these walk pace values at about 18 minutes per mile.
It's actually quite slow, but that's very reasonable. And I think it came up with, like, 20. All the values it came up with were actually quite reasonable, which was interesting. Oh, I forgot to mention that I also said I wanted the average pace of the running segment in minutes:seconds format.
And I saw in the code that it was able to do so. It took that suggestion to heart and actually did that correctly. What happened, though, is I was excited. I threw this in our chat group and said, geez, look at this thing, it actually did quite a decent job. But then Carsten pointed out that it was actually wrong.
And he was in fact correct: there was a line in there that was wrong.
Deep Dhillon: So Carsten, what did you find in there? Carsten was much less impressed with the results.
Carsten Tusk: No, I mean, I agree with everything you said so far. It was deceptively good-looking.
Um, yes, but you totally got catfished, because the actual computations, the formulas, even made sense at first glance if you didn't really pay attention. And because the output was just numeric and not easily validated by looking at it, it took me a while to figure out that something was wrong.
Right. Ultimately, I just did some double-checks on the actual computation lines to see simply whether the units matched up. So if you want, you know, miles per hour, it shouldn't give you miles squared per second. And that's exactly what it did. And then I was like, well, this can't be right.
The units don't match up, which means the calculations must be wrong. So it produces code that reads normally and looks okay at first glance, but on detailed inspection you realize that in what it's computing, the units don't match up and the numeric results are wrong. So I think for me, the lesson learned was: great, this is a nice little scaffolding and template. This is kind of how you should probably compute this, right? But you'd better damn well check that the actual computations it does are correct, and validate that.
Deep Dhillon: But I mean, that sounds like progress to me, in the sense that what I think you're both saying is that this is a reasonable way to not be staring at a blank IDE, and with a little bit of work, you've got something to look at that you can begin modifying and correcting, if that's what you need.
Carsten Tusk: Yeah, you could look at something. I like it. I had a completely different thing. I was, uh, it's not...
Bill Constantine: Wait, wait, wait.
Carsten, sorry to interrupt you, but before we leave this particular exercise, there's one thing I thought was quite impressive and interesting, which I didn't mention before: the final thing it returned, beyond the function, was a comment. It said it assumed particular units for the walk pace variable, and that the distance was in miles in this case, and that if that was not correct, basically fix it, change the code, or whatever.
And that actually was the problem. What was kind of interesting is that the walk pace given in the example was 20, with a comment saying minutes per mile.
But then in the code it used it as miles. So it's almost like it had a lack of confidence, which is making it into a sentient being, but it's almost as if it somehow had a sense that that particular input variable might be problematic in the code, which was kind of interesting. It's almost like: you might wanna pay attention to this area.
So that was cool for me. But I would also agree with Carsten that you definitely get this false sense of security, even with somewhat simple code snippets like this. So be careful.
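The units check Carsten describes can be done on paper. Writing $t_w$ for the minutes spent walking and $p_w$ for the walk pace in minutes per mile (symbols introduced here for illustration; the generated line itself isn't quoted in the episode), the walking distance has to be a quotient, because only that combination comes out in miles:

$$
d_w = \frac{t_w}{p_w}, \qquad \frac{\text{min}}{\text{min}/\text{mi}} = \text{mi}
$$

Multiplying instead, a hypothetical version of the kind of slip discussed, gives $t_w \, p_w$ with units $\text{min} \cdot \frac{\text{min}}{\text{mi}} = \frac{\text{min}^2}{\text{mi}}$, which can't be subtracted from a race distance in miles. A mismatch like that is the signal that the formula, not just the numbers, is wrong.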
Carsten Tusk: Yeah. And I think in some cases that doesn't matter, right? If you have something where you are, um... I think, Deep, you played with something like that, right?
You wanted, like, a CSS layout, right? Yeah, yeah. I was gonna say, there's a certain class of problems where correctness isn't that important, because it's easily validated and you just want to get an idea of how to do it. Previously, you would just Google for it: how do I stretch a background image in CSS?
Then you go through your Google results, look at a couple of examples, et cetera, until you find one that works for you, right? Then you try it, and it might work for you or it might not, and you go to the next one. With ChatGPT, you can do the same thing, right? Except you don't have to browse through a hundred pages to find where on the page the little snippet you might use is.
It just gives you the snippet. And if it's something harmless like that, and you can just check whether it works for you, you just use it.
Deep Dhillon: You get a little more than just the snippet, though. I think the one difference between what you can get with ChatGPT versus combing Stack Overflow or whatever is that you can actually get it to address the particulars of your problem.
Carsten Tusk: Yeah.
It's like, uh, it's like ultra-customizable templates.
Deep Dhillon: Yeah. And I find that to be quite powerful. So the example, for the sake of our listeners: I had some basic HTML layout that I needed. I needed a border in black, some white for the main area, and then a subsequent border.
And I wanted some text coloring and link coloring, stuff that's just kind of annoying to do unless you're an HTML developer, which I most definitely am not. I just asked it to spit out the CSS and the HTML, and it did. I basically popped it onto my file system, loaded up the browser, and it didn't do exactly what I wanted, but it did a lot of it.
The alternative in Stack Overflow land was a bunch of gobbledygook that I'd have to wade through, and then I'd have to think. I wanted to see whether I could eliminate having to think in this case, and it seemed to mostly do that.
Carsten Tusk: Yeah. And it's kind of nice how specific you can be, which brings me to my problem.
I was working on some color matching. I had to match RGB pixels, or rather pixels in a patch of an image, and that doesn't work so well in RGB color space; it works better in Lab, or CIE, color space. But I had to do it in JavaScript. So I asked it to compare histograms in CIE Lab color space in JavaScript.
It spit out some little code segment, which, just like the one it did for Bill, seemed plausible. In this case I did actually try it. I did my own thing in the end, but I learned something from it: there's actually an OpenCV JavaScript library, and you can use OpenCV in JavaScript, which was really cool.
And then it used that for color conversion, et cetera, because all those functions exist in OpenCV. That's the other thing it's good for. If you kind of know what you want to do, but you want to do it in a certain framework or environment, I think it's almost easier than Googling to ask it to solve your problem and then at least use what it spits out as an idea of how to approach the problem.
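Carsten's version used the OpenCV JavaScript library; to keep every example here in one language, this is a dependency-free Python sketch of the same idea, not his code. It hand-rolls the standard sRGB to CIE XYZ to L*a*b* conversion (D65 white point), bins each patch into a coarse 3-D histogram, and compares patches with histogram intersection, which is only one of several comparison metrics (OpenCV's `compareHist` offers it alongside others). The bin count and helper names are assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple

Bin = Tuple[int, int, int]


def _srgb_to_linear(c: float) -> float:
    """Undo the sRGB gamma curve for one 0-255 channel value."""
    c /= 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_lab(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert an 8-bit sRGB pixel to CIE L*a*b* (D65 white point)."""
    rl, gl, bl = (_srgb_to_linear(v) for v in (r, g, b))
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t: float) -> float:
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def lab_histogram(pixels: Iterable[Tuple[int, int, int]], bins: int = 5) -> Dict[Bin, float]:
    """Normalized 3-D histogram of a patch's pixels in L*a*b* space."""
    counts: Dict[Bin, float] = {}
    for r, g, b in pixels:
        lightness, a, b_star = rgb_to_lab(r, g, b)
        # L* spans about 0..100; a* and b* are assumed to lie in -128..127
        key = (
            min(max(int(lightness / 100 * bins), 0), bins - 1),
            min(max(int((a + 128) / 256 * bins), 0), bins - 1),
            min(max(int((b_star + 128) / 256 * bins), 0), bins - 1),
        )
        counts[key] = counts.get(key, 0.0) + 1.0
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def histogram_intersection(h1: Dict[Bin, float], h2: Dict[Bin, float]) -> float:
    """1.0 for identical normalized histograms, 0.0 when they share no bins."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


if __name__ == "__main__":
    sky = [(90, 160, 230)] * 8 + [(255, 255, 255)] * 2
    sky_shifted = [(95, 165, 235)] * 8 + [(250, 250, 250)] * 2
    grass = [(60, 140, 50)] * 10
    print(histogram_intersection(lab_histogram(sky), lab_histogram(sky_shifted)))
    print(histogram_intersection(lab_histogram(sky), lab_histogram(grass)))
```

An odd bin count (5 here) keeps a* = b* = 0 in the middle of a bin, so neutral grays don't straddle a bin edge. For real images you'd normally let a library do the conversion and histogramming; the point is that the comparison happens in Lab, which was designed to track perceived color differences more closely than RGB.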
Bill Constantine: Yeah, Carsten, I was thinking about that same thing, going back to the running problem I threw at it in Python. I would consider all three of us experts in Python, but I've always heard ramblings about Julia or Rust or other languages. And with ChatGPT, you could say, okay, that's great.
Now do that same problem, but write it in Julia. And that would be very instructive for me, just to see the layout. Since I know the problem well, I could see the format of that code in comparison to Python, or in comparison to Rust or whatever. So it's a way of learning new languages with a problem you really know, something beyond hello world.
I think that would be a little bit informative even if, hey, I'm not gonna assume this code is right. I can see how things are formulated in that language that might differ from a language I already know.
Carsten Tusk: I think that's it. It's not necessarily right, but it's also not grossly wrong, and so you can kind of get an idea out of it.
Deep Dhillon: I've almost been thinking... I don't know if you guys recall this, but when ChatGPT first got released, I think it was Stack Overflow and a few of the other coding forums that had to immediately ban any contributions that came from ChatGPT. I think they actually shut down for a while, because there was a bunch of script kiddies running off trying to answer as many Stack Overflow questions as possible.
They were just blindly shoving the questions at ChatGPT and putting its output in as an answer. And all of a sudden, these databases of highly curated responses to very particular questions... you know, Stack Overflow has this voting mechanism, this reputation mechanism, so things are mostly right when they're in there. Obviously there are sometimes exceptions. But all of a sudden they're getting infiltrated by folks.
But it made me think there might be a hybrid world where ChatGPT is used behind the scenes by knowledgeable humans to grow the rate at which Stack Overflow and sites like it cover questions, but where the humans are still making sure things are correct. In that ideal, we get more answers, and you still might be using ChatGPT, but you don't even know it.
Bill Constantine: So you're talking about exploring topic space more broadly behind the scenes with a robot, basically.
Deep Dhillon: I'm talking about empowering those who crowdsource answers to Stack Overflow with ChatGPT-like capabilities, right? So that they're more efficient and can answer more questions. They still play the vital role, 'cause they're the ones whose reputation rank is on the line.
Carsten Tusk: It defeats the purpose of Stack Overflow, though, because it's not really meant to be some weird question-answering machine for programming problems.
It's a place where you hope to get answers from other human individuals who either know the solution or have run into the same problem, right? The last thing I wanna see on Stack Overflow is some made-up answer that may or may not be correct.
Deep Dhillon: Presumably, because they still care about their rank or their credibility, they're making sure it's correct.
Carsten Tusk: Yeah, but it's kind of like fake reviews on Amazon. Nobody wants them. I already couldn't care less about somebody's rank on Stack Overflow; that digital e-peen swinging can go away. People should post an answer there because they know the answer and they want to help somebody out, not to improve their rank.
Deep Dhillon: Well, but that is a philosophical difference worth noting. If I ask you a question, you can answer it painstakingly, or you can immediately get an answer from ChatGPT, look at it, know for a fact it's correct, and save yourself 80% of the time. What's the difference from my vantage point?
I don't care. I got the answer from you. I still got Carsten's answer, even if it came through ChatGPT.
Carsten Tusk: You didn't, though, because why would I sit there and answer questions on Stack Overflow using ChatGPT?
Deep Dhillon: Uh, the same reason you use ChatGPT, period, for the stuff we're talking about.
Carsten Tusk: Here's the thing about using it to get some boilerplate-like code.
If I'm knowledgeable enough to make the judgment call that the answer from ChatGPT is actually correct, then I don't need ChatGPT. I could have just written my answer. Well, you could have... If I'm not knowledgeable enough, it's outright dangerous to allow that. So I don't ever wanna see answers from ChatGPT on any kind of forum like that.
Deep Dhillon: Have data? Have a hypothesis on some high-value insights that, if extracted automatically, could transform the business? Not sure how to proceed? Bounce your ideas off one of our data scientists with a free consult. Reach out at xyonix.com. You'll talk to an expert, not a salesperson.
It's a question of efficiency, right?
Carsten Tusk: Like, that's where we disagree. There's nobody sitting somewhere in the world whose job it is to answer Stack Overflow questions, right? You don't have to be... Well, there are people who think it's their job. I mean, they do it. But you don't get paid because you answer a hundred questions.
And I hope that the people who do that with ChatGPT fall flat on their face and their reputation goes down because their answers are garbage. Anyhow, what if they're not? That's like Bill posting his marathon running thing. He thought it was correct. It wasn't.
Deep Dhillon: But he wouldn't have posted that unless he'd actually verified that it was.
Carsten Tusk: Oh, I don't think so. If you say he would have validated it before he posted it, then the time savings you were just talking about with ChatGPT are down the drain again, because you can't afford to compose a thing, spend an hour testing it, and then post it.
Deep Dhillon: Maybe it's more efficient. It's like this...
Bill Constantine: Completely. From an efficiency perspective as well, like, would I ever use this? Would you? And it comes down to the question: is it easier to build a house from scratch or remodel it? Because if the complexity of the code is such that there's probably something broken, it's gonna take you a lot of time to go through and validate that code.
And in this case, there's no way I would've used ChatGPT to write this. I would've done it from scratch for sure. It's a bit of a toss-up of, like, what is the most...
Deep Dhillon: Well, so I think that's the root question, right? Are there any scenarios at all for improving efficiency amongst data scientists or developers with these tools?
And I found an interesting thread by Sergio Pereira on Twitter, where he listed 10 of them that I thought were kind of interesting. One was generating boilerplate code, which we talked about, I think. Another one he lists is research and compare: algorithm A versus B, language A versus B, or framework A versus B.
He talks about explaining code: here's a blob of code, what does it do? And having it explain it in words.
Carsten Tusk: That one doesn't work. That's bullshit.
Deep Dhillon: I don't know. I tried a few of his cases, and they did seem to do something generally in the ballpark.
Carsten Tusk: Yeah. The moment you have code comments that just restate in English what you can read out of the code...
Agreed. Those comments should be thrown away.
Deep Dhillon: Yeah, I wasn't in love with this one either. But there are a few other ones. One that I thought was interesting was writing test cases, 'cause that's something...
Carsten Tusk: It doesn't work.
Deep Dhillon: I think they'd have to actually run an experiment against it.
Carsten Tusk: Well, we can do that.
We can take Bill's code and run a test case, because if you get the code wrong, you get the test case wrong too.
Deep Dhillon: How so? The code is, like, a separate problem. On a philosophical level, how is trying to write a test case any different? Like, because there's boilerplate code involved?
Carsten Tusk: You can write boilerplate code for test cases, sure. But you cannot write the actual test.
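Carsten's worry is that a test written from the same misunderstanding inherits the same bug. One partial answer is to test an invariant rather than re-derive the formula: feed the computed pace back into a simulated race and check that the miles add up. The sketch below applies such a check to a correct implementation and to a hypothetical buggy one with the kind of units slip discussed earlier; both functions and their names are illustrative, not code from the episode. The check still assumes the walk pace is in minutes per mile, so a misconception about units shared by the code and the test would slip through, which is Carsten's point.

```python
def correct_pace(distance, target, run_min, walk_min, walk_pace):
    """Required running pace (min/mile); walk distance = walk time / walk pace."""
    cycle = run_min + walk_min
    run_time, walk_time = target * run_min / cycle, target * walk_min / cycle
    return run_time / (distance - walk_time / walk_pace)


def buggy_pace(distance, target, run_min, walk_min, walk_pace):
    """Hypothetical units slip: multiplies minutes by min/mile instead of dividing."""
    cycle = run_min + walk_min
    run_time, walk_time = target * run_min / cycle, target * walk_min / cycle
    return run_time / (distance - walk_time * walk_pace)


def distance_invariant_holds(pace_fn, distance=13.1, target=120.0,
                             run_min=8.5, walk_min=0.5, walk_pace=18.0):
    """Replay the race with the returned pace and check that the miles add up.

    Distance is recomputed from the definition of pace (minutes per mile)
    rather than by calling the formula inside pace_fn.
    """
    pace = pace_fn(distance, target, run_min, walk_min, walk_pace)
    if pace <= 0:
        return False  # a zero or negative pace can't describe a real race
    cycle = run_min + walk_min
    miles_run = (target * run_min / cycle) / pace
    miles_walked = (target * walk_min / cycle) / walk_pace
    return abs(miles_run + miles_walked - distance) < 1e-6


if __name__ == "__main__":
    print(distance_invariant_holds(correct_pace))  # True
    print(distance_invariant_holds(buggy_pace))    # False
```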
Deep Dhillon: Well, some of these examples were pretty specific, like checking for null. In this case, I felt like I'd have to actually sit down and time myself working on my own versus leveraging this tool, and see if I was actually faster. But my general sense is, maybe not today.
But with a custom training of ChatGPT on code, specifically around common problems, and embedded in an IDE, I wouldn't be the one betting against this stuff making its way into useful IDE capabilities.
Bill Constantine: Well, let's talk about just this issue, because I was able to give ChatGPT a thumbs up or thumbs down on that.
And I gave it a thumbs down, and I was able to explain why: we found out there was a problem with the units of one of the input variables. Then there was a little box asking whether this was, like, unhealthy or dangerous, and I actually checked yes. And the reason why is because, guys, honestly, I have not yet run a marathon where somebody hasn't died.
Or somebody hasn't passed out. I've seen it all. And advice like this, taken by, say, my son, who doesn't know Python at all: he wouldn't necessarily know whether it was right or wrong, and he could end up having a problem in the real world. This could have a negative impact.
That might be a bit dramatic, but what I thought about is that ChatGPT is now running this grand experiment where everybody and their mother is using it and they're getting all this feedback. I would love to come back in a couple of months, pose this very same problem, and see if they fixed it, basically.
Carsten Tusk: I mean, I doubt it, because it's not that there's some intelligent agent on the other side that actually solves problems. Yeah, that's right. But the amount of safeguards they can put in, how they can influence this particular thing, is actually very limited, right?
Deep Dhillon: It's kind of like, yeah, it's not like you can correct scenarios 17 and 18 and 29.
Bill Constantine: Well, actually, on top of these large language models, you have human annotation, right? That basically corrects the output, and that's where you might have a chance.
Carsten Tusk: But then we're back to our original topic of AI and machine learning, right? How do you correct this? Well, great, you've got new data, you try to train it on that data, and then you pray. There's no guarantee. So that's where it gets difficult to correct any of these issues, or to guarantee anything that comes out of it.
Deep Dhillon: I wanna go back for a moment and ask the question: how is it that a model trained to predict future sequences of characters and words can get to the point where we can even have this conversation?
How can it even get to the point of producing, albeit wrong, totally passable code? A lot of people disagree about it, and yet they're genuinely using this stuff daily in their coding practices. A simple use case: apparently it's quite accurate at generating regexes. I hate generating regexes, and every regex builder that's ever been out on the web is annoying to use.
Bill Constantine: Boy, you just landed on a problem that I would use it for. I would just try it.
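Deep's regex example fits the "try it, but check it" pattern. Below is an illustrative pattern for the minutes:seconds pace format from Bill's problem (written by hand, not actual ChatGPT output), together with a handful of hand-picked strings; running any generated regex against cases like these before trusting it is cheap. The names are hypothetical.

```python
import re
from typing import Optional

# Illustrative pattern (not actual ChatGPT output): "M:SS" or "MM:SS",
# with seconds restricted to 00-59.
PACE_RE = re.compile(r"(\d{1,2}):([0-5]\d)")


def parse_pace(text: str) -> Optional[float]:
    """Return a 'M:SS' pace as decimal minutes, or None if it doesn't match."""
    m = PACE_RE.fullmatch(text.strip())
    if m is None:
        return None
    return int(m.group(1)) + int(m.group(2)) / 60


# Hand-written cases: string -> should it be accepted?
CASES = {"8:54": True, "18:00": True, "8:60": False, "8:5": False, "854": False}

if __name__ == "__main__":
    for text, should_match in CASES.items():
        assert (parse_pace(text) is not None) == should_match, text
    print("all cases pass")
```

Using `fullmatch` rather than `search` means a string with extra characters around a valid pace is rejected, which is usually what you want when validating input.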
Deep Dhillon: So would I, 'cause it takes... But my question, though, is: given that it doesn't know what it's talking about, how is it that it can appear so much like it knows what it's talking about?
Carsten Tusk: Because, like everything, it's pattern recognition and pattern assembly. It's literally based on statistical distributions of patterns in the corpus it has seen and learned. It learns what goes together, what is more likely to go together, and what does not go together. And then you throw attention into the mix, and for each output element it knows exactly which input elements are most relevant to focus on, and it produces good results.
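What Carsten calls attention, knowing for each output element which input elements to focus on, can be illustrated with scaled dot-product attention, the weighting step used in Transformer models. This is a toy, single-query version in pure Python with made-up vectors; real models add learned query/key/value projections, many heads, and many layers, so it only shows how the weights are formed.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Scores each key by its dot product with the query (scaled by sqrt(d)),
    softmaxes the scores into weights, and returns the weights together with
    the weighted sum of the value vectors.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


if __name__ == "__main__":
    # Made-up 2-D vectors: the query points the same way as the first key.
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
    weights, output = attention([2.0, 0.0], keys, values)
    print([round(w, 3) for w in weights], [round(x, 3) for x in output])
```

The key most aligned with the query gets the largest weight, so its value dominates the output; that is the "focus" being described.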
I'm not sure anybody in the world can explain to you exactly why it's capable of doing what it's doing, but ultimately it's just based on patterns and statistical pattern distributions in its training corpus.
Bill Constantine: When you take code output that looks very reasonable, given that it's predicting sort of one token, or a snippet of a word in this case, at a time, and it comes up with something that's actually quite reasonable...
It's wrong, but it's reasonable. It is a little bit mind-blowing, actually.
Carsten Tusk: You hit the nail on the head. It's wrong, but it's reasonable, and it gets reasonable because of its patterns. And it's because there's no exact match to what you wanted out there. Yes. It just came up with something that looks reasonable, and it does that in many, many cases.
That's why it makes up facts in language. It makes facts up when you ask it about a person or something, and it makes shit up when it writes code.
Deep Dhillon: I find it mind-boggling that the same piece of software that, you know, predicts a handful of words or whatever ahead of where it's at, that's basically how this thing's trained, at least the large language model part, not the reinforcement learning part.
I find it mind-boggling that you can somehow get to that. And the same thing can translate from French to German to, like, Klingon; the same thing can translate from Python to JavaScript to, you know... I mean, the same thing can reason about such a massive array of stuff. I just can't... it's a funny thought.
Carsten Tusk: I agree with you. I find it mind-boggling as well. I don't understand exactly how it works. And we're actually at the point where it's a little bit like the human brain. We kind of know mechanically how it works, right? There are synapses and neurons, and we kind of know the molecular composition of the human brain.
What we do not understand is how it's actually capable of doing what it's doing. I think we've reached a point in our modeling where we're in the same place as with the human brain. I can explain to you the mechanical details of how it's implemented, what it does on a pure physical level, but how that in particular leads to the output we're seeing, I have no freaking idea.
Deep Dhillon: But isn't that sort of the miracle of this thing, in a way? I mean, if we talk about miracles, we talk about the mysteries of the mind. I feel like there's a huge mystery around the synthetic minds that are being created. We don't really know. These are very large, complicated neural nets.
Their own creators don't know, like... we just...
Carsten Tusk: Exactly. Yeah. No, and the only other thing I can think of that's like that is the human brain.
Bill Constantine: Yeah, and a lot of these models we're looking at here are biologically or behaviorally inspired, right? The reinforcement learning models are all based on this reward system, basically.
And you can think about when you're a kiddo and you do a bunch of stupid stuff: you're rewarded or punished for a lot of it. I mean, that's conditioning, and you learn quite quickly.
Carsten Tusk: That was the goal when we set out a hundred years ago. This whole research went in that direction. We wanted to understand human thinking and learning and intelligence.
Yeah. I think with this magnitude and size, we've come one step closer. There is a danger in getting it to work but not understanding why it works.
Bill Constantine: Well, if the black box was ever black, it's certainly when you have a model that has 178 billion parameters.
Carsten Tusk: But it's not black. I can tell you exactly what's in there, but I don't know why it's... why. Yeah, why it's working the way it's working.
Bill Constantine: In the overall sense.
Deep Dhillon: Right. Yeah. So going back to the theme of this discussion, which is, hey, can we use ChatGPT-like functionality to help build machine learning and AI systems, in addition to coding in general? The question I would ask is, okay, maybe ChatGPT falls down because it's trained so generally. But if you were starting up a startup that's, let's say, laser-focused on just machine learning assistance tooling, how would you train differently?
How would you reformulate the problem? How can you start to address some of these limitations we're seeing, around very confidently saying things that are blatantly wrong but appear quite right and get you started?
Carsten Tusk: I don't think you can. I don't think you can fix that problem. That's what where we're at, what we said earlier, right?
We have this, we have this very mechanical, implemented, large language model, trained on a massive amount of data. We don't really know why. It puts out what it puts out. So we can't actually fix that. And I mean, gen G P T is the next evolution of trying to fix it with reinforcement learning, right? It's kinda like GBT three plus reinforcement learning or some other magic.
So we're trying to fix it by fine-tuning it in the right direction based on how it interacts with the world. Can we actually fix it with a guaranteed fix? No, I don't think so.
Deep Dhillon: Not with a guarantee, but can we just improve efficacy?
Carsten Tusk: Well, yeah, and we're going to do that. But I think what really needs to change is the way people approach it, and the way people approach the output.
There is a way to approach it that is dangerous, which is taking the output and believing it's true. And there are many, many good ways to use it, and we mentioned some of them today, right? Inspiration, examples. Let's say you know what you want to do, but you don't know how to implement it.
What is an efficient implementation of X? Just ask ChatGPT. It gives you an implementation. If you know the problem and how to validate its correctness, then use that code and see if it makes sense to you, right? Get some inspiration from it. Just don't use it blindly. That's the...
Deep Dhillon: I'm thinking more on the training side. Can we modify our training somehow to intentionally teach it things that are correct and things that are incorrect, so that it understands the difference? Is that what we're doing?
Carsten Tusk: Right. That's what ChatGPT is being trained with: reinforcement learning. Yeah. It's basically given a question and a set of answers that are ranked by correctness.
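To make "a set of answers ranked by correctness" concrete, here is a sketch of one common formulation for training a reward model on ranked answers: every better-ranked answer should score higher than every worse-ranked one, and each such pair contributes -log(sigmoid(r_better - r_worse)) to the loss. This is an illustration under that assumption, not a description of OpenAI's actual pipeline; the function names and the example scores are hypothetical.

```python
import math
from itertools import combinations


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def pairwise_ranking_loss(scores_best_first: list[float]) -> float:
    """Average -log(sigmoid(r_better - r_worse)) over all ranked pairs.

    `scores_best_first` holds hypothetical reward-model scores for answers
    already sorted by human ranking, best answer first.
    """
    pairs = list(combinations(scores_best_first, 2))  # order kept: (better, worse)
    if not pairs:
        return 0.0
    # -log(sigmoid(d)) == log(1 + exp(-d)) == softplus(-d)
    return sum(softplus(-(better - worse)) for better, worse in pairs) / len(pairs)


agrees = pairwise_ranking_loss([2.0, 1.0, 0.0])     # scores follow the ranking
disagrees = pairwise_ranking_loss([0.0, 1.0, 2.0])  # scores invert the ranking
print(f"agrees: {agrees:.3f}, disagrees: {disagrees:.3f}")
```

Minimizing this loss pushes the scorer to order answers the way the human rankers did; a scorer trained this way can then act as the reward signal during reinforcement learning.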
Deep Dhillon: Well, I guess I wonder about the human feedback that's being given. You really need people to actually... when it comes to code, it's not enough to say, yeah, I speak French, I speak English, that translation was correct. I have to actually take this piece of code, run it, and see if it was right or wrong, or some gradation in between.
How do you formulate the problem so that we actually take steps forward to correct things when they're wrong?
Carsten Tusk: I don't think you do, because I don't think ChatGPT or any technology based on a large language model will ever be able to actually solve problems. The only thing it can do is literally reassemble patterns it's seen in the past. So you will never be able to ask it for something new. It doesn't think, it doesn't reason.
Deep Dhillon: Well, but I already ask it for new things, and it seems to do a reasonable job.
Carsten Tusk: No, you can only ask it for things that are recombinations of things already in existence. I mean, granted, that's a lot of stuff.
Bill Constantine: Given that it's taken the world's worth of text information and ingested it in training the model, you're very likely to see things that you might construe to be new, but they're not new to it. Right?
Deep Dhillon: Agreed. Yeah. You're listening to Your AI Injection, brought to you by xyonix.com.
That's X-Y-O-N-I-X.com. Check out our website for more content or if you need help injecting AI into your organization.
That's kind of what I'm asking: if I'm a startup and I'm trying to make the world's best IDE that leverages as much AI as possible, what am I going to do differently than what the OpenAI folks have done with ChatGPT?
Bill Constantine: Yeah, I'm a little bit in Carsten's camp right now, in that I don't see its value in this form quite yet, given that I don't necessarily trust it, and given how it goes about producing its completions. I don't have a solid answer.
Deep Dhillon: Yeah. But do you trust autosuggest for method names? I mean, you...
Carsten Tusk: Yeah, because it autosuggests the method name, and then I have the documentation right next to the method name. I know exactly whether it's the method I want or not.
Deep Dhillon: Yeah. But the reason you trust it is that you can look at it on the fly and agree with whatever it's done.
That's right. But in the case of code that's more elaborate, it takes energy and effort to go in and look. In terms of how we apply this stuff, maybe there's a compartmentalization. We went from something as straightforward as method-name autosuggest
all the way up to, boom, a whole huge block of code. But maybe there's a spectrum in between.
Carsten Tusk: Personally, I don't think we do, and I don't think we should. I do not think writing code is something that should be automated by AI, because I think writing code is ultimately a very creative process.
It's a very individual process for the programmer. It's not like somebody's shoveling sand. I mean, sometimes you're shoveling sand, but most of the time it's creative, and ultimately you're trying to solve new problems in a new way. And even if it's an old problem, you might actually improve on other solutions that are out there.
If all we do is start regurgitating things that an AI suggests to us, we will essentially hold back progress...
Deep Dhillon: Well, I would agree that the kind of code you write satisfies your description. But I would not agree that for somebody who works at a firm that's got 193 drivers for some particular printer,
making the 194th driver is a particularly creative pursuit, or that it can't be accelerated and automated somehow by leveraging some of these capabilities.
Carsten Tusk: If that driver is the same as a driver already in existence, by all means ask ChatGPT how to talk to it or how to write it.
If that driver is for something your company just innovated, a new product that's coming out on the market, and you want to write a good driver for it, then by all means do not.
Deep Dhillon: Well, I think that... but I think the point I'm trying to make is there are a ton of problems that are not creative, that are lame and boring.
They've been done a bajillion times.
Carsten Tusk: I agree. They've been done a bajillion times, but by somebody else. And I think ultimately, I mean, there's also a whole movement toward, you know, what is it, no-code application development, right? Where you enable people who can't code at all to do some visual programming,
or just slide little building blocks around. And I feel like using AI or ChatGPT for code generation is kind of the equivalent of that for people who can partially code, or are lazy. Yes, use it if you have to do something really, really boring. Sure, I might even do that, right? If I have to write HTML and CSS, by all means.
Absolutely. Wait, wait.
Bill Constantine: Take it one step further, toward something that's a bit more data science and machine learning. Have you guys ever used ChatGPT for that? I have actually tested it, and not just for this exercise: how do I efficiently write multiple aggregations in a group-by clause in pandas?
Okay, and have it put out something. And that actually changes depending on the version of pandas you're using, which we know. But I personally found its answers to a lot of those questions to maybe be correct, actually, but very inefficient. So again, I was left with this thought: well, even if it's correct, I have to check that it's correct.
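For reference, one way to write several aggregations in a single pandas group-by is named aggregation, available since pandas 0.25; older versions needed different idioms, which is part of why the "right" answer depends on your version, as Bill says. The DataFrame and column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: two groups with a couple of numeric columns.
df = pd.DataFrame(
    {
        "species": ["a", "a", "b", "b", "b"],
        "petal_len": [1.4, 1.3, 4.7, 4.5, 4.9],
        "petal_wid": [0.2, 0.2, 1.4, 1.5, 1.5],
    }
)

# Named aggregation: output_column=(input_column, aggregation_function).
summary = df.groupby("species").agg(
    n=("petal_len", "count"),
    petal_len_mean=("petal_len", "mean"),
    petal_len_max=("petal_len", "max"),
    petal_wid_sum=("petal_wid", "sum"),
)
print(summary)
```

Whatever a code assistant suggests, it is still worth checking it against the documentation for the pandas version you actually run, which is exactly the verification cost Bill describes.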
And ultimately that's not as efficient as if I were to do it on my own, because I know a better way. Still, I wonder about people exploring territory where they don't know how to put together a deep learning network or something. Would they use something like this? That brings up a good point.
Like, "write me some Keras code to do a convolution."
Deep Dhillon: I mean, the way I see it is, when you're really learning something from scratch, and let's say you're on the more novice side and you're trying to do something new, you spend a lot of time Googling. Oftentimes you're trying to find just the right boilerplate, the right seed code to get started, and it doesn't quite fit.
And then you compile, or you run your test case or whatever, and maybe it works, maybe it doesn't, but usually it doesn't. That process can be time-consuming and tedious, and with this you could get into the ballpark a lot faster. I did some experimenting where I just said, "Hey, generate me a scikit-learn classifier for the flower petal problem."
Just generate that boilerplate code. And it did it pretty quickly. So then I thought, okay, otherwise I've got to go around until I find that one particular example that's sitting out there. I feel like there exists a world where nobody has written exactly what you're trying to do, but there are posts spread out across four, five, ten different places, and this stuff plays a role there. It gets me a head start.
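As an illustration of the kind of boilerplate Deep describes, here is a minimal scikit-learn classifier for the iris ("flower petal") dataset. It is a hand-written sketch, not the code ChatGPT produced in the episode; the choice of a random forest and the split parameters are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Iris ships with scikit-learn: 150 flowers, 4 measurements, 3 species.
X, y = load_iris(return_X_y=True)

# Hold out a stratified quarter of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {acc:.3f}")
```

Boilerplate like this is easy to validate by eye, which is the situation where the hosts find generated code most useful.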
Carsten Tusk: It does, right off the bat. It's what we talked about earlier, and actually that's what I found when I had to write this color-matching histogram code in JavaScript. Does that exist? I actually Googled for it, and I couldn't find anything. Then I asked ChatGPT to write it for me, and I was like, yeah, that looks like a way I could do that in JavaScript.
And JavaScript is something I don't develop in often, but I kind of know the language; the syntactic details elude me sometimes. It was really good to have that sample, to then implement it literally line by line myself, validate that it's correct as I'm doing so, and say, okay, that makes sense.
But now I had an example I could use as a template, right? So I do think it's good for that, and that accelerates things, right? If I had to dig that out myself, I might have ended up with 10 more errors. So that accelerates development. But can I really use it to dramatically accelerate my development and have it actually help me with problems
I don't quite understand? No, the answer to that is definitely no. It is good for things I already know and can validate myself, where I'm just looking for templates because, you know, the last time I did this was three years ago and I've forgotten how the hell it works.
Deep Dhillon: So, would both of you guys agree? Would you encourage folks to experiment with this stuff, play around, and come up with their own ideas on how to use it?
Or, for example, they can wait six months. It doesn't matter.
Bill Constantine: Let me take the example of my son, who doesn't know how to program in Python. He's just starting off in school, and at some point I would totally encourage him: look, here's a problem you asked me about, and I told you I had written code to solve it some time ago.
Why don't you just go off and do that yourself, and have it generate the code? You can learn something from that. I think it's for beginners, and folks looking for just a structure, maybe a little bit lazy.
Carsten Tusk: I agree. I think it's a great learning tool, because when you learn something, it always helps to get many different perspectives on it, and I think ChatGPT can really help you take a problem
and get many different perspectives on it: how to implement it; change some parameters, how would you implement it now; et cetera. So you can see the different variations, and it gets close enough to correct that even if there's a calculation error in there, it doesn't really matter, right?
We're not asking it to write the temperature control for a nuclear reactor. We just want...
Deep Dhillon: ...and have it deploy automatically.
Carsten Tusk: ...funny colors on a Java pin. So no, I think it's actually great for that, and I would even use it for that. If I'm trying to get into a new API or a new system, say I want to do something in React and I have no freaking clue about React,
right? "Hey, how do I create a button in React?" Great, give me some template code. Yep, I would totally use it for that: templates, examples, some learning experience, and then transfer that into my own application.
Deep Dhillon: Let's do this. I know this is going to be a very hard question to answer, but sometimes these ones are fun.
Jump out 10 years into the future, right? All three of us here can look back 10 years, and 10 years before that, and unfortunately 10 years before that, and we can see the big jumps that happened, right? So jump out 10 years into the future: how is AI and machine learning code being written, if at all? How is software being written, and how is it different because of all this new stuff that's emerging at such a rapid pace?
Bill Constantine: Well, first of all, I think we're going to be entering a new paradigm centered on the questions we're talking about in terms of correctness. You know, Carsten stated earlier that with these large language model approaches, you're getting completions that come from statistics and patterns matched against the world's worth of text data somewhere.
There has to be a change in the modeling paradigm they're trying to introduce now, which is: well, you can go ahead and spit something out that's really interesting in terms of its patterns, but it's not correct. How do you change the system so that what you're spitting out is actually
usable and correct from the get-go? Like I say, six months from now, I would love to see, for example, a ChatGPT that's composed of a set of models, but based not on GPT-3 but on GPT-4, which, if you listen to the rumor mill, is headed in the direction of trying to create these corrections, logically, philosophically, et cetera.
So in the future, probably in a couple of years even, I am hoping we can address these kinds of programming challenges, where we start to trust these a lot more. That kind of changes things, to me. Once you've developed a certain level of trust in something, you can start implementing these
machine learning models in your toaster and in your aids. I wouldn't ever want to see it in a nuclear reactor, honestly, but there are going to be levels of trust that I think we can develop as a society in using these tools, depending on how accurate they seem to be.
Deep Dhillon: Carsten?
Carsten Tusk: It's a good question. I don't really know.
Bill Constantine: Would you ever trust a robot to do surgery on you, Carsten? Me...
Deep Dhillon: For sure.
Carsten Tusk: Without human supervision? No, I don't think so.
Bill Constantine: Based on a lot of supervision. It depends on the surgery, the models...
Carsten Tusk: I don't trust a Tesla to drive me either.
No, I would not, because even if you get a 99.9% accuracy rate, there's still that 0.1% chance that you end up dead, or, you know, against a bridge pillar. So no, you don't get guarantees with that. These are not validatable systems. Ten years from now, I think we'll see further growth, because that's what we've experienced so far:
the more data we throw at these things, the closer we come to better performance. Because I feel we are really getting closer to how the human brain works, which still has a couple of billion more parameters than this. So I think we'll see these models get bigger and bigger and bigger, and possibly more powerful.
What's lacking is that step you mentioned, Bill: the validation, and how we can actually validate accuracy. I don't really know what the answer to that is, because we can't control the model. If we can't control the model, maybe we can throw the garbage results away. If we do that, we would need some sort of validation, right?
So can I somehow get the model to produce something and then have an automated step that validates whether or not it's good? And if it's not good, just produce something else. That is also the paradigm of reinforcement learning, right? You take an action, you get a response, good or bad, and then you refine things.
So if we could come up with an automatic validation for what we're building, we could actually use reinforcement learning to fine-tune these things.
Deep Dhillon: That's what I was thinking: if we tell the system to generate some code that does X, let's run that code somewhere, and let's define what X is in a crisp way so that we can actually test it.
And if it's right, what gets generated gets reinforced positively; if it's wrong, it gets reinforced negatively. So maybe we can accomplish that.
Bill Constantine: If we accomplish that, then we're headed in a really interesting direction.
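The "generate, run, test, reinforce" idea can be sketched as a reward function that scores generated code by executing it against a crisp specification. Everything here is a toy assumption: the task ("sort a list of integers"), the test cases, and the names `reward` and `solve` are hypothetical, and a real system would run candidate code in an isolated sandbox rather than with `exec` in the same process. The reward is graded (fraction of tests passed, rescaled to the range -1 to 1), echoing Deep's "some gradation in between."

```python
# Hypothetical spec: "X" = sort a list of integers, defined crisply by tests.
TEST_CASES = [
    ([3, 1, 2], [1, 2, 3]),
    ([], []),
    ([5, 5, 1], [1, 5, 5]),
]


def reward(candidate_src: str, func_name: str = "solve") -> float:
    """Score generated code by running it against TEST_CASES.

    Returns -1.0 if the code fails to compile or define `func_name`,
    otherwise 2 * (fraction of tests passed) - 1, so passing all gives +1.0.
    WARNING: exec() runs arbitrary code; a real system needs a sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)
        fn = namespace[func_name]
    except Exception:
        return -1.0  # does not even compile or define the function

    passed = 0
    for inputs, expected in TEST_CASES:
        try:
            if fn(list(inputs)) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one case just counts as a failure
    return 2.0 * passed / len(TEST_CASES) - 1.0


good = "def solve(xs):\n    return sorted(xs)\n"
buggy = "def solve(xs):\n    return xs\n"   # only passes the empty-list case
broken = "def solve(xs) return xs"          # syntax error
for name, src in [("good", good), ("buggy", buggy), ("broken", broken)]:
    print(name, reward(src))
```

In a reinforcement-learning loop, these scores would stand in for a human label when deciding which generated samples to reinforce or penalize.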
Carsten Tusk: So maybe we see a mix of these models with a new edition of the expert systems from the seventies, which had big rule bases and knowledge bases that would validate things, right?
If somebody claims something about, you know, Bill Clinton... and it goes a little bit toward reference and attribution in the text that GPT writes, right? Right now there is no such thing. But we could have a fact checker: Bill Clinton was president, GPT writes something about President Bill Clinton, and this thing would take that fact and check it against the database.
So, have some validation step. I think that would give the whole thing a little more trust, a little more believability. I think...
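Carsten's fact-checking step can be sketched as looking up extracted claims in a trusted knowledge base. This is a deliberately tiny illustration: the knowledge base, the (subject, relation, value) claim format, and the `check_claim` helper are hypothetical, and the sketch assumes the hard part, extracting structured claims from generated text, has already been done upstream.

```python
# Hypothetical trusted knowledge base: (subject, relation) -> value.
KNOWLEDGE_BASE = {
    ("Bill Clinton", "office"): "President of the United States",
    ("Bill Clinton", "term_start"): "1993",
}


def check_claim(subject: str, relation: str, value: str) -> str:
    """Classify one extracted claim against the knowledge base."""
    known = KNOWLEDGE_BASE.get((subject, relation))
    if known is None:
        return "unverifiable"  # nothing to check against; flag for a human
    return "supported" if known == value else "contradicted"


# Claims assumed to have been extracted from model output upstream.
claims = [
    ("Bill Clinton", "office", "President of the United States"),
    ("Bill Clinton", "term_start", "1989"),
    ("Bill Clinton", "favorite_color", "blue"),
]
for claim in claims:
    print(claim, "->", check_claim(*claim))
```

Output containing a "contradicted" claim could be thrown away or regenerated, which is the "throw the garbage results away" step Carsten mentioned.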
Deep Dhillon: One analogy I can go back into history for: I had an adaptive signal processing class, I think in grad school, and somebody, for fun... I can't remember what we were building.
It was some particular filter or something. Somebody implemented it in assembly, for fun. This is the kind of thing that happened, right? For those listeners of ours who don't know what assembly code is, it's really tedious, low-level code, and doing something simple can take hundreds, thousands, even tens of thousands of lines of code.
And then somebody else in another class had implemented stuff in native C, somebody else in C++, somebody else in Java-like languages, because sometimes students get bored and want to do something more challenging. But ultimately we got to the point, and this is back in the early nineties, where in MATLAB,
whatever you were studying in a textbook or reading in a paper, you could express the mathematical expressions almost line for line, in a very compact way, in a language like MATLAB or R or something. And I feel like, whatever level you think at, if we get to the compact expression of that, then the conversation is no longer about programming, right?
Like, if you're taking, whatever, an optimal control theory class in graduate school, nobody really talks about the code part. You're just talking in math, communicating back and forth with mathematical symbols. With machine learning and AI systems, I'm asking the question: is there an analogy there?
Normal programming involves a lot of verbosity, a whole bunch of stuff. But at some level, if you think about interacting with the computer on Star Trek, it's really: what can you express, and can you get an answer? Is that where we wind up in 10 years, or no?
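To make the "line for line" point concrete: the least-mean-squares (LMS) adaptive filter, a staple of adaptive signal processing courses, is three textbook equations, y = w^T x, e = d - y, and w <- w + mu * e * x, and in an array language each maps to roughly one line. Deep doesn't say which filter his class built, so LMS is a stand-in here; the sketch uses Python with NumPy rather than MATLAB, and the "unknown" filter `w_true`, the step size `mu`, and the signal length are arbitrary choices for a noise-free system-identification demo.

```python
import numpy as np


def lms(x: np.ndarray, d: np.ndarray, num_taps: int, mu: float) -> np.ndarray:
    """Least-mean-squares adaptive FIR filter; returns the final weights."""
    w = np.zeros(num_taps)
    for n in range(num_taps - 1, len(x)):
        x_n = x[n - num_taps + 1 : n + 1][::-1]  # [x[n], x[n-1], ..., x[n-num_taps+1]]
        y = w @ x_n            # y = w^T x
        e = d[n] - y           # e = d - y
        w = w + mu * e * x_n   # w <- w + mu * e * x
    return w


# Noise-free system identification: learn an "unknown" 3-tap filter.
rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.3, 0.2])
x = rng.standard_normal(5000)
d = np.convolve(x, w_true)[: len(x)]  # d[n] = sum_k w_true[k] * x[n-k]
w_hat = lms(x, d, num_taps=3, mu=0.01)
print(np.round(w_hat, 4))
```

Each commented line mirrors one equation; the bookkeeping around them (windowing the input, looping over time) is the part a compact notation or library tends to absorb.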
Carsten Tusk: Of course, a hundred percent. I think that's something we see quite naturally in all software development, and you can see it in machine learning today, right? Look at something like Keras, which is a library on top of TensorFlow. You could just say, hey, the TensorFlow implementation of this neural network, that's your programming.
That thing you do in Keras, where you have like 10 lines that just describe a couple of layers in a network: these are your building blocks. That's almost visual programming; that's thinking on that high level. And I think we'll just continue with that. Tomorrow, basically, the whole thing that today is in Keras will become a new building block.
I mean, modern neural networks are built like that. If you read a paper, you see there's a block here, right? This is the attention block. But the problem is this block has like 300 sub-blocks, and the sub-blocks have sub-blocks. So to go back to the actual implementation, where you're multiplying a couple of matrices, you're already a hundred miles away from ground zero.
So yeah, things will become more and more compositional. We will have validated building blocks that we put together to solve problems. It's kind of like that today already. I think it just...
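As an illustration of the "10 lines that just describe a couple of layers" Carsten mentions, and of Bill's earlier "write me some Keras code to do a convolution" prompt, here is a small convolutional classifier in Keras. The layer sizes and the 28x28 grayscale input shape are arbitrary assumptions for the sketch; it only builds the model and reports its parameters, with no training.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each layer is a reusable building block; the model is just their composition.
model = keras.Sequential(
    [
        keras.Input(shape=(28, 28, 1)),                       # e.g. 28x28 grayscale images
        layers.Conv2D(32, kernel_size=3, activation="relu"),  # 32 learned 3x3 filters
        layers.MaxPooling2D(pool_size=2),                     # downsample 26x26 -> 13x13
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),               # 10 output classes
    ]
)
model.summary()
```

Underneath, each `Conv2D` or `Dense` block hides the matrix multiplications Carsten refers to, while the model definition stays at the level of composing blocks.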
Deep Dhillon: I feel like that was a great summary of 10 years out, what you just had there. I think that's it.
These systems are going to continue to get more and more powerful, but when a human is communicating with another human, we will converge on the minimal amount of effort it takes to communicate whatever they need to communicate. In the old days that might have been a thousand-page thesis, but in the future it might be: prompt your AI system with this, and answer it like this, and boom, there.
Carsten Tusk: Along those lines, I have an interesting thought. Are you guys familiar with the whole cyberpunk literature genre? Yeah, a little bit. William Gibson, you know? Mm-hmm, Neal Stephenson, those guys. Yeah, exactly. A bunch of years in the future, especially in William Gibson's world, the big tech companies and corporations rule the world, right?
Yeah. These models are getting so big and so expensive that they're not going to be democratized. I mean, look at what's going on right now with OpenAI: Microsoft put a $10 billion bet on OpenAI.
Deep Dhillon: Oh yeah. I mean, any day now we're going to be paying, we're...
Carsten Tusk: They're going to be implementing that on Microsoft Azure.
And then everybody who wants to use one of these nice, great new models has to go to one of the big corporations with their big AI and use their output. And you can only hope and pray that it's really the raw output of this thing, and that your data is not being abused, stored, or used for other nefarious purposes.
We are beyond the stage where you can say, wait, I like this, I'll just download this model and run it in isolation on my home system. And I think there's a big danger in that, but that's probably a topic for a whole different podcast. Yeah.
Deep Dhillon: Well, I think this was a fun conversation.
Thanks, guys, for coming on. That was fun, and we'll see what happens. But in the meantime, to all of our listeners: if you haven't played with ChatGPT by now, obviously you should. If you write code or machine learning code, play around, and drop us a note on Twitter or wherever about useful things you find, or spectacularly useless things.
So, all right. Thanks, everybody. That's all for this episode. I'm Deep Dhillon, your host, saying check back soon for your next AI injection. In the meantime, if you need help injecting AI into your business, reach out to us at xyonix.com. That's X-Y-O-N-I-X.com. Whether it's text, audio, video, or other business data, we help all kinds of organizations like yours automatically find and operationalize transformative insights.