Your AI Injection

Is AI an ally or adversary? Get Your AI Injection and learn how to transform your business by responsibly injecting artificial intelligence into your projects. Our host Deep Dhillon, long term AI practitioner and founder of Xyonix.com, interviews successful AI practitioners and domain experts to better understand how AI is affecting the world. AI has been described as a morally agnostic tool that can be used to make the world better, or harm it irrevocably. Join us as we discuss the ethics of AI, including both its astounding promise and sizable societal challenges. We dig in deep and discuss state of the art techniques with a particular focus on how these capabilities are used to transform organizations, making them more efficient, impactful, and successful. Need help injecting AI into your business? Reach out to us @ www.xyonix.com.

Plagiarism 2.0: ChatGPT, AI and Generative Content Concerns

December 16, 2022 • Deep • Season 2 • Episode 12

ChatGPT, OpenAI's most recent AI-driven text-generating chatbot, is changing our definition of plagiarism. In this podcast episode, Xyonixians Carsten Tusk and Bill Constantine join host Deep Dhillon to dive into the impact of ChatGPT on original text-based content creation. They question the notion of individualized thought ownership, whether or not it's possible to build a model to accurately detect plagiarism given just text, and pontificate on how far you can edit content before it may be considered "plagiarism". Join these AI experts for an insightful and innovative discussion about the future of plagiarism and ChatGPT.

Listen to our last podcast about ChatGPT here!

Dig deep and learn more in our articles on ChatGPT3 and Large Language Models:

ChatGPT: How long will it take for a cup of water to boil if I yell at it? https://lnkd.in/gy56vMba

[Automated Transcript]

Deep Dhillon: Hi there. I'm Deep Dhillon. Welcome to Your AI Injection, the podcast where we discuss state-of-the-art techniques in artificial intelligence with a focus on how these capabilities are used to transform organizations, making them more efficient, impactful, and successful.

Carsten Tusk: I think before we start, we should actually define what plagiarism is, so we are all on the same page, right? It's only plagiarism if you represent something as your own expression or your own original work. I guess if you don't attribute it to somebody, one could claim you present it as your own work. But as long as you say at the end of your article, "these thoughts are not my own," or "this post has partially been generated by AI," technically it wouldn't be plagiarism.

Deep Dhillon: But how much of what we say or do... like, let's rewind 30, 40 years, right?

Every human thought they were pretty original back then. We genuinely thought, okay, you go out, you read a bunch of books, you assimilate the information, and whatever comes outta my mouth is mine. But in this world where, you know, 8 billion of us are tightly coupled, can you even say that anymore?

Can you even say that your own thoughts are that original? I mean, we all have a million experiences where the right people with the right background come up with the same thing. It's like, they both came up with calculus.

Bill Constantine: I think, just in terms of the creative aspects of our brains, we're influenced by people. You see it in the music industry.

People say, hey, you know, you stole this riff from Led Zep. And it's like, well, how many different riffs are there in the world that are unique and that humans actually like?

Carsten Tusk: And at what point are they unique? How different do they have to be to be unique, right? Because especially in music, styles influence each other.

Somebody makes a riff that sounds almost like Jimi Hendrix, but it's not quite Jimi Hendrix, and then he suddenly is a great guitar player and not a plagiarist, right? So what makes things unique? And, to Deep's start, I think the question should be: what is not plagiarism?

Deep Dhillon: So what is not, I feel like, is a little bit easier to answer.

If I put together a series of sentences and take them sentence by sentence, and I can query the vast historical reservoir of the world across ideally every language, and not just query for the exact syntax or spelling or whatever, but also encompass the semantics, then if I come back with nothing, I feel like that's not plagiarism.

Carsten Tusk: Oh, you had a bad search.

Deep Dhillon: Well, that is kind of part of the problem, isn't it?

Bill Constantine: When you submit a paper to, you know, certain peer-reviewed journals, there is due diligence by some of them to see if you've actually plagiarized material. So it comes down to how good your search engine is, right? And I would say I feel pretty clean in thinking that I came up with a lot of original thought in the stuff that I've written in the past. But the reality is that I'm totally influenced by all of the history of other people's writings, the style that they use. Oh, I like the way that they phrased that. So is it plagiarism if I grab sort of the idea from someone, but maybe change the words around a little bit?

Deep Dhillon: To me, technically, universities are gonna say, nah, now you're in clear plagiarism territory.

Bill Constantine: Right. So if I do that for a blog article, I don't feel bad, especially if I reference the thing. But when you do that for a technical article that's going into peer review, you know, it's supposed to be innovative and original and so forth, and they're much more scrutinous there.

But the reality is it's a fuzzy line, really, I think.

Carsten Tusk: I think there you also have to differentiate between, like, news outlets and people publishing news or writing articles, which I think is probably 80% of plagiarism, right? Because there's an event happening somewhere.

Reuters releases a little statement of what we have...

Deep Dhillon: 99.9% of plagiarism.

Carsten Tusk: I mean, you know, a hundred newspapers pick it up and, like, rewrite it.

Deep Dhillon: I know, and we accept that. I mean, we accept it as it is what it is. They don't even bother to reference. Best-case scenario, there was one reporter on the ground.

Right. Often there are zero reporters on the ground.

Carsten Tusk: You say they don't even bother to reference; they don't even bother to write the truth. But that's a different story.

Deep Dhillon: Well, that is a different story. But let's get it to ChatGPT, 'cause I think this is the crux of the matter. ChatGPT and, you know, other large language models are sucking down more info than any single human can in multiple lifetimes, if they have them. And it's doing it in a way that is authentic in some sense. I would argue it's authentic in the sense that it's genuinely trying to, quote, read what it's seeing.

And you could argue: how different is that entity answering a question just based on what comes to mind, in the same way that we would? I feel like putting in some reference checking after the fact maybe gets us to a place where we could live with it, but it seems like the very notion of individualized thought ownership is in question here.

Bill Constantine: Well, I love what you said not long ago when you started this conversation: 30, 50, whatever years ago, everybody thought they were so original. And these models, just to be absolutely clear, differ in one giant respect. They said, well, instead of building a model for the medical domain, and for poets, and, you know, all these different domains, we're gonna take all the data in the world in modern history and we're gonna use that text as a means of really capturing what it is to communicate as a human being to one another, for all of the world.

So you no longer have these pockets of division where somebody's an expert in poetry or an expert in something else. You're now literally absorbing the world's worth of information going back however many decades, across individual disciplines. So this idea of plagiarism, or even these completions that are coming out of AI21 or out of GPT-3 or out of all these large language models: they're just this giant consensus of text information, in a way that's almost like plagiarism.

Carsten Tusk: Well, it is, because if you define plagiarism as representing something that is not your original work, then by definition what GPT-3 does is plagiarize, because there is no thought there. There's no logic, there's no reasoning, there's no thinking about what it says. It basically just recombines facts in patterns that it has learned from what it has seen.

Deep Dhillon: Yes. And how is that different from anyone reading 50 different papers and books, and then being asked to author something based solely on those 50 papers and books, and saying it's not plagiarism? I mean, the main thing that comes to mind is citation: we will actually cite when we get really close to a particular idea.

Carsten Tusk: Well, we also apply logic and reason about what we have read. So say you have 50 books, and 30 state one thing, 10 state another, and another 10 state something else. Then as a human you look at that and you try to combine it, and you might come to the conclusion that they're all wrong. You might say they're all wrong because they can't agree. Or you might say, well, you know, the majority thinks that, but I have a feeling that this particular statement here makes more sense, and you might interpret that. GPT-3 will not do that.

It will not reason about it. It can't reason about it.

Deep Dhillon: I don't know. I mean, it certainly looks like something reasoning about things. Everything you say about stack ranking the frequency of occurrence of particular positions and stuff, that's all statistics, and that's accounted for. But humans are doing statistics when we do the same thing.

I've heard that nine times, so I give that a certain amount of weight. One of the three times that I heard something else happened to come from somebody way more important. I'm going through that process.

Carsten Tusk: Humans also do philosophy. So go back to some old Greek philosophers who basically talk about making certain assumptions and then trying to reason about them.

You can try that with GPT or with ChatGPT. It'll fail miserably. That level of reasoning does not exist.

Deep Dhillon: I don't know, I've had some pretty involved conversations with it.

Carsten Tusk: Yeah, but it doesn't. Believe me, I literally tried that.

Deep Dhillon: It definitely reasons, like, in some cases it does.

Carsten Tusk: It does not. You can tell it simple things like: assume a porcupine is blue, and blue things can fly. Can a porcupine fly?

It will not make that leap for you. It will not, you know, give up on what its knowledge base knows. It will just insist that porcupines can't fly. That's a very simple example.

Bill Constantine: I think we're also touching upon one of the advances of ChatGPT versus older large language models. Deep and I just discussed this not long ago: in the GPT-2, GPT-3 world, it was just very fun to play with.

You could see how it did language translation, you know, exceptionally well, for example. What was missing was you being able to believe everything that it spit out. You thought, wow, that sounds very human-like. That was kind of the part that freaked you out. But then you would get answers to things like, you know, what's larger, a pizza or the...

And sometimes it would say, like, the pizza is. And we all know as human beings, I don't know at what age I knew, maybe four years old, five years old, that that's a physics issue. That's a perspective issue. You're simply looking at something right in front of your face versus something that's, you know, tens of thousands of miles away from you.

And so it doesn't have those patterns of physics, or the rules in physics and so forth. But even when you take things that are sort of like laws of physics, there are also these subjective materials, like saying that Hitler was just this wonderful guy. Things that people find objectionable.

But what they've done now is they've introduced a layer of human intervention on top of this, which is reinforcement learning, which allows humans to actually read these outputs from the old model and stack-rank them, to say, you know what, you gave me 20 answers for this, and it turns out that that's not the way most humans think and feel.

So it's still not sentient.

Deep Dhillon: Yeah, definitely.

Bill Constantine: It's not sentient, but it sounds human.

Deep Dhillon: It sounds better. Yeah. And just to push on, um, Carsten, on your point about the lack of the ability to see the physical world. I mean, rewind six months ago.

Yeah, that was definitely the case. But with ChatGPT, if nothing else, it's creating a good illusion. Like, I had a really involved conversation with it about how long it would take to yell at a cup of tea to make it boil. This was up there with some pretty high-quality grad school, late-night drunken conversations, and this thing was killing it. You know, it picked a position and it stuck with it, and everything was sane.

It kept coming back to a lot of different physics principles that we were touching on, and it was making a coherent argument that lasted a while. In the show notes, we'll post that conversation. But I definitely get that we know there's a gap in, like, a physical understanding of the world and how these things are trained.

Carsten Tusk: Well, not even physical, but metaphysical understanding, right? It doesn't deal with metaphysics. It doesn't deal with, you know, thought experiments or philosophical assumptions or things like that. There's no doubt that the thing is just an amazing feat of technology, and the conversations that are coming out are nothing short of really fascinating.

But there are certain things that are missing. And the other big thing is that it often just makes shit up.

Deep Dhillon: Well, let's take it back to plagiarism. So there are a few folks that are now making models to specifically detect ChatGPT, GPT-2, GPT-3, whatever content. And so somebody, you know, took this and started taking some ChatGPT output and putting it into this model.

So I sat down and I thought: let's assume you're a professor and you've got a bunch of essays coming in. As of last Wednesday, I don't think you can assume the bulk of your people are not using this anymore. I mean, I showed this to my daughter or friends, and they were all taking notes immediately.

They were like, we're on this, man. And so you're gonna go to standard, you know, anti-plagiarism tools that are looking for little references. So I tried a couple of directions. One direction is I just went to ChatGPT and I said, you know, write me an essay about blah. I take this essay, stick it into this model, and it comes back 99.8%, like, GPT stuff.

So I was like, okay, well, let's break it down a little bit. Let's start messing around with it. I had this theory that there are a few things going on. One thing that I think is going on is people make mistakes when they actually write something. Like, they don't use exactly one space between a period and the start of the next sentence.

So I threw in a couple extra spaces, you know, inspired by whatever Elon Musk's scam was to figure out who was leaking information. I dunno if you guys followed that. [Crosstalk: No, I didn't.] Oh yeah. So apparently at, I think it was Tesla, somebody was leaking information, and he had authored some texts.

And I don't know if this is true or rumor: everybody got a different text. Everyone got a different one, with a different encoding of white space between words and sentences. And then they figured out exactly who it was. Kind of a clever approach. So anyway, I did that.
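The whitespace trick described here can be sketched in a few lines of Python. This is only an illustration of the general idea, not the method from the rumored story: the function names `fingerprint` and `recover_id`, and the choice of one space for a 0 bit and two spaces for a 1 bit, are assumptions made up for this example.

```python
import re

# Hypothetical scheme (assumed for illustration): the gap after each sentence
# carries one bit of a recipient ID. One space = 0, two spaces = 1.
GAP = re.compile(r"(?<=[.!?]) +")


def fingerprint(text: str, recipient_id: int, bits: int = 8) -> str:
    """Return a copy of text whose sentence gaps encode recipient_id."""
    sentences = GAP.split(text.strip())
    if len(sentences) - 1 < bits:
        raise ValueError("text needs at least bits + 1 sentences")
    if not 0 <= recipient_id < 2 ** bits:
        raise ValueError("recipient_id does not fit in the given number of bits")
    marked = sentences[0]
    for i, sentence in enumerate(sentences[1:]):
        # Gaps beyond the first `bits` gaps stay as a single space.
        bit = (recipient_id >> i) & 1 if i < bits else 0
        marked += ("  " if bit else " ") + sentence
    return marked


def recover_id(leaked: str, bits: int = 8) -> int:
    """Read the ID back out of the sentence gaps of a leaked copy."""
    recipient_id = 0
    for i, gap in enumerate(GAP.findall(leaked)[:bits]):
        if len(gap) >= 2:
            recipient_id |= 1 << i
    return recipient_id


# A nine-sentence memo gives eight gaps, enough for an 8-bit ID.
memo = " ".join(f"Point {i} is confidential." for i in range(1, 10))
marked_copy = fingerprint(memo, recipient_id=178)
print(recover_id(marked_copy))  # prints 178
```

The same sketch shows why the signal is fragile: collapsing every run of spaces to a single space erases the ID, which is consistent with how quickly small whitespace edits moved the detector score in the experiment described in this conversation.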

My 99.9% starts dropping pretty fast based on just white space inclusion. [Crosstalk: So what did it go down to?] If I changed a couple of white spaces, it was still in the high nineties, like 99s. But then I went the other direction too. I took some text that I had actually authored, put it in, and it said not ChatGPT, not GPT. But it wasn't down at zero; it was at, like, 55% or something. So then I took a phrase, like, you know, we use terms on our website like "AI consulting technology." So I took out the AI part so that it was less specific, and it would jump up and be more, you know, potentially viable.

So I think there's a bunch of things that the model inherently hones in on.

Bill Constantine: Making it less pristine, a little bit more human, a little bit more messy, tends to...

Deep Dhillon: Yeah. Also making it more general.

Need help with computer vision, natural language processing, automated content creation, conversational understanding, time series forecasting, customer behavior analytics? Reach out to us at xyonix.com. That's x-y-o-n-i-x dot com. Maybe we can help.

But the larger question I would ask is: given just text, can you actually build a model to detect whether this thing is authored automatically or not? And what kinds of things would it hone in on? [Crosstalk: I don't think so.]

Bill Constantine: The things that you just mentioned. ChatGPT is probably not gonna put out things with a ton of junk in them, given that there's this extra human layer that's been applied to it now. So when you do more human-like, kind of stupid things with spaces and so forth...

Deep Dhillon: I put slang in there, and the slang dropped my, uh, plagiarism score way down.

Carsten Tusk: Well, you know, but those are all fake detections, right? I mean, at that point you're just detecting bad writers, and not whether or not it's ChatGPT.

Deep Dhillon: Yeah. I mean, and also you can get ChatGPT to sound like a bad writer too. So then I was messing with that. I'm like, hey, sound like somebody who regularly makes grammatically incorrect statements, try to misspell some stuff, just seem like an ignorant buffoon, and talk about blah.

And I put that in, and yeah, the detection was a lot lower.

Carsten Tusk: Yeah. It's like you just trained a model to learn all the different writing styles, including abnormalities, of the whole world, and now you're trying to reverse engineer that again and make it distinguish itself from that.

I don't think it's possible. I think this thing can produce text in any style that is represented on the internet in significant amounts. And it learned that. [Crosstalk: I agree.]

Deep Dhillon: I mean, that was my takeaway too: we can come up with something that will catch the lazy plagiarizer, but, you know, the motivated plagiarizer, I don't know that you're gonna be able to catch.

Bill Constantine: Well, what we don't have here is this. We know that these models take a prompt, which is, you give it a few examples of the type of responses you're looking for, those few-shot learners. And then it also has all these parameters that you set. You know, how wild do you want the completions to be, et cetera.

Not knowing the parameter sets behind what you're seeing in ChatGPT makes it impossible to reverse. Theoretically, if you knew the settings behind the scenes, and some kid wrote a paper for a class, the professor could say, well, I'm gonna take some of the things that he wrote and put 'em in as a prompt, see what GPT-3 generates, and I'll compare the two writings. And if they're super similar, I know that's a strong hint that you probably plagiarized the thing. But you don't know, right? You don't know those settings in general, so you can't possibly reverse engineer it. I mean, you could conceivably take, like...

Deep Dhillon: Go ahead. Go ahead and finish the thought.

Bill Constantine: Well, there's this idea. I come from a nonlinear dynamics world, and it's the same thing: you observe this output from this very complicated system. Can you reverse engineer it and find the parameters of the model that governed those solutions and produced that output?

This is the same thing. We see this output, and can we reverse engineer the parameters that went into this model that could actually generate the output that we see before us? Maybe it's in the style of writing, you know, of Kerouac or whatever. It's very, very, very difficult to do that.

I would say almost impossible.

Carsten Tusk: Given chaos theory, I would say the answer is no.

Bill Constantine: Um, but there's also that, right?

Deep Dhillon: How would we... let's say that we wanna build a plagiarism detector, right? And let's have the thought experiment. Let's think about it first without reference text.

So we can't just have a Google-like index of everything actually said. And even that's suspect, 'cause even that's gonna have stuff that isn't real. But just what would we do, and how would we go about it, even formulating the problem?

Carsten Tusk: I would say without reference text, you wouldn't. It's impossible to detect plagiarism without reference text.

[Crosstalk: I would agree.] If you had a good search engine and you said, okay, we define plagiarism as something that is replicated and also known to the particular corpus of this search engine, then you could simply search for it. I think that's honestly the only way to reliably detect plagiarism, and then you just have to figure out at which point of overlap you define something as plagiarism.

And even that is a difficult question, right? Is a single sentence enough? Does it have to be multiple sentences? Does it have to be the same context?
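The search-and-overlap idea can be made concrete with a small Python sketch. Everything in it is an assumption chosen for illustration, not something proposed in the conversation: the reference corpus is a plain dictionary, overlap is measured as the share of the candidate's word n-grams that also appear in a reference document, and the names `shingles`, `containment`, and `flag`, along with the default threshold of 0.3, are hypothetical.

```python
import re


def shingles(text: str, n: int = 5) -> set:
    """Return the set of lowercase word n-grams in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(candidate: str, reference: str, n: int = 5) -> float:
    """Share of the candidate's n-grams that also occur in the reference."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(reference, n)) / len(cand)


def flag(candidate: str, corpus: dict, n: int = 5, threshold: float = 0.3) -> list:
    """Return (doc_id, score) pairs at or above threshold, highest score first."""
    scored = [(doc_id, containment(candidate, text, n)) for doc_id, text in corpus.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy reference corpus standing in for the search engine's index.
corpus = {
    "doc_a": "the quick brown fox jumps over the lazy dog near the river bank",
    "doc_b": "completely unrelated words about cooking pasta with tomato sauce tonight",
}

verbatim = "the quick brown fox jumps over the lazy dog"
swapped = "the fast brown fox leaps over the lazy dog"

print(flag(verbatim, corpus))                     # only doc_a is flagged
print(containment(swapped, corpus["doc_a"]))      # 0.0 with 5-word shingles
print(containment(swapped, corpus["doc_a"], 2))   # 0.5 with 2-word shingles
```

The two knobs, shingle length `n` and `threshold`, are exactly the open questions raised here: a verbatim copy scores 1.0, while swapping two words for synonyms drops the 5-word overlap to zero, so a purely lexical search misses the kind of rewording discussed in this episode unless it also captures semantics.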

Bill Constantine: But if I write an article, Carsten and Deep, and I submit it to a physics journal, and one year later I write another article in another physics journal, and I take a sentence from article A and I just change some words and put it into article B, am I plagiarizing myself?

Carsten Tusk: Right. If you don't quote yourself properly, you do.

Bill Constantine: Yes. Let's say I do. Let's say I do reference myself.

Carsten Tusk: No, you're fine. And it's not plagiarism. So, same thing.

Bill Constantine: I agree with you. I just think it becomes super fuzzy with these LLMs. I mean, it's almost like a moral thing.

Carsten Tusk: I'm digressing from Deep's detector question here.

But the question to me is also: does it really matter whether or not something is plagiarism if it was generated in this shape or form? I mean, in academia, if you go and publish something as a new document or a new research paper, they already do check if it's plagiarism, and if you didn't quote something, it'll probably come out.

Right. And we stated earlier that if you are in the news article sphere, nobody cares.

Bill Constantine: Nobody cares in the first place. It's entertainment.

Carsten Tusk: And social media. Social media and blog posts, people care even less. So is plagiarism a big issue? Do we really think people will use this to... I mean...

Deep Dhillon: I think in academics it's a big issue.

Wherever you wanna attribute individual... like, if you want to give somebody credit as the creator of this truly individual thought, school is probably the most obvious place where they care about that. In company settings, I think people care about it a lot less.

I don't know, my take is that people generally just don't want to get caught for plagiarism, right? It's not that they ethically... It's one of those things that's hard to define. You know, we all know that if you just grab somebody's essay and put your name on it and turn it in, that's bad.

Everyone can agree on that. But then on the other end of the spectrum, if you...

Carsten Tusk: But wait, hold that thought a second. That's only bad because you claim to have done work that you actually didn't do and are rewarded for it. So I think plagiarism is not just the fact that you replicated something; it's the fact that you misrepresented your effort and got rewarded for it.

So I think it's linked to that reward system. In other words, if you make a career by plagiarizing people, that's bad. If you plagiarize things just to spread content, who cares, if you don't really get rewarded for it? I guess that's what most people do when they try to search-optimize their website.

Right?

Deep Dhillon: Yeah. But let's take the same scenario, and let's shrink it down. Let's say it's a paragraph of text. This student grabs it and puts it in there. I'm just trying to create the spectrum here.

So now, instead of just putting their name on it, they take every single sentence and they do some basic synonym swapping and some basic grammatical rearrangements. The general sequence of sentences is the same. I think we're all still gonna basically say that's plagiarism. We're gonna get it on that spectrum.

Carsten Tusk: Well, you get a couple points for actually changing the sentences, right?

Deep Dhillon: It's like, all of us took, you know, CS 101, and there was always that kid that grabbed somebody's code and just changed all the variable names.

Carsten Tusk: Well, actually, in coding it's good practice, because you're taking a piece of validated code, right?

Bill Constantine: Sure. Well, here I think we're touching on something that's super important. You know, a negative view of this is, kids today are gonna abuse this. They're not gonna learn anything. But if you think about it positively, it's like, well, now this kid that maybe is not that great of a writer can be influenced by the world in different styles of writing, and can actually learn, I don't know, how to write things correctly: the structure, the form, how you build a story.

Deep Dhillon: Yeah, hang on to that. I think that's super important. 'Cause I wanna finish my sequence of thoughts. So we're still in agreement that it's generally plagiarism. But now let's go up one level.

So now the student took that paragraph. They didn't just swap out words for synonyms and do some grammatical rearrangements. Now they kind of took the general idea of each sentence. They went out, maybe they did some searching, they came up with some original ideas, and they're like, eh, I'm gonna delete this one, I'm gonna rewrite this here, I'm gonna do this one. And then maybe one or two sentences are sort of in the semantic ballpark of the original. But this thing looks quite different, and they've assimilated a lot of different sources. Is that plagiarism? Is that enough to throw a kid outta college for at that point?

Bill Constantine: Well, he's certainly pushing away from what we might consider to be, you know, a slam dunk, right? Putting in effort...

Deep Dhillon: What are we... are we down to those two sentences that are still semantically quite similar? Is that what we're down to, in terms of throwing 'em outta school?

Carsten Tusk: I think plagiarism goes back to the amount of original thought that you put into your work.

And I can't tell you where the threshold is, at which point you have put enough original thought into your modification of somebody else's work that it becomes a valid piece of your own. I think the same thing is true not just for text, but also for art, right? At what point does plagiarism become art?

If you take an image or a painting that you like, but, you know, you don't really like certain aspects of it, and you think, well, I can do better, but I really love that scene, and you redo it, is that plagiarism?

Deep Dhillon: Yeah. I mean, it's like when rappers started, you know, lifting rhythms and riffs. The Beastie Boys, like, Paul's Boutique comes to mind.

They had all kinds of stuff totally yanked outta context, put in a song, and then just popped all over the place. Well, everybody said, clearly, that's an original sound. That's original thought. But, you know...

Bill Constantine: But they presented it to an audience who would never have heard that material.

So this new audience of younger people who are into this are like, oh, that's really cool, blah, blah, blah. And then you hear it and you're like, dude, that was totally ripped off.

Deep Dhillon: So I think here's the kicker, right? They just made their album. You know, a lot of hip hop and rap artists just sampled.

I think this started in, like, the late eighties or something. And then they just left it to the studio, you know, to the record label, to go figure out who to pay and who to compensate and how to do it. And I think that's the key here. Maybe this analogy makes sense here. The rappers basically just, boom, artistically, completely liberally, go to town, put their stuff together, and maybe they wind up with seven words from this artist and a tenth of a second from that artist. And now someone else chases it down, figures out who got lifted, how to give credit, how to write checks, how much the checks should be. I think that's how it worked out.

I don't know, some music expert can tell us. But maybe that's what we need here: you just lift and you do your thing, and then something else figures out how to attribute.

Carsten Tusk: And it brings us back a little bit to the definition of plagiarism, where they basically say that it's the representation of another person's language, thoughts, ideas, or expressions as your original work.

As long as you give that attribution and you correctly say, this is a song originally by so-and-so, I'm redoing it, this is a cover, this is a rendition, it's no longer plagiarism, because you gave the original credit and you leave it up to the audience or the consumer to decide which version they like.

Deep Dhillon: So in the case of an automated tool, like a reference maker if you will, you go to ChatGPT. So in our new arena here, what we're saying is you go to ChatGPT and the new tools and you get whatever the heck you want, you do whatever the heck you need to, you go out and just write your thing, and you're just gonna rely on the quality of the tools to find references for.

Well,

Carsten Tusk: you don't even need to reference us. You just need to say that it's not your

Deep Dhillon: original work. I know, like, isn't that, that's a funny thing. Exactly what the professor's asking you in college. Oh no. Your original work that

Carsten Tusk: No. That, that is a problem. So in an academic scenario or where you have students using it to do their work?

No. That's literally plagiarism. If it's not their work and they turn it in, no matter who generated it, it's plagiarism by definition. Right. If you didn't do it yourself, it's plagiarism. That's how it is. You were delivering something. You claim it's your original work, but it wasn't. You typing the prompt into ChatGPT.

Does that make it your original work? I don't think so. Well, what if you don't? You

Deep Dhillon: put in one prompt. What if you're putting in 50 prompts and you're going back, it's still not your original work, but why is it not a legitimate seed from which you can diverge?

Carsten Tusk: That's a good question.

I don't really have an answer to it. How different is it from you sitting in front of Google for three days and, you know, typing the same queries in Google, getting documents back and taking excerpts of them, rephrasing them, and then putting them into your essay? That, in my opinion, is also exactly the same thing that ChatGPT does in a way, right?

So you could do that labor manually. You could go to Google, execute all your queries on Google, read like 300 documents, then write an essay on Abraham Lincoln. Or you can ask ChatGPT, um, the same question. From an educational point of view, I would say if you did that with Google, you learned a lot more than what you put into your essay.

So it was a more beneficial effort for the student. And in the end, work is rewarded, right? Not just the end output, at least in a school scenario. I don't

Deep Dhillon: know if you learn more, like, you know,

Carsten Tusk: like, oh, if I read, if I read, like if you ask me a specific question about Abraham Lincoln and I have to read through like, you know, 30 different articles about.

I learned so much about Abraham Lincoln, that it's not about the subject or the aspect of him that I'm currently writing about, that I really expanded my horizon after I go through all that stuff. Well, okay,

Deep Dhillon: but let's talk about the flip. The flip is you used ChatGPT and you got a lot of like high stack, um, thought that you were able to navigate cuz ChatGPT did the reading for you on the originals, but you were able to dive so much deeper because into that one aspect, into

Carsten Tusk: the one aspect.

So it's, it's a, it's a more concentrated thing. Feels, it's a more concentrated thing. You wouldn't have expanded your horizon. You just went deeper on one particular little

Deep Dhillon: subject. I mean, the, the analogy feels to me like, imagine I'm like insanely wealthy and I'm a student and I have access to people that I can assign to go do readings for me.

So I go and I have to write an essay on, to use your example, Abraham Lincoln and what. I say, okay, you read these 30 articles, you read those 20 articles, and then we sit in a room, we just talk. I got all that information assimilated, and I put together some original thinking that I call original, except for debate whether it really is, cuz now ChatGPT

Bill Constantine: ChatGPT3 in this context provides a very polar.

Filtered view of the thing you're ultimately seeking to write, and it takes away all of the 50 articles that you search through, through Google. You will learn a lot more than you actually need to write on a very specific topic, and ChatGPT3 provides a framework for you in this case.

On how to go about writing. But I think ultimately it's going to provide inspiration to people that are not very good writers, but can't really find structure. And I think if they were just to use whatever was, uh, spat out by ChatGPT3 as their own work, that is definitely plagiarism.

But if they use it as inspiration to go off on a particular paragraph and do more investigation. Write it in their own voice, change it in a way where they essentially feel that is their own thoughts on the subject. Then now we're getting away from something that's plagiarized.

Carsten Tusk: I think it's almost a different topic from plagiarism itself. The, the potential threat this poses to the educational system and, and assignments, right? Because whether or not it's plagiarism or not, you also have to ask yourself, um, what is the intent of that, of giving a student that job to write that essay?

Is it really the end result or the process? And it's about the process and learning how to write an essay and, you know, formulating, uh, coherent sentences and things like that. I honestly don't think, though, that ChatGPT is a problem. It has changed the landscape. There will be way less, like, homework essay writing, um, assignments, because you cannot prevent your students from cheating.

So I think the solution to that problem is just like, don't do it anymore. Don't have them make things, uh, do things they can solve with ChatGPT. Well,

Deep Dhillon: like let's say we're all, I don't know, literature or philosophy or whatever, professors, and we're coming up with homework assignments and they are at home, not in class.

What can we do to encourage all of that original thought, that exploration, that research? Fully let them use ChatGPT. Either we just accept that they're gonna do it anyway, or we embrace it. That's my attitude. Like what? Or you can take assignments, can you

Bill Constantine: give? You can take a class. You can take a classroom if you're teaching some sort of English writing, maybe an English writing assignment.

You can have people write in class on the spot and submit their papers. You know that they haven't had access to ChatGPT3 there. They're writing their material, and then they have maybe some sort of review, you know, some peer review. It's almost like we do that,

Deep Dhillon: but I'm talking about the homework content.

Like there's a difference between a one hour in-class thought process, I get it, and somebody going out and writing a, you know, a nine-page paper on something.

Bill Constantine: It's the same thing that happens in math, right? You teach them the basics, the components that they can use to do the work on by themselves, and you test them on those things.

And then later on in life you say, by the way, you'll probably just use a calculator or a computer to do what you just learned. And cuz no one in the world sits down and does things as we've just done, everybody in the world, and nobody is differentiating or, you know, or so to a certain degree.

You teach them the, you know, the art of actually creating something and the structure in which they were going to create that thing and whether they've done so correctly or not. But then when they get out in the real world, everybody does things as efficiently as. And ChatGPT3 and, you can argue, any aspect of technology makes us inherently a bit lazy and a bit less creative because, uh, ultimately a lot of the technology we create just makes our lives a lot easier.

Simply, I mean, I would be inclined, but no one uses the abacus anymore.

Deep Dhillon: I would be inclined to do the opposite. Like I would be inclined to say, use ChatGPT, use it. And I would be inclined to take really like things that I know are ambitious and difficult to do with just ChatGPT. Like I want you to write like a 20 page paper.

Um, but I want it to be, I want you to take this specific idea from the 14th century blah, blah, and that specific idea from this other worldview. Specific because I know that these things are good generalized learners and they can help you. You can take it up a level away from instances to generalize concepts and you'll get some benefit, but I would just tell 'em, blatantly use it.

Carsten Tusk: That's the same thing that actually happens in math education. Um, you know, when you first start off, you're not allowed to use a calculator, right? Mm-hmm. You're basically, you're learning the basics. You're learning the principles. You can't use a calculator in class. Later on in my graduate studies.

Use whatever you want. Use a calculator, use a computer use. You can do whatever you want. You just need to solve the problem, right? So it's kinda like this slope of, um, giving them more and more freedom to solve the problem at hand, but you adjust the difficulty of the problem to the tools. And, and that's exactly what you just said.

No, I think

Deep Dhillon: that's exactly what you do, right? Like, I mean, I hated getting open book, uh, take home exams. They were the worst, like in grad school. I know that six days of my life are totally and completely gone. Like, please just gimme a one hour in class exam. Like I do not want to do this. And I know that I'm gonna be like up till like four or five in the morning cuz these things are hard.

And you know, and I don't know, back then it wasn't like sitting on the internet where you go Google that particular professor's exam from whatever. I feel like the educational establishment needs to really embrace these capabilities and be clever and figure out how to get their students to leverage it.

Cuz in the real world, like, we want you to use all tools that are available, including ones that help you with high stack strategic thinking. I mean, I firmly believe, like, if you're exposed to something that's this smart, it's like having 50 grad students to ask questions to, like, how is that not gonna ultimately raise

Bill Constantine: the bar with you?

I actually, I tend to think positively like that. I think that this is not gonna be a tool. Certainly it's gonna be used as a tool for certain people to flat out plagiarize things, but frankly, I think if you're that type of person, that's probably gonna be that way no matter what. If you're the type of person who's actually interested in learning and interested in bettering yourself, you will use this as a tool for inspiration.

You'll be standing on the shoulders of giants just as those people before you stood on the shoulders of giants, just as those people, um, before you stood on the shoulders of giants and so forth. Technology always makes our lives easier, but your inspiration to actually learn a subject and better yourself will continue to improve this.

Just frankly, this just raises the bar for teachers in a way. It takes something that's already a very subjective experience and really puts the onus on them to be able to say. I can't tell whether this kid actually did this on his own or not. Think

Carsten Tusk: a little bit about, if you really wanna test it, you could still do these homework assignments.

You can let them use whatever tools they want to write them, and then when they turn them in, you make it a little bit like a thesis defense. Yes. In other words, you, I love that idea. You read this thing and you ask in particular questions about certain paragraphs that they wrote, and then you ask them why they claim that and what else they can tell you about it, and that comes back with a blank.

Then

Deep Dhillon: that's beautiful because at that point, even if ChatGPT wrote everything and they wrote nothing but they. It

Carsten Tusk: if they know the subject matter and they have done the research, but it gets difficult, that only tests them about the content of what they wrote, not how they wrote it. So if you're an English, uh, literature professor or something, it becomes still difficult because there it is all about the way that you write something and not so much about what you write.

Oh, I disagree. I disagree. Well,

Deep Dhillon: in a way, I mean, come on. Let me ask you something else though. I'm gonna tell you, like, so I have my son, when he was maybe sixth grade, seventh grade, he's, uh, a bit of a perfectionist. He would be tasked with writing an essay or, you know, a paper or something. He would spend a lot of energy, like googling stuff, figuring out like what his general space is.

And then he would feel this burden of novelty. Like he, he really wanted to write something that was unique, but he's looking at the world and it, it would be really hard for him to find some novel, some novel approach. Then he would go write his. And then he would find out at the last minute that somebody else had already disproved his position or something and then he would just delete it and turn in nothing.

And I feel like we're taking humanity and we're sort of asking everyone to be unique, and it feels like we're creating this massive burden of novelty. Like not everyone is like in a graduate program where the de facto position is to find something so incredibly narrow that no one else has ever done before.

We're losing something in that process where you genuinely have to come up with something novel. Now, as a seventh grader, I

Carsten Tusk: mean, seems, no, I don't think you should have to. I, I think you're right there. I think that shouldn't be the case because it's almost close to impossible. And just to come back to what I said in the beginning, what is not plagiarism?

Right. I, I think I agree with you. I do not think that we are incredibly unique. I think there's, at any given point in time, probably two humans that have the same thought. It, it's difficult to say.

Bill Constantine: I think so. I think with the age of the internet, we kind of soon figured out, actually we're not as unique as.

You might think you are. We tend, we tend to actually be much more alike than we are different. And it's just, it's a bit sad, a bit of an ego punch to a lot of people thinking that they're really special sauce, but they're actually one of many people who had those same original thoughts. There's, there's another thing too that I think is important.

It's not difficult to be novel as much as it is to be novel and write something of quality, of value, right?

Deep Dhillon: Substance, let's call it like impact of of

Bill Constantine: substance. Math has its own sort of unique set of rules where things tend to be a little bit more black and white than something more subjective as go and write me an essay on a particular

Deep Dhillon: topic.

I, I think this has been a, a, a fun conversation on the AI and plagiarism. I'm gonna end it with like one final question. Let's fast forward 10 years, let's go 10 years out. And with all of what all of us know about where these models, you know, like, I don't think any of us know where exactly they're gonna be in 10 years, but we've seen this trajectory that's incredible.

Are we still in 10 years gonna be talking about AI and plagiarism, or are we in a totally different place, and like, how's the world different from a plagiarism vantage?

Bill Constantine: I'd like to take a stab at that because where we're going right now is we're, we're, we're talking about one modality of communication, which is text, but it's going to, these models are now going to absorb audio.

We've seen in a different, uh, product called DALL-E, where they can create images based on text. But we're gonna be going the way of video too, where video is actually input. So Deep, I'm gonna have, uh, your wife shoot us walking down the road. You're gonna be playing the banjo. I'll be riding a pony. We'll put it in black and white and we'll make a little short video and then we'll feed that into one of these models and we're gonna say, given what you see before you, write a short story that explains this video. Or it will be the case, I think, probably before we die,

that we say, write me and create a movie for me of Carsten and I going to Europe together and, you know, blah, blah, blah, and an actual video will be created of two characters, Carsten and Bill, going down the street. We are absolutely headed to these different modalities of communication: video, audio, and text.

Now we're really talking about this world. Plagiarism. It, it's very, it's gonna be very, very fuzzy, even more so in, in that world, but we're definitely headed.

Carsten Tusk: So I, I, I actually don't think that plagiarism is a big problem, nor that it matters that much in the grand scheme of things. Um, like I said, it's just a measure to determine whether or not you should be rewarded for your work.

Um, I think what is more important is whether or not that work that you have done or that you're presenting there, uh, resembles the truth or not. And I think the big problem we will see with this is, uh, we won't know what content we can trust. And I think that goes from blog posts or blog articles.

I mean, we have that problem already, right? It's, it's well established in our new fake news. Well,

Deep Dhillon: we had an insurrection at the capitol literally because of misinformation.

Carsten Tusk: I know, but what worries me more are the things that are generally accepted as like evidence of wrongdoing today, right? If somebody captures a security, um, camera video of you beating up that school kid in front of your house, you might go to jail, right?

If I can 100% fake. What kind of evidence can we still trust in that new world? And so the job of telling fakes from real evidence is gonna become a lot more difficult, if not impossible. And that, that worries me a little

Bill Constantine: bit. And the reality is, is those tools that you speak of are being used by people who are not experts in artificial intelligence or data science.

They're law enforcement officers, for example, who aren't experts in that field. And they're using this as a valid. Well, the tool is telling me that you were the guy that stole this. Well, not,

Carsten Tusk: not, not law enforcement, but just plain out criminals. Like some guy I might just say, Hey, I saw Bill yesterday.

Kick that cat in front of my door. Yeah. Yeah. And sue you for it. And then I Go on. I mean, I think, I think

Deep Dhillon: like, yeah. Yeah. Well, I'm gonna try to take a stab at it. I think these are all awesome points. My take on it is humans are pretty smart. I disagree. Well, uh, bear with me for a moment. Cause I, I don't understand.

Bill Constantine: I don't understand,

Deep Dhillon: but like, if you rewind like a couple hundred, like 150 years ago, somebody invents the photograph, the ability to shoot photographs. But it was a painful process. Like you had these little boxes and you had chemicals and darkrooms and all that stuff. And then we had like 70, 80 years of, uh, maybe a hundred or so years of photographers, and it was an art, and people, you know, were capturing things in certain ways and moments.

And not to diminish photography, cuz I think there's still, you know, photographers doing some unique work. But now we're in a world where, you know, everybody's got a camera, everybody's got access to like amazing editing capabilities. Everybody's got Instagram, all this stuff. And we're just saturated in high quality imagery.

We don't really care much about photographs anymore. We don't even think about the character of the photograph. We think about the moment, you know, that it reveals something about your holiday or what, or or, or like what you were thinking or something else. But we're on a different level and I think that's what's gonna happen here is like we've got these tools.

That are gonna produce incredibly high quality things, but our bar is gonna go up and what we focus on is gonna change. Like we're not gonna be satisfied with somebody charging me 15 bucks to go to a movie theater to see a film that somebody yanked out of 15 seconds of effort and a great prompt with the killer robot. Like we're gonna want, cuz we love stories about humans.

I don't know that we love stories about particular robots, even if they have personalities, maybe, maybe not. But I think we love stories about humans, and if we know that somebody, you know, did this and that and like had this journey and it's like this whole thing and then somehow they produced this thing that the AI bot maybe had a role in but, you know, couldn't claim ownership of it.

I think we will naturally migrate to those other areas that are like linked with human stories that we can bond with. That's my take.

All right, guys. Thanks for the time. Uh, this was a great conversation. Love that. That's all for this episode. I'm Deep Dhillon, your host, saying check back soon for your next AI injection. In the meantime, if you need help injecting AI into your business, reach out to us at xyonix.com. That's x-y-o-n-i-x.com.

Whether it's text, audio, video, or other business data, we help all kinds of organizations like yours automatically find and operationalize transformative insights.

People on this episode

Deep Dhillon

Host