I was under the impression that there was no real definitive way to tell what ChatGPT or similar AI models use for their training. Am I wrong?
Yes, it’s in the lawsuit and another article I read. OpenAI said they used a specific dataset, and the makers of that dataset said they used some online open libraries which have full texts of books. That’s the primary basis of the lawsuit. They also argue that if you ask ChatGPT for a summary of their books, it will spit one out, which they are claiming is misuse of their copyrighted work. That claim sounds dicey to me: Wikipedia and all manner of websites summarize books, so I’m not following how ChatGPT doing it is different. But I’m an idiot, so who cares what I think.
I care. Idiots unite!
Remember, the human who wrote a summary had to legally obtain a copy of the source material first too. It should be no different when training an AI model. There’s a whole new can of worms here, though, since the summary was written by another person and that person holds the copyright to that summary (unless it includes a substantial amount of the original material, of course). But an AI model is not “creating” a new, copyrightable work. It has to be trained on the entire source material, and it algorithmically creates a summary directly from that. Because there’s nothing ‘new’ being created, I can see why it could be claimed that a summary from an AI model should be considered a derivative work. But honestly, it’s starting to border on the question of whether or not what AI models do counts as ‘creative thinking’. Shit’s getting wild.