cross-posted from: https://lemmy.intai.tech/post/43759
cross-posted from: https://lemmy.world/post/949452
OpenAI’s ChatGPT and Sam Altman are in massive trouble. OpenAI is getting sued in the US for illegally using content from the internet to train its large language models (LLMs).
There were court cases around this very thing involving Google and the web archive. I suspect their legal team is expecting similar precedent, with the issue coming down to the individual and how they use the index: for example, using it to make my own unique character (easily done) vs. making an easy and obvious rip-off of a Disney property. The same tests can be applied; the question IMO isn’t about the index that is built here. I can memorize a lot (some people have actual eidetic memory) and synthesize it too, which is protected, and I can copyright my own mental outputs. The disposition of this type of output vs. mechanical outputs is, I expect, where things will end up being argued.
I’m not going to say I’m 100% right here; we are in a strange timeline, but there is precedent for what OAI is doing IMO.
The issue becomes the sale/profit of selling access, such as with GPT-4 right now. Indexing/archiving and selling are two very different beasts.
Interesting lines to walk. It depends on what they are selling: there is a definite cost to running a model, and you are allowed to charge a reasonable fee to handle the process of providing the records. We used to pay per page for this kind of thing; now you pay per token.
They can also sell a lot of services and tools around the model while still using it in a non-infringing manner. This will all end up in front of a judge, with the books laid out, I suspect. I am not sure we will ever see any of the details; I hope we do.