This is a classic case of tragedy of the commons, where a common resource is harmed by the profit interests of individuals. The traditional example of this is a public field that cattle can graze upon. Without any limits, individual cattle owners have an incentive to overgraze the land, destroying its value to everybody.
We have commons on the internet, too. Despite all of its toxic corners, it is still full of vibrant portions that serve the public good — places like Wikipedia and Reddit forums, where volunteers often share knowledge in good faith and work hard to keep bad actors at bay.
But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humor, anecdotes and advice they find in these places into their for-profit A.I. systems.
A truly poor analogy. LLMs don’t remove anything from anywhere. They consume no shared resource.
It’s been wild watching people flail about searching for arguments for why LLMs should be stopped. I’m not even saying they shouldn’t, just that I haven’t seen a solid argument for it.
As per the article, it goes like this:
- AI is trained on publicly available data
- AI does not credit or compensate original authors
- People don’t like their work being used without credit or compensation
- People share less publicly
- Public spaces become deserted
At the same time, low-quality AI content drowns out what is left.
In terms of arguments, have you heard of the control/alignment problem or x-risk?
Isn’t that true of people too? If I read a bunch of books and then use what I learned to write a new book, I’m not crediting the original authors. If I learn painting techniques from Van Gogh and El Greco, I’m not crediting them either.
You’re equating sentience with non-sentience. An LLM is a non-sentient program created by humans to learn language. You are a sentient person who is influenced by the painting techniques of Van Gogh and El Greco. While you don’t need to credit them, they have influenced your work, and that is entirely acceptable practice.
This is a huge difference in the realm of copyright.
EDIT:
Also, the works of the artists you mention are in the public domain in most countries, so LLMs can use them without issue. Works by artists that are not in the public domain should be subject to copyright law when used by LLMs.
But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humor, anecdotes and advice they find in these places into their for-profit A.I. systems.
I think the concept of the tragedy of the commons is being misused here. Feeding data into these models doesn’t consume a common resource, since the data doesn’t cease to exist once you feed it to your L"L"M. Instead, what’s happening is that they’re further breaking what was already broken: the legal concepts of IP and copyright.
Where the concept could apply is in how the output of these models gets used. There, the common resource is the overall quality, reliability, and usefulness of the internet, which gets degraded for the sake of petty gains (advertising, spam, marketing). However, this degradation predates large “language” models (and the internet itself), and it isn’t a result of the technology itself.
This was my first thought as well. My second thought was all the harm we’ve caused ourselves in the digital age, and that we only start to care when it hits our pocketbooks.
Again?! Damn, I guess it wasn’t bad enough already.
This is the best summary I could come up with:
Thanks to artificial intelligence, however, IBM was able to sell Mr. Marston’s decades-old sample to websites that are using it to build a synthetic voice that could say anything.
A.I.-generated books — including a mushroom foraging guide that could lead to mistakes in identifying highly poisonous fungi — are so prevalent on Amazon that the company is asking authors who self-publish on its Kindle platform to also declare if they are using A.I.
But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humor, anecdotes and advice they find in these places into their for-profit A.I. systems.
Consider, for instance, that the volunteers who build and maintain Wikipedia trusted that their work would be used according to the terms of their site, which requires attribution.
A Washington Post investigation revealed that OpenAI’s ChatGPT relies on data scraped without consent from hundreds of thousands of websites.
Whether we are professional actors or we just post pictures on social media, everyone should have the right to meaningful consent on whether we want our online lives fed into the giant A.I.
The original article contains 1,094 words, the summary contains 188 words. Saved 83%. I’m a bot and I’m open source!
Archive link: https://archive.is/7S8Pu