This is a classic case of tragedy of the commons, where a common resource is harmed by the profit interests of individuals. The traditional example of this is a public field that cattle can graze upon. Without any limits, individual cattle owners have an incentive to overgraze the land, destroying its value to everybody.
We have commons on the internet, too. Despite all of its toxic corners, it is still full of vibrant portions that serve the public good — places like Wikipedia and Reddit forums, where volunteers often share knowledge in good faith and work hard to keep bad actors at bay.
But these commons are now being overgrazed by rapacious tech companies that seek to feed all of the human wisdom, expertise, humor, anecdotes and advice they find in these places into their for-profit A.I. systems.
I think the concept of the tragedy of the commons is being misused here. When feeding data into those models, there's no common resource being depleted, since the data doesn't cease to exist once you feed it to your LLM. Instead, what's happening is that they're further breaking what was already broken: the legal concepts of IP and copyright.
Where the concept could apply is in the use of those models' output, with the common resource being the overall quality, reliability, and usefulness of the internet, degraded for the sake of petty gains (advertising, spam, marketing). However, this degradation predates the large "language" models (and the internet itself), and it isn't a result of the technology itself.
This was my first thought as well. My second thought is of all the harm we've caused ourselves in the digital age, and how we only start to care when it hits our pocketbooks.