I have a fair amount of experience with data visualization, analysis etc and thought it would be a fun project to try to visualize the Lemmy network, specifically which instances have strong links to one another via subscriptions from users in one to communities in the other.

How/where can I get that data?

  • gabe [he/him]@literature.cafe · 1 year ago

    Likely within the API, but I would say be cautious as people on the fediverse really do not like having their data scraped, especially for projects like this.
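For reference, Lemmy instances expose an unauthenticated HTTP API that includes a community list. A minimal sketch of pulling subscription counts from one instance might look like the following; the endpoint path and JSON field names reflect the Lemmy v3 API as I understand it, so verify them against your target instance (versions differ, and the reported subscriber count may be federated totals rather than local subscribers):

```python
import json
import urllib.request

def community_list_url(instance: str, page: int = 1) -> str:
    """Build one page of the public community-list endpoint."""
    return (f"https://{instance}/api/v3/community/list"
            f"?type_=All&sort=TopAll&limit=50&page={page}")

def extract_edges(payload: dict, home_instance: str) -> list:
    """Turn one API page into (home_instance, community_host, subscribers) rows."""
    rows = []
    for item in payload.get("communities", []):
        actor_id = item["community"]["actor_id"]   # e.g. "https://lemmy.ml/c/linux"
        host = actor_id.split("/")[2]              # instance hosting the community
        subs = item["counts"]["subscribers"]       # subscriber count as reported
        rows.append((home_instance, host, subs))
    return rows

def fetch_page(instance: str, page: int = 1) -> dict:
    """One network call per page; rate-limit and identify yourself politely."""
    req = urllib.request.Request(
        community_list_url(instance, page),
        headers={"User-Agent": "lemmy-network-viz (contact: you@example.com)"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

Given the sensitivity mentioned above, keeping requests slow and aggregate-only (instance-to-instance counts, no usernames) seems like the considerate approach.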

    • alex [they, il]@jlai.lu · 1 year ago

      I think this is fine. The people I’ve seen object to this kind of project were more concerned about their accounts being indexed. Projects like comparing the relative sizes of instances have always been fine.

      • gabe [he/him]@literature.cafe · 1 year ago

        Yes, that still doesn’t make people fine with it. People defederate over stuff like that on Mastodon. It’s often seen as violating people’s privacy.

            • mookulator@mander.xyzOP · 1 year ago

              What do you mean? I thought it was an explicit feature of this place that literally all of it was public and nobody owns any of the data. Isn’t it just sitting there in the public domain?

              If people know that’s how it works, they can’t get mad if someone does access the data, especially for innocuous curiosities like this.

              • Dave@lemmy.nz · 1 year ago

                The content of Lemmy is public, but it is not in the public domain. Those are different concepts.

              • unexpectedteapot@lemmy.ml · 1 year ago

                Your public domain assumption doesn’t have to apply to others, legally or ideologically.

                Data ownership does exist in the Fediverse; in fact, it is one of its selling points that you can set up your own server and own the data, instead of using a surveillance-capitalist SaaS that stores, manipulates, and imposes legal rights over your data. Applications like Mastodon do send a federation request to other instances to delete data if submitters want it deleted. Additionally, some users put licenses on their profiles that may restrict (e.g. CC NonCommercial) what you are legally allowed to do with the data.

                So no, accessing the data is not the same as using or processing it, for many people, and legally too in several parts of the world. Also, the “innocuous curiosities” label is entirely subjective.

              • PseudoSpock@lemmy.dbzer0.com · 1 year ago

                I think the more commonly held belief is that the data, while unfortunately exposed right now, was going to become more secure in the near future, but the exodus from Reddit happened too soon. So now there is a lot of that data, and better management and protection of it hasn’t had time to happen.

                In comes you, seeing the opportunity, and you seek to exploit it.

  • spectre [he/him]@hexbear.net · 1 year ago

    I mean the fediverse isn’t exactly super mature; from what I know you should expect that you’ll have to generate that data yourself some way or another.

  • Danterious@lemmy.world · 1 year ago

    Personally, I would love to see this kind of project done. But for it to work, most people on the servers whose data you are working with would have to be informed and given time to respond. I hope you reach out to the administrators of each instance, ask them if they would be OK with this, and give them time to ask their users. Knowledge is power, and if the visualization were public I think it could be helpful.

    • mookulator@mander.xyzOP · 1 year ago

      Thanks for the support! I thought it would be a useful tool too. People could use it to find instances they didn’t know about but which are popular in their network, or to find underappreciated instances and build bridges.

      Anyway, it sounds like database access is a must, and I’m not trying to take on a massive data-wrangling exercise. I had it in my mind that the data were just sitting there for download, which was wishful thinking!

      • Danterious@lemmy.world · 1 year ago

        If you are interested, there is a professor named Damon Centola who might be interested in dedicating time to gathering that data and might help with the visualization as well. If you are in the sociology space, or just interested, you could reach out to him.

      • Danterious@lemmy.world · 1 year ago

        Sorry for commenting again but I think there is a way for you to do this in a completely open, easy, and privacy-preserving way. You don’t need to access their database.

        1. Get a list of instances that you want to look at the subscription patterns for. (All the instances here + Lemmy.world)

        2. Go to that instance’s website, click the “Communities” tab at the top, and then click “All”. The list shows how many users from that instance are subscribed to each community (both communities on that instance and on other instances).

        If you find a way to automatically (or manually) scrape this data from all of those websites you can create the visualization that you were talking about.

        So you were right, the data is openly available; it is just specific to each instance’s website.
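The two steps above reduce to a small aggregation once you have, per instance, the (community host, subscriber count) pairs from its Communities page: sum the counts by hosting instance and drop same-instance entries, leaving weighted cross-instance edges ready for a network visualization. A sketch, with illustrative names (this is not an existing tool):

```python
from collections import defaultdict

def aggregate_edges(rows):
    """rows: iterable of (home_instance, community_host, subscribers).
    Returns {(home_instance, community_host): total_subscribers},
    skipping same-instance communities so only cross-instance ties remain."""
    weights = defaultdict(int)
    for home, host, subs in rows:
        if home != host:
            weights[(home, host)] += subs
    return dict(weights)

# Example: users on mander.xyz subscribed to two lemmy.ml communities.
edges = aggregate_edges([
    ("mander.xyz", "lemmy.ml", 120),
    ("mander.xyz", "lemmy.ml", 80),
    ("mander.xyz", "mander.xyz", 500),   # same-instance link, dropped
    ("lemmy.nz", "mander.xyz", 30),
])
# edges == {("mander.xyz", "lemmy.ml"): 200, ("lemmy.nz", "mander.xyz"): 30}
```

The resulting dict maps directly onto a weighted directed graph (e.g. one node per instance, edge width proportional to the summed subscriptions).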