Does Lemmy have any communities dedicated to archiving/hoarding data?
FWIW:
fabien@debian2080ti:/media/fabien/slowdisk$ ls -lhS offline_prep/
total 341G
-rw-r--r-- 1 fabien fabien 103G Jul 6 2024 wikipedia_en_all_maxi_2024-01.zim
-rw-r--r-- 1 fabien fabien 81G Apr 22 2023 gutenberg_mul_all_2023-04.zim
-rw-r--r-- 1 fabien fabien 75G Jul 7 2024 stackoverflow.com_en_all_2023-11.zim
-rw-r--r-- 1 fabien fabien 74G Mar 10 2024 planet-240304.osm.pbf
-rw-r--r-- 1 fabien fabien 3.8G Oct 18 06:55 debian-13.1.0-amd64-DVD-1.iso
-rw-r--r-- 1 fabien fabien 2.6G May 7 2023 ifixit_en_all_2023-04.zim
-rw-r--r-- 1 fabien fabien 1.6G May 7 2023 developer.mozilla.org_en_all_2023-02.zim
-rw-r--r-- 1 fabien fabien 931M May 7 2023 diy.stackexchange.com_en_all_2023-03.zim
-rw-r--r-- 1 fabien fabien 808M Jun 5 2023 wikivoyage_en_all_maxi_2023-05.zim
-rw-r--r-- 1 fabien fabien 296M Apr 30 2023 raspberrypi.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien 131M May 7 2023 rapsberry_pi_docs_2023-01.zim
-rw-r--r-- 1 fabien fabien 100M May 7 2023 100r-off-the-grid_en_2022-06.zim
-rw-r--r-- 1 fabien fabien 61M May 7 2023 quantumcomputing.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien 45M May 7 2023 computergraphics.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien 37M May 7 2023 wordnet_en_all_2023-04.zim
-rw-r--r-- 1 fabien fabien 23M Jul 17 2023 kiwix-tools_linux-armv6-3.5.0-1.tar.gz
-rw-r--r-- 1 fabien fabien 16M Oct 6 21:32 be-stib-gtfs.zip
-rw-r--r-- 1 fabien fabien 3.8M Oct 6 21:32 be-sncb-gtfs.zip
-rw-r--r-- 1 fabien fabien 2.3M May 7 2023 termux_en_all_maxi_2022-12.zim
-rw-r--r-- 1 fabien fabien 1.9M May 7 2023 kiwix-firefox_3.8.0.xpi
But if you want the easier version, just get Kiwix on whatever device is in front of you right now (yes, even a mobile phone, assuming you have the space), then get whatever content you need.
If you need a bit of help, I recorded “TechSovereignty at home, episode 11 - Offline Wikipedia, Kiwix and checksums” with a friend just 3 weeks ago.
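The episode covers checksums. Below is a minimal Python sketch of that step. It assumes the publisher gives you a SHA-256 line in the usual sha256sum format (“<hex digest>  <filename>”), for example in a .sha256 file next to the download. Check the download page for what is actually offered. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a 100 GB .zim never has to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_line: str) -> bool:
    """Compare against one line in sha256sum format: '<hex digest>  <filename>'."""
    expected = published_line.split()[0].lower()
    return sha256_of(path) == expected
```

Usage: matches_published(Path("wikipedia_en_all_maxi_2024-01.zim"), line) returns False if the download is damaged. On Linux, “sha256sum -c” does the same job directly from a checksum file.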
I also wrote (and randomly update) https://fabien.benetou.fr/Content/Vademecum and coded https://git.benetou.fr/utopiah/offline-octopus, but tbh KDE-Connect is much better now.
The point, though, is that setting up such a repository takes minutes. If you don’t have the space, buy a 512GB microSD card for 50 EUR, put everything on it, stuff it in a drawer and move on. If you want, update it every 3 months or whenever you feel like it.
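Here is a quick sanity check on whether a collection fits on a card. By default, ls -h reports sizes in powers of 1024, so the 341G total above means 341 GiB, or about 366 GB. Cards are sold in powers of 1000. The sketch below converts both to bytes. The 5% headroom for filesystem overhead is an assumed margin, not a measured figure, and the helper names are hypothetical.

```python
# ls -h uses binary multiples; card vendors use decimal gigabytes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_ls_size(text: str) -> int:
    """Convert an `ls -h` size such as '103G' or '931M' into bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(float(text[:-1]) * UNITS[suffix])
    return int(text)  # plain byte count, no suffix


def fits_on_card(sizes: list[str], card_gb: float, headroom: float = 0.05) -> bool:
    """True if the files fit on a card sold as `card_gb` decimal GB,
    keeping an assumed fraction `headroom` free for filesystem overhead."""
    total = sum(parse_ls_size(s) for s in sizes)
    usable = card_gb * 1000**3 * (1 - headroom)
    return total <= usable
```

With the listing above, fits_on_card(["341G"], 512) is True and fits_on_card(["341G"], 256) is False.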
TL;DR: it takes longer to write such a meme than to actually do it.
Watch out for flash data corruption. Lots of cheap flash (USB sticks, SD cards, SSDs) lose data after just a few years of offline storage. Something something quantum tunnel bullshit, iirc.
So either look for media that guarantee long cold-storage retention (lots of businesses need to keep shit for 10 years for tax reasons), or occasionally plug it in and let it do its housekeeping.
Using older flash tech can be useful here. You don’t always need the highest-density storage if you want to keep files for a long time. Chips built on a much larger process node make for a much more stable form of storage.
Or look for industrial/business-grade stuff with long retention times. Old flash also means less sophisticated controllers, etc.
It’s more that NAND flash stores data as a small electric charge trapped in each memory cell. Over time, that charge leaks away. If you power the storage device every once in a while, you minimize the risk of losing data.
Here’s a video explaining why this happens to Wii Us after they’ve been powered off for a while: https://youtu.be/JHME4zLs6Qs
Thanks, but even though it’s on a plugged-in HDD, I don’t even care that much about any of that data. What I mean is that none of it is sensitive: it might be useful, potentially, but it’s not unique. If somehow my .zim file for Wikipedia were corrupted, I could download it again from https://library.kiwix.org/#lang=eng&category=wikipedia or elsewhere in ~30 min (just checked).
What I’m trying to highlight here is the process more than the actual outcome.
TL;DR: yes, if one is actually serious about getting and storing data, they should periodically verify that it is indeed fine. What I want to highlight, though, is knowing how to do it at all in the first place. Anyway, you are right that for a proper long-run solution one must understand how (cold) storage actually works. My heuristic is that it’s like canned food (which I don’t eat much): it might last a while, but not forever.
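One way to do the periodic check is sketched below. This is an assumption, not the author’s setup. Record a SHA-256 hash for every file once, store that manifest somewhere other than the archive itself, and re-check it every few months. The function names and the JSON manifest format are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Stream the file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: dict[str, str], dest: Path) -> None:
    # Keep dest outside root; otherwise the manifest would hash itself next time.
    dest.write_text(json.dumps(manifest, indent=2, sort_keys=True))


def load_manifest(src: Path) -> dict[str, str]:
    return json.loads(src.read_text())


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose contents no longer match."""
    return [
        rel
        for rel, expected in manifest.items()
        if not (root / rel).is_file() or file_sha256(root / rel) != expected
    ]
```

To use it, run save_manifest(build_manifest(root), Path.home() / "offline_prep.json") once, with root set to Path("/media/fabien/slowdisk/offline_prep"). Later, verify(root, load_manifest(...)) returns an empty list if nothing changed. Otherwise it returns the list of files to fetch again.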
I thought the point of backing stuff up was to have things in case just downloading it again isn’t a viable option?
Whoa, what are all those things you have?
Commenting inline:
-rw-r--r-- 1 fabien fabien 103G Jul 6 2024 wikipedia_en_all_maxi_2024-01.zim # encyclopedia Wikipedia English with images and more
-rw-r--r-- 1 fabien fabien 81G Apr 22 2023 gutenberg_mul_all_2023-04.zim # Project Gutenberg, book collection in multiple languages
-rw-r--r-- 1 fabien fabien 75G Jul 7 2024 stackoverflow.com_en_all_2023-11.zim # StackOverflow, programming questions and answers
-rw-r--r-- 1 fabien fabien 74G Mar 10 2024 planet-240304.osm.pbf # OpenStreetMap low resolution for the whole world
-rw-r--r-- 1 fabien fabien 3.8G Oct 18 06:55 debian-13.1.0-amd64-DVD-1.iso # Debian base ISO
-rw-r--r-- 1 fabien fabien 2.6G May 7 2023 ifixit_en_all_2023-04.zim # iFixit collection of guides to fix appliances
-rw-r--r-- 1 fabien fabien 1.6G May 7 2023 developer.mozilla.org_en_all_2023-02.zim # Web development documentation
-rw-r--r-- 1 fabien fabien 931M May 7 2023 diy.stackexchange.com_en_all_2023-03.zim # Do It Yourself Q&A
-rw-r--r-- 1 fabien fabien 808M Jun 5 2023 wikivoyage_en_all_maxi_2023-05.zim # WikiVoyage, the version of Wikipedia for traveling
-rw-r--r-- 1 fabien fabien 296M Apr 30 2023 raspberrypi.stackexchange.com_en_all_2022-11.zim # Raspberry Pi Q&A
-rw-r--r-- 1 fabien fabien 131M May 7 2023 rapsberry_pi_docs_2023-01.zim # Raspberry Pi documentation
-rw-r--r-- 1 fabien fabien 100M May 7 2023 100r-off-the-grid_en_2022-06.zim # Off the grid documents
-rw-r--r-- 1 fabien fabien 61M May 7 2023 quantumcomputing.stackexchange.com_en_all_2022-11.zim # Quantum computer Q&A
-rw-r--r-- 1 fabien fabien 45M May 7 2023 computergraphics.stackexchange.com_en_all_2022-11.zim # Computer graphics Q&A
-rw-r--r-- 1 fabien fabien 37M May 7 2023 wordnet_en_all_2023-04.zim # Graph of words in English
-rw-r--r-- 1 fabien fabien 23M Jul 17 2023 kiwix-tools_linux-armv6-3.5.0-1.tar.gz # Kiwix to read .zim files
-rw-r--r-- 1 fabien fabien 16M Oct 6 21:32 be-stib-gtfs.zip # public transport database in Brussels, Belgium
-rw-r--r-- 1 fabien fabien 3.8M Oct 6 21:32 be-sncb-gtfs.zip # train transport database in Belgium
-rw-r--r-- 1 fabien fabien 2.3M May 7 2023 termux_en_all_maxi_2022-12.zim # Termux, Linux tooling on Android, documentation in English
-rw-r--r-- 1 fabien fabien 1.9M May 7 2023 kiwix-firefox_3.8.0.xpi # Kiwix Web Extension for the Firefox browser
By the way, there’s now a Wikipedia 2025 snapshot.
I am currently trying to fit that on my phone somehow. I wish I could just omit the index database at the end, which it seems can’t be split. I have to keep it, but once the file is split up the index doesn’t work anyway (search breaks) (https://github.com/openzim/zim-tools/issues/295).
My phone can only use FAT32 for SD cards… For the 2024 Wikipedia, that seems to be around 18GiB of wasted space.
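FAT32 caps a single file at 4 GiB minus one byte, so a 103 GiB .zim needs at least 26 pieces. As far as I know, libzim (and therefore Kiwix) can open a ZIM that has been split into parts named .zimaa, .zimab and so on. That is the naming coreutils split produces. zim-tools also ships zimsplit for this job. The pure-Python sketch below only illustrates the byte-level split and does nothing about the search index problem mentioned above. The helper names are hypothetical.

```python
from itertools import product
from pathlib import Path
from string import ascii_lowercase

FAT32_MAX = 2**32 - 1  # largest single file FAT32 allows: 4 GiB minus one byte


def parts_needed(size_bytes: int, max_bytes: int = FAT32_MAX) -> int:
    """How many pieces a file of size_bytes must be cut into (integer ceiling)."""
    return (size_bytes + max_bytes - 1) // max_bytes


def split_file(src: Path, max_bytes: int = FAT32_MAX, buf: int = 1 << 20) -> list[Path]:
    """Cut src into src.name + 'aa', 'ab', ... pieces of at most max_bytes each.

    Two-letter suffixes allow 676 parts; next() raises StopIteration beyond that.
    """
    suffixes = ("".join(pair) for pair in product(ascii_lowercase, repeat=2))
    parts: list[Path] = []
    with src.open("rb") as f:
        chunk = f.read(min(buf, max_bytes))
        while chunk:  # each pass of this loop writes one part file
            part = src.with_name(src.name + next(suffixes))
            written = 0
            with part.open("wb") as out:
                while chunk:
                    out.write(chunk)
                    written += len(chunk)
                    if written >= max_bytes:
                        break  # part is full, start the next one
                    chunk = f.read(min(buf, max_bytes - written))
            parts.append(part)
            chunk = f.read(min(buf, max_bytes))
    return parts
```

For the 103G Wikipedia file in the listing, parts_needed(103 * 2**30) gives 26.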
Thanks, updating (~20min) accordingly.
FWIW I have a CMF Nothing 1 and I can put a 500GB microSD card in it.
I’ve got a Ulefone Armor 24. It can take a 1TB microSD card, but only as FAT32. Why a Linux-based OS can only do FAT32 on SD cards, despite supporting other filesystems on internal storage, is beyond me.
Weird. Assuming you have Android 13, it should at least be usable as exFAT, which allows files larger than FAT32’s 4GiB limit.
We need all repos stored offline, plus documentation for troubleshooting.
For the first, I have no idea how much space we’d need. Most Linux packages are pretty light, no? But there are A LOT of them…
The second is easy. I heard someone say all of Wikipedia is 200GB, which should be doable. Don’t forget the technical wikis too: Debian, Gentoo, Arch.
The official Trixie USBs fit all 28 AMD64 DVDs on a single 256GiB USB stick:
https://www.linuxcollections.com/products/debian/debianusb.htm?id=51007
You’d probably want the 512GiB one with all the sources for a real backup in this scenario.
Can’t remember who it was (b3ta? popbitch? penny-arcade?), but I recently saw a comment by someone who’s been running a website since the turn of the millennium, and they said that fully 99% of the links they posted two decades ago were no longer valid.
To really put that into perspective, remember that for a site to get linked from a popular site like that, it usually had to be something of value that had a lot of work put into it and that people found interesting or useful.
It’s truly devastating how much of the old internet has died to the corporations taking over the internet.
Years ago I bought a physical encyclopedia. I remember having one as a kid and using it for school reports. Also just looking through it can be cool. Learning about something you never knew existed is just a unique experience and doing it through a physical book just deepens the whole experience.
I also learned that printing physical encyclopedias is going out of fashion. I think there is only one company that still prints a yearly encyclopedia, and of all things it’s not Encyclopedia Britannica. That might have changed since I bought my copy, but go give some physical media some love if you can.
There should be.
It’s been on my to-do list for a few years now.
Last year I bought a hard copy of my favorite webcomic in case the website goes down.
Which webcomic?
Different guy but Sunstone.
Girls with Slingshots. It ended over a decade ago, but I still love the characters. I realized that if the author dies and stops renewing the website, it could disappear. As it was a foundational part of my early twenties, I couldn’t accept that.
I’ll have to check it out. Thanks for the recommendation.
To paraphrase Stan Lee here, comics are like boobs. They look good on the internet, but there is just something special about holding them in your hands.
Wait, isn’t there an offline copy of a part of Wikipedia? The article
Just buy yourself a nice printer with enough ink and do it yourself ;)
It could cost a bit if you wanted to keep it up to date.
I bought a 14TB drive just for backups of all my other drives… and I got a shitload more space.
Old PCs off Amazon usually come with a good, reliable 1/2TB hard drive.
I have been archiving Linux builds for the last 20 years so I could effectively install Linux on almost any hardware since 1998-ish.
I have been archiving Docker images to my locally hosted GitLab server for the past 3-5 years (not sure when I started tbh). I’ve got around 100GB of images, ranging from core images like OSes to full app images like Plex, ffmpeg, etc.
I have also been archiving FOSS projects into my GitLab and using pipelines to keep them up to date.
The only thing I lack is packages from package managers like pip, Bundler, npm, yum/dnf and apt. There’s just so much to cache that it’s nigh impossible to archive everything.
I have even set up my own local CDN for JS imports in HTML. I use rewrite rules in nginx to redirect them to my local sources.
My goal is to be as self-sustaining on local hosting as possible.
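The nginx rules themselves aren’t shown, so this is only an illustration. A local CDN needs a mapping: take the CDN URL, keep the host and path, and serve it from a local mirror directory. That mapping can be sketched in Python. Note that this sketch rewrites the HTML itself rather than redirecting at the server, which is a different technique from the nginx approach. The CDN hosts, the /cdn prefix and the helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical CDNs to mirror; adjust to whatever your pages actually import.
CDN_HOSTS = ("cdn.jsdelivr.net", "unpkg.com", "cdnjs.cloudflare.com")
LOCAL_PREFIX = "/cdn"  # hypothetical path the local web server serves the mirror from


def local_path(url: str) -> str:
    """https://unpkg.com/vue@3/dist/vue.js -> /cdn/unpkg.com/vue@3/dist/vue.js

    Query strings and fragments are dropped in this sketch.
    """
    parts = urlsplit(url)
    return str(PurePosixPath(LOCAL_PREFIX, parts.netloc, parts.path.lstrip("/")))


# Matches absolute URLs on the listed hosts, up to a quote, space or angle bracket.
_CDN_URL = re.compile(
    r"https?://(?:" + "|".join(re.escape(h) for h in CDN_HOSTS) + r")/[^\s\"'<>]+"
)


def rewrite_html(html: str) -> str:
    """Point every URL on a known CDN at the local mirror; leave others alone."""
    return _CDN_URL.sub(lambda m: local_path(m.group(0)), html)
```

This sketch does not fill the mirror, that is, download each URL into the matching directory.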
Everyone should have this mindset regarding their data. I always say to my friends and family, “If you like it, download it.” The internet is always changing, and you never know when that piece of media you like will be moved, deleted, or blocked.
The Pornhub collapse should have taught the average person that.
You’re awesome. Keep up the good work.
respectable level of hoarding 🏅
If anyone is interested in philosophy or religion, or just wants to archive it for historical reasons, IIRC sacred-texts.com has a USB version of their entire archive. They sell it, but I’m sure someone could find a workaround if they were opposed to supporting them* for some reason. It’s a massive collection of philosophical and religious works, and I believe they even have things like constitutions and legal works as well.
*I know nothing about the people that run it or their ideology
Thanks for reminding me about this.
You’ll need about 500GB of free space. Not too much of an ask tbh.
I know this because I actually do this. It’s more like ~300GB of space, but it’s better to have even more just in case.
It makes me really happy that people can say “500gb … not too much of an ask” these days.
Well we are talking about the greatest repository of human knowledge ever created. So we can afford to spend a little on it at least.
I would add in some rom collections and book repositories as well. The whole library of Nintendo games is under a gig and would go a long way for entertaining people.
Book repos? I didn’t know such a thing existed. Can you share more?
Project Gutenberg has a large collection of public domain books
Thank you kindly
What’s a way to create a local repo mirror?
I can answer one part of your question. Yes, it’s not as big as you think it is.

Does this include images?
With images, it is 111.08 GB.
That’s still incredibly low; I’d have assumed an enormous increase.
Compressed or uncompressed? Can it be directly read?
Can be read directly, like normal Wikipedia.
That’s very nice. Does it also include other languages, or would that take more space?
This is English only. Other languages are downloaded separately, though they typically take less space.
Nice.
How about when previous versions of pages are included (excluding images)?
Not sure; that option isn’t offered. I’d imagine not much more, if proper version history management is involved.
Yeah, it seems there’s nothing as simple as a git clone available.
One would probably have to download multiple full copies from different times and then merge them with deduplication, to get that answer.
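To get a feel for that kind of deduplication, here is a rough Python sketch. It cuts each snapshot into fixed-size chunks keyed by SHA-256, then reports how many bytes the snapshots take in total versus how many are unique. For brevity it works on in-memory bytes; a real tool would stream the files. Fixed-size chunking is a simplification: a single insertion shifts every later chunk, which is why backup tools such as borg or restic use content-defined chunking instead. Also, .zim files store compressed clusters, so byte-level overlap between two dumps may be much lower than the overlap between their articles. The function name is hypothetical.

```python
import hashlib
from typing import Iterable


def dedup_stats(snapshots: Iterable[bytes], chunk_size: int = 4096) -> tuple[int, int]:
    """Return (total_bytes, unique_bytes) across all snapshots.

    Each snapshot is cut at fixed offsets; identical chunks are counted once.
    """
    unique: dict[bytes, int] = {}  # chunk digest -> chunk length
    total = 0
    for data in snapshots:
        total += len(data)
        for start in range(0, len(data), chunk_size):
            chunk = data[start:start + chunk_size]
            unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    return total, sum(unique.values())
```

Two identical 8 KiB snapshots give (16384, 8192). Prepending a single byte to one of them makes every chunk differ, and the unique size doubles.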
No