Does lemmy have any communities dedicated to archiving/hoarding data?
I would also add Openstreetmap to the list
Did I miss something? What’s happening to debian stable?
debian stable became the go-to distro for long-term usage in case our FOSS support structure goes haywire due to wars
This is just minor datahoarding. I do it, on an extreme level.
I would add in some rom collections and book repositories as well. The whole library of Nintendo games is under a gig and would go a long way for entertaining people.
Book repos? I didn’t know such a thing existed. Can you share more?
Project Gutenberg has a large collection of public domain books
Thank you kindly
Okay so where do I find some cheap hard drives? Europe if possible :-)
look for DVRs, they have huge HDDs in them and you can find them at thrift stores for cheap
I have been archiving Linux builds for the last 20 years so I could effectively install Linux on almost any hardware since 1998-ish.
I have been archiving docker images to my locally hosted gitlab server for the past 3-5 years (not sure when I started tbh). I’ve got around 100gb of images ranging from core images like OS to full app images like Plex, ffmpeg, etc.
I also have been archiving foss projects into my gitlab and have been using pipelines to ensure they remain up-to-date.
the only thing I lack are packages from package managers like pip, bundler, npm, yum/dnf, apt. there’s just so much to cache it’s nigh impossible to get everything archived.
I have even set up my own local CDN for JS imports on HTML. I use rewrite rules in nginx to redirect them to my local sources.
my goal is to be as self-sustaining on local hosting as possible.
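For anyone curious what the CDN redirect idea boils down to, here is a minimal sketch in Python rather than nginx config. The hostnames and the /vendor/... prefixes are made-up examples, not the actual setup described above; the point is only that each known CDN host maps to a local path prefix and everything else is left alone.

```python
from urllib.parse import urlsplit

# Hypothetical mapping of CDN hostnames to local mirror prefixes.
# These entries are illustrative assumptions, not a real configuration.
LOCAL_MIRRORS = {
    "cdn.jsdelivr.net": "/vendor/jsdelivr",
    "cdnjs.cloudflare.com": "/vendor/cdnjs",
}


def to_local(url: str) -> str:
    """Return the local path for a known CDN URL, or the URL unchanged."""
    parts = urlsplit(url)
    prefix = LOCAL_MIRRORS.get(parts.hostname or "")
    if prefix is None:
        # Unknown host: nothing is mirrored locally, so leave it untouched.
        return url
    return prefix + parts.path
```

A rewrite rule in a web server would do the same lookup per request; the files under the local prefix still have to be downloaded and kept in sync separately.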
Everyone should have this mindset regarding their data. I always say to my friends and family, “If you like it, download it.”. The internet is always changing and that piece of media that you like can be moved, deleted, or blocked at any time.
respectable level of hoarding 🏅
You’re awesome. Keep up the good work.
I can answer one part of your question. Yes, it’s not as big as you think it is.
does this include images?
With images, it is 111,08 GB
That’s still incredibly low, I’d have assumed an enormous increase.
Compressed or uncompressed? Can it be directly read?
Can be read directly, like normal Wikipedia.
That’s very nice. Does it also include other languages, or would that take more space?
This is English only. Other languages are downloaded separately, though they typically take less space.
Nice.
How about when previous versions of pages are included (excluding images)?
Not sure, I don’t have that option. I can imagine it’s not much more, if proper version history management is involved.
Yeah, seems like there’s nothing as simple as something similar to a git clone available.
One would probably have to download multiple full copies from different times and then merge them with deduplication, to get that answer.
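A rough sketch of that merge-with-deduplication idea, assuming each downloaded copy has already been unpacked into a plain title-to-text mapping (that unpacking step is not shown, and the function name is made up for illustration). Each distinct article text is stored once under its SHA-256 hash, and a page only gains a new revision when its text actually changed between copies.

```python
import hashlib


def merge_snapshots(snapshots):
    """Merge {title: text} snapshots into a deduplicated revision store.

    Returns (blobs, history):
      blobs   maps content hash -> text, stored once per distinct text
      history maps title -> list of hashes in snapshot order,
              skipping revisions identical to the previous one
    """
    blobs = {}
    history = {}
    for snapshot in snapshots:
        for title, text in snapshot.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            blobs.setdefault(digest, text)
            revisions = history.setdefault(title, [])
            if not revisions or revisions[-1] != digest:
                revisions.append(digest)
    return blobs, history
```

The size of blobs relative to the combined size of the snapshots would give a first estimate of how much extra space the history really costs.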
No
I also recommend downloading “Flashpoint archive” to have flash games and animations to stay entertained.
There is a 4gb version and a 2.3TB version.
That’s quite the range
When I downloaded it years ago it was 1.8TB. It’s crazy how big the archive is. The smaller one is just so it’s accessible to most people.
Is that Flash exclusive or do they accept other games from that era?
I’m not sure, but I do think it’s just flash
Welcome to datahoarders.
We’ve been here for decades.
Also, follow 3-2-1, people: 3 backups, 2 storage media, 1 offsite.
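If you want to sanity-check your own setup against 3-2-1, a tiny sketch like this does it. The Copy class and the medium labels are invented for the example; it only encodes the three counts from the rule and says nothing about whether the backups actually restore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    medium: str    # illustrative label, e.g. "hdd", "tape", "cloud"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies):
    """Check for at least 3 copies, on at least 2 media, with at least 1 offsite."""
    media = {c.medium for c in copies}
    return (
        len(copies) >= 3
        and len(media) >= 2
        and any(c.offsite for c in copies)
    )
```

For example, a main drive plus a backup drive in the same machine fails on all three counts, while two local drives plus one cloud copy passes.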
“backups”? Pray tell, fine sir and or madam, what is that?
You know there are only two kinds of people: those who do backups and those who haven’t lost a hard drive/data before. Also: RAID is not a backup
Still remember the PSU blast taking out my main drive plus my backup drive in like 2001. I thought I was so good because I at least had a backup 😑. Those were the days 🤷🏻♀️
That sounds like an adventure!
Yeah, that was me learning that a dinky PSU is your worst enemy. I upgraded my SO’s old Duron to an Athlon for work, which used more energy…
Old PCs off Amazon usually come with a good, reliable 1/2 TB hard drive.
Is there a context to this or just random thought?
You can ignore politics, but politics will not ignore you.
Is there a political movement targeting Debian and Wikipedia?
https://gizmodo.com/elon-musks-wikipedia-competitor-is-going-to-be-a-disaster-2000665751
Debian? Not that I’m aware of.
Yeah I heard of wikipedia, but not debian.
Conservatives hate knowledge, learning is toxic to them. Also the people who start with burning books usually end up burning people eventually
Removing books about sucking cum out of anuses from public schools isn’t really “burning books.” You can still buy them whenever you want, just not putting them in taxpayer funded schools with children.
EDIT: Had to add some details of the “books being burned [but really just removed from public school]”:
During public comment, one woman read a passage from “Yolo” by Lauren Myracle which is found in Freedom High School.
“I climbed onto of him and started kissing him in a way that said very clearly here I am, I’m ready to have sex,” the speaker read.
Another title, “Anatomy of a Single Girl” by Daria Snadowsky, was also read by a speaker.
“Guy tries rubbing my clitoris with his fingers, he wiggles his pelvis back and forth,” another woman read from the book.
“This is ridiculous that this school – any school – has this book,” the woman said to the board.
Julie Gebhards, the woman seen in the first video of our story, is a Hillsborough County mom of six children.
Gebhards read an excerpt from the book “Invisible Monsters Remix” by Chuck Palahniuk. According to the district’s online book library, the title is found in Steinbrenner High School.
“He shoots his load, and then plants his mouth on your anus and sucks out his own warm sperm, plus whatever lubricant and feces are present. That’s felching. It may or may not, I add, include kissing you to pass the sperm and fecal matter into your mouth,” Gebhards said.
We should ban mention of Christianity in public. We should also make it illegal for anyone to teach their children Christianity. Practicing Christians should be declared mentally ill, and if they practice their faith in front of children, they should be put on the sex offender registry.
These freaks actually put giant statues of a naked bleeding man up on full public display in buildings. And they believe the most holy book in the world is one that features incest, murder, rape, genocide, and often fully endorses these horrors. Their main ritual is a form of public ritual cannibalism.
Christians are too dangerous to be allowed near children.
The comment you’re replying to didn’t mention one specific book. You did to try and portray this as some noble cause, but even books such as Fahrenheit 451 and To Kill a Mockingbird have been banned by conservatives, and they most definitely aren’t about “sucking cum out of anuses” as you so dumbly put it.
Nice attempt, but this type of dodging around never worked and will never work.
Oh no sex scenes!!! /s
Prude american?
As if kids aren’t finding shit way worse on the internet on a daily basis. Well… maybe not felching that’s pretty vile. But still.
Oh no! What have you done! Now, I want to go try felching because I saw a message about it online with no context and I just. Have. To. Try. It.
… Oh wait, no. No, I don’t. Pfew!
So what was your point again?
You aren’t making the point you think you’re making.
gestures at everything
You’ll need about 500gb of free space. not too much of an ask tbh
It makes me really happy that people can say “500gb … not too much of an ask” these days.
Well we are talking about the greatest repository of human knowledge ever created. So we can afford to spend a little on it at least.
I know this because I actually do this. It’s more like ~300gb of space, but it’s better to have even more just in case
Neither are that bad honestly. I have jigdo scripts I run with every point release of Debian and have a copy of English Wikipedia on a Kiwix mirror I also host. Wikipedia is a tad over 100 GB. The source, arm64 and amd64 complete repos (DVD images) for Debian Trixie, including the network installer and a couple live boot images, are 353 GB.
Kiwix has copies of a LOT of stuff, including Wikipedia on their website. You can view their zim files with a desktop application or host your own web version. Their website is: https://kiwix.org/
If you want (or if Wikipedia is censored for you) you can also look at my mirror to see what a web hosted version looks like: https://kiwix.marcusadams.me/
Note: I use Anubis to help block scrapers. You should have no issues as a human other than you may see a little anime girl for a second on first load, but every once in a while Brave has a disagreement with her and a page won’t load correctly. I’ve only seen it in Brave, and only rarely, but I’ve seen it once or twice so thought I’d mention it.
I rarely get bounced by Anubis, but oddly enough it has happened to me a couple times in FF, I suspect it’s the fingerprinting resistance settings that cause this to happen? Hasn’t happened in a while though
I thought the whole point of torrenting was to decentralise distribution. I use torrents to get my distros.
In my own little bubble, I thought that’s how most people got their distro.
What happens when they just cut the underwater cables? Torrent over carrier pigeon for a linux distro would take ages
We need some more community wifi projects
Community Wisps are cool
Sneakernet to the rescue. Some of you are too young to know about walking around with boxes full of disks.
A wise man once said
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
It was trading CD-Rs during my high school days… good times. Napster was just starting to take off by the time we had a CD-R trading network set up; Napster just increased the number of CDs that got passed around.
Pigeon latency is horrible, but the bandwidth is pretty great. You could probably load up an adult pigeon with at least 12TB of media.
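To put a number on that, here is a back-of-the-envelope sketch. The 12TB payload is the figure from the comment above; the 100 km distance and 80 km/h flight speed are assumptions picked purely for illustration.

```python
def effective_gbit_per_s(payload_tb, distance_km, speed_kmh):
    """Effective throughput in Gbit/s of a payload physically carried over a distance."""
    seconds = distance_km / speed_kmh * 3600   # one-way travel time
    bits = payload_tb * 1e12 * 8               # decimal terabytes to bits
    return bits / seconds / 1e9


# Assumed trip: 100 km at 80 km/h, carrying 12 TB.
print(round(effective_gbit_per_s(12, 100, 80), 1))  # prints 21.3
```

Under those assumptions the latency is 75 minutes, but the throughput works out to roughly 21 Gbit/s, which is the whole joke: terrible ping, great bandwidth.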
https://en.wikipedia.org/wiki/IP_over_Avian_Carriers
Just gonna leave this here for whoever wants to read more on the methodology and potential risks.
A good way to see what the future of places like the U.S. looks like is to look at places like North Korea, where they do exactly this: move files around on flash media to avoid the state censors.
Tiny jump drives on pigeons is low key excellent imo
@Maroon I thought torrent technology to be a godsend for package managers.
Why do none of them use it?
I mean, damn.
Turns out hosting a bunch of files is very cheap.
git and the lot are a lot better at this than people realize.
Torrents are often used for installers, but for packages it tends to be more trouble than it’s worth. Is creating a torrent for a 4k library worth it?
For wikipedia you’ll want to use Kiwix. A full backup of wikipedia is only like 100GB, and I think that includes pictures too.
Last time I updated it was closer to 120GB but if you’re not sweating 100 GB then an extra 20 isn’t going to bother anyone these days.
Also, thanks for reminding me that I need to check my dates and update.
EDIT: you can also easily configure an SBC like a Raspberry Pi (or any of the clones) to boot, set the Wi-Fi to access point mode, and serve Kiwix as a website that anyone (on the local AP Wi-Fi network) can connect to and query… And it’ll run off a USB battery pack. I have one kicking around the house somewhere
Just built one of those using DietPi as the OS and NVMe M.2 for the storage. I have many different ZIMs, am running different services, and am only using about 270GB.
Works great for offline use. Probably should add an ISO or 2 as well.
What other services are you running?
@fmstrat@lemmy.world asked what else I was running in a sibling comment to yours and I didn’t have an answer because I’m not… yet : )
DietPi makes it dead simple to run most of these things as their “software suite” is pretty robust and simple to set up.
For “user facing” applications:
- Homer Dashboard as the landing page when going to the .local address in a browser
- Kiwix for the ZIMs
- Hedgedoc for personal note taking/wiki
- Lychee photos for a very lightweight photo album maker/viewer for keepsake photos.
For “admin side” stuff:
- Portainer to manage the containers/stacks
- Watchtower to auto-update the containers while they’re still network connected
- Transmission daemonized to download and seed the ZIMs or anything else non-pirate related
- Use jojo2357’s ZIM updater to auto-update ZIMs via cron job while they’re still network connected
- DietPi-Dashboard as an all-in-one dashboard to monitor and control the RPi from a web interface. (Yeah I know I can do everything SSH’ing in but I’m lazy.)
- File Browser just in case I want other people to have access to files but since it’s in maintenance mode and I’m unsure I want others to have access, might strip it out
I try to use containers from LinuxServer.io whenever possible. Mostly just cause it’s what I do on my main server.
I’m still looking at adding/removing things as I get more time to sit down but I’m pretty happy with its current state.
Do you recommend adding anything else to it?
For instance, OSM maps?
I’ve been thinking about running the Kiwix app + OSMAnd on an old Android phone and auto updating it once a year.
That’s a good question (and good idea) that I hadn’t really thought about past a collection of ZIMs. The one I built advertises its own AP SSID that anyone can connect to and then access the ZIMs that are served via kiwix-serve on HTTP/80. That is, I wanted a single, low power, headless device that multiple people could use simultaneously via wifi and browser rather than a personal device.
I hadn’t really thought about other helpful services past that. I mean, we’ve got a (wee) server so why not use it? I like the idea of OSM and their website is open source but has a lot of dependencies:
openstreetmap-website is a Ruby on Rails application that uses PostgreSQL as its database, and has a large number of dependencies for installation
A fully-functional openstreetmap-website installation depends on other services, including map tile servers and geocoding services, that are provided by other software. The default installation uses publicly-available services to help with development and testing.
I wonder how hard it would be to host everything it needs locally/offline… and what that would do to power consumption : )
Thanks for the idea - something to look into, for sure.
I might beat you to it. I’ve got Kiwix running in docker, just did a PR to the kiwix-zim-updater so it can run in Docker on a cron schedule next to the server, and have spun those up with Karakeep (self-hosted web archive I use for bookmarking).
Right now I’m adding a ZIM list feature to the updater to list available ZIMs by language, and then I’ll move on to OSM.
You’ll definitely beat me to it : D
Do me a favor and tag me when you post your how to?
I will do my best to remember hah
Saw your comment on mine and finally saw this one.
I’m gonna take a look at openstreetmap-tile-server and see about running that since if all has gone to shit, who knows if GPS will work. Least it’s almost like a paper map and can be auto-updated as long as we still have internet. Quick Gist someone wrote here.
Yeah, I feel the same in that it’s assuredly doable, but how hard is it?
If you’re able to dig into and make some progress, please tag me because I’m interested but don’t have much time these days.
Yeah also if you make a Zim wiki or convert a website into Zim then you can run that stuff too. If you use Emacs it’s easy to convert some pages to wikitext for Zim too
120GB not including Wikimedia 😉
Also, I wish they included OSM maps, not just the wiki.
You can easily download planet.osm, I think it’s a couple of TB for the compressed file.
You can also offline the whole of Project Gutenberg with Kiwix, it’s about 70GB IIRC.
I wonder if there’s any way to edit these files afterwards? They tend to be read only, right? I must confess, I don’t have too much experience with this myself.
It’s probably hundreds of thousands of HTML files, no? What is the fear about being able to edit or not?
I believe kiwix uses zim files.
Okay, I’m unfamiliar with both. Well, I still don’t understand why read-only state matters; are you concerned about tampering?
Well I think it would be cool to be able to fix/edit any inaccurate articles, or pages that may have been messed with by trolls, or to update with more up to date info.