I’m looking for experiences and opinions on kubernetes storage.
I want to create a highly available homelab that spans 3 locations where the pods have a preferred location but can move if necessary.
I’ve looked at Linstor, or SeaweedFS/Garage with JuiceFS, but I’m not sure how well those options perform across the internet and how they hold up in long-term operation. Is anyone else hosting k3s across the internet in their homelab?
Edit: fixed wording
That isn’t how you would normally do it
You don’t want to try and span locations at the container/hypervisor level. The problem is that there is likely too much latency between the sites, which will screw with things. Instead, set up replicated data types where it is necessary.
What are you trying to accomplish from this?
The problem is that I want failover to work if a site goes offline, which happens quite a bit with the residential ISPs where I live. Instead of waiting for the connection to be restored, my idea was that Kubernetes would see the failed node and replace it.
Most data will be transferred locally (with node affinity) and only on failure would the pods spread out. The remaining problem was storage, which is why I’m here looking for options.
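The scheduling side I’m hoping to handle with preferred (soft) node affinity, roughly like this sketch (the zone label, site name and workload are placeholders, not my actual setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      affinity:
        nodeAffinity:
          # Soft preference: the scheduler favours the home site,
          # but can still place the pod elsewhere if that site is down.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone   # assuming each location is labelled as a zone
                    operator: In
                    values: ["site-a"]                 # placeholder site label
      containers:
        - name: app
          image: nginx                                 # placeholder workload
```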
That isn’t going to work unfortunately
You need very low latency (something like 10ms or preferably less)
Rook-Ceph for sure. And echoing another comment, come join the Home Operations Discord, we have a heap of info and people experienced with Kubernetes homelabbing https://discord.gg/home-operations
One thing I recently found out is that Ceph wants whole drives; I could not get it to work with partitions. I got it to work with Longhorn, though I’m still setting things up.
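For reference, device selection in the Rook CephCluster spec looks roughly like this, assuming per-node whole-disk selection (hostname, device name and image tag are placeholders):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18   # example image tag
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: node-a               # placeholder hostname
        devices:
          - name: nvme0n1          # whole disk; partitions are what I couldn't get working
```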
Longhorn is pretty easy to use. Garage works well too. Ceph is harder to use but provides both block and object storage (s3).
Ceph (and Longhorn) want “10 Gbps network bandwidth between nodes” while I’ll have around 1 Gbit/s between nodes, or even lower.
What’s your experience with Garage?
I guess the network will be a bottleneck on Garage too. If you want high performance you might need a hybrid solution, like clustering stateful apps on local storage combined with periodic full backups to distributed storage.
It’s fine if the bottleneck is upload/download speed, there’s no easy way around that.
The other problems, like high latency or using more bandwidth than is required, are what I’m more worried about. Maybe a local read cache or something like that could be a solution too, but that’s why I’m asking what is in use and what works vs what is better reserved for dedicated networks.
I tried Longhorn, and ended up concluding that it would not work reliably with Volsync. Volsync (for automatic volume restore on cluster rebuild) is a must for me.
I plan on installing Rook-Ceph. I’m also on 1Gb/s network, so it won’t be fast, but many fellow K8s home opsers are confident it will work.
Rook-Ceph does need SSDs with Power Loss Protection (PLP), or it will get extremely slow (latency). Bandwidth is not as much of an issue. Find some used Samsung PM or SM models, they aren’t expensive.
Longhorn isn’t fussy about consumer SSDs and has its own built-in backup system. It’s not good at ReadWriteMany volumes, but it sounds like you won’t need ReadWriteMany. I suggest you don’t bother with Rook-Ceph yet, as it’s very complex.
Also, join the Home Operations community if you have a Discord account, it’s full of k8s homelabbers.
Thanks for the info!
I’ll try Rook-Ceph since Ceph has been recommended quite a lot now, but my NVMe drives sadly don’t have PLP. AFAICT that should still work because not all nodes will face power loss at the same time.
I’d rather start with the hardware I have and upgrade as necessary; backups are always running for emergencies and I can’t afford to replace all the drives.
I’ll join Home Operations and see what info I can find
The problem with non-PLP drives is that Rook-Ceph will insist that its writes get done in a way that is safe wrt power loss.
For regular consumer drives, that means it has to wait for the cache to be flushed, which takes aaaages (milliseconds!!) and that can cause all kinds of issues. PLP drives have a cache that is safe in the event of power loss, and thus Rook-Ceph is happy to write to cache and consider the operation done.
Again, a 1 Gb network is not a big deal, but not using PLP drives could cause issues.
If you don’t need Volsync and don’t need ReadWriteMany, just use Longhorn with its built-in backup system and call it a day.
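A minimal StorageClass sketch for that, assuming the stock driver.longhorn.io provisioner (the replica count is just an example, and the backup target itself is configured in Longhorn’s settings, not here):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated        # placeholder name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "2"            # example: keep two copies spread across nodes
  staleReplicaTimeout: "30"        # minutes before an errored replica is cleaned up
```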
I know Ceph would work for this use case, but it’s not a decision to take lightly, kind of an investment and a steep learning curve (at least it was for me).
I heard that Ceph lives and dies with the network hardware. Is a slow internet connection even usable when the docs want 10 Gbit/s networking between nodes?
I’m really not sure. I’ve heard of people using Ceph across datacenters. Presumably that’s with a fast-ish connection, and it’s more like joining separate clusters, so you’d likely need a local Ceph cluster at each site and then replicate between datacenters. Probably not what you’re looking for.
I’ve heard good things about Garage S3 and that it’s usable across the internet on slow-ish connections. Combining it with JuiceFS is what I was looking at before I landed on Ceph.
My gut says go multi-cluster (or not) at that point, but treat the remote as a service and have a local container be a proxy.
I want the failover to work in case of an internet or power outage, not local cluster node failure. Multiple clusters would make configuration and failover across locations difficult, or am I wrong?
I guess I shouldn’t have answered; I do have experience with multiple storage classes, but none of the ones you mention (so like I don’t really know anything about them). I envisioned you dealing with pod-level storage issues and thought that would be something most programs would have lots of difficulty dealing with, whereas a more service-oriented approach would expect remote failures (hence the recommendation).
All of the things you mentioned don’t seem like they have provisioners, so maybe you mean your individual nodes would have these remote filesystems mounted. At that point I don’t think kubelet cares, you just mount those on the machines and tell kubelet about it via a host mount.
Oh shit, look, there’s a CSI driver for JuiceFS https://juicefs.com/docs/csi/introduction/, and they kinda start out recommending the host mount https://juicefs.com/docs/cloud/use_juicefs_in_kubernetes/.
We make some use of PVs, but I find people on my team often tend to avoid them.
I probably should have shut my mouth from the start!
They both support k8s: JuiceFS with either just a hostPath (not what I’d use) or the JuiceFS CSI Driver, and Linstor has an operator which uses DRBD and provides a CSI driver too.
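For Linstor, the StorageClass side of the operator looks roughly like this; the parameter names are what I recall from the Piraeus docs and may differ between operator versions, and the pool name and replica count are placeholders:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-replicated                      # placeholder name
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  linstor.csi.linbit.com/storagePool: pool1     # placeholder storage pool
  linstor.csi.linbit.com/placementCount: "2"    # number of DRBD replicas (example value)
```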
If you know of storage classes which are useful for this deployment (or just ones you want to talk about in general) then go ahead. From what I’m seeing in this thread I’ll probably have to deploy all options that seem reasonable and test them myself anyway.
For this kind of thing I usually go by popularity (an active, popular repo), mostly to have the most other people in your boat. It doesn’t always work, but generally if other users have to migrate at least you can ask them questions.
On the face of it I’d go with the CSI driver version, only because we use alternative CSI drivers ourselves and haven’t seen any issues (ours are pretty AWS-vanilla though).
We use storage classes (for our drivers); see the “dynamic provisioning” section of https://juicefs.com/docs/csi/guide/pv. You’ll need to make one of those, then create a StatefulSet and mount the PV in there.
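A rough sketch of that combination, assuming the csi.juicefs.com provisioner and a pre-created secret holding the JuiceFS volume credentials (names, namespace and sizes are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-sc
provisioner: csi.juicefs.com
reclaimPolicy: Retain
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret        # placeholder secret name
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-app                 # placeholder workload
spec:
  serviceName: example-app
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: nginx              # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: juicefs-sc
        resources:
          requests:
            storage: 10Gi           # example size
```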
I do find StatefulSets to be a somewhat less well supported part of Kubernetes, but generally they work well enough.
You are wrong
Those are block storage services, not storage backends. Your options for StorageClass provisioners are these: https://kubernetes.io/docs/concepts/storage/storage-classes/#provisioner
I would say if you’re not truly familiar with K8s, or DEEPLY familiar with Ops, don’t fuck with K8s internals. You’re gonna have a bad time.
I mean storage backends as in the provisioner; I will use local storage on the nodes with either LVM or just storage on a filesystem.
I already set up a cluster and tried Linstor; I’m asking for experiences with the options because I don’t want to test them all.
I currently manage all the servers with a NixOS repository but am looking for better failover.
You won’t find any benchmarks for distributed filesystems, because they don’t apply to any one setup. Nobody knows your network situation, disk speed, availability across clusters, etc.
Well, if that is the case then I will have to try them all, but I’m hoping the general behaviour will at least be similar across setups so that I can start with a good option.