The idea being to maintain a collective pool of media storage that everyone shares (Tahoe requires less mutual trust here), so that media gets deduplicated across instances, while still ensuring enough redundant copies spread across instances that one outage doesn't break everyone.
@rune I haven't kept up-to-date with the current model of Garage, but in Tahoe-LAFS it's basically RAID-over-the-network: every client can verify that enough shares exist, and if a file isn't 'healthy' enough, regenerate and reupload the missing shares. That's the mechanism that prevents data loss, so minimal trust is required for availability.
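(A rough sketch of that health-check logic in Python - names and thresholds are illustrative, not Tahoe's actual API:)

```python
# Toy model of Tahoe-LAFS-style share health checking (illustrative only,
# not Tahoe's real API). With k-of-N erasure coding, a file is recoverable
# as long as at least k of its N shares still exist somewhere.

NEEDED = 3   # k: shares required to reconstruct the file
TOTAL = 10   # N: shares created at upload time

def check_health(found_shares: set[int]) -> None:
    recoverable = len(found_shares) >= NEEDED
    missing = set(range(TOTAL)) - found_shares
    print(f"found {len(found_shares)}/{TOTAL} shares, recoverable={recoverable}")
    if recoverable and missing:
        # Any client that can reconstruct the file (>= k shares reachable)
        # can regenerate the missing shares and reupload them,
        # restoring full redundancy without trusting the storage nodes.
        print(f"repair: regenerate and reupload shares {sorted(missing)}")

check_health({0, 2, 3, 7})  # degraded but repairable
check_health({1, 5})        # below k: unrecoverable
```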
@joepie91 garage would offer you 3 replicas, which should be plenty, but it does sound like tahoe-lafs is more geared towards untrusted setups.
But garage could definitely work for groups of trusted instances.
@rune Yeah, that was more or less my conclusion when I last evaluated Garage; that it's only really suitable for high-trust groups.
Tahoe doesn't have a set number of replicas; you set the total number of shares and the number of shares needed for recovery, as attributes of the upload. These settings do need to be the same for everyone for dedupe to work, though.
(Storage overhead is basically totalShares divided by neededShares)
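(Worked example; 3-of-10 is Tahoe's default, the other pairs are just illustrative:)

```python
# Storage overhead for k-of-N erasure coding: each share is ~1/k of the
# file, and N shares are stored, so overhead = N / k.
for needed, total in [(3, 10), (2, 3), (5, 8)]:
    print(f"{needed}-of-{total}: {total / needed:.2f}x storage, "
          f"survives loss of {total - needed} shares")
```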
@joepie91 Query - does tahoe support in-file deduplication?
The concern I see with that deduplication is that, while the original media would deduplicate nicely, the compressed version might not be an exact binary copy - timestamps, compression non-determinism, et cetera are in play.
I feel like there'd need to be additional scripting to ensure that the compressed objects are also properly deduplicated.
@rallias It does not; it's purely whole-file binary deduplication (and I'm not sure anything more granular would be viable without breaking the security properties, though I also question how much real-world benefit it would yield in practice)
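(The reason identical bytes are required: Tahoe derives the encryption key from the file's contents plus a shared convergence secret - convergent encryption. A simplified sketch, not Tahoe's exact construction:)

```python
# Simplified sketch of convergent encryption (not Tahoe's exact scheme,
# which uses more structure). Because the key is derived from the file's
# contents plus a shared convergence secret, two uploads of byte-identical
# files produce the same ciphertext and deduplicate; a single changed byte
# yields a completely different key and ciphertext, so no dedup.
import hashlib, hmac

CONVERGENCE_SECRET = b"shared-by-the-pool"  # illustrative value

def convergent_key(file_bytes: bytes) -> bytes:
    return hmac.new(CONVERGENCE_SECRET, file_bytes, hashlib.sha256).digest()

a = convergent_key(b"identical media file")
b = convergent_key(b"identical media file")
c = convergent_key(b"identical media file!")  # one byte differs
print(a == b)  # True: same bytes -> same key -> dedupes
print(a == c)  # False: near-identical content doesn't dedupe
```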
@joepie91 Yeah, I mean, having a script that synchronizes the compressed versions based on their source material would make that deduplication work.
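(Something along these lines - hypothetical names throughout: key the compressed variant on the hash of its source media, and reuse an existing variant instead of recompressing:)

```python
# Hypothetical sketch of the reconciliation idea: index compressed variants
# by the hash of their *source* media, so instances reuse an existing
# compressed object instead of producing their own (non-bit-identical)
# recompression. All names here are illustrative.
import hashlib, zlib

variant_index: dict[str, bytes] = {}  # source hash -> canonical compressed object

def compress(source: bytes) -> bytes:
    # Stand-in for an instance's (possibly nondeterministic) transcode step.
    return zlib.compress(source)

def get_compressed(source: bytes) -> bytes:
    key = hashlib.sha256(source).hexdigest()
    if key not in variant_index:
        # First instance to see this source publishes the canonical variant;
        variant_index[key] = compress(source)
    # everyone else reuses that byte-identical copy, so it dedupes in the pool.
    return variant_index[key]

original = b"raw media bytes"
assert get_compressed(original) == get_compressed(original)  # bit-identical
```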
That's not entirely unprecedented - I do recall seeing (although I'm struggling to find it again) a community of instances that scripted sharing a CDN.
@rallias Right, I have seen that as well, but abstracting over a CDN requires a very high level of trust due to the centralized nature of the CDN itself - hence why I was thinking that something decentralized would be a better fit.
@joepie91 Aye, that's what I'm saying - get the best of both worlds: use that same type of script to reconcile differences where appropriate so the deduplication works, while having tahoe-lafs handle the rest.
@joepie91 this would be super interesting. I guess with garage, each site could point their DNS at their own node and let it route to whichever node has a copy. You can tag nodes with a location, and iirc it's reasonable about taking latency into account.
Trust is probably the biggest thing, since once you go beyond 3 nodes you won't have a full copy locally, and you can't guarantee availability if enough remote nodes go down or lose their data.
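(Back-of-envelope for that - my assumption of uniform placement, not Garage's actual layout logic: with 3 replicas spread over n nodes, a given object disappears only if all 3 nodes holding it are down at once, so with f simultaneous failures the fraction of objects at risk is C(f,3)/C(n,3). Toy numbers:)

```python
# Back-of-envelope availability model (assumes replicas are placed
# uniformly at random, which is a simplification): with replication
# factor 3 over n nodes, an object is unavailable only if all 3 nodes
# holding its replicas are down. For f failed nodes, the fraction of
# objects affected is C(f,3) / C(n,3).
from math import comb

def fraction_unavailable(n_nodes: int, failed: int, replicas: int = 3) -> float:
    return comb(failed, replicas) / comb(n_nodes, replicas)

for n, f in [(3, 2), (6, 2), (6, 3), (10, 3)]:
    print(f"{n} nodes, {f} down: {fraction_unavailable(n, f):.1%} of objects at risk")
```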