I wonder whether something like Tahoe-LAFS (in convergent/deduplicated mode), or even Garage, could significantly reduce the storage requirements for small fediverse instances.

The idea would be to maintain a collective pool of media storage that everyone shares (Tahoe requires less mutual trust here), so that media get deduplicated across instances, while still keeping enough redundant copies spread across instances that outages don't break things for everyone.
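To make the convergent-deduplication idea concrete, here is a minimal Python sketch built on a toy cipher. The key is derived from the file's own content (plus an optional convergence secret), so identical uploads arriving via different instances produce identical ciphertext and the same storage index, and the shared pool keeps only one copy. `SharedPool`, `convergent_key`, `convergent_encrypt` and the SHA-256 counter-mode keystream are hypothetical stand-ins: this is not Tahoe-LAFS's API or its actual encryption scheme, the toy cipher is not secure, and redundancy across instances is not modelled.

```python
import hashlib

# Toy convergent encryption, for illustration only: NOT secure, and NOT
# Tahoe-LAFS's real scheme. It models only the convergence property.


def _keystream(key: bytes, length: int) -> bytes:
    """Pseudo-random keystream from SHA-256 in counter mode (toy cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_key(plaintext: bytes, convergence_secret: bytes = b"") -> bytes:
    # The key depends only on (secret, content): identical files, identical keys.
    # Length-prefixing the secret keeps different (secret, content) pairs apart.
    prefix = len(convergence_secret).to_bytes(8, "big") + convergence_secret
    return hashlib.sha256(prefix + plaintext).digest()


def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so this both encrypts and decrypts.
    return bytes(d ^ k for d, k in zip(data, _keystream(key, len(data))))


def convergent_encrypt(plaintext: bytes, convergence_secret: bytes = b"") -> tuple[str, bytes]:
    key = convergent_key(plaintext, convergence_secret)
    # The storage index is a hash of the key, so it can be public
    # without revealing the key itself.
    storage_index = hashlib.sha256(b"storage-index" + key).hexdigest()
    return storage_index, xor_with_keystream(plaintext, key)


class SharedPool:
    """Hypothetical media pool shared by several instances."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, plaintext: bytes, convergence_secret: bytes = b"") -> str:
        index, ciphertext = convergent_encrypt(plaintext, convergence_secret)
        # Store the ciphertext only if no instance has uploaded it yet.
        self.objects.setdefault(index, ciphertext)
        return index


pool = SharedPool()
media = b"same cat picture, federated to two instances" * 100
index_a = pool.put(media)  # uploaded via instance A
index_b = pool.put(media)  # the same file arriving via instance B
print(index_a == index_b, len(pool.objects))  # True 1
```

Tahoe-LAFS mixes a per-node convergence secret into the key by default, so only uploads made with the same secret deduplicate; instances wanting cross-instance deduplication would presumably need to share one.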

@joepie91 Query: does Tahoe support in-file deduplication?

The concern I see with that deduplication is that, while the original media would deduplicate nicely, the compressed versions might not be exact binary copies, with timestamps, compression non-determinism, et cetera in play.

I feel like there'd need to be additional scripting to ensure that the compressed objects are also properly deduplicated.
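A quick illustration of the non-determinism point, using gzip as a stand-in for whatever thumbnailing or transcoding step an instance actually runs (an assumption; real media pipelines differ in the details). gzip writes a modification time into its header, so compressing identical bytes at two different moments produces outputs with different hashes, even though both decompress to the same content. `gzip.compress` and its `mtime` keyword are from the Python standard library (3.8+).

```python
import gzip
import hashlib

original = b"the same source media bytes " * 1000

# gzip records a modification time in its header, so compressing identical
# input at two different moments yields different output bytes.
compressed_a = gzip.compress(original, mtime=1_700_000_000)
compressed_b = gzip.compress(original, mtime=1_700_000_060)

# Binary deduplication keys on the bytes, so these count as two objects...
print(hashlib.sha256(compressed_a).digest() == hashlib.sha256(compressed_b).digest())  # False
# ...even though the content they encode is identical.
print(gzip.decompress(compressed_a) == gzip.decompress(compressed_b) == original)  # True
```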

@rallias It does not; it's purely binary deduplication (and I'm not sure anything more involved would be viable without breaking the security properties, though I also question how much benefit it would yield in practice).
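To show what "in-file" deduplication would add over purely binary, whole-file deduplication, here is a sketch comparing whole-file hashing with hypothetical fixed-size 4 KiB chunking after a one-byte edit. The helper names and the chunk size are illustrative assumptions, not something Tahoe-LAFS does.

```python
import hashlib


def whole_file_keys(data: bytes) -> set[str]:
    # Whole-file binary deduplication: one hash per file.
    return {hashlib.sha256(data).hexdigest()}


def chunk_keys(data: bytes, chunk_size: int = 4096) -> set[str]:
    # Hypothetical "in-file" deduplication: one hash per fixed-size chunk.
    return {
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def shared_fraction(keys_a: set[str], keys_b: set[str]) -> float:
    # Fraction of distinct stored objects the two files have in common.
    return len(keys_a & keys_b) / len(keys_a | keys_b)


# 16 KiB of deterministic pseudo-random bytes (four distinct 4 KiB chunks).
original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(512))
# The same file with a single byte changed.
edited = bytes([original[0] ^ 0x01]) + original[1:]

print(shared_fraction(whole_file_keys(original), whole_file_keys(edited)))  # 0.0
print(shared_fraction(chunk_keys(original), chunk_keys(edited)))  # 0.6
```

Fixed-size chunks only help when changes stay local: an insertion shifts every later chunk boundary (content-defined chunking is the usual workaround), and re-encoded media tends to differ throughout, which fits the doubt about real-world benefit.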

@joepie91 Yeah, I mean, having a script that synchronizes the compressed versions based on their source material would make that deduplication work.

That's not entirely unprecedented: I do recall seeing (although I'm struggling to find it) a community of instances that used scripts to share a CDN.

@rallias Right, I have seen that as well, but abstracting over a CDN requires a very high level of trust due to the centralized nature of the CDN itself, which is why I was thinking that something decentralized would be a better fit.

@joepie91 Aye, that's what I'm saying: have the best of both worlds, with the deduplication helped along by that same type of script to reconcile differences where appropriate, while Tahoe-LAFS handles everything else.
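A minimal sketch of that kind of reconciliation script, assuming a shared index keyed by the SHA-256 of the source media: whichever instance processes a file first publishes its derivative, and later instances reuse those exact bytes instead of re-encoding, so derivatives deduplicate as well as originals. `DerivativeRegistry` and `get_or_create` are hypothetical names, the in-memory dicts stand in for whatever shared store would actually hold the index, and gzip stands in for the real media pipeline.

```python
import gzip
import hashlib
from typing import Callable


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class DerivativeRegistry:
    """Hypothetical shared index: source-media hash -> canonical derivative."""

    def __init__(self) -> None:
        self.by_source: dict[str, str] = {}  # source hash -> derivative hash
        self.blobs: dict[str, bytes] = {}  # derivative hash -> derivative bytes

    def get_or_create(self, source: bytes, make_derivative: Callable[[bytes], bytes]) -> bytes:
        source_key = digest(source)
        derivative_key = self.by_source.get(source_key)
        if derivative_key is None:
            # First instance to see this source: compress it and publish the result.
            derivative = make_derivative(source)
            derivative_key = digest(derivative)
            self.blobs[derivative_key] = derivative
            self.by_source[source_key] = derivative_key
        # Everyone else reuses the published bytes instead of re-encoding.
        return self.blobs[derivative_key]


registry = DerivativeRegistry()
source = b"original upload bytes " * 500
# Two instances whose compression output would otherwise differ (different mtimes).
from_a = registry.get_or_create(source, lambda s: gzip.compress(s, mtime=1))
from_b = registry.get_or_create(source, lambda s: gzip.compress(s, mtime=2))
print(from_a == from_b, len(registry.blobs))  # True 1
```

This ignores concurrency (two instances processing the same file at once) and trust: other instances accept derivative bytes someone else produced, which is easy to verify for a lossless step like gzip but not for a lossy transcode.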

Pixietown

Small server part of the pixie.town infrastructure. Registration is closed.