The idea is to have a distributed storage system that lets you set up a cooperative storage cluster between multiple semi-trusted parties, with RAID-like striping across participants (so more efficient than duplication) but with all data being encrypted and independently verifiable, so it's resistant to things like compromised systems. It's strongly inspired by Tahoe-LAFS (but makes some different design choices for practical reasons).
I got stuck on a seemingly trivial problem: how to make the hashes of different storage objects verifiable while also allowing retrieval of a file when the only thing you have is its decryption key.
Anyway, the idea here is to have a distributed storage system that can be run on spare, mismatched storage space by a group of friends, or a group of sysadmins, or whoever else, and be resistant to compromise and censorship. That goes for personal data storage, but definitely also for things that are meant for public access.
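To make the "more efficient than duplication" point concrete, here is a minimal sketch of single-parity striping in Python. It is only an illustration of the general RAID-like idea, not this project's actual encoding: the function names (`make_stripe`, `recover`) are made up, and a real system would likely use a proper erasure code that survives more than one lost shard.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(data: bytes, n_data: int) -> List[bytes]:
    """Split data into n_data equal shards plus one XOR parity shard.

    Hypothetical helper: each shard would go to a different participant.
    """
    shard_len = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(shard_len * n_data, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(n_data)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) from the others."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost shard")
    repaired = list(shards)
    if missing:
        present = [s for s in shards if s is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired


stripe = make_stripe(b"hello world!", 3)  # 3 data shards + 1 parity shard
stripe_with_loss = list(stripe)
stripe_with_loss[1] = None                # one participant disappears
restored = recover(stripe_with_loss)
original = b"".join(restored[:3])
```

With 3 data shards and 1 parity shard, 12 bytes of data take 16 bytes of total storage (about 1.33x), where plain duplication would take 24 bytes (2x), and losing any one participant is still survivable.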
Which seems to have rapidly gotten a lot more relevant since I started on this project...
@joepie91 I always wanted to have (or to make) something like that. "I have 100 GB to spare, let me store 80 GB of my data distributed with 20 GB of redundancy"
@joepie91 what is the difference between this and (my current favorite object storage system) Garage?
@jonah Mainly that Garage does not have strong resilience against malicious participants; it's a high-trust system towards all participants (or at least it was when I last evaluated it).
That's fine for a lot of cases, but as conditions worsen, it becomes more important to have such resiliency, and it's not something you can trivially patch into the design after the fact, unfortunately.
@jonah ("Malicious participants" here does not just include people who seek to join the cluster with malicious intentions, but also people with honest intentions who are compromised through their system getting hacked, stolen by cops, infiltrated, and so on)
@joepie91 yeah, fair, I only use it on my own machines, but I know they designed it for a collective of people. Definitely still a high-trust situation either way though, so a solution for this sounds super cool (probably not for me specifically, but in general) 👀
@joepie91 Why does your starting point have to be just the key, and not, say, also hash of the encrypted blob? Is deniability an important design goal?
@riley Practicality: you need the key to access the file's contents either way, *and* an address to locate the data to retrieve and decrypt in the first place. So if you were to use the hash of the ciphertext as the address, you would effectively double the size of what you have to hand out: the actual address plus the decryption key.
By using a hash of the decryption key as the address, having the decryption key alone is enough to locate *and* decrypt the file, and it still lets you share that hash with others to give them access to the ciphertext only (which is necessary for some low-trust maintenance tasks).
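A rough sketch of that addressing scheme in Python, to show the two levels of access. Everything here is an assumption for illustration: SHA-256 as the hash, a plain dict standing in for the cluster, and the names `address_for_key`, `store` and `fetch_ciphertext` are hypothetical. The ciphertext is just an opaque placeholder, since no actual cipher is involved in the lookup.

```python
import hashlib
import os


def address_for_key(key: bytes) -> str:
    """Derive the storage address from the decryption key alone.

    SHA-256 is an assumption; the point is only that the mapping is one-way.
    """
    return hashlib.sha256(key).hexdigest()


# Stand-in for the storage cluster: address -> ciphertext.
store = {}


def fetch_ciphertext(address: str) -> bytes:
    """Anyone holding the address can retrieve the (still encrypted) data."""
    return store[address]


key = os.urandom(32)                     # the only thing a reader needs to keep
address = address_for_key(key)           # derived, so it never has to be stored
store[address] = b"<opaque ciphertext>"  # placeholder for the encrypted file

# A reader with the key can locate the data, and would then decrypt it.
located_by_reader = fetch_ciphertext(address_for_key(key))

# A maintenance node that was only given the address can fetch and move the
# ciphertext around, but cannot get back to the key from the hash.
located_by_maintainer = fetch_ciphertext(address)
```

So the key holder hands out one value instead of two, and the address on its own only grants access to the ciphertext.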
The problem was that you can't have something addressed by both its content hash *and* a hash of its decryption key (for a decryption process that's several lookup steps removed from the initial one).
Or well, I *thought* you couldn't, but it turns out that with a small design change, you can in fact do that.