question: I would like to use libsodium for secret-key encryption, but it requires a nonce, and I need the encryption to be deterministic/convergent (for deduplication).

Is "deriving the nonce from the data by hashing it" a reasonable solution to this problem, or does that have some issue I am not aware of?
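For concreteness, the idea in the question could be sketched roughly as follows. This is only an illustration, not a vetted construction: `derive_nonce` and `nonce_key` are hypothetical names, and using a keyed BLAKE2b hash (rather than a plain hash) is an assumption made here because the nonce is normally stored in the clear, so an unkeyed hash of the plaintext would act as a public fingerprint of the data. The 24-byte length matches the nonce size of libsodium's secretbox.

```python
import hashlib

NONCE_BYTES = 24  # nonce size of libsodium's crypto_secretbox (XSalsa20-Poly1305)


def derive_nonce(nonce_key: bytes, plaintext: bytes) -> bytes:
    """Derive a deterministic nonce from the plaintext (hypothetical helper).

    nonce_key is an assumed secret, kept separate from the encryption key.
    The same (nonce_key, plaintext) pair always yields the same nonce.
    """
    return hashlib.blake2b(
        plaintext, digest_size=NONCE_BYTES, key=nonce_key
    ).digest()


if __name__ == "__main__":
    key = bytes(32)  # placeholder key, for illustration only
    print(derive_nonce(key, b"some chunk").hex())
```

The derived nonce would then be passed to the encryption call in place of a random one; identical plaintexts under the same keys give identical ciphertexts, which is the deduplication property being asked for.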


@joepie91 I'm not an expert on the matter, but most encryption algorithms that include a nonce do require it to prevent leaking of e.g. secret key material (something something playstation signature key or something), so if libsodium requires a nonce for that algorithm it seems likely that the specific algorithm isn't built for your use-case. Different algorithms have different use-cases, don't get discouraged if the one you looked at doesn't seem to fit the bill.

@benaryorg The problem is that just about everything seems to require a nonce nowadays. Which is understandable, given how important it is for typical cases, but convergent encryption is very much an edge case.

@joepie91 what about encrypting each chunk with a randomly chosen nonce then, and storing the nonce with the data? I get that's not always possible of course.

@benaryorg That's not sufficiently deterministic for my case, unfortunately; part of the protocol involves "checking if encrypted/sharded chunks already exist in the storage cluster, before uploading anything", for which the whole process (encoding, encryption, sharding) needs to be fully deterministic with zero 'external' malleable factors

@benaryorg (Which has been an absolute pain in the design process, but that's a different discussion 🙃)
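The "check before uploading" requirement described above could look something like the sketch below. Everything here is hypothetical: the storage cluster is stood in for by a plain `dict`, and `chunk_id` / `upload_if_missing` are made-up names. It only shows why the ciphertext has to be a pure function of the input: the ID is computed from the encrypted chunk, so a random nonce would give a different ID on every run.

```python
import hashlib


def chunk_id(ciphertext: bytes) -> str:
    """Content address of an encrypted chunk (SHA-256 is an assumed choice)."""
    return hashlib.sha256(ciphertext).hexdigest()


def upload_if_missing(store: dict, ciphertext: bytes) -> bool:
    """Store the chunk only if no chunk with the same ID exists yet.

    Returns True if the chunk was uploaded, False if it was deduplicated.
    'store' is a stand-in for the real storage cluster.
    """
    cid = chunk_id(ciphertext)
    if cid in store:
        return False
    store[cid] = ciphertext
    return True


if __name__ == "__main__":
    cluster = {}
    print(upload_if_missing(cluster, b"encrypted chunk"))  # first upload
    print(upload_if_missing(cluster, b"encrypted chunk"))  # duplicate
```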

@joepie91 I see the problem. I could look further into it but I guess you (knowing what you need) are ahead of me on that part so I'll just wish you good luck ^^

@unnick Huh. Isn't ed25519 public-key crypto rather than secret-key?

@joepie91 oh i completely missed that you're talking about secret key crypto, oops

@joepie91 Whether it's a reasonable solution is sort of the wrong question. All that is required of a nonce is uniqueness, so iff hash(data) is unique for all your input data then this construction is secure. But if all you need is deterministic (or rather, convergent) encryption, what do you think you gain compared to a nonce of zero?

@dequbed The honest answer is that I have no idea :)

My rationale was something along the lines of: stay as close as possible to the standard recommended approach, and verify that specific deviations do not break the security model (I do not like rolling my own crypto).

By that reasoning, the closest thing that does what I want is "it still has a nonce, but it can be derived from the content/key". It's very possible that that's functionally indistinguishable from a nonce of zero - I simply don't know whether that is true! And so I did not take that step in my approach (yet).

@joepie91 Well, the standard approach is probably not to use libsodium's secret box — it's a very general purpose encryption primitive which means it tries very hard to be e.g. semantically secure which you explicitly do not want.

@dequbed What would a more standard approach be for this usecase?

@joepie91 I usually see convergent encryption being implemented by using a KDF on the input data to derive a key and encrypt the input data (plus padding) using that key with a fixed zero IV/nonce. I'd personally default to AES-CBC + HMAC but that's a tradeoff depending on what exactly you're trying to do.
If you need deterministic but not *convergent* encryption you would seed the KDF with some secret you have.
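A rough sketch of just the key-derivation part of that description, under stated assumptions: SHA-256 and HMAC-SHA256 stand in for "a KDF" here, the function names are hypothetical, and the actual cipher call (plus padding and authentication) is left out entirely. The fixed zero nonce is tolerable in this scheme only because each derived key is ever used for exactly one plaintext.

```python
import hashlib
import hmac

KEY_BYTES = 32
ZERO_NONCE = bytes(24)  # fixed all-zero nonce; length is an assumed example


def convergent_key(plaintext: bytes) -> bytes:
    """Convergent: the key depends only on the data.

    Anyone holding the same plaintext derives the same key.
    """
    return hashlib.sha256(plaintext).digest()


def deterministic_key(secret: bytes, plaintext: bytes) -> bytes:
    """Deterministic but not convergent: the derivation is seeded with a secret.

    Only holders of 'secret' derive the same key for the same plaintext.
    """
    return hmac.new(secret, plaintext, hashlib.sha256).digest()


if __name__ == "__main__":
    data = b"some chunk"
    print(convergent_key(data).hex())
    print(deterministic_key(b"my secret", data).hex())
```

In both variants the plaintext would then be encrypted under the derived key with `ZERO_NONCE`; the difference is only who is able to reproduce the key.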

@joepie91 Mind you, I'm leaving out many important details here for brevity to give an overall direction, if you want to implement this ping me on Matrix or Signal so I can give you a more detailed rundown.

@joepie91 it sounds like what you want is github.com/maidsafe/self_encry

I don't think libsodium has an approach that allows for that directly as it is designed to make devs do the safest crypto for specific well-known use cases by forcing certain practices. I have not seen this sort of scheme be widely adopted yet...

@ben That library is not very confidence-inspiring, to be honest - I haven't forgotten about Maidsafe's original sketchy business model (that they now pretend they've never had), and it speaks of an "additional obfuscation step" but then doesn't seem to provide any details about how that works or why it would be more secure than other approaches (or its vulnerability or lack thereof to known attacks against convergent encryption).

@joepie91 I think the white paper is straightforward enough, and though there are obvious questions left unanswered (like why obfuscate again after encryption at all?), it is the only crypto I've ever seen attempt to have deduplicatable results. In almost all other crypto, the goal is to avoid having the same plaintext yield the same ciphertext, for good reasons. But if you want to deduplicate the results, this is the only one I've ever seen attempt to provide that...

@joepie91 the crypto in the white paper is left open, and the steps are simple and reasonable enough to follow. It should be easy to reimplement a similar algorithm with more modern crypto (and without the massive dependency tree pulled in). I've just not seen anyone trying. Let me know if you find one!

@ben There's quite a bit of history of convergent encryption in P2P software, long predating Maidsafe. Some notable ones include Freenet, GNUNet, and Tahoe-LAFS.

But crucially, there are several known attacks: tahoe-lafs.org/hacktahoelafs/d - and so if an implementation claims that it is "as safe as any other modern encryption algorithm", that is a strong claim that requires supporting rationale (which I do not see here).

@zHXyHkzWuwUI It's also not deterministic, though; see the second property listed on that page
