-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement E2E key verification process #2142
Comments
For example, Alice wants to verify Bob’s device:
Future alternatives might involve scanning a QR code or using an NFC bump. It’s possible to design a process which establishes verification in both directions (so Bob will also know about Alice’s device), but it’s Complicated. |
(See also #2622: the current UX encourages people to stab at 'Verify' to make the ugly triangles go away) |
Would you need to type it in? Could it not be displayed on both screens like with Signal? |
The implementations in Whatsapp and signal are pretty good UX, except that they use a hexidecimal string for vocal verification. Something like this would be better, keybase uses this and I feel it's very easy to use. |
enough of the rest of e2e is now in place that this should be p1 |
This is fantastic! I've a lot to say about this, and I'm not sure if this is the right ticket or it should have a design doc or be split up. Key things that come to mind:
Phew! |
We ought to be keeping eyes on other ways of transferring keys as well, such as keybase or the new key transparency project from Google. If there are existing methods of key verification, we could either use them or learn from them. |
One other option to consider is whether we can create a more user friendly challenge/response system. If the verifying person can ask the person to be verified a question (obviously over an encrypted channel) that only they would know the answer to and that person can respond with a signed response, that level of security is probably good enough for most threat models. |
Some of my recent thoughts regarding this:
[end wall of text] |
What exactly are we planning on validating through this process? @ara4n seemed to imply in matrix-dev that this was meant to be a per-user+device-pair process, but the language we're using in this issue appears to imply that this will be a "here is my device's public key but as a bunch of simple words." I made this pull request to make a maximally convenient mechanism for smartphones, but this definitely falls flat for situations where you aren't physically nearby. VoIP calls seem good in theory, but especially given how frequently keys are changing due to the brower-based nature of riot, it seems inconvenient to not have a fast and complete way to operate! How comfortable are we in providing multiple separate mechanisms to confirm keys? I'm fine having a "code phrase" as a mechanism which works entirely within Riot, but QR codes provide a faster, easier way for those who have the appropriate equipment and are physically close by. We could present it as a graceful degradation:
Independent of anything else, do we want to make some syntax for a "validate statement"? I was thinking we could have a string like: The QR code could contain a statement like this and Riot would validate the set (presenting a "I see that information exists for ${user} would you like accept this device as verified?"). I think this would also lead nicely into a mechanism where a client could share keys between devices: my phone and desktop are E2E verified, and I can share/send these validation statements from my phone back to my desktop when I visit a friend. |
@uhoreg and @kscz are thinking along similar lines to me here. I had previously written a few words about this at https://docs.google.com/document/d/1Exe28_ymBPXQbJkhB0lHWgH4Mjb1duwj62NjUKUz0Uc (see in particular "verification interface" and the footnotes), but I don't think I transcribed it here, for which apologies. Multiple means of verification would work well here, with users picking the most convenient. One other thought had been to investigate whether it was possible to verify in both directions with a single qr code scan or phrase exchange. |
From the Signal blog post, it looks like they're doing that by just concatenating two individual device fingerprints, which is simple enough to do. They say:
Though I'm not sure if I would do a straight concatenation, though, if we do something like display a word list on both devices and let the users compare, since some users will start comparing from the beginning, and say "OK, it's the same so far, that's good enough", and essentially only verify one of the two keys. But we could do something like: concat the keys and run the concatenated string through sha256. |
I think that there are three possible ways that key verification can happen:
For 1, Alice needs to be able to extract her keys to sign and post, and Bob needs to be able to verify her key by themselves. This is fairly similar to normal PGP key verification. But there may be multiple devices involved, so it may be nice for Alice to be able to say: "these are all my keys: [blob]", and allow Bob to import the blob and verify all Alice's devices in one go. For 2, to make things easier, we can combine Alice's and Bob's keys somehow, so that Alice and Bob only need to compare one string to verify the keys simultaneously, similar to what Signal does. However, Alice and Bob may have multiple devices to verify, so do they select all the keys they want to verify and mash them together (which would require them to tell each other what devices they are verifying)? Or else perform m x n key pair verifications? (e.g. Alice's device A with Bob's device a, Alice's device A with Bob's device b, etc. yuck) Or just treat it as 2 times the first case? Or assume that they'll each just verify one key each and rely on #2714 to get the other devices verified? For 3, I think that we can make things easier by creating a temporary secure channel between Alice and Bob's devices for the purpose of verifying the keys. It would go something like this (warning: lots and hand waving ahead): Alice and Bob tell their devices they want to verify each others' keys. Their devices contact each other somehow and set up a shared secret (maybe something Diffie-Hellman-ish), and then display a code to Alice and Bob. Alice and Bob verify the code with each other to make sure that there was no MITM (as in ZRTP). Once the code is verified, the devices use the secure channel to send their keys to each other, and mark the key that they received as verified. The advantage of this over treating 3 the same as 2 is that since the shared secret is verified immediately after being created, the code used to verify it doesn't have to be as long since an opponent would not have much time to brute force (e.g. it would probably suffice to compare 2 or 3 words from a sufficiently long wordlist. Even comparing a list of emoji might be manageable). [end wall of text] Thoughts? |
I've put a slightly more fleshed out flow for case 3 from my previous comment at: https://cryptpad.fr/pad/#/1/edit/0eqaXl2wZhiq5s5Ty-wzcg/QL4+6V9poZCmbZI0ispNMFbU/ @richvdh can you take a look and see if it seems sane? |
@uhoreg reading through the proposal, I'm not onboard. The main issue is that the whole point of end-to-end encryption is that you don't trust the intermediate server(s), and since you're relying on an effectively unauthenticated path for the "use DH key exchange" you don't have much ability to assume that the server didn't MitM the handshake and just replace the handshake and verification code with a different code. As a quick response to your initial comment, I would say that all of those key verification methodologies can co-exist. We basically already have the first one by virtue of the For the second point I need to go read up on what Signal is doing. For the third point, the issue is that you need to have the secure channel inherently contain ancillary information which allows you to confirm that it isn't being tampered with. When you're confirming keys, you can't really, programatically know that the person on the other side of the connection is who they claim they are. Either you need to do something like be in person and not use the server to communicate, or you use a channel over the server which would be infeasible for the server to intercept and replace the contents (i.e. a video or voice connection). The way I foresee these handshakes going in person is something more to the effect of... 1. Simple QR code sharing (least room for error in impl; highest user friction)
Main advantages for this proposal:
Main disadvantages for this proposal:
2. QR code sharing with bi-directional confirmation (more possibility of developer error; lower user friction)
Main advantages for this proposal:
Main disadvantages for this proposal:
3. The Secret Question Method (or the off-the-record method; very similar to what @uhoreg wrote)
Main advantages for this proposal:
Main disadvantages for this proposal:
Summary:I think doing the work for all of these proposals is feasible and reasonable to do within a good timeframe. The main disadvantage to the QR-code-based proposals is that they're blocked on the complete Any other devs want to chime in on which they would like to see first? I'm pretty motivated to start working on any of the three at this point. I still need to add in a Footnotes:[1] - I think you can leave out the hash here to make guessing the answer computationally infeasible for an attacker (you have to hash then validate the signature for the hash, no shortcuts). I may be wrong though, in which case you should add in the hash and the signature of that hash. |
@kscz That's the point of the handshake code, is that it's derived from the shared secret, and so the only way that the handshake codes are the same is that there wasn't a MitM (or that the adversary has infinite computing power). The attacker can't just "replace" the DH shared secret with their chosen secret, and so they're stuck trying different values to see if they can generate a collision. Hence all the calculations in the long paragraph in my proposal in the cryptpad.fr doc. Is there a specific attack that you're thinking of? The handshake is essentially the handshake from ZRTP, which was designed by someone who knows more than I do about cryptography.
My own impression is that if we have a good synchronous verification system, then we shouldn't worry too much about a good asynchronous two-way verification system. Regarding the "Secret Question Method", given that it's already implemented in a system, it would be interesting to find out what users' experiences are with it. (I haven't tried it out myself.) FTR, I don't have any opinions about QR codes, except that it should be implemented in some way since it's convenient for users, and that there must be some other method implemented as well for those who can't use QR codes. |
@uhoreg Ah, I missed the fact that ZRTP involves the step "verbally compare the Short Authentication String (SAS)". I also failed to read some of your description correctly: the comment "hash it (see below) and distill it into a short 'handshake code'" didn't read correctly when I was trying to understand your proposal. That effectively has the constraint then that you must use another channel of some sort, and since you mentioned that this was for in-person verification, that makes perfect sense! My primary concerns with the SAS basically boils down to the same concerns with any similar system where you check a summary:
For the secret question thing, I find it pretty straightforward. The comments about secret fatigue are from my own experience confirming secret keys repeatedly when my friends got new computers. Anyway, apologies for the confusion on ZRTP being possible to man-in-the-middle! I was wrong, and misread things. |
Each key is 256 bits, and there might be many of them...
Isn't this why we hash it and use it to choose words from a wordlist? |
Yeah, the main reason for not wanting to verify the whole key is that it's much longer. As mentioned previously, verifying 256 bits would require checking roughly 22 words selected from a 4096-word word list, and it would need to be done once per key, whereas in my proposal, you most people could probably get away with 6 words and both keys get verified. Though, as I mention, you could have a customizable paranoia level and check a few more words. For example, if you verify 10 words and your attacker has 5,000,000,000 times more computing power than you do, then they have a less than a 1/100,000,000 chance of a successful attack. Of course, there should also be an option for checking the full key for people who are super paranoid. BTW, one more downside to the OTR "Secret Question Method" is that it's vulnerable to shoulder surfing. That is, if an attacker observes Alice typing in the secret answer, then the the attacker can impersonate both sides, and there's no way at all to check it. (I've tweaked my proposal a bit to reduce the time that an attacker is able to try to attack the DH portion.) |
@richvdh Well yes, but things like the QR code solve that. When you're not in person and using something like a voice call, having a shorter verification path is nice, but you can establish a more high-bandwidth, non-lossy channel when in person (NFC, QR code, bluetooth, etc). This is my point about preferring, if possible, to start with trying to solve for the theoretically perfect solution (i.e. actually confirming the whole key) before considering the options which make tradeoffs.
@richvdh Ah, fair! This still doesn't fix my primary complaint that I don't like using a summary version of the key, but that does solve my gripe about readability. The downside to this is largely that it makes it so that the Riot version of this protocol mandates a wordlist for the implementation in other clients. Something which involves a simpler "show a QR code" or "ask a question and check the answer" has less shared code/large tables of words which have to be used across implementations. Also, as others have pointed out, when trying to establish a wordlist has translation challenges.
@uhoreg But that only matters if the way that you're checking via voice, no? Can we start with another mechanism once we're in-person?
I'm not sure I understand how, what would someone do if they saw the answer? The users don't have to enter that secret more than once, and it only serves to validate that the holder of the public key is the person you think it is at that moment. Ideally, users would cycle between multiple personal questions per device pairing. In person, assuming the "secret question" method was my only verification choice aside from checking the whole key manually, I wouldn't bother using an actual question/answer at all, I would say "Hey Alice, let's verify keys! I'm going to enter in 'buzz112358BLAST)(iguanaGandalf?Rice!'" and then Alice would just enter in that key. |
I might have missed something in the noise, but I didn't think anyone was arguing against QR codes (or other mechanisms like NFC). I think we're discussing alternatives for when QR codes don't work well. |
Yeah, I think everyone is agreed that QR codes or something similar is good, but we still need something usable for people who don't have access to fancy machinery, which is what my proposal is about.
What other mechanism do you suggest for people who can't do QR codes/NFC/etc. Even doing a visual comparison of 22 words can get tedious (and then it has to be done once for each key). And we probably still need to have something for voice-only communication anyways, for those people who want to verify by phone.
It depends on exactly how it's implemented, but assuming a flow like you described above,
|
Ah, okay, so then we're trying to establish the best mechanism for in-person verification given the constraint that we only have voice verification. Got it, I'll stop conflating the two! @uhoreg broad comment on the shoulder-surfing attack is that I don't think the Mallory you have supposed feels very realistic. To accomplish that attack, Mallory would need to already have a fake device in Alice's account with a set of valid credentials (in order to send the verify request to Bob), in addition to controlling the server itself (dropping the request from Alice's real device), and being in-person when Alice and Bob are exchanging keys (to see the question and answer typed), in addition to being fast enough to successfully prompt Bob with a request at the same moment Alice sends her request. I can accept the first two conditions because that's pretty much every attack we're discussing, but the remaining conditions seem challenging for any attacker. Simple fixes:
I guess the real question I have mostly relates to the benefits of the ZRTP method, given the complexity of it's implementation. From what I can tell, the proposal for the ZRTP method requires:
My big complaint about something that complex is that other clients will have to match that implementation to be able to verify keys once E2E is default enabled. A word-list or emoji-list which is canonical for directly comparing the device keys I think I would accept though (with the caveat of course being that I don't actually have any approval power 😛). I think the secret question method could be implemented with a single message containing the device ID, the secret question, and signature of the hash of the answer. If you wanted to limit use to synchronous contexts you could simply add a "valid until" timestamp on the end of that message. |
For question of whether Mallory would be fast enough to send her request at (almost) the same time as Alice, I don't think it's that hard, if Mallory has some specialized software pre-written. She just has to type as Alice types, and hit "Enter" at the same time. Though this is definitely "targetted attack by a government agency/crime ring" territory, and not "random hacker trying to steal your credit card information". (And apologies for initially calling it just a shoulder-surfing attack -- I really should have called it "shoulder-surfing + MitM".) I think that the secret question method as you describe it is vulnerable to a normal MitM attack, though, if the users don't select an answer with enough entropy. The attack would go something like:
So it really relies on Alice and Bob properly selecting a shared secret, which one part that worries me.
It would probably be a set of special to_device messages. I haven't worked out the details, but to me, it doesn't seem that hard. The messages would basically be "Hi, I'm device xxx, and I want to verify with you", "Here are my DH inputs, and parameters for hash", and "My owner says we're good, so given our DH negotiation, here is the HMAC of my key".
I'm assuming you're talking about the DH shared secret, and I would imagine that it would be a word list or emoji list.
Again, it's the length of the comparison. Comparing 2 keys (Alice's and Bob's) would take about 44 words, which most people would run away from. Whereas with the method that I'm proposing, you can probably get reasonable security with 6 words, and pretty good security with 10 words, which would not be that hard for people to do. Of course, someone else should check my math. And maybe do a proper "verifying n bits of the secret handshake is just as secure as verifying m bits of the device key" comparison. I may try my hand at it myself tomorrow; it doesn't seem that hard. I would expect that it would compare pretty favourably, though, since 1) even though we have an n/2 due to birthday attack, that's essentially cancelled out by the fact that we're verifying 2 keys; 2) we can use PBKDF2/bcrypt/something similar to slow the attacker down, whereas bruteforcing a key doesn't have this slowdown; and 3) the attacker only has about 2 seconds to mount the attack, whereas attacking a long-lived key could be done over the course of months or years.
That doesn't seem like much of a complication to me.
I personally don't think that it would be that complicated, compared to all the other e2e stuff, if the DH, hash, and hash-to-words/emoji are implemented in libolm. |
Well, Mallory has to have the server configured to stop verification attempts between Alice and Bob (or at least to delay them for some period). Then she also has to be present, with her computer open at the time that Alice and Bob attempt a verification, and then she needs to see what's typed... is this not defeated by just making the "answer" field a password/opaque field?
Is that any different than your proposal? You have a "paranoia level" setting, and if the users only choose to check the first word, or sends the words they see through Riot (i.e. Bob types "Hey Alice, I see 'waffle monkey peanut piano', that match what you see?") which can similarly be swapped out by a dedicated attacker. All I'm really getting at is: you can't make users care more about trust, and a lazy user will always ruin your security model. In this case, I'm assuming that the server can't cheaply interpret natural language strings and then guess an answer, so even a mediocre question will still provide a pretty high barrier to attack. Especially given that the functionality the server has to undertake is to guess quickly enough (which involves generating a guess, hashing that guess, and then validating that the signature matches that hash). I think the server would struggle to make an appreciable number of guesses before the delay became obvious/an issue. Assuming that the server couldn't interpret natural language strings, what would a dedicated attacker do to cause the server to guess? Have collected sets of words with categories and spin the malicious server into action with hinting? i.e. server tells Mallory "I see this question between Alice and Bob" and Mallory tells the server: "colors" and the server has a categorized wordlist? Or something else?
But you're going to put in a "paranoia level" anyway; why not just make the same thing for the main public keys? Paranoid users can compare more pieces of the public key, and less paranoid users can compare less of the public key. I'm not quite understanding what comparing the DH ephemeral key is buying you over just verbally comparing some (or all) of the main public key.
Since Alice's client and Bob's client connect immediately, the only way the malicious server could find a non-paranoid collision would be to swap out Alice's and Bob's devices for different devices. I guess I can see that:
Though really, it should be a massive red flag to Alice and Bob that they suddenly get new devices on the server that they can't control or delete (Mallory can't trick the clients into having the new malicious identity) |
It is defeated if Mallory can only see the screen, but not if she can observe the keyboard. (Though on many mobile devices, the keyboard is on the screen.)
Yes, because in my proposal, the client can tell you what to verify in order to achieve a certain security level, while in the "secret question" system, it's all up to the user's judgment. In my proposal, the client can tell the user: "if you want to be safe from corporate spying from a medium-sized company, verify n words", whereas with secret questions, the client can't say "Ooh, asking for Bob's mother's maiden name isn't secure enough, but asking the make of his first car should do it." With the "secret question" method, it's entirely up to the user to figure out what is secure and what isn't, and I don't know how much users can be trusted to do that accurately.
It doesn't really need to interpret much natural language, since the class of answers is usually contained in the question itself. If the question is "What is your favourite colour", then the server sees "colour" in the question, and guesses a bunch of colours. If the question is "What is your dog's name", the server sees "dog" and "name", and guesses a bunch of common dog names. Even if it couldn't guess the class of answers, it can have a dictionary of common answers. The dictionary might contain the answer to any of Alice or Bob's secret questions, but more than likely, it will contain answers to a significant number of users' questions.
You say "server", but I'm assuming that the attacker has multiple servers at their disposal. Generating a guess is not hard: you have a dictionary. As for hashing the guess, if the attacker has a dictionary, they can create a rainbow table.
Yes, and I actually already suggested that (see point 2.v.). But again, it's a question of the length of the comparison. I suspect that with my proposal, to get the same level of security, you'd need to compare fewer bits than if you were verifying part of the whole key. Which is why I say later on in the paragraph with the sentence that you quoted: "And maybe do a proper "verifying n bits of the secret handshake is just as secure as verifying m bits of the device key" comparison." Although thinking about it more, I think that the math (or rather the setup) may be a bit harder than I thought at first, and I may just start with "verifying n bits of a secret handshake is just as secure as using an m bit symmetric key" which, although it might not as applicable, will still be somewhat helpful since it will provide a comparison with something that we're more used to thinking about.
That's the same as any of the other proposals, though. Even in my proposal, Alice and Bob are just verifying keys that were downloaded before. Maybe that wasn't clear -- I'll make a note of it in my proposal. |
I won't bother continuing to defend the secret question methodology since both @uhoreg and @richvdh seem to dislike it 😛. I will say that I don't think it's subject to rainbow table issues for a variety of reasons, but if nobody likes it I'll shut up about it! I would like to hear more of a fleshed-out examples of how all the stuff after the DH key confirmation is going to happen: are we adding more message types to the core protocol? What will the messages to confirm stuff look like? Will it simply be transmitting the full signing key to both sides, confirming, and then discarding the DH channel you've formed? I am still curious about any sort of analysis which informs the reason to add the whole DH comparison vs comparing a subset of the signing key! I agree that the analysis is a pain in the butt, but I think we may also be at the point where it's necessary to compare. A random proposal on just comparing the pubkey via word/emoji list:
That way the attacker would have to have a birthday collision with an unknown subset of the key to successfully attack it. And Alice and Bob could always use the whole set of words (maximum paranoia) to defeat an attack if they're truly concerned about it. One major advantage: the server is not involved with this protocol in any way. The point at which the users have verified each others devices is invisible to the server. There's nothing to intercept/alter, so the attack model is a lot simpler. Implementation is entirely in UI, no new messages/whatever.
No, no, that was a comment about the fact that I wasn't understanding this comment:
How would Mallory take advantage of finding a collision for the SAS of a long-lived key? I was trying to figure out how you were pivoting from "the long-lived key is more attackable because it's longer-lived" to an actual attack. Do you have an example of how Mallory would take advantage of this? |
I've added more details to the post-DH phase of my proposal. I also took another look at ZRTP, and noticed that it actually had one more step in it that prevents a MitM from being able to use brute force to attack the DH verification. By having one side send a hash the DH parameters before the DH actually starts, it locks the DH exchange so that a MitM would only have one guess to attack the DH exchange, and they have a 1 in 2^n chance of success if Alice and Bob verify n bits. (That Zimmermann guy is pretty clever.) It doesn't matter how much computing power the attacker has -- they get exactly one guess and that's it. Which means that if Alice and Bob check 4 words from a 4096-word word list, that's 48 bits, and an attacker has a 1 in 2x10^14 chance of success, which is extremely small. 48 bits is also 7 emoji (from a list of 128 emoji) or 12 hex digits (e.g. "0123 4567 89AB"), which is still fairly manageable for people, even with the user-unfriendliness of hex digits. Comparing a subset of the key is doable, but requiring users to enter the bit positions that are being verified just makes it more painful. |
I've implemented a proof-of-concept version of my proposal as a demo page that logs in two sessions and goes through the steps of my proposal. The source code is at https://gitlab.com/uhoreg/matrix_key_verification and I've put up a copy at https://static.uhoreg.ca/key_verification/ |
It would be nice if we could associate a PGP key directly with an account (ie. if an email address associated with an account has an associated PGP key, it should be immediately apparent). Accounts with a published & proven PGP key could then sign and publish their E2EE keys, ideally allowing for seamless in-client E2EE key verification based PGP signatures. |
We now have two proposals for proper verification UIs at:
|
We should implement some sort of process to help people do device verification instead of encouraging them to blindly stab at "verify"
The text was updated successfully, but these errors were encountered: