Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

harden GFp_memcmp #893

Open
lmrs2 opened this issue Aug 27, 2019 · 3 comments
Open

harden GFp_memcmp #893

lmrs2 opened this issue Aug 27, 2019 · 3 comments

Comments

@lmrs2
Copy link

lmrs2 commented Aug 27, 2019

GFp_memcmp may be optimized by compiler and make function not constant time.
Reasoning: the caller of the function only checks if the returned value is 0 or non-zero. As soon as the returned value is non-zero, the function can exit early. The compiler may be able to infer this on its own:
I tested this with gcc (Ubuntu 4.8.5-4ubuntu8~14.04.2) 4.8.5 with the following code:

compile with gcc -O3 test.c -o test

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>


int GFp_memcmp(const uint8_t *a, const uint8_t *b, size_t len) {
  uint8_t x = 0;
  size_t i = 0;
  for (i = 0; i < len; i++) {
    x |= a[i] ^ b[i];
  }

  return x;
}

int main(int argc, char *argv[]) {

if (!(argc >= 3)) {
printf("Usage: %s a b\n", argv[0]);
return -1;
}

const char * a = argv[1];
const char * b = argv[2];

printf("Comparing %s and %s\n", a, b);
int r = GFp_memcmp((uint8_t*)a, (uint8_t*)b, strlen(a));
if (r) {
printf("different\n");
} else {
printf("same\n");
}

return 0;
}

The compiler inlined the call to GFp_memcmp() and unrolled the loop, looking like:

400a3a: 44 0f b6 44 0e 01   movzbl 0x1(%rsi,%rcx,1),%r8d
400a40: 44 32 44 0f 01       xor    0x1(%rdi,%rcx,1),%r8b
400a45: 44 09 c0                or     %r8d,%eax
400a48: 4c 8d 41 02           lea    0x2(%rcx),%r8
400a4c: 4c 39 c2                cmp    %r8,%rdx                                
400a4f: 0f 86 37 01 00 00   jbe    400b8c <GFp_memcmp+0x25c>  EARLY EXIT
400a55: 44 0f b6 44 0e 02   movzbl 0x2(%rsi,%rcx,1),%r8d
400a5b: 44 32 44 0f 02       xor    0x2(%rdi,%rcx,1),%r8b
400a60: 44 09 c0                or     %r8d,%eax
400a63: 4c 8d 41 03           lea    0x3(%rcx),%r8
400a67: 4c 39 c2                cmp    %r8,%rdx                                 
400a6a: 0f 86 1c 01 00 00  jbe    400b8c <GFp_memcmp+0x25c>  EARLY EXIT
400a70: 44 0f b6 44 0e 03 movzbl 0x3(%rsi,%rcx,1),%r8d
400a76: 44 32 44 0f 03      xor    0x3(%rdi,%rcx,1),%r8b
...             

where address 400b8c contains the call to printf.
Note that I tested with gcc (did not work with clang).

One trick to help is to mark the returned value as volatile, forcing the compiler to evaluate each comparison in the loop:

volatile uint8_t x = 0;
@briansmith
Copy link
Owner

/cc @davidben.

I'm quite reluctant to do weird things with volatile but I agree we probably should do something.

@josephlr
Copy link
Contributor

josephlr commented Aug 28, 2019

I was able to reproduce the above issue. However, in the current implementation, this will only be an issue if LTO is turned on.

GFp_memcmp is defined in it's own translation unit, and all of the callers (including the Rust callers), only have access to the function's declaration not its definition, so none of the callers will be able to inline the function. This can be demonstrated by taking @lmrs2's example, splitting it into test2.c, memcmp.c, and memcmp.h, having test.c include memcmp.h, and compling with gcc -O3 test2.c memcmp.c -o test2. GFp_memcmp will not be inlined, preventing early return issues.

However, if LTO is enabled for all of ring's code, GFp_memcmp can be inlined. Also, as Rust supports cross-language inlining, both the Rust and C calls to GFp_memcmp are at risk.

I agree with @briansmith that something should be done, as binaries which transitivly depend on ring may set lto = true or lto = 'thin'. However, using volatile prevents the use of many useful optimizations. The ideal solution would be to prevent GFp_memcmp from being subject to LTO.

@davidben
Copy link
Contributor

davidben commented Aug 28, 2019

I'm not sure those early exits are actually a problem. Looking at more complete compiler output, it's vectorized the bulk of the loop and then the bit at the bottom is an unrolled tail (unrolled since, after vectorization, it has bounded size). That unrolled bit indeed needs to branch because it needs to check the i < len condition. In particular, it's comparing %rcx + 2, %rcx + 3, etc. (the lea) against %rdx. %rcx is how many bytes were handled by the bulk vectorized bit, and %rdx is the length.

In the general case, yeah, constant-time code in general is always at the mercy of the compiler. We have to play this game of handwaiving and judiciously applying value barriers until the right language-level primitives are available.

archlinux-github pushed a commit to archlinux/aur that referenced this issue Jul 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants