harden GFp_memcmp #893

lmrs2 · 2019-08-27T00:38:40Z

GFp_memcmp may be optimized by compiler and make function not constant time.
Reasoning: the caller of the function only checks if the returned value is 0 or non-zero. As soon as the returned value is non-zero, the function can exit early. The compiler may be able to infer this on its own:
I tested this with gcc (Ubuntu 4.8.5-4ubuntu8~14.04.2) 4.8.5 with the following code:

compile with gcc -O3 test.c -o test

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>


int GFp_memcmp(const uint8_t *a, const uint8_t *b, size_t len) {
  uint8_t x = 0;
  size_t i = 0;
  for (i = 0; i < len; i++) {
    x |= a[i] ^ b[i];
  }

  return x;
}

int main(int argc, char *argv[]) {

if (!(argc >= 3)) {
printf("Usage: %s a b\n", argv[0]);
return -1;
}

const char * a = argv[1];
const char * b = argv[2];

printf("Comparing %s and %s\n", a, b);
int r = GFp_memcmp((uint8_t*)a, (uint8_t*)b, strlen(a));
if (r) {
printf("different\n");
} else {
printf("same\n");
}

return 0;
}

The compiler inlined the call to GFp_memcmp() and unrolled the loop, looking like:

400a3a: 44 0f b6 44 0e 01   movzbl 0x1(%rsi,%rcx,1),%r8d
400a40: 44 32 44 0f 01       xor    0x1(%rdi,%rcx,1),%r8b
400a45: 44 09 c0                or     %r8d,%eax
400a48: 4c 8d 41 02           lea    0x2(%rcx),%r8
400a4c: 4c 39 c2                cmp    %r8,%rdx                                
400a4f: 0f 86 37 01 00 00   jbe    400b8c <GFp_memcmp+0x25c>  EARLY EXIT
400a55: 44 0f b6 44 0e 02   movzbl 0x2(%rsi,%rcx,1),%r8d
400a5b: 44 32 44 0f 02       xor    0x2(%rdi,%rcx,1),%r8b
400a60: 44 09 c0                or     %r8d,%eax
400a63: 4c 8d 41 03           lea    0x3(%rcx),%r8
400a67: 4c 39 c2                cmp    %r8,%rdx                                 
400a6a: 0f 86 1c 01 00 00  jbe    400b8c <GFp_memcmp+0x25c>  EARLY EXIT
400a70: 44 0f b6 44 0e 03 movzbl 0x3(%rsi,%rcx,1),%r8d
400a76: 44 32 44 0f 03      xor    0x3(%rdi,%rcx,1),%r8b
...

where address 400b8c contains the call to printf.
Note that I tested with gcc (did not work with clang).

One trick to help is to mark the returned value as volatile, forcing the compiler to evaluate each comparison in the loop:

volatile uint8_t x = 0;

The text was updated successfully, but these errors were encountered:

briansmith · 2019-08-28T03:50:31Z

/cc @davidben.

I'm quite reluctant to do weird things with volatile but I agree we probably should do something.

josephlr · 2019-08-28T06:14:04Z

I was able to reproduce the above issue. However, in the current implementation, this will only be an issue if LTO is turned on.

GFp_memcmp is defined in it's own translation unit, and all of the callers (including the Rust callers), only have access to the function's declaration not its definition, so none of the callers will be able to inline the function. This can be demonstrated by taking @lmrs2's example, splitting it into test2.c, memcmp.c, and memcmp.h, having test.c include memcmp.h, and compling with gcc -O3 test2.c memcmp.c -o test2. GFp_memcmp will not be inlined, preventing early return issues.

However, if LTO is enabled for all of ring's code, GFp_memcmp can be inlined. Also, as Rust supports cross-language inlining, both the Rust and C calls to GFp_memcmp are at risk.

I agree with @briansmith that something should be done, as binaries which transitivly depend on ring may set lto = true or lto = 'thin'. However, using volatile prevents the use of many useful optimizations. The ideal solution would be to prevent GFp_memcmp from being subject to LTO.

davidben · 2019-08-28T13:06:24Z

I'm not sure those early exits are actually a problem. Looking at more complete compiler output, it's vectorized the bulk of the loop and then the bit at the bottom is an unrolled tail (unrolled since, after vectorization, it has bounded size). That unrolled bit indeed needs to branch because it needs to check the i < len condition. In particular, it's comparing %rcx + 2, %rcx + 3, etc. (the lea) against %rdx. %rcx is how many bytes were handled by the bulk vectorized bit, and %rdx is the length.

In the general case, yeah, constant-time code in general is always at the mercy of the compiler. We have to play this game of handwaiving and judiciously applying value barriers until the right language-level primitives are available.

Related: briansmith/ring#1444 Related: briansmith/ring#893

archlinux-github pushed a commit to archlinux/aur that referenced this issue Jul 20, 2023

Update to 0.8.0 and force !lto

a2391ef

Related: briansmith/ring#1444 Related: briansmith/ring#893

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

harden GFp_memcmp #893

harden GFp_memcmp #893

lmrs2 commented Aug 27, 2019 •

edited

Loading

briansmith commented Aug 28, 2019

josephlr commented Aug 28, 2019 •

edited

Loading

davidben commented Aug 28, 2019 •

edited

Loading

harden GFp_memcmp #893

harden GFp_memcmp #893

Comments

lmrs2 commented Aug 27, 2019 • edited Loading

briansmith commented Aug 28, 2019

josephlr commented Aug 28, 2019 • edited Loading

davidben commented Aug 28, 2019 • edited Loading

lmrs2 commented Aug 27, 2019 •

edited

Loading

josephlr commented Aug 28, 2019 •

edited

Loading

davidben commented Aug 28, 2019 •

edited

Loading