p256 point multiplication got ~25% slower in 0.12 #882

randombit · 2023-04-17T16:00:42Z

We just noticed in some internal benchmarks that p256 performance for both point multiplication and point addition became notably slower in the 0.12 release. The benchmarks in p256 reproduce this as well, at least for point multiplication (point addition does not appear to be benchmarked):

0.11.1

point operations/point-scalar mul
                        time:   [104.03 us 104.21 us 104.42 us]

Current master

point operations/point-scalar mul
                        time:   [134.47 µs 134.63 µs 134.79 µs]

Both run on the same x86-64 Linux laptop, otherwise idle, Rust 1.65.

We looked at the release notes but don't see any reference to a change that suggests a performance slowdown. I also reviewed the diff and don't see any obvious culprits. I wanted to open an issue to check whether this performance change was expected and, if so, whether there is any way to get back to parity with 0.11 performance.
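For reference, the size of the regression can be worked out from the two runs above. This is only a sketch: it takes the middle value of each bracketed interval as the estimate, and the function and constant names are made up for illustration.

```python
def relative_slowdown(before_us: float, after_us: float) -> float:
    """Return the fractional increase in time going from before to after."""
    return (after_us - before_us) / before_us


# Middle values of the two intervals quoted above, in microseconds.
V0_11_1_US = 104.21
MASTER_US = 134.63

if __name__ == "__main__":
    pct = 100 * relative_slowdown(V0_11_1_US, MASTER_US)
    print(f"point-scalar mul slowdown: {pct:.1f}%")
```

On these numbers the slowdown comes out at roughly 29%, in the same range as the ~25% in the title.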


tarcieri · 2023-04-17T16:34:38Z

The biggest change was probably switching to the generic curve arithmetic implementation in the primeorder crate. Perhaps that impacted inlining?

randombit · 2023-04-17T19:27:13Z

I didn't have much luck bisecting, so I wrote a script that just benchmarks every commit:

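#!/usr/bin/env python3

import re
import subprocess
import sys

def run_command(cmdline, cwd=None):
    # Run a command and return its stdout as text (stderr is captured and discarded)
    proc = subprocess.run(cmdline, cwd=cwd, capture_output=True)
    return proc.stdout.decode('utf8')

# Matches the benchmark timing line, e.g. "    time:   [104.03 us 104.21 us 104.42 us]"
time_re = re.compile(r' +time: +\[(.*)\]')

with open('p256_commits') as commits:
    for commit in commits:
        commit = commit.strip()
        run_command(['git', 'checkout', commit])
        at_commit = run_command(['git', 'rev-parse', 'HEAD']).strip()

        # Make sure the checkout actually succeeded before benchmarking
        assert commit == at_commit

        output = run_command(['cargo', 'bench', 'point-'], cwd='p256')

        # Report only the first timing line for each commit
        for line in output.split('\n'):
            m = time_re.match(line)
            if m:
                print(commit, m.group(1))
                break

        sys.stdout.flush()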

where p256_commits contains the ID for every commit between 0.11.1 and HEAD that touched p256:

$ git log p256/v0.11.1.. p256 | grep ^commit | cut -c 8- > p256_commits

My results from this seem to implicate cea8f60 in particular. Across two runs there is a clear jump at that commit:

cea8f60ff76ba4f447a5169b9fb44f788b1217c4 149.26 µs 149.72 µs 150.16 µs
f1878f985211937b25308401ba64154256f3d308 129.89 µs 130.30 µs 130.73 µs

and

cea8f60ff76ba4f447a5169b9fb44f788b1217c4 150.13 µs 150.64 µs 151.20 µs
f1878f985211937b25308401ba64154256f3d308 130.01 µs 130.49 µs 131.02 µs
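Spotting the jump by eye gets tedious with a long commit list, so here is a rough sketch of how the script's output could be scanned for the largest slowdown between adjacent entries. It assumes the output is newest-first (as `git log` produces it), that every line uses the same time unit, and that each line has the `commit low unit mid unit high unit` shape shown above. The helper names are hypothetical.

```python
import re

# One result per line: "<commit> <low> <unit> <mid> <unit> <high> <unit>"
LINE = re.compile(r'^(\w+) ([\d.]+) \S+ ([\d.]+) \S+ ([\d.]+) \S+$')


def parse_results(text):
    """Return (commit, middle estimate) pairs in the order given."""
    results = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            results.append((m.group(1), float(m.group(3))))
    return results


def largest_jump(results):
    """Return (commit, next_older_commit, ratio) for the biggest slowdown.

    Assumes newest-first order, so each entry is compared against the
    next (older) entry in the list. Returns None with fewer than two entries.
    """
    best = None
    for (commit, t), (older, t_older) in zip(results, results[1:]):
        ratio = t / t_older
        if best is None or ratio > best[2]:
            best = (commit, older, ratio)
    return best


# The first run quoted above.
SAMPLE = """\
cea8f60ff76ba4f447a5169b9fb44f788b1217c4 149.26 µs 149.72 µs 150.16 µs
f1878f985211937b25308401ba64154256f3d308 129.89 µs 130.30 µs 130.73 µs
"""

if __name__ == "__main__":
    commit, older, ratio = largest_jump(parse_results(SAMPLE))
    print(f"{commit[:7]} is {100 * (ratio - 1):.1f}% slower than {older[:7]}")
```

Note that "adjacent in the list" is not necessarily "parent commit", since the list only contains commits that touched p256.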

tarcieri · 2023-04-17T20:28:27Z

Nice sleuthing! This makes me want to set up something like https://github.com/bencherdev/bencher to watch for these sorts of regressions.

I'll have to see if I can spot how this impacted codegen.

tarcieri · 2023-04-24T02:46:00Z

I opened #885, which might address this issue.

tarcieri · 2023-11-10T01:30:46Z

I think this is addressed now?

tarcieri mentioned this issue Apr 24, 2023: p256: revert primeorder field impls #885 (Merged)

tarcieri mentioned this issue May 1, 2023: Continuous benchmarking RustCrypto/actions#30 (Open)

tarcieri closed this as completed Nov 10, 2023

tarcieri mentioned this issue Mar 9, 2024: p256: add 32-bit arithmetic and optimizations #1033 (Merged)
