Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

p256 point multiplication got ~25% slower in 0.12 #882

Closed
randombit opened this issue Apr 17, 2023 · 5 comments
Closed

p256 point multiplication got ~25% slower in 0.12 #882

randombit opened this issue Apr 17, 2023 · 5 comments

Comments

@randombit
Copy link
Contributor

We just noticed in some internal benchmarks that p256 performance for point multiplication and point addition both became notably slower in the 0.12 release. This is replicated by the benchmarks in p256 as well, at least for point multiplication (seems point addition is not benchmarked):

0.11.1

point operations/point-scalar mul
                        time:   [104.03 us 104.21 us 104.42 us]

Current master

point operations/point-scalar mul
                        time:   [134.47 µs 134.63 µs 134.79 µs]

Both run on the same x86-64 Linux laptop, otherwise idle, Rust 1.65.

We looked at the release notes but don't see any reference to a change that suggests a performance slowdown, I also reviewed the diff and don't see any obvious culprits. Wanted to open an issue to check if this performance change was expected, and if so if there is any way to improve the situation to reach parity with 0.11 performance.

@tarcieri
Copy link
Member

The biggest change was probably switching to the generic curve arithmetic implementation in the primeorder crate. Perhaps that impacted inlining?

@randombit
Copy link
Contributor Author

I didn't have much luck bisecting so I wrote a script that just benchmarks every commit

#!/usr/bin/python

import subprocess
import re
import sys

def run_command(cmdline, cwd=None):
    proc = subprocess.run(cmdline, cwd=cwd, capture_output=True)
    return proc.stdout.decode('utf8')

time = re.compile(' +time: +\[(.*)\]')

for commit in open('p256_commits'):
    commit = commit.strip()
    run_command(['git', 'checkout', commit])
    at_commit = run_command(['git', 'rev-parse', 'HEAD']).strip()

    assert(commit == at_commit)

    output = run_command(['cargo', 'bench', 'point-'], cwd='p256')

    for line in output.split('\n'):
        m = time.match(line)
        if m:
            print(commit, m.group(1))
            break

    sys.stdout.flush()

where p256_commits contains the ID for every commit between 0.11.1 and HEAD that touched p256:

$ git log p256/v0.11.1.. p256 | grep ^commit | cut -c 8- > p256_commits

My results from this seem to implicate especially cea8f60, across two runs there is clear jump there

cea8f60ff76ba4f447a5169b9fb44f788b1217c4 149.26 µs 149.72 µs 150.16 µs
f1878f985211937b25308401ba64154256f3d308 129.89 µs 130.30 µs 130.73 µs

and

cea8f60ff76ba4f447a5169b9fb44f788b1217c4 150.13 µs 150.64 µs 151.20 µs
f1878f985211937b25308401ba64154256f3d308 130.01 µs 130.49 µs 131.02 µs

@tarcieri
Copy link
Member

tarcieri commented Apr 17, 2023

Nice sleuthing! This makes me want to set up something like https://github.com/bencherdev/bencher to watch for these sorts of regressions.

I'll have to see if I can spot how this impacted codegen.

@tarcieri
Copy link
Member

I opened #885 which might address this issue

@tarcieri
Copy link
Member

I think this is addressed now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants