Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alg: profile and optimize algorithms #25

Open
mmcloughlin opened this issue Jul 6, 2019 · 1 comment
Open

alg: profile and optimize algorithms #25

mmcloughlin opened this issue Jul 6, 2019 · 1 comment
Labels
perf Performance issues and optimization

Comments

@mmcloughlin
Copy link
Owner

Addition chain generation is very slow. Since this is one time work, it's not critical. However it would be nice if it was faster, even for experimentation.

There's likely some simple optimization opportunities, since it's not been prioritized at all thus far. Profile the code and fix anything obvious.

@mmcloughlin mmcloughlin transferred this issue from mmcloughlin/ec3 Feb 1, 2020
@mmcloughlin mmcloughlin added the perf Performance issues and optimization label Feb 1, 2020
@mmcloughlin mmcloughlin changed the title addchain: profile and optimize algorithms alg: profile and optimize algorithms May 16, 2020
mmcloughlin added a commit that referenced this issue Apr 26, 2021
mmcloughlin added a commit that referenced this issue May 12, 2021
mmcloughlin added a commit that referenced this issue May 12, 2021
Implement comprehensive profiling options in the search subcommand, using the https://github.com/mmcloughlin/profile module. Profiles are configured with the ADDCHAIN_PROFILE environment variable. Removes the `-cpuprofile` option.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 12, 2021
Use linear 2SUM-like algorithm in the typical case where the chain is
ascending.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
Use golden test to ensure that all algorithm results remain unchanged. Intended to provide confidence that algorithm outputs have not changed when making risky or delicate changes.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
The Approximation.Suggest() function is now the largest consumer in CPU profiles. Like #99 it is currently "accidentally cubic" but can be optimized to use a linear outer loop, similar to the 2-SUM problem.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
Optimize Approximation.Suggest() by replacing linear bigint.Contains() call with a
new bigints.ContainsSorted() function that uses binary search.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
)

By far the largest number of allocations comes from the creation of bigvector
basis vectors. These just contain zeros and ones. This PR changes the Vector
type to an interface so that the basis type can be implemented without any
allocations at all.

Unfortunately, this does complicate the interface, since it requires the user
to not write to any integers returned from Idx(). Consider this okay since
it's an internal package.

Updates #60
Updates #25
mmcloughlin added a commit that referenced this issue May 13, 2021
The Approximation.Suggest() method is prominent in the allocation profile.
This PR reduces allocations by hoisting them out of the loop.

Updates #60
Updates #25
@mmcloughlin
Copy link
Owner Author

Combined effect of changes in #60 (comment):

name                       old time/op    new time/op    delta
Results/curve25519_field      32.6s ± 0%      0.5s ± 0%  -98.43%
Results/p256_field            13.7s ± 0%      0.3s ± 0%  -97.98%
Results/p384_field            56.0s ± 0%      0.7s ± 0%  -98.71%
Results/secp256k1_field       19.6s ± 0%      0.4s ± 0%  -98.03%
Results/curve25519_scalar     16.0s ± 0%      0.3s ± 0%  -98.21%
Results/p256_scalar           17.9s ± 0%      0.3s ± 0%  -98.11%
Results/p384_scalar           69.1s ± 0%      0.9s ± 0%  -98.76%
Results/secp256k1_scalar      20.1s ± 0%      0.4s ± 0%  -98.05%
Results/p2213_field           23.4s ± 0%      0.4s ± 0%  -98.23%
Results/p222117_field         22.6s ± 0%      0.4s ± 0%  -98.15%
Results/p2519_field           34.4s ± 0%      0.5s ± 0%  -98.44%
Results/p382105_field          121s ± 0%        1s ± 0%  -99.06%
Results/p383187_field          125s ± 0%        1s ± 0%  -99.09%
Results/p41417_field           159s ± 0%        1s ± 0%  -99.17%
Results/p511187_field          328s ± 0%        2s ± 0%  -99.39%
Results/p192_field            8.12s ± 0%     0.23s ± 0%  -97.16%
Results/p224_field            10.7s ± 0%      0.3s ± 0%  -97.60%
Results/goldilocks_field       104s ± 0%        1s ± 0%  -98.99%
Results/secp192k1_field       8.27s ± 0%     0.23s ± 0%  -97.21%
Results/secp224k1_field       13.6s ± 0%      0.3s ± 0%  -97.68%
[Geo mean]                    33.3s           0.5s       -98.45%

name                       old alloc/op   new alloc/op   delta
Results/curve25519_field     1.12GB ± 0%    0.16GB ± 0%  -85.30%
Results/p256_field           92.9MB ± 0%    48.8MB ± 0%  -47.48%
Results/p384_field            368MB ± 0%     135MB ± 0%  -63.23%
Results/secp256k1_field       290MB ± 0%     102MB ± 0%  -64.87%
Results/curve25519_scalar    64.6MB ± 0%    33.5MB ± 0%  -48.09%
Results/p256_scalar           113MB ± 0%      68MB ± 0%  -40.17%
Results/p384_scalar           536MB ± 0%     179MB ± 0%  -66.70%
Results/secp256k1_scalar      222MB ± 0%      96MB ± 0%  -56.79%
Results/p2213_field          1.03GB ± 0%    0.14GB ± 0%  -86.83%
Results/p222117_field         789MB ± 0%     144MB ± 0%  -81.70%
Results/p2519_field          1.22GB ± 0%    0.18GB ± 0%  -85.61%
Results/p382105_field        3.36GB ± 0%    0.34GB ± 0%  -89.92%
Results/p383187_field        3.87GB ± 0%    0.33GB ± 0%  -91.37%
Results/p41417_field         4.71GB ± 0%    0.38GB ± 0%  -91.94%
Results/p511187_field        9.48GB ± 0%    0.56GB ± 0%  -94.12%
Results/p192_field            160MB ± 0%      60MB ± 0%  -62.49%
Results/p224_field            122MB ± 0%      49MB ± 0%  -59.36%
Results/goldilocks_field     1.20GB ± 0%    0.13GB ± 0%  -89.35%
Results/secp192k1_field       167MB ± 0%      68MB ± 0%  -59.08%
Results/secp224k1_field       233MB ± 0%      86MB ± 0%  -63.09%
[Geo mean]                    544MB          124MB       -77.11%

name                       old allocs/op  new allocs/op  delta
Results/curve25519_field      22.4M ± 0%      1.0M ± 0%  -95.47%
Results/p256_field            2.13M ± 0%     0.64M ± 0%  -70.08%
Results/p384_field            7.88M ± 0%     1.06M ± 0%  -86.56%
Results/secp256k1_field       6.24M ± 0%     0.83M ± 0%  -86.74%
Results/curve25519_scalar     1.40M ± 0%     0.56M ± 0%  -60.24%
Results/p256_scalar           2.61M ± 0%     0.79M ± 0%  -69.90%
Results/p384_scalar           11.6M ± 0%      1.3M ± 0%  -88.90%
Results/secp256k1_scalar      4.96M ± 0%     0.91M ± 0%  -81.68%
Results/p2213_field           21.1M ± 0%      0.9M ± 0%  -95.72%
Results/p222117_field         16.2M ± 0%      0.9M ± 0%  -94.51%
Results/p2519_field           24.4M ± 0%      1.0M ± 0%  -95.84%
Results/p382105_field         61.0M ± 0%      1.4M ± 0%  -97.69%
Results/p383187_field         69.8M ± 0%      1.5M ± 0%  -97.92%
Results/p41417_field          83.0M ± 0%      1.5M ± 0%  -98.18%
Results/p511187_field          156M ± 0%        2M ± 0%  -98.78%
Results/p192_field            3.72M ± 0%     0.68M ± 0%  -81.75%
Results/p224_field            2.78M ± 0%     0.66M ± 0%  -76.44%
Results/goldilocks_field      24.2M ± 0%      1.2M ± 0%  -95.04%
Results/secp192k1_field       3.69M ± 0%     0.66M ± 0%  -82.07%
Results/secp224k1_field       5.10M ± 0%     0.77M ± 0%  -84.91%
[Geo mean]                    11.3M           1.0M       -91.54%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf Performance issues and optimization
Projects
None yet
Development

No branches or pull requests

1 participant