alg/dict: SlidingWindow: max zeroes and shortening #110
base: master
This change adds new variations of SlidingWindow and modifies Hybrid to use an arbitrary Decomposer after runs are removed.

SlidingWindowRTL and SlidingWindowShortRTL construct the windows from least to most significant bit (right-to-left) instead of the default left-to-right approach. SlidingWindowShort and SlidingWindowShortRTL incorporate a "shortening" heuristic that sometimes cuts windows short. The Z parameter restricts the maximum number of zeroes that may appear in a window. If a window is maximum length, contains at least one zero, and the bit following the window is a one, then the window is shortened in order to yield all trailing ones to the next window.

This new behavior was inspired by the windowing technique used for the upper half of smooth isogeny primes in [isogenychains]. This update also adds p512-2 from [isogenychains] to the result set.

Improvements with the new Ensemble:
- p256_scalar improved from +2 to -1
- p384_scalar improved from +1 to +0
- isop512_field (new) is -3
- p2519_field improved from 263 to 261

Notably, the isop512_field results are better than [isogenychains] when using their weighting metric (square = 0.8 * multiply).
Really sorry I missed this! I'll need to take some time to go through this in more detail, but based on a quick skim it's looking great, and the results are exciting too, so I expect we can land this soon :)
With these tweaks, the addition chain would be 289 operations (253 doublings + 36 additions), saving one doubling and two additions compared to my previous result.
Analogous comments apply to the other chains.
This is very nice work!
i286 = ((i257 << 7 + _111111) << 10 + _1100011) << 10
return (_10010101 + i286) << 6 + _1111
_111 = _10 + _101
_1000 = 1 + _111
There should be an optimization pass that replaces an addition yielding an even value v with a doubling of v/2 whenever v/2 is already in the chain. Here, this should become _1000 = 2*_100, replacing one addition with a doubling.
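A minimal Go sketch of such a pass, over a hypothetical chain representation (`Step`, `doublingPass`, `valid`, and `exampleChain` are illustrative names, not addchain's API). An addition producing an even value v is rewritten as a doubling of an earlier step that already holds v/2. The example chain prefix, including how _100 and _101 are formed, is assumed for illustration.

```go
package main

import "fmt"

// Step is one element of an addition chain in a hypothetical representation:
// Value = chain[I].Value + chain[J].Value, and I == J marks a doubling.
// Step 0 is the initial value 1 (its operands are ignored).
type Step struct {
	Value uint64
	I, J  int
}

// doublingPass replaces each addition whose result v is even with a doubling
// of an earlier step that already holds v/2, when such a step exists.
// It returns the number of additions replaced.
func doublingPass(chain []Step) int {
	first := map[uint64]int{} // value -> earliest step index holding it
	replaced := 0
	for k := range chain {
		s := &chain[k]
		if k > 0 && s.I != s.J && s.Value%2 == 0 {
			if h, ok := first[s.Value/2]; ok {
				s.I, s.J = h, h
				replaced++
			}
		}
		if _, ok := first[s.Value]; !ok {
			first[s.Value] = k
		}
	}
	return replaced
}

// valid checks that every step only uses earlier steps and sums correctly.
func valid(chain []Step) bool {
	if len(chain) == 0 || chain[0].Value != 1 {
		return false
	}
	for k := 1; k < len(chain); k++ {
		s := chain[k]
		if s.I < 0 || s.J < 0 || s.I >= k || s.J >= k ||
			chain[s.I].Value+chain[s.J].Value != s.Value {
			return false
		}
	}
	return true
}

// exampleChain is an assumed chain prefix consistent with the snippets here.
func exampleChain() []Step {
	return []Step{
		{1, 0, 0},  // 0: 1
		{2, 0, 0},  // 1: _10 = 2*1
		{4, 1, 1},  // 2: _100 = 2*_10
		{5, 2, 0},  // 3: _101 = _100 + 1
		{7, 1, 3},  // 4: _111 = _10 + _101
		{8, 0, 4},  // 5: _1000 = 1 + _111
		{14, 4, 4}, // 6: _1110 = 2*_111
		{16, 1, 6}, // 7: _10000 = _10 + _1110
	}
}

func main() {
	chain := exampleChain()
	fmt.Println(doublingPass(chain), valid(chain)) // 2 true
	fmt.Println(chain[5], chain[7])                // {8 2 2} {16 5 5}
}
```

The pass catches both the _1000 case here and the analogous _10000 case. It does not remove steps that may become unused after the rewrite; that would need a separate dead-step elimination.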
_111 = _10 + _101
_1000 = 1 + _111
_1110 = 2*_111
_10000 = _10 + _1110
Similarly, this can be replaced with a doubling: _10000 = 2*_1000.
x16 = _11111111 + i28
i37 = i28 << 8
x24 = x16 + i37
x32 = i37 << 8 + x24
It is strange that this changed from x32 = x16 << 16 + x16 to instead compute x24, which (IIUC) is unnecessary. If we fixed this, the addition chain would be two steps shorter than my previously published one.
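A quick Go check of the two x32 formulations, counting operations along the way. It assumes i28 = _11111111 << 8, which is what x16 = _11111111 + i28 implies if x16 = 2^16 - 1; `counter`, `viaX16`, and `viaX24` are illustrative names.

```go
package main

import "fmt"

// counter tallies doublings and additions while evaluating a chain fragment.
type counter struct{ doubles, adds int }

func (c *counter) shl(x uint64, s uint) uint64 { c.doubles += int(s); return x << s }
func (c *counter) add(x, y uint64) uint64     { c.adds++; return x + y }

const x8 = 1<<8 - 1 // _11111111, assumed already in the chain

// viaX16 is the earlier form: x32 = x16 << 16 + x16.
func viaX16() (uint64, counter) {
	var c counter
	i28 := c.shl(x8, 8) // assumed: i28 = _11111111 << 8
	x16 := c.add(x8, i28)
	x32 := c.add(c.shl(x16, 16), x16)
	return x32, c
}

// viaX24 is the new form: x24 = x16 + i37, then x32 = i37 << 8 + x24.
func viaX24() (uint64, counter) {
	var c counter
	i28 := c.shl(x8, 8)
	x16 := c.add(x8, i28)
	i37 := c.shl(i28, 8)
	x24 := c.add(x16, i37) // the extra step in question
	x32 := c.add(c.shl(i37, 8), x24)
	return x32, c
}

func main() {
	a, ca := viaX16()
	b, cb := viaX24()
	fmt.Println(a == 1<<32-1, b == 1<<32-1) // true true
	fmt.Println(ca, cb)                     // {24 2} {24 3}
}
```

Both routes reach 2^32 - 1 with the same 24 doublings, so the x24 route costs exactly one extra addition unless x24 is reused elsewhere in the chain.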
i190 = ((i169 << 4 + _101) << 8 + _1011011) << 7
i210 = ((_100111 + i190) << 9 + _101111) << 8 + _101111
i229 = ((_1110 + i210) << 11 + _1001111) << 5 + _111
i249 = (i229 << 9 + _11011111 + _1000) << 8 + _101011
I am not sure why this addition, _11100111 = _11011111 + _1000, gets inlined here when none of the others seem to be; it makes the chain harder to follow. Regardless, computing _11100111 doesn't help: I avoided computing it and then redid the middle of this chain with the remaining windows, which saved an additional addition.
_10111 = _1000 + _1111
_11001 = _10 + _10111
_11011 = _10 + _11001
_11111 = _1000 + _10111
Assuming it makes sense to calculate _11111: at this point you have _11111 and _111, so you can compute x8 = _11111 << 3 + _111, or x10 = _11111 << 5 + _11111.
We need 6 * 8 * 4 + 2 = 194 consecutive ones to start.
x20 = x10 << 10 + x10
x40 = x20 << 20 + x20
x80 = x40 << 40 + x40
x160 = x80 << 80 + x80
x180 = x160 << 20 + x20
x190 = x180 << 10 + x10
x194 = x190 << 4 + _1111
EDIT: This would save one addition:
x20 = x10 << 10 + x10
x24 = x20 << 4 + _1111
x48 = x24 << 24 + x24
x96 = x48 << 48 + x48
x192 = x96 << 96 + x96
To get to x194 we could do:
x194 = x192 << 2 + _11
But as these two bits are in the "random" part of the addition chain, doing something else is likely better.
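Both sketches for x194 can be checked numerically. This Go sketch uses math/big, takes x10, _1111 and _11 as already available, computes x180 as x160 << 20 + x20 (shifting by 160 would overshoot) and the last step of the first sketch from x190, and counts operations from x10 onward. The names `ops`, `shlAdd`, `ones`, `firstChain`, and `editChain` are illustrative.

```go
package main

import (
	"fmt"
	"math/big"
)

// ops counts doublings and additions spent building a chain fragment.
type ops struct{ doubles, adds int }

// shlAdd returns x << s + y, recording s doublings and one addition.
func (o *ops) shlAdd(x *big.Int, s uint, y *big.Int) *big.Int {
	o.doubles += int(s)
	o.adds++
	r := new(big.Int).Lsh(x, s)
	return r.Add(r, y)
}

// ones returns 2^n - 1, i.e. xN in the notation above.
func ones(n uint) *big.Int {
	r := new(big.Int).Lsh(big.NewInt(1), n)
	return r.Sub(r, big.NewInt(1))
}

// firstChain follows the first sketch, with x180 = x160 << 20 + x20.
func firstChain() (*big.Int, ops) {
	var o ops
	x10 := ones(10)
	x20 := o.shlAdd(x10, 10, x10)
	x40 := o.shlAdd(x20, 20, x20)
	x80 := o.shlAdd(x40, 40, x40)
	x160 := o.shlAdd(x80, 80, x80)
	x180 := o.shlAdd(x160, 20, x20)
	x190 := o.shlAdd(x180, 10, x10)
	x194 := o.shlAdd(x190, 4, ones(4)) // + _1111
	return x194, o
}

// editChain follows the EDIT variant via x24 and x192.
func editChain() (*big.Int, ops) {
	var o ops
	x10 := ones(10)
	x20 := o.shlAdd(x10, 10, x10)
	x24 := o.shlAdd(x20, 4, ones(4)) // + _1111
	x48 := o.shlAdd(x24, 24, x24)
	x96 := o.shlAdd(x48, 48, x48)
	x192 := o.shlAdd(x96, 96, x96)
	x194 := o.shlAdd(x192, 2, ones(2)) // + _11
	return x194, o
}

func main() {
	want := ones(194)
	f, fo := firstChain()
	e, eo := editChain()
	fmt.Println(f.Cmp(want) == 0, e.Cmp(want) == 0) // true true
	fmt.Println(fo, eo)                            // {184 7} {184 6}
}
```

Both variants reach 2^194 - 1 with 184 doublings; the EDIT variant needs 6 additions instead of 7, matching the one-addition saving.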
i23 = i17 << 5 + i17
i34 = i23 << 10 + i23
i61 = (i34 << 4 + _11111000) << 21 + i34
i113 = (i61 << 3 + _1111100) << 47 + i61
It is confusing why we're adding even constants whose trailing zero bits contribute nothing; those zeros could simply be folded into the following shift. I wonder if this indicates a bug in the new windowing algorithm where it doesn't realize that trailing zeros are worthless.
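If the trailing zeros are indeed worthless, they can be moved into the following shift without changing the value or the number of doublings. A minimal Go sketch of that rewrite (`eval` and `stripTrailingZeros` are hypothetical helpers, not part of addchain): applied to the two flagged steps it yields (i34 << 1 + _11111) << 24 and (i61 << 1 + _11111) << 49, which need the odd constant _11111 instead of _11111000 and _1111100.

```go
package main

import (
	"fmt"
	"math/bits"
)

// eval computes (acc << pre + w) << post, the shape of the flagged steps.
func eval(acc uint64, pre, post uint, w uint64) uint64 {
	return (acc<<pre + w) << post
}

// stripTrailingZeros moves the trailing zero bits of the window constant w
// from the inner shift pre to the outer shift post. Since w = w' << t,
// (acc << pre + w) << post == (acc << (pre-t) + w') << (post+t),
// so the value and the total doubling count pre+post are unchanged,
// but the constant that has to be in the chain becomes odd.
func stripTrailingZeros(pre, post uint, w uint64) (uint, uint, uint64) {
	t := uint(bits.TrailingZeros64(w))
	if t > pre {
		t = pre // can only move as many zeros as acc was shifted by
	}
	return pre - t, post + t, w >> t
}

func main() {
	// i61 = (i34 << 4 + _11111000) << 21 + i34
	fmt.Println(stripTrailingZeros(4, 21, 0b11111000)) // 1 24 31
	// i113 = (i61 << 3 + _1111100) << 47 + i61
	fmt.Println(stripTrailingZeros(3, 47, 0b1111100)) // 1 49 31
	// Same value for a sample accumulator.
	fmt.Println(eval(5, 4, 21, 0b11111000) == eval(5, 1, 24, 0b11111)) // true
}
```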
As discussed in #56, I finally got around to submitting these improvements.