alg/dict: SlidingWindow: max zeroes and shortening #110

Open, wants to merge 1 commit into base: master

@Nik-U Nik-U commented Jun 8, 2021

As discussed in #56. I finally got around to submitting these improvements.

This change adds new variations of SlidingWindow and modifies
Hybrid to use an arbitrary Decomposer after runs are removed.
SlidingWindowRTL and SlidingWindowShortRTL construct the windows
from least to most significant bit (right-to-left) instead of the
default left-to-right approach. SlidingWindowShort and
SlidingWindowShortRTL incorporate a "shortening" heuristic that
sometimes cuts windows short. The Z parameter restricts the
maximum number of zeroes that may appear in a window. If a window
is maximum length, contains at least one zero, and the bit
following the window is a one, then the window is shortened in
order to yield all trailing ones to the next window.
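
For readers unfamiliar with the heuristic, the left-to-right case can be sketched roughly as follows. This is a minimal Python illustration, not the Go code in this PR: the function name `sliding_windows` and its parameters `k`, `max_zeros` and `shorten` are hypothetical, and the RTL variants are omitted.

```python
def sliding_windows(n, k, max_zeros=None, shorten=False):
    """Split n into odd windows of at most k bits, scanning from the
    most significant bit. Returns (value, shift) pairs such that
    sum(value << shift) == n. Illustrative only (hypothetical API)."""
    bits = bin(n)[2:]
    length = len(bits)
    windows = []
    i = 0
    while i < length:
        if bits[i] == "0":          # zeros between windows are skipped
            i += 1
            continue
        end = min(i + k, length)    # exclusive end of the candidate window
        if max_zeros is not None:
            # Stop the window before the zero that would exceed the cap.
            zeros = 0
            j = i
            while j < end:
                if bits[j] == "0":
                    zeros += 1
                    if zeros > max_zeros:
                        break
                j += 1
            end = j
        full = (end - i) == k
        has_zero = "0" in bits[i:end]
        if shorten and full and has_zero and end < length and bits[end] == "1":
            # Yield the trailing ones to the next window.
            while bits[end - 1] == "1":
                end -= 1
        # A window always ends on a one, so its value is odd.
        while bits[end - 1] == "0":
            end -= 1
        windows.append((int(bits[i:end], 2), length - end))
        i = end
    return windows
```

For example, with `k=4` the value `0b10111111` splits into `_1011` and `_1111` without shortening, and into `1`, `_1111` and `_11` with it: the trailing ones of the first window join the run that follows.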

This new behavior was inspired by the windowing technique used for
the upper half of smooth isogeny primes in [isogenychains]. This
update also adds p512-2 from [isogenychains] to the result set.

Improvements with the new Ensemble:

  • p256_scalar improved from +2 to -1
  • p384_scalar improved from +1 to +0
  • isop512_field (new) is -3
  • p2519_field improved from 263 to 261

Notably, the isop512_field results are better than [isogenychains]
when using their weighting metric (square = 0.8 * multiply).
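
As a small illustration of that metric, the weighted cost of a chain can be computed as below. The helper name `weighted_cost` is hypothetical, and the counts in the usage note are only sample inputs, not the isop512_field figures.

```python
def weighted_cost(doublings, additions, square_weight=0.8):
    """Cost of a chain in multiplication units, assuming each doubling
    (a squaring in the field) costs square_weight multiplications and
    each addition costs one multiplication."""
    return square_weight * doublings + additions
```

With 253 doublings and 36 additions, for instance, `weighted_cost(253, 36)` is roughly 238.4 multiplication units, compared with 289 unweighted operations.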

@mmcloughlin
Owner

Really sorry I missed this! I'll need to take some time to go through this in more detail, but based on a quick skim it's looking great, and the results are exciting too, so I expect we can land this soon :)

@briansmith briansmith left a comment

With these tweaks, the addition chain would be 289 = 253 doubles + 36 additions, saving one doubling and two additions compared to my previous result.

Analogous comments apply to the other chains.

This is very nice work!

i286 = ((i257 << 7 + _111111) << 10 + _1100011) << 10
return (_10010101 + i286) << 6 + _1111
_111 = _10 + _101
_1000 = 1 + _111

There should be an optimization pass that replaces additions that yield an even number with doublings. This should become _1000 = 2 * _100 to replace one addition with a doubling.
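
A minimal sketch of such a pass, in Python rather than the project's Go, might look like the following. The chain representation (a list of `(value, a, b)` triples with `value == a + b`) and the name `prefer_doublings` are assumptions made for illustration; the pass only rewrites a step when half of the result is already in the chain, and it does not remove operands that become unused afterwards.

```python
def prefer_doublings(chain):
    """Rewrite additions with an even result as doublings when half of
    the result has already been computed. The chain implicitly starts
    from 1. Hypothetical representation, for illustration only."""
    seen = {1}
    out = []
    for value, a, b in chain:
        if value != a + b or a not in seen or b not in seen:
            raise ValueError("invalid chain step: %r" % ((value, a, b),))
        half = value // 2
        if a != b and value % 2 == 0 and half in seen:
            a = b = half            # value = 2 * half
        out.append((value, a, b))
        seen.add(value)
    return out
```

Applied to a chain containing `_100`, `_101` and `_111`, the step `_1000 = 1 + _111` becomes `_1000 = 2*_100`.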

_111 = _10 + _101
_1000 = 1 + _111
_1110 = 2*_111
_10000 = _10 + _1110

Similarly, this can be replaced with a doubling.

x16 = _11111111 + i28
i37 = i28 << 8
x24 = x16 + i37
x32 = i37 << 8 + x24

It is strange that this changed from x32 = x16 << 16 + x16 to instead compute x24 unnecessarily (IIUC). If we fixed this then the addition chain would be two shorter than my previously-published one.
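
The two routes can be compared with a quick Python check. This sketch assumes that `i28` is `x8 << 8`, which the snippet suggests but the visible lines do not show; `ones` is a hypothetical helper.

```python
def ones(k):
    """Integer whose binary form is k consecutive ones."""
    return (1 << k) - 1

x8 = ones(8)            # _11111111
i28 = x8 << 8           # assumption: not visible in the snippet above
x16 = x8 + i28

# Route in this diff: 16 doublings and 2 additions after x16.
i37 = i28 << 8
x24 = x16 + i37
x32_via_x24 = (i37 << 8) + x24

# Previous route: 16 doublings and 1 addition after x16.
x32_direct = (x16 << 16) + x16
```

Both routes produce the same `x32`; within this fragment the direct route needs one fewer addition.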

i190 = ((i169 << 4 + _101) << 8 + _1011011) << 7
i210 = ((_100111 + i190) << 9 + _101111) << 8 + _101111
i229 = ((_1110 + i210) << 11 + _1001111) << 5 + _111
i249 = (i229 << 9 + _11011111 + _1000) << 8 + _101011

I am not sure why this addition of _11100111 = _11011111 + _1000 gets inlined here whereas none of the others seem to. It makes it harder to follow. Regardless, computing _11100111 doesn't help; I avoided computing it and then redid the middle of this chain with the remaining windows, which saved an additional addition.

_10111 = _1000 + _1111
_11001 = _10 + _10111
_11011 = _10 + _11001
_11111 = _1000 + _10111

@briansmith briansmith Oct 10, 2024

Assuming it makes sense to calculate 11111:

At this point you have 11111 and 111, so you can compute x8 = 11111 << 3 + 111, or x10 = 11111 << 5 + 11111.

We need 6 * 8 * 4 + 2 = 192 + 2 = 194 consecutive ones to start.

x20 = x10 << 10 + x10
x40 = x20 << 20 + x20
x80 = x40 << 40 + x40
x160 = x80 << 80 + x80
x180 = x160 << 20 + x20
x190 = x180 << 10 + x10
x194 = x190 << 4 + _1111

EDIT: This would save one addition:

x20 = x10 << 10 + x10
x24 = x20 << 4 + _1111
x48 = x24 << 24 + x24
x96 = x48 << 48 + x48
x192 = x96 << 96 + x96

To get to x194 we could do:

x194 = x192 << 2 + _11

But as these two bits are in the "random" part of the addition chain, doing something else is likely better.
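
The operation counts for the two ways of building the run can be checked with a short Python sketch. It takes the intended steps of the first chain to be `x180 = x160 << 20 + x20` and `x194 = x190 << 4 + _1111`, and assumes `x10`, `_1111` and `_11` are already available; the helper names are hypothetical.

```python
def ones(k):
    """Integer whose binary form is k consecutive ones."""
    return (1 << k) - 1

def build_run(steps, available):
    """Each step (a, b) computes x_{a+b} = (x_a << b) + x_b.
    Returns (final run length, doublings, additions)."""
    known = set(available)
    doublings = additions = 0
    last = None
    for a, b in steps:
        if a not in known or b not in known:
            raise ValueError("run of length %d or %d not available" % (a, b))
        assert (ones(a) << b) + ones(b) == ones(a + b)
        known.add(a + b)
        doublings += b              # shifting by b costs b doublings
        additions += 1
        last = a + b
    return last, doublings, additions

available = {2, 4, 10}              # _11, _1111 and x10
first = [(10, 10), (20, 20), (40, 40), (80, 80), (160, 20), (180, 10), (190, 4)]
second = [(10, 10), (20, 4), (24, 24), (48, 48), (96, 96), (192, 2)]
```

Under these assumptions both chains reach x194 with 184 doublings; the first uses 7 additions and the second 6, which matches the saving of one addition.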

i23 = i17 << 5 + i17
i34 = i23 << 10 + i23
i61 = (i34 << 4 + _11111000) << 21 + i34
i113 = (i61 << 3 + _1111100) << 47 + i61

It is confusing that we're adding even numbers whose least significant bits don't contribute anything. I wonder if this indicates a bug in the new windowing algorithm where it doesn't realize that trailing zeros are worthless.
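
To make the point concrete, here is a small Python sketch that moves the trailing zeros of an added term into the shift that follows it. The name `normalize_step` and the `(shift_before, term, shift_after)` encoding of a step `((x << shift_before) + term) << shift_after` are assumptions for illustration, not part of addchain.

```python
def normalize_step(shift_before, term, shift_after):
    """Return an equivalent step whose added term is odd.
    ((x << s) + t) << r  ==  ((x << (s - z)) + (t >> z)) << (r + z)
    where z is the number of trailing zero bits of t."""
    if term <= 0:
        raise ValueError("term must be positive")
    z = (term & -term).bit_length() - 1     # trailing zero count
    if z > shift_before:
        raise ValueError("term has more trailing zeros than the shift")
    return shift_before - z, term >> z, shift_after + z

def apply_step(x, step):
    shift_before, term, shift_after = step
    return ((x << shift_before) + term) << shift_after
```

For the step `(i34 << 4 + _11111000) << 21` this gives `(i34 << 1 + _11111) << 24`: the same value and the same number of doublings, but with the odd term `_11111`.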
