Add a fast integer divide that rounds to zero #6455
Expr xsign = select(numerator > 0, cast(t, 0), cast(t, -1));

// Multiply-keep-high-half
result = (cast(wide, mul) * numerator);
I think this should use widening_mul intrinsics, because uses of this are after find_intrinsics. Maybe this whole sequence should be mul_shift_right.
Actually this code is only called directly by users, so it's before find_intrinsics. The compiler doesn't ever call this.
Maybe add this as a comment for future readers.
I actually think we should change it to intrinsics anyway. But since the code is just moved and pre-existing, maybe that should be a separate PR.
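For readers unfamiliar with the "Multiply-keep-high-half" step discussed in this thread, here is a minimal scalar sketch of the arithmetic for 16-bit unsigned values. It is plain C++, not Halide's widening_mul or mul_shift_right intrinsics, and makes no claim about those intrinsics' exact semantics (for example saturation). The helper names and the divide-by-3 constants are illustrative assumptions.

```cpp
#include <cstdint>

// Multiply-keep-high-half: widen both operands to 32 bits so the product
// cannot overflow, multiply, then keep only the upper 16 bits.
// (Multiplying two uint16_t values directly would promote to signed int,
// which can overflow for large inputs, hence the explicit casts.)
uint16_t mul_keep_high_half(uint16_t a, uint16_t b) {
    uint32_t wide = static_cast<uint32_t>(a) * static_cast<uint32_t>(b);
    return static_cast<uint16_t>(wide >> 16);
}

// The same idea with an extra post-shift, as used when dividing by a
// constant: computes floor(a * m / 2^(16 + shift)).
uint16_t mul_shift_high(uint16_t a, uint16_t m, int shift) {
    uint32_t wide = static_cast<uint32_t>(a) * static_cast<uint32_t>(m);
    return static_cast<uint16_t>(wide >> (16 + shift));
}
```

For example, `mul_shift_high(n, 0xAAAB, 1)` equals `n / 3` for every 16-bit unsigned `n`: since `3 * 0xAAAB = 2^17 + 1`, the multiplier overshoots `2^17 / 3` by so little that the error never reaches the next integer.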
test/performance/const_division.cpp
// Reference good version
g(x, y) = input(x, y) / cast<T>(y + min_val);
// Reference good version
This looks identical to the case just above. Are they supposed to be identical?
Yes, they have different schedules which turn the denominator into a constant in one case but not the other.
(I'll add a comment)
tools/find_inverse.cpp
bool srz_method_0(int den, int sh_post, int bits) {
    int64_t min = -(1L << (bits - 1)), max = (1L << (bits - 1)) - 1;
    for (int64_t num = min; num <= max; num++) {
        // for (int iter = 0; iter < 1000000L; iter++) {
Why is this commented out? If it's being left in for (eg) debugging purposes, please say so.
Fixed (deleted)
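In the spirit of srz_method_0's loop over every numerator of a given bit width, here is a minimal sketch of an exhaustive check for a round-to-zero divide by a positive constant. It uses the classic Granlund-Montgomery multiply-by-reciprocal technique on the numerator's magnitude and then reapplies the sign. This is not necessarily the method this PR uses. The helpers `Magic`, `magic_for`, `div_round_to_zero` and `check_all_numerators` are hypothetical, and the sketch assumes 16-bit signed numerators and a denominator in [1, 32767].

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical multiplier/shift pair for dividing a magnitude 0 <= a < 2^16
// by den. With l = ceil(log2(den)) and m = ceil(2^(16 + l) / den), we get
// den * m <= 2^(16 + l) + 2^l, which guarantees
// floor(a * m / 2^(16 + l)) == a / den for every a < 2^16.
struct Magic {
    uint64_t m;
    int shift;  // total right shift, 16 + l
};

Magic magic_for(int den) {
    int l = 0;
    while ((1 << l) < den) l++;  // l = ceil(log2(den)); den == 1 gives l == 0
    uint64_t pow2 = uint64_t(1) << (16 + l);
    return Magic{(pow2 + den - 1) / den, 16 + l};
}

// Round-to-zero division of a signed 16-bit numerator: divide |num| with
// multiply-and-shift, then negate the quotient if num was negative.
int32_t div_round_to_zero(int16_t num, const Magic &mg) {
    uint64_t a = uint64_t(std::abs(int32_t(num)));  // |num| <= 32768
    int32_t q = int32_t((a * mg.m) >> mg.shift);     // product < 2^33, fits
    return num < 0 ? -q : q;
}

// Compare against C++ '/', which truncates toward zero, for every
// 16-bit numerator.
bool check_all_numerators(int den) {
    Magic mg = magic_for(den);
    for (int32_t num = INT16_MIN; num <= INT16_MAX; num++) {
        if (div_round_to_zero(int16_t(num), mg) != num / den) return false;
    }
    return true;
}
```

An exhaustive loop is cheap at 16 bits (65,536 numerators per denominator), which is why brute-force verification like this is practical for narrow types.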
See also related issue #6456
review ping
LGTM
Looks like there's a bug in the handling of constant denominators (an early-out path that assumes we're rounding to -infinity). Will fix.
See #7008
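To illustrate why an early-out path that assumes rounding toward negative infinity gives wrong answers here, this minimal sketch compares the two rounding modes for a positive denominator. For such denominators they differ by exactly one when the numerator is negative and not an exact multiple. Both helpers are hypothetical illustrations, not Halide code.

```cpp
#include <cstdint>

// Round toward negative infinity (den > 0 assumed). C++ '/' truncates
// toward zero, so step down by one for negative inexact quotients.
int32_t div_round_down(int32_t num, int32_t den) {
    int32_t q = num / den;
    if (num % den != 0 && num < 0) q -= 1;
    return q;
}

// Convert a round-down quotient into a round-to-zero quotient (den > 0).
// The remainder is computed in 64 bits because q_floor * den can fall
// below INT32_MIN when num is close to INT32_MIN.
int32_t floor_to_trunc(int32_t q_floor, int32_t num, int32_t den) {
    int64_t rem = int64_t(num) - int64_t(q_floor) * den;  // 0 <= rem < den
    return (num < 0 && rem != 0) ? q_floor + 1 : q_floor;
}
```

For example, -7 / 2 rounds down to -4 but rounds toward zero to -3, while -8 / 2 gives -4 in both modes.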
While working on legacy code, I discovered a need for this. A performance test shows a good speed-up over native division for vector code: