-
Notifications
You must be signed in to change notification settings - Fork 758
Updated thrust shuffle to use improved bijective function #1566
Conversation
Can one of the admins verify this patch? |
This merge includes the latest work from https://github.com/djns99/CUDA-Shuffle which integrates the Variable Philox function which is faster than the previous round function used, while maintaining equivalent quality. It also adds a new test statistic that targets larger permutation sizes than the existing tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this. Are you also look to to figure out why the docs aren't rendering and run the thrust benchmarks to establish the improved performance?
I will take a look at these tests again.
Ping me when you're ready for us to review/merge -- I'll let you two iterate on this until you're happy with it.
The docs are broken until #1475 is finished -- this isn't an issue with your contributions.
We'll be removing Just a heads up -- feel free to keep using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of notes from a quick scan through the patch, I'll do another more detailed pass once it settles.
I'm interested in seeing the perf results!
Running the benchmarks I found ~10x performance gain on my 1050Ti setup from ~6E+07 element/s to ~7.5E+08 albeit with more variance Old:
New:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @allisonvacanti
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the constructors/c-casts need to be replaced with static casts, otherwise this LGTM.
Once the suggested changes are finished I'll start testing.
Updates the thrust shuffle to use the Variable Philox bijective function with 24 rounds. Updates the test suite to include new test statistic based on maximum mean discrepency to enable more thorough testing of larger permutations.
@allisonvacanti Sorry this notification slipped through the cracks it would seem. I have fixed up those changes now |
Thanks @djns99 -- this LGTM, I'll start testing. run tests |
DVS CL: 30868010 |
Tests are failing, looks like some of the restrictions on c++11 constexpr functions are being broken -- C++11 only allows a single return statement in the body of a constexpr function, and they must return non-void. Check out the failing results (click "Details" next to a failing build, then "View as plain text" in the sidebar). You can ignore the failures in |
I removed constexpr from the functions that were failing, is there a flag I need to supply when I run this locally to verify it is fixed? I didn't have any compilation issues on my machine before or after the change. |
The issue is only on C++11 (which is deprecated) and by default CMake only compiles for C++14. To build C++11 tests: For the default (single config) builds, set the CMake variable
or for multiconfig builds (which build multiple host/device/dialect tests at once):
See https://github.com/NVIDIA/thrust/blob/main/CONTRIBUTING.md#cmake-options for more info. |
DVS CL 30909633 run tests |
Looks good -- remaining test failures are unrelated. |
Updates the thrust shuffle to use the Variable Philox bijective function
with 24 rounds.
Updates the test suite to include new test statistic based on maximum mean
discrepency to enable more thorough testing of larger permutations.