Fast bit unpacking #2217
DWIO_ENSURE((numValues & 0x7) == 0);
DWIO_ENSURE(inputBufferLen * 8 >= bitWidth * numValues);

VELOX_NYI();
Maybe have some fallback implementation here (does not need to be optimized) so that we can start using them in the codebase
Added the missing functions for bitWidth 17 to 32.
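For illustration, an unoptimized fallback of the kind suggested above could look like the sketch below. This is not the code from this PR: the function name `unpackNaive`, its signature, and the bit layout (values packed back to back, least-significant bit first, as in Parquet-style bit packing) are assumptions made for the example. It mirrors the two checks in the snippet above, but throws standard exceptions instead of using DWIO_ENSURE so that it compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical scalar fallback, not the PR's implementation.
// Assumes values are packed contiguously, least-significant bit first.
inline std::vector<uint32_t> unpackNaive(
    const uint8_t* input,
    size_t inputBufferLen,
    uint8_t bitWidth,
    size_t numValues) {
  if (bitWidth == 0 || bitWidth > 32) {
    throw std::invalid_argument("bitWidth must be in [1, 32]");
  }
  // Same preconditions as the checks in the reviewed snippet.
  if ((numValues & 0x7) != 0) {
    throw std::invalid_argument("numValues must be a multiple of 8");
  }
  if (inputBufferLen * 8 < static_cast<size_t>(bitWidth) * numValues) {
    throw std::invalid_argument("input buffer too small");
  }

  std::vector<uint32_t> out(numValues);
  const uint64_t mask = (uint64_t{1} << bitWidth) - 1;
  for (size_t i = 0; i < numValues; ++i) {
    const size_t bitPos = i * bitWidth;
    const size_t byteIdx = bitPos >> 3;
    const unsigned shift = static_cast<unsigned>(bitPos & 7);
    // A value spans at most 7 + 32 = 39 bits, so 5 bytes always suffice.
    uint64_t word = 0;
    for (size_t b = 0; b < 5 && byteIdx + b < inputBufferLen; ++b) {
      word |= static_cast<uint64_t>(input[byteIdx + b]) << (8 * b);
    }
    out[i] = static_cast<uint32_t>((word >> shift) & mask);
  }
  return out;
}
```

Under the assumed layout, the three bytes 0x88, 0xC6, 0xFA unpacked with bitWidth 3 and numValues 8 yield the values 0 through 7. A loop like this processes one value at a time and is only meant as a correctness baseline for bit widths that do not yet have an optimized path.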
Fast bit unpacking shows up to 9x improvement over Arrow BitUnpacker, up to 30x over DuckDB, up to 10x over Lemire's FastPForLib, up to 1.6x over Lemire's bmipacking, and 2x over the rewritten decode1To24 in velox/dwio/common/IntDecoder.cpp. The design doc is at facebookincubator#2353
@Yuhta Jimmy, I've addressed all comments. Will you be able to take another look? Thank you.
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@Yuhta Jimmy, thanks for importing this PR. Could you tell me what the failures are on the Facebook internal tests?
Closing in favor of #3000
This PR implements fast bit unpacking for continuous bits as described in the design doc.
The following is the result of BitUnpackingBenchmark, with times in microseconds (us). The "IntDecoder" implementation is a rewrite of the same algorithm from the design doc for uint32_t, and we are going to replace it with the implementation in this PR.