[WIP] Add RleDecoder::next() using fast bit unpacking #2847

Closed
yingsu00 wants to merge 2 commits

yingsu00
Contributor

Do not review yet

Fast bit unpacking shows up to 9x improvement over Arrow BitUnpacker,
up to 30x over DuckDB, up to 10x over Lemire's FastPForLib, up to 1.6x
over Lemire's bmipacking, and 2x over the rewritten version decode1To24
in velox/dwio/common/IntDecoder.cpp. The design doc is at
facebookincubator#2353.
This commit prepares for future changes.
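
For readers unfamiliar with the term, "bit unpacking" here means expanding values stored at a fixed bit width into ordinary integers. The following is a minimal scalar sketch of that operation, assuming the LSB-first packing order used by Parquet. It is not the fast implementation proposed in this PR, and `unpackBits` is a hypothetical name rather than a Velox or Arrow API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Velox): unpacks up to `count` values of
// `bitWidth` bits each (1 <= bitWidth <= 32) from `input`, assuming values
// are packed LSB-first across little-endian bytes.
std::vector<uint32_t> unpackBits(
    const uint8_t* input,
    size_t inputBytes,
    uint32_t bitWidth,
    size_t count) {
  std::vector<uint32_t> out;
  out.reserve(count);
  const uint64_t mask =
      (bitWidth >= 32) ? 0xFFFFFFFFull : ((1ull << bitWidth) - 1);
  uint64_t buffer = 0;       // bits read from input but not yet consumed
  uint32_t bitsInBuffer = 0; // number of valid bits in `buffer`
  size_t pos = 0;            // next input byte to read
  for (size_t i = 0; i < count; ++i) {
    // Refill one byte at a time until a full value is available.
    while (bitsInBuffer < bitWidth && pos < inputBytes) {
      buffer |= static_cast<uint64_t>(input[pos++]) << bitsInBuffer;
      bitsInBuffer += 8;
    }
    if (bitsInBuffer < bitWidth) {
      break; // input ended before `count` values were decoded
    }
    out.push_back(static_cast<uint32_t>(buffer & mask));
    buffer >>= bitWidth;
    bitsInBuffer -= bitWidth;
  }
  return out;
}
```

With the three bytes `0x88 0xC6 0xFA` and a bit width of 3, this returns the values 0 through 7. A byte-at-a-time loop like this is the kind of baseline that wider or vectorized unpacking is meant to beat.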
@netlify

netlify bot commented Oct 14, 2022

Deploy Preview for meta-velox ready!

Name Link
🔨 Latest commit 0f0a7aa
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/63493ddf54bbee00085e3e6f
😎 Deploy Preview https://deploy-preview-2847--meta-velox.netlify.app

@facebook-github-bot added the CLA Signed label Oct 14, 2022
@mbasmanova
Contributor

Curious: how much of the total query CPU time is spent in this function? How much faster do we expect the query to become after the optimization?

@yingsu00
Contributor Author

@mbasmanova This function will be used to decode Parquet repetition and definition levels. It will eventually replace the legacy ones used elsewhere too, but the gain still needs to be measured; that will be done after the ParquetReaderBenchmark is finished. Since Parquet files are dictionary encoded whenever possible, improving dictionary ID decoding (which is RLE/bit-packed) should benefit scan-heavy queries overall. I can update you after the code is switched over. Also, this is not the only optimization for dictionaries: the RleBpDecoder will be extended to support filters on encoded data in the future.
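
To make the "RLE/bit-packed" format mentioned above concrete, here is a minimal sketch of a decoder for Parquet's RLE/bit-packing hybrid encoding as described in the Parquet format specification. It is a plain scalar illustration, not the RleBpDecoder from this PR or any Velox code, and the function name `decodeRleBpHybrid` is hypothetical. It assumes the caller has already stripped any length prefix and knows the bit width.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Velox API): decodes up to `numValues` values
// from a Parquet RLE/bit-packing hybrid stream with the given bit width.
std::vector<uint32_t> decodeRleBpHybrid(
    const std::vector<uint8_t>& data,
    uint32_t bitWidth,
    size_t numValues) {
  std::vector<uint32_t> out;
  const uint64_t mask =
      (bitWidth >= 32) ? 0xFFFFFFFFull : ((1ull << bitWidth) - 1);
  size_t pos = 0;
  while (out.size() < numValues && pos < data.size()) {
    // Each run starts with a ULEB128 varint header.
    uint64_t header = 0;
    uint32_t shift = 0;
    while (pos < data.size() && shift < 64) {
      const uint8_t byte = data[pos++];
      header |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) {
        break;
      }
      shift += 7;
    }
    if (header & 1) {
      // Bit-packed run: (header >> 1) groups of 8 values, packed LSB-first.
      const size_t count = static_cast<size_t>(header >> 1) * 8;
      uint64_t buffer = 0;
      uint32_t bits = 0;
      for (size_t i = 0; i < count; ++i) {
        while (bits < bitWidth && pos < data.size()) {
          buffer |= static_cast<uint64_t>(data[pos++]) << bits;
          bits += 8;
        }
        if (bits < bitWidth) {
          return out; // truncated input
        }
        // Values past numValues are padding; consume them but do not emit.
        if (out.size() < numValues) {
          out.push_back(static_cast<uint32_t>(buffer & mask));
        }
        buffer >>= bitWidth;
        bits -= bitWidth;
      }
    } else {
      // RLE run: (header >> 1) repeats of one value stored in
      // ceil(bitWidth / 8) little-endian bytes.
      const size_t count = static_cast<size_t>(header >> 1);
      const size_t valueBytes = (bitWidth + 7) / 8;
      uint32_t value = 0;
      for (size_t b = 0; b < valueBytes && pos < data.size(); ++b) {
        value |= static_cast<uint32_t>(data[pos++]) << (8 * b);
      }
      for (size_t i = 0; i < count && out.size() < numValues; ++i) {
        out.push_back(value);
      }
    }
  }
  return out;
}
```

For example, the bytes `0x0A 0x03 0x03 0x88 0xC6 0xFA` with bit width 3 decode to five copies of 3 (an RLE run) followed by 0 through 7 (one bit-packed group). Repetition levels, definition levels, and dictionary IDs all use this encoding in Parquet, which is why a faster inner unpacking loop can matter for scans.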

@yingsu00
Contributor Author

Closing in favor of #3000.

@yingsu00 yingsu00 closed this Oct 28, 2022