Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast bit unpacking #2217

Closed
wants to merge 1 commit into from
Closed

Conversation

yingsu00
Copy link
Collaborator

@yingsu00 yingsu00 commented Aug 5, 2022

This PR implements fast bit unpacking for continuous bits as described in Design doc

The following was the benchmark result of BitUnpackingBenchmark. Time unit is us. The "IntDecoder" implementation was a rewrite of the same algorithm for uint32_t in the design doc, and we are going to replace it with the implementation in this PR.

Type Width Input Bit Width arrow duckdb lemire_bmi2 lemire_fastpforlib intdecoder velox
8 1 3,090 12,300   4,980   336
8 2 3,060 13,740   4,560   395
8 3 3,170 14,940   5,330   418
8 4 2,970 16,520   4,030   421
8 5 3,530 18,080   5,520   445
8 6 3,510 19,630   5,130   473
8 7 3,630 21,180   6,410   498
8 8 3,220 22,790   3,060   530
16 1 3,090 12,010   2,590   643
16 2 3,060 13,200   2,610   676
16 3 3,180 14,420   3,080   746
16 4 2,970 15,730   2,630   769
16 5 3,530 17,220   3,370   839
16 6 3,510 18,730   6,870   895
16 7 3,630 20,200   3,670   1,070
16 8 3,220 21,800   2,750   910
16 9 3,900 23,310   3,970   1,380
16 10 3,940 25,010   4,110   2,020
16 11 4,100 26,700   4,390   2,120
16 12 4,020 28,350   4,170   2,150
16 13 4,280 37,070   4,730   2,260
16 14 4,330 31,600   4,760   2,560
16 15 4,600 39,370   5,050   2,450
16 16 3,660 35,280   3,100   2,250
32 1 3,090 11,990 1,590 2,880 3,370 1,630
32 2 3,060 13,310 1,600 2,900 3,440 1,690
32 3 3,180 14,440 1,670 3,110 3,480 1,710
32 4 2,970 15,500 1,710 2,960 3,470 1,770
32 5 3,500 16,810 1,740 3,470 3,560 1,800
32 6 3,510 18,140 2,520 3,490 3,630 1,810
32 7 3,580 19,350 2,580 4,060 3,660 1,890
32 8 3,220 20,610 2,720 3,050 3,340 1,940
32 9 3,900 30,140 2,800 4,360 4,100 2,010
32 10 3,950 37,120 2,700 4,370 4,170 2,060
32 11 4,100 44,600 2,730 4,680 4,250 2,170
32 12 4,000 52,790 2,770 4,380 4,250 2,190
32 13 4,290 59,120 3,100 4,840 4,310 2,290
32 14 4,330 67,310 3,340 4,350 4,400 2,620
32 15 4,590 76,580 3,830 4,490 4,620 2,500
32 16 3,550 85,320 3,990 3,390 3,910 2,230
32 17 4,760 82,090 4,140 4,680 4,630 2,660
32 18 4,770 82,700 4,180 4,950 4,640 2,700
32 19 4,920 82,480 4,290 4,430 4,710 2,800
32 20 4,790 84,620 4,280 4,310 4,730 2,760
32 21 5,100 85,790 4,410 4,910 4,770 2,860
32 22 5,120 88,560 4,420 4,620 4,790 2,940
32 23 5,300 88,570 4,550 5,050 4,830 3,050
32 24 5,000 87,550 4,540 4,450 4,580 2,780
32 25 5,570 91,970 4,720 5,890   2,030
32 26 5,640 93,330 4,710 5,340   2,070
32 27 5,750 94,920 4,830 6,090   2,220
32 28 5,700 99,320 4,860 5,470   2,240
32 29 5,990 99,750 3,090 6,180   2,310
32 30 6,200 98,470 4,990 5,790   2,630
32 31 6,340 99,140 5,090 6,000   2,540
32 32 4,170 104,870 4,320 3,500   2,330

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 5, 2022
@yingsu00 yingsu00 force-pushed the bitunpacking branch 7 times, most recently from 9120893 to 59c9fbf Compare August 18, 2022 09:16
@yingsu00 yingsu00 changed the title [WIP] Bitunpacking Fast bit unpacking Aug 18, 2022
@yingsu00
Copy link
Collaborator Author

cc @Yuhta @oerling

@yingsu00
Copy link
Collaborator Author

@Yuhta @oerling @zzhao0 Can you please review this?

@netlify
Copy link

netlify bot commented Aug 23, 2022

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit b854896
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/630d43cf239896000863454d

velox/dwio/common/tests/CMakeLists.txt Outdated Show resolved Hide resolved
DWIO_ENSURE((numValues & 0x7) == 0);
DWIO_ENSURE(inputBufferLen * 8 >= bitWidth * numValues);

VELOX_NYI();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe have some fallback implementation here (does not need to be optimized) so that we can start using them in the codebase

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the missing functions for bitWidth 17 to 32.

@yingsu00 yingsu00 force-pushed the bitunpacking branch 2 times, most recently from 09110a5 to e0d664f Compare August 26, 2022 04:26
Fast bit unpacking shows up to 9x improvement over Arrow BitUnpacker,
up to 30x over DuckDB, up to 10x over Lemire's FastPForLib, up to 1.6x
over Lemire's bmipacking, and 2x over the rewritten version decode1To24
in velox/dwio/common/IntDecoder.cpp. Design doc is at
facebookincubator#2353
@yingsu00
Copy link
Collaborator Author

@Yuhta Jimmy I've addressed all comments, will you be able to take another look? Thank you.

@facebook-github-bot
Copy link
Contributor

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@yingsu00
Copy link
Collaborator Author

yingsu00 commented Sep 2, 2022

@Yuhta Jimmy thanks for importing this PR. Could you tell me what the failures are on the facebook internal tests?

@yingsu00
Copy link
Collaborator Author

Closing in favor of #3000

@yingsu00 yingsu00 closed this Oct 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants