Fast bit unpacking #2217

yingsu00 · 2022-08-05T11:22:10Z

This PR implements fast bit unpacking for continuous bits as described in Design doc

The following was the benchmark result of BitUnpackingBenchmark. Time unit is us. The "IntDecoder" implementation was a rewrite of the same algorithm for uint32_t in the design doc, and we are going to replace it with the implementation in this PR.

Type Width	Input Bit Width	arrow	duckdb	lemire_bmi2	lemire_fastpforlib	intdecoder	velox
8	1	3,090	12,300		4,980		336
8	2	3,060	13,740		4,560		395
8	3	3,170	14,940		5,330		418
8	4	2,970	16,520		4,030		421
8	5	3,530	18,080		5,520		445
8	6	3,510	19,630		5,130		473
8	7	3,630	21,180		6,410		498
8	8	3,220	22,790		3,060		530
16	1	3,090	12,010		2,590		643
16	2	3,060	13,200		2,610		676
16	3	3,180	14,420		3,080		746
16	4	2,970	15,730		2,630		769
16	5	3,530	17,220		3,370		839
16	6	3,510	18,730		6,870		895
16	7	3,630	20,200		3,670		1,070
16	8	3,220	21,800		2,750		910
16	9	3,900	23,310		3,970		1,380
16	10	3,940	25,010		4,110		2,020
16	11	4,100	26,700		4,390		2,120
16	12	4,020	28,350		4,170		2,150
16	13	4,280	37,070		4,730		2,260
16	14	4,330	31,600		4,760		2,560
16	15	4,600	39,370		5,050		2,450
16	16	3,660	35,280		3,100		2,250
32	1	3,090	11,990	1,590	2,880	3,370	1,630
32	2	3,060	13,310	1,600	2,900	3,440	1,690
32	3	3,180	14,440	1,670	3,110	3,480	1,710
32	4	2,970	15,500	1,710	2,960	3,470	1,770
32	5	3,500	16,810	1,740	3,470	3,560	1,800
32	6	3,510	18,140	2,520	3,490	3,630	1,810
32	7	3,580	19,350	2,580	4,060	3,660	1,890
32	8	3,220	20,610	2,720	3,050	3,340	1,940
32	9	3,900	30,140	2,800	4,360	4,100	2,010
32	10	3,950	37,120	2,700	4,370	4,170	2,060
32	11	4,100	44,600	2,730	4,680	4,250	2,170
32	12	4,000	52,790	2,770	4,380	4,250	2,190
32	13	4,290	59,120	3,100	4,840	4,310	2,290
32	14	4,330	67,310	3,340	4,350	4,400	2,620
32	15	4,590	76,580	3,830	4,490	4,620	2,500
32	16	3,550	85,320	3,990	3,390	3,910	2,230
32	17	4,760	82,090	4,140	4,680	4,630	2,660
32	18	4,770	82,700	4,180	4,950	4,640	2,700
32	19	4,920	82,480	4,290	4,430	4,710	2,800
32	20	4,790	84,620	4,280	4,310	4,730	2,760
32	21	5,100	85,790	4,410	4,910	4,770	2,860
32	22	5,120	88,560	4,420	4,620	4,790	2,940
32	23	5,300	88,570	4,550	5,050	4,830	3,050
32	24	5,000	87,550	4,540	4,450	4,580	2,780
32	25	5,570	91,970	4,720	5,890		2,030
32	26	5,640	93,330	4,710	5,340		2,070
32	27	5,750	94,920	4,830	6,090		2,220
32	28	5,700	99,320	4,860	5,470		2,240
32	29	5,990	99,750	3,090	6,180		2,310
32	30	6,200	98,470	4,990	5,790		2,630
32	31	6,340	99,140	5,090	6,000		2,540
32	32	4,170	104,870	4,320	3,500		2,330

yingsu00 · 2022-08-18T23:54:05Z

cc @Yuhta @oerling

yingsu00 · 2022-08-20T11:38:07Z

@Yuhta @oerling @zzhao0 Can you please review this?

netlify · 2022-08-23T00:22:19Z

✅ Deploy Preview for meta-velox canceled.

Name	Link
🔨 Latest commit	`b854896`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/630d43cf239896000863454d

velox/dwio/common/tests/CMakeLists.txt

Yuhta · 2022-08-23T16:02:33Z

velox/dwio/common/BitUnpacking.h

+  DWIO_ENSURE((numValues & 0x7) == 0);
+  DWIO_ENSURE(inputBufferLen * 8 >= bitWidth * numValues);
+
+  VELOX_NYI();


Maybe have some fallback implementation here (does not need to be optimized) so that we can start using them in the codebase

Added the missing functions for bitWidth 17 to 32.

velox/dwio/common/BitUnpacking.h

Fast bit unpacking shows up to 9x improvement over Arrow BitUnpacker, up to 30x over DuckDB, up to 10x over Lemire's FastPForLib, up to 1.6x over Lemire's bmipacking, and 2x over the rewritten version decode1To24 in velox/dwio/common/IntDecoder.cpp. Design doc is at facebookincubator#2353

yingsu00 · 2022-08-29T22:56:48Z

@Yuhta Jimmy I've addressed all comments, will you be able to take another look? Thank you.

facebook-github-bot · 2022-08-30T22:21:37Z

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

yingsu00 · 2022-09-02T20:04:46Z

@Yuhta Jimmy thanks for importing this PR. Could you tell me what the failures are on the facebook internal tests?

yingsu00 · 2022-10-28T06:41:46Z

Closing in favor of #3000

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 5, 2022

yingsu00 force-pushed the bitunpacking branch 7 times, most recently from 9120893 to 59c9fbf Compare August 18, 2022 09:16

yingsu00 changed the title ~~[WIP] Bitunpacking~~ Fast bit unpacking Aug 18, 2022

kagamiori requested review from Yuhta, zzhao0 and oerling August 22, 2022 20:46

yingsu00 force-pushed the bitunpacking branch from 59c9fbf to 6117ec0 Compare August 23, 2022 00:22

Yuhta reviewed Aug 23, 2022

View reviewed changes

yingsu00 force-pushed the bitunpacking branch 2 times, most recently from 09110a5 to e0d664f Compare August 26, 2022 04:26

leoluan2009 reviewed Aug 26, 2022

View reviewed changes

velox/dwio/common/BitUnpacking.h Show resolved Hide resolved

leoluan2009 reviewed Aug 26, 2022

View reviewed changes

velox/dwio/common/BitUnpacking.h Outdated Show resolved Hide resolved

leoluan2009 reviewed Aug 26, 2022

View reviewed changes

velox/dwio/common/BitUnpacking.h Show resolved Hide resolved

leoluan2009 reviewed Aug 26, 2022

View reviewed changes

velox/dwio/common/BitUnpacking.h Show resolved Hide resolved

yingsu00 force-pushed the bitunpacking branch from e0d664f to b854896 Compare August 29, 2022 22:55

yingsu00 closed this Oct 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fast bit unpacking #2217

Fast bit unpacking #2217

yingsu00 commented Aug 5, 2022 •

edited

Loading

yingsu00 commented Aug 18, 2022

yingsu00 commented Aug 20, 2022

netlify bot commented Aug 23, 2022 •

edited

Loading

Yuhta Aug 23, 2022

yingsu00 Aug 26, 2022

yingsu00 commented Aug 29, 2022

facebook-github-bot commented Aug 30, 2022

yingsu00 commented Sep 2, 2022

yingsu00 commented Oct 28, 2022

Fast bit unpacking #2217

Fast bit unpacking #2217

Conversation

yingsu00 commented Aug 5, 2022 • edited Loading

yingsu00 commented Aug 18, 2022

yingsu00 commented Aug 20, 2022

netlify bot commented Aug 23, 2022 • edited Loading

✅ Deploy Preview for meta-velox canceled.

Yuhta Aug 23, 2022

Choose a reason for hiding this comment

yingsu00 Aug 26, 2022

Choose a reason for hiding this comment

yingsu00 commented Aug 29, 2022

facebook-github-bot commented Aug 30, 2022

yingsu00 commented Sep 2, 2022

yingsu00 commented Oct 28, 2022

yingsu00 commented Aug 5, 2022 •

edited

Loading

netlify bot commented Aug 23, 2022 •

edited

Loading