Proactively skip huffman compression based on sampling where non-comp… #2717

Merged
merged 1 commit into from
Jul 1, 2021

Conversation

binhdvo
Contributor

@binhdvo binhdvo commented Jun 28, 2021

When a large block is suspected to be incompressible (based on the ratio of literals to sequences), we decide whether Huffman compression should be applied from a smaller sampling of 4 KB blocks at the start and end of the buffer. This proactively skips constructing a histogram for the full data range. Benchmarks show improvements on low-compressibility samples:

Benchmarked on macOS without the PR:

binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P0 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 1628.7 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P1 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 1596.6 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P5 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 850.4 MB/s (79994983)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P10 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 663.1 MB/s (74066054)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P50 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 564.4 MB/s (31792743)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P100 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 7 2021) ***
Sample 100000000 bytes :
1#compress : 7765.7 MB/s ( 3570)

With PR:

binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P0 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 3322.8 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P1 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 3373.9 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P1 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 3323.8 MB/s (100002299)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P5 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 853.9 MB/s (79994983)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P10 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 676.9 MB/s (74066054)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P50 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 540.4 MB/s (31792743)
binhvo@binhvo-mbp zstd % ./tests/fullbench -b1 -P100 -B100000000
*** Zstandard speed analyzer 1.5.0 64-bits, by Yann Collet (Jun 25 2021) ***
Sample 100000000 bytes :
1#compress : 7838.4 MB/s ( 3570)
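
For readers who want a concrete picture of the sampling idea described above, here is a minimal sketch. It is not the code from this PR: the helper names, the size cutoff for a "large" input, and the flatness threshold are all assumptions chosen for illustration. It only shows the general shape, which is to histogram two 4 KB samples instead of the whole buffer and bail out early when no byte value stands out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SAMPLE_SIZE 4096u                          /* the "4 KB" sample from the description */
#define MIN_SIZE_FOR_SAMPLING (SAMPLE_SIZE * 10u)  /* assumed cutoff for a "large" input */

/* Hypothetical helper: histogram one sample and return the count of its
 * most frequent byte value. */
static size_t largest_count(const unsigned char* p, size_t n)
{
    size_t hist[256];
    size_t i, max = 0;
    memset(hist, 0, sizeof(hist));
    for (i = 0; i < n; i++) hist[p[i]]++;
    for (i = 0; i < 256; i++) if (hist[i] > max) max = hist[i];
    return max;
}

/* Hypothetical check: returns 1 when the first and last samples look too
 * flat for Huffman coding to pay off, 0 otherwise (including inputs too
 * small to sample, where a full histogram is cheap anyway). */
static int probably_incompressible(const void* src, size_t srcSize)
{
    const unsigned char* p = (const unsigned char*)src;
    size_t total;
    assert(src != NULL);
    if (srcSize < MIN_SIZE_FOR_SAMPLING) return 0;
    total = largest_count(p, SAMPLE_SIZE)
          + largest_count(p + srcSize - SAMPLE_SIZE, SAMPLE_SIZE);
    /* Assumed threshold: the most frequent byte is barely above what a
     * near-uniform distribution would produce. */
    return total <= ((2 * SAMPLE_SIZE) >> 7) + 4;
}
```

A caller would run this check only when the block is already suspected to be incompressible, and fall through to the normal full-range histogram whenever it returns 0.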

Contributor

@terrelln terrelln left a comment

Awesome! Just have a couple style nits, otherwise it looks great!

lib/compress/huf_compress.c (outdated, resolved)
lib/common/huf.h (outdated, resolved)
lib/compress/zstd_compress_literals.h (outdated, resolved)
lib/compress/zstd_compress.c (outdated, resolved)
const void* src, size_t srcSize,
unsigned maxSymbolValue, unsigned tableLog,
void* workSpace, size_t wkspSize, /**< `workSpace` must be aligned on 4-bytes boundaries, `wkspSize` must be >= HUF_WORKSPACE_SIZE */
HUF_CElt* hufTable, HUF_repeat* repeat, int preferRepeat, int bmi2, unsigned suspectUncompressible);
Contributor

Minor (API design):

As we get more and more parameters in the function signature,
it becomes more and more difficult to understand what a bare 1 or 0 means at the call site.
For clarity, we end up commenting them, so that the reader can track what each value means:
..., 1 /* suspected uncompressible */);

Another possibility is to use an enum:
..., HUF_suspectedUncompressible);
In such a context, it's essentially a compiler-checked comment.
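
To make the two options concrete, here is a small sketch. Only the name `HUF_suspectedUncompressible` comes from the comment above; the second enumerator, the type name, and the `demo_should_sample` function are hypothetical stand-ins, not part of the zstd API.

```c
#include <stddef.h>

/* Two-value enum standing in for the 0/1 flag. */
typedef enum {
    HUF_notSuspected = 0,            /* hypothetical name */
    HUF_suspectedUncompressible = 1  /* name taken from the review comment */
} HUF_suspicion_e;                   /* hypothetical type name */

/* Hypothetical stand-in for a function taking the flag as its last
 * argument; the 40960-byte cutoff is an arbitrary illustration value. */
static int demo_should_sample(size_t srcSize, HUF_suspicion_e suspicion)
{
    return suspicion == HUF_suspectedUncompressible && srcSize >= 40960;
}
```

With the enum, a call reads `demo_should_sample(n, HUF_suspectedUncompressible)` and a misspelled name fails to compile, whereas `demo_should_sample(n, 1 /* suspected uncompressible */)` depends on the comment staying accurate. Note that C still accepts a plain integer for an enum parameter, so the check covers the name rather than the value.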

Contributor Author

After some thought, I added comments rather than an enum. This is really just a stand-in for a boolean value, so I think the argument name is sufficient to convey the meaning as far as the function definitions are concerned.

@binhdvo binhdvo merged commit b3e372c into facebook:dev Jul 1, 2021
@binhdvo binhdvo deleted the bootcamp branch August 24, 2021 02:40