🎉 Zstd 1.5.0 Release 🎉 #2636

senhuang42 · 2021-05-11T22:18:46Z

Changelog

api: Various functions promoted from experimental to stable API: ([1.5.0] Promote ZSTD_getDictID_fromCDict() into stable API #2579 through [1.5.0] Promote ZSTD_c_literalCompressionMode to stable params #2581, @senhuang42)
- ZSTD_defaultCLevel()
- ZSTD_getDictID_fromCDict()
api: Several experimental functions have been deprecated and will emit a compiler warning ([1.5.0] Deprecate some functions #2582, @senhuang42)
- ZSTD_compress_advanced()
- ZSTD_compress_usingCDict_advanced()
- ZSTD_compressBegin_advanced()
- ZSTD_compressBegin_usingCDict_advanced()
- ZSTD_initCStream_srcSize()
- ZSTD_initCStream_usingDict()
- ZSTD_initCStream_usingCDict()
- ZSTD_initCStream_advanced()
- ZSTD_initCStream_usingCDict_advanced()
- ZSTD_resetCStream()
api: ZSTDMT_NBWORKERS_MAX reduced to 64 for 32-bit environments (reduce ZSTDMT_NBWORKERS_MAX in 32-bit mode #2643, @Cyan4973)
perf: Significant speed improvements for middle compression levels (SIMD Row Based Matchfinder 🚀 #2494, @senhuang42 @terrelln)
perf: Block splitter to improve compression ratio, enabled by default for high compression levels (Recursive block splitting #2447, @senhuang42)
perf: Decompression loop refactor, speed improvements on clang and for --long modes (faster speed for decompressSequencesLong #2614, improved gcc-9 and gcc-10 decoding speed #2630, @Cyan4973)
perf: Reduced stack usage during compression and decompression entropy stage (Reduce stack usage of ZSTD_buildCTable() #2522, [huf] Reduce stack usage of HUF_readDTableX2 by ~972 bytes #2524, @terrelln)
bug: Make the number of physical CPU cores detection more robust (Make the number of physical CPU cores detection more robust #2517, @PaulBone)
bug: Improve setting permissions of created files (Improve Setting Permissions of Created Files #2525, @felixhandte)
bug: Fix large dictionary non-determinism ([lib] Always load the dictionary in one go #2607, @terrelln)
bug: Fix various dedicated dictionary search bugs (Fix dedicated dict search isSupported() requirements. #2540, Add DDS to oss fuzzer #2586, @senhuang42 @felixhandte)
bug: Fix non-determinism test failures on Linux i686 ([tests] Reduce memory usage of MT CLI tests #2606, @terrelln)
bug: Fix UBSAN error in decompression ([lib] Fix UBSAN warning in ZSTD_decompressSequences() #2625, @terrelln)
bug: Fix superblock compression divide by zero bug (Assert no division by 0 in ZSTD_entropyCost(), fix superblocks no sequences case #2592, @senhuang42)
bug: Ensure ZSTD_estimateCCtxSize*() monotonically increases with compression level (Add memory monotonicity test over srcSize #2538, @senhuang42)
doc: Improve zdict.h dictionary training API documentation ([zdict] Add a FAQ to the top of zdict.h #2622, @terrelln)
doc: Note that public ZSTD_free*() functions accept NULL pointers (doc: ZSTD_free*() functions accept NULL pointer #2521, @animalize)
doc: Add style guide docs for open source contributors (added a paragraph on coding style #2626, @Cyan4973)
tests: Better regression test coverage for different dictionary modes (Add different dict modes to compression ratio regression test, update results.csv #2559, @senhuang42)
tests: Better test coverage of index reduction (Bug fix & run overflow correction much more frequently in tests #2603, @terrelln)
tests: OSS-Fuzz coverage for seekable format (Fuzzer for seekable format #2617, @senhuang42)
tests: Test coverage for ZSTD threadpool API (Add threadPool unit tests to fuzzer.c #2604, @senhuang42)
build: Dynamic library built multithreaded by default ([1.5.0] Enable multithreading in lib build by default #2584, @senhuang42)
build: Move zstd_errors.h and zdict.h to lib/ root ([1.5.0] Move zstd_errors.h and zdict.h to lib/ root #2597, @terrelln)
build: Single file library build script moved to build/ directory (Move Single-File Build Script from contrib/ to build/ #2618, @felixhandte)
build: Allow ZSTDMT_JOBSIZE_MIN to be configured at compile-time, reduce default to 512KB (allow jobSize to be as low as 512 KB #2611, @Cyan4973)
build: Fixed Meson build (meson: fix build by adding missing files #2548, @SupervisedThinking @kloczek)
build: ZBUFF_*() is no longer built by default ([1.5.0] Remove ZBUFF #2583, @senhuang42)
build: Fix excessive compiler warnings with clang-cl and CMake (Fix for excessive compiler warnings when building with clang-cl #2600, @nickhutchinson)
build: Detect presence of md5 on Darwin (Detect Presence of md5 on Darwin #2609, @felixhandte)
build: Avoid SIGBUS on armv6 (Avoid SIGBUS on armv6 #2633, @bmwiedemann)
cli: --progress flag added to always display progress bar (Add --progress flag #2595, @senhuang42)
cli: Allow reading from block devices with --force (Allow Reading from Block Devices with --force #2613, @felixhandte)
cli: Fix CLI filesize display bug (#2550, fixing #2549, @Cyan4973)
cli: Fix windows CLI --filelist end-of-line bug (fix --filelist compatibility with Windows cr+lf line ending #2620, @Cyan4973)
contrib: Various fixes for linux kernel patch (Fixes for the next linux kernel patch version #2539, @terrelln)
contrib: Seekable format - Decompression hanging edge case fix (Seekable hang fix #2516, @senhuang42)
contrib: Seekable format - New seek table-only API ([contrib] Support seek table-only API #2113, New direct seekTable access methods #2518, @mdittmer @Cyan4973)
contrib: Seekable format - Fix seek table descriptor check when loading (Fix seek table descriptor check when loading #2534, @foxeng)
contrib: Seekable format - Decompression fix for large offsets (seekable decompression fixes #2594, @azat)
misc: Automatically published release tarballs available on Github (Add GitHub Action to Automatically Publish Release Tarballs #2535, @felixhandte)

Memory-constrained use cases that manage multiple archives benefit from retaining multiple archive seek tables without retaining a ZSTD_seekable instance for each.

* New opaque type for seek table: ZSTD_seekTable.
* ZSTD_seekable_copySeekTable() supports copying the seek table out of a ZSTD_seekable.
* ZSTD_seekTable_[eachSeekTableOp]() defines a seek table API that mirrors existing seek table operations.
* Existing ZSTD_seekable_[eachSeekTableOp]() functions are retained; they delegate to the ZSTD_seekTable variant.

These changes allow the above-mentioned use cases to initialize a ZSTD_seekable, extract its ZSTD_seekTable, then throw the ZSTD_seekable away to save memory. Standard ZSTD operations can then be used to decompress frames based on seek table offsets.

The copy and delegate patterns are intended to minimize impact on existing code and clients. Using copy instead of move for the infrequent operation of extracting a seek table ensures that the extraction does not render the ZSTD_seekable useless. Delegating to *new* seek table-oriented APIs ensures that this is not a breaking change for existing clients while supporting all meaningful operations that depend only on seek table data.
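The copy and delegate patterns described above can be illustrated with a minimal sketch. Everything here is hypothetical: `ToySeekable`, `ToySeekTable`, and their functions are toy stand-ins invented for illustration and are not the zstd seekable API. The sketch only shows the shape of the design: the old accessor forwards to a new table-only accessor, and the table is copied out so the original object stays usable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types for illustration; NOT the zstd seekable API. */
typedef struct {
    size_t  numFrames;
    size_t* frameOffsets;          /* compressed offset of each frame */
} ToySeekTable;

typedef struct {
    ToySeekTable  table;
    unsigned char scratch[4096];   /* stands in for heavier decompression state */
} ToySeekable;

/* New table-oriented operation: depends only on seek table data. */
static size_t ToySeekTable_getFrameOffset(const ToySeekTable* st, size_t frame)
{
    assert(st != NULL && frame < st->numFrames);
    return st->frameOffsets[frame];
}

/* Existing operation, retained for old clients: delegates to the table variant. */
static size_t ToySeekable_getFrameOffset(const ToySeekable* zs, size_t frame)
{
    return ToySeekTable_getFrameOffset(&zs->table, frame);
}

/* Copy (not move): the source object remains fully usable afterwards.
 * Returns NULL on allocation failure. */
static ToySeekTable* ToySeekable_copySeekTable(const ToySeekable* zs)
{
    ToySeekTable* st = malloc(sizeof *st);
    if (st == NULL) return NULL;
    st->numFrames = zs->table.numFrames;
    st->frameOffsets = NULL;
    if (st->numFrames != 0) {
        size_t const bytes = st->numFrames * sizeof *st->frameOffsets;
        st->frameOffsets = malloc(bytes);
        if (st->frameOffsets == NULL) { free(st); return NULL; }
        memcpy(st->frameOffsets, zs->table.frameOffsets, bytes);
    }
    return st;
}

/* Accepts NULL, like free(). */
static void ToySeekTable_free(ToySeekTable* st)
{
    if (st == NULL) return;
    free(st->frameOffsets);
    free(st);
}
```

In this toy model a caller would copy the table, release the large `ToySeekable`, and keep answering offset queries from the small `ToySeekTable` alone.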

Changed strategy: now unconditionally prefetch the first 2 cache lines, instead of the cache lines corresponding to the first and last bytes of the match. This better matches CPU expectations, since the CPU should auto-prefetch the following cache lines once it detects the sequential nature of the read. This is globally positive, by about +5%, though exact gains depend on the compiler (from -2% to +15%). The only negative counter-example is gcc-9.
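The two prefetch strategies contrasted above can be sketched as follows. This is a hedged illustration, not the decoder's actual code: the function names are hypothetical, the 64-byte cache line is an assumption (typical on x86-64), and `__builtin_prefetch` is the GCC/Clang builtin, with a no-op fallback elsewhere. Each function returns the second address it prefetched purely so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define CACHELINE_SIZE 64   /* assumed cache line size */

#if defined(__GNUC__) || defined(__clang__)
#  define PREFETCH_L1(p) __builtin_prefetch((p), 0 /* read */, 3 /* keep in cache */)
#else
#  define PREFETCH_L1(p) ((void)(p))
#endif

/* Old strategy: prefetch the lines holding the first and last bytes of the match.
 * Returns the second prefetched address (for illustration only). */
static const unsigned char* prefetch_first_and_last(const unsigned char* match,
                                                    size_t matchLength)
{
    const unsigned char* last;
    assert(matchLength >= 1);
    last = match + matchLength - 1;
    PREFETCH_L1(match);
    PREFETCH_L1(last);
    return last;
}

/* New strategy: unconditionally prefetch the first two cache lines and rely on
 * the hardware prefetcher to follow the sequential read from there.
 * Returns the second prefetched address (for illustration only). */
static const unsigned char* prefetch_first_two_lines(const unsigned char* match)
{
    const unsigned char* next = match + CACHELINE_SIZE;
    PREFETCH_L1(match);
    PREFETCH_L1(next);
    return next;
}
```

The new variant no longer depends on the match length, so the second prefetch address is known as soon as the match position is. A prefetch is only a hint and does not fault on an invalid address; in this sketch the caller is still assumed to pass a buffer large enough for the pointer arithmetic to be valid.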

When running armv6 userspace on armv8 hardware with a 64-bit Linux kernel, mode 2 caused SIGBUS (unaligned memory access). Running all our arm builds in the build farm only on armv8 simplifies administration a lot. Depending on compiler and environment, this change might slow down memory accesses (this was not benchmarked). The original analysis is 6 years old. Fixes #2632

Avoid SIGBUS on armv6

Cyan4973

As expected,
extended fuzzer tests started during the week-end have not found anything so far.
This seems good to go.

ghost · 2021-05-12T02:03:47Z

On Windows 10, this release may have a performance regression.

After replacing only the lib folder, the pyzstd module unit tests take 3.4 sec instead of 3.0 sec (Intel Haswell).

ghost · 2021-05-12T02:25:24Z

This change is missing from changelog:

[1.5.0] Enable multithreading in lib build by default (#2584)

With small enough input files, the inferred value of fileWindowLog could be smaller than ZSTD_WINDOWLOG_MIN. This can be reproduced like so:

    $ echo abc > small
    $ echo abcdef > small2
    $ zstd --patch-from small small2 -o patch

Previously, this would fail with the error "zstd: error 11 : Parameter is out of bound".
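The kind of fix this implies can be sketched as a clamp on the inferred window log. This is a sketch under stated assumptions, not the CLI's actual code: `infer_window_log` and `highbit64` are hypothetical names, `WINDOWLOG_MIN` mirrors the value 10 of `ZSTD_WINDOWLOG_MIN`, and `WINDOWLOG_MAX` is assumed to be 31 as on 64-bit builds.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOWLOG_MIN 10   /* mirrors ZSTD_WINDOWLOG_MIN */
#define WINDOWLOG_MAX 31   /* assumed 64-bit value of ZSTD_WINDOWLOG_MAX */

/* Index of the highest set bit; v must be non-zero. */
static unsigned highbit64(uint64_t v)
{
    unsigned n = 0;
    assert(v != 0);
    while (v >>= 1) n++;
    return n;
}

/* Infers a window log large enough to cover fileSize bytes, then clamps it
 * into the valid range so tiny files no longer yield an out-of-bound value. */
static unsigned infer_window_log(uint64_t fileSize)
{
    unsigned wlog = (fileSize == 0) ? 0 : highbit64(fileSize) + 1;
    if (wlog < WINDOWLOG_MIN) wlog = WINDOWLOG_MIN;
    if (wlog > WINDOWLOG_MAX) wlog = WINDOWLOG_MAX;
    return wlog;
}
```

For the 4-byte file from the reproduction, the unclamped value would be 3, well below the minimum; with the clamp it becomes 10.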

mdittmer and others added 30 commits May 7, 2020 09:31

Merge pull request #2113 from mdittmer/expose-seek-table

ce6d1b9

[contrib] Support seek table-only API

ZSTD_seekable_decompress() example that hangs.

3cbdbb8

Fix seekable decompress hanging

527a20c

fix potential leak on exit

a80b10f

fixed const guarantees

c7e42e1

read-only objects are properly const-ified in parameters

strengthen compilation flags

029f974

If cpuinfo parsing fails fallback to sysconf

eb1a09d

Only set numPhysicalCores if ratio is valid

4d6c78f

various minor style fixes

ac95a30

Merge pull request #2516 from senhuang42/seekable_hang_fix

0388054

Seekable hang fix

Merge branch 'dev' into seekTable

24d59a6

fixed gcc conversion warnings

a1d7b9d

fixed wrong assert condition

6c0bfc4

fixed gcc-7 conversion warning

713d495

added test case for seekTable API

16ec1cf

and simple roundtrip test

doc: ZSTD_free*() functions accept NULL pointer

0933775

Merge branch 'seekTable' of github.com:facebook/zstd into seekTable

6e390ce

added code comments for new API ZSTD_seekTable

2fa4c8c

Merge pull request #2518 from facebook/seekTable

c4d54ab

New direct seekTable access methods

Reduce stack usage of ZSTD_buildCTable()

27498ff

It is a stack high-point for some compression strategies and has an easy fix. This moves the normalized count into the entropy workspace.
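The general technique behind this commit, moving a scratch table off the stack into a caller-provided workspace, can be sketched as below. This is a generic illustration with hypothetical names (`HistWorkspace`, `most_frequent_stack`, `most_frequent_wksp`); it is not the code of ZSTD_buildCTable() and does not reproduce the entropy workspace layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical caller-owned scratch area (1 KB with 4-byte unsigned). */
typedef struct {
    unsigned count[256];
} HistWorkspace;

/* Before: the 1 KB table lives on the stack of every call. */
static unsigned most_frequent_stack(const unsigned char* src, size_t n)
{
    unsigned count[256] = {0};
    unsigned best = 0;
    size_t i;
    for (i = 0; i < n; i++) count[src[i]]++;
    for (i = 1; i < 256; i++) if (count[i] > count[best]) best = (unsigned)i;
    return best;
}

/* After: same logic, but the table lives in a workspace the caller allocates
 * once and reuses, so this function's own stack frame stays small. */
static unsigned most_frequent_wksp(const unsigned char* src, size_t n,
                                   HistWorkspace* wksp)
{
    unsigned best = 0;
    size_t i;
    assert(wksp != NULL);
    memset(wksp->count, 0, sizeof wksp->count);
    for (i = 0; i < n; i++) wksp->count[src[i]]++;
    for (i = 1; i < 256; i++) if (wksp->count[i] > wksp->count[best]) best = (unsigned)i;
    return best;
}
```

The trade-off is an extra parameter and a workspace whose size becomes part of the function's contract, in exchange for a lower stack high-point on deep call chains.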

Merge pull request #2522 from terrelln/stack-reduction

e50f88c

Reduce stack usage of ZSTD_buildCTable()

Add HUF_writeCTable_wksp() function

5df2a21

This saves ~700 bytes of stack space in HUF_writeCTable.

Merge pull request #2523 from terrelln/huf-stack-reduction

b5fd348

Add HUF_writeCTable_wksp() function

[huf] Reduce stack usage of HUF_readDTableX2 by ~460 bytes

0f18059

* Use `HUF_readStats_wksp()`
* Use workspace in `HUF_fillDTableX2*()`
* Clean up workspace usage to use a workspace struct

[fse] Reduce stack usage of FSE_decompress_wksp() by 512 bytes

3b1aba4

* Move `counting` into the workspace
* Increase `HUF_DECOMPRESS_WORKSPACE_SIZE` by 512 bytes

Merge pull request #2521 from animalize/doc_free

3d6c903

doc: ZSTD_free*() functions accept NULL pointer

Merge pull request #2517 from PaulBone/num_cores

a3feed8

Make the number of physical CPU cores detection more robust

Add GitHub Action to Automatically Publish Release Tarballs

5d1fec8

This commit introduces a GitHub action that is triggered on release creation, which creates the release tarball, compresses it, hashes it, signs it, and attaches all of those files to the release.

Fix seek table descriptor check when loading

21697b9

senhuang42 and others added 12 commits May 7, 2021 14:03

Add PHONY targets to makefiles (#2629)

13449d7

Merge branch 'd_prefetch_refactor' of github.com:facebook/zstd into d…

4d9caa4

…_prefetch_refactor

update decoder hot loop alignment

6755baf

This seems to bring an additional ~+1.2% decompression speed on average across 10 compilers x 6 scenarios.

Merge pull request #2547 from facebook/d_prefetch_refactor

5b6d38a

Refactor prefetching for the decoding loop

improved gcc-9 and gcc-10 decoding speed

439e58d

the new alignment setting is better for gcc-9 and gcc-10 by about ~+5%. Unfortunately, it's worse for essentially all other compilers. Make the new alignment setting conditional to gcc-9+.

Merge pull request #2628 from skitt/libzstd-nomt-flags

334ac69

Apply flags to libzstd-nomt in libzstd style

Merge pull request #2630 from facebook/gcc9

9fb5a04

improved gcc-9 and gcc-10 decoding speed

Merge pull request #2633 from bmwiedemann/issue2632

162f540

Avoid SIGBUS on armv6

Bump version to 1.5.0, rebuild documentation (#2634)

9c23ea9

updated generated man pages for v1.5.0 (#2635)

8a53a88

facebook-github-bot added the CLA Signed label May 11, 2021

senhuang42 changed the title ~~Zstd 1.5.0 release~~ 🎉 Zstd 1.5.0 Release 🎉 May 11, 2021

Cyan4973 approved these changes May 11, 2021

senhuang42 and others added 12 commits May 12, 2021 11:31

Add mt lib build to CL, shuffle around bugs section (#2638)

01fe479

Remove const data members in threadpooltest payload (#2639)

b35c250

Remove const data members in threadpooltest payload (#2639) (#2640)

c730b8c

reduce Max nb Workers to 64 in 32-bit mode

cb0cad9

and restored limit to 256 when in 64-bit mode (it was reduced to 200 to give more room for 32-bit). This should fix test instability issues using lot of threads in 32-bit environments.
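A pointer-width-dependent limit like the one described here can be expressed as a compile-time constant. The sketch below uses the hypothetical names `NBWORKERS_MAX` and `clamp_nb_workers`; the values 64 and 256 come from the commit message, while clamping a requested worker count to the limit is an assumption about how a caller might apply it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro modeled on the described limit:
 * 64 workers in 32-bit mode, 256 otherwise. */
#define NBWORKERS_MAX ((sizeof(void*) == 4) /* 32-bit */ ? 64 : 256)

/* Caps a requested worker count at the platform limit (assumed policy). */
static unsigned clamp_nb_workers(unsigned requested)
{
    unsigned const max = (unsigned)NBWORKERS_MAX;
    return (requested > max) ? max : requested;
}
```

Because `sizeof(void*)` is a constant expression, the compiler folds the conditional, so the limit costs nothing at run time. A lower cap in 32-bit mode bounds how much address space per-worker buffers can claim.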

Merge branch 'dev' of github.com:facebook/zstd into dev

8fae355

Update CHANGELOG to include patch from fix (#2642)

a51e342

Merge pull request #2643 from facebook/workers32

705a62b

reduce ZSTDMT_NBWORKERS_MAX in 32-bit mode

updated meson test

988beb3

hopefully, bionic will have a more recent version of python required to install meson.

Merge pull request #2644 from facebook/mesonFix

b57022e

Fixed meson test on travisCI

Add source level deprecation warning disabling to certain tests/utils (…

40def70

…#2645)

Remove deprecate flag for vcx (#2647)

0671808

senhuang42 merged commit a488ba1 into release May 14, 2021
