
🎉 Zstd 1.5.0 Release 🎉 #2636

Merged
merged 225 commits into release on May 14, 2021

Conversation

@senhuang42 (Contributor) commented May 11, 2021

Changelog

mdittmer and others added 30 commits May 7, 2020 09:31
Memory constrained use cases that manage multiple archives benefit from
retaining multiple archive seek tables without retaining a ZSTD_seekable
instance for each.

* New opaque type for seek table: ZSTD_seekTable.
* ZSTD_seekable_copySeekTable() supports copying seek table out of a
  ZSTD_seekable.
* ZSTD_seekTable_[eachSeekTableOp]() defines seek table API that mirrors
  existing seek table operations.
* Existing ZSTD_seekable_[eachSeekTableOp]() retained; they delegate to
  the ZSTD_seekTable variant.

These changes allow the above-mentioned use cases to initialize a
ZSTD_seekable, extract its ZSTD_seekTable, then throw the ZSTD_seekable
away to save memory. Standard ZSTD operations can then be used to
decompress frames based on seek table offsets.

The copy and delegate patterns are intended to minimize impact on
existing code and clients. Using copy instead of move for the infrequent
operation of extracting a seek table ensures that the extraction does not
render the ZSTD_seekable useless. Delegating to *new* seek
table-oriented APIs ensures that this is not a breaking change for
existing clients while supporting all meaningful operations that depend
only on seek table data.
[contrib] Support seek table-only API
read-only objects are properly const-ified in parameters,
and a simple roundtrip test is added
New direct seekTable access methods
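
For illustration, here is a minimal sketch of the "copy the seek table, then drop the ZSTD_seekable" pattern described above. It assumes the contrib/seekable_format header; the exact function names and error-handling conventions should be checked against zstd_seekable.h.

```c
/* Minimal sketch of extracting a ZSTD_seekTable and freeing the full
 * ZSTD_seekable to save memory. Names are illustrative. */
#include "zstd_seekable.h"

static ZSTD_seekTable* extractSeekTable(const void* archive, size_t archiveSize)
{
    ZSTD_seekable* const zs = ZSTD_seekable_create();
    if (zs == NULL) return NULL;
    if (ZSTD_isError(ZSTD_seekable_initBuff(zs, archive, archiveSize))) {
        ZSTD_seekable_free(zs);
        return NULL;
    }
    /* Copy (not move), so zs stays usable; here we just free it immediately. */
    ZSTD_seekTable* const st = ZSTD_seekable_copySeekTable(zs);
    ZSTD_seekable_free(zs);
    return st;   /* released later with ZSTD_seekTable_free(st) */
}
```

Frame offsets obtained from the retained ZSTD_seekTable_*() accessors can then be fed to regular ZSTD decompression calls.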
The normalized count is a stack high-point for some compression strategies and has an easy
fix: this change moves it into the entropy workspace.
Reduce stack usage of ZSTD_buildCTable()
This saves ~700 bytes of stack space in HUF_writeCTable.
Add HUF_writeCTable_wksp() function
* Use `HUF_readStats_wksp()`
* Use workspace in `HUF_fillDTableX2*()`
* Clean up workspace usage to use a workspace struct
* Move `counting` into the workspace
* Increase `HUF_DECOMPRESS_WORKSPACE_SIZE` by 512 bytes
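
The stack-reduction commits above all follow the same pattern: a large table that used to live on the function's stack is carved out of a caller-provided workspace instead. The sketch below is purely illustrative, with hypothetical names rather than the real HUF_*_wksp() signatures.

```c
/* Hypothetical illustration of the *_wksp() pattern: the caller supplies
 * scratch memory, so the large table no longer sits on this function's stack. */
#include <stddef.h>
#include <string.h>

typedef struct { unsigned count[256]; } buildTable_wksp;   /* hypothetical */

size_t buildTable_usingWksp(unsigned* dstTable,
                            const unsigned char* src, size_t srcSize,
                            void* workspace, size_t wkspSize)
{
    if (wkspSize < sizeof(buildTable_wksp)) return (size_t)-1;  /* workspace too small */
    buildTable_wksp* const w = (buildTable_wksp*)workspace;
    memset(w->count, 0, sizeof(w->count));
    for (size_t i = 0; i < srcSize; i++) w->count[src[i]]++;    /* work in the workspace */
    memcpy(dstTable, w->count, sizeof(w->count));
    return 0;
}
```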
doc: ZSTD_free*() functions accept NULL pointer
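
A tiny usage note on that doc change, using the standard zstd.h context functions:

```c
/* ZSTD_free*() functions accept NULL and simply do nothing in that case,
 * mirroring free(NULL). */
#include <zstd.h>

void cleanup(ZSTD_CCtx* cctx, ZSTD_DCtx* dctx)
{
    ZSTD_freeCCtx(cctx);   /* safe even if cctx == NULL */
    ZSTD_freeDCtx(dctx);   /* safe even if dctx == NULL */
}
```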
Make the number of physical CPU cores detection more robust
This commit introduces a GitHub action that is triggered on release creation,
which creates the release tarball, compresses it, hashes it, signs it, and
attaches all of those files to the release.
senhuang42 and others added 12 commits May 7, 2021 14:03
Changed strategy:
now unconditionally prefetch the first 2 cache lines of the match,
instead of the cache lines corresponding to its first and last bytes.

This better matches CPU expectations:
hardware prefetchers should automatically fetch the following cache lines once they detect the sequential nature of the read.

This is globally positive, by +5%,
though exact gains depend on compiler (from -2% to +15%).
The only negative counter-example is gcc-9.
This seems to bring an additional ~+1.2% decompression speed
on average across 10 compilers x 6 scenarios.
Refactor prefetching for the decoding loop
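
An illustrative sketch of the new strategy (not the actual decoder code; zstd wraps prefetching behind its own macros, and a 64-byte cache line is assumed here):

```c
/* Unconditionally prefetch the first two cache lines of the match and let the
 * hardware prefetcher pick up the sequential read from there. */
#define CACHELINE_SIZE 64

static void prefetchMatch(const unsigned char* match)
{
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(match,                  0 /* read */, 3 /* keep in cache */);
    __builtin_prefetch(match + CACHELINE_SIZE, 0, 3);
#else
    (void)match;   /* no prefetch hint available on this compiler */
#endif
}
```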
The new alignment setting is better for gcc-9 and gcc-10,
by about +5%.

Unfortunately, it's worse for essentially all other compilers.

Make the new alignment setting conditional on gcc-9+.
Apply flags to libzstd-nomt in libzstd style
improved gcc-9 and gcc-10 decoding speed
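
A hypothetical sketch of how such an alignment hint can be gated on gcc-9+ (the macro name and alignment value are illustrative, and the asm directive assumes an x86-64 target):

```c
/* Emit the new loop-alignment directive only for gcc >= 9; other compilers
 * keep the previous behavior. */
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ >= 9) && defined(__x86_64__)
#  define ALIGN_DECODE_LOOP() __asm__(".p2align 5")
#else
#  define ALIGN_DECODE_LOOP() do {} while (0)
#endif
```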
When running an armv6 userspace on armv8 hardware with a 64-bit Linux kernel,
memory access mode 2 caused SIGBUS (unaligned memory access).
Running all our arm builds in the build farm
only on armv8 simplifies administration a lot.

Depending on compiler and environment, this change might slow down
memory accesses (did not benchmark it). The original analysis is 6 years old.

Fixes #2632
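
For context, a simplified sketch of the two access styles involved (zstd selects between them via MEM_FORCE_MEMORY_ACCESS in mem.h):

```c
#include <stdint.h>
#include <string.h>

/* "Mode 2" style: reinterpret the pointer and dereference it directly.
 * Fast where unaligned loads are legal, but can raise SIGBUS on
 * strict-alignment configurations such as the armv6 case above. */
static uint32_t read32_direct(const void* ptr)
{
    return *(const uint32_t*)ptr;
}

/* Portable style: memcpy into a local variable; modern compilers turn this
 * into a single load on targets that permit unaligned access. */
static uint32_t read32_memcpy(const void* ptr)
{
    uint32_t val;
    memcpy(&val, ptr, sizeof(val));
    return val;
}
```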
@senhuang42 changed the title from "Zstd 1.5.0 release" to "🎉 Zstd 1.5.0 Release 🎉" on May 11, 2021
@Cyan4973 (Contributor) left a comment

As expected,
extended fuzzer tests started during the weekend have not found anything so far.
This seems good to go.

@ghost commented May 12, 2021

On Windows 10, this release may have a performance regression.

Just replacing the lib folder, the pyzstd module unit tests go from 3.0 sec to 3.4 sec (Intel Haswell).

@ghost commented May 12, 2021

This change is missing from the changelog:

[1.5.0] Enable multithreading in lib build by default (#2584)

senhuang42 and others added 12 commits May 12, 2021 11:31
Restored the limit to 256 in 64-bit mode
(it had been reduced to 200 to give more room to 32-bit builds).

This should fix test instability issues
when using lots of threads in 32-bit environments.
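
A hypothetical sketch of a pointer-width-dependent cap in the spirit of this change (only the 64-bit value of 256 is stated above; the 32-bit value here is a placeholder):

```c
#include <stdint.h>

#if UINTPTR_MAX > 0xFFFFFFFFu        /* 64-bit build */
#  define ZSTDMT_NBWORKERS_MAX 256   /* restored by this release */
#else                                /* 32-bit build: keep the cap lower to save address space */
#  define ZSTDMT_NBWORKERS_MAX 64    /* placeholder value, for illustration only */
#endif
```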
With small enough input files, the inferred value of fileWindowLog could
be smaller than ZSTD_WINDOWLOG_MIN.

This can be reproduced like so:
$ echo abc > small
$ echo abcdef > small2
$ zstd --patch-from small small2 -o patch
Previously, this would fail with the error "zstd: error 11 : Parameter is out of bound".
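
A sketch of the kind of clamping that addresses this (illustrative; ZSTD_WINDOWLOG_MIN and ZSTD_WINDOWLOG_MAX are exposed in zstd.h's static-linking-only section):

```c
#define ZSTD_STATIC_LINKING_ONLY   /* exposes ZSTD_WINDOWLOG_MIN / _MAX */
#include <zstd.h>

/* Clamp the windowLog inferred from a tiny --patch-from dictionary into the
 * legal range instead of passing an out-of-bounds parameter along. */
static int clampWindowLog(int fileWindowLog)
{
    if (fileWindowLog < ZSTD_WINDOWLOG_MIN) return ZSTD_WINDOWLOG_MIN;
    if (fileWindowLog > ZSTD_WINDOWLOG_MAX) return ZSTD_WINDOWLOG_MAX;
    return fileWindowLog;
}
```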
reduce ZSTDMT_NBWORKERS_MAX in 32-bit mode
Hopefully, bionic will have a more recent version of Python,
which is required to install meson.
Fixed meson test on Travis CI
@senhuang42 merged commit a488ba1 into release on May 14, 2021