
Updated the lz4 decompressor to 1.9.3 #12805

Merged (1 commit) Jan 7, 2022

Conversation

@rincebrain (Contributor) commented Nov 29, 2021

(This is a draft solely because I want to hear people's comments on how I lobbed the upstream code in and whether there's any more testing they'd like to see - I'm confident it's pretty stable, from testing it somewhat on a bunch of platforms...)

Motivation and Context

Well, the lz4 code is so old, it predates their switch to git in 2014, so it seemed likely that there were performance gains to be found.

In the quick and dirty benchmark I stole (and enhanced) from the zstd PR, the old LZ4 code decompressed a 3 GB chunk of Wikipedia pages at between 1.5 and 2.2 GB/s, depending on recordsize, while the new LZ4 code gets between 1.6 and 2.9 GB/s (all on my Ryzen 5900X in a Debian bullseye VM). The test data is the first 3 GB, uncompressed, of enwiki-20170820-pages-articles.xml.bz2, chosen because it was an easy dump to find and I was worried the 1 GB dump used in the zstd tests might not be enough to measure a difference.

(I've not got a great way of inlining a spreadsheet here, but here's a list of results for compression=off and =lz4 with the old and new decompressor, at recordsizes from 16k to 4M, according to dd, averaged over 3 runs each. Not the most scientific analysis I've ever done, but if people want more results of a specific kind, that can be done.)
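For anyone reproducing: the measurement above boils down to timing sequential reads at a given block size, dd-style. Below is a minimal standalone sketch of the same measurement (a hypothetical harness, not part of this PR); the file path and block size are placeholder arguments, and for cold-read numbers you would want to export/import the pool or otherwise drop caches between runs.

```c
/*
 * Minimal sequential-read throughput check, roughly what the dd runs
 * above measure. Hypothetical harness, not part of this PR.
 * Build: cc -O2 -o readbench readbench.c
 * Usage: ./readbench /pool/ds/enwiki-3g 131072
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <file> <blocksize>\n", argv[0]);
		return (1);
	}
	size_t bs = strtoul(argv[2], NULL, 0);
	char *buf = malloc(bs);
	int fd = open(argv[1], O_RDONLY);
	if (buf == NULL || fd < 0) {
		perror("setup");
		return (1);
	}

	struct timespec t0, t1;
	unsigned long long total = 0;
	ssize_t n;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	while ((n = read(fd, buf, bs)) > 0)
		total += n;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double secs = (t1.tv_sec - t0.tv_sec) +
	    (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%llu bytes in %.3f s = %.2f GB/s\n",
	    total, secs, total / secs / 1e9);

	close(fd);
	free(buf);
	return (0);
}
```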

Description

  • Punted all the old LZ4 code into lz4_zfs.c, since "lz4.c" versus "lz4_vendor.c" or "lz4_decompressor.c" seemed less good than "lz4.c" and "lz4_zfs.c", to me.
  • Dropped the decompressor code from lz4_zfs.c
  • Landed the decompressor and associated macros, inlined functions, and defines in lz4.c from vanilla lz4 1.9.3, mostly untouched (a few defines needed to be stubbed around, like includes that are only necessary in userland, or stubbing out DEBUGLOG() entirely rather than editing the decompression code); see the glue sketch after this list
  • Makefile edits everywhere to include the new file and move the FBSD warning exception from lz4.c to lz4_zfs.c
  • Added lz4.c to the code coverage and cstyle exception lists
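For context on how the two files now meet: the vendored 1.9.3 entry point is LZ4_decompress_safe(), and ZFS stores a 4-byte big-endian compressed-length header in front of each lz4 block, so the glue remaining in lz4_zfs.c is shaped roughly like the sketch below (illustrative only; the in-tree wrapper uses ZFS's own macros and error conventions).

```c
#include <stddef.h>
#include <stdint.h>

/* Vendored lz4 1.9.3 entry point, now living in lz4.c. */
extern int LZ4_decompress_safe(const char *src, char *dst,
    int compressedSize, int dstCapacity);

/*
 * Sketch of the ZFS-side wrapper in lz4_zfs.c: read the 4-byte
 * big-endian length header, then hand the rest of the buffer to the
 * vendor decompressor. Returns 0 on success, 1 on any error.
 */
int
lz4_decompress_sketch(const void *s_start, void *d_start,
    size_t s_len, size_t d_len)
{
	const uint8_t *src = s_start;
	uint32_t bufsiz = ((uint32_t)src[0] << 24) | (src[1] << 16) |
	    (src[2] << 8) | src[3];

	/* The header must not claim more input than we were handed. */
	if (bufsiz + sizeof (bufsiz) > s_len)
		return (1);

	/*
	 * LZ4_decompress_safe() returns the number of bytes written to
	 * dst, or a negative value on malformed input.
	 */
	if (LZ4_decompress_safe((const char *)(src + sizeof (bufsiz)),
	    (char *)d_start, (int)bufsiz, (int)d_len) < 0)
		return (1);
	return (0);
}
```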

How Has This Been Tested?

I tried building it and running it through at least the sanity test suite, and also interactively decompressed ~20 GB of data written by an unmodified install, on:

  • Linux/x86_64 (Debian bullseye, Fedora 34)
  • FBSD/x86_64 (13.0-RELEASE-p4)*
  • Linux/sparc64 (Debian sid)
  • Linux/aarch64 (Debian bullseye)
  • Linux/ppc64el (Debian sid)

(* The FreeBSD one hung and crashed at zfs_program_json every time, but it also did that when I tried unmodified git ded851b, so I don't think that was caused by this change.)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)


@sempervictus (Contributor)

Well, there goes my "bravery in git" award for the week: 🏆 - thank you.
In terms of testing, since the decompressor isn't being iteratively upgraded but replaced altogether, I think we need some very hefty verification of the resulting buffers against what they were with the original version. Something along the lines of:

  1. create a highly differentiated data set of test files, with sha512 and md5 sums of their contents (colliding both with the same input is ... unlikely), consisting of everything from text to compressed and encrypted formats, and things which may or may not trip the pre-compression check in the compressor. Maybe even use AFL or some other fuzzer to generate these per-test, to provide increasing coverage over time.
  2. write them into a zpool or mock buffer using the old compressor as it exists in the current master of this repo
  3. read them back using the new decompressor, comparing both sums against the output buffer (see the sketch below)

While I get that in theory this shouldn't be a concern, since the compressor doesn't change, getting even 1 bit wrong can have devastating consequences, as we tend to treat data from a local FS as trusted in every sense; and since we can't really track changes (the pre-2014 git history isn't there), it looks like a nice place for Murphy to camp out and mess with us.
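A minimal sketch of the step-3 comparison (a hypothetical standalone harness, not part of the PR): on a single box, a byte-for-byte compare of the read-back file against a pristine copy is strictly stronger than matching both checksums, so something like this suffices.

```c
/*
 * Compare a file read back through the new decompressor against a
 * pristine copy written before the change.
 * Build: cc -O2 -o cmpfiles cmpfiles.c
 * Usage: ./cmpfiles readback.bin pristine.bin
 */
#include <stdio.h>
#include <string.h>

int
main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <file1> <file2>\n", argv[0]);
		return (2);
	}
	FILE *a = fopen(argv[1], "rb");
	FILE *b = fopen(argv[2], "rb");
	if (a == NULL || b == NULL) {
		perror("fopen");
		return (2);
	}

	static char ba[1 << 16], bb[1 << 16];
	unsigned long long off = 0;

	for (;;) {
		size_t na = fread(ba, 1, sizeof (ba), a);
		size_t nb = fread(bb, 1, sizeof (bb), b);
		/* Differing lengths or differing bytes both count. */
		if (na != nb || memcmp(ba, bb, na) != 0) {
			printf("MISMATCH near offset %llu\n", off);
			return (1);
		}
		if (na == 0)
			break;
		off += na;
	}
	printf("identical (%llu bytes)\n", off);
	return (0);
}
```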

@rincebrain (Contributor, Author)

That's "easy", as it turns out there are nice premade test datasets for compressors to be run against, and I have a wide variety of random nonsense to throw at it. I'll put something together, but I'll be somewhat impressed if it goes awry.

sparc64, ppc64el, aarch64, i386, armhf, x86_64, and I will have some fun stress times.

But taking a test barrage like that not burping at all as a precondition for anyone considering merging, I'd like to hear people's thoughts on how I reshuffled things: anything they'd like to see done differently, any other edge cases they'd like to see prodded, etc.?

@rincebrain (Contributor, Author)

Well, as a first data point, I made a pool containing a dataset for each of {4k, 8k, ..., 16M}, with recordsize set as you might expect from the name, and then 9 datasets under each of those, each containing some data to test. Several were disappointingly small compression test corpora (artificl, calgary, cantrbry, large, misc, silesia, and a largefiles set containing enwik8 and enwik9, zipped and unzipped); then I threw in a couple of big git trees (linux-firmware and illumos) for variety.

The end result is just past 40 GB - I can upload it somewhere if people really want, but it's 40 GB, so I'd rather wait for someone to ask for a copy first.

Anyway, I took that, generated on a 2.0.6 box, and computed the sha256sums of all the files on it on said box. Then I ran the new decompressor tree on a Linux/x86_64 and a Linux/sparc64 box and did the same thing, and compared them.

No differences. So that's a nice sign.

(Footnote: As an experiment, I made a sibling branch to this one which also swaps out the compressor with the one from 1.9.3 - that actually involved more work than the decompressor, since there wasn't as much free glue for the old API, then I benchmarked that.

Compressed size on the aforementioned test dataset is within very tiny error bars - so tiny I double-checked that it was definitely running the new code. In terms of compression speed, there's a very minor boost sometimes. If/when people figure out a good solution for versioned compressors in OpenZFS, I'll presumably open a PR for this then...or maybe an even newer version, by then.)

@sempervictus (Contributor)

Thank you @rincebrain. Any chance that compressor branch is up somewhere?
@behlendorf: It looks like the whole "compression algorithms and implementations evolve" thing can't be put off forever, or the delta between the versions we use and the state of the art might get too cumbersome to overcome. If this compressor delta is really that small then it's probably not hugely important, but the implementation might be. The zstd side of this is much worse, as that's a fast-moving codebase with real performance benefits that matter for at-scale work. These DMU-level things also benefit Lustre AFAIK, so we're sort of missing out across the board.
Can we use scrubs or inline accesses to recompress data? Not quite the mythical BP rewrite, but an RMW cycle on blocks written with an unversioned compressor, or with an older version of one. That way we could start versioning compressors as feature flags and assume any data without such flags is v0.

@rincebrain (Contributor, Author) commented Dec 11, 2021

I've not put it up anywhere, no - it's currently a massive hackjob with functions and definitions duplicated and strewn around. I'd want to do something like the pass I did over this one, where I go looking for things that are unnecessarily changed, in a confusing order, or missing their comments, or whatever, before sharing it.

(If it had turned out to be a nontrivial improvement, I would have eventually argued in favor of something like lz4old and lz4 compression options, but it didn't.)

@ikozhukhov (Contributor) commented Dec 11, 2021

I think it would be much better to land the new LZ4 update as a separate lz4_{version}, to reduce the potential for issues with old data. For example, we have customers with about 15 PB of old data, and it would be dangerous to update the known-working lz4 compression.
my 2c

@rincebrain (Contributor, Author)

I could do that, but the point is that the decompressor should be perfectly happy decompressing all the data, old and new, as it stands; it has so far shown no signs of not being so, and it comes with a nice speedup.

I'd rather not maintain an extra, slower codepath when nobody's demonstrated a flaw in the decompressor's claim that it will handle all old data?

Now, shoving the new compressor in is a whole different can of worms, but that's not what this PR is for; I just tried the quick experiment out of curiosity.

@adamdmoss (Contributor)

I could do that, but the point is that the decompressor should be perfectly happy decompressing all the data, old and new, as it stands; it has so far shown no signs of not being so, and it comes with a nice speedup.

I'd rather not maintain an extra, slower codepath when nobody's demonstrated a flaw in the decompressor's claim that it will handle all old data?

Now, shoving the new compressor in is a whole different can of worms, but that's not what this PR is for; I just tried the quick experiment out of curiosity.

Agree on all points.

@sempervictus (Contributor)

@ikozhukhov: my take on that problem is that there isn't any danger to the data your customer has unless it is already damaged (i.e., not compressed in a valid LZ4 format). The decompressor routine should not be able to produce anything other than what was originally compressed; it will just do so faster, reducing the access time to those 15P :)

@h1z1 commented Dec 15, 2021

Strange world.. I was testing lz4 just last week. A not-so-useful but fun test is dd from /dev/zero. It gives absolutely bizarre results on read-back.

@adamdmoss (Contributor)

my take on that problem is that there isn't any danger to the data your customer has unless it is already damaged

Ironically the newer decompressor is probably less dangerous than the older one, since a bunch of weird lz4 corner cases were fixed over the last several years where the decompressor would barf on (extremely unlikely) valid input, or the compressor would fall over (or worse, output bad data) on extremely unlikely input. The latter cases, of course, would not be fixed by this decompressor-only change. :) (And I don't mean to imply that these cases have ever been hit by anyone in the wild; I just want to emphasize that 2021's liblz4 code is incredibly more battle-hardened than 2014's...)

@adamdmoss (Contributor)

Strange world.. I was testing lz4 just last week. A not-so-useful but fun test is dd from /dev/zero. It gives absolutely bizarre results on read-back.

Not sure if you're referring to zfs's lz4 or the lz4 tool itself, but last time I looked, zfs will still do an all-zero check before any other compressor gets to see the data, and treat zeros as a special ultra-fast case. So /dev/zero isn't a good exercise for any of zfs's compressors - it'll actually bypass them.
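A flat-buffer sketch of that kind of zero short-circuit (illustrative only; the in-tree check iterates over ABD scatter buffers and the real helper names differ):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * If an entire block is zeros, ZFS can record it as a hole/zero block
 * and no compressor ever sees the data. This sketch assumes an
 * 8-byte-aligned buffer, which zio data buffers are.
 */
static bool
block_is_all_zero(const void *buf, size_t len)
{
	const uint64_t *p = buf;
	size_t words = len / sizeof (uint64_t);

	for (size_t i = 0; i < words; i++) {
		if (p[i] != 0)
			return (false);
	}
	/* Check any tail bytes if len is not a multiple of 8. */
	const uint8_t *tail = (const uint8_t *)(p + words);
	for (size_t i = 0; i < len % sizeof (uint64_t); i++) {
		if (tail[i] != 0)
			return (false);
	}
	return (true);
}
```

So a dd from /dev/zero mostly benchmarks this kind of check plus the hole-handling path, not lz4 itself.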

@h1z1 commented Dec 16, 2021

It was a bit of both. It's a fun test because it doesn't touch lz4 at all; rather, it highlights what is possible and what may be broken. Hence the fun part: reads at smaller block sizes (< 131072) are quite slow (32k is better... still 1.5 GB/s vs 5.3 GB/s, though). A bigger recordsize helps, of course. Writes still hit ~2 GB/s.

Review comment on module/zfs/lz4_zfs.c (outdated, resolved)
@rincebrain rincebrain marked this pull request as ready for review January 5, 2022 17:06
@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Jan 5, 2022
@behlendorf (Contributor)

@rincebrain one last rebase and we can get this merged.

As an experiment, I stole the lz4 decompressor from
upstream lz4 (1.9.3), and landed it.

Feedback suggested that keeping the vendor lz4 code isolated and
unlinted was probably reasonable, so I lobbed it into its own file.

It also seemed reasonable to put the mostly-untouched* code into
lz4.c proper, and relegate the integrated and ZFS-specific code to
lz4_zfs.c.

Signed-off-by: Rich Ercolani <[email protected]>
@sempervictus (Contributor)

Ouch. This did not merge cleanly at all into the 2.1.2 branch I've got going - lz4.c was mangled beyond quick-remedy status.
@rincebrain - is there any chance you could please make a backport of this to a staging 2.1.3 branch or a 2.1.2-stable?

rincebrain added a commit to rincebrain/zfs that referenced this pull request Jan 7, 2022
As an experiment, I stole the lz4 decompressor from
upstream lz4 (1.9.3), and landed it.

Feedback suggested that keeping the vendor lz4 code isolated and
unlinted was probably reasonable, so I lobbed it into its own file.

It also seemed reasonable to put the mostly-untouched* code into
lz4.c proper, and relegate the integrated and ZFS-specific code to
lz4_zfs.c.

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#12805
rincebrain added a commit to rincebrain/zfs that referenced this pull request Jan 7, 2022
@PrivatePuffin (Contributor)

@BrainSlayer As the ZSTD decompressor is also backwards compatible, we might want to look into doing something similar for ZSTD: upgrading only the decompressor?

Because the blockers are mainly related to the compressor :)

@rincebrain (Contributor, Author) commented Jul 25, 2022

I've got a PR to do that, if you really want, but it's not noticeably faster, IME...

For that matter, I have one that solves the problems with the compressor upgrade, but the newer versions aren't really faster or better before 1.5.1. After 1.5.0 they changed the constants to trade speed for compression ratio, because they felt they'd improved performance enough for that to be a net win - but that seems to have been aimed at much, much larger inputs than ZFS will ever use, because for us the performance (though not the compression ratio) is worse, even if you let them play in the SIMD sandbox.

I have a bunch of charts. They're sad.

@BrainSlayer (Contributor) commented Jul 26, 2022

@BrainSlayer As the ZSTD decompressor is also backwards compatible, we might want to look into doing something similar for ZSTD: upgrading only the decompressor?

Because the blockers are mainly related to the compressor :)

Regarding performance: have you checked my zfs tree recently for performance measurements? I mean, it includes the latest zstd code.

@PrivatePuffin (Contributor)

Thanks for the info; in that regard, it might indeed not be worth updating zstd at this point in time...

nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
nicman23 pushed a commit to nicman23/zfs that referenced this pull request Aug 22, 2022
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Oct 22, 2022
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Oct 22, 2022
snajpa pushed a commit to vpsfreecz/zfs that referenced this pull request Oct 23, 2022
bsdjhb pushed a commit to CTSRD-CHERI/zfs that referenced this pull request Jul 13, 2023
This is not associated with a specific upstream commit but apparently
a local diff applied as part of:

commit e92ffd9b626833ebdbf2742c8ffddc6cd94b963e
Merge: 3c3df3660072 17b2ae0
Author: Martin Matuska <[email protected]>
Date:   Sat Jan 22 23:05:15 2022 +0100

    zfs: merge openzfs/zfs@17b2ae0b2 (master) into main

    Notable upstream pull request merges:
      openzfs#12766 Fix error propagation from lzc_send_redacted
      openzfs#12805 Updated the lz4 decompressor
      openzfs#12851 FreeBSD: Provide correct file generation number
      openzfs#12857 Verify dRAID empty sectors
      openzfs#12874 FreeBSD: Update argument types for VOP_READDIR
      openzfs#12896 Reduce number of arc_prune threads
      openzfs#12934 FreeBSD: Fix zvol_*_open() locking
      openzfs#12947 lz4: Cherrypick fix for CVE-2021-3520
      openzfs#12961 FreeBSD: Fix leaked strings in libspl mnttab
      openzfs#12964 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD
      openzfs#12981 Introduce a flag to skip comparing the local mac when raw sending
      openzfs#12985 Avoid memory allocations in the ARC eviction thread

    Obtained from:  OpenZFS
    OpenZFS commit: 17b2ae0
Labels
Status: Accepted Ready to integrate (reviewed, tested)