Updated the lz4 decompressor to 1.9.3 #12805
Conversation
Well, there goes my "bravery in git" award for the week: 🏆 - thank you.
While I get that in theory this shouldn't be a concern since the compressor doesn't change, getting even one bit wrong can have devastating consequences, as we tend to treat data from a local FS as trusted in every sense; and since we can't really track changes due to the git history not being there, it looks like a nice place for Murphy to camp out and mess with us.
That's "easy", as it turns out there are nice premade test datasets for compressors to be run against, and I have a wide variety of random nonsense to throw at it. I'll put something together, but I'll be somewhat impressed if it goes awry. sparc64, ppc64el, aarch64, i386, armhf, and x86_64 and I will have some fun stress times. But given a test barrage like that not burping at all as a precondition for anyone considering merging, I'd like to hear people's thoughts on how I reshuffled things, anything they'd like to see differently, any other edge cases they'd like to see prodded, etc? |
Well, as a first data point, I made a pool containing a dataset for each of {4k, 8k, ..., 16M} with recordsize set as you might expect from the name, and then 9 datasets under each of those, each containing some data to test: several were disappointingly small compression test corpora (artificl, calgary, cantrbry, large, misc, silesia, and largefiles containing enwik8 and enwik9, zipped and unzipped), then I threw in a couple of big git trees (linux-firmware and illumos) for variety. The end result is just past 40 GB; I can upload it somewhere if people really want, but it's 40 GB, so I'd rather wait for someone to ask for a copy first. Anyway, I took that, generated on a 2.0.6 box, and computed the sha256sums of all the files on it on said box. Then I ran the new decompressor tree on a Linux/x86_64 and a Linux/sparc64 box, did the same thing, and compared them. No differences. So that's a nice sign.

(Footnote: as an experiment, I made a sibling branch to this one which also swaps out the compressor with the one from 1.9.3 - that actually involved more work than the decompressor, since there wasn't as much free glue for the old API - then I benchmarked that. Compressed size on the aforementioned test dataset is within very tiny error bars - so tiny I double-checked that it was definitely running the new code. In terms of compression speed, there's a very minor boost sometimes. If/when people figure out a good solution for versioned compressors in OpenZFS, I'll presumably open a PR for this then... or maybe an even newer version, by then.)
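For anyone wanting to reproduce that kind of check, here is a rough sketch of the verification loop; the pool name, corpus path, and manifest location are placeholders rather than anything from this PR, and the idea is to run the same checksum pass on each box and diff the resulting manifests.

```sh
#!/bin/sh
# Sketch: build per-recordsize datasets, copy a test corpus into each, and
# checksum everything so manifests can be compared across architectures.
# "tank" and /root/corpus are placeholder names. Recordsizes above 1M may
# require the large_blocks feature and a raised zfs_max_recordsize.
set -e

for rs in 4k 8k 16k 32k 64k 128k 256k 512k 1M 2M 4M 8M 16M; do
    zfs create -o recordsize=$rs -o compression=lz4 tank/rs$rs
    cp -a /root/corpus/. "/tank/rs$rs/"
done

# Generate a checksum manifest; repeat on every test machine and compare.
find /tank -type f -exec sha256sum {} + | sort -k2 > "/root/manifest-$(hostname).txt"
# diff /root/manifest-boxA.txt /root/manifest-boxB.txt
```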
Thank you @rincebrain. Any chance that compressor branch is up somewhere?
I've not put it up anywhere, no - it's currently a massive hackjob with functions and definitions duplicated and strewn around, and I'd want to do something like the pass I did over this, where I go and look for things that are unnecessarily changed, in a confusing order, or missing their comments before sharing. (If it had turned out to be a nontrivial improvement, I would have eventually argued in favor of something like lz4old and lz4 compression options, but it didn't.)
I think it would be much better to land the new LZ4 update as a separate lz4_{version} to reduce potential issues with old data. For example, we have customers with about 15 PB of old data, and it would be dangerous to update a known-working lz4 compression.
I could do that, but the point is that the decompressor should be perfectly happy decompressing all the data, old and new, as it stands; it has so far shown no signs of not being so, and has a nice speedup. I'd rather not maintain an extra, slower codepath when nobody's demonstrated a flaw in the decompressor's claim that it will handle all old data. Now, shoving the new compressor in is a whole different can of worms, but that's not what this PR is for; I just tried the quick experiment out of curiosity.
Agree on all points.
@ikozhukhov: my take on that problem is that there isn't any danger to the data your customer has unless it is already damaged (not compressed in a valid LZ4 format). The decompressor routine should not be able to produce anything other than what was originally compressed; it will just do so faster, reducing the access time to those 15 PB :)
Strange world... I was testing lz4 last week. A not-so-useful but fun test is dd on /dev/zero. It gives absolutely bizarre results on read back.
Ironically, the newer decompressor is probably less dangerous than the older one, since a bunch of weird lz4 corner cases were fixed in the last X years where the decompressor would barf on (extremely unlikely) valid input, or the compressor would fall over (or worse, output bad data) on extremely unlikely input. The latter cases, of course, would not be fixed by this decompressor-only change. :) (And I don't mean to imply that these cases have ever been hit by anyone in the wild, just wanted to emphasize that 2021's liblz4 code is incredibly more battle-hardened than 2014's...)
Not sure if you're referring to zfs's lz4 or the lz4 tool itself, but last time I looked, zfs will still do an all-zero check before any other compressor gets to see the data, and treat zeros as a special ultra-fast case. So /dev/zero isn't a good exercise for any of zfs's compressors - it'll actually bypass them.
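A quick way to see that special-casing is sketched below; the dataset and path names are placeholders. An all-zero file ends up consuming essentially nothing because the zero check fires before lz4 ever runs, while random data actually exercises the compressor.

```sh
# Sketch: compare an all-zero file against an incompressible one on an
# lz4 dataset. "tank/lz4test" is a placeholder dataset name.
zfs create -o compression=lz4 tank/lz4test

dd if=/dev/zero    of=/tank/lz4test/zeros  bs=1M count=1024
dd if=/dev/urandom of=/tank/lz4test/random bs=1M count=1024
sync

du -h /tank/lz4test/zeros /tank/lz4test/random
zfs get compressratio,logicalused,used tank/lz4test
```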
It was a bit of both. It's a fun test because it doesn't touch lz4 at all, but rather highlights what is possible and what may be broken. Hence the fun part: read speeds with smaller block sizes (< 131072) are quite low (32k is better... still 1.5 GB/s vs 5.3 GB/s, though). Bigger recordsize helps, of course. Still hitting ~2 GB/s on write.
@rincebrain one last rebase and we can get this merged.
As an experiment, I stole the lz4 decompressor from upstream lz4 (1.9.3), and landed it. Feedback suggested that keeping the vendor lz4 code isolated and unlinted was probably reasonable, so I lobbed it into its own file. It also seemed reasonable to put the mostly-untouched* code into lz4.c proper, and relegate the integrated and ZFS-specific code to lz4_zfs.c.

Signed-off-by: Rich Ercolani <[email protected]>
Ouch. That did not merge into the 2.1.2 branch I've got going well at all - lz4.c mangled beyond quick-remedy status.
As an experiment, I stole the lz4 decompressor from upstream lz4 (1.9.3), and landed it. Feedback suggested that keeping the vendor lz4 code isolated and unlinted was probably reasonable, so I lobbed it into its own file. It also seemed reasonable to put the mostly-untouched* code into lz4.c proper, and relegate the integrated and ZFS-specific code to lz4_zfs.c.

Reviewed-by: Matthew Ahrens <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rich Ercolani <[email protected]>
Closes openzfs#12805
@BrainSlayer As the ZSTD decompressor is also backwards compatible, we might want to look into doing something for ZSTD as well - upgrading the decompressor only? Because all the blockers are mainly related to the compressor :)
I've got a PR to do that, if you really want, but it's not noticeably faster, IME... For that matter, I have one that solves the problems with the compressor upgrade, but the newer ones aren't really faster or better before 1.5.1. After 1.5.0 they changed the constants to make it slower in order to improve the compression ratio, because they felt they'd made the performance better enough that it's a win; but that seems to have been aimed at much, much larger things than ZFS will ever use, because the perf (though not the compression ratio) is worse, even if you let them play in the SIMD sandbox. I have a bunch of charts. They're sad.
Regarding performance: have you checked my zfs tree recently for performance measurements? I mean, it includes the latest zstd code.
Thanks for the info; in that regard it might indeed not be worth it to update zstd at this point in time...
This is not associated with a specific upstream commit but apparently a local diff applied as part of:

commit e92ffd9b626833ebdbf2742c8ffddc6cd94b963e
Merge: 3c3df3660072 17b2ae0
Author: Martin Matuska <[email protected]>
Date: Sat Jan 22 23:05:15 2022 +0100

zfs: merge openzfs/zfs@17b2ae0b2 (master) into main

Notable upstream pull request merges:
openzfs#12766 Fix error propagation from lzc_send_redacted
openzfs#12805 Updated the lz4 decompressor
openzfs#12851 FreeBSD: Provide correct file generation number
openzfs#12857 Verify dRAID empty sectors
openzfs#12874 FreeBSD: Update argument types for VOP_READDIR
openzfs#12896 Reduce number of arc_prune threads
openzfs#12934 FreeBSD: Fix zvol_*_open() locking
openzfs#12947 lz4: Cherrypick fix for CVE-2021-3520
openzfs#12961 FreeBSD: Fix leaked strings in libspl mnttab
openzfs#12964 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD
openzfs#12981 Introduce a flag to skip comparing the local mac when raw sending
openzfs#12985 Avoid memory allocations in the ARC eviction thread

Obtained from: OpenZFS
OpenZFS commit: 17b2ae0
(This is a draft solely because I want to hear people's comments on how I lobbed the upstream code in and whether there's any more testing they'd like to see - I'm confident it's pretty stable, from testing it somewhat on a bunch of platforms...)

Motivation and Context
Well, the lz4 code is so old, it predates their switch to git in 2014, so it seemed likely that there were performance gains to be found.
In the quick and dirty benchmark I stole (and enhanced) from the zstd PR, the old LZ4 decompression code got between 1.5 and 2.2 GB/s decompressing the 3 GB chunk of all Wikipedia pages I took (the first 3 GB, uncompressed, of enwiki-20170820-pages-articles.xml.bz2, chosen because it was an easy dump to find and I was worried the 1 GB dump used in the zstd tests might not be enough to measure a difference), depending on recordsize, and the new LZ4 code gets between 1.6 and 2.9 GB/s (all on my Ryzen 5900X in a Debian bullseye VM).
(I've not got a great way of inlining a spreadsheet here, but here's a list of results for compression=off and =lz4 with the old and new decompressor, at recordsizes from 16k to 4M, according to dd, averaged over 3 runs each. Not the most scientific analysis I've ever done, but if people want more results of a specific kind, that can be done.)
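(The spreadsheet itself isn't reproduced in this thread; the sketch below only illustrates the general shape of that kind of dd read-back timing, with placeholder pool and file names, and is not the actual script from the zstd PR.)

```sh
# Sketch: time reading back a large pre-written file at several recordsizes,
# with compression=off and compression=lz4. "tank" and /root/enwiki-3g are
# placeholders; repeat each run ~3 times and average, as described above.
for comp in off lz4; do
    for rs in 16k 128k 1M 4M; do
        zfs create -o recordsize=$rs -o compression=$comp tank/bench
        cp /root/enwiki-3g /tank/bench/data
        zpool export tank && zpool import tank   # crude way to drop cached data
        dd if=/tank/bench/data of=/dev/null bs=1M
        zfs destroy tank/bench
    done
done
```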
Description
How Has This Been Tested?
I tried building and running it through at least the sanity test suite and also interactively decompressing ~20 GB of data from an unmodified install on:
(The FreeBSD one hung and crashed at zfs_program_json every time, but it also did that when I tried it with unmodified git ded851b, so I don't think that was this?)
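For reference, here is a sketch of what running a tree through the sanity suite plus a decompression pass generally looks like on an OpenZFS checkout; the exact zfs-tests.sh flags and runfile names vary between versions, so treat this as illustrative rather than the precise commands used for this PR.

```sh
# Sketch: build the tree, load the modules, run the sanity runfile, then
# read back previously written lz4 data and verify it against known-good
# checksums. The checksum manifest path is a placeholder.
sh autogen.sh && ./configure --enable-debug && make -j"$(nproc)"
sudo ./scripts/zfs.sh            # load the freshly built modules
sudo ./scripts/zfs-tests.sh -r sanity

sha256sum -c /root/known-good-checksums.txt
```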
Types of changes
Checklist:
Signed-off-by