zio_compress: introduce max_size threshold #9416
Conversation
Codecov Report
@@ Coverage Diff @@
## master #9416 +/- ##
==========================================
+ Coverage 79.15% 79.15% +<.01%
==========================================
Files 416 416
Lines 123655 123658 +3
==========================================
+ Hits 97876 97880 +4
+ Misses 25779 25778 -1
Continue to review full report at Codecov.
See inline comments.
Also, it should be verified that changing the module parameter won't modify the result for existing data when called from https://github.com/zfsonlinux/zfs/blob/13a4027a7cd68069cb252e94c18ba1e5eb5af1cd/module/zfs/arc.c#L1625, as that would break encryption MAC validation.
Possibly this needs to be a feature flag tracking the TXG it's activated in, so it can behave differently depending on the age of the data.
Force-pushed from 10488d9 to 39d1434.
@gmelikov Thanks for pointing me here. Good work on this, LGTM; this PR deserves some more love! 👍
I would prefer to eliminate the module parameter, and instead "use sector size of this pool as threshold". I.e. always make the threshold 1 sector (spa_max_ashift).
@ahrens hmm, I agree with you, we don't need a larger threshold here with lz4 and the future zstd. Will get back to this PR in a week.
Force-pushed from c147e6d to f7523c3.
Codecov Report
@@ Coverage Diff @@
## master #9416 +/- ##
==========================================
- Coverage 79.38% 79.38% -0.01%
==========================================
Files 388 388
Lines 123392 123396 +4
==========================================
Hits 97953 97953
- Misses 25439 25443 +4
Continue to review full report at Codecov.
Updated.
@gmelikov without looking at it too closely, I think I might know what's going on. I think there are 2 things at play which combine to make the ARC's "recompress before checksumming" work OK, but it is a pretty brittle dependency.

For some quick background: we need to store data in the L2ARC exactly the same as it is on disk in order for the decryption to work. This is why we do the re-compression here in the ARC layer. When the data gets recompressed, the ARC uses the compression algorithm that it stored in the ARC header. This info comes from the bp when the block first enters the ARC. This patch doesn't change the way compression is done (AFAICT), it just changes the threshold at which we decide the compression is "worth it". However, no matter what that threshold is, if ZFS decides not to compress the block it sets the compression algorithm to ZIO_COMPRESS_OFF, so the recompression is a no-op and the on-disk bytes still match.

That being said, this discussion is making me wonder if QAT compression support breaks L2ARC with encryption + no compression. I suspect it does, but that is a very niche bug with very minor repercussions in that case (the data simply fails to read from the L2ARC and ZFS checks the main pool instead). Probably a problem for another time. Anyway, hope that helps.
@tcaputi Thank you for the insight, you're right: if it was compressed, we can use a zero threshold here too, so I was wrong here (how glad I am)! So, this patch is ready for review; I've rebased it.
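To illustrate the dependency described above, here is a simplified, self-contained sketch (not the actual ZFS code; the enum and helper names are stand-ins for ZIO_COMPRESS_* and zio_compress_data()):

#include <stdint.h>
#include <string.h>

enum compress_alg { COMPRESS_OFF, COMPRESS_LZ4 };

/* Stand-in for a real compressor; deterministic for a given input. */
static size_t
fake_compress(const uint8_t *src, uint8_t *dst, size_t len)
{
	memcpy(dst, src, len);	/* pretend this is LZ4 */
	return (len);
}

/*
 * Reproduce the on-disk bytes of a block from its uncompressed copy,
 * using the algorithm recorded in the ARC header (taken from the bp).
 * If the writer decided compression wasn't "worth it", the header
 * says OFF and we copy the plain bytes; either way the output matches
 * the disk, so the encryption MAC computed over it still verifies.
 */
static size_t
recompress_like_disk(enum compress_alg hdr_alg, const uint8_t *src,
    uint8_t *dst, size_t len)
{
	if (hdr_alg == COMPRESS_OFF) {
		memcpy(dst, src, len);
		return (len);
	}
	return (fake_compress(src, dst, len));
}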
module/zfs/arc.c (outdated diff)
@@ -1755,7 +1756,7 @@ arc_hdr_authenticate(arc_buf_hdr_t *hdr, spa_t *spa, uint64_t dsobj)
	abd_take_ownership_of_buf(abd, B_TRUE);

	csize = zio_compress_data(HDR_GET_COMPRESS(hdr),
	    hdr->b_l1hdr.b_pabd, tmpbuf, lsize);
Nope, steps to get the problem here:
- use code from this PR
- get an 89+% compressed encrypted block
- have L2ARC attached to the pool
- load the pool with pre-PR code

zio_compress_data will not compress this block again here due to the old 12.5% threshold inside it.

Heck, since it is a problem, I'll add a read-only feature flag just to make it safe for this case. In the future zio_compress_data will have a threshold argument so we won't get into this problem again.
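A possible shape for that future interface (hypothetical: zio_compress_data and the arguments shown in the diff above are real, the extra threshold parameter and its name are a guess):

/*
 * Hypothetical extension of zio_compress_data() with an explicit
 * threshold, so callers like the ARC recompression path can pass 0
 * and always get the on-disk compressed bytes back.
 */
size_t zio_compress_data(enum zio_compress c, abd_t *src, void *dst,
    size_t s_len, size_t min_savings);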
I see, if you move the pool to a system with older software, it doesn't correctly write to the L2ARC, because even though the block was compressed, the old L2ARC-writing code will not compress it if it doesn't save at least 1/8th of the space. This only applies if the old system has disabled compressed ARC, right?
Introducing a readonly-compatible feature flag is a bit unfortunate. It would mean maintaining the old 1/8th code forever. I wonder how common it is to disable compressed ARC, nowadays? It might be reasonable to say that if you disable compressed ARC, and use L2ARC, and bring a newer pool back to old bits, then some of the data won't be able to "stick" in the L2ARC. I mean, we previously considered (but didn't implement) saying that disabling compressed ARC makes L2ARC not work with compressed blocks at all (i.e. removing the recompress-on-L2ARC-write code entirely).
@ahrens yes, you've got the point. The funny thing is that I can't reproduce the problem in such a theoretical scenario: L2ARC works well under manual tests with data from the new code, but with the old module and disabled ARC compression. There is a "best effort" thing near the arc_hdr_authenticate function, maybe that's the reason?

I've tried to add a feature flag, and the code began to transform from beautiful and easy to read into an ugly if-if monster. I don't want to do this :(

If it's already a problematic and rare case (compressed L2ARC with encryption on old code, with a pool and data from new code), and we are ready to go with this, then I propose to merge it as is without a feature flag; the PR is ready for it. I can add this comment somewhere (I'd be glad if someone gives me an exact place for it).
(As an aside, the comments here talk about a >1MB recordsize a few times; I thought 1MB was the upper limit?)
@adamdmoss by default it is, but technically 16MB is the upper limit now; you may just change this module parameter and voila: https://github.com/openzfs/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfs_max_recordsize It's good if you have a really-large-block workload with rare reads.
@gmelikov the Fedora build failures do look like they're introduced by this PR.
Force-pushed from 9b3d7f7 to 5737823.
@behlendorf not sure what was up with the Fedora runner; after rebase I've got 2 green runs. But I had flaky tests (they've failed only once):
Now default compression is lz4, which can stop the compression process by itself on incompressible data. If there are additional size checks, we will only make our compressratio worse.

New usable compression thresholds are:
- less than BPE_PAYLOAD_SIZE (embedded_data feature);
- at least one saved sector.

The old 12.5% threshold is left to minimize the effect on existing user expectations of CPU utilization.

If data wasn't compressed, it will be saved as ZIO_COMPRESS_OFF, so if we really need to recompress data without ashift info and check anything, we can just compress it with a zero threshold. So, we don't need a new feature flag here!

Signed-off-by: George Melikov <[email protected]>
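A self-contained sketch of the decision rule this commit message describes (constant names follow the message; the helper, its arguments, and the embed check are illustrative rather than the PR's actual code):

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative version of the new "is compression worth it?" rule.
 * bpe_payload_size stands in for BPE_PAYLOAD_SIZE (embedded_data),
 * max_ashift for the pool's spa_max_ashift.
 */
static bool
compression_worth_it(uint64_t lsize, uint64_t csize, int max_ashift,
    bool can_embed, uint64_t bpe_payload_size)
{
	/* Small enough to embed directly in the block pointer? */
	if (can_embed && csize < bpe_payload_size)
		return (true);

	/* Otherwise require at least one whole sector of savings. */
	uint64_t sector = 1ULL << max_ashift;
	uint64_t allocated = (csize + sector - 1) & ~(sector - 1);

	return (allocated + sector <= lsize);
}

With ashift=12 and a 16M record, a block now only has to save a single 4K sector instead of the 2M the old 12.5% rule demanded.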
ZLE compressor needs additional bytes to process its d_len argument efficiently. Don't use BPE_PAYLOAD_SIZE as d_len with it before we rework the ZLE compressor somehow.

Signed-off-by: George Melikov <[email protected]>
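In sketch form, the workaround amounts to something like the following (hypothetical: it assumes ZFS's enum zio_compress, and the PR's exact condition and names may differ):

/*
 * Hypothetical sketch of the workaround: pick the output bound
 * passed to the compressor; ZLE gets the full buffer because it
 * needs the extra room to process d_len efficiently, everyone else
 * may be given the tight embedded-data bound.
 */
static size_t
pick_d_len(enum zio_compress c, size_t s_len, size_t bpe_payload)
{
	if (c == ZIO_COMPRESS_ZLE)
		return (s_len);
	return (s_len < bpe_payload ? s_len : bpe_payload);
}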
On compression we could be more explicit here for cases where we cannot recompress the data.

Co-authored-by: Alexander Motin <[email protected]>
Signed-off-by: George Melikov <[email protected]>
Force-pushed from 5737823 to c12ed0d.
Thanks for updating this. Looks good, the CI failures were unrelated.
ZLE compressor needs additional bytes to process its d_len argument efficiently. Don't use BPE_PAYLOAD_SIZE as d_len with it before we rework the ZLE compressor somehow.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: George Melikov <[email protected]>
Closes #9416

On compression we could be more explicit here for cases where we cannot recompress the data.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Co-authored-by: Alexander Motin <[email protected]>
Signed-off-by: George Melikov <[email protected]>
Closes #9416
Created a brand new PR because of some problems with rebasing the old one (#9311), plus an updated description.
Motivation and Context
An 87.5% minimum compressratio is way too large: on 1TB with a 16M recordsize, such an artificial limitation may waste up to 125GB (12.5% of 1TB) in the worst case, though you must be very, very unlucky.
So I propose to check only against the pool's max ashift: on large recordsizes a block then only has to save one sector (with a 16M record and ashift=12, about 0.02% instead of 12.5%).
And I think that if you enable a slow compression algorithm on a dataset nowadays, you will prefer a better compressratio over avoiding decompression.
So, pros:
Cons:
Description
Now default compression is lz4, which can stop the compression process by itself on incompressible data. So we can check only for the common sector size.
How Has This Been Tested?
Generate a 12% compressible file with fio (just under the 12.5% savings the old 87.5% threshold requires):
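The exact fio invocation isn't preserved here; fio's buffer_compress_percentage=12 option produces this kind of data, and a stand-in generator in C might look like this (illustrative, not the author's actual test):

#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
	enum { BLOCK = 4096, NBLOCKS = 1024 };
	unsigned char buf[BLOCK];
	FILE *f = fopen("testfile", "wb");

	if (f == NULL)
		return (1);
	for (int b = 0; b < NBLOCKS; b++) {
		/* ~12% of each block is zeros, the rest is random. */
		int zeros = BLOCK * 12 / 100;
		for (int i = 0; i < BLOCK; i++)
			buf[i] = (i < zeros) ? 0 : (rand() & 0xff);
		fwrite(buf, 1, BLOCK, f);
	}
	fclose(f);
	return (0);
}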
So there may be up to ~12.5% compression gain for hardly compressible files.
Tested manually in a VM; I haven't run an actual system on this code yet.
Types of changes
Checklist:
Signed-off-by.