Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the ZFS checksum error histograms with larger record sizes #15049

Merged
merged 1 commit into from
Jul 14, 2023

Conversation

asomers
Copy link
Contributor

@asomers asomers commented Jul 10, 2023

My analysis in PR #14716 was incorrect. Each histogram bucket contains the number of incorrect bits, by position in a 64-bit word, over the entire record. 8-bit buckets can overflow for record sizes above 2k. To forestall that, saturate each bucket at 255. That should still get the point across: either all bits are equally wrong, or just a couple are.

Sponsored-by: Axcient
Signed-off-by: Alan Somers [email protected]

Motivation and Context

After PR #14716, the bad_cleared_histogram and bad_set_histogram fields of an ereport.fs.zfs.checksum event could contain incorrect values for record sizes of 2kB and above, due to an integer overflow.

Description

Rather than overflow, saturate the histogram buckets at 255. That will still be sufficient for diagnostic purposes.

How Has This Been Tested?

Tested on FreeBSD 14 using the zfsd_degrade_001_pos test case from FreeBSD's zfsd test suite.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

My analysis in PR openzfs#14716 was incorrect.  Each histogram bucket contains
the number of incorrect bits, by position in a 64-bit word, over the
entire record.  8-bit buckets can overflow for record sizes above 2k.
To forestall that, saturate each bucket at 255.  That should still get
the point across: either all bits are equally wrong, or just a couple
are.

Sponsored-by:	Axcient
Signed-off-by:	Alan Somers <[email protected]>
@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Jul 13, 2023
@behlendorf behlendorf merged commit 67c5e1b into openzfs:master Jul 14, 2023
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request Jul 21, 2023
My analysis in PR openzfs#14716 was incorrect.  Each histogram bucket contains
the number of incorrect bits, by position in a 64-bit word, over the
entire record.  8-bit buckets can overflow for record sizes above 2k.
To forestall that, saturate each bucket at 255.  That should still get
the point across: either all bits are equally wrong, or just a couple
are.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Sponsored-by: Axcient
Closes openzfs#15049
behlendorf pushed a commit that referenced this pull request Jul 21, 2023
My analysis in PR #14716 was incorrect.  Each histogram bucket contains
the number of incorrect bits, by position in a 64-bit word, over the
entire record.  8-bit buckets can overflow for record sizes above 2k.
To forestall that, saturate each bucket at 255.  That should still get
the point across: either all bits are equally wrong, or just a couple
are.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Sponsored-by: Axcient
Closes #15049
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Dec 12, 2023
My analysis in PR openzfs#14716 was incorrect.  Each histogram bucket contains
the number of incorrect bits, by position in a 64-bit word, over the
entire record.  8-bit buckets can overflow for record sizes above 2k.
To forestall that, saturate each bucket at 255.  That should still get
the point across: either all bits are equally wrong, or just a couple
are.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Sponsored-by: Axcient
Closes openzfs#15049
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants