Fix the ZFS checksum error histograms with larger record sizes #15049

asomers · 2023-07-10T21:20:58Z

My analysis in PR #14716 was incorrect. Each histogram bucket contains the number of incorrect bits, by position in a 64-bit word, over the entire record. 8-bit buckets can overflow for record sizes above 2k. To forestall that, saturate each bucket at 255. That should still get the point across: either all bits are equally wrong, or just a couple are.

Sponsored-by: Axcient
Signed-off-by: Alan Somers [email protected]

Motivation and Context

After PR #14716, the bad_cleared_histogram and bad_set_histogram fields of an ereport.fs.zfs.checksum event could contain incorrect values for record sizes of 2kB and above, due to an integer overflow.
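
The arithmetic behind that threshold can be sketched as follows. This is an illustration only, not code from the PR: `words_per_record` and `wrapping_bucket_worst_case` are hypothetical names, and the sketch assumes the worst case in which the same bit position is wrong in every 64-bit word of the record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 64-bit words in a record of record_bytes bytes. */
static size_t
words_per_record(size_t record_bytes)
{
	/* Assumption: records are a whole number of 64-bit words. */
	assert(record_bytes % sizeof (uint64_t) == 0);
	return (record_bytes / sizeof (uint64_t));
}

/*
 * Worst case for a single bucket: the same bit position is wrong in
 * every word, so the bucket is incremented once per word. With a plain
 * uint8_t counter the result wraps modulo 256.
 */
static uint8_t
wrapping_bucket_worst_case(size_t record_bytes)
{
	uint8_t bucket = 0;
	size_t nwords = words_per_record(record_bytes);

	for (size_t i = 0; i < nwords; i++)
		bucket++;	/* 255 + 1 wraps back to 0 */
	return (bucket);
}
```

Under these assumptions a 2 kB record holds 256 words, one more than an 8-bit counter can represent, so a bit that is wrong in every word would be reported with a count of 0.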

Description

Rather than overflow, saturate the histogram buckets at 255. That will still be sufficient for diagnostic purposes.
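
A minimal sketch of what saturating a bucket means, assuming 64 buckets of `uint8_t`, one per bit position. The helper names (`saturating_inc_u8`, `hist_add_word`, `hist_add_record`) and the bucket ordering (bucket index equals bit number, counted from the least significant bit) are assumptions made for illustration; this is not the code changed by the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	HIST_BUCKETS	64	/* one bucket per bit position in a word */

/* Increment an 8-bit counter, stopping at UINT8_MAX instead of wrapping. */
static void
saturating_inc_u8(uint8_t *counter)
{
	if (*counter < UINT8_MAX)
		(*counter)++;
}

/*
 * Hypothetical helper: for every bit set in diff (the bits that differ
 * between the good and bad copies of one word), bump that bit's bucket.
 */
static void
hist_add_word(uint8_t hist[HIST_BUCKETS], uint64_t diff)
{
	for (unsigned bit = 0; bit < HIST_BUCKETS; bit++) {
		if (diff & (UINT64_C(1) << bit))
			saturating_inc_u8(&hist[bit]);
	}
}

/* Accumulate the histogram over all nwords words of a record. */
static void
hist_add_record(uint8_t hist[HIST_BUCKETS], const uint64_t *diffs,
    size_t nwords)
{
	assert(hist != NULL && diffs != NULL);
	for (size_t i = 0; i < nwords; i++)
		hist_add_word(hist, diffs[i]);
}
```

With this approach a bit that is wrong in 300 words reads as 255 rather than wrapping to 44, so a histogram with one pegged bucket and the rest near zero remains distinguishable from one in which every bucket is pegged.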

How Has This Been Tested?

Tested on FreeBSD 14 using the zfsd_degrade_001_pos test case from FreeBSD's zfsd test suite.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

My analysis in PR openzfs#14716 was incorrect. Each histogram bucket contains the number of incorrect bits, by position in a 64-bit word, over the entire record. 8-bit buckets can overflow for record sizes above 2k. To forestall that, saturate each bucket at 255. That should still get the point across: either all bits are equally wrong, or just a couple are.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alan Somers <[email protected]>
Sponsored-by: Axcient
Closes openzfs#15049

behlendorf approved these changes Jul 13, 2023

behlendorf added the "Status: Accepted" label (ready to integrate: reviewed, tested) Jul 13, 2023

behlendorf merged commit 67c5e1b into openzfs:master Jul 14, 2023
