zfs fails to detect ZFS-8000-8A corruption: Reading file causes ZFS-8000-8A, scrub claims OK, repeat #16520
Comments
Personally I'd go one step further. I'd not trust the Ubuntu packages at all.
Or use this PPA https://launchpad.net/~patrickdk/+archive/ubuntu/zfs/+packages instead (if you trust the maintainer).
I'm not so sure Ubuntu's 2.2.0 corrupts pools. If you're referring to the bug I think you are, they backported a fix but didn't bump the version number. Also, the 22.04 HWE kernel is now 6.8, with 2.2.2; the change from 6.5 to 6.8 is recent. They still ship mismatched kernel and userland versions in HWE, however, and a number of bug reports have been ignored. For myself, I run 2.2.5 on Ubuntu (I haven't rebooted since 2.2.6 came out). Discussions on this topic have indicated that they don't think OpenZFS does enough testing of new releases to meet their standards, so they stick with versions that have gone through a full Ubuntu release beta cycle. My own evaluation is different, both for ZFS and for the kernel. (I'm running 6.6.44 on my file servers.)
I would be curious to see pointers to these discussions, as well as the specific bugs that you're advocating upgrading to 2.2.2 to resolve.
I believe some people suggested 2.2.2 because of three serious problems with 2.2.0: the BRT bugs, which led to 2.2.1 turning the feature off by default; CVE-2023-49298, a potentially serious corruption problem, fixed in 2.2.2; and #15526, which Ubuntu's change log treats as separate from the CVE (I'm not sure whether that's right). I note, however, that the fix for CVE-2023-49298 was cherry-picked into Ubuntu's ZFS 2.2.0, along with a fix for #15526. I believe disabling BRT was as well, though I can't verify it. The first two were also cherry-picked into their ZFS 2.1.5. Ubuntu prefers to freeze ZFS and cherry-pick only CVEs and other very serious fixes. I disagree, which is why I'm using 2.2.5. (We haven't rebooted since 2.2.6 was released.)
The CVE is also wrong, for most useful definitions, as you might expect for a CVE generated by some random person opening it. As the end of #15526 says, that wasn't fully resolved until later, with (among others) #16019. Separate from any bugs found in BRT itself, block cloning exposed a bunch of existing, difficult-to-hit races: a metadata-only copy is inherently a faster operation, so otherwise impossible or impractical-to-hit races that had existed for a long time started turning up. There were also fixes in BRT itself, like #15842, though once the killswitch PR is cherry-picked that becomes less urgent unless people override it. In particular, though, none of the BRT flaws I'm aware of should be producing checksum errors, since they were all about logical data handling, so none of them should be germane to this bug.
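For anyone who wants to check whether block cloning could even be a factor on their own system, a quick sanity check (a sketch; it assumes an OpenZFS 2.2.x build that carries the zfs_bclone_enabled killswitch and the block-cloning pool properties, and <pool> is a placeholder for your pool name):

```sh
# Is block cloning (BRT) enabled at all? 0 means the killswitch has it disabled.
cat /sys/module/zfs/parameters/zfs_bclone_enabled

# Has this pool ever actually cloned any blocks?
zpool get feature@block_cloning,bcloneused,bclonesaved <pool>
```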
Sure. I doubt that most of the discussion here has anything to do with what's causing the user's problem.
System information
Describe the problem you're observing
zpool status -xv
reports https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A, all error counters zero, and a permanent error on that one file.
date --rfc-3339=second && zpool scrub -w z2023 && date --rfc-3339=second && zpool scrub -w z2023 && date --rfc-3339=second
claims everything is fixed.
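For reference, a sketch of the check/scrub/re-check cycle described above (the pool name z2023 comes from this report; the log file name is just an illustration):

```sh
# Capture pool state, run two scrubs with timestamps, then re-check.
{
  date --rfc-3339=second
  zpool status -xv z2023      # should show the ZFS-8000-8A permanent error
  zpool scrub -w z2023        # -w waits for the scrub to finish
  date --rfc-3339=second
  zpool scrub -w z2023
  date --rfc-3339=second
  zpool status -xv z2023      # does it still report the error, or claim all is well?
} 2>&1 | tee zfs-8000-8A-repro.log
```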
BUG: scrub and zfs should not claim everything is fine when it isn’t
BUG: there is no way to have zfs admit that there is corruption
QUESTION: is zfs-2.1.5 OK paired with zfs-kmod-2.2.0? The semantic versions differ.
Fresh installs show the same versions, and another host shows the same mismatch.
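For what it's worth, a quick way to compare the userland and kernel-module versions directly (the dpkg line assumes an Ubuntu/Debian install):

```sh
# What ZFS itself reports: userland (zfs-x.y.z) and kernel module (zfs-kmod-x.y.z).
zfs version
cat /sys/module/zfs/version

# What the distro packages claim.
dpkg -l | grep -i zfs
```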
Describe how to reproduce the problem
The software that wrote this file:
- first wrote the file, verifying no errors
- then read the file, verifying no errors, and validated the checksum
Meaning: immediately after writing, the file could be read.
The first bookmark-event I/O error was 6 days later.
Nothing in particular happened to the host or the disk during that time, no reboots or anything similar.
This particular host has operated this pool for two years.
ZFS came up with this error all by itself: it can't read what it writes to disk, and
it can't figure out ahead of time that the data is unreadable.
Of course, hundreds of other files written around that same time are fine.
syslog:
There has been no zed logging since the file was written on September 2, and no syslog entries from when the file was written.
ZFS came up with this issue all by itself; there was no power outage or tripping over cables.
Every time a scrub completes and the I/O error occurs, the bookmark log statement is printed.
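A sketch of where the corresponding events should show up (unit and log names assume a systemd host with zed installed; substitute the actual write date for the placeholder):

```sh
# Error events ZFS itself recorded, including the ereports carrying the bookmark.
zpool events -v z2023

# What zed logged, if anything, since the file was written (September 2 above).
journalctl -u zfs-zed --since "YYYY-MM-DD"
grep -i zed /var/log/syslog
```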
Include any warning/errors/backtraces from the system logs