panic loop: VERIFY0(0 == dmu_object_free(spa->spa_meta_objset, spa_err_obj, tx)) failed (0 == 2) #15277
Labels
Type: Defect
Incorrect behavior (e.g. crash, hang)
Comments
@gamanakis please take a look at this issue.
Thank you for catching this, and for the useful analysis. Let me see how we can fix it.
@behlendorf we should include this in zfs-2.2-release.
Yes, I agree, this should go into 2.2. Recovering from this can be quite cumbersome (as seen in #14643). I ended up installing a modified ZFS kernel to recover the one instance we encountered.
behlendorf pushed a commit that referenced this issue on Sep 19, 2023:

spa_upgrade_errlog() does not update the MOS directory when the head_errlog feature is enabled. In this case if spa_errlog_sync() is not called, the MOS dir references the old errlog_last and errlog_sync objects. Thus when doing a scrub a panic will occur:

Call Trace:
 dump_stack+0x6d/0x8b
 panic+0x101/0x2e3
 spl_panic+0xcf/0x102 [spl]
 delete_errlog+0x124/0x130 [zfs]
 spa_errlog_sync+0x256/0x260 [zfs]
 spa_sync_iterate_to_convergence+0xe5/0x250 [zfs]
 spa_sync+0x2f7/0x670 [zfs]
 txg_sync_thread+0x22d/0x2d0 [zfs]
 thread_generic_wrapper+0x83/0xa0 [spl]
 kthread+0x104/0x140
 ret_from_fork+0x1f/0x40

Fix this by updating the related MOS directory objects in spa_upgrade_errlog().

Reviewed-by: Mark Maybee <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: George Amanakis <[email protected]>
Closes #15279
Closes #15277
gamanakis added a commit to gamanakis/zfs that referenced this issue on Sep 19, 2023 (same commit message as above).
behlendorf pushed a commit that referenced this issue on Sep 19, 2023 (same commit message as above).
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue on Dec 12, 2023 (same commit message as above).
System information
Describe the problem you're observing
After upgrading a system which did not have the head_errlog feature (introduced in 0409d33) and with an existing scrub IO error recorded, to a version which did have this feature, the system entered a panic loop during a scrub.
When a pool transitions to a state where the head_errlog feature is enabled, any persistent errlog state is converted to a new format. This involves deleting the old persistent errlog objects (errlog_last and errlog_scrub) and replacing them with new ones. While the internal state is updated (spa_errlog_last and spa_errlog_scrub), the persistent pointers to these objects in the MOS directory object are not updated. These are currently only updated in spa_errlog_sync() when it is called during sync phase (and there is "something to do").
Describe how to reproduce the problem
The following sequence will reproduce the issue:
Start with a pool which does not have the head_errlog feature (this was introduced in release 6.0.15.0).
A data error occurs in the pool at some point and is discovered via a scrub. This results in the errlog_last entry in the pool's MOS directory having a non-zero value (the object ID of a zap object that records the error block). Note that on pool import the entries from the MOS directory object are used to populate fields in the spa (e.g. errlog_last is used to initialize spa_errlog_last).
Upgrade the pool to a later version (e.g., 7.0.0.0.0). This will cause spa_upgrade_errlog() to be called, which will transfer the entries from the old object to a new one and delete the old object (replacing the value of spa_errlog_last). Note that this function does not update the MOS directory object, so the directory will continue to reference the old (now deleted) object.
Before another scrub, reboot the system. This will cause the spa to be initialized with now-stale data from the MOS directory object. At this point spa_errlog_last references a deleted object, and the MOS directory object continues to reference the deleted object as well.
Start a new scrub. When we attempt to rotate the new scrub data (from spa_errlog_scrub) into spa_errlog_last, we will detect a non-zero value in spa_errlog_last and attempt to free that (already freed) object. This will trigger the panic loop.
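The sequence above can be sketched as a small toy model. This is not ZFS code: `ToyPool`, `reproduce` and the `update_dir` flag are hypothetical names made up for illustration, and the model assumes that the `2` in the failed assertion is `ENOENT` returned when freeing an object that no longer exists. It only mirrors the bookkeeping described in the steps (persistent directory entries vs. in-core spa fields).

```python
ENOENT = 2  # assumed meaning of the "2" in "(0 == 2)": object does not exist


class ToyPool:
    """Toy stand-in for a pool: an object set plus a persistent directory."""

    def __init__(self):
        self.objects = set()   # currently allocated object IDs
        self.next_id = 1
        # Persistent pointers (models the MOS directory entries).
        self.mos_dir = {"errlog_last": 0, "errlog_scrub": 0}
        # In-core state (models spa_errlog_last / spa_errlog_scrub).
        self.spa = {}

    def alloc(self):
        obj = self.next_id
        self.next_id += 1
        self.objects.add(obj)
        return obj

    def free(self, obj):
        """Return 0 on success, ENOENT if the object was already freed."""
        if obj not in self.objects:
            return ENOENT
        self.objects.remove(obj)
        return 0

    def import_pool(self):
        # On import, in-core fields are populated from the directory.
        self.spa = dict(self.mos_dir)

    def sync_dir(self):
        # Write the in-core pointers back to the persistent directory.
        self.mos_dir.update(self.spa)

    def upgrade_errlog(self, update_dir):
        # Convert each existing log: allocate a new object, delete the old.
        for key in ("errlog_last", "errlog_scrub"):
            old = self.spa[key]
            if old != 0:
                new = self.alloc()
                if self.free(old) != 0:
                    raise RuntimeError("unexpected free failure in upgrade")
                self.spa[key] = new
        if update_dir:         # models the fix: persist the new pointers
            self.sync_dir()

    def scrub_rotate(self):
        """Rotate the scrub log into 'last'; return the free() status."""
        status = 0
        if self.spa["errlog_last"] != 0:
            status = self.free(self.spa["errlog_last"])
        self.spa["errlog_last"] = self.spa["errlog_scrub"]
        self.spa["errlog_scrub"] = 0
        self.sync_dir()
        return status


def reproduce(update_dir):
    pool = ToyPool()
    pool.import_pool()
    # A scrub finds an error: errlog_last becomes non-zero on disk.
    pool.spa["errlog_last"] = pool.alloc()
    pool.sync_dir()
    # The pool is upgraded and the error log is converted.
    pool.upgrade_errlog(update_dir)
    # Reboot: in-core state is rebuilt from the directory.
    pool.import_pool()
    # A new scrub rotates the logs and frees the old 'last' object.
    return pool.scrub_rotate()
```

In this model `reproduce(False)` returns `2`, because the reboot reloads the stale object ID and the rotation tries to free it a second time, while `reproduce(True)` returns `0`. In the real code path the non-zero status is what trips the `VERIFY0` in the issue title.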
Include any warning/errors/backtraces from the system logs
Panic stack: