[2.2] BRT and other fixes into 2.2.3-staging #15714

mmatuska · 2023-12-26T23:25:26Z

Motivation and Context

Backport BRT fixes + fixes suggested by amotin@ from master into 2.2.3-staging

Description

Backported commits:

86e115e uses ASSERT0P() which is not in 2.2, so use ASSERT0()
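For readers unfamiliar with the two macros, the substitution can be pictured roughly as follows. This is a simplified stand-in, not the OpenZFS definitions: `SKETCH_ASSERT0`, `SKETCH_ASSERT0P`, and `struct sketch_record` are hypothetical names, and the sketch assumes the pointer variant checks a pointer against NULL while the integer variant checks a value against zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the OpenZFS macros. */
/* Integer form: the value must compare equal to zero. */
#define SKETCH_ASSERT0(x)	assert((long long)(x) == 0)
/* Pointer form: the pointer must be NULL, compared via uintptr_t. */
#define SKETCH_ASSERT0P(p)	assert((uintptr_t)(p) == 0)

struct sketch_record {
	void *data;	/* expected to be NULL in this example */
};

/* Check written with the pointer macro, as on a branch that has it. */
static void
sketch_check_pointer_form(const struct sketch_record *r)
{
	SKETCH_ASSERT0P(r->data);
}

/* Same check on a branch that only has the integer macro. */
static void
sketch_check_integer_form(const struct sketch_record *r)
{
	/* Cast explicitly so the integer macro never sees a raw pointer. */
	SKETCH_ASSERT0((uintptr_t)r->data);
}
```

Both helpers pass for a record whose `data` is NULL and abort otherwise, so the two forms are interchangeable for this kind of check.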

How Has This Been Tested?

Build and test on Linux and FreeBSD

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

amotin · 2023-12-27T15:40:18Z

adcea23 and bee9cfb are not related to block cloning, but I don't mind, though I am not familiar with the latter.

But if we are going wider, I'd also propose e007908, 58398cb, 3a8d9b8, 35da345, 2a27fd4 and f9765b1.

mmatuska · 2023-12-27T17:40:18Z

I have removed bee9cfb as it has introduced a new feature.

Once we have verified the ABDs and asserted the sizes, we should never see premature ABD ends. Assert that and remove extra branches from production builds. Reviewed-by: Brian Atkinson <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15428

Add a dataset_kstats_rename function, and call it when renaming a zvol on FreeBSD and Linux. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alan Somers <[email protected]> Sponsored-by: Axcient Closes openzfs#15482 Closes openzfs#15486

- Use sbuf_new_for_sysctl() to reduce double-buffering on sysctl output.
- Use much faster sbuf_cat() instead of sbuf_printf("%s").

Together these reduce `sysctl kstat.zfs.misc.dbufs` time from minutes to seconds, making dbufstat almost usable. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15495

It has been unused for 3 years, since openzfs#10576. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15507

PR openzfs#15457 exposed weird logic in L2ARC write sizing. If the write size appeared bigger than the device size, then instead of limiting the write it reset all the system-wide tunables to their defaults. Aside from being excessive, it did not actually help with the problem, still allowing an infinite loop to happen. This patch removes the tunables reverting logic and instead limits L2ARC writes (or at least eviction/trim) to 1/4 of the capacity. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Amanakis <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15519
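The capping idea in this commit message can be sketched in a few lines. This is only an illustration of "limit the write to a quarter of the capacity", not the actual L2ARC code: `sketch_cap_write_size` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: cap a requested write size at one quarter of the
 * device capacity, leaving any tunables untouched.
 */
static uint64_t
sketch_cap_write_size(uint64_t requested, uint64_t dev_capacity)
{
	uint64_t limit = dev_capacity / 4;
	uint64_t size = (requested > limit) ? limit : requested;

	/* The result can never exceed a quarter of the device. */
	assert(size <= limit);
	return (size);
}
```

Because the cap is computed per call from the device capacity, an oversized request is shrunk locally rather than handled by changing global state.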

This should make sure we have log written without overflows. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15517

Since we use a limited set of kmem caches, quite often we have unused memory after the end of the buffer. When built with debug, put a canary of up to 512 bytes there to detect buffer overflows at free time. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15553
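A minimal user-space sketch of the canary technique described here, assuming a request of `requested` bytes is served from a larger bucket of `bucket` bytes. All names (`sketch_alloc`, `sketch_free`, the fill byte) are hypothetical, and where a debug kernel build would presumably panic on a damaged canary, this sketch just returns a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_CANARY_BYTE	0xA5	/* hypothetical fill pattern */
#define SKETCH_CANARY_MAX	512	/* canary covers at most 512 bytes */

/* Number of canary bytes: the slack after the request, capped at 512. */
static size_t
sketch_canary_len(size_t requested, size_t bucket)
{
	size_t slack = bucket - requested;

	return (slack < SKETCH_CANARY_MAX ? slack : SKETCH_CANARY_MAX);
}

/* Allocate a bucket and fill the slack after the request with the pattern. */
static void *
sketch_alloc(size_t requested, size_t bucket)
{
	unsigned char *buf;

	assert(requested <= bucket);
	buf = malloc(bucket);
	if (buf != NULL) {
		memset(buf + requested, SKETCH_CANARY_BYTE,
		    sketch_canary_len(requested, bucket));
	}
	return (buf);
}

/* Free the buffer; return 1 if the canary is intact, 0 if overwritten. */
static int
sketch_free(void *ptr, size_t requested, size_t bucket)
{
	const unsigned char *buf = ptr;
	size_t n = sketch_canary_len(requested, bucket);
	int intact = 1;

	for (size_t i = 0; i < n; i++) {
		if (buf[requested + i] != SKETCH_CANARY_BYTE) {
			intact = 0;
			break;
		}
	}
	free(ptr);
	return (intact);
}
```

Writing even one byte past `requested` changes the pattern, so the overflow is reported when the buffer is freed rather than when it happens.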

zil_claim_clone_range() takes references on cloned blocks before ZIL replay. Later zil_free_clone_range() drops them after replay or on dataset destroy. The total balance is neutral. It means we do not need to do anything (drop the references) for the not yet implemented TX_CLONE_RANGE replay for ZVOLs. This is a logical follow-up to openzfs#15603. Reviewed-by: Kay Pedersen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15612

ZIL claim cannot handle block pointers cloned from the future, since they are not yet allocated at that point. This may happen either if the block was just written when it was cloned, or if the pool was frozen or otherwise rewound on import. Handle it from two sides: prevent cloning of blocks with a physical birth time from a not yet synced or frozen TXG, and abort ZIL claim if we still detect such blocks due to rewind or something else. While there, assert that any cloned blocks we claim are really allocated by calling metaslab_check_free(). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15617

When two datasets share the same master encryption key, it is safe to clone encrypted blocks. Currently only snapshots and clones of a dataset share with it the same encryption key.

Added a test for:
- Clone from encrypted sibling to encrypted sibling with non encrypted parent
- Clone from encrypted parent to inherited encrypted child
- Clone from child to sibling with encrypted parent
- Clone from snapshot to the original datasets
- Clone from foreign snapshot to a foreign dataset
- Cloning from non-encrypted to encrypted datasets
- Cloning from encrypted to non-encrypted datasets

Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Original-patch-by: Pawel Jakub Dawidek <[email protected]> Signed-off-by: Kay Pedersen <[email protected]> Closes openzfs#15544

Block pointers are not encrypted in TX_WRITE and TX_CLONE_RANGE records, so we can dump them, which may be useful for debugging. Related to openzfs#15543. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15629

To improve 128KB block write performance with multiple VDEVs, ZIL used to split those writes into two 64KB ones. Unfortunately this was found to cause an LWB buffer overflow when trying to write a maximum-sized 128KB TX_CLONE_RANGE record with 1022 block pointers into a 68KB buffer, since, unlike TX_WRITE, the ZIL code can't split it. This is a minimally-invasive temporary block cloning fix until the following more invasive prediction code refactoring. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15634
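The size mismatch in this commit message is easy to check with back-of-the-envelope arithmetic. The sketch below counts only the block pointer payload, using the 128-byte size of a ZFS block pointer and the 68KB figure quoted above; the record header is deliberately ignored, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BLKPTR_SIZE	128u		/* bytes per ZFS block pointer */
#define SKETCH_LWB_SIZE		(68u * 1024u)	/* the 68KB buffer cited above */

/* Payload bytes taken by nbps block pointers (record header not counted). */
static size_t
sketch_clone_payload(size_t nbps)
{
	return (nbps * SKETCH_BLKPTR_SIZE);
}

/* Returns 1 if the payload of nbps block pointers fits in the buffer. */
static int
sketch_fits_in_lwb(size_t nbps)
{
	return (sketch_clone_payload(nbps) <= SKETCH_LWB_SIZE);
}

/* 1022 pointers need 130816 bytes, far more than 68KB (69632 bytes). */
static_assert(1022u * SKETCH_BLKPTR_SIZE > SKETCH_LWB_SIZE,
    "a maximum-sized clone record cannot fit in a 68KB buffer");
```

Under these assumptions at most 544 block pointers of payload fit into 68KB, so a 1022-pointer record overflows by a wide margin unless the buffer is sized for the full record.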

Without this patch, on a pool of 60 vdevs with ZFS_DEBUG enabled, clone takes much more time than copy, while heavily trashing dbgmsg for no good reason by repeatedly dumping all vdevs' BRTs again and again, even unmodified ones. I am generally not sure this dumping is not excessive, but decided to keep it for now, just restricting its scope to something more reasonable. Reviewed-by: Kay Pedersen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15625

dmu_assign_arcbuf_by_dnode() should drop dn_struct_rwlock in case dbuf_hold() fails. I don't have a reproduction for this, but it looks inconsistent with dmu_buf_hold_noread_by_dnode() and co. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15644

In some cases dbuf_assign_arcbuf() may be called on a block that was recently cloned. If that happened in the current TXG, we must undo the block cloning first, since the single dirty record per TXG can't and shouldn't mean both cloning and overwrite at the same time. Reviewed-by: Kay Pedersen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15653

Block cloning normally creates dirty record without dr_data. But if the block is read after cloning, it is moved into DB_CACHED state and receives the data buffer. If after that we call dbuf_unoverride() to convert the dirty record into normal write, we should give it the data buffer from dbuf and release one. Reviewed-by: Kay Pedersen <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15654 Closes openzfs#15656

While 763ca47 closes the situation of block cloning creating unencrypted records in encrypted datasets, existing data still causes panic on read. Setting zfs_recover bypasses this but at the cost of potentially ignoring more serious issues. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Chris Peredun <[email protected]> Closes openzfs#15677

mmatuska · 2023-12-27T17:59:25Z

I have updated the patchlist with suggested patches from @amotin

amotin · 2024-01-03T20:29:46Z

Meanwhile I fixed two more cloning issues to add here once they are in master: #15732 and #15735.

tonyhutter · 2024-01-04T17:51:52Z

Thanks for these backports.

I see 46c0bfc has in its commit message:

"This is a logical follow up to #15603."

Do we want #15603 included as well?

amotin · 2024-01-04T17:54:27Z

"Do we want #15603 included as well?"

@tonyhutter As far as I can see, it was already merged into 2.2.2.

mmatuska force-pushed the zfs-2.2.3-staging-import branch from d5b017c to 70576ee December 27, 2023 00:32

amotin requested a review from behlendorf December 27, 2023 15:44

amotin approved these changes Dec 27, 2023

mmatuska force-pushed the zfs-2.2.3-staging-import branch from 70576ee to ea6e518 December 27, 2023 17:39

amotin and others added 17 commits December 27, 2023 18:49

ZIL: Assert record sizes in different places

f9abc6d

This should make sure we have log written without overflows. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#15517

mmatuska force-pushed the zfs-2.2.3-staging-import branch from 07c976f to f183b0b December 27, 2023 17:51

mmatuska changed the title ~~[2.2] BRT fixes into 2.2.3-staging~~ [2.2] BRT and other fixes into 2.2.3-staging Dec 27, 2023

mmatuska requested a review from amotin December 27, 2023 22:29

amotin approved these changes Dec 27, 2023

amotin requested a review from tonyhutter December 27, 2023 22:46

robn approved these changes Dec 27, 2023

tonyhutter approved these changes Jan 4, 2024

behlendorf approved these changes Jan 9, 2024

behlendorf merged commit f71c16a into openzfs:zfs-2.2.3-staging Jan 9, 2024
24 checks passed

mmatuska deleted the zfs-2.2.3-staging-import branch January 17, 2024 21:19
