ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced() #13368
(polite ping! wondering if @behlendorf's review comment is a show-stopper.)
The idea here is great. The issue caught by the test case failure just needs to be resolved before it can be merged.
`lseek(SEEK_DATA | SEEK_HOLE)` are only accurate when the on-disk blocks reflect all writes, i.e. when there are no dirty data blocks. To ensure this, if the target dnode is dirty, they wait for the open txg to be synced, so we can call them "stabilizing operations". If they cause txg_wait_synced often, it can be detrimental to performance.

Typically, a group of files are all modified, and then SEEK_DATA/HOLE are performed on them. In this case, the first SEEK does a txg_wait_synced(), and subsequent SEEKs don't need to wait, so performance is good.

However, if a workload involves an interleaved metadata modification, the subsequent SEEK may do a txg_wait_synced() unnecessarily. For example, if we do a `read()` syscall to each file before we do its SEEK. This applies even with `relatime=on`, when the `read()` is the first read after the last write. The txg_wait_synced() is unnecessary because the SEEK operations only care that the structure of the tree of indirect and data blocks is up to date on disk. They don't care about metadata like the contents of the bonus or spill blocks. (They also don't care if an existing data block is modified, but this would be more involved to filter out.)

This commit changes the behavior of SEEK_DATA/HOLE operations such that they do not call txg_wait_synced() if there is only a pending change to the bonus or spill block.

Signed-off-by: Matthew Ahrens <[email protected]>
Seems to make sense.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Matthew Ahrens <[email protected]>
Closes openzfs#13368
Issue openzfs#14594
Issue openzfs#14512
Issue openzfs#14009
Hi @ahrens, could we say the dnode is also clean if the blkid of the lseek offset is bigger than the blkids of all dirty records and free ranges?
@Finix1979 that is probably true, but it might take too long to determine that with the current data structures. (dn_dirty_records is not sorted, so you'd have to look through them all.)
This seems to cause #14753.
Motivation and Context
`lseek(SEEK_DATA | SEEK_HOLE)` are only accurate when the on-disk blocks
reflect all writes, i.e. when there are no dirty data blocks. To ensure
this, if the target dnode is dirty, they wait for the open txg to be
synced, so we can call them "stabilizing operations". If they cause
txg_wait_synced often, it can be detrimental to performance.

Typically, a group of files are all modified, and then SEEK_DATA/HOLE
are performed on them. In this case, the first SEEK does a
txg_wait_synced(), and subsequent SEEKs don't need to wait, so
performance is good.

However, if a workload involves an interleaved metadata modification,
the subsequent SEEK may do a txg_wait_synced() unnecessarily. For
example, if we do a `read()` syscall to each file before we do its SEEK.
This applies even with `relatime=on`, when the `read()` is the first
read after the last write. The txg_wait_synced() is unnecessary because
the SEEK operations only care that the structure of the tree of indirect
and data blocks is up to date on disk. They don't care about metadata
like the contents of the bonus or spill blocks. (They also don't care
if an existing data block is modified, but this would be more involved
to filter out.)
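
The interleaved pattern described here can be sketched as a small userspace workload. This is only an illustration, not the test used for this change: the function name `write_read_seek` and the file naming are assumptions, and `os.SEEK_DATA` is only defined on platforms that support it (for example Linux).

```python
import os


def write_read_seek(path: str, payload: bytes) -> int:
    """Write payload, read one byte back, then return the SEEK_DATA offset."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)   # dirties data blocks of this file
        os.pread(fd, 1, 0)      # a read; may leave only metadata (atime) dirty
        # The "stabilizing" call: its answer depends on the block tree only.
        return os.lseek(fd, 0, os.SEEK_DATA)
    finally:
        os.close(fd)


def run_workload(directory: str, count: int) -> list[int]:
    """Repeat the write/read/seek sequence on `count` files, one at a time."""
    return [
        write_read_seek(os.path.join(directory, f"file{i}"), b"data")
        for i in range(count)
    ]
```

The returned offsets are the same regardless of how the filesystem handles the seek; per the description above, what differs on ZFS is whether each `lseek()` has to wait for a txg sync first.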
Description
This commit changes the behavior of SEEK_DATA/HOLE operations such that
they do not call txg_wait_synced() if there is only a pending change to
the bonus or spill block.
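
The decision described above can be modeled in a few lines. This is a toy model under stated assumptions, not the ZFS implementation: `DirtyRecord`, `BONUS_BLKID`, `SPILL_BLKID`, `needs_sync_before_seek`, and the `has_pending_frees` flag are all hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Iterable

# Sentinel block IDs for this sketch only (hypothetical stand-ins for
# however bonus and spill buffers are distinguished from ordinary blocks).
BONUS_BLKID = -1
SPILL_BLKID = -2


@dataclass(frozen=True)
class DirtyRecord:
    blkid: int  # block that the pending change applies to


def needs_sync_before_seek(dirty_records: Iterable[DirtyRecord],
                           has_pending_frees: bool = False) -> bool:
    """Return True if SEEK_DATA/SEEK_HOLE would have to wait for a sync."""
    if has_pending_frees:
        # Assumption: freeing ranges can change the block tree, so wait.
        return True
    # Changes confined to the bonus or spill block do not affect the
    # structure of the indirect/data block tree, so they are ignored.
    return any(
        record.blkid not in (BONUS_BLKID, SPILL_BLKID)
        for record in dirty_records
    )
```

In this model a dnode whose only pending change is an atime update stored in the bonus block reports `False`, while any dirty data or indirect block still reports `True`.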
How Has This Been Tested?
Tested with a workload that does:
Previously, the first SEEK_DATA of each file caused a txg_wait_synced(). Now only the first SEEK_DATA of the first file causes a txg_wait_synced().