
VFS: I think lseek() shouldn't really be a VFS operation. #451

Open
nyh opened this issue Aug 11, 2014 · 7 comments

@nyh
Contributor

nyh commented Aug 11, 2014

lseek() is a per-fd operation, not a per-vnode one. Yet our sys_lseek() (inherited from Prex, I believe) calls a vnode-specific VOP_SEEK, and holds the vnode lock while doing it.

If we look at the implementations of VOP_SEEK, we see that zfs_seek() does some trivial checking and ramfs_seek() is a no-op. I think we should get rid of VOP_SEEK. (TODO: check what Linux's VFS does regarding seek.)

Currently we're seeing a slowdown in Cassandra because several threads are doing lseek()/read() concurrently. But I think the problem is the long read() blocking (see issue #450), not the vnode lock in sys_lseek(), which should be fairly harmless because it is held for such short durations. I'm not sure whether, even with the change suggested here, we can avoid the (very short-term) vnode lock in sys_lseek(): we may still need it if, for example, we have to check the validity of the file type. But in most cases it is better to avoid the vnode layer entirely. E.g., I think that after someone opens "/dev/console", we don't need to remember this name: the fd can carry all the file operations of the console device, and needs neither the VFS file operations nor a vnode.

@slivne
Contributor

slivne commented Sep 29, 2014

@raphaelsc did we update the seek operation?

@raphaelsc
Member

raphaelsc commented Sep 29, 2014

No, but it shouldn't be a performance issue anymore, given that seek will proceed quickly even when there is a concurrent read/write on the same vnode.



@slivne
Contributor

slivne commented Sep 29, 2014

@raphaelsc - OK, did we run any tests to confirm that the performance is now good, so we can close this?

@raphaelsc
Member

raphaelsc commented Sep 29, 2014

No performance test to back up my claim, but seek will no longer block waiting for a read/write to release the vnode lock. It would be good to run the workload Gleb ran (the one that reproduced the contention on seek; I don't remember which one exactly) and see if the contention on seek is indeed gone.



@gleb-cloudius
Contributor

gleb-cloudius commented Sep 29, 2014

It was Cassandra with compressed tables, but I am not sure how relevant all this is now, since the patch that removes the locking created a deadlock.

@raphaelsc
Member

raphaelsc commented Sep 29, 2014

Gleb, it happens when taking the vnode lock to update the vnode size while the file is being extended, in zfs_write. I'm trying to think about a solution. How about adding a function to get the node size from the filesystem instead? It's not the best of solutions, but it would fix the problem.


@gleb-cloudius
Contributor

gleb-cloudius commented Sep 29, 2014

If this does not break anything else... You will have to implement this for all the other filesystems too.

@wkozaczuk wkozaczuk added the vfs label Feb 11, 2024