
VFS: I think lseek() shouldn't really be a VFS operation. #451

Open
nyh opened this issue Aug 11, 2014 · 7 comments

@nyh
Contributor

nyh commented Aug 11, 2014

lseek() is a per-fd operation, not a per-vnode one. Yet our sys_lseek() (inherited from Prex, I believe) calls a vnode-specific VOP_SEEK, and holds the vnode lock while doing it.

If we look at the implementations of VOP_SEEK, we see that zfs_seek() does some trivial checking and ramfs_seek() is a no-op. I think we should get rid of VOP_SEEK. (TODO: check what Linux's VFS does regarding seek.)

Currently we're seeing a slowdown in Cassandra because several threads are doing lseek()/read() concurrently. But I think the problem is the long read() blocking (see issue #450), not the vnode lock in sys_lseek(), which should be fairly harmless because it is held for such short durations. I'm not sure whether, even with the change suggested here, we can avoid the (very short-term) vnode lock in sys_lseek(): we may still need it if, for example, we have to check the validity of the file type. But in most cases it is better to avoid the vnode layer entirely. E.g., I think that after someone opens "/dev/console", we don't need to remember this name: the fd can carry all the file operations of the console device, and needs neither the VFS file operations nor a vnode.

@slivne
Contributor

slivne commented Sep 29, 2014

@raphaelsc did we update the seek operation?

@raphaelsc
Member

raphaelsc commented Sep 29, 2014

No, but it shouldn't be a performance issue anymore, given that seek will proceed quickly even when there is a concurrent read/write on the same vnode.



@slivne
Contributor

slivne commented Sep 29, 2014

@raphaelsc - OK, did we run any tests to confirm that the performance is now good, so we can close this?

@raphaelsc
Member

raphaelsc commented Sep 29, 2014

No performance test to back up my claim, but seek will no longer block waiting for a read/write to release the vnode lock. It would be good to run the workload Gleb ran (the one that reproduced the contention on seek; I don't remember which one exactly) and see if the contention on seek is indeed gone.



@gleb-cloudius
Contributor

gleb-cloudius commented Sep 29, 2014

It was Cassandra with compressed tables, but I am not sure how relevant all this is now, since the patch that removes the locking created a deadlock.

@raphaelsc
Member

raphaelsc commented Sep 29, 2014

Gleb, it happens when taking the vnode lock to update the vnode size while the file is being extended, in zfs_write. I'm trying to think about a solution. How about adding a function to get the node size from the filesystem instead? It's not the best of solutions, but it would fix the problem.


@gleb-cloudius
Contributor

gleb-cloudius commented Sep 29, 2014

If this does not break anything else... You will have to implement this for all the other filesystems too.

@wkozaczuk wkozaczuk added the vfs label Feb 11, 2024