VFS: read()/write() lock the vnode for too long #450

Open

nyh opened this issue Aug 11, 2014 · 5 comments
@nyh (Contributor) commented Aug 11, 2014

As already noticed long ago in commit 907e633, our VFS implementation (copied from Prex) has a serious problem in vfs_file::read() and vfs_file::write(): these unduly hold the vnode lock for the entire duration of the read() or write() call.

In commit 907e633, we noticed one bad result of this fact: while one thread is blocked read()ing from /dev/console, another thread cannot write() to it! This is completely broken behavior, and the aforementioned commit just worked around it (by not opening /dev/console) instead of fixing it.

Today, the same bug is causing a slowdown in multi-CPU Cassandra runs: multiple threads hold several fds pointing to the same on-disk file and do lseek(); read() on those fds concurrently. Because we hold the vnode lock throughout the entire read() call, we essentially serialize the calls to read() instead of doing much of the work in parallel (and batching I/O).
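
To make the serialization concrete, here is a minimal, self-contained C++ sketch (not OSv's actual VFS code; the `vnode` and `file_desc` types and the reader/writer-lock idea are simplified assumptions) contrasting a read path that holds an exclusive per-vnode lock for the whole call with one where readers share the lock, so concurrent read()s of already-cached data no longer serialize each other:

```cpp
// Illustrative sketch only -- not OSv's actual code. The types below are
// simplified stand-ins for the real VFS structures.
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <shared_mutex>
#include <vector>

struct vnode {
    std::shared_mutex lock;        // per-vnode lock (simplified)
    std::vector<char> cached_data; // stands in for already-cached file data
};

struct file_desc {
    vnode* vp;
    size_t offset = 0;             // per-fd offset, as in the Cassandra case
};

// The pattern this issue complains about: an exclusive vnode lock is held for
// the whole call, so read()s on different fds of the same file run strictly
// one after another, even when no actual I/O is needed.
size_t read_exclusive(file_desc& f, char* buf, size_t len)
{
    std::unique_lock<std::shared_mutex> guard(f.vp->lock); // held until return
    size_t size = f.vp->cached_data.size();
    size_t n = f.offset < size ? std::min(len, size - f.offset) : 0;
    if (n) {
        std::memcpy(buf, f.vp->cached_data.data() + f.offset, n);
    }
    f.offset += n;
    return n;
}

// One possible direction (an assumption, not the committed fix): let readers
// share the vnode lock so concurrent read()s of cached data proceed in
// parallel; only operations that change the file (write, truncate) would take
// the lock exclusively.
size_t read_shared(file_desc& f, char* buf, size_t len)
{
    std::shared_lock<std::shared_mutex> guard(f.vp->lock); // readers don't block each other
    size_t size = f.vp->cached_data.size();
    size_t n = f.offset < size ? std::min(len, size - f.offset) : 0;
    if (n) {
        std::memcpy(buf, f.vp->cached_data.data() + f.offset, n);
    }
    f.offset += n;
    return n;
}
```

A shared lock is only one possible direction: the per-fd offset still needs its own protection if a single fd is shared between threads, and writers and truncation would still take the lock exclusively.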

@nyh (Contributor, Author) commented Aug 11, 2014

Gleb rightly pointed out that in the Cassandra benchmark, after the warmup all the data is cached in memory, so read()s do not involve any actual I/O and aren't supposed to block or be especially slow (they just copy the data from the ARC cache, which is a pretty short operation).

In that case, holding the vnode lock for slightly shorter durations might not be very helpful. But it is definitely important when reads may block (/dev/console is an extreme example, though in that case it is probably a mistake that we continue to go through the VFS layer after the first open).
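
For the blocking case, a hypothetical user-space reproduction sketch (an assumption, not a test from the OSv tree) shows the symptom described above: with the vnode lock held across the whole read(), the write() below cannot complete until the blocked read() returns, whereas on a correct VFS it would return immediately.

```cpp
// Hypothetical reproduction sketch. One thread blocks in read() on the
// console; a second thread's write() to the same fd should not have to wait
// for it, but does if the vnode lock is held for the whole read().
#include <fcntl.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <thread>

int main()
{
    int fd = open("/dev/console", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    std::thread reader([fd] {
        char c;
        read(fd, &c, 1);          // blocks until someone types something
    });

    // Give the reader a moment to block inside read().
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

    const char msg[] = "hello from the writer\n";
    write(fd, msg, sizeof(msg) - 1); // should not wait for the blocked reader
    std::printf("write() returned\n");

    reader.join();
    return 0;
}
```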

@raphaelsc (Member) commented:
It's possible that b5eadc3 should close this issue.

UPDATE: Actually, the issue shouldn't be closed as the commit above only addresses ZFS.

@slivne (Contributor) commented Sep 29, 2014

@raphaelsc, @nyh, for the Cassandra case we have solved the issue - if so, we should close this bug, since it was created for Cassandra. If there is a different issue for other cases, let's create a separate issue for that.

@nyh (Contributor, Author) commented Jun 7, 2016

Please do not close this issue - there's nothing Cassandra-specific in it, and it is still very much an issue.

wkozaczuk referenced this issue Jul 5, 2018
Use:

  perf callstack tracepoint

to list frequent callstacks for a tracepoint (from 'perf list')
@wkozaczuk (Collaborator) commented:
The comment #504 (comment) could be helpful for tracking the attempt to address this issue, and possibly for re-using the original commit b5eadc3 and fixing the bugs in it.
