VFS: read()/write() lock the vnode for too long #450

Open

nyh opened this issue Aug 11, 2014 · 5 comments
@nyh (Contributor) commented Aug 11, 2014

As already noticed long ago in commit 907e633, our VFS implementation (copied from Prex) has a serious problem in vfs_file::read() and vfs_file::write(): these unduly hold the vnode lock for the entire duration of the read() or write() call.

In commit 907e633, we noticed one bad result of this fact: while one thread is blocked read()ing from /dev/console, another thread cannot write() to it! This is completely broken behavior, and the aforementioned commit just worked around it (by not opening /dev/console) instead of fixing it.

Today, the same bug is causing a slowdown in multi-CPU Cassandra runs: multiple threads hold several fds pointing to the same on-disk file and do lseek(); read() on those fds concurrently. Because we hold the vnode lock throughout the entire read() call, we essentially serialize the calls to read() instead of doing much of the work in parallel (and batching I/O).
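
To make the serialization concrete, here is a minimal, self-contained C++ sketch (not OSv's actual VFS code; the `vnode` and `file_desc` types and the reader/writer-lock idea are simplified assumptions) contrasting a read path that holds an exclusive per-vnode lock for the whole call with one where readers share the lock, so concurrent read()s of already-cached data no longer serialize each other:

```cpp
// Illustrative sketch only -- not OSv's actual code. The types below are
// simplified stand-ins for the real VFS structures.
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <shared_mutex>
#include <vector>

struct vnode {
    std::shared_mutex lock;        // per-vnode lock (simplified)
    std::vector<char> cached_data; // stands in for already-cached file data
};

struct file_desc {
    vnode* vp;
    size_t offset = 0;             // per-fd offset, as in the Cassandra case
};

// The pattern this issue complains about: an exclusive vnode lock is held for
// the whole call, so read()s on different fds of the same file run strictly
// one after another, even when no actual I/O is needed.
size_t read_exclusive(file_desc& f, char* buf, size_t len)
{
    std::unique_lock<std::shared_mutex> guard(f.vp->lock); // held until return
    size_t size = f.vp->cached_data.size();
    size_t n = f.offset < size ? std::min(len, size - f.offset) : 0;
    if (n) {
        std::memcpy(buf, f.vp->cached_data.data() + f.offset, n);
    }
    f.offset += n;
    return n;
}

// One possible direction (an assumption, not the committed fix): let readers
// share the vnode lock so concurrent read()s of cached data proceed in
// parallel; only operations that change the file (write, truncate) would take
// the lock exclusively.
size_t read_shared(file_desc& f, char* buf, size_t len)
{
    std::shared_lock<std::shared_mutex> guard(f.vp->lock); // readers don't block each other
    size_t size = f.vp->cached_data.size();
    size_t n = f.offset < size ? std::min(len, size - f.offset) : 0;
    if (n) {
        std::memcpy(buf, f.vp->cached_data.data() + f.offset, n);
    }
    f.offset += n;
    return n;
}
```

A shared lock is only one possible direction: the per-fd offset still needs its own protection if a single fd is shared between threads, and writers and truncation would still take the lock exclusively.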

@nyh (Contributor, Author) commented Aug 11, 2014

Gleb rightly pointed out that in the Cassandra benchmark, after the warmup all the data is cached in memory, so read()s do not involve any actual I/O and aren't supposed to block or be especially slow (they just copy the data from the ARC cache, which is a pretty short operation).

In that case, holding the vnode lock for slightly shorter durations might not be very helpful. But it is definitely important when reads may block (/dev/console is an extreme example, though in that case it is probably a mistake that we continue to go through the VFS layer after the first open).
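
For the blocking case, a hypothetical user-space reproduction sketch (an assumption, not a test from the OSv tree) shows the symptom described above: with the vnode lock held across the whole read(), the write() below cannot complete until the blocked read() returns, whereas on a correct VFS it would return immediately.

```cpp
// Hypothetical reproduction sketch. One thread blocks in read() on the
// console; a second thread's write() to the same fd should not have to wait
// for it, but does if the vnode lock is held for the whole read().
#include <fcntl.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <thread>

int main()
{
    int fd = open("/dev/console", O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    std::thread reader([fd] {
        char c;
        read(fd, &c, 1);          // blocks until someone types something
    });

    // Give the reader a moment to block inside read().
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

    const char msg[] = "hello from the writer\n";
    write(fd, msg, sizeof(msg) - 1); // should not wait for the blocked reader
    std::printf("write() returned\n");

    reader.join();
    return 0;
}
```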

@raphaelsc (Member) commented:
It's possible that b5eadc3 should close this issue.

UPDATE: Actually, the issue shouldn't be closed as the commit above only addresses ZFS.

@slivne (Contributor) commented Sep 29, 2014

@raphaelsc, @nyh, for the Cassandra case we have solved the issue - if so, we should close this bug, since it was created for Cassandra. If there is a different issue for other cases, let's create a separate issue for that.

@nyh (Contributor, Author) commented Jun 7, 2016

Please do not close this issue - there's nothing Cassandra-specific in it, and it is still very much an issue.

wkozaczuk referenced this issue Jul 5, 2018
Use:

  perf callstack tracepoint

to list frequent callstacks for a tracepoint (from 'perf list')
@wkozaczuk (Collaborator) commented:
The comment #504 (comment) could be helpful for tracking the attempt to address this issue, and possibly for re-using the original commit b5eadc3 and fixing the bugs in it.
