Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: synchronize close with other I/O for streams #30837

Closed
wants to merge 2 commits into from

Conversation

addaleax
Copy link
Member

@addaleax addaleax commented Dec 7, 2019

Part of the flakiness in the
parallel/test-readline-async-iterators-destroy test comes from
fs streams starting _read() and _destroy() without waiting
for the other to finish, which can lead to the fs.read() call
resulting in EBADF if timing is bad.

Fix this by synchronizing write and read operations with close().

Refs: #30660

/cc @ronag

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the fs Issues and PRs related to the fs subsystem / file system. label Dec 7, 2019
Copy link
Member

@ronag ronag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM makes sense as long as it's considered semver-minor.

This would become unnecessary and reverted through #29656.

lib/internal/fs/streams.js Outdated Show resolved Hide resolved
fs.write(this.fd, data, 0, data.length, this.pos, (er, bytes) => {
// Return early if this stream has been destroyed. The close() call inside
// _destroy() may cause errors when writing and we don't want to emit those.
if (this.destroyed) return cb();

This comment was marked as outdated.

Copy link
Member

@ronag ronag Dec 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a long conversation about swallowing errors after destroy() here, #29197. The consensus was that we should not swallow until after 'close'.

This comment was marked as outdated.

@ronag

This comment has been minimized.

@ronag

This comment has been minimized.

@addaleax
Copy link
Member Author

addaleax commented Dec 7, 2019

@ronag Thinking about this more, the EBADF indicates that we should probably try to avoid this race condition altogether and only call fs.close() until after a fs.read() or fs.write() operation has finished … I’ll update the PR along those lines

@addaleax addaleax changed the title fs: ignore fs operations and errors for destroyed streams fs: synchronize close with other I/O for streams Dec 7, 2019
@addaleax
Copy link
Member Author

addaleax commented Dec 7, 2019

@ronag PTAL

lib/internal/fs/streams.js Outdated Show resolved Hide resolved
lib/internal/fs/streams.js Outdated Show resolved Hide resolved
lib/internal/fs/streams.js Outdated Show resolved Hide resolved
lib/internal/fs/streams.js Outdated Show resolved Hide resolved
lib/internal/fs/streams.js Outdated Show resolved Hide resolved
Copy link
Member

@lpinca lpinca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM with @ronag's suggestions.

@nodejs-github-bot
Copy link
Collaborator

lib/internal/fs/streams.js Outdated Show resolved Hide resolved
Copy link
Member

@ronag ronag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ronag
Copy link
Member

ronag commented Dec 7, 2019

Should we apply the same fixes to net.Socket?

@ronag ronag mentioned this pull request Dec 7, 2019
4 tasks
@@ -339,7 +356,17 @@ WriteStream.prototype._write = function(data, encoding, cb) {
});
}

if (this.destroyed) return cb();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might need to be a cb(new ERR_STREAM_DESTROYED('write'))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? There hasn’t been any error, has there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The write hasn't completed. If we don't send an error here the caller would think the write has completed even though it hasn't.

See, https://github.com/nodejs/node/blob/master/lib/_stream_writable.js#L821

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m worried that introducing an error when there previously was none (or at least not usually) would be semver-major, and I’d prefer to keep this PR as close to just being a fix for the bug as possible.

Copy link
Member

@ronag ronag Dec 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only if called with a callback (which is very unusual) in which case it's actually a bug if it's not an error.

Though if you are worried about it I guess we can leave it as is. It's an unusual edge case after all. Could we at least have a separate semver-major PR for "correct" behaviour?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a correctness issue I don't think it needs to be semver-major?

Maybe it doesn’t need to be, but I still feel like these are two very different things…

  • This PR addresses a race condition that can occur randomly and surprisingly. Fixing it removes unexpected errors.
  • Adding an error to the callback makes behaviour more consistent, but it would add unexpected errors.

I’d rather not mix the two, and I think we’ve treated other situations where we add errors for consistency as semver-major in the past (and I don’t really see any reason not to do that here, too).

Copy link
Member

@bnoordhuis bnoordhuis Dec 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that follows, does it? Right now you'll receive an EBADF error event. While the EBADF error is, er, in error, it at least tells you that the write didn't go through.

(I suppose it could also end up writing to a different file if the fd was reopened in the mean time, which of course is - edit: a lot - worse than what this PR does.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my reasoning is mostly that this is only relevant when the stream has already been destroyed at this point, and so it should be expected that writes may not finish?

If you feel strongly, I’ll apply @ronag’s suggestion, but I’m still a bit worried about breakage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it is that you already receive unpredictable EBADF errors now. A predictable error is better, and certainly better than silently dropping data on the floor.

Another way of looking at it: how likely is this change to break existing, functionally correct code? I expect the answer is 'close to zero' - any code that breaks was probably already broken, just not reliably so.

Does that sound reasonable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve pushed a commit with the suggestion … still feeling a bit worried about it but we’ll see if this is problematic

ronag added a commit to nxtedition/node that referenced this pull request Dec 7, 2019
_write and _read can be called from 'connect' after
Socket.destroy() has been called. This should be a noop.

Refs: nodejs#30837
@@ -339,7 +356,17 @@ WriteStream.prototype._write = function(data, encoding, cb) {
});
}

if (this.destroyed) return cb();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share @ronag's concern though: it's tantamount to silently ignoring the write request from the user.

Since this is a correctness issue I don't think it needs to be semver-major?

fs.read(this.fd, pool, pool.used, toRead, this.pos, (er, bytesRead) => {
this[kIsPerformingIO] = false;
// Tell ._destroy() that it's safe to close the fd now.
if (this.destroyed) return this.emit(kIoDone, er);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is observable when emit() is monkey-patched, which isn't entirely uncommon. Not a reason per se not to introduce this pattern (it's pretty elegant) but I thought I'd point it out anyway.

@ronag
Copy link
Member

ronag commented Dec 8, 2019

Just a thought. What if for whatever reason the io doesn’t complete? Do we need a timeout? Or does libuv handle that?

@addaleax
Copy link
Member Author

addaleax commented Dec 8, 2019

@ronag In that case, this PR delays the close() call along with it. I don’t think that’s a bad thing, though.

@ronag
Copy link
Member

ronag commented Dec 9, 2019

I would like to ask for @mcollina's take on this before merging. See, #30864 (comment).

@nodejs-github-bot
Copy link
Collaborator

Part of the flakiness in the
parallel/test-readline-async-iterators-destroy test comes from
fs streams starting `_read()` and `_destroy()` without waiting
for the other to finish, which can lead to the `fs.read()` call
resulting in `EBADF` if timing is bad.

Fix this by synchronizing write and read operations with `close()`.

Refs: nodejs#30660
@nodejs-github-bot
Copy link
Collaborator

@addaleax
Copy link
Member Author

addaleax commented Dec 9, 2019

@ronag I guess we can do that, but at the same time I think we should fix this bug.

@ronag
Copy link
Member

ronag commented Dec 9, 2019

@ronag I guess we can do that, but at the same time I think we should fix this bug.

Given @mcollina's answer in the linked comment I'm not sure he would agree with this PR. Another valid (?) solution is also to just add on('error') handlers in the failing test.

@addaleax
Copy link
Member Author

@ronag Let’s wait for him to comment. In my opinion EBADF just because of using async iterators over a file stream is very clearly a bug.

@nodejs-github-bot
Copy link
Collaborator

@ronag
Copy link
Member

ronag commented Dec 10, 2019

I mostly agree. My personal concern is whether this fix should apply everywhere (net, http, http2, quic etc...) or whether fs is a somehow an edge case?

just because of using async iterators over a file stream is very clearly a bug.

This is a bit strange for me though. Shouldn't the iterator be released once leaving the for block? How can a released async iterator cause an exception at destroy()? @mcollina @benjamingr what is the expect semantics of async iterators here? https://github.com/nodejs/node/blob/master/test/parallel/test-readline-async-iterators-destroy.js I would expect the iterator to be released and release any error listeners.

The readable stream will of course still 'error' (which might or might not make sense) and an error listener should be registered regardless?

@mcollina
Copy link
Member

I'm a bit conflicted by this change.
On one hand It seems clear that we should protect the call to _destroy() to when there is no I/O operation happening. On the other hand, destroy() should be a "safe" operation and waiting for asynchronous completion seems a bit off.

I think we should consider making the callback of destroy(err, cb) documented and part of the official API.

What do you think?

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ronag
Copy link
Member

ronag commented Dec 10, 2019

On one hand It seems clear that we should protect the call to _destroy() to when there is no I/O operation happening. On the other hand, destroy() should be a "safe" operation and waiting for asynchronous completion seems a bit off.

Yes, that's a bit contradictory. I guess it is "safe" to wait for I/O if we're under the assumption that it will always complete (or fail) within reasonable time, otherwise we might end up with a stuck stream without means to abort it, e.g. a socket trying to write to a server which is bugged/crashed/corrupt? It might be a case by case basis. In fs I would guess it's very unlikely it would not complete within reasonable time (if you exclude FUSE). Also depends on whether e.g. libuv or the os has some sort of timeout or error handling for this.

I think we should consider making the callback of destroy(err, cb) documented and part of the official API.

I think it should be part of the official API. Not sure how that helps us here though? Also, before making it public we should probably ensure the cb is invoked asynchronously (which is not the case today).

@mcollina
Copy link
Member

Yes, that's a bit contradictory. I guess it is "safe" to wait for I/O if we're under the assumption that it will always complete (or fail) within reasonable time, otherwise we might end up with a stuck stream without means to abort it, e.g. a socket trying to write to a server which is bugged/crashed/corrupt? It might be a case by case basis. In fs I would guess it's very unlikely it would not complete within reasonable time (if you exclude FUSE). Also depends on whether e.g. libuv or the os has some sort of timeout or error handling for this.

I agree.

I think it should be part of the official API. Not sure how that helps us here though? Also, before making it public we should probably ensure the cb is invoked asynchronously (which is not the case today).

I think documenting that is enough.

@addaleax
Copy link
Member Author

I'm a bit conflicted by this change.
On one hand It seems clear that we should protect the call to _destroy() to when there is no I/O operation happening.

I’m not sure if that’s always the case, but here it’s definitely problematic. Getting EBADF is bad enough but I think it’s even possible that data is read or written from the wrong file here if the race condition timing works out really badly.

On the other hand, destroy() should be a "safe" operation and waiting for asynchronous completion seems a bit off.

I think we should consider making the callback of destroy(err, cb) documented and part of the official API.

I’d be okay with that, yes 👍

addaleax added a commit that referenced this pull request Dec 10, 2019
Part of the flakiness in the
parallel/test-readline-async-iterators-destroy test comes from
fs streams starting `_read()` and `_destroy()` without waiting
for the other to finish, which can lead to the `fs.read()` call
resulting in `EBADF` if timing is bad.

Fix this by synchronizing write and read operations with `close()`.

Refs: #30660

PR-URL: #30837
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@addaleax
Copy link
Member Author

Landed in 8a5c7f6

@addaleax addaleax closed this Dec 10, 2019
@addaleax addaleax deleted the fs-readline-test branch December 10, 2019 15:06
targos pushed a commit that referenced this pull request Dec 11, 2019
Part of the flakiness in the
parallel/test-readline-async-iterators-destroy test comes from
fs streams starting `_read()` and `_destroy()` without waiting
for the other to finish, which can lead to the `fs.read()` call
resulting in `EBADF` if timing is bad.

Fix this by synchronizing write and read operations with `close()`.

Refs: #30660

PR-URL: #30837
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Dec 13, 2019
targos pushed a commit that referenced this pull request Jan 14, 2020
Part of the flakiness in the
parallel/test-readline-async-iterators-destroy test comes from
fs streams starting `_read()` and `_destroy()` without waiting
for the other to finish, which can lead to the `fs.read()` call
resulting in `EBADF` if timing is bad.

Fix this by synchronizing write and read operations with `close()`.

Refs: #30660

PR-URL: #30837
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
BethGriggs pushed a commit that referenced this pull request Feb 6, 2020
Part of the flakiness in the
parallel/test-readline-async-iterators-destroy test comes from
fs streams starting `_read()` and `_destroy()` without waiting
for the other to finish, which can lead to the `fs.read()` call
resulting in `EBADF` if timing is bad.

Fix this by synchronizing write and read operations with `close()`.

Refs: #30660

PR-URL: #30837
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@MylesBorins MylesBorins mentioned this pull request Feb 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants