Fix JS Stream Socket finishShutdown crash #49400
A JS stream socket wraps a stream, exposing it as a socket for something on top which needs a socket specifically (e.g. an HTTP server). If the internal stream is closed in the same tick as the layer on top attempts to close this stream, the race between doShutdown and doClose results in an uncatchable exception. A similar race can happen with doClose and doWrite. It seems legitimate these can happen in parallel, so this resolves that by explicitly detecting and handling that situation: if a close is in progress, both doShutdown & doWrite allow doClose to run finishShutdown/Write for them, cancelling the operation, without trying to use this._handle (which will be null) in the meantime.
lgtm
There's two CI failures here:
Ok, that's somewhat clearer:
Given that, I think that means this is good to go, and CI just needs nudging until it passes. I'll leave it here, but let me know if there's anything else I can do to help get this merged.
lgtm
Commit Queue failed
- Loading data for nodejs/node/pull/49400
✔ Done loading data for nodejs/node/pull/49400
----------------------------------- PR info ------------------------------------
Title      Fix JS Stream Socket finishShutdown crash (#49400)
⚠ Could not retrieve the email or name of the PR author's from user's GitHub profile!
Branch     pimterry:fix-js-stream-socket-finishShutdown-crash -> nodejs:main
Labels     author ready
Commits    2
 - net: fix crash due to simultaneous close/shutdown on JS Stream Sockets
 - net: use asserts in JS Socket Stream to catch races in future
Committers 1
 - Tim Perry
PR-URL: https://github.com/nodejs/node/pull/49400
Reviewed-By: Matteo Collina
Reviewed-By: Luigi Pinca
------------------------------ Generated metadata ------------------------------
PR-URL: https://github.com/nodejs/node/pull/49400
Reviewed-By: Matteo Collina
Reviewed-By: Luigi Pinca
--------------------------------------------------------------------------------
ℹ This PR was created on Tue, 29 Aug 2023 10:04:35 GMT
✔ Approvals: 2
✔ - Matteo Collina (@mcollina) (TSC): https://github.com/nodejs/node/pull/49400#pullrequestreview-1603592553
✔ - Luigi Pinca (@lpinca): https://github.com/nodejs/node/pull/49400#pullrequestreview-1600355896
✔ Last GitHub CI successful
ℹ Last Full PR CI on 2023-08-30T16:27:57Z: https://ci.nodejs.org/job/node-test-pull-request/53647/
- Querying data for job/node-test-pull-request/53647/
✔ Last Jenkins CI successful
--------------------------------------------------------------------------------
✔ No git cherry-pick in progress
✔ No git am in progress
✔ No git rebase in progress
--------------------------------------------------------------------------------
- Bringing origin/main up to date...
From https://github.com/nodejs/node
 * branch main -> FETCH_HEAD
✔ origin/main is now up-to-date
- Downloading patch for 49400
From https://github.com/nodejs/node
 * branch refs/pull/49400/merge -> FETCH_HEAD
✔ Fetched commits as b781eaf4309a..fb9611bd5d29
--------------------------------------------------------------------------------
[main 775d624f19] net: fix crash due to simultaneous close/shutdown on JS Stream Sockets
 Author: Tim Perry
 Date: Thu Aug 24 16:05:02 2023 +0100
 2 files changed, 91 insertions(+)
 create mode 100644 test/parallel/test-http2-client-connection-tunnelling.js
[main 316fc7cb68] net: use asserts in JS Socket Stream to catch races in future
 Author: Tim Perry
 Date: Fri Aug 25 14:16:35 2023 +0100
 1 file changed, 3 insertions(+)
✔ Patches applied
There are 2 commits in the PR. Attempting autorebase.
Rebasing (2/4)
https://github.com/nodejs/node/actions/runs/6035974708
Landed in f863117...47add7e
PR-URL: #49400
Reviewed-By: Matteo Collina
Reviewed-By: Luigi Pinca
* chore: upgrade to Node.js v20 * src: allow embedders to override NODE_MODULE_VERSION nodejs/node#49279 * src: fix missing trailing , nodejs/node#46909 * src,tools: initialize cppgc nodejs/node#45704 * tools: allow passing absolute path of config.gypi in js2c nodejs/node#49162 * tools: port js2c.py to C++ nodejs/node#46997 * doc,lib: disambiguate the old term, NativeModule nodejs/node#45673 * chore: fixup Node.js BSSL tests * nodejs/node#49492 * nodejs/node#44498 * deps: upgrade to libuv 1.45.0 nodejs/node#48078 * deps: update V8 to 10.7 nodejs/node#44741 * test: use gcUntil() in test-v8-serialize-leak nodejs/node#49168 * module: make CJS load from ESM loader nodejs/node#47999 * src: make BuiltinLoader threadsafe and non-global nodejs/node#45942 * chore: address changes to CJS/ESM loading * module: make CJS load from ESM loader (nodejs/node#47999) * lib: improve esm resolve performance (nodejs/node#46652) * bootstrap: optimize modules loaded in the built-in snapshot nodejs/node#45849 * test: mark test-runner-output as flaky nodejs/node#49854 * lib: lazy-load deps in modules/run_main.js nodejs/node#45849 * url: use private properties for brand check nodejs/node#46904 * test: refactor `test-node-output-errors` nodejs/node#48992 * assert: deprecate callTracker nodejs/node#47740 * src: cast v8::Object::GetInternalField() return value to v8::Value nodejs/node#48943 * test: adapt test-v8-stats for V8 update nodejs/node#45230 * tls: ensure TLS Sockets are closed if the underlying wrap closes nodejs/node#49327 * test: deflake test-tls-socket-close nodejs/node#49575 * net: fix crash due to simultaneous close/shutdown on JS Stream Sockets nodejs/node#49400 * net: use asserts in JS Socket Stream to catch races in future nodejs/node#49400 * lib: fix BroadcastChannel initialization location nodejs/node#46864 * src: create BaseObject with node::Realm nodejs/node#44348 * src: implement DataQueue and non-memory resident Blob nodejs/node#45258 * sea: add support for V8 
bytecode-only caching nodejs/node#48191 * chore: fixup patch indices * gyp: put filenames in variables nodejs/node#46965 * build: modify js2c.py into GN executable * fix: (WIP) handle string replacement of fs -> original-fs * [v20.x] backport vm-related memory fixes nodejs/node#49874 * src: make BuiltinLoader threadsafe and non-global nodejs/node#45942 * src: avoid copying string in fs_permission nodejs/node#47746 * look upon my works ye mighty and dispair * chore: patch cleanup * [api] Remove AllCan Read/Write https://chromium-review.googlesource.com/c/v8/v8/+/5006387 * fix: missing include for NODE_EXTERN * chore: fixup patch indices * fix: fail properly when js2c fails in Node.js * build: fix js2c root_gen_dir * fix: lib/fs.js -> lib/original-fs.js * build: fix original-fs file xforms * fixup! module: make CJS load from ESM loader * build: get rid of CppHeap for now * build: add patch to prevent extra fs lookup on esm load * build: greatly simplify js2c modifications Moves our original-fs modifications back into a super simple python script action, wires up the output of that action into our call to js2c * chore: update to handle moved internal/modules/helpers file * test: update @types/node test * feat: enable preventing cppgc heap creation * feat: optionally prevent calling V8::EnableWebAssemblyTrapHandler * fix: no cppgc initialization in the renderer * gyp: put filenames in variables nodejs/node#46965 * test: disable single executable tests * fix: nan tests failing on node headers missing file * tls,http2: send fatal alert on ALPN mismatch nodejs/node#44031 * test: disable snapshot tests * nodejs/node#47887 * nodejs/node#49684 * nodejs/node#44193 * build: use deps/v8 for v8/tools Node.js hard depends on these in their builtins * test: fix edge snapshot stack traces nodejs/node#49659 * build: remove js2c //base dep * build: use electron_js2c_toolchain to build node_js2c * fix: don't create SafeSet outside packageResolve Fixes failure in 
parallel/test-require-delete-array-iterator: === release test-require-delete-array-iterator === Path: parallel/test-require-delete-array-iterator node:internal/per_context/primordials:426 constructor(i) { super(i); } // eslint-disable-line no-useless-constructor ^ TypeError: object is not iterable (cannot read property Symbol(Symbol.iterator)) at new Set (<anonymous>) at new SafeSet (node:internal/per_context/primordials:426:22) * fix: failing crashReporter tests on Linux These were failing because our change from node::InitializeNodeWithArgs to node::InitializeOncePerProcess meant that we now inadvertently called PlatformInit, which reset signal handling. This meant that our intentional crash function ElectronBindings::Crash no longer worked and the renderer process no longer crashed when process.crash() was called. We don't want to use Node.js' default signal handling in the renderer process, so we disable it by passing kNoDefaultSignalHandling to node::InitializeOncePerProcess. * build: only create cppgc heap on non-32 bit platforms * chore: clean up util:CompileAndCall * src: fix compatility with upcoming V8 12.1 APIs nodejs/node#50709 * fix: use thread_local BuiltinLoader * chore: fixup v8 patch indices --------- Co-authored-by: Keeley Hammond <[email protected]> Co-authored-by: Samuel Attard <[email protected]>
This fixes #48519
EDIT: and fixes #46094
This is independent of my other related fix #49327 but in most relevant scenarios you'll want both together. Each of these PRs has a test that catastrophically fails right now, but works correctly with the corresponding change included, but you'll likely want both since in production I find that either fix by itself ends up resulting in the other error crashing my application instead a little later on (i.e. these two separate issues are currently racing to crash my code).
There's a detailed breakdown of the crash fixed in this PR here. To summarize:
- [...] `allowHalfOpen = false` set (this is common for TLS, AFAICT).
- [...] `doClose`, and also ends the TLS stream (which in turn closes the writable side, due to `allowHalfOpen = false`) which calls `doShutdown` on the JSS (the JS stream socket). Effectively both sides try to shut down the JSS at the same time.
- `doClose` runs first, calls `this.stream.destroy()` which sets `this._handle` to null, and schedules a `setImmediate` to cancel any pending writes or shutdowns.
- `doShutdown` then runs, picks up `this._handle` (now null), and then schedules a `finishShutdown` that will call `this._handle.finishShutdown` (= uncatchable NPE).
- [...] `doClose` callback ran first, cleared `kCurrentShutdownRequest`, and so the null pointer was never actually read, but effectively just by luck.
- [...] `finishShutdown` from `doShutdown` runs first, and so does use its null pointer every time, crashing everything.

Rather than just reverting to `setImmediate` or using `process.nextTick` in `doClose` too, I've fixed this properly: now, if there is any race with `doClose`, the other methods recognize it, store the `kCurrent{Shutdown,Write}Request` param to allow `doClose` to do its clean cancellation a moment later, and then they avoid reading or using the null `this._handle` value entirely.

The example above explains the `doShutdown` race. A similar race applies to `doWrite`/`finishWrite` too, which I can reliably reproduce in my application after fixing just the shutdown case, although I don't have a reliable test to repro this (#46094, #35695, #27258 and microsoft/vscode#188676 all appear to be examples of the same issue). With the equivalent fix included for `doWrite` here too, I can no longer reproduce that error either.

With the fixes for both methods here plus #49327, I can proxy significant chunks of real HTTP/2 browser traffic through my app, including all the weird edge cases and errors of the web for at least an hour. Previously this crashed comfortably within a minute on Node 18+.

This also adds asserts to both affected methods, to more explicitly catch any other cases where this could happen (in both cases, if those asserts fail, it's very likely that there'll be an NPE next tick in the corresponding finishX methods).
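The ordering that the race depends on can be checked directly: in Node.js, callbacks queued with `process.nextTick` run before `setImmediate` callbacks queued in the same turn, even if the immediate was queued first. A minimal sketch follows; the labels are illustrative only, and it assumes (as the description suggests) that the shutdown path is scheduled with `process.nextTick` while the close path uses `setImmediate`.

```javascript
'use strict';

const order = [];

// Queued first: stands in for the cancellation step that doClose schedules.
setImmediate(() => order.push('setImmediate: doClose cancellation'));

// Queued second: stands in for the finishShutdown that doShutdown schedules
// (assumed here to use process.nextTick). It still runs first.
process.nextTick(() => order.push('nextTick: finishShutdown'));

// A later immediate sees both entries, with the nextTick callback first.
setImmediate(() => console.log(order));
```

Because the deferred shutdown callback wins this ordering, an assert at the call site fails immediately inside the method that saw the null handle, rather than one tick later inside the deferred callback.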