Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node 20 segfaults when our loader runs #50280

Closed
bmacnaughton opened this issue Oct 19, 2023 · 46 comments
Closed

node 20 segfaults when our loader runs #50280

bmacnaughton opened this issue Oct 19, 2023 · 46 comments
Labels
addons Issues and PRs related to native addons. loaders Issues and PRs related to ES module loaders

Comments

@bmacnaughton
Copy link
Contributor

bmacnaughton commented Oct 19, 2023

Version

v20.8.0

Platform

Linux wsl 5.15.90.1-microsoft-standard-WSL2 #1 SMP Fri Jan 27 02:56:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

module loaders

What steps will reproduce the bug?

Unfortunately, I am unable to reproduce this in a simple application; I have spent a lot of time trying to do so but am obviously missing something. I'll try to explain what I see.

we execute as a loader, node --loader our-loader user-application.

Prior to node 20, the loaders ran in the same thread as main. Our-loader initializes "data", which includes functions, in global. It then transforms user-application such that the "data" in global is referenced. Because the loader thread does not share global with the main thread, an error is thrown when referencing properties in "data". (This is easily reproducible, but I have not been able to create a small test case that reproduces the segfault.)

The "SyntaxError" that is thrown is our check to verify that the required "data" is in global. If I fake up "data" using globalPreload or register(), it will get past this check but will die when "data" is really needed.

How often does it reproduce? Is there a required condition?

Every time on node 20.0+

What is the expected behavior? Why is that the expected behavior?

No segfault.

What do you see instead?

 node --loader ./dist/esm-loader.mjs  test/_servers/express.mjs
(node:24127) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
(node:24127) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
[--loader load() hook: test/_servers/express.mjs, module, id: 1]
(node:24127) ExperimentalWarning: Custom ESM Loaders is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
[--loader load() hook: node:child_process, builtin, id: 1]
[--loader load() hook: node:fs, builtin, id: 1]
file:///patch-thing/test/_servers/express.mjs:3
  throw new SyntaxError("data undefined during compilation");
        ^

SyntaxError: data undefined during compilation
    at file:///patch-thing/test/_servers/express.mjs:3:9
    at file:///patch-thing/test/_servers/express.mjs:4:3
    at ModuleJob.run (node:internal/modules/esm/module_job:192:25)

Node.js v20.0.0
Segmentation fault

Additional information

I submitted this not-very-good bug report because I've wrestled with this for a while now and have not been able to create a simple test case. The fact that it segfaults led me to submit this anyway with the hopes that someone that's been working on this can say "ok, this is the problem".

I have tried using --import rather than --loader and get the same result. I'm working on changing our code to use register() but doesn't seem likely to address the issue of global not being available in a worker-thread.

I am very interested in any suggestions on how to handle this. Maybe we need to start forcing import and/or require into the transformed user-application in order to reproduce global? We've tried to avoid that but maybe it's no longer possible.

@marco-ippolito marco-ippolito added the test_runner Issues and PRs related to the test runner subsystem. label Oct 19, 2023
@targos
Copy link
Member

targos commented Oct 19, 2023

Can you run it with gdb and post more information about the state when it segfaults? For example a stack trace with bt.

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 19, 2023

will do. i can post this is the meantime - it's just the vscode debug call stack right before the SyntaxError is thrown (maybe vscode has a way to copy the stack as text, but i don't know it). this is node 20.8.0
image

@bmacnaughton
Copy link
Contributor Author

not sure this is helpful, but if you'd like me to set any particular breakpoints and check values, let me know.

(gdb) set args  --loader ./dist/esm-loader.mjs  test/_servers/express.mjs
(gdb) run
Starting program: /home/bruce/github/nodejs/node/node_g --loader ./dist/esm-loader.mjs  test/_servers/express.mjs
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f02335ff640 (LWP 266861)]
[New Thread 0x7f0232dfe640 (LWP 266863)]
[New Thread 0x7f02325fd640 (LWP 266864)]
[New Thread 0x7f0231dfc640 (LWP 266865)]
[New Thread 0x7f02315fb640 (LWP 266866)]
[New Thread 0x7f0233c94640 (LWP 266868)]
[New Thread 0x7f0230dfa640 (LWP 266869)]
[New Thread 0x7f02305ff640 (LWP 266872)]
[New Thread 0x7f02303fe640 (LWP 266873)]
[New Thread 0x7f021b9ff640 (LWP 266874)]
[New Thread 0x7f021b7fe640 (LWP 266875)]
[New Thread 0x7f021b5fd640 (LWP 266876)]
[New Thread 0x7f021b3fc640 (LWP 266877)]
[New Thread 0x7f021b1fb640 (LWP 266878)]
[New Thread 0x7f021affa640 (LWP 266879)]
[New Thread 0x7f021adf9640 (LWP 266880)]
[New Thread 0x7f021abf8640 (LWP 266881)]
[New Thread 0x7f021a9f7640 (LWP 266882)]
[New Thread 0x7f021a7f6640 (LWP 266883)]
[New Thread 0x7f021a5f5640 (LWP 266884)]
[New Thread 0x7f021a3f4640 (LWP 266885)]
[New Thread 0x7f021a1f3640 (LWP 266886)]
[New Thread 0x7f0219ff2640 (LWP 266887)]
[New Thread 0x7f0219df1640 (LWP 266888)]
[New Thread 0x7f0219bf0640 (LWP 266889)]
[New Thread 0x7f02199ef640 (LWP 266890)]
[New Thread 0x7f02197ee640 (LWP 266891)]
[New Thread 0x7f02195ed640 (LWP 266892)]
[New Thread 0x7f02193ec640 (LWP 266893)]
[New Thread 0x7f02191eb640 (LWP 266894)]
[New Thread 0x7f0218fea640 (LWP 266895)]
[New Thread 0x7f0218de9640 (LWP 266896)]
[New Thread 0x7f0218be8640 (LWP 266897)]
[New Thread 0x7f02189e7640 (LWP 266898)]
[New Thread 0x7f02187e6640 (LWP 266899)]
[New Thread 0x7f02185e5640 (LWP 266900)]
[New Thread 0x7f02183e4640 (LWP 266901)]
[New Thread 0x7eff1ffff640 (LWP 266902)]
[New Thread 0x7eff1fdfe640 (LWP 266903)]
[New Thread 0x7eff1fbfd640 (LWP 266904)]
[New Thread 0x7eff1f9fc640 (LWP 266905)]
[New Thread 0x7eff1f7fb640 (LWP 266906)]
[New Thread 0x7eff1f5fa640 (LWP 266907)]
[New Thread 0x7eff1f3f9640 (LWP 266908)]
[New Thread 0x7eff1f1f8640 (LWP 266909)]
[New Thread 0x7eff1eff7640 (LWP 266910)]
[New Thread 0x7eff1edf6640 (LWP 266911)]
[New Thread 0x7eff1ebf5640 (LWP 266912)]
[New Thread 0x7eff1e9f4640 (LWP 266913)]
[New Thread 0x7eff1e7f3640 (LWP 266914)]
[New Thread 0x7eff1e5f2640 (LWP 266915)]
[New Thread 0x7eff1e3f1640 (LWP 266916)]
[New Thread 0x7eff1e1f0640 (LWP 266917)]
[New Thread 0x7eff1dfef640 (LWP 266918)]
[New Thread 0x7eff1ddee640 (LWP 266919)]
(node:266857) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("./dist/esm-loader.mjs", pathToFileURL("./"));'
(Use `node_g --trace-warnings ...` to show where the warning was created)
/home/bruce/github/patch-thing/test/_servers/skeleton.js:4
  throw new SyntaxError data undefined during compilation");
        ^

SyntaxError: data undefined during compilation
    at /home/bruce/github/patch-thing/test/_servers/skeleton.js:4:9
    at Object.<anonymous> (/home/bruce/github/patch-thing/test/_servers/skeleton.js:5:3)
    at loadCJSModule (node:internal/modules/esm/translators:208:3)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:234:7)
    at ModuleJob.run (node:internal/modules/esm/module_job:217:25)
    at async ModuleLoader.import (node:internal/modules/esm/loader:316:24)
    at async loadESM (node:internal/process/esm_loader:34:7)
    at async handleMainPromise (node:internal/modules/run_main:66:12)

Node.js v20.8.0
[Thread 0x7eff1ddee640 (LWP 266919) exited]
[Thread 0x7eff1dfef640 (LWP 266918) exited]
[Thread 0x7eff1e1f0640 (LWP 266917) exited]
[Thread 0x7eff1e3f1640 (LWP 266916) exited]
[Thread 0x7eff1e7f3640 (LWP 266914) exited]
[Thread 0x7eff1edf6640 (LWP 266911) exited]
[Thread 0x7f02185e5640 (LWP 266900) exited]
[Thread 0x7f02189e7640 (LWP 266898) exited]
[Thread 0x7f02187e6640 (LWP 266899) exited]
[Thread 0x7f0218be8640 (LWP 266897) exited]
[Thread 0x7f0218de9640 (LWP 266896) exited]
[Thread 0x7eff1e9f4640 (LWP 266913) exited]
[Thread 0x7f0218fea640 (LWP 266895) exited]
[Thread 0x7f02191eb640 (LWP 266894) exited]
[Thread 0x7f02193ec640 (LWP 266893) exited]
[Thread 0x7f02197ee640 (LWP 266891) exited]
[Thread 0x7eff1fdfe640 (LWP 266903) exited]
[Thread 0x7f0219bf0640 (LWP 266889) exited]
[Thread 0x7eff1fbfd640 (LWP 266904) exited]
[Thread 0x7f021a3f4640 (LWP 266885) exited]
[Thread 0x7eff1e5f2640 (LWP 266915) exited]
[Thread 0x7f021a5f5640 (LWP 266884) exited]
[Thread 0x7f021a7f6640 (LWP 266883) exited]
[Thread 0x7eff1ffff640 (LWP 266902) exited]
[Thread 0x7f021a9f7640 (LWP 266882) exited]
[Thread 0x7f02195ed640 (LWP 266892) exited]
[Thread 0x7eff1f3f9640 (LWP 266908) exited]
[Thread 0x7f021affa640 (LWP 266879) exited]
[Thread 0x7f0219ff2640 (LWP 266887) exited]
[Thread 0x7f021b1fb640 (LWP 266878) exited]
[Thread 0x7eff1f9fc640 (LWP 266905) exited]
[Thread 0x7f0219df1640 (LWP 266888) exited]
[Thread 0x7f021b3fc640 (LWP 266877) exited]
[Thread 0x7eff1f1f8640 (LWP 266909) exited]
[Thread 0x7f021b7fe640 (LWP 266875) exited]
[Thread 0x7f021a1f3640 (LWP 266886) exited]
[Thread 0x7f02303fe640 (LWP 266873) exited]
[Thread 0x7eff1f7fb640 (LWP 266906) exited]
[Thread 0x7f021b9ff640 (LWP 266874) exited]
[Thread 0x7eff1ebf5640 (LWP 266912) exited]
[Thread 0x7eff1eff7640 (LWP 266910) exited]
[Thread 0x7eff1f5fa640 (LWP 266907) exited]
[Thread 0x7f02183e4640 (LWP 266901) exited]
[Thread 0x7f02199ef640 (LWP 266890) exited]
[Thread 0x7f021abf8640 (LWP 266881) exited]
[Thread 0x7f021adf9640 (LWP 266880) exited]
[Thread 0x7f021b5fd640 (LWP 266876) exited]
[Thread 0x7f02305ff640 (LWP 266872) exited]

Thread 8 "node_g" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f0230dfa640 (LWP 266869)]
0x00007f023080ccd0 in ?? ()
(gdb) bt
#0  0x00007f023080ccd0 in ?? ()
#1  0x00007f0233691691 in __GI___nptl_deallocate_tsd ()
    at ./nptl/nptl_deallocate_tsd.c:73
#2  __GI___nptl_deallocate_tsd () at ./nptl/nptl_deallocate_tsd.c:22
#3  0x00007f023369494a in start_thread (arg=<optimized out>)
    at ./nptl/pthread_create.c:453
#4  0x00007f0233726a40 in clone3 ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) 

@bnoordhuis
Copy link
Member

Looks like memory corruption at first sight; it's crashing in the glibc function that cleans up thread-local storage. Can you run node under valgrind and see what it reports?

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 19, 2023

sorry this took a while (this is with the debug version of node 20.8.0).
issue-50280.txt

this is the terminal output:

(node:317302) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("./dist/esm-loader.mjs", pathToFileURL("./"));'
(Use `node_g --trace-warnings ...` to show where the warning was created)
/home/bruce/github/patch-thing/test/_servers/skeleton.js:4
  throw new SyntaxError("__contrast undefined during compilation");
        ^

SyntaxError: data undefined during compilation
    at /home/bruce/github/patch-thing/test/_servers/skeleton.js:4:9
    at Object.<anonymous> (/home/bruce/github/patch-thing/test/_servers/skeleton.js:5:3)
    at loadCJSModule (node:internal/modules/esm/translators:208:3)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:234:7)
    at ModuleJob.run (node:internal/modules/esm/module_job:217:25)
    at async ModuleLoader.import (node:internal/modules/esm/loader:316:24)
    at async loadESM (node:internal/process/esm_loader:34:7)
    at async handleMainPromise (node:internal/modules/run_main:66:12)

Node.js v20.8.0
Segmentation fault (core dumped)

@bmacnaughton
Copy link
Contributor Author

should i rerun this with any of the following options?

--error-limit=no
--leak-check=full
--track-origins=yes

@bnoordhuis
Copy link
Member

The log file didn't provide additional clues, I'm afraid. Do you load any native modules (files with a .node suffix)?

@bmacnaughton
Copy link
Contributor Author

yes on .node files, but they have been used in production and tested with valgrind independently, so i don't suspect them. still, worth a double check.

@bnoordhuis
Copy link
Member

Can you still reproduce the crash when you exclude all .node files?

@bmacnaughton
Copy link
Contributor Author

thank you! i have isolated the crash to one file and am going to work on isolating it further. i will update this.

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 20, 2023

posting intermediate findings in case they bring something to mind.
- requiring the binary is not a problem
- deleting all of our loader's .d.ts files that reference the binary package (to get its d.ts definitions), prevents the segfault.

more to come.

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 21, 2023

OK, excluding all the .node files prevents the crash. If you'd like, I can close this issue and open a new issue - I'm at a loss on how to handle .node files that are being handled by an esm load() hook. (My original goal was just to update our rewrite code to handle node 20.)

before node 20, --loader/--experimental-loader worked as follows:

  • our rewrite code reads JavaScript source code for a user application, transforms it, and returns that transformed source from the load hook (or calls nextLoad)
    • if the format is not 'commonjs' or 'module', we just return nextLoad(url, context)

using --loader with node 20.8.1, there are situations where we're loading a module-based user-application, e.g., test-app.mjs, that ends up requiring a .node file:

executing:
node --loader ./dist/esm-compat-loader.mjs user-app.js

user-app.mjs:

import { default as w } from './user-w.js';
console.log('w', w);

setTimeout(() => {
  console.log('done');
}, 1000);

w.js:

const x = require('./x.js');
module.exports = x;

x.js:

module.exports = require('@contrast/agent-lib');

result:

TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".node" for /home/bruce/tmp/patch-o-leum/node_modules/@contrast/agent-lib/contrast.linux-x64-gnu.node
    at new NodeError (node:internal/errors:406:5)
    at Object.getFileProtocolModuleFormat [as file:] (node:internal/modules/esm/get_format:99:9)
    at defaultGetFormat (node:internal/modules/esm/get_format:142:36)
    at defaultLoad (node:internal/modules/esm/load:120:20)
    at nextLoad (node:internal/modules/esm/hooks:833:28)
    at load (file:///home/bruce/tmp/patch-o-leum/dist/esm-compat-loader.mjs:175:12)
    at nextLoad (node:internal/modules/esm/hooks:833:28)
    at Hooks.load (node:internal/modules/esm/hooks:416:26)
    at handleMessage (node:internal/modules/esm/worker:168:24)
    at Immediate.checkForMessages (node:internal/modules/esm/worker:117:28) {
  code: 'ERR_UNKNOWN_FILE_EXTENSION'
}

node:internal/process/esm_loader:40
      internalBinding('errors').triggerUncaughtException(
                                ^
TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".node" for /home/bruce/tmp/patch-o-leum/node_modules/@contrast/agent-lib/contrast.linux-x64-gnu.node
    at new NodeError (node:internal/errors:406:5)
    at Object.getFileProtocolModuleFormat [as file:] (node:internal/modules/esm/get_format:99:9)
    at defaultGetFormat (node:internal/modules/esm/get_format:142:36)
    at defaultLoad (node:internal/modules/esm/load:120:20)
    at nextLoad (node:internal/modules/esm/hooks:833:28)
    at load (file:///home/bruce/tmp/patch-o-leum/dist/esm-compat-loader.mjs:175:12)
    at nextLoad (node:internal/modules/esm/hooks:833:28)
    at Hooks.load (node:internal/modules/esm/hooks:416:26)
    at handleMessage (node:internal/modules/esm/worker:168:24)
    at Immediate.checkForMessages (node:internal/modules/esm/worker:117:28) {
  code: 'ERR_UNKNOWN_FILE_EXTENSION'
}

The same error occurs when using --import (node --import ./dist/esm-compat-loader.mjs user-app.js).

Converting user-app to a .mjs file:

user-app.mjs:

import { default as w } from './w.js';
console.log('w', w);

setTimeout(() => {
  console.log('done');
}, 1000);

results in the same error. In all cases, it's throwing because the code (in both the exported or register()'d hook) returns nextLoad() when the format is not 'commonjs' or 'module':

  if (format !== 'commonjs' && format !== 'module') {
    console.debug({ load: url, format }, 'Skipping rewrite; loading original code');
    return nextLoad(url, context);
  }

My question is how are we supposed to handle .node extensions? I've tried returning {format, source: null } which results in the error:

Error [ERR_LOADER_CHAIN_INCOMPLETE]: "./dist/esm-v20-6-loader.mjs 'load'" did not call the next hook in its chain and did not explicitly signal a short circuit. If this is intentional, include `shortCircuit: true` in the hook's return.
    at new NodeError (node:internal/errors:406:5)
    at Hooks.load (node:internal/modules/esm/hooks:424:13)
    at async handleMessage (node:internal/modules/esm/worker:168:18) {
  code: 'ERR_LOADER_CHAIN_INCOMPLETE'
}

Changing the return to {format, source: null, shortCircuit: true } results in:

RangeError [ERR_UNKNOWN_MODULE_FORMAT]: Unknown module format: undefined for URL file:///home/bruce/tmp/patch-o-leum/node_modules/@contrast/agent-lib/contrast.linux-x64-gnu.node
    at new NodeError (node:internal/errors:406:5)
    at Object.throwUnknownModuleFormat (node:internal/modules/esm/load:233:9)
    at Hooks.load (node:internal/modules/esm/hooks:456:44)
    at async handleMessage (node:internal/modules/esm/worker:168:18) {
  code: 'ERR_UNKNOWN_MODULE_FORMAT'
}

And, as a last gasp, I try creating a require in the loader:

  if (format !== 'commonjs' && format !== 'module') {
    console.debug({ load: url, format }, 'Skipping rewrite; loading original code');
    // new hack start
    if (path.extname(filename) === '.node') {
      const req = createRequire(import.meta.url);
      const m = req(filename);
      return { format, source: m, shortCircuit: true };
    }
    // new hack end
    return nextLoad(url, context);
  }

which results in a segfault:

RangeError [ERR_UNKNOWN_MODULE_FORMAT]: Unknown module format: undefined for URL file:///home/bruce/tmp/patch-o-leum/node_modules/@contrast/agent-lib/contrast.linux-x64-gnu.node
    at new NodeError (node:internal/errors:406:5)
    at Object.throwUnknownModuleFormat (node:internal/modules/esm/load:233:9)
    at Hooks.load (node:internal/modules/esm/hooks:456:44)
    at async handleMessage (node:internal/modules/esm/worker:168:18) {
  code: 'ERR_UNKNOWN_MODULE_FORMAT'
}

Node.js v20.8.1
Segmentation fault

If I change to use a different .node file, it doesn't segfault, so I have something to look into there. But it does still give the ERR_UNKNOWN_MODULE_FORMAT, so I see no way to load a .node file when using either --loader or --import with node 20.

There is an additional issue (should I create another issue?) with global not being propagated to the worker thread that is executing the load() function, but I think I have a workaround for that using either globalPreload or the initialize() function. I still have some work to do on that, but what I think will work is creating a mock in the worker-thread's global so that when it starts executing in the main thread, it will pick up the correct version of global.

Sorry for such a dump here - I know that what we're doing is not mainstream, but we have no alternatives.

@bmacnaughton
Copy link
Contributor Author

^^^ if all that isn't clear (and I've been buried in this, so I might unintentionally omit some things), I can clean up my test programs and create a repo that contains repros for each.

@koshic
Copy link

koshic commented Oct 21, 2023

^^^ if all that isn't clear (and I've been buried in this, so I might unintentionally omit some things), I can clean up my test programs and create a repo that contains repros for each.

It still not clear. I've created simple test with my own TS loader (very simple, used in a lot of internal projects), and run the following code with --import=my-loader:

import { createRequire } from "node:module";

const require = createRequire(import.meta.url);

const addon = require("./addon.node");

console.log(addon); 

It works, I see exports from addon. But! Both load and resolve hooks don't executed for .node file, which is expected according to docs.

So, what is your task? Is to replace require('./xxx.js') to kind of require('./addon.node')? Or you want to redirect import('xxx') to .node file?

Update: with the following code in resolve hook I can load .node file without any errors:

image image image

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 21, 2023

Thanks for taking a look. I'm working on making a relatively clean repo that allows for easy repros. In the meantime, maybe this helps?

my task

  • to transform certain patterns in the user's code from within the load() hook. it's analogous to converting typescript => javascript. all I want to do with .node files it to ignore them, but once they appear in the load() hook, I don't see how to do so.

I have not used a resolve hook (and honestly, I've had a hard time seeing how they would help me, so maybe they're just a big hole in my knowledge). Here's a quick attempt at clarifying things:

Our application is the loader hook, specified after either --import or --loader.

An arbitrary user application (and all that it requires) is the target for our rewriting.

So node --loader our-esm-hook user-application.mjs.

If the user-application is commonjs, we specific that the command line is node -r our-req-hook user-application.js (and we use the Module.prototype._compile replacement technique).

I do now know which .node files might be loaded by the user-application. The specific situation that I tried to document above was:

user-application.mjs is loaded

  • it executes import {default as x} from './user-x.js' (because import can access common js files)
  • user-x.js executes const thing = require('package') which selects the proper .node file for the architecture and loads it

that .node file appears in the load hook.

@koshic
Copy link

koshic commented Oct 21, 2023

@bnoordhuis is it proper initial setup - https://github.com/koshic/addon-test and node ./user-application.mjs? If yes, we can add additional file / import and use loader to rewrite it.

@bmacnaughton
Copy link
Contributor Author

repo: https://github.com/bmacnaughton/patch-o-matic

dist/esm-loader-compat.mjs could be simpler, but I wanted it to be the common entry point and to include require patching (for --loader) to demonstrate that the require hooks are not being called.

@koshic
Copy link

koshic commented Oct 21, 2023

Which command should I use? And what is expected result?

@cjihrig cjihrig removed the test_runner Issues and PRs related to the test runner subsystem. label Oct 21, 2023
@bmacnaughton
Copy link
Contributor Author

They each demonstrate different results. I've added them to the README. I'm sorry I didn't do that proactively.

To demonstrate that the --loader hook sees the .node file even when it's required, run esm-loader-js-no-segfault or esm-loader-mjs-no-segfault. To demonstrate that the --import hook sees the .node file when it's required, use esm-import-js-no-segfault or esm-import-mjs-no-segfault.

And you can run req-loader.js to show that patching module.prototype._compile works as expected.

@bnoordhuis
Copy link
Member

OK, excluding all the .node files prevents the crash. If you'd like, I can close this issue and open a new issue

We don't accept bug reports for native modules unless it can be conclusively shown it's a bug in node itself (because, quite literally, 99% of bug reports about native modules are bugs in the module itself, not node.)

@bnoordhuis bnoordhuis added the addons Issues and PRs related to native addons. label Oct 22, 2023
@koshic
Copy link

koshic commented Oct 22, 2023

OK, excluding all the .node files prevents the crash. If you'd like, I can close this issue and open a new issue

We don't accept bug reports for native modules unless it can be conclusively shown it's a bug in node itself (because, quite literally, 99% of bug reports about native modules are bugs in the module itself, not node.)

Sadly, it's a Node issue, related to CJS loader used from ESM loader via hooks.

I found easy way to reproduce it - https://github.com/koshic/addon-test. 'addon.node' can be any valid native module, it used only to demonstrate behavior difference.

At first, we have 1.cjs file which loads 'addon.node' twice - with relative path, and with absolute path:

// 1.cjs
console.log(require("./addon.node"));
console.log(require(require('path').resolve("./addon.node")));

It works via node 1.cjs as expected, without any issues.

Next, let's use load hook returns source with relative path require:

// hook-relative.mjs
export const load = async (url, context, nextLoad) => {
    if (context.format === 'commonjs') {
        return {
            source: `console.log(require("./addon.node"));`,
            format: "commonjs",
            shortCircuit: true,  
        }
    }

    return nextLoad(url, context);
}  

node --loader=./hook-relative.mjs 1.cjs works fine.

Last try - to use hook with absolute path in require:

// hook-absolute.mjs
export const load = async (url, context, nextLoad) => {
    if (context.format === 'commonjs') {
        return {
            source: `require(require('path').resolve("./addon.node"))`,
            format: "commonjs",
            shortCircuit: true,  
        }
    }

    return nextLoad(url, context);
}  

And we see an error:

@koshic ➜ /workspaces/addon-test (main) $ node --loader=./hook-absolute.mjs 1.cjs
(node:20147) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("./hook-absolute.mjs", pathToFileURL("./"));'
(Use `node --trace-warnings ...` to show where the warning was created)

node:internal/process/esm_loader:40
      internalBinding('errors').triggerUncaughtException(
                                ^
TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".node" for /workspaces/addon-test/addon.node
    at new NodeError (node:internal/errors:406:5)
    at Object.getFileProtocolModuleFormat [as file:] (node:internal/modules/esm/get_format:99:9)
    at defaultGetFormat (node:internal/modules/esm/get_format:142:36)
    at defaultLoad (node:internal/modules/esm/load:120:20)
    at nextLoad (node:internal/modules/esm/hooks:833:28)
    at load (file:///workspaces/addon-test/hook-absolute.mjs:10:12)
    at nextLoad (node:internal/modules/esm/hooks:833:28)
    at Hooks.load (node:internal/modules/esm/hooks:416:26)
    at handleMessage (node:internal/modules/esm/worker:168:24)
    at checkForMessages (node:internal/modules/esm/worker:117:28) {
  code: 'ERR_UNKNOWN_FILE_EXTENSION'
}

Node.js v20.8.0
@koshic ➜ /workspaces/addon-test (main) $ 

There is not segfault, but I think it depends on the particular setup with deeper import / require chains.

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 22, 2023

@koshic. Your observation makes sense - both addons return absolute paths (only their index.js files are referenced using relative paths - each has code to select the right binary for the architecture that returns an absolute path). Thank you for isolating the issue.

@bnoordhuis - I'm sorry this got so long; it was an interactive exchange; I had some trouble making the context clear. There are two issues here:

  • the esm loader sees the .node file and there is no way to handle it. that is not dependent on a specific addon.
  • the crash with a particular node file.

I am not ready to say that the .node file that causes the crash is absolutely node's problem, but we've been using it since node 14, run it with valgrind, and have an extensive test suite, so it's definitely a possibility that node 20 introduced an issue. (It's also a possibility that node 20 unmasked a latent issue.) Today I'm going to work with a debug version of node to see if I can't narrow down the issue a bit. I'm not at all familiar with all the new work for loaders, so it's slow going.

@bnoordhuis
Copy link
Member

@nodejs/loaders can you confirm whether #50280 (comment) is because of the removal of legacyExtensionFormatMap in #44859? It removed .node suffix from the list of recognized file extensions so I assume it's intentional?

@GeoffreyBooth GeoffreyBooth added the loaders Issues and PRs related to ES module loaders label Oct 22, 2023
@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 22, 2023

@bnoordhuis - thank you. You're much quicker than me; I just found that it gets down to internal/modules/esm/get_format#getFileProtocolModuleFormat() where getPackageType() returns 'none' so it throws on the unknown file extension.

The segfault is the result of one of my attempts to make createRequire() solve the issue. The following code in my load() hook used createRequire to load the module.

  if (format !== 'commonjs' && format !== 'module') {
    console.debug({ load: url, format }, 'Skipping rewrite; loading original code');
    // new hack start
    if (path.extname(filename) === '.node') {
      const req = createRequire(import.meta.url);
      const m = req(filename);
      return { format, source: m, shortCircuit: true };
    }
    // new hack end
    return nextLoad(url, context);
  }

The code doesn't solve the problem, but the one .node module causes a segfault while the other does not. I'm going to continue looking at it to see if I can isolate it, but I'm guessing that it's in the C++ code and my progress will be slow. I'm also guessing some code isn't expecting that format of binary data.

@GeoffreyBooth
Copy link
Member

@nodejs/loaders can you confirm whether #50280 (comment) is because of the removal of legacyExtensionFormatMap in #44859? It removed .node suffix from the list of recognized file extensions so I assume it’s intentional?

I doubt it. legacyExtensionFormatMap was only used by --experimental-specifier-resolution, which that PR removed, and that was many versions of Node ago.

Can the issue be reproduced with one of the .node files in https://github.com/nodejs/node? Like one of the fixtures under test/. That would help narrow things down for the minimal reproduction.

With regard to the segfaulting .node file, perhaps it has something to do with being run off-thread. If you require that .node file from a worker thread, with no loaders involved, does it work? It could also perhaps be segfaulting because of globals that it expects to find that aren’t present in the hooks thread.

I’m having trouble following everything in this thread, but there might be an issue here in terms of nextLoad not supporting .node files. That used to not be an issue because these hooks only supported import, but they recently were extended to support require too, so now they might be getting triggered for .node files where they weren’t before. Having a minimal reproduction, ideally using a .node file in the Node repo, would help narrow things down.

Note that in Node 21 globalPreload has been removed, so you should refactor away from it. See #49493 and linked PRs for some examples of how to do so.

@koshic
Copy link

koshic commented Oct 22, 2023

@GeoffreyBooth I've already created minimal repro - https://github.com/koshic/addon-test. Just clone it (or use GitHub codespaces) and run node --loader=./hook-absolute.mjs 1.cjs. As you can see on the stack trace (detail - #50280 (comment)), Node failed to resolve '.node' file, and doesn't load it.

I can remove .node file from the example if you want, because error is related to resolver, not loader.

@targos
Copy link
Member

targos commented Oct 22, 2023

The segfault is probably related to loading the addon from the loader thread, or from both loader and main threads.

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 22, 2023

@GeoffreyBooth Thank you for the thoughts. Yes, the thread is a mess largely because of me. This started with one issue that I couldn't reproduce, and turned into multiple other issues.

globals that don't exist in another thread

Yes, that was what appeared to be the problem initially - it was a bit of a surprise to me but I have been able to work around that by using either globalPreload() prior to node 20.6.0 or initialize() for 20.6.0+. (I create a "mock" setup in global so that the compile works in another thread.) But originally, compile errors due to non-defined globals were causing segfaults during the compilation. The code in the patch-o-matic repo comes from my attempts to reproduce that in a more contained environment - I have not been able to do so.

the segfaulting

Requiring that specific .node file works fine when using require() that is handled by module.prototype._compile. And requiring the file in load() (using the result of createRequire() succeeds. The segfault in the patch-o-matic code is due to returning it as the source from the load() hook:

      const req = createRequire(import.meta.url);
      const m = req(filename);
      return { format, source: m, shortCircuit: true };

This was just an abortive attempt of mine trying to find a way to get the esm hook to handle the .node file.

minimal repro

@koshic's repro is mininal. As he's noted, it's happening before the module is loaded - it's the result of internal/modules/esm/get_format#getFileProtocolModuleFormat() where getPackageType() returns 'none' so it throws on the unknown file extension.

Again, thanks for trying to follow this very messy and unfocused thread. The nature of the problem changed as I dug into it. There is still a potential segfault with code within the context of module.evaluate() throwing an error, but I have only a big, messy ability to reproduce it.

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 22, 2023

A minor issue I also ran into, I attempted to return source: null from the load() hook to get the CommonJS loader to handle it (https://nodejs.org/docs/latest-v20.x/api/module.html#loadurl-context-nextload) but only got errors about invalid source - I don't recall the exact error.

@GeoffreyBooth
Copy link
Member

@koshic’s repro is mininal. As he’s noted, it’s happening before the module is loaded - it’s the result of internal/modules/esm/get_format#getFileProtocolModuleFormat() where getPackageType() returns ’none’ so it throws on the unknown file extension.

You need to define a resolve hook to override Node throwing on unknown extensions. See the example in https://nodejs.org/api/module.html#transpilation.

@bmacnaughton
Copy link
Contributor Author

I don't follow how that would work. The resolve hook can look for any .node file, but what should it do with them? I think I'm missing a fundamental concept on how resolve hooks are intended to be used.

@koshic
Copy link

koshic commented Oct 22, 2023

You need to define a resolve hook to override Node throwing on unknown extensions. See the example in https://nodejs.org/api/module.html#transpilation.

Why? That code works fine without resolve hook, via node --loader=./hook-relative.mjs 1.cjs:

export const load = async (url, context, nextLoad) => {
    if (context.format === 'commonjs') {
        return {
            source: `console.log(require("./addon.node"));`,
            format: "commonjs",
            shortCircuit: true,  
        }
    }

    return nextLoad(url, context);
}  

Node resolver should work consistently, regardless of 'resolve' hook presence:

  • If '.node' is known extension, both './addon.node' and '/workspaces/addon-test/addon.node' should be resolved.
  • If '.node' is unknown extension, both './addon.node' and '/workspaces/addon-test/addon.node' should fail.

What we see - require('./addon.node') works, require('/workspaces/addon-test/addon.node') doesn't.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Oct 22, 2023

In ESM Node throws on unknown or unsupported extensions. That’s why in https://nodejs.org/api/module.html#transpilation there’s a resolve hook to provide successful resolution for .coffee files, to make them not throw.

This “minimal” example is somewhat convoluted, in that a loader is returning source that contains require statements that I guess get run. Is that necessary? The original 1.js contains require calls.

I don’t think we currently support resolve for require calls; I’m not sure. .node is supported in require, so maybe that’s why it’s not throwing (or not throwing some of the time).

@koshic
Copy link

koshic commented Oct 22, 2023

@GeoffreyBooth, ok, let's simplify setup.

Prerequsite:

  • touch 1.cjs // yeah, it may be empty
  • touch 1.node // empty too

Case 1 - 'load' hook returns CJS source require('./1.node'). Works as expected.

cat > hook1.mjs << EOF
export const load = (url, context, nextLoad) => {
    if (url.endsWith('1.cjs')) {
        return {
            source: 'require("./addon.node")',
            format: 'commonjs',
            shortCircuit: true
        }
    }

    return nextLoad(url, context);
}
EOF

node --loader ./hook1.mjs 1.cjs

@koshic ➜ /workspaces/addon-test (main) $ node --loader ./hook1.mjs 1.cjs
(node:2392) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("./hook1.mjs", pathToFileURL("./"));'
(Use `node --trace-warnings ...` to show where the warning was created)
node:internal/modules/cjs/loader:1327
  return process.dlopen(module, path.toNamespacedPath(filename));
                 ^

Error: /workspaces/addon-test/1.node: file too short
    at Module._extensions..node (node:internal/modules/cjs/loader:1327:18)
    at Module.load (node:internal/modules/cjs/loader:1091:32)
    at Module._load (node:internal/modules/cjs/loader:938:12)
    at require (node:internal/modules/esm/translators:185:30)
    at Object.<anonymous> (/workspaces/addon-test/1.cjs:1:1)
    at loadCJSModule (node:internal/modules/esm/translators:208:3)
    at ModuleWrap.<anonymous> (node:internal/modules/esm/translators:234:7)
    at ModuleJob.run (node:internal/modules/esm/module_job:217:25)
    at async ModuleLoader.import (node:internal/modules/esm/loader:316:24)
    at async loadESM (node:internal/process/esm_loader:34:7) {
  code: 'ERR_DLOPEN_FAILED'
}

Node.js v20.8.0
@koshic ➜ /workspaces/addon-test (main) $ 

Error: /workspaces/addon-test/1.node: file too short - 1.node resolved, but can't be loaded because empty. It's ok.

Case 2 - 'load' hook returns CJS source with absolute path (you can insert path suitable for your machine) inside require. Doesn't work as expected.

cat > hook2.mjs << EOF
export const load = (url, context, nextLoad) => {
    if (url.endsWith('1.cjs')) {
        return {
            source: 'require("/workspaces/addon-test/1.node")',
            format: 'commonjs',
            shortCircuit: true
        }
    }

    return nextLoad(url, context);
}
EOF

node --loader ./hook2.mjs 1.cjs

@koshic ➜ /workspaces/addon-test (main) $ node --loader ./hook2.mjs 1.cjs
(node:5358) ExperimentalWarning: `--experimental-loader` may be removed in the future; instead use `register()`:
--import 'data:text/javascript,import { register } from "node:module"; import { pathToFileURL } from "node:url"; register("./hook2.mjs", pathToFileURL("./"));'
(Use `node --trace-warnings ...` to show where the warning was created)

node:internal/process/esm_loader:40
      internalBinding('errors').triggerUncaughtException(
                                ^
TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".node" for /workspaces/addon-test/1.node
    at new NodeError (node:internal/errors:406:5)
    at Object.getFileProtocolModuleFormat [as file:] (node:internal/modules/esm/get_format:99:9)
    at defaultGetFormat (node:internal/modules/esm/get_format:142:36)
    at defaultLoad (node:internal/modules/esm/load:120:20)
    at nextLoad (node:internal/modules/esm/hooks:833:28)
    at load (file:///workspaces/addon-test/hook2.mjs:10:12)
    at nextLoad (node:internal/modules/esm/hooks:833:28)
    at Hooks.load (node:internal/modules/esm/hooks:416:26)
    at MessagePort.handleMessage (node:internal/modules/esm/worker:168:24)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:807:20) {
  code: 'ERR_UNKNOWN_FILE_EXTENSION'
}

Node.js v20.8.0
@koshic ➜ /workspaces/addon-test (main) $ 

TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".node" for /workspaces/addon-test/1.node

Why do we have different results for the same file depends on path form, relative / absolute? Without loader, in plain .cjs file, both variants work without difference.

@bmacnaughton
Copy link
Contributor Author

Give bmacnaughton/patch-o-matic a try. Clone it, run npm install, and execute npm run only, which executes

SEGFAULT=no node --loader ./dist/esm-loader-only.mjs user-app.mjs (the env var just selects the .node file that doesn't cause a segfault).

user-app.mjs:

import { default as x } from './user-x.js';

setTimeout(() => {
  console.log('done');
}, 1000);

user-x.js:

const m = process.env.SEGFAULT === 'no' ? '@contrast/distringuish' : '@contrast/agent-lib';
console.log(`requiring ${m}`);

module.exports = require(m);

With the resolve() hook as it is, I try to do supply the required .node file in resolve(). With that, I get:

ELF
^

SyntaxError: Invalid or unexpected token
segfault

If you change the name of resolve() to _resolve(), so it's not a hook, I get:

node:internal/process/esm_loader:40
      internalBinding('errors').triggerUncaughtException(
                                ^
TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".node" for /home/bruce/tmp/patch-o-matic/node_modules/@contrast/agent-lib/contrast.linux-x64-gnu.node

What am I supposed to do in the resolve() hook so it works?

@aduh95
Copy link
Contributor

aduh95 commented Oct 22, 2023

@koshic can you try with:

cat > hook3.mjs << 'EOF'
export const load = (url, context, nextLoad) => {
    if (url.endsWith('1.cjs')) {
        return {
            source: `require("node:module").createRequire(${JSON.stringify(url)})("./1.node")`,
            format: 'commonjs',
            shortCircuit: true,
        }
    }

    return nextLoad(url, context);
}
EOF

@koshic
Copy link

koshic commented Oct 22, 2023

@aduh95 Error: /workspaces/addon-test/1.node: file too short - works as expected

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Oct 22, 2023

@aduh95 - here's one additional piece of data - your solution causes the loader to be called again with the url 'node:module' but the format undefined.

I made the changes to the esm-loader-only.mjs#load() function in the patch-o-matic repo and the .node module loads correctly:

  if (format !== 'commonjs' && format !== 'module') {
    console.debug({ load: url, format }, 'Skipping rewrite; loading original code');
    if (path.extname(filename) === '.node') {
      return {
        source: `require("module").createRequire("${url}")("${filename}");`,
        format: 'commonjs',
        shortCircuit: true,
      };
    }
    return nextLoad(url, context);
  }

I had to remove the node: prefix in require("node:module") or it failed as follows because the load hook was being called again with the url "node:module" but format is undefined, not builtin. this is specific to the code in esm-loader-only.mjs in that it returns immediately if the format is 'builtin'. And when called again, the url module interprets node: as the protocol.

TypeError [ERR_INVALID_URL_SCHEME]: The URL must be of scheme file
    at new NodeError (node:internal/errors:406:5)
    at fileURLToPath (node:internal/url:1393:11)
    at load (file:///home/bruce/github/bmacnaughton/patch-o-matic/dist/esm-loader-only.mjs:33:20)
    at nextLoad (node:internal/modules/esm/hooks:833:28)
    at Hooks.load (node:internal/modules/esm/hooks:416:26)
    at handleMessage (node:internal/modules/esm/worker:168:24)
    at Immediate.checkForMessages (node:internal/modules/esm/worker:117:28)
    at process.processImmediate (node:internal/timers:478:21) {
  code: 'ERR_INVALID_URL_SCHEME'
}

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Nov 4, 2023

If anyone is interested in tracking down the segfault, https://github.com/bmacnaughton/patch-o-matic, reproduces it.

So far, I have not been able to produce a segfault unless I load @contrast/agent-lib. I have tried to reproduce the segfault with other .node files (using C++, rust/napi-rs) but have had no success.

I'm happy to close this if that's what is appropriate. I can always reopen or create a new issue if and when I am able to further isolate the issue.

@bnoordhuis
Copy link
Member

None of us is going to debug someone else's binary blob so I'll be so free to close this. You should report this to them.

@bnoordhuis bnoordhuis closed this as not planned Won't fix, can't repro, duplicate, stale Nov 4, 2023
@kvenn
Copy link

kvenn commented Feb 25, 2024

@bmacnaughton were you able to find a fix or workaround? I'm having this issue as well on node 20 with esm. Same setup was working on node 19 and the segfault started after upgrading (only while debugging).

@bmacnaughton
Copy link
Contributor Author

@bmacnaughton were you able to find a fix or workaround? I'm having this issue as well on node 20 with esm. Same setup was working on node 19 and the segfault started after upgrading (only while debugging).

I can help with this next week; am traveling now.

@bmacnaughton
Copy link
Contributor Author

The abbreviated answer is that the ESM load hook needs to check to see if the file being loaded is an addon:

  if (urlObject.pathname.endsWith('.node')) {
    const metaUrl = JSON.stringify(url);
    const addon = urlObject.pathname;
    return {
      source: `require("node:module").createRequire(${metaUrl})("${addon}")`,
      format: 'commonjs',
      shortCircuit: true,
    };
  }

This was the solution suggested by aduh95 and koshic, both of whom were great help.

@kvenn
Copy link

kvenn commented Mar 6, 2024

You wrote this in a custom loader and did something like this?

node --experimental-loader ./custom-loader.mjs your-app.js

Thank you!

@bmacnaughton
Copy link
Contributor Author

yes, that code is in the load hook in the custom-loader.mjs file.

fwiw, --loader and --experimental-loader are synonyms, so you can save a bit of typing.

@bmacnaughton
Copy link
Contributor Author

bmacnaughton commented Mar 6, 2024

oops - no, that's what I use with the --import option, not --loader. i was not able to make things work using --loader with node 20. sorry, it's been a bit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addons Issues and PRs related to native addons. loaders Issues and PRs related to ES module loaders
Projects
None yet
Development

No branches or pull requests

9 participants