Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESM loaders cannot be defined via Worker option execArgv in v20 #47747

Open
cjihrig opened this issue Apr 27, 2023 · 60 comments
Open

ESM loaders cannot be defined via Worker option execArgv in v20 #47747

cjihrig opened this issue Apr 27, 2023 · 60 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders worker Issues and PRs related to Worker support.

Comments

@cjihrig
Copy link
Contributor

cjihrig commented Apr 27, 2023

Version

20.0.0 (tested on main at 36e4e3d too)

Platform

macOS but probably all platforms

Subsystem

esm

What steps will reproduce the bug?

The following works in Node 18 and 19, but not 20:

// main.mjs
import { Worker } from 'node:worker_threads';

new Worker('./worker.js', {
  execArgv: ['--experimental-loader', './loader.mjs'],
});
// worker.js
'use strict';

async function main() {
  await import('node:fs');
}

main();
// loader.mjs
export function resolve (specifier, context, nextResolve) {
  throw new Error('boom');
}

Run: node main.mjs. In Node 18 and 19, the exception in the loader is thrown. In Node 20 it is not. The problem is not unique to throwing. I haven't been able to see console.log() or any other indication that the loader is being called.

How often does it reproduce? Is there a required condition?

100% of the time for me.

What is the expected behavior? Why is that the expected behavior?

I expect it to work as it did in earlier versions of Node. This is the expected behavior because right now it doesn't seem to work at all.

What do you see instead?

See the description of the bug above.

Additional information

Just a guess, but I'm assuming this is related to #44710.

@VoltrexKeyva VoltrexKeyva added esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders labels Apr 27, 2023
@isaacs
Copy link
Contributor

isaacs commented Apr 27, 2023

Another side effect of ES loaders to a separate thread: updating require.extensions in an es loader no longer has any effect, meaning that node --loader=ts-node/esm no longer functions properly to load both cjs and mjs typescript.

This makes it significantly more challenging to have loaders that work for both esm and cjs, effectively breaking the esm on-ramp for those of us supporting both modes, since now you need to know ahead of time what the module type is (or write the transpiled source to another path and return a short circuit url to that instead), rather than being able to have a single loader capable of handling both.

@isaacs
Copy link
Contributor

isaacs commented Apr 27, 2023

Verified that node --loader=ts-node/esm foo.ts stopped working on 4667b07

@cjihrig
Copy link
Contributor Author

cjihrig commented Apr 28, 2023

cc: @nodejs/loaders

@JakobJingleheimer
Copy link
Contributor

As stated in the ESM doc, async activity like console.log (which is specifically cited in the caveat) is not guaranteed to run.

Please try other methods, such as fs.writeSync to confirm.

I believe this is not a dupe of #47615 because it specifically provides execArgv, thus avoiding the fork-bomb.

@cjihrig
Copy link
Contributor Author

cjihrig commented Apr 28, 2023

@JakobJingleheimer I've tried fs.writeSync() and process._rawDebug() with no luck.

@mcollina
Copy link
Member

I’m not sure how would it work in the current design. There is (by design) one loader process for the whole Node.js process. In the example above, we are asking Node.js to start another one just for that specific thread.

We need a new API to handle this case, something that:

  1. allows us to create a "loaders worker", potentially exposing what we are already doing in core
  2. pass that loader down to new Worker(path, { loader } ), so that the loader thread could be reused.

@cjihrig
Copy link
Contributor Author

cjihrig commented Apr 28, 2023

We need a new API to handle this case

What you are proposing seems like an explicit API. It seems like --experimental-loader passed to the Worker constructor should implicitly do that (without needing to expose more public APIs).

@targos
Copy link
Member

targos commented Apr 28, 2023

It's interesting because it doesn't work --loader in the worker's execArgv, but it works fine if --loader is passed to the main script (the worker inherits the flag).

@GeoffreyBooth
Copy link
Member

It’s interesting because it doesn’t work --loader in the worker’s execArgv, but it works fine if --loader is passed to the main script (the worker inherits the flag).

I don’t think of worker threads like child processes that can have their own Node flags. I feel like they should inherit whatever the parent process has, which is what @targos is describing here.

@cjihrig
Copy link
Contributor Author

cjihrig commented Apr 28, 2023

I don’t think of worker threads like child processes that can have their own Node flags.

That is the exact purpose of the execArgv option to the Worker constructor. Some flags are not supported there, but, for example, --require does appear to be. And the loaders docs do claim that they follow the pattern of --require

@GeoffreyBooth
Copy link
Member

the loaders docs do claim that they follow the pattern of --require

Only in the sense that the way that chaining works for multiple invocations of --require is the same way that chaining works for multiple invocations of --loader. There’s not much else that loaders share in common with require.

Unless there’s some argument for why this should be considered a bug, I think we should just update the docs for https://nodejs.org/api/worker_threads.html#new-workerfilename-options to clarify that --loader is one of the flags not supported by that option.

@GeoffreyBooth GeoffreyBooth changed the title ESM loaders no longer work with worker threads in v20 ESM loaders cannot be defined via Worker option execArgv in v20 Apr 28, 2023
@GeoffreyBooth GeoffreyBooth added the worker Issues and PRs related to Worker support. label Apr 28, 2023
@cjihrig
Copy link
Contributor Author

cjihrig commented Apr 28, 2023

I guess my question is, are there actually technical reasons why this can't be implemented?

@JakobJingleheimer
Copy link
Contributor

I guess my question is, are there actually technical reasons why this can't be implemented?

I was about to ask this too. If it works from main, I would've expected it to work here too, but if not, I imagine it would be fairly simple to support from the esm worker 🤔 (unless we get stuck in Inception)

@GeoffreyBooth
Copy link
Member

I guess my question is, are there actually technical reasons why this can’t be implemented?

In the current behavior, there’s a loaders thread and it works for both the main application thread and all worker threads (I think); within that loaders thread there’s shared scope that the loaders can use, such as to keep caches and such. If we support execArgv then that would require spinning up a new loaders thread specific to that worker thread, because the --loader argument passed to execArgv might be different than that used for the root process. This presents a few issues:

  • There are now multiple loaders threads, with multiple scopes, and so loader authors would need to be aware of this possibility. We would probably get calls for ways to communicate between loaders threads.
  • The number of loaders threads would increase linearly with the number of worker threads spawned with execArgv, doubling the memory/processing requirements of each worker thread.

In short, it’s a lot of added complexity for both Node and for loader authors, and for what? I don’t see the benefit. If you want to create a new process with different loaders, spawn a child process.

@mcollina
Copy link
Member

@GeoffreyBooth I agree with your assessment with supporting execArgv: it will cost too much memory and CPU time. I still think allowing to provide a custom loader for each worker is very useful for all sorts of use cases (mocking, transpiling, ...). Processes are more costly and harder to manage.

IMHO, we need a new API to handle this case, something that:

  1. allows us to create a "loaders worker", potentially exposing what we are already doing in core.
  2. pass that loader down to new Worker(path, { loader } ), so that the loader thread could be reused across multiple workers.

@GeoffreyBooth
Copy link
Member

I still think allowing to provide a custom loader for each worker is very useful for all sorts of use cases

What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn’t want applied to the main application thread?

Keep in mind there’s also the registerLoader API in progress, that allows for registering loaders after startup.

@mcollina
Copy link
Member

What would a specific use case be? When would you want to do something that only a loader can do, that you wouldn’t want applied to the main application thread?

CLI tools can't really set Node.js options. registerLoader would solve all the same use cases of execArgv in this case. One can always set up a "jump" worker to get things done.
I didn't know it was in the works, is there a tracking issue/pr?

@cjihrig
Copy link
Contributor Author

cjihrig commented Apr 28, 2023

Looks like the PR is #46826. I believe that would work for my use case.

I still think execArgv should be an option, but I also don't care enough to push for it if there is an alternative that works for me.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Apr 28, 2023

I believe that would work for my use case.

What is your use case?

CLI tools can’t really set Node.js options.

We’ve been discussing adding support for .env files: #43973 (comment). This would be a shorter-term solution for being able to set NODE_OPTIONS from a file. Farther down the road I think we’d like to support a new field in package.json (or possibly even a new config file) that would do the same thing.

@cjihrig
Copy link
Contributor Author

cjihrig commented Apr 28, 2023

What is your use case?

Using loaders in a worker without requiring users to specify CLI flags. Using a shebang is not really an option for me either, and ideally I don't want to spawn a child process just to hide the CLI flag. The fact that the loader thread is shared between workers and the main thread is not a problem for me.

@GeoffreyBooth
Copy link
Member

Using loaders in a worker without requiring users to specify CLI flags.

Can you explain a bit more? Is this a CLI tool? What does it do? Why does it create worker threads?

@cjihrig
Copy link
Contributor Author

cjihrig commented Apr 28, 2023

I can't go into too much detail yet unfortunately, as it is for work and still unreleased. There is a CLI, but also an SDK. The loader itself is just used for hot module reloading, so I think registerLoader() will work just fine.

@GeoffreyBooth
Copy link
Member

The loader itself is just used for hot module reloading, so I think registerLoader() will work just fine.

If you find a good general solution for hot module reloading, please share it.

The registerLoader PR is by a new contributor and I’m not sure he’s working on it any more, so if you have time to help get it finished, that would be greatly appreciated.

@cjihrig
Copy link
Contributor Author

cjihrig commented Apr 28, 2023

I will take a look at the registerLoader PR and can take it over if the current author is no longer interested in working on it.

@targos
Copy link
Member

targos commented May 1, 2023

In the current behavior, there’s a loaders thread and it works for both the main application thread and all worker threads (I think)

That's wrong. Each worker thread has its own loaders thread. Without evidence of the contrary, I think we should consider that there is a bug somewhere.

@targos
Copy link
Member

targos commented May 1, 2023

Found the bug: when a worker is created with execArgv, its child loaders worker doesn't inherit the CLI arguments, so we have:

main thread: []
main thread loaders: []
worker thread: ['--experimental-loader', './loader.mjs']
worker thread loaders: []

@steve2507
Copy link

Sharing my very straightforward use case here: I run my code using tsx, because the codebase is written in typescript. I'd expect my workers (which are also written in typescript) to be able to also leverage tsx. Right now, this does not seem to work.

@JoCat
Copy link

JoCat commented Apr 19, 2024

Similarly, I started implementing testing in my typescript project using ava + tsx. And got the same error.

@thesauri
Copy link

Another use case: this prevents me from spinning up a worker thread in a test with vitest as the test runner. The purpose of the test is to reproduce and fix a concurrency issue that only occurs when running a particular piece of code from a worker thread. Writing the worker thread test piece in JavaScript isn't sufficient as it needs to import modules written in TypeScript.

The suggested solution in the vitest issue tracker doesn't work anymore due to the limitation tracked here.

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented May 14, 2024

This might have been fixed by #52706, which is on main or nightly but hasn’t gone out yet in a release; it will be in the next 22.x release. That PR won’t enable the code in the top post; the intent isn’t to allow different loaders per thread, but rather to have whatever was loaded for the main thread be shared with the worker threads. So in other words, if you start your app via node --import tsx/esm ./app.ts and then app.ts spawns worker threads that are TypeScript files, hopefully it should work now. You wouldn’t need to pass arguments to new Worker.

@cjihrig or someone else on this thread, do you mind giving it a try and reporting back?

cc @dygabo

@dygabo
Copy link
Member

dygabo commented May 15, 2024

triggered by this issue I submitted PR #53006 to clarify a few things related to the single customization hooks thread introduced by #52706

@bradchristensen
Copy link

bradchristensen commented May 21, 2024

Edit: For those looking for a workaround in the meantime, @hi-ogawa has a working solution described here:

vitest-dev/vitest#5757 (comment)


@GeoffreyBooth thanks for championing this for us! I've just tried out the newly released v22.2.0 as you suggested, with the following simplified repro, in which the script tries to spawn a worker thread from its own file path:

import { fileURLToPath } from "node:url";
import { Worker, isMainThread } from "node:worker_threads";

const THREAD_COUNT = 2;

if (isMainThread) {
  console.log("Main thread running");

  const workers = [...new Array(THREAD_COUNT)].map(() => {
    const worker = new Worker(fileURLToPath(import.meta.url));

    // Capture threadId immediately after spawning, while the worker is alive
    const threadId = worker.threadId;

    worker.on("exit", (exitCode: number) => {
      console.log(`Worker ${threadId} exited with code ${exitCode}`);
    });
    return worker;
  });

  // The real application would post messages to the worker threads here
} else {
  console.log("Doing work");
}

Unfortunately I found that for both Node v22.1.0 and v22.2.0, executing this with npx tsx src/index.ts has the same result - it fails and logs the following:

Main thread running

node:internal/event_target:1090
  process.nextTick(() => { throw err; });
                           ^
TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".ts" for /app/src/index.ts

I was executing this in an ESM package (with { "type": "module" }), but changing it to "type": "commonjs" didn't appear to change anything.

I'm not familiar with the internals here but I'm wondering if this is a consequence of not being able to modify require.extensions as per @isaacs' comment above, or something similar (e.g. would it work as expected with a custom loader for regular .js files, and it's just .ts that it doesn't recognise now?)

@SynthLuvr
Copy link

SynthLuvr commented May 21, 2024

I'm getting ERR_UNKNOWN_FILE_EXTENSION for node v20.13.1. For node v22.2.0, I'm getting timeouts before the scripts are even run.

Edit: For node v22.1.0 I also get ERR_UNKNOWN_FILE_EXTENSION. So the error changed between v22.1.0 and v22.2.0.

@daniel-nagy
Copy link

daniel-nagy commented May 27, 2024

Update

I was missing something, you need to import the code that registers the hooks inside the worker. Once I did that the hooks started working.


Am I missing something, or are single thread hooks not working in Node v22.2.0?

According to the changelog, this landed in v22.2.0

[8fbf6628d6] - module: have a single hooks thread for all workers (Gabriel Bota) #52706

However, I can't seem to make this work. I'm trying to add a loader for HTTP imports, using the example in the Nodejs docs (modified for HTTP instead of HTTPS). It works on the main thread but when I spawn a worker I get the following error.

Error [ERR_UNSUPPORTED_ESM_URL_SCHEME]: Only URLs with a scheme in: file, data, and node are supported by the default ESM loader. Received protocol 'http:'

I actually do want different loaders for the main thread and the worker thread. My use case is for React 19 server components. But I can probably workaround the single hooks thread limitation if it worked.

@daniel-nagy
Copy link

I see most people here are trying to use tsx to import a worker that is a TypeScript file. That still does not work. Node will throw an error

TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".tsx"

The workaround suggested here does work. However, when I pass conditions to my worker that does not seem to work with tsImport. e.g.

const worker = new TsWorker(new URL("./ClientRenderer.tsx", import.meta.url), {
  execArgv: ["--conditions", "node"], // <-- doesn't seem to be working with suggested workaround.
});

@Guihgo
Copy link

Guihgo commented May 31, 2024

If you are using tsx you can run work threads as follow:

  1. Remove "type": "module", from package.json
  2. Imports without extension .ts or .js
  • Use: import {some} from "./my_dir/my_file"
  • Don't use: import {some} from "./my_dir/my_file.js"
  1. Create the Worker using tsx with execArgv param
const worker = new Worker(require.resolve(workerPath), { execArgv: ["--require", "tsx/cjs"] })

Like if its help you 👍


the 'bug' was reported at privatenumber/tsx#354 (comment) too.

@SystemParadox
Copy link

Remove type: module?? That's really counterproductive.

@daniel-nagy
Copy link

BTW, the --conditions issue is a separate issue with Node itself #50885

@GeoffreyBooth
Copy link
Member

If you are using tsx you can run work threads as follow:

This approach has tsx transpile everything to CommonJS and run it that way. You won’t be able to import ESM dependencies if you do this, at least without --experimental-require-module. Your code will run as CommonJS, not as ESM.

@Guihgo

This comment was marked as off-topic.

@GeoffreyBooth
Copy link
Member

FYI, #52706 landed in 22.2.0 as an attempt to have a single hooks thread that applies to all application threads, including the main thread. It has some issues that we’re addressing in follow-ups, but in 22.3.0 or 22.4.0 there should hopefully be a cleaner way to handle this, where if you register hooks (such as tsx) at startup then they will automatically apply to all worker threads. So node --import=tsx app.js should work and apply tsx to any worker threads that your code creates, without needing to specify anything related to tsx in execArgv.

@proxybee
Copy link

proxybee commented Jun 7, 2024

Edit: For those looking for a workaround in the meantime, @hi-ogawa has a working solution described here:

vitest-dev/vitest#5757 (comment)

@GeoffreyBooth thanks for championing this for us! I've just tried out the newly released v22.2.0 as you suggested, with the following simplified repro, in which the script tries to spawn a worker thread from its own file path:

import { fileURLToPath } from "node:url";
import { Worker, isMainThread } from "node:worker_threads";

const THREAD_COUNT = 2;

if (isMainThread) {
  console.log("Main thread running");

  const workers = [...new Array(THREAD_COUNT)].map(() => {
    const worker = new Worker(fileURLToPath(import.meta.url));

    // Capture threadId immediately after spawning, while the worker is alive
    const threadId = worker.threadId;

    worker.on("exit", (exitCode: number) => {
      console.log(`Worker ${threadId} exited with code ${exitCode}`);
    });
    return worker;
  });

  // The real application would post messages to the worker threads here
} else {
  console.log("Doing work");
}

Unfortunately I found that for both Node v22.1.0 and v22.2.0, executing this with npx tsx src/index.ts has the same result - it fails and logs the following:

Main thread running

node:internal/event_target:1090
  process.nextTick(() => { throw err; });
                           ^
TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension ".ts" for /app/src/index.ts

I was executing this in an ESM package (with { "type": "module" }), but changing it to "type": "commonjs" didn't appear to change anything.

I'm not familiar with the internals here but I'm wondering if this is a consequence of not being able to modify require.extensions as per @isaacs' comment above, or something similar (e.g. would it work as expected with a custom loader for regular .js files, and it's just .ts that it doesn't recognise now?)

I guess your take works but when running typescript ensure your tsconfig is properly set and you build with tsc before running node, I was able to run the above with this tsconfig, it is just the index.ts so this works.
{
"compilerOptions": {
"rootDir": "./",
"outDir": "build",
"target": "esnext",
"module": "NodeNext",
"lib": ["es2020"],
"moduleResolution":"NodeNext",
},
"include": [
".ts",
"
.js"
],
"exclude": [
"package.json"
]
}
image

@bradchristensen
Copy link

@proxybee it's true that is certainly worth considering, if it fits your use case, but then you wouldn't be using a loader at all, which is the topic being discussed here 🙂

@GeoffreyBooth
Copy link
Member

There’s a discussion at nodejs/loaders#203 regarding whether registering hooks should apply automatically to new Worker calls that spawn worker threads. Please comment there if you have any opinions to share. cc @cjihrig

@alshdavid
Copy link

alshdavid commented Aug 14, 2024

I have been doing this while I wait for a solution:

// package.json
{
  "type": "module",
  "imports": {
    "#worker": {
      "source": "./cmd/worker.ts",
      "default": "./cmd/worker.js"
    }
  },
  // ...
}
// cmd/bin.ts
export function spawnWorker() {
  let workerPath = url.fileURLToPath(import.meta.resolve('#worker'));
  if (workerPath.endsWith('.ts')) {
    return new Worker(`import('tsx/esm/api').then(({ register }) => { register(); import('${workerPath}') })`, { eval: true })
  } else {
    return new Worker(workerPath)
  }
}

spawnWorker()

Then run it with:

node --conditions="source" --import tsx ./cmd/bin.ts

My long term plan here is to eventually replace tsx with Node's built-in --experimental-strip-types (though this pattern works already with Node 22.6+, it's not yet usable)

@SynthLuvr
Copy link

I have been doing this while I wait for a solution:

// package.json
{
  "type": "module",
  "imports": {
    "#worker": {
      "source": "./cmd/worker.ts",
      "default": "./cmd/worker.js"
    }
  },
  // ...
}
// cmd/bin.ts
export function spawnWorker() {
  let workerPath = url.fileURLToPath(import.meta.resolve('#worker'));
  if (workerPath.endsWith('.ts')) {
    return new Worker(`import('tsx/esm/api').then(({ register }) => { register(); import('${workerPath}') })`, { eval: true })
  } else {
    return new Worker(workerPath)
  }
}

spawnWorker()

Then run it with:

node --conditions="source" --import tsx ./cmd/bin.ts

My long term plan here is to eventually replace tsx with Node's built-in --experimental-strip-types (though this pattern works already with Node 22.6+, it's not yet usable)

I've tried this workaround but cannot get it working. Still stuck with this issue

@alshdavid
Copy link

alshdavid commented Aug 26, 2024

I've tried this workaround but cannot get it working. Still stuck with this issue

This is how I am currently using it links: [1] [2] [3]

I'm using Node 20 and tsx.
Eventually, I will replace tsx with --experimental-strip-types (when .ts extensions can be rewritten to .js by tsc)

@SynthLuvr
Copy link

This is how I am currently using it links: [1] [2] [3]

I've see the solution but it doesn't work for my use case. I still need a fix for the problem described in this ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders worker Issues and PRs related to Worker support.
Projects
None yet
Development

No branches or pull requests