Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maintaining hook module registration for on-thread hooks #203

Open
guybedford opened this issue Jun 20, 2024 · 18 comments
Open

Maintaining hook module registration for on-thread hooks #203

guybedford opened this issue Jun 20, 2024 · 18 comments

Comments

@guybedford
Copy link

guybedford commented Jun 20, 2024

One of the benefits of having a module.register hook module registration like we currently do for loaders in that it applies for all spawned workers is that by default, resolver, network and transpilation loaders automatically can be applied to all spawned workers.

It is being discussed to migrate to same-thread hooks as an option of the discussion in #201, while possibly also unifying with the synchronous hooks proposal in #198.

Currently the synchronous hooks proposal does not use register to register a hooks module, but instead takes arbitrary functions.

I'd like to suggest we consider maintaining something similar to our existing register API in providing a module for hook definitions, as this more naturally allows hooks to be shared between different loading contexts in a portable way.

This is separate to the question of hook customization across contexts, which could also in turn have manual configuration - for example some kind of pattern like new Worker(path, { register: 'custom-module' }) or new Worker(path, { hooks: 'blank' }) could be used at worker instantiation time. In addition it might even be possible to register hooks that don't get shared by default something like registerHooks(import.meta.resolve('./hooks.js'), { shared: false }).

Specifically, the default story of hooks using the same code implementation by default is what we get for workers today and were looking to get for a shared hooks worker. I think this would be a really valuable property to keep where the default user experience for writing loaders from transpilers to APM is that they do work across contexts, without that being a major structural change to the loader code only to be discovered too late in the loader authoring process.

@joyeecheung
Copy link
Member

joyeecheung commented Jun 20, 2024

To elaborate the current module.register() API looks like this:

// register.js
module.register('/path/to/hook.js');  // inherited by all child workers by default

// hook.js
export function resolve() { ... } 

Where as I am proposing in #198 to switch to take the functions directly in the API, with a separate API to configure the inheritance:

// This tells Node.js that from now on all the child worker/process
// spawned from the current instance should preload this file.
process.preloadInChildren(__filename);  // or import.meta.filename

module.registerHooks({
  resolve() { ... }
});

process.preloadInChildren is an imaginary name and can be changed, or we can split or provide more granular control over child worker / child process behaviors (especially fork behaviors). This makes the inheritance opt-in and explicit, and also allows more control, e.g. if the hooks need to spawn workers via a dependency, but don't want the workers spawned by its dependency to load themselves, they can run the code that spawns the worker before calling process.preloadInChildren() with themselves, or temporarily disables the preload of itself via a different function when loading third-party plugins that may spawn workers.

This can also be used in other use cases that are not related to module hooks (e.g. APM hooks that only cares about generic stats like v8 heap statistics/event loop utilization in all children instances, or not patching modules to do tracing when it's turned off). We can also add APIs to allow more control for the preload scripts (querying, disabling).

@GeoffreyBooth
Copy link
Member

cc @nodejs/loaders plus the participants of #103: @cspotcode @giltayar @arcanis @bumblehead @bizob2828 plus participants of nodejs/node#53332: @VoltrexKeyva @Flarna @alan-agius4

@mcollina
Copy link
Member

I think we need granular control on how the loaders are passed from parent to children. Moreover, we need to consider the case of a child process/thread changing the hooks too. There are a few of edge cases to consider, and how a developer could "reset the chain".

@ShogunPanda
Copy link

I agree with @mcollina. I like the approach taken by @joyeecheung but we need to be able to diverge from it, even during runtime.
Thinks, for instance a library that spawns a worker thread with some hooks and then this thread spawns the real user code. The user might also want to extend or clean the hooks chain rather than just inheriting it.

@joyeecheung
Copy link
Member

joyeecheung commented Jun 21, 2024

for instance a library that spawns a worker thread with some hooks and then this thread spawns the real user code. The user might also want to extend or clean the hooks chain rather than just inheriting it.

Do you mean:

  1. Main thread: no hooks, spawns a worker with hooks (controlled by library)
  2. Worker 2: has hooks (controlled by library), spawns user code
  3. Worker 3: no hooks or extend hooks (controlled by end user)

I think with a flexible API it can be done as:

// Main thread
require('hooks.js')
new Worker('worker2.js', ...);

// hooks.js
process.preloadInChildren(__filename);
module.register({
  ...
  deregister() {
    process.removePreloadInChildren(__filename);
  }
})

// Worker 2
// Has preloaded hooks set by main thread
new Worker('user.js', ...);

// Worker 3
// Has preloaded hooks set by main thread
for (const hook of module.registeredHooks()) {
  hook.deregister();
}
// Or register new hooks with module.registeredHooks()
// Or wrap the hook manipulation code in a separate file
// and set it as preload in children if different behavior needs to be inherited

@joyeecheung
Copy link
Member

Actually if we want to make it possible for the hooks to be dynamically enabled/disabled, we should add callbacks for users to customize what should be done when the hook is enabled/disabled.

@mcollina
Copy link
Member

I'm more thinking something like:

new Workers('user.js', { hooks: [] })

or similar.

@guybedford
Copy link
Author

@mcollina something like that would be great to see. I think there are two kinds of use cases we have here:

  1. Spawning a worker and also creating a new compartment (to use TC39 terminology), where you want to control the hooks for that environment.
  2. Spawning a worker that is within the same overall application compartment and hooks model, so that things like resolver hooks, transpilation hooks and APM attachments also work across the worker spawning.

My main concern here is in the case of (2) where it shouldn't be a requirement that every worker instantiation must be updated to set the hooks, instead having the default of compartment sharing for workers will significantly improve APM ergonomics.

So my preference would be for allowing customization, but with a good default experience.

@Qard
Copy link
Member

Qard commented Jun 21, 2024

I would have separate APIs for explicit hook passing and implicit sharing.

Explicit:

new Workers('user.js', { hooks: [] })

Implicit:

Worker.setDefaultHooks([])

I might even go as far as only allowing implicit hook inheritance via a cli flag. In my opinion it's a bit of an anti-pattern applying implicit changes to sub-environments without explicit action from the parent. While APM would like to inject itself into every thread, it's a bit of a security concern if child threads inherit code which was not explicitly shared with it and therefore perhaps not intended to be shared.

@guybedford
Copy link
Author

In principle I'm all for a Worker.setDefaultHooks and a hooks argument to a worker but the thing is that that they both require registration of a hooks module - we can't just pass direct functions in.

My concern is that the existing status quo of the register API solves for hook portability, in a way that is compatible with setDefaultHooks and a hooks argument to worker, but if we drop this API for a direct function registration API then this again adds portability friction for worker use cases.

That is, I simply want us to maintain the existing design of a registered hooks module.

I could be okay with not making hook registration for workers the default, but the important point is that changing that default should be easy. For example one could imagine the register API supporting a custom option for this:

module.registerHooks(import.meta.resolve('./hooks/mjs'), { childWorkers: true });

While APM would like to inject itself into every thread, it's a bit of a security concern if child threads inherit code which was not explicitly shared with it and therefore perhaps not intended to be shared.

Worker threads are quite frankly not a security boundary. Can you elaborate on the specific security concern here? If anything security based loaders and global attenuation that acts as enforcing security properties of the environment would very much want to be setting childWorkers: true as a security guarantee.

@guybedford
Copy link
Author

If there are concerns around module.registerHooks being async, perhaps we can make it a sync call like require(ESM) and disable top-level await for hooks modules and their dependencies similarly.

@ShogunPanda
Copy link

Do you mean:

  1. Main thread: no hooks, spawns a worker with hooks (controlled by library)
  2. Worker 2: has hooks (controlled by library), spawns user code
  3. Worker 3: no hooks or extend hooks (controlled by end user)

Exactly, that was the scenario I was thinking about.

I think with a flexible API it can be done as:

...

I love that!

@Flarna
Copy link
Member

Flarna commented Jun 24, 2024

I don't think we should rely on a single user to decide on propagating hooks to workers or not.
Loader hooks are often just an implementation detail of some tool. And these tools may be applied at different levels from different persons.
e.g.

  • App developer might decide to use ts-node or similar which might use loader hooks
  • Some framework like angular (?) used as dependency might use workers and hooks internally
  • App deployer might add an APM tool (e.g. without code change, via NODE_OPTIONS) which uses loader hooks internally
  • Dev might user workers/hooks for it's own application logic

I think each individual tool/library should have the possibility to set the hooks in a way it is required to operated correct. Delegating all into a single user tends to be tedious.

@guybedford
Copy link
Author

guybedford commented Jun 24, 2024

To be clear - this is exactly what we have currently and I am only advocating to retain the status quo in Node.js. Each tool and library can choose how it sets hooks, there is no single hooks "owner".

All I am requesting is that we either make it a default to allow hooks to work on workers, or as a compromise, allow a simple flag to enable this feature such as workers: true when registering the hooks. Any code is still in a position to be able to register hooks though without having to go through some single gatekeeping process.

The constraint specifically is that hook functions are not transferrable to workers, and if workers are to have sync hooks on their own threads, we should design a worker hooks system that is by default transferrable, as opposed to by default not transferrable. The ultimate reason being one of encouraging workers, not adding more frictions to them (there are enough frictions with workers for JS, and it is our job to remove those to make the language more multi threaded and faster).

@jsumners-nr
Copy link

I can imagine an application that relies on an APM tool:

  1. Not knowing how the APM tool does what it does
  2. Spawning a worker and expecting to get traces from the code in the worker

Something like:

require('cool-apm')

console.log('app started')
const worker = new Worker('./some.js')

// wait for worker to do things

I know that the tool I work on promotes omitting that first line and starting the app like: node -r cool-apm app.js. So that's a thing to consider.

But we also don't really support threads.

In short, users are going to expect things to just work.

@Qard
Copy link
Member

Qard commented Jun 24, 2024

Most APMs don't really do much of anything about threads currently, or just fallback on -r loading a separate instance of their library into every thread, so I think we're fine so long as the --require/--import flags continue to work as they already do. It would be nice to have some additional mechanism for APMs to internally wire up worker support when loaded programmatically though. So a require('some-apm') could internally do Worker.useLoaderHooks(...) to auto-load the library into any spawned worker. Or better, having a method to bootstrap into every worker with an explicit separate connection from the main MessagePort.

@joyeecheung
Copy link
Member

joyeecheung commented Aug 5, 2024

I noticed that with nodejs/node#53787 a hook-specific inheritance setting can be somewhat awkward, if we want to make that config to support "preload this file for all Node.js instances, main thread or workers". The best experience overall I think would be something like users specifying a noderc entry to preload cool-apm and by default, this gets loaded in all threads, which is going to be more akin to relying on -r inheritance described by @Qard just that it's configured with a file on disk. And then we could have some JS API for removing/querying loaded hooks in the current instance or changing preloads when spawning a worker. If the preloaded file registers the inheritance as part of module.registerHooks() it would be weird since there is already a preload config, so either we do some internal deduplication, which seems awkward, of users get the hooks loaded twice, which can be a bug.

@joyeecheung
Copy link
Member

joyeecheung commented Aug 14, 2024

We discussed in the meeting this week and we agreed that we should let a dedicated preload configuration to handle this, ideally via something like nodejs/node#53787

To summarize why specifier + parentURL based inheritance doesn't work:

  1. This adds the question of "how to resolve the loader" - module.register assumes the loader is resolved as ESM e.g. the matching exports condition is import, which doesn't really make sense for a loading API that's actually neither import() nor require(). It also follows the ESM resolution rules which are different from CJS resolution rules (e.g. NODE_PATH support) so it doesn't work well with the universalness that we are aiming. The resolved URL is also not deterministic (can be affected by registered loaders), so it doesn't really serve what I thought would be nice to have - a unique, predictable identifier for the hook.
  2. For a dependency chain like instrumentation.js -> open-telemetry -> import-in-the-middle/hook.mjs (which gets passed to the registration API), automatically preloading import-in-the-middle/hook.mjs in child workers is meaningless without also preloading instrumentation.js and open-telemetry. But if the entire chain of instrumentation.js -> open-telemetry -> import-in-the-middle/hook.mjs is configured to be preloaded in child workers, having import-in-the-middle/hook.mjs automatically preloaded too is a conflict and leads to double registration.
  3. Actually in the case of 2, IIUC that the registration model is already pretty broken for module.register, that is why to actually inherit the consuming code of the low-level hooks the loader worker has to be a singleton, but then a singleton means you cannot have different configurations for different workers/the main thread which was problematic and why there were regressions. The only way I can think of for this model to work is for all the consuming code to also take options that they are supposed to load via a URL instead of a user-provided object, like what module.register does, so that everything can be sent over to a new worker via URLs to code that should be evaluated again, and it should not have states that cannot be serialized and copied over to another worker. But from what I can tell no existing hook users actually do this pattern (this also requires them to use import() or require() to get the options, which is another can of worms), and asking them to change to this pattern is certainly breaking.

At the end of the day I think for auto registration of end module hooks to work it still needs automatic inheritance for arbitrary scripts that consume the low-level hook, so automatic inheritance of a low-level hook module itself is just redundant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants