RFC: Per Package Loader Hooks #18233

bmeck · 2018-01-18T16:00:06Z

Per-package loader hooks

This proposal seeks strong support for per-package loader hooks.
This is a definition of how to achieve them.

Problem

Individual application and packages have loading considerations that vary. The ability to globally mutate the Node module system is problematic and causes packages to alter each other's behavior implicitly.

CommonJS had various abilities to mutate the CJS loader with NODE_PATH, require.extentsion, etc. These have all been deprecated.

Various workarounds to these use cases do exist but do not apply to all code using these patterns.

Example Use Cases

Proposal

Scope of hooks

Hooks must be confined to a well defined subsection of the URL space (fs) used by import.

This proposal will define the boundaries of subsections to be:

A directory containing package.json will have a termination when crossing the directory.

Given the fs of:

/path-searching-hook
/foo
  /package.json
  /bar/example.mjs
/a
  /package.json
  /a.mjs

// /foo/bar/example.mjs
import '../' // does not cross boundary by resolving to `/foo`
import '../..' // does cross boundary by resolving outside of `/foo` to `/`

// /a/a.mjs
import '../foo' // does cross boundary by resolving out of `/a`
import '../foo' // does cross boundary by resolving to `/foo`

Consumer and Author negotiation

It must be possible as a consumer to affect the path resolved within another package's scope.
It must be possible as a author to affect the path resolved within the author's package scope.

In order to avoid recursive boundary crossing in one step, all paths will be resolved in two phases. This is similar to

External resolution that is resolved by consumers from a different package scope.
Self resolution that is resolved by the package scope containing the resolved path.

// /a/a.mjs
import('/foo');

// 1. fires /a 's package scope loader hooks, seeing `/a/a.mjs` as source and `/foo` as specifier
// lets assume it resolves to /foo
// 2. fires /foos 's package scope loader hooks, seeing `/foo` as source and `./` as specifier

Declaration of hooks

Per package loader hooks can be declared in a package.json file as a specifier to find using the globally defined resolution algorithm.
Global hooks may affect this resolution, but package hooks may not.
This allows code coverage, instrumentation, etc. to access package hooks.

{
  "name": "foo",
  "loader": "../path-searching-loader"
}

This also allows the hooks to exist outside of package boundaries. This file when loaded as a loader will be in a separate Module Map space from userland and only has the globally defined resolution algorithm.

Types of hooks

only a resolve hook. use vm.Module to obtain a new URL if you need to create Module records dynamically.

On the nature of static resolution

ESM is able to link statically and there should be a path to allow static / ahead of time usage of per package hooks ideally.

By only having a single resolve hook, paths can be rewritten and observed to do in-source replacement.

This is problematic however, since vm.Module lives in memory.
Usage of such APIs on platforms without writable fs like Heroku should have a path forward for these hooks.

I recommend a combination of V8's SnapshotCreator when possible, and a flag to allow rewriting vm.Module reservations to a location on disk.

Problem, multiple boundary crossing

/root
  /package.json
  /entry
    /package.json
  /dep
    /package.json

If entry were to import('../dep'). It would be handled in the typical entry hooks then dep hooks manner. This does not give root a chance to intercept the imports.

This is seen as a suitable limitation since root is presumed to have ownership of entry and dep's source code by them existing within its directory. Edit the entry and dep packages as needed in order to achieve hooking that goes through root's use cases.

Composition

Hooks should have a means by which to achieve composition. This is needed for cases of multiple transformations. A package might seek to call a super of sorts to get the result of a parent loader, and it may seek to do the exact opposite as a guard to ensure expected behavior.

Loaders therefore need to have a concept of a parent loader hooks to defer to, or to ignore.

Changing hook allocation to be done using new and providing the parent as a paremeter is sufficient for this:

#! node --loader
module.exports = class LogImports {
  constructor(parent) {
    this.parent = parent;
  }
  async resolve(specifier, referrer) {
    debugger;
    const ret = await this.parent.resolve(specifier, referrer);
    console.log(url, 'became', ret);
    return ret;
  }
}

Example use cases for composition

Code Coverage
Instrumentation such as APM
Mocks/Spies in testing frameworks
Logging/Debugging
Compilation
Linting
Isolation (such as with code signing)

Isolation

Hooks that are composed still are isolated by per-package boundaries. Nested packages will not fire the parent loader hooks unless they cross into a package boundary with those hooks.

Passing arbitrary data between instances can be problematic for both isolation and threading. Therefore the only data passed between instances of loaders will be transferables (including structured clone algorithm) or primitives.

The parent passed to the constructor of a loader will be a limited facade that only shows white listed properties and calls the relevant method on the true parent instance. It will ensure errors are thrown if given improper arguments length and/or non-transferable data.

Per-package composition

Can be achieved by manually constructing the chain inside their per-package hook code.

Global composition

Can be achieved by providing multiple --loader flags. This allows for better debugging when development loaders need to be added. The full design of this is left to another RFC.

npm start
# => node hasErrors.mjs
# aborts
export NODE_OPTIONS='--loader DebugImports'
npm start
# will log imports if HasErrors defers to the parent loader

Ignoring parents

In certain scenarios a package may need to ignore the parent loader. In those situations the hooks will be unable to defer to the default global behavior of the process, which may provide debugging behavior such as logging/code coverage/linting/etc.

For now escape hatches are punted on this design space to userland, but it is recommended that when using NODE_ENV=development or NODE_ENV=test all loaders defer to the parent loader.

Code signing invariant implications

Mutating the code loaded in a code signed bundle is problematic. Integrity checks of unexpectedly mutated imports should fail. This area needs more research. Use of any sort of in-source translation should be avoided.

Future research

Given the problems of ignoring scripts and code signing being unable to easily defer to parent loaders more design needs to be done around development workflows. Inspector tooling is the recommended approach. This may mean adding special hooks to inject loader hooks during development via a flag such as --inspector-loader-hooks=LogImport that may fire before per package hooks but ensures the inspector is running. Such hooks would not be suitable for production environments.

This design also does not instrument CJS as loaders currently are not able to instrument CJS. It is not a design goal of this specification to add CJS support to ESM loaders; however, any design for CJS loaders that is presented should and will be considered for compatibility reasons.

The text was updated successfully, but these errors were encountered:

mcollina · 2018-02-01T16:59:02Z

It would be better to store all ESM info in: a single "esm" property.
I suspect we will be adding several configs like this one in package.json, so I'd prefer it to be future-proof.

bmeck · 2018-02-01T17:12:04Z

@mcollina I've looked at allowing require to be instrumented as well but decided not to instrument it in loaders since it would need two separate hooks; one for synchronous operations, one for asynchronous. I'm fine putting it under a property / in an array / etc. but would prefer not to tie it specifically to ESM for now.

lykkin · 2018-02-12T23:35:13Z

Is the release of official ES module support certain to include this functionality? The particular mechanism used to interact with the module loading isn't important, just the ability to interact in the loading.

I'm looking into how one might load APM instrumentation into ES modules, and this level of support would be awesome (it is way better than the work arounds I've been able to come up with).

bmeck · 2018-02-12T23:39:53Z

@lykkin APMs would probably use the existing APM example workflow and a global loader (--loader) for now

lykkin · 2018-02-16T22:03:02Z

We spent some time experimenting with using loader hooks for instrumenting external modules, but weren't able to find a reasonable solution. Our test case involved a bare-bones express app (with express' index.js file converted to index.mjs), with a few different approaches:

`resolve` hook

We were able to make this approach work using the APM loader example workflow as a base, but it is a fragile solution. Because what is ultimately returned from the resolve hook is a file url and format (not the module itself), we were forced to resolve to our own passthrough ESM file, which then actually imported the module (reaching up into node_modules, because it's not a local dependency of the APM agent) and ultimately exported an instrumented version of it:

import express from '../../../../express' // pointing back to node_modules
import shimmer from '../../shimmer'
import api from '../../../index'
const { agent } = api

export default shimmer.instrument(agent, express, 'express')

This approach also includes a potential problem with nested/duplicated dependencies. If two versions of a module are installed in the project, it is impossible to know which one was just loaded. Since the resolve hook depends on constructing a url from the specifier (in this case 'express'), any module with an internal dependency on the same module would resolve with the same url (assuming it's also an ES module, regardless of version. Working around this in the resolve hook would be very cumbersome.

`postLoad` hook

This was an attempt to create an additional hook that would fire just after a module is instantiated, but before it is returned. The issue we ran into with this approach seemed to be that because all dependencies in a graph are instantiated in a single this.module.instantiate call, when module.evaluate is executed (here), there are no usable references to which modules were loaded, and also nowhere to hook in during the operation because it all takes place inside V8.

`dynamicInstantiate` hook

We also tried using a dynamicInstantiate hook in combination with a modified resolve. resolve would first check if .mjs instrumentation existed for the given specifier (like the example loader), and then, if so, resolve with format: 'dynamic'. The issue with this approach came from having to know the module's exports upfront, to use in execute. This led to attempting await import(url) in order to assign the exported keys to exports, but that was unsuccessful, mainly due to the resulting recursion making keeping track of urls also overly cumbersome.

We're thinking what we need is either:

A new hook similar to postLoad, which would intercept modules as they're instantiated by V8. You can see a prototype of the desired functionality here.
A change to existing loader functionality that allows access to both the loaded module's export value and its original top-level specifier.

Again, the first approach with the resolve hook works as described, but that pattern of hijacking the import process and explicitly pointing back up to node_modules is not the way to go. A better approach might be to utilize import.meta when it's implemented, by storing the exports and specifier.

bmeck · 2018-02-16T22:25:06Z

@lykkin as stated in the JS spec, the module namespace object should not be mutable. So, I'm not sure your idea with postLoad would actually be beneficial since the VM should prevent mutation of it directly.

bmeck · 2018-02-16T22:28:36Z

@lykkin we should setup a call probably before discussing adding hooks, since that might apply to both per package and global hooks, which are separate things.

lykkin · 2018-02-16T22:51:28Z

@bmeck The postLoad hook was just the simplest interface to get at the data required to instrument things like we currently do, I'm sure I'm stomping on some assumption I'm unaware of. Since the namespace object is immutable, the resolve hook proxying could work though there needs to be enough information provided to the proxy module to know what module it is proxying.

A call sounds super helpful, I've been trying to read up on the discussion going on around this, though it sounds like there are a bunch of points I'm missing.

guybedford · 2018-03-04T15:51:38Z

@lykkin very interesting feedback. Would you be interested in proposing a postLoad hook further here for Node?

lykkin · 2018-03-08T18:12:24Z

@guybedford It sounds like there would have to be a case made upstream to v8 about exposing modules as they are being loaded, though there are also other interfaces that could be implemented to get the same functionality (e.g. an instrumentation API in v8 where you can register hooks on a module's methods and it will export the relevant data to a listener).

Having a hook in v8 for doing these kinds of things would be helpful for sure, but it sounds like I need to know more about v8's goals in this area to make a proper proposal. Would you know where I can read more about that?

guybedford · 2018-03-09T17:26:58Z

@lykkin I wonder how far we might get with a translation approach something like:

instrument (moduleName: string) {
  return {
    exports: ['x', 'y', 'p'],
    instrument (exportName, value) {
      return wrap(exportName, value);
    }
  }
}

export var x = 5;
export const y= 10;
export function p () {
  return ++x;
}

// this source is added by the instrumentation hook
x = doInstrument(x);
p = doInstrument(p);
y = doInstrument(y);

The only problem with the above would be const bindings, but perhaps a source replacement of export const -> export let would also be possible as a hack.

At least using these hacks we could possibly explore the space somewhat. v8 API hooks would be the ideal certainly though, we'd just need to have a pretty good idea of what we'd want for Node and why before pushing I think?

guybedford · 2018-03-09T17:27:51Z

(sorry @bmeck for going off-topic here, perhaps we should start a new thread)

BridgeAR · 2020-01-02T23:49:49Z

Is this issue still relevant? There was a lot ongoing since this issue was opened.

bmeck · 2020-01-06T14:57:51Z

@BridgeAR deferred while other things still are being looked into, but yes this is still being discussed though this proposal likely needs some updating

bmeck · 2021-11-22T18:01:03Z

@guybedford I want to re-re-revisit this.

guybedford · 2021-11-22T18:50:23Z

@bmeck I'm still quite strongly against arbitrary hooks, so would prefer specific per-package features that are justified by direct use cases over arbitrary resolution hooking, and only arbitrary hooks where the limits of a more specific method are really being hit. The reason being that package invariants are much easier to handle with generic resolution rules that can be applied.

bmeck · 2021-11-23T04:08:47Z

The general problem with using loaders due to it being a CLI flag and also being global and the persistent leaning on loaders to solve various use cases is recurring throughout multiple years. I think claiming there isn't a justification is a bit much at this point.

bmeck added feature request Issues that request new features to be added to Node.js. esm Issues and PRs related to the ECMAScript Modules implementation. labels Jan 18, 2018

bmeck self-assigned this Jan 18, 2018

bmeck mentioned this issue Jan 26, 2018

esm: Implement esm mode flag #18392

Closed

4 tasks

devsnek mentioned this issue Feb 1, 2018

esm: provide named exports for all builtin libraries #18131

Closed

4 tasks

bmeck mentioned this issue Feb 5, 2018

--loader hooks for require() nodejs/modules#6

Closed

bmeck mentioned this issue Jun 21, 2018

Proposal for transparent interop implementation nodejs/modules#137

Closed

bmeck mentioned this issue Jul 12, 2018

Feature: ESM in executable files #49444

Closed

bmeck mentioned this issue Aug 2, 2018

Proposals for configuring parse goals of files in --experimental-modules nodejs/modules#160

Closed

bmeck mentioned this issue Sep 18, 2018

Pluggable Loaders to support multiple use cases nodejs/modules#82

Closed

reasonablytall mentioned this issue Jul 17, 2019

Loader Hooks nodejs/modules#351

Closed

bmeck mentioned this issue May 1, 2020

Usage of Loader hook as APM/RASP vendors nodejs/modules#508

Closed

bmeck mentioned this issue May 27, 2021

Tool config transpilation for webpack, gulp, etc nodejs/loaders#7

Open

This was referenced Apr 7, 2024

[Snyk] Security upgrade eslint from 4.19.1 to 9.0.0 aliscco/alisco-node#1365

Open

[Snyk] Security upgrade eslint from 4.19.1 to 9.0.0 aliscco/alisco-node#1366

Open

baby636 mentioned this issue Apr 7, 2024

[Snyk] Security upgrade eslint from 7.32.0 to 9.0.0 baby636/node#145

Open

ryan-ally mentioned this issue Apr 7, 2024

[Snyk] Security upgrade eslint from 6.8.0 to 9.0.0 ryan-ally/node#106

Open

aliscco mentioned this issue Apr 7, 2024

[Snyk] Security upgrade eslint from 3.19.0 to 9.0.0 aliscco/alisco-node#1372

Open

leonardoadame mentioned this issue Apr 7, 2024

[Snyk] Security upgrade eslint from 3.19.0 to 9.0.0 leonardoadame/node#770

Open

This was referenced Apr 8, 2024

[Snyk] Security upgrade eslint from 4.19.1 to 9.0.0 aliscco/alisco-node#1375

Open

[Snyk] Security upgrade eslint from 7.32.0 to 9.0.0 aliscco/alisco-node#1376

Open

[Snyk] Security upgrade eslint from 5.16.0 to 9.0.0 aliscco/alisco-node#1377

Open

wedataintelligence mentioned this issue Oct 9, 2024

[Snyk] Fix for 3 vulnerabilities wedataintelligence/node#127

Open

ron-a20 mentioned this issue Oct 9, 2024

[Snyk] Fix for 3 vulnerabilities ron-a20/node#142

Open

aliceUnhinged613 mentioned this issue Oct 11, 2024

[Snyk] Fix for 4 vulnerabilities aliceUnhinged613/node#146

Open

This was referenced Oct 21, 2024

[Snyk] Fix for 6 vulnerabilities Graysonbarton/node#38

Open

[Snyk] Fix for 16 vulnerabilities Graysonbarton/node#61

Open

[Snyk] Fix for 5 vulnerabilities Graysonbarton/node#93

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Per Package Loader Hooks #18233

RFC: Per Package Loader Hooks #18233

bmeck commented Jan 18, 2018 •

edited

Loading

mcollina commented Feb 1, 2018

bmeck commented Feb 1, 2018

lykkin commented Feb 12, 2018

bmeck commented Feb 12, 2018

lykkin commented Feb 16, 2018

bmeck commented Feb 16, 2018 •

edited

Loading

bmeck commented Feb 16, 2018

lykkin commented Feb 16, 2018

guybedford commented Mar 4, 2018

lykkin commented Mar 8, 2018

guybedford commented Mar 9, 2018

guybedford commented Mar 9, 2018

BridgeAR commented Jan 2, 2020

bmeck commented Jan 6, 2020

bmeck commented Nov 22, 2021

guybedford commented Nov 22, 2021

bmeck commented Nov 23, 2021

RFC: Per Package Loader Hooks #18233

RFC: Per Package Loader Hooks #18233

Comments

bmeck commented Jan 18, 2018 • edited Loading

Per-package loader hooks

Problem

Example Use Cases

Proposal

Scope of hooks

Consumer and Author negotiation

Declaration of hooks

Types of hooks

On the nature of static resolution

Problem, multiple boundary crossing

Composition

Example use cases for composition

Isolation

Per-package composition

Global composition

Ignoring parents

Code signing invariant implications

Future research

mcollina commented Feb 1, 2018

bmeck commented Feb 1, 2018

lykkin commented Feb 12, 2018

bmeck commented Feb 12, 2018

lykkin commented Feb 16, 2018

resolve hook

postLoad hook

dynamicInstantiate hook

bmeck commented Feb 16, 2018 • edited Loading

bmeck commented Feb 16, 2018

lykkin commented Feb 16, 2018

guybedford commented Mar 4, 2018

lykkin commented Mar 8, 2018

guybedford commented Mar 9, 2018

guybedford commented Mar 9, 2018

BridgeAR commented Jan 2, 2020

bmeck commented Jan 6, 2020

bmeck commented Nov 22, 2021

guybedford commented Nov 22, 2021

bmeck commented Nov 23, 2021

bmeck commented Jan 18, 2018 •

edited

Loading

`resolve` hook

`postLoad` hook

`dynamicInstantiate` hook

bmeck commented Feb 16, 2018 •

edited

Loading