Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: initialize hook returns load, resolve #50044

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ggoodman
Copy link
Contributor

@ggoodman ggoodman commented Oct 4, 2023

This commit allows the initialize() hook to optionally return an object having the resolve() and load() hooks as properties. This allows state passed into initialize() to be shared with the resolve() and load() hooks either via closure or class instance.

In addition to developer ergonomics, supporting this model will make it easier to write tests against a loader module. The existing design
forces state to be shared at the module level which puts the burden of invalidating the ESM module cache on anyone hoping to write isolated
tests against a loader module.

TODO:

  • Obtain consent / consensus to explore this api.
  • Write tests.

Fixes: #50042

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels Oct 4, 2023
@GeoffreyBooth
Copy link
Member

It would help in understanding your PR if you could update the docs too. That would make it very clear what your intended API is, with examples. (Obviously it’ll also need tests and so on, but having at least docs would help at this stage.)

This might be related to #49159; the deregister API from that PR would allow you to register and deregister hooks over and over for testing.

@GeoffreyBooth GeoffreyBooth added the loaders Issues and PRs related to ES module loaders label Oct 4, 2023
@ggoodman
Copy link
Contributor Author

ggoodman commented Oct 4, 2023

Hi @GeoffreyBooth I think the deregister function described in that issue is complementary.

allow hook methods to receive a fourth state argument – so instead of returning the result of initialize and discard it, we now keep the result attached to the loader instance. this value is then fed back to hook invocations. after thinking about the comments in nodejs/loaders#147 this actually seems necessary if there can be an arbitrary number of "copies" of any given loader; how can loaders differentiate their internal state from one another? there isn't a good way without something like this.

I think my pull request addresses this use-case in a more elegant way. Instead of adding new arguments to loader hook methods, the design in this PR means that data passed into initialize is now trivially available by the returned hooks.

It would help in understanding your PR if you could update the docs too. That would make it very clear what your intended API is, with examples.

I'll attach the example from the issue I originally created that I think illustrates the consumer-facing benefits of the design.

register.js:

import { register } from 'node:modules';

const mc = new MessageChannel();

mc.port2.onmessage = (msg) => {
  console.debug(msg);
};

register('./loader.js', { port: mc.port1 });

loader.js:

Here's an example of how we can implement a loader as a class that holds instance-level state in a private field. This avoids needing to have mutable module-level state for the message port.

// Here is the initialize function that accepts the data passed into `register()`.
export async function initialize(data) {
  // We don't need to store `data` in module-level state anymore.
  return new MyLoader(data);
}

// The actual loader can now be a class instance.
class MyLoader {
  #port;
  
  constructor(data) {
    this.#port = data.port;
  }
  
  // The class instance implements the resolve contract
  resolve(specifier, context, nextResolve) {
    this.#port.postMessage(['resolve', specifier, context]);
    // No-op, defer to built-in
    return nextResolve(specifier, context);
  }
  
  // The class instance implements the load contract
  load(url, context, nextLoad) {
    this.#port.postMessage(['load', url, context]);
    // No-op, defer to built-in
    return nextLoad(url, context);
  } 
}

test.js:

Here's an example of how it is easier to write a unit test against the proposed API. The key is that ./loader.js isn't stateful; it doesn't have module-level hidden state that needs to be reset between tests. The loader is the resolved value of the initialize() function meaning that it can be instantiated more than once with different options.

import test from 'node:test';

import { initialize } from './loader.js';

test('My custom loader', async (t) => {
  await t.test('resolve()', async (t) => {
    await t.test('does useful stuff', async (t) => {
      const mc = new MessageChannel();
      const loader = await initialize({ port: mc.port1 });

      // Example of using this _instance_ of the loader to test it.
      await runFirstTestOnLoaderInstance(t, loader);
    });

    await t.test('does other useful stuff', async (t) => {
      const mc = new MessageChannel();
      const loader = await initialize({ port: mc.port1 });

      // Example of using another _instance_ of the loader to test it.
      await runSecondTestOnLoaderInstance(t, loader);
    });
  });
});

@ggoodman
Copy link
Contributor Author

I know that @GeoffreyBooth suggested that I flesh out docs for this to better illustrate the changes in API but I would prefer to hold off on that until I get some signals of interest from the @nodejs/modules / @nodejs/loaders folks.

This change is proposed as a small incremental improvement. If the team prefers, I can propose a more dramatic change to the loaders API with ideas taken from esbuild's plugin API that seem relevant to the problem being solved.

@GeoffreyBooth
Copy link
Member

If the team prefers, I can propose a more dramatic change to the loaders API with ideas taken from esbuild‘s plugin API that seem relevant to the problem being solved.

What change did you have in mind? We recently marked the overall API as “release candidate,” meaning we don’t anticipate any further changes before marking it stable, so it’s not really a good time for breaking design changes unless there’s a strong motivator.

The proposal about initialize returning the other hooks seems more or less fine to me; an earlier version of initialize had its return value passed along into the return value of register, so I’m hesitant to repurpose the return for something else in case we need to go back to that earlier design. Though I guess it could always become a new property on the return object, like initialize could return { resolve, load, registerReturnValue } or whatever.

@ggoodman
Copy link
Contributor Author

I'll share my general feedback on the loaders API in #50042 so that we can leave this PR to discuss the incremental 'improvement' proposed here.

This commit allows the `initialize()` hook to optionally return an
object having the `resolve()` and `load()` hooks as properties.
This allows state passed into `initialize()` to be shared with the
`resolve()` and `load()` hooks either via closure or class
instance.

In addition to developer ergonomics, supporting this model will
make it easier to write tests against a loader module. The
existing design forces state to be shared at the module level
which puts the burden of invalidating the ESM module cache
on anyone hoping to write isolated tests against a loader
module.

Fixes: nodejs#50042
@JakobJingleheimer
Copy link
Contributor

I'm also leaning in favour of this for the communication aspect—message channel seems a heavy-handed solution for a simple problem (and a simple problem ought to have a simple solution), not to mention the latency. I'm also generally in favour of OOP.

What is/are the specific scenario(s) that need addressing? Are multiple workers the only one where this surfaces? There is overhead to a class that's otherwise not paid with functional exports, and from what I can think of, module-level state generally fits the bill. I'm looking to ensure we address a problem people will actually have—if only 1 person will be in this scenario, maybe the overhead etc this would impose on everyone isn't worth the non-ideal message channel design.

If it's merely to facilitate testing, that is easily a simply addressed with a generation query param on the import specifier.

I see @giltayar's #50042 (comment), and he authors multiple major libraries effected by this (so that gives me a fair amount of confidence there is wide(r) issue this will address). But I think I'm missing a piece of context that seems very important to understanding the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESM Loader Hooks (20.8) should support a more testable, less stateful design
4 participants