Skip to content
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

Undocumented behavior of package type searches within node_modules #436

Closed
DerekNonGeneric opened this issue Nov 18, 2019 · 14 comments
Closed

Comments

@DerekNonGeneric
Copy link
Contributor

DerekNonGeneric commented Nov 18, 2019

As suggested, I've opened an issue for tracking the undocumented behavior of package type searches performed by implementations of the READ_PACKAGE_SCOPE algorithm occurring within node_modules directories. I've moved my preliminary findings from the code review to this thread to allow for more focused discussions to take place.

Initial observations (non-exhaustive)

From the original thread:

  • The upward search seems to stop once it reaches a directory named node_modules.
  • Placing a package.json with a "type": "module" field in a node_modules directory doesn't result in .js (or even .mjs) files as being interpreted to be in ESM context.
  • It's not only loose files adjacent to the package.json that aren't switching context. Files within subdirectories of the node_modules that contains this package.json are similarly unaffected.

This behavior doesn't seem to be documented anywhere, so I'm not sure if is intentional. I'll keep looking and update this issue with any other findings.

Aside from local manual testing, further details can be gleaned from:

  1. The following step1 of the READ_PACKAGE_SCOPE specification.

    scopeURL ends in a "node_modules" path segment, return null.

  2. The following code block2 of the internal implementation.

    if (pjson\_url\_path.length() > 25 &&
        pjson\_url\_path.substr(pjson\_url\_path.length() - 25, 25) ==
        "node\_modules/package.json") {
      break;
    }
    

Conclusion

If this behavior is by design, the Package Scope and File Extensions section should probably mention it. I'm curious as to what the rationale is behind node_modules being a DMZ (de-modularized zone). 😄


Footnotes

1: This is step 2.1. in the pseudocode for the READ_PACKAGE_SCOPE(url) function located in the Resolver Algorithm section of ECMAScript Modules document.
2: This block starts on line 683 of module_wrap.cc in the internal C++ implementation.

Implications of node_modules being a DMZ

  • ESM context cannot cascade into every module of every subdirectory of node_modules.
  • Projects currently taking advantage of free bare specifiers in the /src/node_modules directory structure pattern may be forced to refactor.
@Jessidhia
Copy link

My assumption is that it's to avoid breaking packages or symlinks inside the node_modules/, which would be resolved by the search algorithm but may not necessarily have a package.json.

I don't think it's something that can happen with regular npm modules, though; even if you somehow had a package file without a package.json, npm would need to write its own metadata in there.

@DerekNonGeneric
Copy link
Contributor Author

DerekNonGeneric commented Nov 19, 2019

Example of implications as promised in nodejs/node#30514 (comment)

This occurs in projects with the following directory structure, which resembles mine.

The hierarchy enables access to free bare specifier imports, which is a technique popularized by @Rich-Harris in his wildly popular tweet about it.

// Flag: --es-module-specifier-resolution=node

/lorem/                                  # Hybrid dual-mode package
├──  /bin/
|    └──  lorem                          # CommonJS w/ node shebang
├──  /src/
|    ├──  /esm-cli/*
|    |    └──  /commands/
|    |    |    ├──  help.js
|    |    |    └──  run.js       
|    |    └──  package.json              # Type module package manifest
|    ├──  /cjs-cli/
|    |    └──  /commands/
|    |         ├──  help.js
|    |         └──  run.js
|    └──  /node_modules/*                # Free bare specifiers for local modules
|         └──  package.json              # Type module package manifest (does nothing)
|         └──  /shared/*
|         |   ├──  /index.js
|         |   ├──  /utils.js
|         |   └──  /constants.js
|         └──  /ipsum/*
|         └──  /dolor/*
|         └──  /set/*
|         └──  /amet/*
|         └──  loosemodule.js             # import('loosemodule')
├──  /node_modules/                       # 3P deps from npm
└──  /package.json                        # Cannot be a hybrid package w/ type: "module"

* earmarked for ESM

To enable ESM in projects with this current structure post-unflagging, it should be as simple as adding a "type": "module" package manifest in /src/node_modules/ and /src/esm-cli/, right? Well, to maintain the current resolution algorithm, the --es-module-specifier-resolution=node will need to be enabled, which isn't too bad. But, as it currently stands, every single one of those directories with asterisks will need a package manifest specifying module type in them too. Why? The node_modules DMZ boundary.

@MylesBorins
Copy link
Contributor

@DerekNonGeneric could --experimental-resolve-self combined with export maps allow for similar behavior without the resolution algorithm?

@DerekNonGeneric
Copy link
Contributor Author

DerekNonGeneric commented Nov 24, 2019

/to @MylesBorins

[...] could --experimental-resolve-self combined with export maps allow for similar behavior without the resolution algorithm?

If I understand correctly, that would also require a package.json in each of the subdirectories because package.json is not recognized when it is inside of node_modules. When importing that module in the index.mjs, it wouldn't be recognized as an ES module. So I tried putting a type: module manifest inside of the subdir and I got a very weird error. It seems like the ESM loader is trying to load the manifest as a module. What follows is my attempt. It's possible that I did something wrong, please let me know.

lorem/package.json:

  "name": "lorem",
  "exports": {
    ".": [
      "./index.mjs"
    ]
  }

lorem/index.mjs:

import * as constants from './src/node_modules/shared/constants.js';
export default constants;

lorem/x.mjs:

import lorem from 'lorem';

Run the following command.

node --experimental-resolve-self x.mjs

Error:

(node:16589) ExperimentalWarning: The ESM module loader is experimental.
internal/modules/run_main.js:50
    internalBinding('errors').triggerUncaughtException(
                              ^

TypeError [ERR_UNKNOWN_FILE_EXTENSION]: Unknown file extension: /mnt/c/apps/lorem/package.json
    at Loader.resolve [as _resolve] (internal/modules/esm/default_resolve.js:114:13)
    at Loader.resolve (internal/modules/esm/loader.js:74:33)
    at Loader.getModuleJob (internal/modules/esm/loader.js:148:40)
    at ModuleWrap.<anonymous> (internal/modules/esm/module_job.js:41:40)
    at link (internal/modules/esm/module_job.js:40:36) {
  code: 'ERR_UNKNOWN_FILE_EXTENSION'
}

@ljharb
Copy link
Member

ljharb commented Nov 24, 2019

What do you mean by, package.json is not recognized in node_modules?

@DerekNonGeneric
Copy link
Contributor Author

DerekNonGeneric commented Nov 24, 2019

@ljharb, the ESM resolver bails before it can read "type": "module" from the package.json located in the node_modules root (/node_modules/package.json).

@ljharb
Copy link
Member

ljharb commented Nov 24, 2019

@DerekNonGeneric that's not how type module is supposed to work; only the closest package.json can set it. so ./node_modules/package.json won't affect anything unless there's a dir inside node_modules that has no package.json (which is super rare)

@DerekNonGeneric
Copy link
Contributor Author

DerekNonGeneric commented Nov 24, 2019

[...] unless there's a dir inside node_modules that has no package.json (which is super rare)

Albeit rare, this is precisely the use-case that currently lacks support. I was able to solve this issue by implementing a modified version of ESM_FORMAT (from the spec) in userland JS from within a loader. It's working very well. I have yet to compare its performance to the internal implementation via proper benchmarking, but that will be next.

I was informed that the "node" resolution mode will more likely be removed than become the default. In light of this information, and the loader solution described above, a PR to Node core for additional support of this directory structure may be unnecessary. I don't foresee myself continuing to use this structure, which is a "hack" after all. If I uncovered a bug in my above attempt at the suggestion, I will try to create a reduction and file a separate issue about it. As such, the highest-priority item remaining in this issue would be considering illuminating node_modules as being a package boundary DMZ (de-modularized zone) in the documentation. If it is determined that the algorithm pseudocode is sufficient, feel free to close this issue.

@GeoffreyBooth
Copy link
Member

If I remember correctly, the special handling of node_modules was to avoid breaking files or symlinks placed in node_modules/.bin, since that folder doesn’t contain a package.json and assumes CommonJS mode. We wouldn’t want it to break if the user adds "type": "module" to their app’s package.json, which would otherwise trickle down into node_modules/.bin. Then again, it would be a pretty trivial fix for npm to just start putting a package.json containing {"type": "commonjs"} into that folder, assuming .bin is an npm/npx-only thing.

As far as replicating the behavior of that popular tweet, "exports" and resolve-self should provide a less hacky equivalent:

// package.json
"name": "foo",
"type": "module",
"exports": {
  ".": "package-main-entry-point.js",
  "./utils/": "./src/utils/"
// package-main-entry-point.js
import { shuffle } from 'foo/utils/shuffle.js';

I haven’t tested the above code, but if I understand correctly this is how --experimental-resolve-self is supposed to work. cc @jkrems

@DerekNonGeneric
Copy link
Contributor Author

/to @GeoffreyBooth

Thanks. Unfortunately, the self-reference solution would require refactoring the codebase. At the very least, all of the specifiers would need to be updated. Adding the specifier-path mappings would also be another chore (probably needs to be autogenerated by a separate tool).

@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Nov 25, 2019

Unfortunately, the self-reference solution would require refactoring the codebase.

Then just add package.json files into each subfolder of your custom node_modules.

I think the resolve-self way should be the recommended pattern. The tweet was clever, taking advantage of Node resolution, but I’d hardly say that it’s intuitive. For new apps I would expect people to use the resolve-self approach.

@DerekNonGeneric
Copy link
Contributor Author

To get this issue closed, I believe the consensus was to put the algorithms more front-and-center.

I think to accomplish this, at the very least, we should keep these algorithms' details expanded. On the left you can see how it looks by default, on the right is how it looks expanded (after tapping it).

https://nodejs.org/api/esm.html#esm_resolver_algorithm

proposal

There is prior art on this with the All Together... section of the Modules API docs.

all-together

@DerekNonGeneric
Copy link
Contributor Author

DerekNonGeneric commented Mar 20, 2020

/to @GeoffreyBooth

This relates to UX and user journey. Is there anyone else who you think has an opinion on this before I seek input via a PR? (I would be proposing either keeping the accordion expanded or taking these algorithms out of an accordion).

@GeoffreyBooth
Copy link
Member

I'm not sure why the algorithm needs to be expanded. It feels like information for a very select audience, namely developers implementing tools that work closely with Node's resolution. The general audience shouldn't need to know the detailed algorithm.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants