
fix(commonjs): node_modules lookup depth limit to avoid infinite loops #985

Closed

Conversation

@danielgindi (Contributor) commented Sep 1, 2021

Rollup Plugin Name: commonjs

This PR contains:

  • bugfix
  • feature
  • refactor
  • documentation
  • other

Are tests included?

  • yes (bugfixes and features will not be merged without tests)
  • no

Breaking Changes?

  • yes (breaking changes will not be merged unless absolutely necessary)
  • no

Description

For some freaking reason, I stumbled upon a few packages (js-yaml, among others) that include a "hack" like this:

  // A trick for browserified version, to not include `Buffer` shim
  var _require = require;
  NodeBuffer = _require('buffer').Buffer;

In this case, rollup currently does not keep it as a native require but routes it through commonjsRequire. And in dynamic mode this means it may try forever to resolve node_modules in ever higher parent directories, as it has no knowledge of the possible depth or of the physical file system.

For this I have introduced a barrier, nodeModulesLookupDepth, which limits the number of lookup iterations.
The default is 15, which I found to be reasonable and permissive; it will only be hit in extreme circumstances anyway.
This replaces the previous barrier, which only triggered when the user input allowed it (endsWith('../')) and could theoretically fail too early.
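
For illustration, here is a minimal sketch of the bounded loop in the generated runtime helper. The names NODE_MODULES_LOOKUP_DEPTH, getDynamicModule and normalize are placeholders rather than the plugin's exact output; only the depth barrier itself reflects this PR:

  // Simplified sketch of the runtime resolver with the new barrier
  function commonjsResolveImpl (path, originalModuleDir) {
    var relPath = originalModuleDir;
    var depth = 0;
    while (true) {
      if (depth++ > NODE_MODULES_LOOKUP_DEPTH) {
        return null; // give up instead of prepending '../' forever
      }
      var resolvedPath = normalize(relPath + '/node_modules/' + path);
      if (getDynamicModule(resolvedPath)) {
        return resolvedPath;
      }
      relPath += '/..'; // try the next parent directory
    }
  }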

@@ -172,7 +172,8 @@ export function commonjsResolveImpl (path, originalModuleDir, testCache) {
   if (path[0] === '/') {
     originalModuleDir = '/';
   }
   while (true) {
+  let depth = 0;
Member

Suggested change:

  - let depth = 0;
  + var depth = 0;

Shouldn't this be ES5?

Member

Although actually I already see another let and it’s already a template string 🤔

Contributor Author

It's inside the generated code, not the plugin's code, so it should be backwards compatible :-)

@tjenkinson (Member) commented Oct 3, 2021

Yeh so rollup itself tries to be compatible with ES3 by default (see rollup/rollup#4215), so I think ideally this should be var, and we shouldn't be using template strings (edit: template string is not in the output)

@guybedford (Contributor)

And in dynamic mode this means it may try forever to resolve node_modules in ever higher parent directories, as it has no knowledge of the possible depth or of the physical file system.

Can you help me understand this better? How does resolving node_modules hierarchically lead to a performance cliff? From an abstract perspective "check to root" lookups on trees are always fast as they have convergent bounds. Are you sure there isn't something else going on here that this is just patching over?

@danielgindi (Contributor, Author)

Can you help me understand this better?

Sure!

How does resolving node_modules hierarchically lead to a performance cliff?

In a virtual fs, you don't know where the root is.
You could compile /foo/bar/index.js that references ../shared/util.js. So we're not pinning "root" to the root file passed to rollup.
This means you could look at ../ or at ../../../../ and you will still have no idea where the root is, as there are no "physical" bounds; there is no actual root.
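
Illustratively (a hypothetical sketch, not plugin code), the candidate list just grows without bound, because nothing ever marks a directory as the root:

  // Hypothetical sketch: with only relative ids there is no root to
  // stop at, so the list of node_modules candidates grows forever.
  function candidates(id, limit) {
    var prefix = '';
    var out = [];
    for (var i = 0; i < limit; i++) {
      out.push(prefix + 'node_modules/' + id);
      prefix += '../'; // no prefix is ever "the root"
    }
    return out;
  }
  // candidates('buffer', 3) ->
  // ['node_modules/buffer', '../node_modules/buffer', '../../node_modules/buffer']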

From an abstract perspective "check to root" lookups on trees are always fast as they have convergent bounds.

You'd think, right?

@guybedford (Contributor) commented Oct 3, 2021

You could compile /foo/bar/index.js that references ../shared/util.js. So we're not pinning "root" to the root file passed to rollup.
This means you could look at ../ or at ../../../../ and you will still have no idea where the root is, as there are no "physical" bounds; there is no actual root.

This is the part that seems very odd to me.

Are you telling me you have ids in your resolution system that are resolved to the final id of ../../../../? If the application path is /foo/bar/index.js you should always resolve this into /foo/ and be doing node_modules checks on that resolved path.

Is this a custom resolver or the default one? If a custom one I would suggest seeing if you can use better ids in the architecture. If a default one, that sounds like a serious bug to me.

If we really wanted to make this accepted behaviour, then if the path were ../../../ then node_modules checks should be defined on such an absolute virtual scheme to be inverted - such that we only check ../../../node_modules and then stop, but never check ../../node_modules or ../node_modules. I'm pretty sure you likely have an inversion of precedence bug like this that will lead to versioning issues...

@danielgindi (Contributor, Author)

If we really wanted to make this accepted behaviour, then if the path were ../../../ then node_modules checks should be defined on such an absolute virtual scheme to be inverted - such that we only check ../../../node_modules and then stop, but never check ../../node_modules or ../node_modules. I'm pretty sure you likely have an inversion of precedence bug like this that will lead to versioning issues...

The case I presented you with was an easy one, as it's a common request (look at the issues and Stack Overflow...) to require from a different "main folder", or even a different drive.
The basic require algorithm handles all of these with ease; there's no issue there.
But in the virtual realm there is no drive anymore (we intentionally strip the drive and parent-folder info!), so that breaks down.

The piece of code that raised this issue to my awareness was an annoying one: a hack used in some packages to avoid detection of the require keyword, thus bypassing detection as a node library (e.g. buffer) and going through our virtual require function instead - which will look for it infinitely, adding ../ forever. There's no root to count down to, as we strip that info.

I first tried to tackle these two scenarios exactly, but then realized that would be code written for very specific cases and I'd possibly be missing other scenarios. The loop there did not have a clear stop for when it fails to detect one otherwise, and it needs one.

@guybedford (Contributor)

But in the virtual realm there is no drive anymore (we intentionally strip the drive and parent-folder info!), so that breaks down.

Can you please just clarify again for me what these virtual paths look like? Do they really start with ../ in the final resolved ID? If so I'd still like to discuss the dynamics of this scheme further as it is a novel mechanism and has some interesting properties we haven't yet fully fleshed out (such as the potential versioning issues discussed).

Perhaps you can also explain why you could not just implement the stop check in your own algorithm as path.resolve(id) === rootPath?

@danielgindi (Contributor, Author)

The paths are all relative. In the beginning we put full paths in the system (each virtual require call gets two parameters: the path to require, and the path of the current module).
Back then we did not have this issue.
But there were complaints that the code exposed system info, as full paths appeared in the output.
So we cleaned that up, and we replace the "root" (the base of the main file passed to rollup) with a virtual magic name. We can't check for reaching the root anymore. This left a bug.

But if you have a structure like this:

  C:\projects\foo\index.js
  C:\projects\shared\util.js

Then the path to util.js will be ../shared/util.js.

And if someone works around the commonjs plugin (taking a ref to require and calling through that ref later), then we will also have zero info on where we need to reach.

We could theoretically find the lowest common root for all paths required in the project, but then we would have two main problems to deal with:

  1. We need to collect those paths through the whole project and then put them in the helper js output, which is imported first.
  2. People on a Windows env requiring paths on another drive will get a failure (or they will see the original paths in the final product).

Currently the max-depth thing works and solves the issue.
If you think of a practical way to solve this in a more straightforward way, let's hear it!

@guybedford (Contributor)

I see the situation and it's definitely a big problem that internal paths so easily get exposed.

Another angle here is: how are those internal ids being exposed in the first place? RollupJS doesn't expose its absolute ids in builds so long as the source map root is configured correctly. It should also be possible to hook the source map naming specifically to mask the exposure.

It's just a fact of life at this point that all JS build environments build URLs or paths that contain internal info. Defining the public / private aspects of how this is exposed is how we shift the context rather than trying to operate on "opaque paths" I think.

So I'd really still advise against any root-backtracking scheme as the id scheme where possible. But even if you must use such a backtracking scheme, as mentioned, a root check is still possible. You shouldn't need to collect the list of roots; instead, just resolve the path to its absolute file path and check whether that absolute path is a root path. On Windows and POSIX a root path is a well-defined concept that doesn't require touching the file system or checking over files - whether it is C:\ or D:\ doesn't matter, it's still clearly a root path once resolved to its file system path.
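
For reference, a check of that kind needs only path logic, no fs access (a sketch using Node's path module):

  var path = require('path');

  // A resolved path is a filesystem root exactly when taking its
  // dirname yields the path itself ('/' on POSIX, 'C:\' on Windows).
  function isRootPath(p) {
    var resolved = path.resolve(p);
    return path.dirname(resolved) === resolved;
  }

  // isRootPath('/')        -> true
  // isRootPath('C:\\')     -> true (on Windows)
  // isRootPath('/foo/bar') -> false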

Both conversations are interesting, happy to continue both.

@danielgindi (Contributor, Author)

I think you are missing something here :-)

The only way for rollup to generate a product with internal path info is if we do not strip that info - and the paths are needed for resolving at runtime.
The whole point here is that those require calls are resolved at runtime, so it needs to run the full algorithm of backtracking and trying every node_modules out there in the parent folders. This is part of what allows circular-dependency commonjs packages to work.

So either we put the full paths - including the root "C:" or the root "/var/whatever" - in the final product js file, so we can test for a root, or we just decide when to stop looking for one. We can't resolve those requires at the rollup stage; that's just the opposite of what this specific feature is accomplishing.

@guybedford (Contributor)

Ah I missed this embedded runtime resolver part. I will abstain from reviewing further then.

@danielgindi (Contributor, Author)

Ah I missed this embedded runtime resolver part. I will abstain from reviewing further then.

No matter, I missed the conversations in this repo 🤣

@guybedford (Contributor)

I'd be happy to dig into a separate conversation elsewhere about why createRequire isn't a suitable approach for Node.js platform targets of this feature as implemented in #206, but this is obviously not the context.

@guybedford (Contributor)

In general I prefer to make the un-analyzable analyzable as opposed to implementing mock runtime layers - and where mock runtime layers are implemented, separating them from general approaches like this plugin. I think the dynamic require stuff is a huge maintenance burden for this plugin.

@danielgindi (Contributor, Author)

In general I prefer to make the un-analyzable analyzable as opposed to implementing mock runtime layers - and where mock runtime layers are implemented, separating them from general approaches like this plugin. I think the dynamic require stuff is a huge maintenance burden for this plugin.

The problem is that most projects can't rollup without it. And the commonjs plugin would actually break it in a way that a separate plugin could not undo.
Two issues are solved by the dynamic feature: circular deps in commonjs (replicating exact behavior and semantics) and dynamic require (require(myVariable)).

Unfortunately createRequire does not even cope with circular deps in commonjs.
When it's commonjs that was transpiled from ES6 and plays nice, the plugin can handle the circulars without dynamic witchery. In other cases it can't.

@guybedford (Contributor)

This project not supporting cycles is something that has been an issue from the start, and it can be solved separately via execution wrapping when CJS executions exist in cycles, to match CJS execution-ordering semantics. I was never able to work on this myself unfortunately, but it was always a hope.

So the root solution to cycles is very much via wrapping approaches that can ensure the CJS execution invariants, and dynamic layers are more ways to work around that.

For highly dynamic requires - extending the ability for these requires to be analyzed is always preferable because that optimization can then feed back into the tree shaking and bundling layers.

In the extreme cases of completely non-analyzable requires, maintaining a dynamic require registry could be done more simply by allowing eg a hook from this plugin itself that allows substituting a require implementation, and then have that require implementation use an analysis layer to point to the right module. Note this type of referencing also relates to ideas in rollup/rollup#4167.

@guybedford (Contributor)

Alternatively, configuring the runtime such that dynamic requires will be hit is a good solution, e.g. via a simple dynamic require remapping table like:

const requireRemaps = {
  'buffer': '#buffer',
  'fs/': '#fs/'
};
// was: require(unknown)
require(remap(unknown));

where the remap embedding is the only active operation performed in the runtime. Ideally it would be a simple table lookup, but in more complex cases it could be a custom function, again definable in code via a hook.
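
A minimal remap helper consistent with that table might look like this (hypothetical; the prefix handling for keys such as 'fs/' is one possible convention):

  // Hypothetical remap helper: exact matches first, then prefix
  // entries (keys ending in '/') for subpath requires.
  function remap(id) {
    if (requireRemaps[id]) return requireRemaps[id];
    for (var prefix in requireRemaps) {
      if (prefix.charAt(prefix.length - 1) === '/' && id.indexOf(prefix) === 0) {
        return requireRemaps[prefix] + id.slice(prefix.length);
      }
    }
    return id; // unmapped ids fall through unchanged
  }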

@danielgindi (Contributor, Author)

@guybedford can we merge this?

@guybedford (Contributor)

I'm not blocking this landing, I'm just not explicitly approving, so @shellscape you are welcome to land this. I was more just expressing concern at the long-term maintenance requirements of this feature.

@lukastaegert (Member) left a comment

Looking at this again, I think I am no longer happy with this solution. The reason is that it introduces a new parameter to manage a case where we cannot make an informed decision about what is imported anyway, because we "lost" the original file path. To me, this just means unnecessary configuration complexity, especially since hardly anyone will ever touch that option.

For now, I see two different approaches:

  1. Just detect cases where the injected second require parameter is missing and always return that we did not find a file. Simple but effective. Or:
  2. Fix it. One approach could be to use "this" for the second parameter instead, by replacing dynamic requires with require.bind('/path/to/module') (see the sketch below). Then you can pass around that require as much as you like and it will still work.

I can integrate one of the solutions in #1038.
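
A rough sketch of what option 2 could look like (hypothetical shapes; lookup stands in for the existing resolution walk):

  // Hypothetical sketch of option 2: the helper reads the calling
  // module's directory from `this`, which `.bind` pins in place.
  function commonjsRequire(id) {
    var originalModuleDir = String(this); // '/path/to/module' when bound
    return lookup(id, originalModuleDir); // walk node_modules upward
  }

  // In the generated code, a standalone `require` would then become:
  var _require = commonjsRequire.bind('/path/to/module');
  _require('buffer'); // still knows which module it was taken from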

@danielgindi (Contributor, Author)

I think you misunderstand the actual issue.
There is not enough information at runtime to detect that the module is not found, or to infer the root position.

But I can think of another possible solution: calculate the max depth level at compile time and embed it in the runtime, so we'll know when we have reached the virtual root.

@lukastaegert (Member)

I only have the example you provide in your test, and that can indeed be fixed by replacing a standalone require with something better. I will try to push a possible solution to #1038 tomorrow. .bind was indeed not working well, as it did not allow keeping require.resolve, but just replacing any standalone require with something like createRequire("/path/to/module") worked really well. Will keep fleshing it out and push to #1038 tomorrow; then we can continue the discussion if there are other situations where we are missing information.
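
A sketch of that shape (illustrative names only; see #1038 for the real implementation):

  // Illustrative sketch: a factory closing over the module's path keeps
  // `require.resolve` working, which plain `.bind` could not.
  function createCommonjsRequire(originalModuleDir) {
    function require(id) {
      return commonjsRequire(id, originalModuleDir);
    }
    require.resolve = function (id) {
      return commonjsResolve(id, originalModuleDir);
    };
    return require;
  }

  // generated: any standalone `require` becomes
  // var _require = createCommonjsRequire('/path/to/module');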

@lukastaegert (Member)

@danielgindi I added my solution to #1038 and also published it as @rollup/plugin-commonjs@beta so you can try it out easily. Basically, the solution is to replace ALL instances of dynamic require calls as outlined above - please give it a spin.

@shellscape (Collaborator)

@danielgindi gentle ping

@lukastaegert (Member)

I would prefer if we could close this one in favor of #1038, which also fixes the issue; otherwise it will just be a difficult merge conflict.

@shellscape (Collaborator)

Closing in favor of #1038 per request. Thank you to everyone who worked on this.

@shellscape closed this Apr 13, 2022