Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_isTopLevel() check for "node_modules" transform exclusion is too "loose", should be more specific #132

Open
mattflix opened this issue Jul 5, 2017 · 3 comments

Comments

@mattflix
Copy link

mattflix commented Jul 5, 2017

module-deps currently contains logic that results in excluding files from transformation if any segment in the file's package-relative path is the string "node_modules". This logic lives within the _isTopLevel(file) function, which determines if the specified file should be considered "top-level" (meaning, a file belonging to the package's own source code). Elsewhere in module-deps, only files considered top-level are eligible for transformation.

But, the current implementation of this logic appears to be far too "loose", in that it can end up excluding files from transformation that should indeed be transformed!

Here is the problem...

In some projects, internal "node_modules" folders are used simply to facilitate use of the Node Module Resolution Algorithm (NMRA) locally within the package's own source code. Of course, depending on the needs of the project, any or all of this source code may require some kind of transformation.

Consider this example package file structure which uses an internal "node_modules" folder:

package.json
/src
    main.js
    /node_modules
       foo.js
       bar.js
/node_modules
    (npm-installed packages)

Here, the /src/node_modules folder is intended to facilitate the use of NMRA between main.js, foo.js, and bar.js. For example, with this organization, both main.js and bar.js can access foo.js via require("foo").

To be clear, the content of the /src/node_modules folder is NOT managed by npm. This particular "node_modules" folder has just been strategically placed to enable use of NMRA internally between the files under /src.

The above is a very simple, contrived example, but in larger, actual projects, such internal use of NMRA can eliminate "require path hell" and aid modular design.

Anyway... it appears to me that the logic in _isTopLevel() is likely intended to assume that anything "outside" of the root /node_modules (which normally contains already-built packages that should not be transformed) is a "top-level" file. Normally, top-level files end up being anything in /src or some such location.

The fly in the ointment, though, is that _isTopLevel() doesn't just check for "node_modules" specifically in as the 0th element of the candidate file's path... it checks for "node_modules" anywhere in the path. Thus, _isTopLevel() erroneously concludes that files like /src/node_modules/foo.js are NOT top-level, and are thus excluded from transformation.

IMO, _isTopLevel() really should only check to see if a file lives below /node_modules specifically. Files everywhere else should be considered top-level, and eligible for transformation.

The proposed change in logic is this:

Deps.prototype._isTopLevel = function (file) {
    var isTopLevel = this.entries.some(function (main) {
        var m = relativePath(path.dirname(main), file);
        // EXISTING:
        // return m.split(/[\\\/]/).indexOf('node_modules') < 0;
        // PROPOSED:
        return m.split(/[\\\/]/).indexOf('node_modules') !== 0;
    });
    if (!isTopLevel) {
        var m = relativePath(this.basedir, file);
        // EXISTING:
        // isTopLevel = m.split(/[\\\/]/).indexOf('node_modules') < 0;
        // PROPOSED:
        isTopLevel = m.split(/[\\\/]/).indexOf('node_modules') !== 0;
    }
    return isTopLevel;
};

That is, "tightly" check for "node_modules" appearing only as the 0th element of the path (as it will for any file within /node_modules/**), rather than "loosely" triggering off "node_modules" appearing anywhere in the path (which, as shown in the above example, probably doesn't mean what _isTopLevel() thinks it means).

Alternatively, it would be an improvement if the above "loose" assumption was simply not "baked into" module-deps, and instead was configurable (for those projects that need it).

@goto-bus-stop
Copy link
Member

Interesting! I think (hunch not data) that it's more common to have real node_modules in subfolders though, rather than individual files, especially in monorepo like structures. IMHO a compatible and good approach would be to instead place the modules you're importing in a logical place (node_modules doesn't tell you anything about what kind of modules it contains) and then symlink them into node_modules. Browserify does not preserve symlinks by default so it will apply transforms to them that way.

@mattflix
Copy link
Author

mattflix commented Mar 30, 2018

My example using the individual files (foo.js, etc.) was just to keep things simple. Your points about what might be considered good or common source code organization are well-taken, but other people may have other opinions/needs.

An organization illustrated above is still a perfectly valid use of Node Module Resolution Algorithm (NMRA) (and, note, is zero-config, which is another kind of advantage), and I intended it to mainly illustrate the problem that I am describing -- not begin a debate about "good" organization.

Sure, one could configure a fancy bundler or runtime loader to treat certain folders "like" node_modules, but not everyone wants or needs such dependencies.

In any case, my larger point is that _isTopLevel(file), as a default, is in spirit (IMO) attempting to avoid transforming stuff in what it assumes is THE one main node_modules folder managed by npm for the entire package, by excluding any file that appears to reside in it. That makes sense.

But, the current implementation also excludes any other location in the project that only "vaguely" matches that rule. It would be great if the implementation were changed to be more selective (exclude specifically the "main" node_modules folder only) or, alternatively, at least be configurable to exclude or include exactly what the user desires.

@goto-bus-stop
Copy link
Member

my main point wasn't to criticise some ways of organising node_modules, but that other projects may have multiple node_modules folders that are all managed by npm and thus should not be transformed. being more selective would break those projects.

especially in monorepo structures, it's possible to run browserify on eg. packages/app-client/main.js, with a node_modules folder in packages/app-client and one in / for dependencies that are shared between many of the packages/*. both of those would be managed by npm and should not be transformed. it is also not always clear which would be the "main" node_modules folder in such cases.

i only brought up the organisation change because it is a way to make this sort of thing work without having to break other existing projects.

regardless, having it configurable with a glob or regex seems OK to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants