Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add externals to metafile + externalize native modules #972

Closed
wants to merge 3 commits into from

Conversation

eduardoboucas
Copy link

Introduction

esbuild can be an incredibly powerful tool for bundling Node applications in preparation for a deployment as a serverless function. We've been experimenting with it at Netlify and we're super excited about the results.

Our goal is to provide a smooth experience for developers, minimising the need for user configuration. On this front, one of the challenges we're trying to tackle is how to handle dependencies with native bindings.

When a .node file is required directly, esbuild will fail with a useful error, allowing us to fallback to a different bundling mechanism. Unfortunately, there are a lot of ways with which module authors ship their binaries, some of which rely on libraries like bindings to dynamically require the native files. When this is the case, esbuild will happily generate the bundle, but the function will generate a runtime error.

We're also trying to provide a good experience around external modules, taking care of packaging up all the necessary dependencies with as little friction as possible.

Preamble

Before anything else, I should point out why I'm making this a draft PR. While I suspect that the issues we're facing are fairly common, and a solution to address them would certainly benefit more people, I have no idea if this aligns with the vision and the plans you have for esbuild.

But I didn't want to just create a bunch of issues. I'm more than happy to contribute, and I thought I could create a rough PR containing the basic ideas that I would love to see in esbuild, leaving it up to you to decide which of these ideas, if any, you feel comfortable with adding to the project.

This PR is divided into two parts — they should really be two separate PRs, and I'm happy to break them up based on your feedback, but I felt it was easier to present this as a unit.

Part 1: Adding externals to metafile

Closes #905

When flagging a module as external, we need to package it and its dependencies and place them alongside the esbuild-generated bundle in the serverless artefact. To do this we need to know exactly which of the external modules were actually imported into the entry point, so that we're able to:

  1. Warn the user if one of the modules they're flagging as external is missing from the dependency tree, which prevents the packaging to be successful, and
  2. Package only the necessary modules, avoiding adding unnecessary weight to the artefact.

To address this, this PR adds an externals property to the inputs section of the metafile, containing three properties:

  • path: The import path
  • source: The source directory where the import was performed
  • kind: The type of the external import. It can be:
    • user: Flagged by the user
    • system: Flagged internally
    • node-built-in: One of the Node built-in modules
    • native-module: Caused by a native module (more on this below)
"inputs": {
  "fixtures/node-module-included-nested/function.js": {
    "bytes": 40,
    "imports": [],
    "external": [
      {
        "path": "some-external-module",
        "source": "/Users/foobar/fixtures/node-module-included-nested",
        "kind": "user"
      }
    ]
  }
}

With this information, we can parse the metafile, see which external modules were required, resolve their location using source + path and package the source accordingly.

Part 2: Externalizing native modules

A module with native bindings, be it a direct or transitive dependency, has the potential to cause the bundling to fail, or worse, generate a bundle that will cause runtime errors. Sometimes the module can be deeply nested in the dependency tree, to the point that the user has no idea that the module even exists, let alone knowing that it needs to configure it.

To mitigate this, this PR introduces a mechanism for detecting these modules, which can be enabled with the --externalize-native-modules flag. It does so by checking for certain "marker modules", whose presence signals that the dependent module is likely to have native bindings.

When the flag is on and one of these modules is detected, the dependent module is automatically flagged as external, and added to the externals array with a kind value of native-module.

Again, I don't know if this falls within your scope for the project, but here's my case for it:

  • This is an optional flag, which should be disabled by default, so not a breaking change
  • The detection mechanism will not cover 100% of the cases, but it can be refined. As it stands, I'm convinced that it'll catch a significant number of cases
  • This comes at no extra cost performance-wise, since esbuild is already traversing the tree and parsing the package.json files; it's impossible to do this outside of esbuild without introducing huge performance penalties and thus diluting the benefits that esbuild brings

Next steps

There's still work to do. For one, I need to add tests. Also, I'm sure that there are a lot of possible refinements to my logic. I would also love to discuss a mechanism for converting the external imports into relative paths, because as it stands it's impossible to handle different versions of the same externalized module.

But before any of this, I wanted to share this rough implementation, discuss the details and hear your thoughts.

Thanks in advance! 🙌🏻

@eduardoboucas
Copy link
Author

@evanw If you want me to fix the snapshots of the failing tests so that you can see more examples of the proposed new property, let me know. Otherwise, I'll wait for your feedback before putting any more work into this, to avoid going in a direction that you're not happy with.

Copy link

@mingard mingard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks @eduardoboucas

@evanw
Copy link
Owner

evanw commented Mar 17, 2021

Thanks for reaching out and for describing your problem. However, part of this PR seems like something that is perhaps too specific to embed in esbuild itself to me.

This comes at no extra cost performance-wise, since esbuild is already traversing the tree and parsing the package.json files; it's impossible to do this outside of esbuild without introducing huge performance penalties and thus diluting the benefits that esbuild brings

This is not how I intend to evolve this project. I don't want esbuild to become the union of everyone's bundling needs in the name of speed. Rather, I want esbuild to be a reasonably general core with a plugin system that lets you add your own behavior at a few well-defined extension points. This problem seems like something that should be solvable with a plugin. I think it would be good to explore solving this with a plugin first to see what the problems with doing that are, and then potentially explore addressing those problems first.

I think changing esbuild such that a) .node files default to external and b) such that external imports are encoded in the metadata both sound like reasonable changes to move forward with. I have been planning to put external imports into the metadata anyway, and I think .node files have to exist on the file system to be loaded in memory so I don't see a use for them not being external. I'm happy to make these changes myself, or I can also work with you on doing this if you'd like to be the one to do it.

But putting Netlify-specific scanning of package.json files into esbuild seems like something that belongs in a plugin instead of in esbuild itself. Have you considered writing your own code to scan package.json files? It doesn't seem like it would be "impossible to do this outside of esbuild without introducing huge performance penalties" to me. One idea is to look at all of the input file paths in the JSON metadata and then check for package.json files in the directories containing the input files (without checking a given directory more than once of course).

@eduardoboucas
Copy link
Author

eduardoboucas commented Mar 17, 2021

@evanw thanks for your thoughtful answer!

Let me start by addressing something you said, which I worry might be a result of me describing the problem poorly.

But putting Netlify-specific scanning of package.json files into esbuild seems like something that belongs in a plugin instead of in esbuild itself

Nothing in this PR is specific to Netlify. I'm well aware that esbuild needs to serve its community as a whole and not a single individual or company, so I wouldn't propose anything that I felt was specific to our use case.

Unfortunately, there is no standard way for modules to declare that they have native bindings, and in fact different modules can achieve this in completely different ways, which makes the detection all the more challenging. After trying to solve the problem from different angles, I got the best results by using this package, which looks for the exact set of marker modules that I've included in the PR.

This problem seems like something that should be solvable with a plugin.

This is where I started. I created a plugin that looks for .node files being loaded, flags them as external and replaces the path so that the require call can point to a convenient location on disk. This falls short for two reasons.

Firstly, if the filter expression I use in the plugin is something like /\.node$/, the callback will be fired if the module contains a static require call to a .node file (like sharp does). However, the only thing I'll be able to do is flag the .node file itself as external, but that's not sufficient because the module is requiring additional native files that are neither .node files nor required statically. The only way to make the module work is by flagging sharp itself as external, but this isn't something I can do from the .node callback. The callback would need the ability to say "flag whoever included me as external".

Secondly, using a static require call to a .node file is just one of the many ways in which module authors include native files. A very popular approach is to use libraries like bindings, which locate and call the appropriate .node files at runtime. I'm not sure an esbuild plugin would help here, because the call to the native file would be a dynamic require that esbuild wouldn't be able to handle.

Because of these cases, and a whole lot more stemming from an unstandardised approach to native modules in the Node ecosystem, I believe the only reliable way for esbuild to detect these modules is not to wait for certain paths to be loaded statically, but instead to look for certain modules in a project's dependencies and infer the presence of native bindings, flagging the entire module as external.

The only way I can see this working in a plugin is if new filtering capabilities were introduced, allowing paths to be filtered based on the contents of their corresponding package.json files. Something like:

build.onResolve({ filter: {
  packageJson: {
    dependencies: /(bindings|nan|prebuild|)/
  }
} }, (args) => {
  // The module has native bindings, so we compute an appropriate relative path
  // to it and flag it as external.
  return {
    external: true,
    path: somePath,
  }
})

One idea is to look at all of the input file paths in the JSON metadata and then check for package.json files in the directories containing the input files (without checking a given directory more than once of course).

The challenge of doing this outside of esbuild is not only due to performance, but also because we'd lose the ability to modify the generated bundle since we'd be operating on it after esbuild is finished. This is important so that the require calls to the externalised modules have the correct paths.

To illustrate the importance of this, take the following example:

my-module/
├─ node_modules/
│  ├─ [email protected]
│  ├─ some-dependency/
│  │  ├─ node_modules/
│  │  │  ├─ [email protected]
│  │  ├─ some-dependency.js
├─ main.js

When we process main.js and we find a require('native-module') call, it's sufficient to flag native-module as external and leave the original path untouched, since it will correctly map to [email protected] in the dependency tree. But once we process some-dependency.js and we find another require('native-module') call, leaving the path untouched means that it will incorrectly map to [email protected] in the tree. To make it map to the correct dependency, we'd need to change the path to something like ./node_modules/some-dependency/node_modules/native-module.

Doing these transformations outside of the scope of esbuild is really difficult.

It's worth noting that the current plugin API is ideal for this, since the return value allows you to both flag a module as external and modify its path, so your suggestion of possibly extending the plugin API capabilities sounds very appealing to me.

I'm happy to make these changes myself, or I can also work with you on doing this if you'd like to be the one to do it.

esbuild is adding value to what we're building, so I'd love to contribute back to the project, and work with you on this and other improvements. But it's entirely up to you, so let me know what you're comfortable with and I'm happy to follow your lead.

@eduardoboucas
Copy link
Author

eduardoboucas commented Mar 19, 2021

Hi again @evanw! 👋🏻

Following up on my previous message, I thought of an alternative syntax for the plugin approach that I think is more elegant and less disruptive to the existing API.

We could leave the existing filter property untouched, but introduce an optional subject property that defines what the filter is applied to. This property would default to "path", as by default we want to keep filtering imports by their path, but it could also be something like packageDependencies, packageVersion and any others that could be introduced as filtering options in the future.

In that case, the native module detection and extraction could be done like in a plugin like so:

build.onResolve({
  filter: /^(bindings|nan|node-gyp-build|node-pre-gyp|prebuild)$/,
  subject: 'packageDependencies'
}, (args) => {
  // The module has native bindings, so we compute an appropriate relative path
  // to it and flag it as external.
  const somePath = (...)

  return {
    external: true,
    path: somePath,
  }
})

In core, the implementation would be quite simple. The existing mechanism for registering the hooks would stay the same, but we'd check for the subject property before calling them.

Would you be comfortable with something like this? I'm happy to put together a PR to show what the implementation could look like.

Thanks!

@evanw
Copy link
Owner

evanw commented Mar 20, 2021

I gave it a shot. Here's a plugin (or at least a starting place for one) that might do what you want:

const externalNativeModulesPlugin = {
  name: 'external-native-modules',
  setup(build) {
    const fs = require('fs')
    const path = require('path')
    const isNative = require('is-native-module')
    const notPackagePath = /^(\/|\.(\/|\.\/))/
    const packageName = /^([^@][^/]*|@[^/]*\/[^/]+)(?:\/|$)/

    const nmCache = {}
    const isNodeModules = async (nmPath) =>
      nmCache[nmPath] !== void 0 ? nmCache[nmPath] : (nmCache[nmPath] =
        fs.promises.stat(nmPath).then(x => x.isDirectory(), () => false))

    const pjCache = {}
    const isNativePackage = async (pjPath) =>
      pjCache[pjPath] !== void 0 ? pjCache[pjPath] : (pjCache[pjPath] =
        fs.promises.readFile(pjPath, 'utf8').then(pkg => isNative(JSON.parse(pkg)), () => null))

    build.onResolve({ filter: /.*/ }, async (args) => {
      // Check for package imports
      if (notPackagePath.test(args.path)) return
      const package = packageName.exec(args.path)
      if (!package) return

      // Find the package
      let dir = args.resolveDir
      let foundPath
      while (true) {
        if (path.basename(dir) !== 'node_modules') {
          const nmPath = path.join(dir, 'node_modules')
          if (await isNodeModules(nmPath)) {
            const pjPath = path.join(nmPath, package[1], 'package.json')
            const isNative = await isNativePackage(pjPath)

            // Stop if this is a native package
            if (isNative === true) return { path: args.path, external: true }

            // Stop if this is not a native package
            if (isNative === false) return
          }
        }

        // Continue if this was missing
        const parent = path.dirname(dir)
        if (parent === dir) break
        dir = parent
      }
    })
  }
}

It uses caching to avoid unnecessary performance overhead. I don't know what code base to test this on since I don't use native modules myself, but it looks to me like it should be doing the right thing.

I'm curious to hear what performance overhead you observe for your use case vs. just calling the API directly. I tested this on a large code base of mine and it does not appear to cause huge performance penalties. It appears to be something in the realm of <2x of the original bundle time.

@eduardoboucas
Copy link
Author

Thanks for putting time into this @evanw! I really appreciate it.

I tested this plugin with a few projects and my numbers are similar to yours: I'm seeing an increase in bundle times from 1.5x to slightly over 2x.

You're right in that it's not a huge increase in absolute terms, but performance is the main reason we're rebuilding our entire bundling system around esbuild, so we're trying to optimise the process as much as we can. Under that light, doubling the bundling times for everyone in order to improve the experience for the subset of users that use native modules is not a negligible price to pay. This is why I'm trying so hard to find a way to accommodate this use case while minimising the performance penalties (and without introducing any penalties at all to the esbuild core, of course).

If adding new filtering capabilities is not something you're comfortable with, would you consider passing the contents of any package.json file associated with an import as an additional argument to the onResolve callback? This would still mean filtering on /.*/, which we were hoping to avoid, but at least we'd avoid additional filesystem calls and we could determine whether a module is native by checking args.packageJson.dependencies and args.packageJson.devDependencies.

@evanw
Copy link
Owner

evanw commented Mar 25, 2021

FYI this might be relevant: #1051. That issue is proposing that, by default, esbuild should copy .node modules over to the output directory and rewrite the import paths to point to the copied files. This seems like a more straightforward solution to this problem to me, so I think I'll plan to move forward with that one. It is also more inline with how the existing file loader works. We can't still explore the direction in this PR too, but I'm mentioning that other issue because I suspect solving that might replace what this PR is trying to do for your use case. What do you think?

Regarding your previous comment:

It doesn't really make sense to pass the contents of package.json into the onResolve callback because plugin onResolve callbacks run before esbuild's own internal onResolve callback. This is deliberate; plugins are supposed to be able to provide their own resolve implementation and also running esbuild's implementation would be wasteful in the vast majority of cases. There may be a way of invoking esbuild's internal onResolve logic from within a plugin in the future (#641) but that likely won't do anything to improve the speed and may even be slower than just doing it in JavaScript due to the IPC overhead.

@eduardoboucas
Copy link
Author

That issue is proposing that, by default, esbuild should copy .node modules over to the output directory and rewrite the import paths to point to the copied files.

I'm afraid that this approach will only solve a really small percentage of use cases, so depending on how the functionality it's framed, it might create the wrong expectation that esbuild handles native modules, where in reality things are a bit more complicated than that. An example of a use case where this will not work is precisely the one described in #1051.

The approach described is precisely what I've started with (except it was done in a plugin). The problem is that it will only work if a .node file is required statically and the library doesn't import any other native files. The issue mentions lovell/sharp, which is indeed an extremely popular Node module using native bindings. Even though it pulls in a .node file with a static require call, it then uses other system files which will not be caught by esbuild. This will lead to esbuild happily bundling the module, but it will throw a runtime error because the module will be incomplete. For this reason, and as I was able to test myself with this specific module, simply externalising the .node file will not work.

This is without mentioning the modules that don't use a static require call to .node files at all. Just by looking at the dependents of nan and bindings, we're looking at almost 8,000 modules that simply wouldn't work with the approach you're describing.

Realising the shortcomings of using statically-required .node files as a detection mechanism is precisely what led me to creating this pull request in the first place. If you do feel that handling native modules is something that esbuild should do (which I'd completely agree with), I think you'd need a more holistic approach to the problem, and I think the route proposed in this PR would be a good starting point for that.

@eduardoboucas
Copy link
Author

eduardoboucas commented Apr 2, 2021

Hi @evanw! I wanted to check in and see if you had a chance to form a more solidified opinion about the different directions we discussed.

Are you still open to the idea of adding the native module detection to the esbuild core, as per the initial proposal?

If not, is there any room for extending the filtering capabilities of the plugins API (e.g. filtering by contents of package.json), so that we could build the detection into a plugin without doubling the bundling times?

If you can share what you're comfortable and uncomfortable with, I'm happy to work on a solution that fits that. If you don't want to make any changes at all, that is fine too, in which case I can just close this PR.

Thanks again! 🙌

@peterp
Copy link

peterp commented Apr 8, 2021

I've added experimental esbuild support in @redwoodjs, but we do not currently bundle node_modules, because we couldn't come up with a nice developer experience for externalizing native modules.

It would be really amazing to have something like this in esbuild 🙇‍♂️

@eduardoboucas
Copy link
Author

We were able to optimise the plugin in such a way that the performance penalty is significantly reduced. In case anyone is interested, here's the result: https://github.com/netlify/zip-it-and-ship-it/blob/f47f22206c4e80d9e921b7b1e2af96d8dc95d379/src/runtimes/node/native_modules/plugin.js#L25-L70.

@evanw Thanks for your help in steering this and getting a first version of the plugin.

I'll close this PR since it looks like the initial implementation will not be pursued, and I don't want to be cluttering the list of pull requests. There were some really interesting points in the discussion and I'd be happy to address those separately in issues.

@gautaz
Copy link

gautaz commented May 17, 2021

Hello @eduardoboucas,

I have tested your esbuild plugin to support NodeJS native modules, in particular with wrtc.
The implementation provided here was not working in this particular case.
For now I have shamelessly copied your source code in my own project.

Would you mind distributing this plugin apart from the zip-it-and-ship-it repository?
I can also extract the code in a separate github project if this seems OK on your side.

The plugin could then be added to the community plugins if this is OK for @evanw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collect data about external modules
5 participants