Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[aws-lambda-nodejs] Bundling performance issues #10286

Closed
jaapvanblaaderen opened this issue Sep 10, 2020 · 27 comments · Fixed by #11289
Closed

[aws-lambda-nodejs] Bundling performance issues #10286

jaapvanblaaderen opened this issue Sep 10, 2020 · 27 comments · Fixed by #11289
Assignees
Labels
@aws-cdk/aws-lambda-nodejs bug This issue is a bug. effort/medium Medium work item – several days of effort in-progress This issue is being actively worked on. p1

Comments

@jaapvanblaaderen
Copy link
Contributor

❓ General Issue

The Question

In my stack I've created several lamdas using the new NodejsFunction from @aws-cdk/aws-lambda-nodejs. Functionally everything works fine, code in bundled with Parcel and the stack deploys without issues. There are however two (Parcel related?) performance issues:

  1. Bundling seems quite slow, my lambda has a few dependencies and it takes about 20 seconds to bundle it.
  2. When performing multiple tests for the same stack, tests are slow as the lambda bundling isn't cached.

I'm wondering if this is the expected performance or if I'm missing something? I did some tests with webpack to manually bundle the Lambdas, this is about 3 times faster than webpack. I'm also wondering about the caching feature in Parcel, shouldn't the second performance issue be tackled it?

How to reproduce

I've created a simple sample app to show the issue: https://github.com/jaapvanblaaderen/cdk-lambda-bundling-performance

The app contains a single stack with one lambda that's referenced in two tests. In the test output you can observe that bundling an empty lambda function takes about 8 seconds (initially slow) and for the second test it's still around 6-7 seconds (no/bad caching?).

Bundling asset simple-stack/HandlerLambda/Code/Stage...

 RUNS  lib/simple-stack.test.ts
✨ Built in 8.01s
Bundling asset simple-stack/HandlerLambda/Code/Stage...

 RUNS  lib/simple-stack.test.ts
✨ Built in 6.66s
 PASS  lib/simple-stack.test.ts (21.056 s)
  Simple Stack
    ✓ Should have memory size set to 512MB (9505 ms)
    ✓ Should use Node 12 as runtime (7459 ms)

Test Suites: 1 passed, 1 total
Tests:       2 passed, 2 total
Snapshots:   0 total
Time:        22.211 s
Ran all test suites.

Environment

  • CDK CLI Version: 1.62
  • Module Version: 1.62
  • Node.js Version: v14.4.0
  • OS: OSX Catalina
  • Language (Version): TypeScript 4.0.2
@jaapvanblaaderen jaapvanblaaderen added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels Sep 10, 2020
@jogold
Copy link
Contributor

jogold commented Sep 24, 2020

@jaapvanblaaderen currently investigating this... looks indeed like there's an issue with the cache.

@jaapvanblaaderen
Copy link
Contributor Author

@jogold Ok cool, just let me know if I can help testing something if needed.

@jogold
Copy link
Contributor

jogold commented Sep 24, 2020

So... it's complicated and has to do with the content of cdk.out changing (we are bundling in there in a temporary directory) which invalidates part of the cache...

jogold added a commit to jogold/aws-cdk that referenced this issue Oct 7, 2020
Bundling in a random temporary directory each time invalidates Parcel's cache.

Use the `cacheKey` instead of random characters.

Addresses aws#10286 (not sure if closes it yet)
@jogold
Copy link
Contributor

jogold commented Oct 7, 2020

@jaapvanblaaderen we'll see how #10763 improves the cache performance. Let me know your experience when it's merged and released.

@jaapvanblaaderen
Copy link
Contributor Author

Thanks @jogold, I'll let you know.

@SomayaB SomayaB added in-progress This issue is being actively worked on. bug This issue is a bug. and removed guidance Question that needs advice or information. labels Oct 7, 2020
mergify bot pushed a commit that referenced this issue Oct 11, 2020
Bundling in a random temporary directory each time invalidates Parcel's cache.

Use the `cacheKey` instead of random characters for the temp directory.

Addresses #10286 (not sure if it closes it yet)


----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@jaapvanblaaderen
Copy link
Contributor Author

jaapvanblaaderen commented Oct 19, 2020

@jogold I've done some testing with the latest 1.68 release, which should include the changes in #10763 if I'm correct. I've tested with the exact same test app as mentioned in the issue description and took the average of four test runs (running npm run test).

Results with parcel 2.0.0-beta.1

  • CDK 1.62: 22.3 seconds
  • CDK 1.68: 31.7 seconds

Not very promising results as it apparently got 10 seconds slower. I tested a few other versions as well, 1.62-1.66 are similar in performance and from 1.67 and upwards it's around 10 seconds slower.

Now the interesting part... I noticed that there are quite some security vulnerabilities with the currently supported version of Parcel, not really a big issue, but decided to upgrade it using npm audit fix. This upgrades parcel to ^2.0.0-nightly.425. Before I could re-test, I had to tweak some stuff in https://github.com/aws/aws-cdk/blob/master/packages/@aws-cdk/aws-lambda-nodejs/lib/bundlers.ts to make the build work: Change the PARCEL_VERSION and remove the --no-autoinstall option.

Results with parcel 2.0.0-nightly.425

  • CDK 1.62: 5.4 seconds 🎉
  • CDK 1.68: 22.7 seconds

That 5.4 seconds looks promising 🙂 . I hope this info helps to improve the situation? I guess you can compare CDK 1.66 and 1.67 to see why the performance (from CDK perspective) got worse? Then there is Parcel, I find it a bit weird that the latest 2.x release is quite old especially as they are still in beta and apparently making quite good (performance) improvements.

Also note that there is already a (dependabot PR) for updating parcel in CDK: #10953

@alan-cooney
Copy link

alan-cooney commented Oct 20, 2020

You can solve the slow tests problem by mocking out Parcel bundling with this module - https://www.npmjs.com/package/cdk-lambda-nodejs-mock

@jaapvanblaaderen
Copy link
Contributor Author

You can solve the slow tests problem by mocking out Parcel bundling with this module - https://www.npmjs.com/package/cdk-lambda-nodejs-mock

@alan-cooney This indeed works great for running tests, thanks! 👍 Is there also an open GitHub repo linked to this module?

@eladb eladb added p1 effort/medium Medium work item – several days of effort labels Oct 25, 2020
@floydspace
Copy link

Hey @jaapvanblaaderen . I have implemented a new cdk construct aws-lambda-nodejs-esbuild

See comparison results on my machine:
@aws-cdk/aws-lambda-nodejs:

Bundling asset simple-stack/HandlerLambda/Code/Stage...

 RUNS  lib/simple-stack.test.ts
✨ Built in 11.88s

 PASS  lib/simple-stack.test.ts (84.678 s)s/_qq218h940s33jg73wss4k5w0000gn/T/cdk.outeeruBW/bundling-temp-dc2901b128dbc7ef10d25ae6a7a15a98d5e3ac81c654c3ad726474eee5ec  Simple Stack
    ✓ Should have memory size set to 512MB (13909 ms)
    ✓ Should use Node 12 as runtime (67285 ms)

Test Suites: 1 passed, 1 total
Tests:       2 passed, 2 total
Snapshots:   0 total
Time:        86.375 s, estimated 133 s
Ran all test suites.

vs
aws-lambda-nodejs-esbuild:

 PASS  lib/simple-stack.test.ts
  Simple Stack
    ✓ Should have memory size set to 512MB (127 ms)
    ✓ Should use Node 12 as runtime (50 ms)

Test Suites: 1 passed, 1 total
Tests:       2 passed, 2 total
Snapshots:   0 total
Time:        2.045 s, estimated 4 s
Ran all test suites.

I forked your sample, so you can see how to migrate

Just saying.

@jogold
Copy link
Contributor

jogold commented Oct 28, 2020

I have implemented a new cdk construct aws-lambda-nodejs-esbuild

Very interesting!!

@jaapvanblaaderen
Copy link
Contributor Author

@floydspace This is great work! I will definitely have a look at this.

@Dzhuneyt
Copy link
Contributor

Dzhuneyt commented Oct 28, 2020

The benchmarks look really promising regarding the https://github.com/floydspace/aws-lambda-nodejs-esbuild package. As a regular user of AWS CDK + NodejsFunction constructs, I vote strongly on switching to the esbuild package internally, instead of the current Parcel implementation.

A side note: we've had good results with using Webpack back in the days before the NodejsFunction construct existed. Are there any plans for evaluating that one as a possible technology to back the construct?

@NGL321 NGL321 removed the needs-triage This issue or PR still needs to be triaged. label Oct 28, 2020
@jogold
Copy link
Contributor

jogold commented Oct 30, 2020

@eladb what's your opinion here?

I've made some tests with esbuild and it's incredibly fast. This looks like the perfect tool for aws-lambda-nodejs and would solve some of the tweaking we do to make Parcel work (like playing with the package.json).

If we go in this direction would you agree to include esbuild as a real (bundled) dependency in aws-lambda-nodejs? It has no dependencies, it only installs its binary with a postinstall script. Moreover, it offers a synchronous Javascript API (with TS typings). This means that we could completely drop the Docker + tryBundle() implementation. (Docker could still be used when installing packages that are not bundled though)

@floydspace would you be ready to contribute for this?

@floydspace
Copy link

floydspace commented Oct 30, 2020

Hey @jogold , sure would love to.
I have a couple of questions tho:

  • Can I just post the code from my package as is as a first version?
  • Why to use docker when installing not bundled packages, we can just use the user's environment. To be honest, I use the forked and adapted for cdk implementation from serverless-webpack/lib/packExternalModules.js and it's covered and works perfectly, (though I need to adapt tests as well)
  • Why to use docker if we could just spawn parcel cli as a child process to make it synchronous? It's basically what the esbuild synchronous API does. Maybe I miss some important detail in how cdk ecosystem works

Thank you.

@jogold
Copy link
Contributor

jogold commented Oct 30, 2020

Can I just post the code from my package as is as a first version?

Let's wait for @eladb's feedback

Why to use docker when installing not bundled packages, we can just use the user's environment. To be honest, I use the forked and adapted for cdk implementation from serverless-webpack/lib/packExternalModules.js and it's covered and works perfectly, (though I need to adapt tests as well)

For modules using native dependencies you could want to install in a Lambda compatible environment. This could be an option of course.

Why to use docker if we could just spawn parcel cli as a child process to make it synchronous? It's basically what the esbuild synchronous API does. Maybe I miss some important detail in how cdk ecosystem works

We started to use docker because we didn't want to include parcel as a direct dependency in aws-lambda-nodejs. It has too many dependencies. See #6340.

@eladb
Copy link
Contributor

eladb commented Nov 1, 2020

Sounds promising but I would still be reluctant to bundle it with the CDK. Going towards 2.0 the CDK will be a single module, which means our bar for bundling deps is going to be very high.

Can we still employ the same docker fallback approach here? What are the downsides?

@Dzhuneyt
Copy link
Contributor

Dzhuneyt commented Nov 1, 2020

Seems like a reasonable approach. To let the developers opt for Docker based or esbuild based compilation. However, let's consider these facts:

  1. 99% of users of the NodejsFunction can use Docker and Parcel based bundling currently interchangeable. The only exception are those that require bundling in a Lambda compatible Layer so that some sort of system dependencies can be embedded. Honestly, I don't have a practical example of this 1% but we should assume there are such use cases.
  2. Docker-based bundling is inherently slower compared to either Parcel or esbuild, due to the fact that there is the Docker overhead. This is especially true in MacOS based environments. Like 10x slower. Also, by using Docker-based bundling, Docker is introduced as a dependency of the project, instead of the standard Node + NPM, which CDK already requires anyway.

Based on these facts, I consider the "meaningful default" to be an option that is FASTER in 99% of the use cases and requires fewer system dependencies (Docker). This option currently is either Parcel or esbuild.

Long story short, it's about finding the right balance of keeping package dependencies low, and at the same time - keeping the development experience of consumers of the NodejsFunction construct satisfactory.

Just curious: If the NodejsFunction construct declares Parcel or esbuild as a dependency on a fixed version, isn't it enough to mitigate risks for AWS CDK?

@jogold
Copy link
Contributor

jogold commented Nov 1, 2020

Can we still employ the same docker fallback approach here? What are the downsides?

Yes but one of the problems we have with Docker here is that we need to mount the project root (in order to have access to the top level node_modules folder, this is a different situation compared to aws-lambda-python for example). This results in poor performances (almost unusable on macOS). And since installing parcel locally greatly improves performances without any constraints I think everyone is (or should be) using local bundling. And then the default becomes the exception....

Implementing a local only solution would result in much cleaner code also. We could maybe go back to having an "undeclared peer dependency".

Note: local bundling could still use Docker (as an option) when installing unbundled dependencies that should be built in a Lambda compatible environment.

@hoegertn
Copy link
Contributor

hoegertn commented Nov 1, 2020

Why exactly do we need access to the top-level node_modules? I normally have my own package.json in the lambda subfolder and I think bundling should work in this folder without the CDK project, shouldn't it?

@Dzhuneyt
Copy link
Contributor

Dzhuneyt commented Nov 2, 2020

That's a "classical" setup yes, but not a necessary one. In our organization, instead, we create Lambdas alongside/next-to the CDK constructs. So there's a single package.json and it's at the root of the CDK app. This makes it easy to centralize third party version upgrades of dozens of Lambdas.

@jogold
Copy link
Contributor

jogold commented Nov 2, 2020

I normally have my own package.json in the lambda subfolder

How do you easily install such a setup? Nesting package.json is usually an anti-pattern. There's no reason not to declare all your projects dependencies in one place (along with tooling configurations etc.), especially when bundling. The only exception are monorepos (still there's no nesting) but in this case dependencies are hoisted to the top-level node_modules folder so you need access to it when bundling.

@hoegertn
Copy link
Contributor

hoegertn commented Nov 2, 2020

I see that with bundling this might indeed not be necessary. Sometimes I zip the folder as Lambda asset and then I need it contained in a subfolder. Installing is done using the normal npm ci command. I am setting up my project with projen so this is called for all my folders automatically.

How do I specify Lambda-specific deps in one subproject and not in another?

I am open to any best-practices and happy to improve my setup.

@alan-cooney
Copy link

alan-cooney commented Nov 2, 2020

@alan-cooney This indeed works great for running tests, thanks! 👍 Is there also an open GitHub repo linked to this module?

Currently it's on my company's monorepo, but I can split it out if useful? It essentially just replaces all NodejsFunction classes with the standard Function from @aws-cdk/aws-lambda, and sets the code as Code.fromInline('return;') rather than bundling the actual lambda.

@jogold
Copy link
Contributor

jogold commented Nov 2, 2020

@hoegertn

Sometimes I zip the folder as Lambda asset and then I need it contained in a subfolder.

Got it, in this case you have no other solution (but you're creating a zip that contains much more than what's needed by your function).

That's exactly the problem we're trying to solve with the NodejsFunction construct (zero-config, bundling of exactly what's needed). Once we solve the performance issue there won't be any reason for nested package.json files anymore I think.

Installing is done using the normal npm ci command. I am setting up my project with projen so this is called for all my folders automatically.

This means that you're sometimes downloading and installing the same package multiple times. It's also complicated when you want to update packages.

@hoegertn
Copy link
Contributor

hoegertn commented Nov 2, 2020

I see your arguments. Performance is one issue, packaging additional files is sometimes an issue, and different dependency versions are sometimes a wanted thing.

But I would love to get rid of all this.

@jogold
Copy link
Contributor

jogold commented Nov 2, 2020

@floydspace we are going to replace parcel with esbuild while still maintaining local bundling + fallback to Docker approach for the moment. People installing esbuild will have very fast bundling performances.

Let me first draft a PR for this then we can work on it together.

eladb pushed a commit that referenced this issue Nov 18, 2020
Replace Parcel with esbuild for bundling.

esbuild offers [impressive performances](https://esbuild.github.io/) compared to Parcel.
Moreover everything can be configured via the CLI. This means that
we don't  need to play with the user `package.json` file anymore.

Add full Windows support for local bundling.

Refactor and clean-up.

Closes #10286
Closes #9130
Closes #9312
Resolves #11222

BREAKING CHANGE: local bundling now requires `esbuild` to be installed.
* **lambda-nodejs**: `projectRoot` has been replaced by `depsLockFilePath`. It should point to your dependency lock file (`package-lock.json` or `yarn.lock`)
* **lambda-nodejs**: `parcelEnvironment` has been renamed to `bundlingEnvironment`
* **lambda-nodejs**: `sourceMaps` has been renamed to `sourceMap`
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-lambda-nodejs bug This issue is a bug. effort/medium Medium work item – several days of effort in-progress This issue is being actively worked on. p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants