
Faster sync API via worker_thread #590

Closed · cspotcode opened this issue Dec 10, 2020 · 20 comments

cspotcode commented Dec 10, 2020

I'm brand new to esbuild, but am looking for excuses to cut my teeth on more golang projects.

I see from various issues and pull requests that esbuild's sync APIs spawn a new external process for each transformation call, because this is the only way to synchronously pipe data to the external process and wait for a response.

To reduce the per-transformation overhead, I'm wondering if either of these solutions are a) possible and b) something I can work on as a pull request:

  1. Use a worker_thread to perform async communication with an esbuild process, blocking the main thread. I've implemented code that blocks the main thread while doing async stuff in a worker, so I could publish that as a reusable library and use it in esbuild. (Here is my code, specifically the sync_worker package: https://github.com/cspotcode/typescript-http-support) I imagine usage would look like this pseudocode (forgive me if this is obvious; as I say, I'm brand new to esbuild):

     ```js
     const service = createEsBuildSyncCompileService() // <-- spawns worker_thread and esbuild process
     service.transformSync() // blocks main thread calling out to worker_thread, which uses async to talk to esbuild
     ```

  2. Compile esbuild to WebAssembly and run it in-process. I know less about the state of go-to-WASM compilation, but would enjoy the excuse to learn more.

     EDIT: I see there is already a WebAssembly package. In that case, I can implement option (1) using the WebAssembly build.
cspotcode changed the title from "Faster sync API via WebAssembly or worker_thread" to "Faster sync API via worker_thread" on Dec 10, 2020
evanw (Owner) commented Dec 11, 2020

Oh wow what a hack. I love it. It looks like you're using the SharedArrayBuffer atomics API to block the main thread. I don't know what the cross-platform support for this feature is. MDN says it's supported in node 8+, which should be fine. It looks like worker_thread was added in node 10+. So I think it's ok? Hopefully there aren't any issues with node's atomics and less common processor architectures.

I think this could be good to add to esbuild, assuming benchmarks show that it's a big win (I expect them to). I want to keep esbuild dependency-free so it should be implemented as a part of esbuild itself instead of as a library that esbuild uses. This shouldn't require much additional code to support so including it in esbuild is fine.

It definitely shouldn't be implemented such that it requires WebAssembly though. The WebAssembly version is up to 10x slower than the native version for a variety of reasons. Instead, it should be implemented such that it works equally well with either the native version or the WebAssembly version. Both versions use the same TypeScript library code (they just create a different executable as a child process) so this should naturally happen without any extra work.

This could be used to add sync API support to the browser too. However, I think we shouldn't do that at this time. We should keep this node-only for now. There's a big push in the browser community to avoid blocking the main thread and having this be available would likely result in people using it because it's convenient without understanding the consequences.

This could be used to add support for plugins to the buildSync API. I think I'd like to do that afterward if this experiment works out. That's currently a shortcoming with the existing API since plugins require message passing back and forth with JavaScript code that can't be done asynchronously.

I think I'd prefer if this were exposed as a part of the existing createService() API to keep the API surface small. Basically adding buildSync and transformSync to Service but they throw unless you pass allowSync: true to createService().

> I'm brand new to esbuild, but am looking for excuses to cut my teeth on more golang projects.

Keep in mind that working on this won't touch any Go code. The natural way to implement this is all in TypeScript. I'm happy to accept a PR for this, and I'm also happy to implement this myself if you were hoping to work on Go code.

lukeed (Contributor) commented Dec 11, 2020

Is there any reason to suspect a service will be juggling async and sync operations during its lifetime? If not, then maybe createService({ sync: true }) and build|transform names are kept as is?

cspotcode (Author) commented

Thanks for the detailed response.

FWIW, I learned about this ability to block the main thread from some of the core node team in an ESM modules design thread. They experimented with it for blocking require() calls of ECMAScript modules, where each ESM loader hook is kept in a separate worker thread, and a (main) thread can still require() ESM modules. To me it feels more like old-school threading than a hack, but I see what you mean.

I believe Atomics are a part of the JS standard, or browser standard, right? https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Atomics My impression from the work I've done so far is that they need to be robust on all platforms because they're a spec'ed part of the web standards. You're concerned that old node versions will be buggy?

It could be supported where available, with a fallback to process spawning on incompatible platforms. Though the performance disparity between new and ancient node versions might lead to confused bug reports.

Have you thought about exposing functions via syscall/js and/or compiling to a shared library, a native node addon? I know that's commonly done with rust, but it seems go is barely beginning to support shared libraries.

> Keep in mind that working on this won't touch any Go code.

Truth be told, I'm most interested in learning about the language's internals, the runtime, how it compares to rust, how it interops with the web, stuff like that. So this is an interesting problem space, even if it only involves writing TS.

Given what you said about dependencies, probably best if you implement in esbuild and I'll focus on turning what I have into a reusable library. What is the motivation for keeping esbuild dependency free? Do you bundle third-party components, or is the goal to control every line of code?

Regardless, I'm very excited to start using esbuild. I'm adding a couple experimental compilers to ts-node so that users can get super-fast compilation without changing their configuration. So far I have swc and it'll be great to offer esbuild, too.

cspotcode (Author) commented

> Is there any reason to suspect a service will be juggling async and sync operations during its lifetime? If not, then maybe `createService({ sync: true })` and `build|transform` names are kept as is?

This can get messy because an argument to the factory affects the type signatures of the API elsewhere. `transform` switches from returning a `Promise<string>` to returning a `string`. I think having the type signatures change adds complexity on the whole.

lukeed (Contributor) commented Dec 11, 2020

The type signatures are easily solved :) Just putting it out there since it may reduce the overall code needed within the service -- though probably not.

Either way, just a suggestion that came to mind when reading over the thread. Thanks for sharing!

evanw (Owner) commented Dec 11, 2020

> Is there any reason to suspect a service will be juggling async and sync operations during its lifetime? If not, then maybe `createService({ sync: true })` and `build|transform` names are kept as is?

I see what you mean, but I still prefer the approach I described. It's more a question of API design. First of all, I see the Service object as a copy of the main package itself, just with a long-lived process. So if the main package has buildSync, the service having it too makes sense. I also like that build and buildSync look different because they do different things and have to be used differently. For example, you wouldn't want to accidentally forget to await build when you added sync: true. Also the name buildSync follows conventions from node for synchronous versions of functions such as readFileSync.

> You're concerned that old node versions will be buggy?

I'm concerned that support for the atomics API might have, say, landed in later node versions for MIPS processors than it did for x64 processors. But it's not a big deal if that happened as long as it's supported now.

> Have you thought about exposing functions via syscall/js and/or compiling to a shared library, a native node addon? I know that's commonly done with rust, but it seems go is barely beginning to support shared libraries.

There is some experimental work along those lines here: #248. Right now esbuild is highly portable and easy to build. I see this as a plus. Integrating it with node adds complexity to the build process and makes it less portable, so doing this is not without drawbacks. And esbuild's primary purpose is being a bundler. The transform library use case is secondary. So I have some reservations about taking esbuild in that direction.

> Given what you said about dependencies, probably best if you implement in esbuild and I'll focus on turning what I have into a reusable library. What is the motivation for keeping esbuild dependency free? Do you bundle third-party components, or is the goal to control every line of code?

Sounds good! No worries. I'll add this in the next release then.

The goal is pretty much to control every line of code (independent of code from the Go standard library obviously). I have several motivations:

  • Projects with fewer dependencies are faster to install. This can be somewhat mitigated by bundling the code, although having a lot of dependencies tends to add bloat as dependencies can re-implement things or provide unused functionality.

  • Projects with dependencies can be more complicated to install. For example, esbuild actually had one dependency in the past (a Go library that was only used for one function call in tests), but it caused an issue with someone installing it due to network issues, so I removed it (see "It's not support mips64le" #523 (comment)). People can have some pretty crazy proxy setups and may be installing esbuild through unusual means (e.g. npm is not available). This particular case could have been solved with vendoring but it was such a small thing that I just implemented it myself.

  • Projects with no dependencies are easier to audit and feel confident about the security properties of. To trust the esbuild project you currently only need to trust me. You don't need to trust 100 other developers from all over the internet. And you can easily audit all of the code yourself because it's all in one repo.

  • Projects with fewer dependencies are easier to iterate on. Making changes you need in other repositories involves collaboration and can slow things down a lot. It could even not be possible if someone stops maintaining something. Owning all of the code means I can fix any issue immediately without waiting.

  • Projects with no dependencies are less likely to bit rot over time. For example, packages that depend on other packages from npm are vulnerable to breakage if those packages get taken down for some reason. This has happened in the past.

I realize this approach is somewhat unusual, but it's not unheard of. The Rome Toolchain also deliberately has zero dependencies. They say "This allows us to develop faster and provide a more cohesive experience by integrating internal libraries more tightly and sharing concepts and abstractions. There always exist opportunities to have a better experience by having something purpose-built."

evanw (Owner) commented Dec 11, 2020

> This could be used to add support for plugins to the buildSync API. I think I'd like to do that afterward if this experiment works out. That's currently a shortcoming with the existing API since plugins require message passing back and forth with JavaScript code that can't be done asynchronously.

Wait, that wouldn't work. Plugins are code and code can't be transferred between threads.

evanw closed this as completed in 347c55c on Dec 11, 2020
evanw (Owner) commented Dec 11, 2020

Looks like it's 1.5x to 15x faster according to a quick benchmark. I tested esbuild.transformSync() vs. service.transformSync() for a small string and a large string. The small string was ~15x faster and the large string was ~1.5x faster. This is really cool! Thanks for the tip about worker_thread and atomics. This feature will go out with the next release.

evanw (Owner) commented Dec 11, 2020

I discovered worker.unref() after I landed the change. That appears to let the worker thread be cleaned up automatically when the process exits so you don't need to call stop() like you do with the service. With this, I can actually speed up the existing APIs without changing them. This seems preferable to requiring the use of a service, so I'm going to revert the API changes and speed up the existing API instead.

Also: I'm hitting some intermittent CI hangs on Windows, so this approach may have problems on Windows. Either that or there's something iffy about my implementation. I'm going to release this approach as initially opt-in to avoid breaking people. If this approach works out then it'd be great for it to just always be enabled.

evanw added a commit that referenced this issue Dec 11, 2020
cspotcode (Author) commented

Nice, hopefully I can try this out soon.

I think I found the (potential) bug in your code. It's possible to postMessage to the worker thread, then the worker calls .notify, and then the main thread calls .wait. If that happens, it'll hang. The solution is to increment the value being watched in the worker, so that .wait will not wait at all if the value has been incremented.

https://github.com/cspotcode/typescript-http-support/blob/main/packages/sync_worker/src/worker.ts#L49
https://github.com/cspotcode/typescript-http-support/blob/main/packages/sync_worker/src/client.ts#L25-L28
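
The fix described above can be shown in a tiny single-threaded simulation (the worker's two lines are inlined between the main thread's steps to reproduce the race; names are illustrative):

```javascript
const shared = new Int32Array(new SharedArrayBuffer(4));

// Main thread: snapshot the counter BEFORE posting the request.
const before = Atomics.load(shared, 0);
// port.postMessage(request) would happen here.

// Worker thread (simulated): it responds and increments before the main
// thread ever reaches its wait.
Atomics.add(shared, 0, 1);
Atomics.notify(shared, 0); // no one is waiting yet -- this notify is "lost"

// Main thread: because the value changed from the snapshot, wait returns
// 'not-equal' immediately instead of hanging forever on a missed notify.
const result = Atomics.wait(shared, 0, before);
console.log(result); // 'not-equal'
```

Without the incremented counter, the main thread would compare against a stale value, block, and never be woken, which is exactly the hang described.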

evanw (Owner) commented Dec 12, 2020

Thanks for pointing that out. I bet that's what happened. Will fix.

I think there's actually potentially another race condition, one which is present in your library as well. If there are two synchronous messages, the call to Atomics.notify() from the first call might be able to incorrectly cause the second call to terminate early. If that happens, receiveMessageOnPort could return undefined.

I'm going to try to describe it. Given this condensed form of the code:

```js
// Main thread
/* A */ const signalBefore = Atomics.load(sharedBufferView, 0);
/* B */ mainPort.postMessage(...);
/* C */ Atomics.wait(sharedBufferView, 0, signalBefore);
/* D */ receiveMessageOnPort(mainPort);

// Worker thread
/* X */ workerPort.postMessage(...)
/* Y */ Atomics.add(sharedBufferView, 0, 1);
/* Z */ Atomics.notify(sharedBufferView, 0);
```

Say the following sequence of events happens:

| Main thread | Worker thread | Shared value | Notes |
|---|---|---|---|
| A |   | 0 | The main thread starts the first request |
| B |   | 0 | The main thread sends the message |
|   | X | 0 | The worker thread sends the response |
|   | Y | 1 | The shared value is incremented |
| C |   | 1 | The wait fails because the shared value is different |
| D |   | 1 | The main thread receives the message |
|   |   | 1 |   |
| A |   | 1 | The main thread starts the second request |
| B |   | 1 | The main thread sends the message |
| C |   | 1 | The wait succeeds because the shared value hasn't changed |
|   | Z | 1 | The worker thread, still on the first request, notifies the main thread |
| D |   | 1 | The main thread incorrectly receives nothing |

A fix for this is to use a separate shared buffer for each request.

mhart commented Dec 12, 2020

Awesome! Funnily enough, the Atomics approach is exactly the same one I used for https://github.com/lambci/sync-threads

Source here:

https://github.com/lambci/sync-threads/blob/master/src/index.js

I think it's probably a little too simple for your use case here, though.

mhart commented Dec 12, 2020

The main difference I can see is that you're using postMessage to communicate data back and forth whereas I'm using the SharedArrayBuffer itself. Using the SharedArrayBuffer might be faster because it's one less copy, but it does require you to get the buffer size right.
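
A sketch of the zero-copy variant described here, with the worker's side simulated inline: the response bytes go directly into the `SharedArrayBuffer` rather than through `postMessage`, at the cost of choosing a buffer size up front (the layout and the 1024-byte size are assumptions for illustration):

```javascript
const sab = new SharedArrayBuffer(4 + 1024);  // [i32 length][payload bytes]
const header = new Int32Array(sab, 0, 1);
const payload = new Uint8Array(sab, 4);

// -- worker side (simulated): write the result bytes, publish the length --
const bytes = new TextEncoder().encode('transformed output');
payload.set(bytes);                     // would throw if result > 1024 bytes
Atomics.store(header, 0, bytes.length);
Atomics.notify(header, 0);

// -- main-thread side, after Atomics.wait(header, 0, 0) would return --
const len = Atomics.load(header, 0);
const text = new TextDecoder().decode(payload.slice(0, len)); // slice() copies
console.log(text); // 'transformed output'
```

The `slice()` copy is also what makes decoding work here, since `TextDecoder` cannot read directly from a shared buffer in all environments.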

cspotcode (Author) commented Dec 12, 2020 via email

evanw (Owner) commented Dec 14, 2020

FYI I am reverting the introduction of worker_threads. There is a case where the worker thread is disconnected from the main thread when it is created, which means it can't communicate with the main thread: #595. I'll have to investigate later whether it's even possible to use this approach in that case.

cspotcode (Author) commented

Ah bummer. I filed this bug report with node, which may be related to a root cause.

nodejs/node#36531

cspotcode (Author) commented

The same message that gives the worker its parentPort also hands over cwdCounter, and a missing cwdCounter causes the bug I reported. This suggests that it may be possible to a) catch that error to prevent the worker from terminating, and b) get a reference to parentPort later, when it has been initialized.

https://github.com/nodejs/node/blob/master/lib/internal/main/worker_thread.js#L98-L131

I'm not saying these hacks and workarounds should be released to production, but it's useful to understand the problem more clearly. There may be an appetite for this feature to be available under an experimental or beta flag.

eigilsagafos commented

I have been playing around with a babel-node-inspired approach that uses esbuild for running scripts in my codebase. In an effort to get it to work with source maps, yarn pnp, and support for `__filename` and `__dirname`, I ended up using `transformSync` on each file by patching `module._extensions`. (I'm very open to a better approach, by the way.) Anyway, in one script I am loading 700+ files. I have seen very different speeds on this in different versions of esbuild, so I was very happy to see this land in 0.8.22. I hope to see this functionality get back into esbuild. Maybe as an opt-in feature?

Here are some stats that might interest you @evanw and that has kept me on version 0.8.2 for a while:

Loading script with ~700 ts/js files with transformSync

```
esbuild <= 0.8.2:     0m 16s
esbuild 0.8.4-0.8.21: 2m 21s (!)
esbuild 0.8.22:       0m  5s
```


cspotcode (Author) commented Dec 16, 2020

I hope it's ok to keep peppering relevant information here. If it feels spammy, I can stop and consolidate it somewhere.

One tl;dr from nodejs/node#36531 is that worker_threads inherit NODE_OPTIONS and execArgv from the parent, so they will also execute any --require preload modules. This means if you're not careful, a --require script may recursively spawn worker threads.

I also had a thought: users may want to consume the esbuild API from within their own worker thread, so checking `isMainThread` isn't reliable. We can check for `workerData` though. We can pass `isEsBuildWorkerThread: true` and check for that. Or use a different entry-point .js file for the thread, `worker.js`.

nodejs/node#36531 means that workerData and a few other worker things are not available synchronously in workers spawned by preload scripts. Node needs to fix this, but it's also possible that we can reliably workaround it by asynchronously getting references to workerData and parentPort.

evanw (Owner) commented Dec 20, 2020

> Here are some stats that might interest you @evanw and that has kept me on version 0.8.2 for a while:

Thanks for the heads up. This is clearly a performance regression in 0.8.4. I'll get it fixed in the next release regardless of what ends up happening with worker_threads. Please let me know ASAP if you ever observe a regression like this in the future!

Edit: I just did a similar performance test with a large code base I have access to. Here are my performance results:

```
[npm]
0.8.2:  0m20.685s
0.8.4:  0m20.686s
0.8.22: 0m4.055s

[yarn]
0.8.2:  0m24.218s
0.8.4:  2m41.113s
0.8.4 with bug fix: 0m23.980s
0.8.22: 0m6.725s
```

Looks like using yarn will slow down your code, presumably due to the custom path resolution. Just a heads up in case you're trying to get your code to go as fast as possible.
