Wanted: Reimplement WHATWG URL Parser in WASM #38708
Comments
Can we use rust? |
As a vendored in dependency I personally wouldn't care. But... need to consider the impact on the prerequisites that are necessary to build node.js. If vendoring the dependency requires someone to set up a whole rust development environment to be able to build, then that may be problematic. If, alternatively, what we vendor in is just the wasm files and some glue code such that we only need the wasm compiler, then it really doesn't matter what the source language is. |
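To illustrate the second option above, here is a minimal sketch of consuming a prebuilt wasm binary with a little glue code, so that no Rust or C++ toolchain is needed on the consuming side. The byte array is a tiny hand-assembled module that exports a hypothetical `add` function; it only stands in for a vendored parser binary and is not taken from Node.js or any project mentioned in this thread.

```typescript
// Stand-in for a vendored .wasm file: a hand-assembled module exporting
// add(a: i32, b: i32) -> i32. The export name `add` is purely illustrative;
// a real vendored parser would ship a much larger binary.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic number + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" (func 0)
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // code body
]);

// Glue code: compile and instantiate the binary, then expose a typed wrapper.
// Only the JS engine's built-in wasm compiler is involved at this point.
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});
const add = wasmInstance.exports.add as (a: number, b: number) => number;

console.log(add(2, 3));
```

The source language of the binary is invisible here: the glue only sees exported functions and memory.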
How much C/C++ experience would be needed? I have submitted small patches to Gecko before |
The Servo WHATWG URL parser in Rust has CI set up for WASM. Compiling that and publishing the resulting WASM to eg. npm could maybe even be something they themselves would be interested in? https://github.com/servo/rust-url/blob/d673c4d5e22b3a8ac91b7f52faa45dc32a275f75/.github/workflows/main.yml |
Since this practice already exists in undici, can we first try this out as an npm module? |
I'll be happy with whatever works and keeps us spec compliant. It would be good to ensure that we can still quickly respond to changes in the base spec, so we'll need to make sure that whatever the implementation is it stays maintained. |
@jasnell Has anyone taken the issue? If not then I would like to work on this. |
I tried to find benchmark results for the mentioned undici migration to wasm and found only the following message: nodejs/undici#575. That's probably not the right place to look, since I don't see a significant performance improvement in it. Could someone tell me where to find more information on existing experiments? |
So, simplified, in essence what it would take is to swap the Lines 79 to 102 in 910efc2
Similar to how the import of Considering that, I guess it can be easier to compile the existing C++ code to WebAssembly than to set up some wasm-pack / wasm-bindgen flow that compiles JS bindings for servo/rust-url using something like I did however open servo/rust-url#712 to see if they would be interested in publishing such bindings themselves. |
Hope my question is not too stupid. But may I ask how WASM is debugged? JS and C++ are quite easy to debug but I never debugged WASM till now. |
In this context, you can use chrome devtools. it supports dwarf sections and whatnot so it should "just work" |
@voxpelli ... yes, that's essentially it. The one complicating bit there are the |
@jasnell - This sounds interesting, just wondering if there is any starting point for this issue? |
Maybe https://www.assemblyscript.org/ could be one way to achieve this? |
Hi! After talking with @jasnell, I wanted to give a heads up that I'm currently working on this issue. I've started writing a WHATWG URL parser in Rust (as a graduation project for my master's) with the goal of replacing the existing implementation if the performance and the implementation justify it. It's too early to estimate the performance impact, but if you want to help or just take a look at my progress, feel free to visit github.com/anonrig/url. I'm in the process of URL encoding/decoding the inputs and have a working copy of URLSearchParams written in Rust. My current solution is written in Rust and later compiled into Wasm using wasm-bindgen and serde-wasm-bindgen. This allows us to generate a Node module with TypeScript definitions included, without writing any JavaScript, and simplifies the build process. If you're new to Rust but want to contribute, you can reference the GitHub workflow I've created, which builds, tests and later deploys the Rust package to npm under url-wasm (which will be available asap). My primary goal is to have a spec-compliant URL and URLSearchParams implementation, and later to focus on the performance impact and benchmarking. (I'm also an active member of the Node.js Slack channel under my name and surname, if anybody wants to say hello and talk about URL.) |
@anonrig What's your roadmap? What are next steps in terms of development? |
One downside that hasn't been mentioned yet is lack of code sharing. The C++ URL parser gets shared between processes (think shared library) but with WASM, if you have a cluster of let's say 32 workers, you have 32 WASM blobs taking up memory. I don't think V8 supports precompiling WASM, like it does for regular JS (the startup snapshot), but even if it did, bigger snapshots means slower startup times. Snapshots must be deserialized before they can be used. The overhead might be negligible for something like a URL parser, though. |
Thanks for your patience @nebarf. Upon request I've created an issue on URL-WASM repository and added the missing features. If you have any questions or want to contribute, let's discuss it on the repository issues. anonrig/url#1 |
Upon my initial benchmarking I've realized that WebAssembly is primarily designed for long-running, CPU-intensive tasks that share minimal data between JS and WASM. Because a URL parser passes strings back and forth, and WebAssembly requires serializing them through TextEncoder and TextDecoder, there's a 20% reduction in performance. My benchmark primarily compared the native URL parser with the Rust one I've been working on. I'd be happy to add the benchmarks to the repository when I have more information about the performance impact of |
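The serialization cost mentioned above can be made concrete with a small sketch. It assumes a bare `WebAssembly.Memory` as a stand-in for a module's linear memory, and the helper names `writeString` and `readString` are hypothetical; this is not the glue that wasm-bindgen actually generates, only the same kind of encode, copy and decode work it has to perform for every call.

```typescript
// Stand-in for a wasm module's linear memory (one 64 KiB page).
const memory = new WebAssembly.Memory({ initial: 1 });
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// JS string -> UTF-8 bytes -> copied into linear memory.
// Returns the number of bytes written at `offset`.
function writeString(input: string, offset: number): number {
  const bytes = encoder.encode(input); // allocation + transcoding
  new Uint8Array(memory.buffer).set(bytes, offset); // copy into wasm memory
  return bytes.length;
}

// UTF-8 bytes in linear memory -> new JS string.
function readString(offset: number, byteCount: number): string {
  const view = new Uint8Array(memory.buffer, offset, byteCount);
  return decoder.decode(view); // transcoding + allocation again
}

const href = "https://example.com/path?query=1#hash";
const bytesWritten = writeString(href, 0);
// A real parser would run inside wasm here and write its result back.
const roundTripped = readString(0, bytesWritten);
console.log(roundTripped === href);
```

Every parse pays for both directions, while the parsing itself is short, which is why the boundary work can dominate for a task like this.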
This could be an interesting opportunity to experiment with v8's work on wasm interface types & fast calls, which should help with the performance. |
Upon implementing a PoC using Rust and WebAssembly, and talking back and forth with @jasnell, I found out that WebAssembly is not suitable for this particular task. I'll start implementing the URL state machine using JavaScript under https://github.com/anonrig/url-js. Here's my experience and learnings using WASM and Rust: https://www.yagiz.co/implementing-node-js-url-parser-in-webassembly-with-rust/ |
How would it be different from https://github.com/jsdom/whatwg-url? |
Since URLSearchParams is written in JavaScript, there's no need for me to focus on reimplementing it in JavaScript. The only bottleneck and the point of this task is to create a URL state machine in JS (which is also implemented in whatwg-url). Personally speaking, I did not like the implementation in |
I've extracted the C++ logic for URL. Maybe I can try to compile it to wasm. |
Status update: I've written and open-sourced Currently, url-state-machine is faster than other JavaScript-based implementations, but still not as fast as C++. I've reached a point where I've squeezed as much performance out of JavaScript as I can with the help of @jasnell, but the outcome is not as good as it should be (especially for long strings). I'd like to ask the community for some feedback, especially on the |
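For anyone who wants to reproduce this kind of comparison, a rough timing sketch follows. It measures the built-in `URL` constructor on one short and one long input; the helper name `timeParse`, the inputs and the iteration counts are all assumptions chosen for illustration, and a different parser can be substituted inside the loop. It is not the benchmark used for the numbers in this thread.

```typescript
// Parse `input` `iterations` times and return the elapsed milliseconds.
function timeParse(input: string, iterations: number): number {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    new URL(input); // swap in the parser under test here
  }
  return performance.now() - start;
}

// Illustrative inputs: one typical short URL, one with a very long path and query.
const shortInput = "https://example.com/";
const longInput =
  "https://example.com/" + "a/".repeat(5000) + "?q=" + "x".repeat(5000);

const shortMs = timeParse(shortInput, 10000);
const longMs = timeParse(longInput, 1000);
console.log(`short: ${shortMs.toFixed(2)} ms, long: ${longMs.toFixed(2)} ms`);
```

A single run like this is noisy; repeating it and discarding warm-up iterations gives more stable numbers.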
interface types should hopefully help with this case, allowing the wasm to tell v8 it wants to deal with strings, instead of you having to manually encode and copy them into the wasm's memory. |
I've written a C++ version of the WHATWG URL parser that passes all the WPT tests as of 3 May 2022 (Living Standard, last updated 3 May 2022), and it's already used in our own Web-interoperable Runtime. I've made an initial commit of the C++ code at https://github.com/xadillax/libwhatwgurl. I think we can continue working on this. |
@XadillaX did you compile it to wasm? How performant was it? |
Not yet. I'm currently using it via v8's binding. If necessary, I will try to. |
Is it worth moving the current implementation into a separate project outside the core repo if it's also written in C++ and not compiled to wasm? I'm not sure what the advantage would be. A disadvantage I can think of is that it would probably delay the process of landing changes made to the parser in the core repo. |
Despite this (possible) disadvantage, I think this can:
|
Is maintaining the parser in the core repo problematic?
Why would anyone use this outside of Node.js? Are we going to treat security vulnerabilities from the parser the same way as the ones coming from projects like OpenSSL?
That can be done in the core repo too, right? |
This issue is about investigating whether a WASM version of the code could become faster than the current C++ version, so for the sake of this issue: it's needed 🙂
I think it would have to be a separate issue from this one anyhow and one that would make little sense to go ahead with until this one has reached a conclusion. |
Given the excellent investigation that @anonrig did on this, it's clear that a WASM implementation is not going to bring any performance improvements whatsoever. We can close this issue and pursue other approaches. |
The current WHATWG URL parser implementation is written in C/C++ and incurs a fairly significant cost crossing the JS/C++ boundary. It should be possible to realize a significant performance improvement by porting the implementation to WASM (similar to how the https://github.com/nodejs/undici project has seen a massive performance boost from moving the llhttp parser to wasm).
If someone wanted to pick up this effort, I am available to help mentor through it.
It will require C/C++ experience. What I imagine would work best is setting it up as a separate project that we would vendor in as a dependency. Done right, it would have no breaking API changes.