Poor performance with nodejs watchman client #113
path generators have been very rarely used so far, so there are probably dragons :-/ What's your high level goal? That will help to figure out what to focus on in fixing this up. |
When my program starts I crawl the parts of the repo that I'm interested in to [...] I had assumed that using watchman's query to generate a list of files based [...] I've found that query's performance is highly correlated with how much [...] I'm a bit sad that this didn't turn out to be a win. |
Interesting comment regarding filtering; makes it sound like you might be running up against json encoding/decoding overheads. Can you try a couple of cli benchmarks on your tree with your query?
That will contrast BSER vs. json raw perf for the query (it should get passed thru by the CLI when run in those modes). We don't have a bser decoder for node :-/ What data do you use from watchman? You may want to limit the list of fields to just the name to reduce the size of the json that you need to read and parse. If all of the files of interest have suffixes, you may have better results with the suffix generator combined with |
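The suggestion above (limit `fields` to just the name and use the suffix generator) could look roughly like the sketch below. `buildNameOnlyQuery` is a hypothetical helper, and the root path, the suffix list, and the `['type', 'f']` expression are placeholder assumptions rather than anything taken from this thread.

```javascript
// Hypothetical helper: builds a watchman "query" command that uses the
// suffix generator and asks only for the "name" field, which keeps the
// response that the node client has to read and parse small.
function buildNameOnlyQuery(root, suffixes) {
  return ['query', root, {
    suffix: suffixes,           // generator: only consider these suffixes
    expression: ['type', 'f'],  // assumption: only regular files are wanted
    fields: ['name'],           // return the name and nothing else
  }];
}

// Placeholder root and suffix, for illustration only.
const cmd = buildNameOnlyQuery('/path/to/repo', ['js']);
console.log(JSON.stringify(cmd));
```

The resulting array is the kind of value that would be handed to the node client's `command` method.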
They're both very fast from the cli. Ran a few times and bser is slightly faster.
I've timed the command until the first chunk of the response arrives on the socket, and it's on the order of 100ms (in line with the benchmark above). However, it takes 2-3s to finish streaming. Could this be the node [...] I might try it as an [...]
Just using the name field.
I'm doing that, it's taking around 3-5s to finish and call my callback. (Although, I'm not using exists. I should) |
Just tried |
The only thing different on the server side when you use the CLI is that we default to BSER. BSER is cheaper to encode/decode than JSON, but I'm not sure that it would make that much difference if you factor in that the CLI then needs to re-encode the BSER as JSON for you to consume from node. It sounds like this is mostly |
We're using https://www.npmjs.com/package/json-stream to handle the decoding, it's more likely that the overheads are in there than just in the |
mmalecki/json-stream#6 may be a good candidate |
This should go into http://accidentallyquadratic.tumblr.com/ |
I added an Anyways, I might need to do some profiling.
/facepalm |
I have a patch for json-stream that might address this, but I'm dashing out the door. |
The patch made it 6x slower. I'm struggling a bit to understand the reasoning behind the patch. |
But you're right, jsonstream is the bottleneck:
Switching out that module with this one: https://www.npmjs.com/package/jsonstream |
If the JSON stream from watchman is newline delimited, can't we just build up a buffer and simply use |
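A minimal sketch of that idea, assuming each response really is a single JSON document terminated by a newline. `LineJsonDecoder` is a hypothetical name and is not part of the watchman client. It keeps the partial line as a list of pieces so the pending data is not re-scanned on every chunk.

```javascript
const { EventEmitter } = require('events');
const { StringDecoder } = require('string_decoder');

// Hypothetical decoder for newline-delimited JSON: accumulate pieces until a
// newline shows up, then JSON.parse the completed line in one go.
class LineJsonDecoder extends EventEmitter {
  constructor() {
    super();
    this.pieces = [];                          // parts of the current line
    this.decoder = new StringDecoder('utf8');  // handles split UTF-8 bytes
  }

  write(chunk) {
    let text = typeof chunk === 'string' ? chunk : this.decoder.write(chunk);
    let nl;
    while ((nl = text.indexOf('\n')) !== -1) {
      this.pieces.push(text.slice(0, nl));
      const line = this.pieces.join('');
      this.pieces = [];
      text = text.slice(nl + 1);
      if (line.length > 0) {
        this.emit('data', JSON.parse(line));
      }
    }
    if (text.length > 0) {
      this.pieces.push(text);  // no newline yet: keep for the next chunk
    }
  }
}

// Usage: the response arrives in two chunks, split mid-document.
const decoder = new LineJsonDecoder();
decoder.on('data', (value) => console.log(value));
decoder.write('{"files":["a.js"');
decoder.write(',"b.js"]}\n');
```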
Heh, perf impact. So, does using the other jsonstream package bring us close to the perf you saw with an exec, or do we have more room for improvements? |
Close, but exec + JSON.parse is a bit faster. I'll get some concrete numbers. |
https://reviews.facebook.net/D40479 takes a stab at implementing BSER |
wow! it's now down to 100-200ms. Great work! |
query
command
With the perf addressed, I think the remaining issue is that we need to fix up the docs for the path generator to say that it only takes an array? Anything else? |
Also: down to ~100ms range from 10s? Or was that 10s including some other processing? :-) |
Nope, that's it :D
It was 10s when I didn't include the suffix (or dirs), it was 4-5s with the suffix and dirs. |
Summary: this is a javascript implementation of BSER encoding/decoding. This helps us workaround the perf issues in

Refs: #113

I've performed only functional testing thus far, so I don't have any real data on perf impact. I'm hoping to avoid writing C++ to interface with the Node/V8 internals.

For decoding, the strategy is to read the PDU length up-front, accumulate the data into the buffer, and when we have the complete data, recursively process and decode it. This is much simpler than doing something like a SAX streaming parser. We can incrementally accumulate the data so that we can plug into the node event loop. The BunserBuf class will emit an event for each decoded value in the stream. A convenience method is provided that wraps around BunserBuf and decodes an input string or Buffer and returns the result (rather than emitting an event).

For encoding, to make things simpler, and because the input data tends to be small and relatively simple, we only encode integers as int32 values. This gives us a predictable size in the header region and allows us to go back and fill in the size. Since json numbers are all real values, all number values passed to the BSER encoder will occupy 8 bytes.

Removed unused deps from the package, and bumped up the version number.

Test Plan: `npm test`. also:

```
cd watchman/node
npm link
cd ..
NODE_PATH=/usr/local/lib/node_modules node node/example.js
```

Reviewers: amasad
Reviewed By: amasad
Subscribers: sid0
Differential Revision: https://reviews.facebook.net/D40479
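The decoding strategy described in that summary (read the length up-front, accumulate, decode once the data is complete) can be illustrated with the sketch below. This is not the BunserBuf code and not the BSER wire format: it assumes a simplified frame made of a 4-byte little-endian length followed by a JSON payload, and `FrameAccumulator`, `frame`, and `HEADER_BYTES` are hypothetical names.

```javascript
const { EventEmitter } = require('events');

const HEADER_BYTES = 4; // assumption: fixed-size length prefix (not real BSER)

// Accumulates chunks until a whole frame is buffered, then decodes it and
// emits one 'value' event per frame.
class FrameAccumulator extends EventEmitter {
  constructor() {
    super();
    this.chunks = [];
    this.buffered = 0;   // total bytes currently held
    this.needed = null;  // header + payload size, once the header is read
  }

  append(chunk) {
    this.chunks.push(chunk);
    this.buffered += chunk.length;
    for (;;) {
      if (this.needed === null) {
        if (this.buffered < HEADER_BYTES) return;
        this.needed = HEADER_BYTES + this.flatten().readUInt32LE(0);
      }
      if (this.buffered < this.needed) return; // wait for more data
      const buf = this.flatten();
      const payload = buf.subarray(HEADER_BYTES, this.needed);
      const rest = buf.subarray(this.needed);
      this.chunks = rest.length > 0 ? [rest] : [];
      this.buffered = rest.length;
      this.needed = null;
      // Stand-in for the recursive BSER decode of the complete payload.
      this.emit('value', JSON.parse(payload.toString('utf8')));
    }
  }

  // Merge the held chunks only when the bytes are actually needed.
  flatten() {
    if (this.chunks.length !== 1) {
      this.chunks = [Buffer.concat(this.chunks, this.buffered)];
    }
    return this.chunks[0];
  }
}

// Helper for the demo: wrap a value in the simplified frame format.
function frame(value) {
  const body = Buffer.from(JSON.stringify(value), 'utf8');
  const header = Buffer.alloc(HEADER_BYTES);
  header.writeUInt32LE(body.length, 0);
  return Buffer.concat([header, body]);
}

const acc = new FrameAccumulator();
acc.on('value', (v) => console.log(v));
const bytes = frame({ files: ['a.js', 'b.js'] });
acc.append(bytes.subarray(0, 3)); // partial header: nothing emitted yet
acc.append(bytes.subarray(3));    // rest of the frame: one value emitted
```

Because the length is known before the payload arrives, the accumulator only merges buffers twice per frame (once for the header, once when the frame is complete) instead of re-parsing on every chunk.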
@amasad think this was the cause of the |
I doubt it. That error is based on the ready event which is the completion of the following commands:
None of which should produce large output. Right? |
I've updated the docs for the path generator to address the array vs. path portion of this, so I'm closing this out. |
Summary: Pull Request resolved: facebook/sapling#113 GitHub Actions was failing at apt-get stage, error message suggested adding an apt-get update as remedy. Added in this diff and it indeed works. The actions build fails later on missing mio::net module when building mysql async, but that's unrelated to this change. Reviewed By: farnz Differential Revision: D34368662 fbshipit-source-id: f0a00da3ee740ae4443a328616e792ea615c922c
The first thing is that it says on the site that I can pass in a string, but watchman will complain that it's not an array:
Second, when I pass in an array of strings:
I'm working around this by having an `anyof` expression with `dirname`s.
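A rough sketch of what such a workaround expression might look like; `dirnameAnyOf` is a hypothetical helper and the directory names are placeholders, not the ones used in the original report.

```javascript
// Hypothetical helper: match files under any of the given directories by
// combining one "dirname" term per directory inside an "anyof" expression.
function dirnameAnyOf(dirs) {
  return ['anyof', ...dirs.map((dir) => ['dirname', dir])];
}

// Placeholder directories, for illustration only.
console.log(JSON.stringify(dirnameAnyOf(['src', 'lib'])));
```

The returned array would go in the `expression` slot of a query in place of the path generator.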