
Poor performance with nodejs watchman client #113

Closed
amasad opened this issue Jun 21, 2015 · 24 comments

amasad commented Jun 21, 2015

The first thing is that the site says I can pass in a string, but watchman will complain that it's not an array:

```
watchman -j <<-EOT
> ["query", "/Users/amasad/fbobjc-hg", { "path": "hi" }]
> EOT
{
    "version": "3.2.0",
    "error": "failed to parse query: 'path' must be an array"
}
```

Second, when I pass in an array of strings:

  1. It's ignored.
  2. It takes very long to finish the operation.

I'm working around this by having an anyof expression with dirnames.
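For illustration, a query of the sort this workaround describes might look like the following sketch (the directory names are hypothetical, not taken from the issue):

```json
["query", "/Users/amasad/fbobjc-hg", {
  "expression": ["anyof",
    ["dirname", "Libraries"],
    ["dirname", "SomeOtherDir"]],
  "fields": ["name"]
}]
```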


wez commented Jun 21, 2015

path generators have been very rarely used so far, so there are probably dragons :-/

What's your high level goal? That will help to figure out what to focus on in fixing this up.


amasad commented Jun 21, 2015

When my program starts I crawl the parts of repo that I'm interested in to
build an in-memory representation of the fs (and use watchman to keep it
up-to-date).

I had assumed that using watchman's query to generate a list of files based
on a pattern, suffix, and dirs would be faster than going to disk manually.
However, so far I've found that recursively reading the directory and
filtering in node is faster by 500ms-1s.

I've found that query performance is highly correlated with how much
filtering is done by watchman. For example, adding 'suffix' and dirname
matchers cut the query time from 10s to 5s.

I'm a bit sad that this didn't turn out to be a win.



wez commented Jun 21, 2015

Interesting comment regarding filtering; makes it sound like you might be running up against json encoding/decoding overheads.

Can you try a couple of CLI benchmarks on your tree with your query?

```
time watchman -j < query.json --output-encoding=bser --server-encoding=bser --no-pretty > /dev/null
time watchman -j < query.json --output-encoding=json --server-encoding=json --no-pretty > /dev/null
```

That will contrast BSER vs. JSON raw perf for the query (it should get passed through by the CLI when run in those modes).

We don't have a bser decoder for node :-/

What data do you use from watchman? You may want to limit the list of fields to just the name to reduce the size of the json that you need to read and parse.

If all of the files of interest have suffixes, you may have better results with the suffix generator combined with exists and dirname filters. The suffix index is a linked list of all files with the same suffix so it should be cheaper to walk than crawling the path tree.
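A hedged sketch of what such a query might look like (the root path, suffixes, and dirname here are illustrative, not from the issue):

```json
["query", "/path/to/root", {
  "suffix": ["js", "json"],
  "expression": ["allof",
    ["exists"],
    ["dirname", "src"]],
  "fields": ["name"]
}]
```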


amasad commented Jun 21, 2015

They're both very fast from the cli. Ran a few times and bser is slightly faster.

```
time watchman -j < query.json --output-encoding=json --server-encoding=json --no-pretty > /dev/null

real    0m0.107s
user    0m0.003s
sys     0m0.005s

time watchman -j < query.json --output-encoding=bser --server-encoding=bser --no-pretty > /dev/null

real    0m0.093s
user    0m0.003s
sys     0m0.005s
```

I timed the command until the first chunk of the response arrives on the socket, and it's on the order of 100ms (in line with the benchmark above). However, it takes 2-3s to finish streaming. Could this be node's net module overhead?

I might try it as an exec and see if it's cheaper.

What data do you use from watchman? You may want to limit the list of fields to just the name to reduce the size of the json that you need to read and parse.

Just using the name field.

If all of the files of interest have suffixes, you may have better results with the suffix generator combined with exists and dirname filters. The suffix index is a linked list of all files with the same suffix so it should be cheaper to walk than crawling the path tree.

I'm doing that; it's taking around 3-5s to finish and call my callback. (Although I'm not using exists. I should.)


amasad commented Jun 21, 2015

Just tried child_process.exec from the repl and it's somewhere between 100ms and 1s to execute the query, get the result, and JSON.parse it. It's now clear that the overhead is coming from the socket connection, either on node's side or watchman's.


wez commented Jun 21, 2015

The only thing different on the server side when you use the CLI is that we default to BSER. BSER is cheaper to encode/decode than JSON, but I'm not sure that it would make that much difference if you factor in that the CLI then needs to re-encode the BSER as JSON for you to consume from node.

It sounds like this is mostly net module overhead; perhaps we can do something to tune the performance of that?


wez commented Jun 21, 2015

We're using https://www.npmjs.com/package/json-stream to handle the decoding, it's more likely that the overheads are in there than just in the net module


wez commented Jun 21, 2015

mmalecki/json-stream#6 may be a good candidate

sunshowers commented:

This should go into http://accidentallyquadratic.tumblr.com/


amasad commented Jun 21, 2015

We're using https://www.npmjs.com/package/json-stream to handle the decoding, it's more likely that the overheads are in there than just in the net module

I added an on('data') listener on the socket and timed that. The last chunk before json-stream kicks in clocks in at ~2.5s. So that makes net the more probable culprit (unless json-stream is blocking the event loop by doing the parsing).

Anyways, I might need to do some profiling.

This should go into http://accidentallyquadratic.tumblr.com/

/facepalm


wez commented Jun 21, 2015

I have a patch for json-stream that might address this, but I'm dashing out the door.
I've messaged @amasad with the paste URL for it, will work on submitting it upstream later on if it proves to work


amasad commented Jun 21, 2015

The patch made it 6x slower. I'm struggling a bit to understand the reasoning behind the patch.


amasad commented Jun 21, 2015

But you're right, json-stream is the bottleneck:

```
   1196   52.8%    LazyCompile: JSONStream._transform /fb-watchman/node_modules/json-stream/lib/json-stream.js:15
```

Switching out that module for this one: https://www.npmjs.com/package/jsonstream cuts the time taken by more than a third. It now takes only ~1s.


amasad commented Jun 21, 2015

If the JSON stream from watchman is newline delimited can't we just build up a buffer and simply use JSON.parse when we see a newline?
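A minimal sketch of that idea (not the actual fb-watchman implementation): accumulate chunks in a string buffer and JSON.parse each complete newline-terminated line as it arrives.

```javascript
// Returns a feed(chunk) function: buffer incoming string chunks and
// invoke onValue with the parsed object for every complete line.
function makeLineParser(onValue) {
  let buf = '';
  return function feed(chunk) {
    buf += chunk;
    let idx;
    while ((idx = buf.indexOf('\n')) !== -1) {
      const line = buf.slice(0, idx);
      buf = buf.slice(idx + 1);
      if (line.trim() !== '') {
        onValue(JSON.parse(line));
      }
    }
  };
}
```

In a client this would presumably be wired up as `socket.setEncoding('utf8'); socket.on('data', feed)`.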


wez commented Jun 22, 2015

Heh, perf impact. So, does using the other jsonstream package bring us close to the perf you saw with an exec, or do we have more room for improvements?
The current json-stream package has some logic for speculatively parsing at the end of a packet (not based on newlines) which we don't need and will drive up the parse cost.


amasad commented Jun 22, 2015

Close, but exec + JSON.parse is a bit faster. I'll get some concrete numbers.


wez commented Jun 22, 2015

https://reviews.facebook.net/D40479 takes a stab at implementing BSER


amasad commented Jun 22, 2015

Wow! It's now down to 100-200ms. Great work!

wez changed the title from "Weird behavior with the path generator in the query command" to "Poor performance with nodejs watchman client" on Jun 22, 2015

wez commented Jun 22, 2015

With the perf addressed, I think the remaining issue is that we need to fix up the docs for the path generator to say that it only takes an array? Anything else?


wez commented Jun 22, 2015

Also: down to ~100ms range from 10s? Or was that 10s including some other processing? :-)


amasad commented Jun 22, 2015

With the perf addressed, I think the remaining issue is that we need to fix up the docs for the path generator to say that it only takes an array? Anything else?

Nope, that's it :D

Also: down to ~100ms range from 10s? Or was that 10s including some other processing? :-)

It was 10s when I didn't include the suffix (or dirs), it was 4-5s with the suffix and dirs.

wez added a commit that referenced this issue Jun 23, 2015
Summary:
this is a javascript implementation of BSER encoding/decoding.

This helps us work around the perf issues in
Refs: #113

I've performed only functional testing thus far, so I don't have any
real data on perf impact.  I'm hoping to avoid writing C++ to interface
with the Node/V8 internals.

For decoding, the strategy is to read the PDU length up-front,
accumulate the data into the buffer, and when we have the complete data,
recursively process and decode it.  This is much simpler than doing
something like a SAX streaming parser.  We can incrementally accumulate
the data so that we can plug into the node event loop.

The BunserBuf class will emit an event for each decoded value in the
stream.   A convenience method is provided that wraps around BunserBuf
and decodes an input string or Buffer and returns the result (rather
than emitting an event).
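
The accumulate-then-decode strategy can be sketched like this, using a simplified fixed 4-byte big-endian length header for illustration (real BSER PDUs encode the header differently, and `makePduAccumulator` is a hypothetical name, not the BunserBuf API):

```javascript
// Buffer incoming chunks until a complete length-prefixed PDU is
// available, then decode the whole PDU at once and emit the value.
function makePduAccumulator(decodePdu, onValue) {
  let buf = Buffer.alloc(0);
  return function append(chunk) {
    buf = Buffer.concat([buf, chunk]);
    // We need the 4-byte header before we know the PDU size.
    while (buf.length >= 4) {
      const pduLen = buf.readUInt32BE(0);
      if (buf.length < 4 + pduLen) break; // wait for more data
      const pdu = buf.subarray(4, 4 + pduLen);
      buf = buf.subarray(4 + pduLen);
      onValue(decodePdu(pdu));
    }
  };
}
```

This shape plugs naturally into the node event loop: each socket 'data' event just appends, and decoding only happens once a full PDU has arrived.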

For encoding, to make things simpler, and because the input data tends
to be small and relatively simple, we only encode integers as int32
values.  This gives us a predictable size in the header region and
allows us to go back and fill in the size.

Since JSON numbers are all real values, all number values passed to the
BSER encoder will occupy 8 bytes.

Removed unused deps from the package, and bumped up the version number.

Test Plan:
`npm test`.

also:

```
cd watchman/node
npm link
cd ..
NODE_PATH=/usr/local/lib/node_modules node node/example.js
```

Reviewers: amasad

Reviewed By: amasad

Subscribers: sid0

Differential Revision: https://reviews.facebook.net/D40479

wez commented Jun 24, 2015

@amasad think this was the cause of the "ERROR Watcher took too long to load" problems in React Native?


amasad commented Jun 25, 2015

I doubt it. That error is based on the ready event, which fires on completion of the following commands:

  • version
  • watch-project
  • clock
  • subscribe

None of which should produce large output. Right?

wez added a commit that referenced this issue Jun 30, 2015

wez commented Jun 30, 2015

I've updated the docs for the path generator to address the array vs. path portion of this, so I'm closing this out.

wez closed this as completed Jun 30, 2015
facebook-github-bot pushed a commit that referenced this issue Feb 20, 2022
Summary:
Pull Request resolved: facebook/sapling#113

GitHub Actions was failing at the apt-get stage; the error message suggested adding an apt-get update as a remedy. Added in this diff, and it indeed works.

The Actions build fails later on a missing mio::net module when building mysql async, but that's unrelated to this change.

Reviewed By: farnz

Differential Revision: D34368662

fbshipit-source-id: f0a00da3ee740ae4443a328616e792ea615c922c