Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream,zlib: performance improvements #13322

Closed
wants to merge 2 commits into from

Conversation

mscdex
Copy link
Contributor

@mscdex mscdex commented May 31, 2017

This PR brings various performance improvements to stream.Transform and zlib. A good chunk of the performance increases come from the inlining of stream.Transform state initialization during stream instantiation. The rest of the increases come from changes to zlib itself.

Here is an overview of the notable zlib-specific changes:

  • Skipped the more complex machinery used for exported constructors (to allow new and without) when calling the zlib.create*() methods
  • Improved consistency in zlib parameter value validations
  • Used a typed array for more efficiently returning results from the C++ layer
  • Avoided direct calls to assert() by putting them behind an if containing their conditional
  • Avoided redundant handle assertions
  • Removed checks for impossible situations (e.g. false-y or non-Buffer/TypedArray chunk values)
    • objectMode/writableObjectMode and/or encoding are now explicitly overridden if set to non-defaults in options passed to constructors
  • Async-specific:
    • Stream-specific:
      • Moved write() callback out of _transform() for better reuse by other instances and removed the sync-specific code
      • Stored write() callback at the C++ layer for faster invocation (avoiding a dynamic lookup on the handle's JS object for every write())
      • Avoided re-assigning the same chunk value when multiple write()s are needed for the same chunk
    • Convenience functions:
      • Moved event handlers outside of zlibBuffer() helper function for better reusability
      • Switched to simpler 'data' event listening instead of stream.read()
      • Avoided Buffer.concat() for single Buffer case
  • Sync-specific:
    • Avoided Buffer.concat() for single Buffer case
    • Copied common and sync-specific parts from callback() previously shared by both sync and async functionality

Here are some results with the included benchmarks:

                                                            improvement confidence      p.value
streams/transform-creation.js n=1000000                       497.78 %        *** 9.135161e-41
zlib/creation.js n=1000000 options="false" type="Deflate"      18.57 %        *** 2.157176e-45
zlib/creation.js n=1000000 options="true" type="Deflate"       18.80 %        *** 4.619755e-44
zlib/deflate.js n=400000 inputLen=1024 method="createDeflate"   6.57 %        *** 3.938906e-17
zlib/deflate.js n=400000 inputLen=1024 method="deflateSync"    27.62 %        *** 1.717435e-37
zlib/deflate.js n=100000 inputLen=1024 method="deflate"        29.41 %        *** 4.915297e-27

CI: https://ci.nodejs.org/job/node-test-pull-request/8378/

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)
  • stream
  • zlib

@mscdex mscdex added performance Issues and PRs related to the performance of Node.js. stream Issues and PRs related to the stream subsystem. zlib Issues and PRs related to the zlib subsystem. labels May 31, 2017
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. stream Issues and PRs related to the stream subsystem. zlib Issues and PRs related to the zlib subsystem. labels May 31, 2017
@Trott
Copy link
Member

Trott commented May 31, 2017

Optional, but given the extent of the changes, it would be good to run a test coverage report and make sure this doesn't decrease coverage for lib/zlib.js etc. (and if it does, then hopefully add some tests to fully cover the new code).

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the streams parts.

@@ -208,18 +206,15 @@ function done(stream, er, data) {
if (er)
return stream.emit('error', er);

if (data !== null && data !== undefined)
if (data != null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind adding a comment that this matches both null and undefined, and it is done on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments added.

@mcollina
Copy link
Member

As any changes in streams, a run through CITGM would be nice.

lib/zlib.js Outdated

if (opts.encoding || opts.objectMode || opts.writableObjectMode) {
opts = _extend({}, opts);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not Object.assign ? still too slow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any numbers?

Copy link
Contributor Author

@mscdex mscdex Jun 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any handy, no. We could always flip it to be a whitelist in the future I suppose if we run into issues. Using a whitelist would allow us to just specify an object inline and avoid any expensive copying-related operations.

@addaleax
Copy link
Member

This needs a rebase.

@mscdex
Copy link
Contributor Author

mscdex commented May 31, 2017

Rebased.

this._transformState = {
afterTransform: (er, data) => {
return afterTransform(this, er, data);
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try to use bind here? It might be slightly faster as we don't keep the context alive.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed and checked with @mcollina, this can be really fast using bind, especially once this CL lands in V8. The CL should apply cleanly to 6.0, but also 5.9.

Here's a simple micro-benchmark:

function bar() {}

function A() { this.x = (x) => bar(x); }

function Ab() { this.x = bar.bind(undefined); }

function c() { return new A(); }

function cb() { return new Ab(); }

function l() { return {x: (x) => bar(x)}; }

function lb() { return {x: bar.bind(undefined)}; }

const N = 10000000;

function test(fn, n) {
	for (var i = 0; i < n; ++i) {
		fn.call(this);
	}
}

const FNS = [c, cb, l, lb];

for (const fn of FNS) {
	test(fn, 100);
}

for (const fn of FNS) {
	console.time(fn.name);
	test(fn, N);
	console.timeEnd(fn.name);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should see if you can bring those improvements also to _readableState and _writableState, as allocating those are a hot path whenever using node streams.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied the state object as-is without really paying attention. I will look into adding bind()...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now incorporated bind() and it doesn't seem to have negatively affected performance, so it's fine.

this._transformState = {
afterTransform: (er, data) => {
return afterTransform(this, er, data);
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed and checked with @mcollina, this can be really fast using bind, especially once this CL lands in V8. The CL should apply cleanly to 6.0, but also 5.9.

Here's a simple micro-benchmark:

function bar() {}

function A() { this.x = (x) => bar(x); }

function Ab() { this.x = bar.bind(undefined); }

function c() { return new A(); }

function cb() { return new Ab(); }

function l() { return {x: (x) => bar(x)}; }

function lb() { return {x: bar.bind(undefined)}; }

const N = 10000000;

function test(fn, n) {
	for (var i = 0; i < n; ++i) {
		fn.call(this);
	}
}

const FNS = [c, cb, l, lb];

for (const fn of FNS) {
	test(fn, 100);
}

for (const fn of FNS) {
	console.time(fn.name);
	test(fn, N);
	console.timeEnd(fn.name);
}

@mscdex mscdex force-pushed the zlib-perf-improvements branch 2 times, most recently from 7358c93 to 18ad2d7 Compare June 2, 2017 02:22
@jasnell
Copy link
Member

jasnell commented Jun 2, 2017

Let's let #13374 land first, then get this rebased and landed after that.

@mscdex
Copy link
Contributor Author

mscdex commented Jun 2, 2017

:-(

@jasnell
Copy link
Member

jasnell commented Jun 2, 2017

I understand the :-( but I'd like to get #13374 landed and pulled into a quick 8.x patch or minor release next week while letting this PR sit for a few weeks before pulling it in.

kisg pushed a commit to paul99/v8mips that referenced this pull request Jun 2, 2017
When the input to Function.prototype.bind is a known function, we can
inline the allocation of the JSBoundFunction into TurboFan, which
provides a 2x speed-up for several hot functions in Node streams (as
discovered by Matteo Collina). One of example of this can be found in
nodejs/node#13322, which can be optimized and
made more readable using bind instead of closures.

[email protected]

Review-Url: https://codereview.chromium.org/2916063002
Cr-Commit-Position: refs/heads/master@{#45679}
@mscdex
Copy link
Contributor Author

mscdex commented Jun 5, 2017

Rebased.

@jasnell
Copy link
Member

jasnell commented Jun 5, 2017

Still LGTM

@mcollina
Copy link
Member

mcollina commented Jun 6, 2017

CITGM with READABLE_STREAM=disable: https://ci.nodejs.org/view/Node.js-citgm/job/citgm-smoker/858/

@mcollina
Copy link
Member

mcollina commented Jun 6, 2017

@mscdex
Copy link
Contributor Author

mscdex commented Jun 9, 2017

Benchmark results with V8 5.9 in master:

                                                              improvement confidence      p.value
 streams/transform-creation.js n=1000000                        693.97 %        *** 2.539023e-41
 zlib/creation.js n=1000000 options="false" type="Deflate"       14.76 %        *** 1.049980e-22
 zlib/creation.js n=1000000 options="true" type="Deflate"        12.35 %        *** 2.782802e-20
 zlib/deflate.js n=400000 inputLen=1024 method="createDeflate"    3.35 %        *** 2.227584e-09
 zlib/deflate.js n=400000 inputLen=1024 method="deflateSync"     27.42 %        *** 3.447520e-43
 zlib/deflate.js n=100000 inputLen=1024 method="deflate"         28.95 %        *** 1.054469e-34

addaleax pushed a commit that referenced this pull request Jun 17, 2017
PR-URL: #13322
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
addaleax pushed a commit that referenced this pull request Jun 17, 2017
PR-URL: #13322
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
@addaleax addaleax mentioned this pull request Jun 17, 2017
addaleax pushed a commit that referenced this pull request Jun 21, 2017
PR-URL: #13322
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
addaleax pushed a commit that referenced this pull request Jun 21, 2017
PR-URL: #13322
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
addaleax pushed a commit that referenced this pull request Jun 24, 2017
PR-URL: #13322
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
addaleax pushed a commit that referenced this pull request Jun 24, 2017
PR-URL: #13322
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
rvagg pushed a commit that referenced this pull request Jun 29, 2017
PR-URL: #13322
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
rvagg pushed a commit that referenced this pull request Jun 29, 2017
PR-URL: #13322
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
addaleax pushed a commit that referenced this pull request Jul 11, 2017
PR-URL: #13322
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
@addaleax
Copy link
Member

I’ve removed this from the 8.2.0 proposal and labelled it dont-land so we can wait for #14161 to be resolved.

@MylesBorins
Copy link
Contributor

So this should have likely been reverted as soon as we noticed issues in zlib. At this point there is a whole bunch of code that has landed on top of this PR in zlib.js and it is extremely hard to revert without a TON of conflicts

/cc @nodejs/tsc

@Trott
Copy link
Member

Trott commented Oct 18, 2017

@MylesBorins I have add4b0a reverted and all tests running successfully. Most of the conflicts were due to the new internal error stuff. I'll open a PR. Any idea if e5dc934 needs to be reverted too (or perhaps even instead)?

@Trott
Copy link
Member

Trott commented Oct 18, 2017

PR to revert add4b0a: #16280

addaleax added a commit to addaleax/node that referenced this pull request Nov 15, 2017
add4b0a made the assumption that compressed data
would never lead to an empty decompressed stream.

Fix that by explicitly checking the number of read bytes.

Fixes: nodejs#17041
Refs: nodejs#13322
addaleax added a commit that referenced this pull request Nov 18, 2017
add4b0a made the assumption that compressed data
would never lead to an empty decompressed stream.

Fix that by explicitly checking the number of read bytes.

PR-URL: #17042
Fixes: #17041
Refs: #13322
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
MylesBorins pushed a commit that referenced this pull request Dec 12, 2017
add4b0a made the assumption that compressed data
would never lead to an empty decompressed stream.

Fix that by explicitly checking the number of read bytes.

PR-URL: #17042
Fixes: #17041
Refs: #13322
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Evan Lucas <[email protected]>
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. performance Issues and PRs related to the performance of Node.js. stream Issues and PRs related to the stream subsystem. zlib Issues and PRs related to the zlib subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants