Major performance improvement #3
Conversation
`Reader.prototype.addChunk` was calling `Buffer.concat` constantly, which increased garbage collection and just all-around killed performance. The exact implications of this are documented in brianc/node-postgres#1286, which includes a test case showing how performance is affected. Rather than constantly concatenating buffers up to the new size, this change uses a growth strategy that doubles the size of the buffer each time and tracks the functional length in a separate `chunkLength` variable. This significantly reduces the amount of allocation and provides a 25x performance improvement in my test cases; the more data a query returns, the greater the improvement. Since this uses a doubling buffer, it was important to avoid unbounded growth, so I also added a reclamation strategy that halves the size of the buffer whenever more than half of the data has been read.

I wasn't sure if it was okay to delete `Reader.prototype._save` outright, since it now just returns false in all cases. I didn't add any new unit tests because the existing ones should cover the refactored code, but let me know if you would like me to add something specific.

Also, thanks for `node-postgres`, it's awesome and hopefully this helps make it even more awesome. 😄
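For illustration, here is a minimal sketch of the growth strategy described above (not the exact patch; it assumes the `chunk`/`chunkLength`/`offset` fields used in this PR, with `chunkLength` initialized to 0 in the constructor):

```js
// Minimal sketch of the doubling growth strategy (illustrative, not the
// exact patch). `chunk` is the backing buffer, `chunkLength` is the number
// of bytes actually buffered, and `offset` is the read position.
Reader.prototype.addChunk = function (chunk) {
  if (this.chunk === null) {
    // first chunk: just keep a handle to it, no copying needed
    this.chunk = chunk
    this.chunkLength = chunk.length
    return
  }
  var newLength = this.chunkLength + chunk.length
  if (newLength > this.chunk.length) {
    // double the backing buffer until the new data fits, so the number
    // of allocations is logarithmic in the total bytes received
    var newBufferLength = this.chunk.length * 2
    while (newLength > newBufferLength) {
      newBufferLength *= 2
    }
    var newBuffer = new Buffer(newBufferLength) // Buffer.alloc on newer node
    this.chunk.copy(newBuffer, 0, 0, this.chunkLength)
    this.chunk = newBuffer
  }
  chunk.copy(this.chunk, this.chunkLength)
  this.chunkLength = newLength
}
```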
This is great! Thank you! The perf improvements are really exciting! 💃 I have been doing this for outgoing packets for as long as node-postgres has existed: https://github.com/brianc/node-buffer-writer/blob/master/index.js#L12. I think I've just overlooked the incoming packets for one reason or another.
The only thing I'm not really feeling is resizing the buffer back down by doing copies from larger to smaller buffers. I think there's a way to do it more efficiently, while still reclaiming memory at a reasonable pace. Postgres packets are length-prefixed, so frequently `this.offset === this.chunk.length` will be true, meaning the entire response has been consumed. I would check for this condition at the top of `addChunk`, and if the entire existing buffered input has been consumed, just set `this.chunk = chunk` and return. That way the buffer grows to the entire length of a huge incoming packet with your efficient doubling algorithm, but is only shrunk once the whole thing is consumed. I think in practice this will be simpler and hopefully even more performant.
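A rough sketch of that suggestion (illustrative only; with the `chunkLength` bookkeeping from this PR, the consumed check would compare against `chunkLength` rather than `chunk.length`):

```js
// Illustrative sketch of the suggestion above: if everything buffered so
// far has already been consumed, adopt the incoming chunk directly instead
// of copying it in and shrinking the backing buffer later.
Reader.prototype.addChunk = function (chunk) {
  if (this.chunk === null || this.offset === this.chunkLength) {
    this.chunk = chunk
    this.chunkLength = chunk.length
    this.offset = 0
    return
  }
  // ...otherwise fall through to the doubling growth path
}
```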
index.js (Outdated)

```diff
@@ -8,32 +8,48 @@ var Reader = module.exports = function(options) {
   options = options || {}
   this.offset = 0
   this.lastChunk = false
-  this.chunk = null
+  this.chunk = Buffer.alloc(4);
```
currently node-postgres supports node @v0.10.x onward - does `Buffer.alloc` exist in older versions of node?
note to self: I need to set up travis on this repo.
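For reference, `Buffer.alloc` was only added in Node v5.10 (backported to v4.5), so it does not exist on v0.10.x. A hypothetical fallback, if pre-allocation were kept, might look like the following (this shim is not part of the PR; the PR ultimately avoids pre-allocation entirely, as noted below):

```js
// Hypothetical fallback for node versions without Buffer.alloc.
// On old node, new Buffer(size) returns uninitialized memory, so zero it
// explicitly in case the contents are read before being written.
function allocBuffer (size) {
  if (Buffer.alloc) {
    return Buffer.alloc(size) // zero-filled on modern node
  }
  var buf = new Buffer(size)
  buf.fill(0)
  return buf
}
```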
I realized I can just go back to assigning this to null and just keeping a handle to the first chunk added.
index.js (Outdated)

```js
    this.chunk.copy(newBuffer, 0, this.offset);
    this.chunk = newBuffer;
    this.chunkLength -= this.offset;
    this.offset = 0;
  }
}

Reader.prototype._save = function() {
```
yeah if this method is only returning `false` and doing nothing else we might as well inline it (though v8 probably is anyway)
Thanks for the review, @brianc, working on the requested changes now.
Thanks, I was actually hoping you might have some insight into this area (I know nothing about the postgres wire protocol); I just didn't want to open a PR with what I thought might end up as unbounded memory growth. Your suggestion is a lot cleaner and more predictable.
1. Fix Node @v0.10.x support by removing `Buffer.alloc` usage.
2. Remove unneeded `Reader.prototype._save`.
3. Reset the buffer at the end of the packet instead of shrinking dynamically.
4. Remove some semicolons to keep style consistent.
@brianc, I believe I've addressed all of your review comments. I also tested on v0.10.48 just to confirm I didn't break anything there. Please let me know if there are any other changes you'd like. Thanks again for the review.
This PR is amazing & I really appreciate the way you approached everything! 👏 I'm super excited to test this out & get it merged. I've got a ton on my plate right now, but I'll do everything I can to get this tested and merged in by EOD tomorrow. If for some reason I get sandbagged so badly at work I can't get to this, know it'll be in by the end of the weekend! If I had travis set up on this repo it'd be a lot easier actually, but this thing has changed so little in the past 5 years, and when I initially set it up I didn't have the rigor I do now... so I'll try & get that set up tomorrow too!
Thank you for the kind words. By the weekend would be awesome, but I'm just glad to know it's on your radar. If you uncover anything during your testing, just let me know and I can fix it (or if it's easier for you, feel free to just tweak things yourself, I don't mind at all). Thanks again!