
fix: major performance issues with bytea performance #2240 #2241

Merged · 11 commits merged into brianc:master on Jul 7, 2020

Conversation

@regevbr (Contributor) commented Jun 18, 2020

fix: #2240

A simple query that retrieves a 100 MB bytea column from a one-table, one-row DB takes forever and causes the CPU to spike at 100%, rendering my Node instance unresponsive.

The issue is exactly like #1286, and the fix is roughly the same as brianc/node-packet-reader#3.

The code at:

// On every incoming chunk, allocate a brand-new buffer large enough for everything received so far
combinedBuffer = Buffer.allocUnsafe(this.remainingBuffer.byteLength + buffer.byteLength)
// Copy the previously accumulated bytes into it
this.remainingBuffer.copy(combinedBuffer)
// Then copy the new chunk in after them
buffer.copy(combinedBuffer, this.remainingBuffer.byteLength)

suffers from too many allocations and buffer copies.

The fix is a bit messy but it gets the job done.

Before the fix, fetching a 100 MB bytea column took 3 minutes and 10 seconds and consumed on average 95% CPU for the entire duration.

After the fix, the same query took only 40 seconds and consumed on average 3% CPU.
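
For illustration only (hypothetical names, not the PR's actual code): the general direction of the fix is to keep remainingBuffer over-allocated and track cursors into it, so that most chunks are appended in place instead of re-allocating and re-copying everything accumulated so far on every 'data' event. With typical socket chunk sizes (tens of KB), a 100 MB value otherwise means hundreds to thousands of re-allocations, the later ones each copying close to the full 100 MB.

class BufferAccumulator {
  private remainingBuffer: Buffer = Buffer.allocUnsafe(0)
  private remainingOffset = 0
  private remainingLength = 0

  public append(chunk: Buffer): void {
    const needed = this.remainingLength + chunk.byteLength
    const capacity = this.remainingBuffer.byteLength - this.remainingOffset
    if (needed > capacity) {
      // Grow with headroom so the next several chunks fit without another allocation
      const grown = Buffer.allocUnsafe(needed * 2)
      this.remainingBuffer.copy(grown, 0, this.remainingOffset, this.remainingOffset + this.remainingLength)
      this.remainingBuffer = grown
      this.remainingOffset = 0
    }
    // Append the new chunk after the bytes already accumulated
    chunk.copy(this.remainingBuffer, this.remainingOffset + this.remainingLength)
    this.remainingLength += chunk.byteLength
  }
}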

@brianc (Owner) commented Jun 18, 2020

ah wow thanks - i'll check this out now & get it pushed out

@regevbr (Contributor, Author) commented Jun 18, 2020

@brianc no prob, a review now would be welcome, but some tests are failing so it is not ready to be merged just yet. I will ping you when it is ready.

@brianc (Owner) commented Jun 18, 2020

at first glance this may have regressed performance in other places - travis is failing on 'big-simple-query-tests'

@brianc (Owner) commented Jun 18, 2020

ah gotcha - i'll let you pound on it a bit more then & check back later!

@regevbr (Contributor, Author) commented Jun 18, 2020

@brianc that sounds great!

@regevbr (Contributor, Author) commented Jun 18, 2020

@brianc I fixed the bug and the CI is passing now :-)
Can you please review?

@regevbr (Contributor, Author) commented Jun 19, 2020

I made some more improvements and refactoring and added a few comments. It is now really ready for review

// Adjust the cursors of remainingBuffer
this.remainingBufferOffset = combinedBufferOffset
} else {
// To avoid side effects, copy the remaining part of the new buffer to remainingBuffer with extra space for next buffer
@regevbr (Contributor, Author) commented on this diff:
@brianc this part is the only thing I think I can still improve. The question is whether I can use the input buffer for the next run without anyone modifying it out of scope. What do you think?

@brianc (Owner) replied:

What do you mean by anyone modifying it out of scope? The only place in the whole codebase where we deal w/ raw input buffers from the socket is here in parser.ts, and it's never exposed to the user, so any modifications you do here are safe in that no one else should (or can, AFAIK) ever modify input buffers from the socket.

@regevbr (Contributor, Author) replied:

It will be better here to use the existing input buffer for the next run instead of copying it over. What you are saying is that it is safe for me to do (as far as you know), which is great! I will change it then.
I was worried that whoever passes that buffer could decide to reuse it themselves, and then it would mess with my code...

@brianc (Owner) replied:

the buffer is passed in here as the whole net.Socket

The only place I could imagine this might cause issues is w/ some of the tests which may incorrectly be reusing buffers though nothing comes to mind. Some of the tests are old and kinda wonky so you could just try & run the tests...as long as they pass we're definitely good. The raw buffers of the socket are never exposed anywhere outside of the parser.
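
For context, the path by which raw chunks reach the parser is essentially the socket's 'data' event; a schematic sketch (the parse callback name is assumed, not the library's actual wiring):

import { Socket } from 'net'

// Each 'data' event hands the listener a Buffer; per the discussion above,
// nothing else in the codebase touches these buffers afterwards, so the
// parser can safely retain them across calls.
function attachParser(socket: Socket, parse: (chunk: Buffer) => void): void {
  socket.on('data', (chunk: Buffer) => parse(chunk))
}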

@regevbr (Contributor, Author) replied:

Great, will do.

@regevbr (Contributor, Author) commented Jun 23, 2020

I used a nice tool called clinic.js to visualize the before and after of this PR.
Before: [clinic doctor report: before-doctor]
After: [clinic doctor report: after-doctor]

Before: [flamegraph: before-flames]
After: [flamegraph: after-flames]

@regevbr (Contributor, Author) commented Jun 24, 2020

@brianc did you have a chance to review my PR?

@regevbr (Contributor, Author) commented Jul 2, 2020

@brianc @charmander can you please CR?

@brianc (Owner) commented Jul 2, 2020

I'm sorry for the delay. I've been having some anxiety attacks related to covid (I live in Austin, the #1 worst city in the US for new cases) & I've been really slacking on this work trying to keep my mental health intact as best I can. I'll take a look at this now...I really appreciate the work.

@regevbr (Contributor, Author) commented Jul 2, 2020

Oh bummer, I'm sorry you have to go through that! I hope you overcome it soon :-)
I don't want to put any more pressure on you, so don't feel obliged to work on this if you don't feel like it.
I trust that when everything is back to normal for you, you will find the time to look it over.

@brianc (Owner) commented Jul 2, 2020

I looked over this - it looks pretty good! I ran my crude benchmarks too & perf doesn't seem to be impacted. Is there any way to write any kind of test for this? It's a great change, I just worry w/o tests it might regress in the future (as this currently already was kind of a regression when I moved away from buffer-reader since this lib didn't contain any tests that stress dealing w/ big bytea returns)

combinedBufferLength,
reuseRemainingBuffer,
combinedBufferFullLength,
})
@brianc (Owner) commented on this diff:

I have a question for you... a long time ago it was more efficient to make "backing classes" for things you return frequently, as v8 can optimize them much more aggressively. You are using a type CombinedBuffer and constructing it w/ the object shorthand here & returning it w/ the object shorthand on line 172. Do you think it'd make a difference to actually make this a class?
something like this...

class CombinedBuffer {
  constructor(public combinedBuffer: Buffer, public combinedBufferOffset: number, public combinedBufferLength: number, public reuseRemainingBuffer: boolean, public combinedBufferFullLength: number) {}
}

and then return it with this.consumeBuffer(new CombinedBuffer(combinedBuffer, offset, combinedBufferLength, reuseRemainingBuffer, combinedBufferFullLength))

?

Staying on top of how v8 optimizes things is a moving target, so I'm not sure if you have any experience around the perf diff of using dynamically created objects versus proper "classes." I wouldn't normally sweat it, but this is a white-hot code path & it might be a small perf win here.

@brianc (Owner) added:

You might also be able to allocate a single instance of CombinedBuffer on the parser class itself & just set the properties on it versus allocating new ones every time. Again, a micro-optimization but those things can add up when the code is called so frequently.

I can play around w/ this too & push to your branch (I think github lets me do that?) if you're cool w/ it & think it's an okay idea.
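
A minimal sketch of that single-reused-instance idea, with assumed names and shape (this is not the PR's actual code):

class CombinedBuffer {
  public combinedBuffer: Buffer = Buffer.alloc(0)
  public combinedBufferOffset = 0
  public combinedBufferLength = 0
  public reuseRemainingBuffer = false
  public combinedBufferFullLength = 0
}

class Parser {
  // One instance allocated up front and reused on every call, so the hot path
  // creates no new result objects for the garbage collector to deal with.
  private readonly combined = new CombinedBuffer()

  private mergeBuffer(buffer: Buffer): CombinedBuffer {
    this.combined.combinedBuffer = buffer
    this.combined.combinedBufferOffset = 0
    this.combined.combinedBufferLength = buffer.byteLength
    this.combined.reuseRemainingBuffer = false
    this.combined.combinedBufferFullLength = buffer.byteLength
    return this.combined // callers must not retain this object across calls
  }
}

The trade-off is that the returned object is only valid until the next call, which is fine here because the parser consumes it immediately.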

@regevbr (Contributor, Author) replied:

That is an interesting solution; why don't you let me research it (as I'm not too familiar with the v8 bits) and try to enhance it.

@regevbr (Contributor, Author) commented Jul 2, 2020

> I looked over this - it looks pretty good! I ran my crude benchmarks too & perf doesn't seem to be impacted. Is there any way to write any kind of test for this? It's a great change, I just worry w/o tests it might regress in the future (as this currently already was kind of a regression when I moved away from buffer-reader since this lib didn't contain any tests that stress dealing w/ big bytea returns)

I'm glad to hear that, thanks!
Do you have a test scenario in place? I believe it will be a bit complicated, as you will need to stress test it and make sure that the CPU doesn't get hogged.
I will try to add a test and see if I manage to get it to fail with the current code.

@brianc (Owner) commented Jul 2, 2020

> I will try to add a test and see if I manage to get it to fail with the current code

Yeah if you can't then I understand. It's incredibly difficult to test how much CPU some code is using within a unit test. It'd be ideal to have some way to catch a performance regression. You could add another benchmark to this crude benchmark file. I run that before any merge & manually check for perf regressions.
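
For reference, a standalone benchmark along those lines might look like this (a sketch; the repo's own bench file has its own structure, and connection settings are assumed to come from the usual PG* environment variables):

import { Client } from 'pg'

async function benchBytea(sizeMb: number): Promise<void> {
  const client = new Client() // connection details taken from PG* env vars
  await client.connect()
  const payload = Buffer.alloc(sizeMb * 1024 * 1024, 0xab)
  const start = Date.now()
  // Round-trip a large bytea value through the server and back through the parser
  const res = await client.query('SELECT $1::bytea AS data', [payload])
  console.log(`${sizeMb}MB bytea round-trip: ${Date.now() - start}ms, ${res.rows[0].data.byteLength} bytes received`)
  await client.end()
}

benchBytea(100).catch((err) => {
  console.error(err)
  process.exit(1)
})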

@regevbr (Contributor, Author) commented Jul 2, 2020

> > I will try to add a test and see if I manage to get it to fail with the current code
>
> Yeah if you can't then I understand. It's incredibly difficult to test how much CPU some code is using within a unit test. It'd be ideal to have some way to catch a performance regression. You could add another benchmark to this crude benchmark file. I run that before any merge & manually check for perf regressions.

Ok will check it out, thanks!

@regevbr (Contributor, Author) commented Jul 3, 2020

@brianc I started reading about the performance difference between object literals and classes (or, more specifically, constructor functions), and indeed back in 2014 there was a difference, but it doesn't exist anymore...
For example, you can look at this post, but if you run the jsperf test used there today, there is no difference.
But all of that doesn't matter for us: since I retained the input buffer for the next run, the code got much simpler and there is no need to pass objects around anymore.
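
Roughly, the retained-input-buffer approach looks like this (a simplified sketch with assumed names, not the parser's actual code):

class ChunkMerger {
  private remainingBuffer: Buffer | null = null

  public mergeBuffer(buffer: Buffer): Buffer {
    if (this.remainingBuffer === null) {
      // Fast path: nothing left over from the previous chunk, parse it directly
      return buffer
    }
    // Slow path: a message straddles the chunk boundary, so one copy is unavoidable
    const combined = Buffer.allocUnsafe(this.remainingBuffer.byteLength + buffer.byteLength)
    this.remainingBuffer.copy(combined)
    buffer.copy(combined, this.remainingBuffer.byteLength)
    this.remainingBuffer = null
    return combined
  }

  public retain(buffer: Buffer, offset: number): void {
    // Keep a view over the unparsed tail for the next run; slice() does not copy,
    // which is safe because nothing else ever modifies the socket's input buffers
    this.remainingBuffer = buffer.slice(offset)
  }
}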

I also added a bench test. Running it against the current code takes 144,954 ms(!), and after the fix it runs in only 1,107 ms, which is an insane improvement.

Can you please CR again and run the bench tests locally to verify the fix?

I appreciate your time looking into it and hope things are better back home!

@brianc (Owner) commented Jul 6, 2020

Fantastic! Thanks for your work on this! ❤️ I'll merge this tomorrow in the AM & do a patch release.

brianc merged commit 9ba49b7 into brianc:master on Jul 7, 2020
regevbr deleted the #2240 branch on July 7, 2020, 13:35
@regevbr (Contributor, Author) commented Jul 7, 2020

Thanks @brianc for merging!
Please let me know when there is an official release with the fix :-)

@brianc (Owner) commented Jul 7, 2020

@regevbr Thanks for your persistence and great work on this. I apologize for the delays on my end. I like to release in the AM (CST) so I can be around all day in case some weird incompatibility goes out. I've released your changes with [email protected]. Going to put them into my prod app today!

@regevbr (Contributor, Author) commented Jul 7, 2020

@brianc sure, not a problem! Always happy to help.
Your delays are more than justifiable, and in any case it is not my place to judge, as this is your (awesome) library :-)
I will update my repos as well and let you know if everything works.
Thanks again!

Successfully merging this pull request may close: Major performance issues with bytea performance (#2240)