-
Notifications
You must be signed in to change notification settings - Fork 29.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffer: Reallocate() instead of creating a new backing store #53552
Conversation
a647a40
to
3c2c597
Compare
this seems to improve performance substantially when decoding large (order of MBs or more) inputs, but not that much in smaller inputs. but even then, it simplifies the code so I think it's valuable |
I thought that's what we had, and then @targos deliberately reverted it. Note that we should be concerned by the fact that if we are even just 1 byte off, we create a new buffer and copy. Copies are cheap and that's never going a massive issue, but we are potentially wasting time for no good reason. I think that @joyeecheung raised this issue in a comment to #52292 |
ah, dang, I did not realize Reallocate() was deprecated... sad. that'll teach me to always look at the history before trying stuff :) |
there is something I don't understand, though. #52234 claims there is no longer a performance benefit since we are no longer overriding the allocator's reallocation... but the performance increase I observed (in main) was too large to have been a fluke 🤔 |
Just a guess: the performance difference may be caused by the fact that when allocating a new backing store, V8 always zero-initializes the buffer even though its content (or most of it) will be overwritten by the copy later; whereas the realloc implementation in V8 only allocates an uninitialized buffer first, copy the data and if the new length is bigger, zero-initialize the uncopied tail. |
Also in a micro-benchmark scenario: repeatedly allocating a new backing store can urge the GC a lot more eagerly than simply reallocating it (the latter isn’t incorporated into the GC schedule as much AFAICT), though technically that would be a pitfall of the microbenchmark itself. |
yeah, I thought about that memset but it doesn't seem to be supported by profiling...
so you're saying that even if the reallocation is implemented as allocate + copy, it is still treated differently in terms of GC? |
Yes because the GC scheduling code is in V8. A Realloc call from Node.js’s side doesn’t affect GC much AFAICT, whereas NewBackingStore does. My comment in the other PR was talking about certain use cases where we don’t need the buffer to be allocated through the array buffer allocator and they can just be allocated normally. IIUC we sometimes prefer to allocate native memory using the array buffer allocator so that when the amount of external memory used is high enough V8 would perform GC to hopefully reduce the memory footprint (and if those external memory are set up to be released when some JS values get GC’ed, it would do the job). However this setup is not always doing more good than harm. Performance wise it may not be bad to just directly manage the external using stdlib calls and use Isolate::AdjustAmountOfExternalAllocatedMemory() to tune GC instead of going through the array buffer allocator machinery. However I don’t think this applies to Buffers because they are Uint8Arrays now and are always backed by real array buffers anyway. |
in
Buffer::New(..., node::encoding)
, when the prediction returned byStringBytes::Size()
doesn't match the actual size emitted byStringBytes::Write()
, we currently create a second backing store and thenmemcpy()
to it.AFAIU, we can instead reallocate the backing store into the new size, which should contract the allocation if possible.
this is especially relevant for base64 (see discussion at #53550), which since #52428 overestimates fairly often.