-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allocate buffers more efficiently #2
base: master
Are you sure you want to change the base?
Conversation
* Allocate buffer with desired capacity instead of appending one byte at a time, which can result in repeated re-allocations. * Don't initialize buffer to zeros when we're going to overwrite the contents anyway. Experimentally, these changes make a very substantial performance improvement in Debug builds, though little noticeable difference in Release builds. e.g. in my application that uses this crate for logging, these changes bring total run time down from 8.64s to 2.45s in Debug builds.
Friendly ping - any chance of this getting merged? |
Despite that we ultimately want to drop the NULL byte, we need to include space for it in the buffer provided to libc's vsnprintf; otherwise it may replace the last byte of the string with a NULL byte. This also simplifies things by providing a zero-length buffer for the first call to vsnprintf, and then allocating exactly the required space. This gives only one code path to maintain (as opposed to 2-3 before depending if the initial buffer was less than, equal to, or greater than the initial size). While this means paying to format the string twice, it lets us drop `shrink_to_fit` of the final string, which can be a relatively expensive call to the allocator. Fixes shadow/shadow#1432
vsnprintf *should* return the exact number of characters that will be printed with the given arguments. In Shadow, I've seen at least one instance of it wanting more space on the second call, though. This could be a bug in Shadow (maybe another thread mutating one of the arguments in between calls, but is difficult to debug. Probably better to be robust here and just reallocate if needed than to crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The strategy may be reasonable, however the implementation as written is not sound.
I'm not the maintainer, but I would also recommend separating out the cargo fmt
changes into a separate commit or PR to make review easier.
if character_count >= buffer.len() { | ||
let new_len = character_count + 1; | ||
buffer.reserve_exact(new_len - buffer.len()); | ||
// SAFETY: Any bit pattern is a valid u8, and we reserved the space. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not sound, despite what the comment claims. It is not sufficient for any bit pattern to be a valid u8, as uninitialized memory is not even guaranteed to be a fixed bit pattern. This is UB, which can be verified by trying this example with Miri: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=4fad0134c08d9f8a6272ce3c77d2c3f9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thanks for pointing this out. I learned this is incorrect later.
Happy to drop this part if I can get in the fix to avoid repeated allocations.
a time, which can result in repeated re-allocations.
contents anyway.
Experimentally, these changes make a very substantial performance
improvement in Debug builds, though little noticeable difference in
Release builds. e.g. in an experiment with my application that uses
this crate for logging, these changes bring total run time down from
8.64s to 2.45s in Debug builds.