Creating a slice of uninit memory is unsound (equivalent to mem::uninitialized) #220
Comments
What is the status of this? Are these concerns purely hypothetical, or is there potentially unsoundness in the library? |
This is potential unsoundness; the current Rust memory model (and the current LLVM memory model) treats uninits specially. |
You can find more details about the way uninitialized memory is treated by C and Rust here: https://www.ralfj.de/blog/2019/07/14/uninit.html
The nightly-only BorrowedCursor would make fixing this easier, but alas, it's nightly-only. The reasonable fix I can see is calling |
For the rust back-end we shouldn't really need to be using unsafe stuff unless there is some very good reason for it. Not sure if say avoiding zero-initializing a vector is worth it. miniz_oxide already avoids using unsafe without any drastic performance implications, so not sure why flate2 would need to either. |
FWIW I've attempted to address this by switching to naive safe code (using Going via
|
If you are creating a vector from scratch, using |
Ack. Here (e.g. in Still, maybe using
So, maybe I should change the PR to use a little bit of
fn write_to_spare_capacity_of_vec<T>(
    output: &mut Vec<u8>,
    writer: impl FnOnce(&mut [u8]) -> (usize, T),
) -> T {
    let zeroed_spare_space: &mut [u8] = {
        let uninited_spare_space = output.spare_capacity_mut();
        let ptr = uninited_spare_space.as_mut_ptr() as *mut u8;
        let len = uninited_spare_space.len();
        unsafe {
            // Safety of `write_bytes`:
            // * `ptr` is non-null and points to `len` bytes within a single allocated object
            //   (guaranteed by `spare_capacity_mut` returning `&mut [MaybeUninit<u8>]`).
            // * Alignment of `u8` is 1 (so, not a concern)
            core::ptr::write_bytes(ptr, 0, len);
            // Safety of `from_raw_parts_mut`:
            // * `ptr` points to `len` consecutive properly initialized values of type `u8`
            //   (guaranteed by our call to `ptr::write_bytes`).
            // * The memory referenced by the returned slice must not be accessed through any other
            //   pointer (not derived from the return value) for the duration of lifetime 'a
            //   (guaranteed by exclusivity of `&mut [MaybeUninit<u8>]` returned by
            //   `spare_capacity_mut`).
            core::slice::from_raw_parts_mut(ptr, len)
        }
    };
    let (bytes_written, ret) = writer(zeroed_spare_space);
    // Safety: Usage of `core::cmp::min` sanitizes `bytes_written` (making `set_len` safe even if
    // `writer` misbehaves and returns an arbitrary `bytes_written`); `saturating_add` keeps the
    // sum from overflowing. Note that above we have already made sure that all
    // spare-capacity-bytes are initialized via `ptr::write_bytes`.
    let new_len = core::cmp::min(output.len().saturating_add(bytes_written), output.capacity());
    unsafe { output.set_len(new_len); }
    ret
}
WDYT? |
I don't see why any of those solutions would be better than We could avoid re-initializing the same part over and over by keeping track of what part of the Vec has already been initialized in previous iterations, but that relies on no other code running
|
The approach of writing to the spare capacity without zeroing it first sounds good, provided that the C library wrappers are adjusted to write to a |
Yes, that would work for the C backend. What about the Rust backend though? The Rust backend would also need to be adjusted to write to a |
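The idea of writing into spare capacity without zeroing it first can be sketched as follows. This is only an illustration under stated assumptions, not flate2's or any backend's actual API: `copy_into_uninit` and `append_via_spare_capacity` are hypothetical names, and the "backend" here merely copies bytes. The point is that the writer receives `&mut [MaybeUninit<u8>]`, so no `&mut [u8]` over uninitialized memory is ever created.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a backend: copies as much of `src` as fits
/// into `dst` and returns how many bytes it initialized.
fn copy_into_uninit(src: &[u8], dst: &mut [MaybeUninit<u8>]) -> usize {
    let n = src.len().min(dst.len());
    for (slot, byte) in dst[..n].iter_mut().zip(src) {
        // `MaybeUninit::write` initializes the slot without reading it.
        slot.write(*byte);
    }
    n
}

/// Hypothetical wrapper: appends to `output` using only its spare capacity.
fn append_via_spare_capacity(output: &mut Vec<u8>, src: &[u8]) -> usize {
    let n = copy_into_uninit(src, output.spare_capacity_mut());
    // SAFETY: `copy_into_uninit` initialized exactly the first `n` spare
    // bytes, and `n` never exceeds the spare capacity.
    unsafe { output.set_len(output.len() + n) };
    n
}
```

The remaining `unsafe` is confined to a single `set_len` call whose precondition is established by the writer's return value, so the writer has to be trusted (or its result clamped) for this to stay sound.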
I think it's best to just zero the buffer when using |
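For comparison, zeroing the buffer can be done with entirely safe code. The sketch below is a hypothetical helper (the name `write_to_zeroed_spare_capacity` and the closure-based signature are assumptions, mirroring the shape of the `unsafe` proposal earlier in the thread), not code from flate2. It grows the vector to its capacity with zeros, lets the writer fill the new region, and then truncates to what was actually written.

```rust
/// Hypothetical helper: zero-fills the spare capacity using safe code,
/// hands it to `writer`, and keeps only the bytes `writer` reports as written.
fn write_to_zeroed_spare_capacity<T>(
    output: &mut Vec<u8>,
    writer: impl FnOnce(&mut [u8]) -> (usize, T),
) -> T {
    let old_len = output.len();
    let cap = output.capacity();
    // The new length equals the current capacity, so this does not reallocate;
    // it only writes zeros into the spare bytes.
    output.resize(cap, 0);
    let (bytes_written, ret) = writer(&mut output[old_len..]);
    // Clamp in case `writer` reports more bytes than it was given.
    let new_len = old_len + bytes_written.min(cap - old_len);
    output.truncate(new_len);
    ret
}
```

As noted in the thread, a caller that invokes such a helper in a loop re-zeroes the same spare bytes on every call, which is the performance cost being weighed against the removal option.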
Using
I guess option1 sounds better / less disruptive? Can you please provide feedback on this? And then let me also mention that we would use I note that this means that Rust may repeatedly initialize the same bytes over and over again as you point out in #220 (comment). Is this okay? |
compress_vec and decompress_vec functions are really just convenience wrappers for using Whether a new |
Let me try to summarize the options we have identified so far (focusing on
The performance impact of Fix1 and breaking impact of Fix3 depend on how
|
If calls to |
Ack. Both Fix1 and Fix3 are easy ways to avoid UB. I can put up a PR for either one. OTOH, I think that I have a slight preference toward Fix3 (removal), because I don't see the benefit of keeping these functions. It seems that making these functions safe (Fix1) will make them somewhat performance-hostile (i.e. they can lead to repeatedly reinitializing a slice of memory, which means that using |
/cc @Byron Can you please provide your feedback on this issue? Which of the fix options discussed above would you prefer? (e.g. out of the ones in #220 (comment)) It may be worth continuing the community discussion on this issue, but I also wonder who the decision maker here is (i.e. who can review, approve, and merge PRs)? I see that you've merged most of the recent PRs, so I hope you won't mind being CC-ed / summoned into this issue :-). |
Thanks for pinging me. I was too busy over the past few days, but I am glad to join now, since all the information has been gathered so neatly. It took an hour to catch up on all that was said, with a considerable portion spent on reading Ralf Jung's excellent articles (again, probably, I keep forgetting). With that said, here is my amateur perspective on uninitialized memory:
So for me it's also established that the current behaviour of I think it is worth spelling out that correctness always trumps performance, so performance considerations shouldn't hold back a simple fix; optimizations, possibly like those discussed, can be made later if there is demand. As a general stance, we will avoid breaking changes as these will ripple through the ecosystem with a lot of churn, so any fix here must be non-breaking. From the above it seems clear that #373 is a suitable and welcome fix. Regarding performance
It's notable that
Conclusion
Yes, let's merge the fix and not worry about performance, as those with performance concerns will have suitable alternatives. Correctness is more important. Maybe there are more spots in the code that have soundness issues like this? I'd love to see them rooted out. Thanks everyone for your help and engagement, it's much appreciated. I am grateful too as I feel like I became a little more aware and could adjust my mental model towards uninitialized memory. I think |
Any plans for releasing a new version of the crate? (If so, it would be greatly appreciated, as it would help me get approval to import the crate into my project without having to carry a patch on top of the last released version.) |
This is incorrect, and a common misapprehension. Having uninitialized memory is instantly UB even if you never read from it. I.e. The exception to this is Similarly,
it's UB automatically, regardless of what it allows |
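A small sketch of handling not-yet-initialized bytes soundly, using `MaybeUninit` from the standard library. The function name `filled_buffer` and the buffer size are hypothetical, chosen only for illustration; nothing here is taken from flate2. The buffer is never typed as `[u8; 16]` until every byte has been written.

```rust
use std::mem::MaybeUninit;

/// Hypothetical example: builds a 16-byte array without ever holding an
/// uninitialized `[u8; 16]` value.
fn filled_buffer() -> [u8; 16] {
    // The storage is uninitialized, but its type says so explicitly.
    let mut buf: MaybeUninit<[u8; 16]> = MaybeUninit::uninit();
    unsafe {
        // SAFETY: the pointer is valid for one `[u8; 16]`, and `write_bytes`
        // with a count of 1 sets all 16 bytes.
        buf.as_mut_ptr().write_bytes(0xAB, 1);
        // SAFETY: every byte was initialized by the call above.
        buf.assume_init()
    }
}
```

Calling `assume_init` before the `write_bytes` call would be undefined behavior even if the resulting array were never read afterwards.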
Uninitialised memory is even meaner than I thought 😅. I will prepare a new release now. |
flate2 creates slices of uninitialized memory in several places. The only two places where it happens when using the Rust backend are here and here.
This is equivalent to the use of the now-deprecated mem::uninitialized. Instead, either a slice of MaybeUninit<T> should be constructed and passed to the backends, or the backends should receive a structure that does not expose uninitialized memory by design such as Vec or a Vec-like fixed-capacity view of memory.
If the backend does not overwrite the entire slice, this can become an exploitable security vulnerability.