
Remove a branch from try_alloc_layout? #234

Open
overlookmotel opened this issue Feb 16, 2024 · 3 comments

@overlookmotel
Contributor

After reading fitzgen's (very interesting) blog post about the rationale for bumping downwards, I had one thought:

try_alloc_layout_fast has 2 branches:

bumpalo/src/lib.rs

Lines 1414 to 1442 in bb660a3

fn try_alloc_layout_fast(&self, layout: Layout) -> Option<NonNull<u8>> {
    // We don't need to check for ZSTs here since they will automatically
    // be handled properly: the pointer will be bumped by zero bytes,
    // modulo alignment. This keeps the fast path optimized for non-ZSTs,
    // which are much more common.
    unsafe {
        let footer = self.current_chunk_footer.get();
        let footer = footer.as_ref();
        let ptr = footer.ptr.get().as_ptr();
        let start = footer.data.as_ptr();
        debug_assert!(start <= ptr);
        debug_assert!(ptr as *const u8 <= footer as *const _ as *const u8);

        if (ptr as usize) < layout.size() {
            return None;
        }

        let ptr = ptr.wrapping_sub(layout.size());
        let aligned_ptr = round_mut_ptr_down_to(ptr, layout.align());

        if aligned_ptr >= start {
            let aligned_ptr = NonNull::new_unchecked(aligned_ptr);
            footer.ptr.set(aligned_ptr);
            Some(aligned_ptr)
        } else {
            None
        }
    }
}

The first branch, (ptr as usize) < layout.size(), is there purely to ensure that ptr.wrapping_sub(layout.size()) cannot wrap around. Without it, a wrapped pointer could spuriously satisfy the 2nd branch condition aligned_ptr >= start.

Bumpalo already has a method Bump::set_allocation_limit to limit the size of the Bump. I imagine that most users could impose a limit on the size of their Bumps. Allocating massive slabs of memory would be an uncommon use case for a bump allocator, as such allocations would probably also be long-lived.

My thinking is this:

Take the example where the size limit is 4 GiB minus 1 byte (i.e. size <= u32::MAX):

If the total size of the bump is constrained to 4 GiB, then no single allocation can be larger than 4 GiB. So layout.size() of a successful allocation is always a valid u32.

Constrain T in fn alloc<T>(&self, val: T) to only allow types where mem::size_of::<T>() <= u32::MAX.
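For example, something like this (just a sketch; alloc_small is a made-up name, not bumpalo's API, and it assumes inline const blocks from Rust 1.79+):

use core::alloc::Layout;
use core::mem;

// Hypothetical specialized entry point, not part of bumpalo.
fn alloc_small<T>(val: T) {
    // Evaluated once per monomorphization: compilation fails for any `T`
    // whose size exceeds u32::MAX.
    const { assert!(mem::size_of::<T>() <= u32::MAX as usize) };

    let layout = Layout::new::<T>();
    // ... hand `layout` to a specialized fast path (sketched further down) ...
    let _ = (val, layout);
}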

When Bump allocates a chunk from the global allocator, request a chunk of 4 GiB size. If my understanding is correct, this will only consume 4 GiB of virtual memory, not physical memory (though I may be wrong about that, in which case my whole theory here collapses!)

Check that the start pointer for that chunk satisfies start_ptr as usize > u32::MAX as usize. In the unlikely event that it doesn't:

  • Allocate another 4 GiB chunk.
  • Because allocations can't overlap, the pointer to the 2nd allocation is guaranteed to be > u32::MAX.
  • Free the 1st allocation, and use the 2nd for the chunk.

Either way, we now have a guarantee that start_ptr > u32::MAX.
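Roughly like this (a sketch only; alloc_chunk_above_u32_max is a hypothetical helper, it assumes a 64-bit target, and real code would need proper error handling):

use std::alloc::{alloc, dealloc, Layout};

// 4 GiB.
const CHUNK_SIZE: usize = 1 << 32;

// Hypothetical helper: obtain a 4 GiB chunk whose start address is above
// u32::MAX, retrying at most once.
unsafe fn alloc_chunk_above_u32_max() -> *mut u8 {
    let layout = Layout::from_size_align(CHUNK_SIZE, 16).unwrap();

    let first = alloc(layout);
    assert!(!first.is_null(), "allocation failed");
    if first as usize > u32::MAX as usize {
        return first;
    }

    // `first` straddles the low 4 GiB of the address space. A second live,
    // non-overlapping 4 GiB chunk cannot also start at or below u32::MAX,
    // so allocate it and then free the first.
    let second = alloc(layout);
    assert!(!second.is_null(), "allocation failed");
    debug_assert!(second as usize > u32::MAX as usize);
    dealloc(first, layout);
    second
}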

Bump::alloc<T> can use a specialized version of alloc_layout where layout.size() is statically constrained to be <= u32::MAX.

Combining these 2 guarantees means that (ptr as usize) < layout.size() can never be true, so that branch can be removed, and ptr.wrapping_sub(layout.size()) can never wrap.
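To illustrate, the specialized fast path might look something like this (a sketch of the idea, not a proposed patch; the two debug_asserts spell out the invariants the caller would have to uphold):

use core::alloc::Layout;
use core::ptr::NonNull;

// Round `ptr` down to a multiple of `align` (a power of two), as in bumpalo.
fn round_mut_ptr_down_to(ptr: *mut u8, align: usize) -> *mut u8 {
    debug_assert!(align.is_power_of_two());
    ptr.wrapping_sub(ptr as usize & (align - 1))
}

// Relies on two invariants:
//   1. the chunk lives entirely above the low 4 GiB (`start as usize > u32::MAX`), and
//   2. `layout.size() <= u32::MAX` (enforced at compile time for `alloc::<T>`).
// Under those invariants the subtraction cannot wrap, so the
// `(ptr as usize) < layout.size()` guard is gone.
unsafe fn try_alloc_small(
    start: *mut u8,
    bump_ptr: &mut *mut u8,
    layout: Layout,
) -> Option<NonNull<u8>> {
    debug_assert!(start as usize > u32::MAX as usize);
    debug_assert!(layout.size() <= u32::MAX as usize);

    let ptr = (*bump_ptr).wrapping_sub(layout.size());
    let aligned_ptr = round_mut_ptr_down_to(ptr, layout.align());

    if aligned_ptr >= start {
        *bump_ptr = aligned_ptr;
        Some(NonNull::new_unchecked(aligned_ptr))
    } else {
        None
    }
}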

NB: A size check would still be required when allocating slices (&[T]), as the size is not statically knowable. But nonetheless, making Bump::alloc alone a bit faster would probably be a worthwhile gain.

NB 2: Some of the above is a little approximate (maybe I'm conflating 4 GiB and 4 GiB - 1 in some places), but hopefully the general idea is clear enough.

Do you think this would work? And if so, would it be a worthwhile optimization?

@overlookmotel
Contributor Author

overlookmotel commented Feb 16, 2024

Come to think of it, if we've allocated 4 GiB originally and the max size of the bump is 4 GiB, then the bump can only ever require 1 chunk, and further branches can be removed.

Can this be right???

@fitzgen
Owner

fitzgen commented Feb 20, 2024

When Bump allocates a chunk from the global allocator, request a chunk of 4 GiB size. If my understanding is correct, this will only consume 4 GiB of virtual memory, not physical memory (though I may be wrong about that, in which case my whole theory here collapses!)

Two things:

  1. This is correct if the code is running on a system with virtual memory. That isn't guaranteed and bumpalo doesn't require that. Notably, Wasm doesn't have virtual memory.

  2. This would put us in the position of needing to (or at least being pressured to) manage virtual memory ourselves and detect when we can madvise away pages that became physically mapped because we touched them earlier but are no longer needed after a reset.

Do you think this would work? And if so, would it be a worthwhile optimization?

I think this kind of thing is very valid for a bump allocation library to do, but it is also making a trade-off: being more focused on a particular environment and use case, and less general-purpose. For that reason I don't think it is the right choice for this library.

@fitzgen
Owner

fitzgen commented Feb 20, 2024

But yeah, once you accept virtual memory, you don't even need to explicitly check against the end of the bump region. You just place a guard page just after (before, when bumping downwards) the end of the bump region and check that sizeof(T) < sizeof(page), which should be constant-propagated away 99.99% of the time. Then you have a branch-free allocation path: when you run out of capacity, you attempt to write into the guard page, that raises a signal, and you catch it and return allocation failure.
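For concreteness, a rough Unix-only sketch of that setup (using the libc crate, a blindly assumed 4 KiB page size, and an illustrative region size; the signal handler that turns the fault into an allocation failure is omitted, and none of this is bumpalo code):

use std::{mem, ptr};

const PAGE_SIZE: usize = 4096; // assumed page size, for the sketch
const REGION_SIZE: usize = 1 << 20; // e.g. a 1 MiB bump region

// Reserve a bump region with an inaccessible guard page *below* it
// (we bump downwards, so running out of capacity means walking off the low end).
unsafe fn map_region_with_guard() -> (*mut u8, *mut u8) {
    let total = REGION_SIZE + PAGE_SIZE;
    let base = libc::mmap(
        ptr::null_mut(),
        total,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
        -1,
        0,
    );
    assert!(base != libc::MAP_FAILED, "mmap failed");

    // The lowest page becomes the guard page: any access to it faults.
    let rc = libc::mprotect(base, PAGE_SIZE, libc::PROT_NONE);
    assert_eq!(rc, 0, "mprotect failed");

    let start = (base as *mut u8).add(PAGE_SIZE); // lowest usable address
    let end = start.add(REGION_SIZE); // initial (downward) bump pointer
    (start, end)
}

// Branch-free allocation path: the only check is a compile-time constant per
// `T`, so it folds away for concrete types. Including the alignment keeps the
// worst-case bump within one page, so an overflowing write must land in the
// guard page rather than skipping past it.
unsafe fn alloc_down<T>(bump_ptr: &mut *mut u8, val: T) -> *mut T {
    const { assert!(mem::size_of::<T>() + mem::align_of::<T>() <= PAGE_SIZE) };

    let p = (*bump_ptr).wrapping_sub(mem::size_of::<T>());
    let p = p.wrapping_sub(p as usize & (mem::align_of::<T>() - 1)) as *mut T;
    // When capacity is exhausted this write hits the guard page and raises a
    // signal; a handler (not shown) would catch it and report the failure.
    p.write(val);
    *bump_ptr = p as *mut u8;
    p
}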
