
Remove a branch from try_alloc_layout? #234

Open
overlookmotel opened this issue Feb 16, 2024 · 3 comments

@overlookmotel
Contributor

After reading fitzgen's (very interesting) blog post about the rationale for bumping downwards, I had one thought:

try_alloc_layout_fast has 2 branches:

bumpalo/src/lib.rs

Lines 1414 to 1442 in bb660a3

fn try_alloc_layout_fast(&self, layout: Layout) -> Option<NonNull<u8>> {
    // We don't need to check for ZSTs here since they will automatically
    // be handled properly: the pointer will be bumped by zero bytes,
    // modulo alignment. This keeps the fast path optimized for non-ZSTs,
    // which are much more common.
    unsafe {
        let footer = self.current_chunk_footer.get();
        let footer = footer.as_ref();
        let ptr = footer.ptr.get().as_ptr();
        let start = footer.data.as_ptr();
        debug_assert!(start <= ptr);
        debug_assert!(ptr as *const u8 <= footer as *const _ as *const u8);

        if (ptr as usize) < layout.size() {
            return None;
        }

        let ptr = ptr.wrapping_sub(layout.size());
        let aligned_ptr = round_mut_ptr_down_to(ptr, layout.align());

        if aligned_ptr >= start {
            let aligned_ptr = NonNull::new_unchecked(aligned_ptr);
            footer.ptr.set(aligned_ptr);
            Some(aligned_ptr)
        } else {
            None
        }
    }
}

The first branch, (ptr as usize) < layout.size(), is there purely to ensure that ptr.wrapping_sub(layout.size()) cannot wrap around. Without it, a wrapped pointer could spuriously satisfy the 2nd branch condition aligned_ptr >= start.

Bumpalo already has a method Bump::set_allocation_limit to limit the size of the Bump. I imagine that most users could impose a limit on the size of their Bumps. Allocating massive slabs of memory would be an uncommon use case for a bump allocator, as such allocations would probably also be long-lived.

My thinking is this:

Take the example where the size limit is 4 GiB minus 1 byte (i.e. size <= u32::MAX):

If the total size of the bump is constrained to 4 GiB, then no single allocation can be larger than 4 GiB. So layout.size() of a successful allocation is always a valid u32.

Constrain T in fn alloc<T>(&self, val: T) to only allow types where mem::size_of::<T>() <= u32::MAX.
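For example, something like this (just a sketch; alloc_small is a made-up name, not bumpalo's API, and it assumes inline const blocks from Rust 1.79+):

use core::alloc::Layout;
use core::mem;

// Hypothetical specialized entry point, not part of bumpalo.
fn alloc_small<T>(val: T) {
    // Evaluated once per monomorphization: compilation fails for any `T`
    // whose size exceeds u32::MAX.
    const { assert!(mem::size_of::<T>() <= u32::MAX as usize) };

    let layout = Layout::new::<T>();
    // ... hand `layout` to a specialized fast path (sketched further down) ...
    let _ = (val, layout);
}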

When Bump allocates a chunk from the global allocator, request a chunk of 4 GiB size. If my understanding is correct, this will only consume 4 GiB of virtual memory, not physical memory (though I may be wrong about that, in which case my whole theory here collapses!)

Check that the start pointer for that chunk satisfies start_ptr as usize > u32::MAX as usize. In the unlikely event that it doesn't:

  • Allocate another 4 GiB chunk.
  • Because allocations can't overlap, the pointer to the 2nd allocation is guaranteed to be > u32::MAX.
  • Free the 1st allocation, and use the 2nd for the chunk.

Either way, we now have a guarantee that start_ptr > u32::MAX.
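Roughly like this (a sketch only; alloc_chunk_above_u32_max is a hypothetical helper, it assumes a 64-bit target, and real code would need proper error handling):

use std::alloc::{alloc, dealloc, Layout};

// 4 GiB.
const CHUNK_SIZE: usize = 1 << 32;

// Hypothetical helper: obtain a 4 GiB chunk whose start address is above
// u32::MAX, retrying at most once.
unsafe fn alloc_chunk_above_u32_max() -> *mut u8 {
    let layout = Layout::from_size_align(CHUNK_SIZE, 16).unwrap();

    let first = alloc(layout);
    assert!(!first.is_null(), "allocation failed");
    if first as usize > u32::MAX as usize {
        return first;
    }

    // `first` straddles the low 4 GiB of the address space. A second live,
    // non-overlapping 4 GiB chunk cannot also start at or below u32::MAX,
    // so allocate it and then free the first.
    let second = alloc(layout);
    assert!(!second.is_null(), "allocation failed");
    debug_assert!(second as usize > u32::MAX as usize);
    dealloc(first, layout);
    second
}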

Bump::alloc<T> can use a specialized version of alloc_layout where layout.size() is statically constrained to be <= u32::MAX.

Combining these 2 guarantees means that (ptr as usize) < layout.size() can never be true, so that branch can be removed, and ptr.wrapping_sub(layout.size()) can never wrap.
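To illustrate, the specialized fast path might look something like this (a sketch of the idea, not a proposed patch; the two debug_asserts spell out the invariants the caller would have to uphold):

use core::alloc::Layout;
use core::ptr::NonNull;

// Round `ptr` down to a multiple of `align` (a power of two), as in bumpalo.
fn round_mut_ptr_down_to(ptr: *mut u8, align: usize) -> *mut u8 {
    debug_assert!(align.is_power_of_two());
    ptr.wrapping_sub(ptr as usize & (align - 1))
}

// Relies on two invariants:
//   1. the chunk lives entirely above the low 4 GiB (`start as usize > u32::MAX`), and
//   2. `layout.size() <= u32::MAX` (enforced at compile time for `alloc::<T>`).
// Under those invariants the subtraction cannot wrap, so the
// `(ptr as usize) < layout.size()` guard is gone.
unsafe fn try_alloc_small(
    start: *mut u8,
    bump_ptr: &mut *mut u8,
    layout: Layout,
) -> Option<NonNull<u8>> {
    debug_assert!(start as usize > u32::MAX as usize);
    debug_assert!(layout.size() <= u32::MAX as usize);

    let ptr = (*bump_ptr).wrapping_sub(layout.size());
    let aligned_ptr = round_mut_ptr_down_to(ptr, layout.align());

    if aligned_ptr >= start {
        *bump_ptr = aligned_ptr;
        Some(NonNull::new_unchecked(aligned_ptr))
    } else {
        None
    }
}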

NB: A size check would still be required when allocating slices (&[T]), as the size is not statically knowable. But nonetheless, making Bump::alloc alone a bit faster would probably be a worthwhile gain.

NB 2: Some of the above is a little approximate (maybe I'm conflating 4 GiB and 4 GiB - 1 in some places), but hopefully the general idea is clear enough.

Do you think this would work? And if so, would it be a worthwhile optimization?

@overlookmotel
Contributor Author

overlookmotel commented Feb 16, 2024

Come to think of it, if we've allocated 4 GiB originally and the max size of the bump is 4 GiB, then the bump can only ever require 1 chunk, and further branches can be removed.

Can this be right???

@fitzgen
Owner

fitzgen commented Feb 20, 2024

When Bump allocates a chunk from the global allocator, request a chunk of 4 GiB size. If my understanding is correct, this will only consume 4 GiB of virtual memory, not physical memory (though I may be wrong about that, in which case my whole theory here collapses!)

Two things:

  1. This is correct if the code is running on a system with virtual memory. That isn't guaranteed and bumpalo doesn't require that. Notably, Wasm doesn't have virtual memory.

  2. This would put us in the position of needing to (or at least being pressured to) manage virtual memory ourselves and detect when we can madvise away pages that became physically mapped because we touched them earlier but are no longer needed after a reset.

Do you think this would work? And if so, would it be a worthwhile optimization?

I think this kind of thing is very valid for a bump allocation library to do, but it is also making a trade-off: being more focused on a particular environment and use case, and less general-purpose. For that reason I don't think it is the right choice for this library.

@fitzgen
Owner

fitzgen commented Feb 20, 2024

But yeah, once you accept virtual memory, you don't even need to explicitly check against the end of the bump region. You just place a guard page just after (before, when bumping downwards) the end of the bump region and check that sizeof(T) < sizeof(page), which should be constant-propagated away 99.99% of the time. Then you have a branch-free allocation path: when you run out of capacity, you attempt to write into the guard page, that raises a signal, and you catch it and return allocation failure.
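For concreteness, a rough Unix-only sketch of that setup (using the libc crate, a blindly assumed 4 KiB page size, and an illustrative region size; the signal handler that turns the fault into an allocation failure is omitted, and none of this is bumpalo code):

use std::{mem, ptr};

const PAGE_SIZE: usize = 4096; // assumed page size, for the sketch
const REGION_SIZE: usize = 1 << 20; // e.g. a 1 MiB bump region

// Reserve a bump region with an inaccessible guard page *below* it
// (we bump downwards, so running out of capacity means walking off the low end).
unsafe fn map_region_with_guard() -> (*mut u8, *mut u8) {
    let total = REGION_SIZE + PAGE_SIZE;
    let base = libc::mmap(
        ptr::null_mut(),
        total,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
        -1,
        0,
    );
    assert!(base != libc::MAP_FAILED, "mmap failed");

    // The lowest page becomes the guard page: any access to it faults.
    let rc = libc::mprotect(base, PAGE_SIZE, libc::PROT_NONE);
    assert_eq!(rc, 0, "mprotect failed");

    let start = (base as *mut u8).add(PAGE_SIZE); // lowest usable address
    let end = start.add(REGION_SIZE); // initial (downward) bump pointer
    (start, end)
}

// Branch-free allocation path: the only check is a compile-time constant per
// `T`, so it folds away for concrete types. Including the alignment keeps the
// worst-case bump within one page, so an overflowing write must land in the
// guard page rather than skipping past it.
unsafe fn alloc_down<T>(bump_ptr: &mut *mut u8, val: T) -> *mut T {
    const { assert!(mem::size_of::<T>() + mem::align_of::<T>() <= PAGE_SIZE) };

    let p = (*bump_ptr).wrapping_sub(mem::size_of::<T>());
    let p = p.wrapping_sub(p as usize & (mem::align_of::<T>() - 1)) as *mut T;
    // When capacity is exhausted this write hits the guard page and raises a
    // signal; a handler (not shown) would catch it and report the failure.
    p.write(val);
    *bump_ptr = p as *mut u8;
    p
}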
