optimize inplace collection of Vec #123878

Merged: 4 commits, May 20, 2024

Commits on May 18, 2024

  1. optimize in-place collection of Vec

    LLVM does not know that the multiplication never overflows, which causes
    it to generate unnecessary instructions. Use `usize::unchecked_mul`, so
    that it can fold the `dst_cap` calculation when `size_of::<I::SRC>() ==
    size_of::<T>()`.
    
    Running:
    
    ```
    rustc -C llvm-args=-x86-asm-syntax=intel -O src/lib.rs --emit asm
    ```
    
    ```rust
    pub struct Foo([usize; 3]);
    
    pub fn unwrap_copy(v: Vec<Foo>) -> Vec<[usize; 3]> {
        v.into_iter().map(|f| f.0).collect()
    }
    ```
    
    Before this commit:
    
    ```
    define void @unwrap_copy(ptr noalias nocapture noundef writeonly sret([24 x i8]) align 8 dereferenceable(24) %_0, ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %iter) {
    start:
      %me.sroa.0.0.copyload.i = load i64, ptr %iter, align 8
      %me.sroa.4.0.self.sroa_idx.i = getelementptr inbounds i8, ptr %iter, i64 8
      %me.sroa.4.0.copyload.i = load ptr, ptr %me.sroa.4.0.self.sroa_idx.i, align 8
      %me.sroa.5.0.self.sroa_idx.i = getelementptr inbounds i8, ptr %iter, i64 16
      %me.sroa.5.0.copyload.i = load i64, ptr %me.sroa.5.0.self.sroa_idx.i, align 8
      %_19.i.idx = mul nsw i64 %me.sroa.5.0.copyload.i, 24
      %0 = udiv i64 %_19.i.idx, 24
      %_16.i.i = mul i64 %me.sroa.0.0.copyload.i, 24
      %dst_cap.i.i = udiv i64 %_16.i.i, 24
      store i64 %dst_cap.i.i, ptr %_0, align 8
      %1 = getelementptr inbounds i8, ptr %_0, i64 8
      store ptr %me.sroa.4.0.copyload.i, ptr %1, align 8
      %2 = getelementptr inbounds i8, ptr %_0, i64 16
      store i64 %0, ptr %2, align 8
      ret void
    }
    ```
    
    After:
    
    ```
    define void @unwrap_copy(ptr noalias nocapture noundef writeonly sret([24 x i8]) align 8 dereferenceable(24) %_0, ptr noalias nocapture noundef readonly align 8 dereferenceable(24) %iter) {
    start:
      %me.sroa.0.0.copyload.i = load i64, ptr %iter, align 8
      %me.sroa.4.0.self.sroa_idx.i = getelementptr inbounds i8, ptr %iter, i64 8
      %me.sroa.4.0.copyload.i = load ptr, ptr %me.sroa.4.0.self.sroa_idx.i, align 8
      %me.sroa.5.0.self.sroa_idx.i = getelementptr inbounds i8, ptr %iter, i64 16
      %me.sroa.5.0.copyload.i = load i64, ptr %me.sroa.5.0.self.sroa_idx.i, align 8
      %_19.i.idx = mul nsw i64 %me.sroa.5.0.copyload.i, 24
      %0 = udiv i64 %_19.i.idx, 24
      store i64 %me.sroa.0.0.copyload.i, ptr %_0, align 8
      %1 = getelementptr inbounds i8, ptr %_0, i64 8
      store ptr %me.sroa.4.0.copyload.i, ptr %1, align 8
      %2 = getelementptr inbounds i8, ptr %_0, i64 16
      store i64 %0, ptr %2, align 8, !alias.scope !9, !noalias !14
      ret void
    }
    ```
    
    Note that there is still one more `mul`/`udiv` pair that I couldn't get
    rid of. The root cause is the same as in rust-lang#121239: the `nuw` flag
    gets stripped off of `ptr::sub_ptr`.
    jwong101 committed May 18, 2024 (c585541)
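
    To make the `dst_cap` folding concrete, here is a minimal sketch of the
    calculation the commit message describes. The helper `dst_capacity` and
    the wrapper `same_size_capacity` are hypothetical names for illustration,
    not the code in the standard library. It assumes a toolchain where
    `usize::unchecked_mul` is stable (Rust 1.79 or later).

    ```rust
    use std::mem::size_of;

    /// Hypothetical helper: how many `Dst` elements fit in an allocation that
    /// currently holds `src_cap` elements of `Src`.
    ///
    /// # Safety
    /// `src_cap * size_of::<Src>()` must not overflow `usize`. That holds when
    /// `src_cap` is the capacity of an existing allocation of `Src` values.
    /// `Dst` must not be zero-sized, otherwise the division panics.
    pub unsafe fn dst_capacity<Src, Dst>(src_cap: usize) -> usize {
        // SAFETY: the caller guarantees the product cannot overflow.
        let bytes = unsafe { src_cap.unchecked_mul(size_of::<Src>()) };
        bytes / size_of::<Dst>()
    }

    pub struct Foo(pub [usize; 3]);

    /// With equal element sizes, the optimizer is free to reduce this to
    /// `v.capacity()`, because the multiplication is known not to wrap.
    pub fn same_size_capacity(v: &Vec<Foo>) -> usize {
        // SAFETY: `v.capacity()` elements of `Foo` are already allocated (or
        // the capacity is 0), so the byte size fits in `usize`.
        unsafe { dst_capacity::<Foo, [usize; 3]>(v.capacity()) }
    }
    ```

    With a plain `*`, the multiplication may wrap, so `x * 24 / 24` cannot be
    simplified to `x`; the unchecked variant removes that possibility.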
  2. optimize in_place_collect with vec::IntoIter::try_fold

    `Iterator::try_fold` gets called on the underlying Iterator in
    `SpecInPlaceCollect::collect_in_place` whenever it does not implement
    `TrustedRandomAccess`. For types that implement `Drop`, LLVM currently
    can't tell that the drop never occurs when the default
    `Iterator::try_fold` implementation is used.
    
    For example, the asm from the `unwrap_clone` method is currently:
    
    ```
    unwrap_clone:
            push    rbp
            push    r15
            push    r14
            push    r13
            push    r12
            push    rbx
            push    rax
            mov     rbx, rdi
            mov     r12, qword ptr [rsi]
            mov     rdi, qword ptr [rsi + 8]
            mov     rax, qword ptr [rsi + 16]
            movabs  rsi, -6148914691236517205
            mov     r14, r12
            test    rax, rax
            je      .LBB0_10
            lea     rcx, [rax + 2*rax]
            lea     r14, [r12 + 8*rcx]
            shl     rax, 3
            lea     rax, [rax + 2*rax]
            xor     ecx, ecx
    .LBB0_2:
            cmp     qword ptr [r12 + rcx], 0
            je      .LBB0_4
            add     rcx, 24
            cmp     rax, rcx
            jne     .LBB0_2
            jmp     .LBB0_10
    .LBB0_4:
            lea     rdx, [rax - 24]
            lea     r14, [r12 + rcx]
            cmp     rdx, rcx
            je      .LBB0_10
            mov     qword ptr [rsp], rdi
            sub     rax, rcx
            add     rax, -24
            mul     rsi
            mov     r15, rdx
            lea     rbp, [r12 + rcx]
            add     rbp, 32
            shr     r15, 4
            mov     r13, qword ptr [rip + __rust_dealloc@GOTPCREL]
            jmp     .LBB0_6
    .LBB0_8:
            add     rbp, 24
            dec     r15
            je      .LBB0_9
    .LBB0_6:
            mov     rsi, qword ptr [rbp]
            test    rsi, rsi
            je      .LBB0_8
            mov     rdi, qword ptr [rbp - 8]
            mov     edx, 1
            call    r13
            jmp     .LBB0_8
    .LBB0_9:
            mov     rdi, qword ptr [rsp]
            movabs  rsi, -6148914691236517205
    .LBB0_10:
            sub     r14, r12
            mov     rax, r14
            mul     rsi
            shr     rdx, 4
            mov     qword ptr [rbx], r12
            mov     qword ptr [rbx + 8], rdi
            mov     qword ptr [rbx + 16], rdx
            mov     rax, rbx
            add     rsp, 8
            pop     rbx
            pop     r12
            pop     r13
            pop     r14
            pop     r15
            pop     rbp
            ret
    ```
    
    After this PR:
    
    ```
    unwrap_clone:
    	mov	rax, rdi
    	movups	xmm0, xmmword ptr [rsi]
    	mov	rcx, qword ptr [rsi + 16]
    	movups	xmmword ptr [rdi], xmm0
    	mov	qword ptr [rdi + 16], rcx
    	ret
    ```
    
    Fixes rust-lang#120493
    jwong101 committed May 18, 2024 (6165dca)
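
    For illustration, the kind of pipeline this commit affects looks roughly
    like the sketch below. `Name` and `unwrap_names` are hypothetical; this is
    not the `unwrap_clone` function from the linked issue. Because `Name` owns
    a `String`, it has drop glue, so per the commit message its
    `vec::IntoIter` would take the `try_fold` path during in-place collection.

    ```rust
    /// Hypothetical newtype with drop glue: it owns a heap-allocated `String`.
    pub struct Name(pub String);

    /// Unwraps every element. Source and destination elements have the same
    /// size and alignment, so the standard library may reuse the input buffer
    /// instead of allocating a new one (an optimization, not a guarantee).
    pub fn unwrap_names(v: Vec<Name>) -> Vec<String> {
        v.into_iter().map(|n| n.0).collect()
    }
    ```

    Every element is moved out of the source buffer and written back in
    place, so no `Name` is left over to drop once the loop finishes.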
  3. specialize Iterator::fold for vec::IntoIter

    LLVM currently adds a redundant check for the returned option, in addition
    to the `self.ptr != self.end` check when using the default
    `Iterator::fold` method that calls `vec::IntoIter::next` in a loop.
    jwong101 committed May 18, 2024 (9d6b93c)
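
    The idea of specializing `fold` can be sketched with a toy iterator. The
    `Cursor` type below is hypothetical and index-based; `vec::IntoIter`
    itself works on raw `ptr`/`end` pointers and owns its elements, so this
    only illustrates the shape of the override, not the actual
    implementation.

    ```rust
    /// Hypothetical toy iterator that yields copies of slice elements.
    pub struct Cursor<'a, T> {
        data: &'a [T],
        pos: usize,
    }

    impl<'a, T: Copy> Cursor<'a, T> {
        pub fn new(data: &'a [T]) -> Self {
            Cursor { data, pos: 0 }
        }
    }

    impl<'a, T: Copy> Iterator for Cursor<'a, T> {
        type Item = T;

        fn next(&mut self) -> Option<T> {
            if self.pos == self.data.len() {
                return None;
            }
            let item = self.data[self.pos];
            self.pos += 1;
            Some(item)
        }

        // The default `fold` calls `next` in a loop and then matches on the
        // returned `Option`, on top of the end-of-data check inside `next`.
        // This override keeps a single end-of-data check per element.
        fn fold<B, F>(mut self, init: B, mut f: F) -> B
        where
            F: FnMut(B, T) -> B,
        {
            let mut acc = init;
            while self.pos != self.data.len() {
                let item = self.data[self.pos];
                // Advance before calling `f` so the position stays
                // consistent even if `f` panics.
                self.pos += 1;
                acc = f(acc, item);
            }
            acc
        }
    }

    /// Sums a slice through the specialized `fold`.
    pub fn sum_all(values: &[u64]) -> u64 {
        Cursor::new(values).fold(0, |acc, x| acc + x)
    }
    ```

    Adapters such as `sum` and `for_each` are typically built on `fold`, so
    an override like this tends to benefit them as well.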

Commits on May 19, 2024

  1. 65e302f