Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust should pass vectors by vector register #93490

Closed
Miksel12 opened this issue Jan 30, 2022 · 9 comments
Closed

Rust should pass vectors by vector register #93490

Miksel12 opened this issue Jan 30, 2022 · 9 comments

Comments

@Miksel12
Copy link

Miksel12 commented Jan 30, 2022

Rust currently doesn't pass vectors of floats by vector register.

This should be able to be passed by vector registers:

pub struct Stats { x: f32, y: f32, z: f32, q: f32  }


pub fn sum_rust(a: &Stats, b: &Stats) -> Stats {
    return Stats {x: a.x + b.x, y: a.y + b.y, z: a.z + b.z, q: a.q + b.q };
}

But in Rust 1.47 it uses the stack:

example::sum_rust:
        mov     rax, rdi
        movups  xmm0, xmmword ptr [rsi]
        movups  xmm1, xmmword ptr [rdx]
        addps   xmm1, xmm0
        movups  xmmword ptr [rdi], xmm1
        ret

Post 1.47, it is packed into integer registers (see this issue: #85265):

example::sum_rust:
        movss   xmm0, dword ptr [rdi]
        movss   xmm1, dword ptr [rdi + 4]
        addss   xmm0, dword ptr [rsi]
        addss   xmm1, dword ptr [rsi + 4]
        movsd   xmm2, qword ptr [rdi + 8]
        movsd   xmm3, qword ptr [rsi + 8]
        addps   xmm3, xmm2
        movd    eax, xmm0
        movd    ecx, xmm1
        movd    esi, xmm3
        shufps  xmm3, xmm3, 229
        movd    edx, xmm3
        shl     rdx, 32
        or      rdx, rsi
        shl     rcx, 32
        or      rax, rcx
        ret

This issue should be fixed by #93405 and should bring it back to pre 1.48.
But ideally it should be optimized to:

example::sum_rust:
        addps   xmm1, xmm0
        ret

@dotdash mentions in #85265 that this is due to Rust not using the proper types on the LLVM IR level: #85265 (comment)

EDIT:
Clang is able to use this optimization in a similar case:

struct Foo
{
    float bar1;
    float bar2;
    float bar3;
    float bar4;
};

Foo sum_cpp(Foo foo1, Foo foo2)
{
    Foo foo3;
    foo3.bar1 = foo1.bar1 + foo2.bar1;
    foo3.bar2 = foo1.bar2 + foo2.bar2;
    foo3.bar3 = foo1.bar3 + foo2.bar3;
    foo3.bar4 = foo1.bar4 + foo2.bar4;
    return foo3;
}

Gets turned into:

sum_cpp(Foo, Foo):                       # @sum_cpp(Foo, Foo)
        addps   xmm0, xmm2
        addps   xmm1, xmm3
        ret
@nikic
Copy link
Contributor

nikic commented Jan 30, 2022

This doesn't make sense for integers. Yes, you've picked out a very lucky example where this allows vectorization, but generally this just means that data will have to be moved back and forth between vector and GPR registers, for the majority case where no vectorization is possible. Passing these by pointer (as in 1.47) is the right thing to do.

@Urgau
Copy link
Member

Urgau commented Jan 30, 2022

I actually tried this but this is complicated because this register are influenced by #[target_feature], which could lead to unsound code when called from a code that doesn't use this register.
https://github.com/rust-lang/rust/blob/803e19c6ebc647d4e600967c255fccea838bce9f/compiler/rustc_middle/src/ty/layout.rs#L3197-L3221

@Miksel12
Copy link
Author

Miksel12 commented Jan 30, 2022

This doesn't make sense for integers. Yes, you've picked out a very lucky example where this allows vectorization, but generally this just means that data will have to be moved back and forth between vector and GPR registers, for the majority case where no vectorization is possible. Passing these by pointer (as in 1.47) is the right thing to do.

That makes sense, but the exact same happens with floats. It is beneficial for floats to use vector registers, isn't it?
In that case I should probably use floats for my example.
I checked Clang to see how C++ is optimized and integers are indeed not passed in vector registers. Floats are.

struct Foo
{
    float bar1;
    float bar2;
    float bar3;
    float bar4;
};

Foo sum_cpp(Foo foo1, Foo foo2)
{
    Foo foo3;
    foo3.bar1 = foo1.bar1 + foo2.bar1;
    foo3.bar2 = foo1.bar2 + foo2.bar2;
    foo3.bar3 = foo1.bar3 + foo2.bar3;
    foo3.bar4 = foo1.bar4 + foo2.bar4;
    return foo3;
}

Gets turned into:

sum_cpp(Foo, Foo):                       # @sum_cpp(Foo, Foo)
        addps   xmm0, xmm2
        addps   xmm1, xmm3
        ret

@Miksel12
Copy link
Author

I actually tried this but this is complicated because this register are influenced by #[target_feature], which could lead to unsound code when called from a code that doesn't use this register.

https://github.com/rust-lang/rust/blob/803e19c6ebc647d4e600967c255fccea838bce9f/compiler/rustc_middle/src/ty/layout.rs#L3197-L3221

It seems like that issue is discussed in: #79865. It also seems like that issue will be fixed by upgrading to LLVM 14: #79865 (comment)
Though I'm not entirely sure it is the same issue.

@Urgau
Copy link
Member

Urgau commented Jan 30, 2022

I checked Clang to see how C++ is optimized and integers are indeed not passed in vector registers. Floats are.

This is because clang generate a more optimized layout dso_local { <2 x float>, <2 x float> } instead of %Foo = type { float, float, float, float }. This is something I also tried to do is my PR but this way more complicated (it also required the abi compatibility to be fixed) so I decided to not include it.

@Urgau
Copy link
Member

Urgau commented Feb 1, 2022

@Miksel12 I've open #93564 to fix the general issue related to the aggregation of types and I manage to also fix this issue.

With my PR your example code would now be compiled to:

sum_rust:
        addps	xmm0, xmm1
	ret

Even better than clang !

EDIT: It's no longer the case, due to the abi + target_features unsoundness.

@scottmcm
Copy link
Member

scottmcm commented Feb 6, 2022

Rust should pass vectors by vector register

Note that if you add repr(simd) then it does pass deal in vector types: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=1645fca36d467d49bc9bd4f299f8243f

So arguably this more about "repr(rust) should use vector representations sometimes". (And of course when we eventually get stdsimd then people could do this as a repr(transparent) wrapper around Simd<f32, 4>.)

@bjorn3
Copy link
Member

bjorn3 commented Feb 6, 2022

Note that if you add repr(simd) then it does pass deal in vector types: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=1645fca36d467d49bc9bd4f299f8243f

It doesn't. It passes them by reference both when using &Stats as arguments and when using Stats as arguments (llvm ir is identical). There would be no load or store if it was passed in vector registers.

define void @_ZN10playground8sum_rust17hf4e374897bed05a9E(<4 x float>* noalias nocapture sret(<4 x float>) dereferenceable(16) %0, <4 x float>* noalias nocapture readonly align 16 dereferenceable(16) %a, <4 x float>* noalias nocapture readonly align 16 dereferenceable(16) %b) unnamed_addr #0 {
start:
  %1 = load <4 x float>, <4 x float>* %a, align 16
  %2 = load <4 x float>, <4 x float>* %b, align 16
  %3 = fadd <4 x float> %1, %2
  store <4 x float> %3, <4 x float>* %0, align 16
  ret void
}

@workingjubilee
Copy link
Member

workingjubilee commented Feb 13, 2022

Having read this issue, I believe this is functionally a duplicate of #64609, #85265, and #91447, so I am closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants