Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

target_feature doesn't trickle down to closures and internal fns #58279

Open
KyleSiefring opened this issue Feb 8, 2019 · 7 comments
Open

target_feature doesn't trickle down to closures and internal fns #58279

KyleSiefring opened this issue Feb 8, 2019 · 7 comments
Labels
A-codegen Area: Code generation A-SIMD Area: SIMD (Single Instruction Multiple Data) A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. F-target_feature_11 target feature 1.1 RFC T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@KyleSiefring
Copy link

KyleSiefring commented Feb 8, 2019

Leaves poorly optimized assembly in its wake.

use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

// Creates non inlined calls to intrinsics
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
pub unsafe fn foo(input: &[__m256]) -> f32 {
    let accum = |val: __m256| {
        let roll = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 0);
        let mut sum = val;
        let mut tmp = _mm256_permutevar8x32_ps(val, roll);
        for i in 0..7 {
            sum = _mm256_add_ps(tmp, sum);
            tmp = _mm256_permutevar8x32_ps(tmp, roll);
        }
        sum
    };
    // Once we call a complex internal closure or fn multiple
    //  times, we find that the compiler hasn't told them that
    //  they can inline or use avx2 intrinsics. Not the sharpest.
    let sum1 = accum(input[0]);
    let sum2 = accum(input[1]);
    _mm256_cvtss_f32(sum1) + _mm256_cvtss_f32(sum2)
}

// Works as expected
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
pub unsafe fn bar(input: &[__m256]) -> f32 {
    // When we pull this tool out of the shed every thing works
    #[target_feature(enable = "avx2")]
    unsafe fn accum(val: __m256) -> __m256 {
        let roll = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 0);
        let mut sum = val;
        let mut tmp = _mm256_permutevar8x32_ps(val, roll);
        for i in 0..7 {
            sum = _mm256_add_ps(tmp, sum);
            tmp = _mm256_permutevar8x32_ps(tmp, roll);
        }
        sum
    }
    let sum1 = accum(input[0]);
    let sum2 = accum(input[1]);
    _mm256_cvtss_f32(sum1) + _mm256_cvtss_f32(sum2)
}

https://rust.godbolt.org/z/cIr7qS

I found this bug by triggering this one with closures. I wasn't able to trigger it from godbolt. I'm using the latest stable, so if I copied the code in it would work (as in not work).
#50154

Making a separate issue since this one is a performance bug.

@KyleSiefring KyleSiefring changed the title target_feature doesn't trickle down to closures internal fns target_feature doesn't trickle down to closures andinternal fns Feb 8, 2019
@KyleSiefring KyleSiefring changed the title target_feature doesn't trickle down to closures andinternal fns target_feature doesn't trickle down to closures and internal fns Feb 8, 2019
@scottmcm
Copy link
Member

scottmcm commented Feb 8, 2019

I think this is expected for internal fns, as items in general don't get anything from their enclosures (most well known is that they don't get access to generic parameters).

I think I agree for closures, though.

@jonas-schievink jonas-schievink added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-SIMD Area: SIMD (Single Instruction Multiple Data) labels Feb 9, 2019
@hanna-kruppe
Copy link
Contributor

One problem with closures is that functions with target_feature have to be unsafe, but there's no such thing as unsafe closures.

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 11, 2019
@workingjubilee workingjubilee added the A-codegen Area: Code generation label Oct 5, 2021
@workingjubilee workingjubilee added F-target_feature_11 target feature 1.1 RFC A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. labels Mar 3, 2023
@est31
Copy link
Member

est31 commented Mar 9, 2023

I think this has been implemented in the meantime for the target_feature 1.1 feature, as otherwise #108655 wouldn't have been filed.

@LeSeulArtichaut
Copy link
Contributor

(See #73631)

@est31
Copy link
Member

est31 commented Mar 13, 2023

@LeSeulArtichaut maybe we can close this in favour of #73631?

@LeSeulArtichaut
Copy link
Contributor

#73631 doesn't discuss inner functions at all, but if it seems uncontroversial to not allow them on inner functions then I guess this issue can be closed.

@est31
Copy link
Member

est31 commented Mar 15, 2023

Yeah, applying target_feature to inner functions is not good. There are ways to call inner functions from outside the function. It's different for closures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-SIMD Area: SIMD (Single Instruction Multiple Data) A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. F-target_feature_11 target feature 1.1 RFC T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants