Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: IsContainableHWIntrinsicOp seems to discard many containment opportunities #92332

Open
jakobbotsch opened this issue Sep 20, 2023 · 4 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@jakobbotsch
Copy link
Member

See e.g. the following:

case NI_Vector128_CreateScalarUnsafe:
case NI_Vector256_CreateScalarUnsafe:
case NI_Vector512_CreateScalarUnsafe:
{
if (!varTypeIsIntegral(childNode->TypeGet()))
{
// The floating-point overload doesn't require any special semantics
supportsSIMDScalarLoads = true;
supportsGeneralLoads = supportsSIMDScalarLoads;
break;
}
// The integral overloads only take GPR/mem
assert(supportsSIMDScalarLoads == false);
const unsigned expectedSize = genTypeSize(genActualType(parentNode->GetSimdBaseType()));
const unsigned operandSize = genTypeSize(childNode->TypeGet());
supportsGeneralLoads = (operandSize >= expectedSize);
break;
}

Using expectedSize = genTypeSize(genActualType(parentNode->GetSimdBaseType())) means we e.g. disallow containing LCL_FLD<ubyte> inside CreateScalar<ubyte> (this particular case is still contained after we transform to broadcast, since broadcast is missing any size check at all -- that's issue #83387). Likely we should not be using actual types here.

The same kind of check is also present for the ConvertScalarToVector128* variants.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 20, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 20, 2023
@ghost
Copy link

ghost commented Sep 20, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

See e.g. the following:

case NI_Vector128_CreateScalarUnsafe:
case NI_Vector256_CreateScalarUnsafe:
case NI_Vector512_CreateScalarUnsafe:
{
if (!varTypeIsIntegral(childNode->TypeGet()))
{
// The floating-point overload doesn't require any special semantics
supportsSIMDScalarLoads = true;
supportsGeneralLoads = supportsSIMDScalarLoads;
break;
}
// The integral overloads only take GPR/mem
assert(supportsSIMDScalarLoads == false);
const unsigned expectedSize = genTypeSize(genActualType(parentNode->GetSimdBaseType()));
const unsigned operandSize = genTypeSize(childNode->TypeGet());
supportsGeneralLoads = (operandSize >= expectedSize);
break;
}

Using expectedSize = genTypeSize(genActualType(parentNode->GetSimdBaseType())) means we e.g. disallow containing LCL_FLD<ubyte> inside CreateScalar<ubyte> (this particular case is still contained after we transform to broadcast, since broadcast is missing any size check at all -- that's issue #83387). Likely we should not be using actual types here.

The same kind of check is also present for the ConvertScalarToVector128* variants.

Author: jakobbotsch
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch jakobbotsch added this to the 9.0.0 milestone Sep 20, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 20, 2023
@tannergooding
Copy link
Member

Some of this is because the underlying instructions don't always take the exact size.

For example, on SSE2 enabled hardware you only have movd and pinsrw as a way of getting an integer value into a SIMD register. movd is exclusively 32/64-bits, while pinsrw is 16-bits.

On SSE4.1 enabled hardware, you get access to pinsrb/pinsrd/pinsrq and that supports 8/32/64-bits.

You don't get broadcast until AVX/AVX2 enabled hardware and that continues supporting all sizes explicitly.


For the most part, LowerHWIntrinsicCreateNode (and the other LowerHWIntrinsic* methods) is supposed to take care of taking the various Create nodes and transforming them to the ISA specific cases that can support more explicit checks and scenarios.

There are a few cases where those transforms don't happen and we may need to replicate the necessary ISA checks in IsContainableHWIntrinsicOp for those scenarios, but it really depends on how its being used and what codegen we'll have to generate.

@jakobbotsch
Copy link
Member Author

Some of this is because the underlying instructions don't always take the exact size.

I see -- feel free to close this if there are no additional opportunities to be had here

@tannergooding
Copy link
Member

I'll keep it open for now so we can confirm we aren't missing anything obvious

It's definitely possible we are, just wanted to note that some of it should be "by design" here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

3 participants