ARM64-SVE: Fix conditional select for Zeroing predicates #102904 #105737

Merged 9 commits on Aug 13, 2024
Changes from 6 commits
18 changes: 13 additions & 5 deletions src/coreclr/jit/hwintrinsic.h
@@ -236,16 +236,18 @@ enum HWIntrinsicFlag : unsigned int
  // then the intrinsic should be switched to a scalar only version.
  HW_Flag_HasScalarInputVariant = 0x2000000,

+ // The intrinsic uses a mask in arg1 to select elements present in the result, and must use a low vector register.
+ HW_Flag_LowVectorOperation = 0x4000000,
+
+ // The intrinsic uses a mask in arg1 to select elements present in the result, which zeros inactive elements
+ // (instead of merging).
+ HW_Flag_ZeroingMaskedOperation = 0x8000000,
+
  #endif // TARGET_XARCH

  // The intrinsic is a FusedMultiplyAdd intrinsic
  HW_Flag_FmaIntrinsic = 0x40000000,

- #if defined(TARGET_ARM64)
- // The intrinsic uses a mask in arg1 to select elements present in the result, and must use a low vector register.
- HW_Flag_LowVectorOperation = 0x4000000,
- #endif
-
  HW_Flag_CanBenefitFromConstantProp = 0x80000000,
  };

@@ -981,6 +983,12 @@ struct HWIntrinsicInfo
  return (flags & HW_Flag_HasScalarInputVariant) != 0;
  }

+ static bool IsZeroingMaskedOperation(NamedIntrinsic id)
+ {
+ const HWIntrinsicFlag flags = lookupFlags(id);
+ return (flags & HW_Flag_ZeroingMaskedOperation) != 0;
+ }
+
  static NamedIntrinsic GetScalarInputVariant(NamedIntrinsic id)
  {
  assert(HasScalarInputVariant(id));
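
The new flag and its query helper follow the usual one-bit-per-property pattern. Here is a minimal standalone sketch of that pattern, not the JIT's actual code: only the two numeric flag values are taken from the diff above, and every `Sketch_`/`sketch` name, the intrinsic ids, and the flag table are hypothetical stand-ins for `HWIntrinsicFlag`, `NamedIntrinsic`, and `lookupFlags`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the JIT's flag enum. Only the two numeric values
// are copied from the diff; every name here is invented for the sketch.
enum SketchFlag : std::uint32_t
{
    Sketch_LowVectorOperation     = 0x4000000,
    Sketch_ZeroingMaskedOperation = 0x8000000,
};

// The two flags must occupy distinct bits so they can be tested independently.
static_assert((Sketch_LowVectorOperation & Sketch_ZeroingMaskedOperation) == 0, "flags overlap");

// Hypothetical intrinsic ids; the real JIT keys its table on NamedIntrinsic.
enum SketchIntrinsic : unsigned
{
    Sketch_MergingOp,
    Sketch_ZeroingOp,
    Sketch_ZeroingLowVectorOp,
    Sketch_Count,
};

// Hypothetical flag table, one entry per sketch intrinsic.
static const std::uint32_t sketchFlagTable[Sketch_Count] = {
    0,
    Sketch_ZeroingMaskedOperation,
    Sketch_ZeroingMaskedOperation | Sketch_LowVectorOperation,
};

std::uint32_t sketchLookupFlags(SketchIntrinsic id)
{
    assert(id < Sketch_Count);
    return sketchFlagTable[id];
}

// Same shape as IsZeroingMaskedOperation in the diff: test a single bit.
bool sketchIsZeroingMaskedOperation(SketchIntrinsic id)
{
    return (sketchLookupFlags(id) & Sketch_ZeroingMaskedOperation) != 0;
}
```

Because each property has its own bit, an intrinsic can carry both flags at once, and a caller such as the lowering code only needs the boolean answer for the flag it cares about.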
160 changes: 80 additions & 80 deletions src/coreclr/jit/hwintrinsiclistarm64sve.h

Large diffs are not rendered by default.

10 changes: 9 additions & 1 deletion src/coreclr/jit/lowerarmarch.cpp
@@ -4074,9 +4074,17 @@ GenTree* Lowering::LowerHWIntrinsicCndSel(GenTreeHWIntrinsic* cndSelNode)
  // `trueValue`
  GenTreeHWIntrinsic* nestedCndSel = op2->AsHWIntrinsic();
  GenTree* nestedOp1 = nestedCndSel->Op(1);
+ GenTree* nestedOp2 = nestedCndSel->Op(2);
  assert(varTypeIsMask(nestedOp1));
+ assert(nestedOp2->OperIsHWIntrinsic());

- if (nestedOp1->IsMaskAllBitsSet())
+ NamedIntrinsic nestedOp2Id = nestedOp2->AsHWIntrinsic()->GetHWIntrinsicId();
+
+ // If the nested op uses Pg/Z, then inactive lanes will result in zeros, so can only transform if
+ // op3 is all zeros.
+
+ if (nestedOp1->IsMaskAllBitsSet() &&
+     (!HWIntrinsicInfo::IsZeroingMaskedOperation(nestedOp2Id) || op3->IsVectorZero()))
Comment on lines +4025 to +4029
Member

Are there any instructions that only allow Pg/M?

Basically we have:

  • Pg/Z only
  • Pg/M -or- Pg/Z

So I want to determine whether we also have:

  • Pg/M only

Contributor Author

Yes, quite a few. For example:
https://docsmirror.github.io/A64/2023-09/add_z_p_zz.html
ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Member

We likely need something that indicates that and avoids containment of zero for that case then, correct?

Contributor Author

If it's <Pg>/M only, then the optimisation will continue to work the same as it does in HEAD, because IsZeroingMaskedOperation() will be false.

{
GenTree* nestedOp2 = nestedCndSel->Op(2);
GenTree* nestedOp3 = nestedCndSel->Op(3);
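
To make the guard in this hunk and the Pg/M versus Pg/Z discussion concrete, here is a lane-level sketch. It assumes that a merging (Pg/M) operation leaves inactive lanes holding whatever the destination already contained, that a zeroing (Pg/Z) operation writes zero to them, and that after folding the destination initially holds the outer false value (op3). All names (`Vec`, `Mask`, `predicatedResult`, `canFoldNestedSelect`) are hypothetical; this models the reasoning and is not the JIT's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;
using Vec  = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

enum class Predication { Merging, Zeroing };

// Reference semantics of ConditionalSelect: per lane, mask ? trueValue : falseValue.
Vec conditionalSelect(const Mask& mask, const Vec& trueValue, const Vec& falseValue)
{
    Vec out{};
    for (std::size_t i = 0; i < kLanes; i++)
    {
        out[i] = mask[i] ? trueValue[i] : falseValue[i];
    }
    return out;
}

// Models the destination after a predicated operation: active lanes receive the
// computed value; inactive lanes keep the old destination (Merging) or become 0 (Zeroing).
Vec predicatedResult(Predication kind, const Mask& pg, const Vec& computed, const Vec& oldDst)
{
    Vec out{};
    for (std::size_t i = 0; i < kLanes; i++)
    {
        out[i] = pg[i] ? computed[i] : (kind == Predication::Merging ? oldDst[i] : 0);
    }
    return out;
}

// Mirrors the shape of the new condition: fold only when the nested mask is all-true and
// either the operation is not zeroing or the outer false value is already zero.
bool canFoldNestedSelect(bool nestedMaskAllTrue, bool isZeroingOp, bool op3IsZero)
{
    return nestedMaskAllTrue && (!isZeroingOp || op3IsZero);
}
```

With mask `{1,0,1,0}`, computed lanes `{10,20,30,40}`, and false value `{7,7,7,7}`, the merging form reproduces `conditionalSelect` exactly, while the zeroing form yields `{10,0,30,0}`. The two only agree when the false value is all zeros, which is the case the `op3->IsVectorZero()` check admits.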
@@ -56,17 +56,7 @@ namespace JIT.HardwareIntrinsics.Arm
  test.RunStructFldScenario();

  // Validates using inside ConditionalSelect with value falseValue
- // Currently, using this operation in ConditionalSelect() gives incorrect result
- // when falseReg == targetReg because this instruction uses Pg/Z to update the targetReg
- // instead of Pg/M to merge it. As such, the value of falseReg is lost. Ideally, such
- // instructions should be marked similar to RMW (a different flag name) to make sure that
- // we do not assign falseReg/targetReg same. Then, we would do something like this:
- //
- // ldnf1sh target, pg/z, [x0]
- // sel mask, target, target, falseReg
- //
- // This needs more careful thinking, so disabling it for now.
- // test.ConditionalSelect_FalseOp();
+ test.ConditionalSelect_FalseOp();

  // Validates using inside ConditionalSelect with zero falseValue
  test.ConditionalSelect_ZeroOp();
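
The comment removed in this hunk explains why the test had been disabled: when the false value lives in the same register as the target, a Pg/Z load overwrites it before the select can read it. Here is a rough sketch of that hazard using a toy register file. The register numbering, the `RegFile` type, and the `loadZeroing`/`sel` helpers are all hypothetical, and the two helpers only loosely model the `ldnf1sh ... pg/z` and `sel` sequence mentioned in the comment.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kRegLanes = 4;
constexpr std::size_t kRegCount = 4;
using RegVec  = std::array<int, kRegLanes>;
using RegMask = std::array<bool, kRegLanes>;

// Toy vector register file (hypothetical; real SVE has 32 Z registers).
struct RegFile
{
    std::array<RegVec, kRegCount> z{};
};

// Zeroing predicated load: active lanes come from memory, inactive lanes become 0.
void loadZeroing(RegFile& rf, std::size_t target, const RegMask& pg, const RegVec& memory)
{
    for (std::size_t i = 0; i < kRegLanes; i++)
    {
        rf.z[target][i] = pg[i] ? memory[i] : 0;
    }
}

// Lane-wise select: dst = pg ? a : b.
void sel(RegFile& rf, std::size_t dst, const RegMask& pg, std::size_t a, std::size_t b)
{
    RegVec out{};
    for (std::size_t i = 0; i < kRegLanes; i++)
    {
        out[i] = pg[i] ? rf.z[a][i] : rf.z[b][i];
    }
    rf.z[dst] = out;
}

// Runs "zeroing load into target, then select against falseReg" and returns the target register.
RegVec loadThenSelect(std::size_t target, std::size_t falseReg, const RegMask& pg,
                      const RegVec& memory, const RegVec& falseValue)
{
    assert(target < kRegCount && falseReg < kRegCount);
    RegFile rf;
    rf.z[falseReg] = falseValue;
    loadZeroing(rf, target, pg, memory); // clobbers falseValue when target == falseReg
    sel(rf, target, pg, target, falseReg);
    return rf.z[target];
}
```

With distinct registers, mask `{1,0,1,0}`, memory `{1,2,3,4}`, and false value `{9,9,9,9}`, the sequence yields `{1,9,3,9}`. With `target == falseReg` it yields `{1,0,3,0}`, because the false value was zeroed by the load.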
@@ -56,17 +56,7 @@ namespace JIT.HardwareIntrinsics.Arm
  test.RunStructFldScenario();

  // Validates using inside ConditionalSelect with value falseValue
- // Currently, using this operation in ConditionalSelect() gives incorrect result
- // when falseReg == targetReg because this instruction uses Pg/Z to update the targetReg
- // instead of Pg/M to merge it. As such, the value of falseReg is lost. Ideally, such
- // instructions should be marked similar to RMW (a different flag name) to make sure that
- // we do not assign falseReg/targetReg same. Then, we would do something like this:
- //
- // ldnf1sh target, pg/z, [x0]
- // sel mask, target, target, falseReg
- //
- // This needs more careful thinking, so disabling it for now.
- // test.ConditionalSelect_FalseOp();
+ test.ConditionalSelect_FalseOp();

  // Validates using inside ConditionalSelect with zero falseValue
  test.ConditionalSelect_ZeroOp();
@@ -56,17 +56,7 @@ namespace JIT.HardwareIntrinsics.Arm
  test.RunStructFldScenario();

  // Validates using inside ConditionalSelect with value falseValue
- // Currently, using this operation in ConditionalSelect() gives incorrect result
- // when falseReg == targetReg because this instruction uses Pg/Z to update the targetReg
- // instead of Pg/M to merge it. As such, the value of falseReg is lost. Ideally, such
- // instructions should be marked similar to RMW (a different flag name) to make sure that
- // we do not assign falseReg/targetReg same. Then, we would do something like this:
- //
- // ldnf1sh target, pg/z, [x0]
- // sel mask, target, target, falseReg
- //
- // This needs more careful thinking, so disabling it for now.
- // test.ConditionalSelect_FalseOp();
+ test.ConditionalSelect_FalseOp();

  // Validates using inside ConditionalSelect with zero falseValue
  test.ConditionalSelect_ZeroOp();
@@ -47,17 +47,7 @@ namespace JIT.HardwareIntrinsics.Arm
  test.RunStructFldScenario();

  // Validates using inside ConditionalSelect with value falseValue
- // Currently, using this operation in ConditionalSelect() gives incorrect result
- // when falseReg == targetReg because this instruction uses Pg/Z to update the targetReg
- // instead of Pg/M to merge it. As such, the value of falseReg is lost. Ideally, such
- // instructions should be marked similar to RMW (a different flag name) to make sure that
- // we do not assign falseReg/targetReg same. Then, we would do something like this:
- //
- // ldnf1sh target, pg/z, [x0]
- // sel mask, target, target, falseReg
- //
- // This needs more careful thinking, so disabling it for now.
- // test.ConditionalSelect_FalseOp();
+ test.ConditionalSelect_FalseOp();

  // Validates using inside ConditionalSelect with zero falseValue
  test.ConditionalSelect_ZeroOp();