Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Z : Enable Vector instructions for handling float data types #7130

Merged
merged 1 commit into from
Nov 30, 2023

Conversation

sarwat12
Copy link
Contributor

@sarwat12 sarwat12 commented Oct 3, 2023

This PR addresses the use of vector instructions to handle short format in the select evaluator.
Previously, the use of vector instructions for short format in the select evaluator was disabled, even though on z14 and newer platforms it is supported. The issue was caused by not correctly converting the condition code from GPR to FPR for short format. Changes for enabling vector instructions for short format:

  • Use of LLGFR instruction for long format for zero-extending a 32 bit conditionReg to 64 bits
  • Use of separate SLLG instruction for short format to preserve the float representation of the first 32 bits as it is later moved into FPR
  • Addition of mask values in the VFCE instruction to get the element size mask for floats and doubles respectively

Closes: #5002

Signed-off-by: Sarwat Shaheen [email protected]

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for supporting the project, and congratulations on your first contribution! A project committer will shortly review your contribution. In the mean time, if you haven't had a chance please skim over the contribution guidelines which all pull requests must adhere to. If the ECA pull request check fails, have a look at the instructions for signing the ECA in the legal considerations section.

If you run into any problems our community will be happy to assist you in any way we can. There are a number of recommended ways to interact with the community. We encourage you to ask questions, or drop by to say hello.

@@ -2680,7 +2680,7 @@ OMR::Z::TreeEvaluator::dselectEvaluator(TR::Node *node, TR::CodeGenerator *cg)
generateRRInstruction(cg, TR::InstOpCode::LDGR, node, tempReg, conditionReg);
// generate compare with zero
generateVRIaInstruction(cg, TR::InstOpCode::VGBM, node, vzeroReg, 0, 0);
generateVRRcInstruction(cg, TR::InstOpCode::VFCE, node, vectorSelReg, tempReg, vzeroReg, 1, 0, 3);
generateVRRcInstruction(cg, TR::InstOpCode::VFCE, node, vectorSelReg, tempReg, vzeroReg, 0, 0, 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. You are basically forcing the VFCE instruction with short-format (Float) so it will generate functionally incorrect code for Double type.

You can use getVectorElementSizeMask(TR::Node *node) to get m4 mask.

Also given that operation should only take place for first element, I wonder if we should set in the m5 to 0x8 which would control the operation to happen only on 0th element , which will be FPR value loaded in LDGR.

if (node->getOpCode().isDouble())
{
generateRRInstruction(cg, TR::InstOpCode::LLGFR, node, conditionReg, conditionReg);
}else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tabs/ Space needed to be fixed

{
generateRRInstruction(cg, TR::InstOpCode::LLGFR, node, conditionReg, conditionReg);
}else
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment why we needed to left shit the condition node for float. This will help others looking into the code.

Copy link
Contributor Author

@sarwat12 sarwat12 Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the case when the node opcode is a Float Select, and the condition child is 32 bits, an SLLG instruction would be required to shift to the left the 32 least significant bits within the 64 bit conditionReg GPR.
This would preserve the floating point representation of the conditionNode, as it is later moved into an FPR using LDGR, which would enable further floating-point operations to be performed.

// convert to floating point
generateRRInstruction(cg, TR::InstOpCode::LDGR, node, tempReg, conditionReg);
// generate compare with zero
generateVRIaInstruction(cg, TR::InstOpCode::VGBM, node, vzeroReg, 0, 0);
generateVRRcInstruction(cg, TR::InstOpCode::VFCE, node, vectorSelReg, tempReg, vzeroReg, 1, 0, 3);
generateVRRcInstruction(cg, TR::InstOpCode::VFCE, node, vectorSelReg, tempReg, vzeroReg, 0, 0x8, getVectorElementSizeMask(node->getSize()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment the purpose behind mask values here.

Copy link
Contributor Author

@sarwat12 sarwat12 Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the VFCE instruction, the following mask values have been used to generate vector instructions for both doubles and floats.

M4 - Floating-point-format control = getVectorElementSizeMask(node->getSize()); returns 2 for short format floats and 3 for long format doubles accordingly, depending on the node.

M5 - Single-Element-Control = 0x8, which sets the bit 0 to one, which would control the operation to take place only on the zero-indexed element in the vector.

M6 - Condition Code Set = 0, the Condition Code is not set and remains unchanged.

Binary Encoding for Double:

0x2aa2ea3f8a0] b9 16 00 00                                       LLGFR   GPR0,GPR0
0x2aa2ea3f950] b3 c1 00 20                                       LDGR    FPR2,GPR0
0x2aa2ea3fa00] e7 00 00 00 08 44                                 VGBM    VRF16,0x0
0x2aa2ea3fad0] e7 02 00 08 3a e8                                 WFCEDB  VRF16,VRF2,VRF16

Binary encoding for Float:

0x2aa3be9ccf0] eb 00 00 20 00 0d                                 SLLG    GPR0,32
0x2aa3be9cdb0] b3 c1 00 20                                       LDGR    FPR2,GPR0
0x2aa3be9ce60] e7 00 00 00 08 44                                 VGBM    VRF16,0x0
0x2aa3be9cf30] e7 02 00 08 2a e8                                 WFCEDB  VRF16,VRF2,VRF16

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I meant add comments in the code.

Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes looks good to me, @sarwat12 Please add comments in the code and also squash the commits.

Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick, Overall LGTM, once you make final change, will give quick look and launch tests

@@ -2669,18 +2669,30 @@ OMR::Z::TreeEvaluator::dselectEvaluator(TR::Node *node, TR::CodeGenerator *cg)
TR::Register *resultReg = cg->gprClobberEvaluate(trueValueNode);
TR::Register *conditionReg = cg->evaluate(conditionNode);
TR::Register *falseValReg = cg->evaluate(falseValueNode);
if (cg->comp()->target().cpu.isAtLeast(OMR_PROCESSOR_S390_Z13) && node->getOpCode().isDouble())
if ((cg->comp()->target().cpu.isAtLeast(OMR_PROCESSOR_S390_Z13) && node->getOpCode().isDouble()) || (cg->comp()->target().cpu.isAtLeast(OMR_PROCESSOR_S390_Z14) && node->getOpCode().isFloat()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put the two condition in seperate line, so it is easy to read ?

 if ((cg->comp()->target().cpu.isAtLeast(OMR_PROCESSOR_S390_Z13) && node->getOpCode().isDouble())
    || (cg->comp()->target().cpu.isAtLeast(OMR_PROCESSOR_S390_Z14) && node->getOpCode().isFloat()))

Copy link
Contributor

@r30shah r30shah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarwat12 As your PR closes #5002, can you link two. Please checkout https://github.com/eclipse/omr/blob/master/CONTRIBUTING.md#commit-guidelines for the commit guidelines.

@r30shah
Copy link
Contributor

r30shah commented Nov 6, 2023

@sarwat12 As explained in https://github.com/eclipse/omr/blob/master/CONTRIBUTING.md#commit-guidelines, it is a commit message, not comment on the PR.

Your PR does the following, so it should be clearly mentioned in the commit header and body should contain detailed description.

Header

Body
...

Closes: https://github.com/eclipse/omr/issues/5002

Signed-off-by: ...

Also please have clear commit description in the header and have details in the commit body describing what was the issue and how it was fixed.

@r30shah
Copy link
Contributor

r30shah commented Nov 6, 2023

Enable vector instructions for floats & doubles in the select evaluator

We were taking advantage of Vector Instructions for dselect already on the platform where long format in vector instructions is supported. Your PR addresses the issue seen with using it for short format (in fselect evaluator) on z14 and newer platform (Where short-format support for certain vector operation was added).

@sarwat12 sarwat12 force-pushed the fselect branch 3 times, most recently from 21fd520 to 50340ce Compare November 6, 2023 19:37
@r30shah
Copy link
Contributor

r30shah commented Nov 6, 2023

@sarwat12 Can you rebase your branch ? Currently it shows 16 commits in this PR which is incorrect

@r30shah
Copy link
Contributor

r30shah commented Nov 7, 2023

Jenkins build zos,zlinux

This commit addresses the use of vector instructions to handle short format in the **select** evaluator.
Previously, the use of vector instructions for short format in the select evaluator was disabled, even though on z14 and newer platforms, it is supported. The issue was caused by not correctly converting the condition code from GPR to FPR for short format. Changes for enabling vector instructions for short format:

- Use of LLGFR instruction for long format for zero-extending a 32 bit conditionReg to 64 bits
- Use of separate SLLG instruction for short format floats to preserve the float representation of the first 32 bits as it is later moved into FPR
- Addition of mask values in the VFCE instruction to get the element size mask for floats and doubles respectively

Closes: eclipse#5002

Signed-off-by: Sarwat Shaheen [email protected]
@r30shah
Copy link
Contributor

r30shah commented Nov 23, 2023

Jenkins build zos,zlinux

@r30shah
Copy link
Contributor

r30shah commented Nov 24, 2023

@sarwat12 Can you post the build number of your internal jenkins build testing this change ?

@sarwat12
Copy link
Contributor Author

Build #19439 for testing fselect branch.

@r30shah
Copy link
Contributor

r30shah commented Nov 29, 2023

@sarwat12 Looking at the build, I see JDK11 and JDK8 passes. But due to some unrelated issues JDK17 failed. Can you launch another build with JDK17 only ?

@sarwat12
Copy link
Contributor Author

Build #19491, with JDK17 only.

@r30shah
Copy link
Contributor

r30shah commented Nov 30, 2023

Thanks @sarwat12 for doing the necessary sanity tests internally. I have checked out the builds from #7130 (comment) and #7130 (comment) and apart from a failure in cmdLineTester_criu_jitserverAcrossCheckpoint_0 which looking at the failure does not look like related to changes in this PR, everything looks ok to me.

@dsouzai failure in the criu_jitserver test looks like following,

[ERR] Assertion failed at /home/jenkins/workspace/Build_JDK17_s390x_linux_Personal/openj9/runtime/compiler/env/JITServerPersistentCHTable.cpp:172: classInfo
 [ERR] 	subclass info cannot be null: ensure subclasses are loaded before superclass

Failing Test

Do you know if this is known issue or something new?

@dsouzai
Copy link
Member

dsouzai commented Nov 30, 2023

Do you know if this is known issue or something new?

I think it's this issue eclipse-openj9/openj9#17474

@dsouzai dsouzai self-assigned this Nov 30, 2023
Copy link
Member

@dsouzai dsouzai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Approving based on @r30shah's approval.

@dsouzai
Copy link
Member

dsouzai commented Nov 30, 2023

Merging since tests have passed.

@dsouzai dsouzai merged commit 2d7eb9a into eclipse:master Nov 30, 2023
6 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate use of vector sequence for fselect
3 participants