
[SystemZ] Support i128 as legal type in VRs #74625

Merged 1 commit on Dec 15, 2023
Conversation

uweigand (Member) commented Dec 6, 2023

On processors supporting vector registers and SIMD instructions, enable i128 as legal type in VRs. This allows many operations to be implemented via native instructions directly in VRs (including add, subtract, logical operations and shifts). For a few other operations (e.g. multiply and divide, as well as atomic operations), we need to move the i128 value back to a GPR pair to use the corresponding instruction there. Overall, this is still beneficial.

The patch includes the following LLVM changes:

  • Enable i128 as legal type
  • Set up legal operations (in SystemZInstrVector.td)
  • Custom expansion for i128 add/subtract with carry
  • Custom expansion for i128 comparisons and selects
  • Support for moving i128 to/from GPR pairs when required
  • Handle 128-bit integer constant values everywhere
  • Use i128 as intrinsic operand type where appropriate
  • Updated and new test cases

In addition, clang builtins are updated to reflect the intrinsic operand type changes (which also improves compatibility with GCC).

@uweigand uweigand self-assigned this Dec 6, 2023
@llvmbot added labels Dec 6, 2023: clang (Clang issues not falling into any other category), backend:X86, clang:frontend (Language frontend issues, e.g. anything involving "Sema"), clang:headers (Headers provided by Clang, e.g. for intrinsics), llvm:SelectionDAG (SelectionDAGISel as well), llvm:ir
llvmbot (Collaborator) commented Dec 6, 2023

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Ulrich Weigand (uweigand)



Patch is 267.66 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/74625.diff

71 Files Affected:

  • (modified) clang/include/clang/Basic/BuiltinsSystemZ.def (+13-13)
  • (modified) clang/lib/Headers/vecintrin.h (+30-13)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-error2.c (+3-6)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-vector.c (+26-24)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-vector2-error.c (+4-3)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-vector2.c (+5-4)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c (+10-12)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-error.c (+6-3)
  • (modified) clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c (+4-4)
  • (modified) llvm/include/llvm/IR/IntrinsicsSystemZ.td (+14-14)
  • (modified) llvm/include/llvm/Target/TargetSelectionDAG.td (+18)
  • (modified) llvm/lib/Target/SystemZ/SystemZCallingConv.td (+4-3)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp (+67)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.cpp (+451-13)
  • (modified) llvm/lib/Target/SystemZ/SystemZISelLowering.h (+14)
  • (modified) llvm/lib/Target/SystemZ/SystemZInstrVector.td (+219-32)
  • (modified) llvm/lib/Target/SystemZ/SystemZOperands.td (+51-40)
  • (modified) llvm/lib/Target/SystemZ/SystemZOperators.td (+35-1)
  • (modified) llvm/lib/Target/SystemZ/SystemZRegisterInfo.td (+3-2)
  • (added) llvm/test/CodeGen/SystemZ/and-10.ll (+31)
  • (added) llvm/test/CodeGen/SystemZ/and-11.ll (+17)
  • (added) llvm/test/CodeGen/SystemZ/args-12.ll (+43)
  • (added) llvm/test/CodeGen/SystemZ/args-13.ll (+44)
  • (added) llvm/test/CodeGen/SystemZ/asm-21.ll (+65)
  • (modified) llvm/test/CodeGen/SystemZ/atomic-load-05.ll (+1)
  • (modified) llvm/test/CodeGen/SystemZ/atomic-store-05.ll (+1)
  • (modified) llvm/test/CodeGen/SystemZ/atomicrmw-ops-i128.ll (+242-344)
  • (added) llvm/test/CodeGen/SystemZ/bswap-09.ll (+61)
  • (added) llvm/test/CodeGen/SystemZ/bswap-10.ll (+55)
  • (modified) llvm/test/CodeGen/SystemZ/cmpxchg-06.ll (+7-4)
  • (added) llvm/test/CodeGen/SystemZ/ctpop-03.ll (+21)
  • (added) llvm/test/CodeGen/SystemZ/ctpop-04.ll (+19)
  • (modified) llvm/test/CodeGen/SystemZ/fp-conv-19.ll (+4-6)
  • (added) llvm/test/CodeGen/SystemZ/fp-conv-20.ll (+112)
  • (added) llvm/test/CodeGen/SystemZ/fp-strict-conv-17.ll (+148)
  • (added) llvm/test/CodeGen/SystemZ/int-abs-02.ll (+21)
  • (added) llvm/test/CodeGen/SystemZ/int-add-19.ll (+16)
  • (added) llvm/test/CodeGen/SystemZ/int-cmp-63.ll (+237)
  • (added) llvm/test/CodeGen/SystemZ/int-const-07.ll (+47)
  • (added) llvm/test/CodeGen/SystemZ/int-conv-14.ll (+416)
  • (added) llvm/test/CodeGen/SystemZ/int-div-07.ll (+112)
  • (added) llvm/test/CodeGen/SystemZ/int-max-01.ll (+204)
  • (added) llvm/test/CodeGen/SystemZ/int-min-01.ll (+204)
  • (added) llvm/test/CodeGen/SystemZ/int-mul-12.ll (+28)
  • (added) llvm/test/CodeGen/SystemZ/int-mul-13.ll (+224)
  • (added) llvm/test/CodeGen/SystemZ/int-neg-03.ll (+16)
  • (added) llvm/test/CodeGen/SystemZ/int-sub-12.ll (+16)
  • (added) llvm/test/CodeGen/SystemZ/int-uadd-13.ll (+50)
  • (added) llvm/test/CodeGen/SystemZ/int-uadd-14.ll (+63)
  • (added) llvm/test/CodeGen/SystemZ/int-usub-12.ll (+50)
  • (added) llvm/test/CodeGen/SystemZ/int-usub-13.ll (+63)
  • (added) llvm/test/CodeGen/SystemZ/or-09.ll (+60)
  • (added) llvm/test/CodeGen/SystemZ/or-10.ll (+18)
  • (modified) llvm/test/CodeGen/SystemZ/regalloc-GR128.ll (+1)
  • (added) llvm/test/CodeGen/SystemZ/rot-03.ll (+77)
  • (added) llvm/test/CodeGen/SystemZ/scalar-ctlz-01.ll (+105)
  • (added) llvm/test/CodeGen/SystemZ/scalar-ctlz-02.ll (+78)
  • (removed) llvm/test/CodeGen/SystemZ/scalar-ctlz.ll (-103)
  • (added) llvm/test/CodeGen/SystemZ/scalar-cttz-01.ll (+129)
  • (added) llvm/test/CodeGen/SystemZ/scalar-cttz-02.ll (+42)
  • (modified) llvm/test/CodeGen/SystemZ/shift-12.ll (+18-58)
  • (added) llvm/test/CodeGen/SystemZ/shift-13.ll (+156)
  • (added) llvm/test/CodeGen/SystemZ/shift-14.ll (+156)
  • (added) llvm/test/CodeGen/SystemZ/shift-15.ll (+156)
  • (modified) llvm/test/CodeGen/SystemZ/store-replicated-vals.ll (+2-5)
  • (modified) llvm/test/CodeGen/SystemZ/store_nonbytesized_vecs.ll (+112-50)
  • (modified) llvm/test/CodeGen/SystemZ/tdc-04.ll (+1)
  • (modified) llvm/test/CodeGen/SystemZ/vec-intrinsics-01.ll (+93-65)
  • (modified) llvm/test/CodeGen/SystemZ/vec-intrinsics-02.ll (+13-9)
  • (added) llvm/test/CodeGen/SystemZ/xor-09.ll (+17)
  • (added) llvm/test/CodeGen/SystemZ/xor-10.ll (+18)
diff --git a/clang/include/clang/Basic/BuiltinsSystemZ.def b/clang/include/clang/Basic/BuiltinsSystemZ.def
index 4cfc52ae42168..f0c0ebfa622a4 100644
--- a/clang/include/clang/Basic/BuiltinsSystemZ.def
+++ b/clang/include/clang/Basic/BuiltinsSystemZ.def
@@ -64,14 +64,14 @@ TARGET_BUILTIN(__builtin_s390_vupllh, "V4UiV8Us", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vupllf, "V2ULLiV4Ui", "nc", "vector")
 
 // Vector integer instructions (chapter 22 of the PoP)
-TARGET_BUILTIN(__builtin_s390_vaq, "V16UcV16UcV16Uc", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vacq, "V16UcV16UcV16UcV16Uc", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vaq, "SLLLiSLLLiSLLLi", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vacq, "ULLLiULLLiULLLiULLLi", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vaccb, "V16UcV16UcV16Uc", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vacch, "V8UsV8UsV8Us", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vaccf, "V4UiV4UiV4Ui", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vaccg, "V2ULLiV2ULLiV2ULLi", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vaccq, "V16UcV16UcV16Uc", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vacccq, "V16UcV16UcV16UcV16Uc", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vaccq, "ULLLiULLLiULLLi", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vacccq, "ULLLiULLLiULLLiULLLi", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vavgb, "V16ScV16ScV16Sc", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vavgh, "V8SsV8SsV8Ss", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vavgf, "V4SiV4SiV4Si", "nc", "vector")
@@ -116,11 +116,11 @@ TARGET_BUILTIN(__builtin_s390_verllvg, "V2ULLiV2ULLiV2ULLi", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vgfmb, "V8UsV16UcV16Uc", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vgfmh, "V4UiV8UsV8Us", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vgfmf, "V2ULLiV4UiV4Ui", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vgfmg, "V16UcV2ULLiV2ULLi", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vgfmg, "ULLLiV2ULLiV2ULLi", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vgfmab, "V8UsV16UcV16UcV8Us", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vgfmah, "V4UiV8UsV8UsV4Ui", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vgfmaf, "V2ULLiV4UiV4UiV2ULLi", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vgfmag, "V16UcV2ULLiV2ULLiV16Uc", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vgfmag, "ULLLiV2ULLiV2ULLiULLLi", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vmahb, "V16ScV16ScV16ScV16Sc", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vmahh, "V8SsV8SsV8SsV8Ss", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vmahf, "V4SiV4SiV4SiV4Si", "nc", "vector")
@@ -161,14 +161,14 @@ TARGET_BUILTIN(__builtin_s390_vpopctb, "V16UcV16Uc", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vpopcth, "V8UsV8Us", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vpopctf, "V4UiV4Ui", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vpopctg, "V2ULLiV2ULLi", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vsq, "V16UcV16UcV16Uc", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vsbcbiq, "V16UcV16UcV16UcV16Uc", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vsbiq, "V16UcV16UcV16UcV16Uc", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vsq, "SLLLiSLLLiSLLLi", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vsbcbiq, "ULLLiULLLiULLLiULLLi", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vsbiq, "ULLLiULLLiULLLiULLLi", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vscbib, "V16UcV16UcV16Uc", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vscbih, "V8UsV8UsV8Us", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vscbif, "V4UiV4UiV4Ui", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vscbig, "V2ULLiV2ULLiV2ULLi", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vscbiq, "V16UcV16UcV16Uc", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vscbiq, "ULLLiULLLiULLLi", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vsl, "V16UcV16UcV16Uc", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vslb, "V16UcV16UcV16Uc", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vsldb, "V16UcV16UcV16UcIi", "nc", "vector")
@@ -180,8 +180,8 @@ TARGET_BUILTIN(__builtin_s390_vsumb, "V4UiV16UcV16Uc", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vsumh, "V4UiV8UsV8Us", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vsumgh, "V2ULLiV8UsV8Us", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vsumgf, "V2ULLiV4UiV4Ui", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vsumqf, "V16UcV4UiV4Ui", "nc", "vector")
-TARGET_BUILTIN(__builtin_s390_vsumqg, "V16UcV2ULLiV2ULLi", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vsumqf, "ULLLiV4UiV4Ui", "nc", "vector")
+TARGET_BUILTIN(__builtin_s390_vsumqg, "ULLLiV2ULLiV2ULLi", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vtm, "iV16UcV16Uc", "nc", "vector")
 
 // Vector string instructions (chapter 23 of the PoP)
@@ -256,7 +256,7 @@ TARGET_BUILTIN(__builtin_s390_vftcidb, "V2SLLiV2dIii*", "nc", "vector")
 TARGET_BUILTIN(__builtin_s390_vlrlr, "V16ScUivC*", "", "vector-enhancements-1")
 TARGET_BUILTIN(__builtin_s390_vstrlr, "vV16ScUiv*", "", "vector-enhancements-1")
 TARGET_BUILTIN(__builtin_s390_vbperm, "V2ULLiV16UcV16Uc", "nc", "vector-enhancements-1")
-TARGET_BUILTIN(__builtin_s390_vmslg, "V16UcV2ULLiV2ULLiV16UcIi", "nc", "vector-enhancements-1")
+TARGET_BUILTIN(__builtin_s390_vmslg, "ULLLiV2ULLiV2ULLiULLLiIi", "nc", "vector-enhancements-1")
 TARGET_BUILTIN(__builtin_s390_vfmaxdb, "V2dV2dV2dIi", "nc", "vector-enhancements-1")
 TARGET_BUILTIN(__builtin_s390_vfmindb, "V2dV2dV2dIi", "nc", "vector-enhancements-1")
 TARGET_BUILTIN(__builtin_s390_vfnmadb, "V2dV2dV2dV2d", "nc", "vector-enhancements-1")
diff --git a/clang/lib/Headers/vecintrin.h b/clang/lib/Headers/vecintrin.h
index ecfd6cd1a2f87..1f51e32c0d136 100644
--- a/clang/lib/Headers/vecintrin.h
+++ b/clang/lib/Headers/vecintrin.h
@@ -8359,7 +8359,7 @@ vec_min(__vector double __a, __vector double __b) {
 
 static inline __ATTRS_ai __vector unsigned char
 vec_add_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return __builtin_s390_vaq(__a, __b);
+  return (__vector unsigned char)((__int128)__a + (__int128)__b);
 }
 
 /*-- vec_addc ---------------------------------------------------------------*/
@@ -8388,7 +8388,8 @@ vec_addc(__vector unsigned long long __a, __vector unsigned long long __b) {
 
 static inline __ATTRS_ai __vector unsigned char
 vec_addc_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return __builtin_s390_vaccq(__a, __b);
+  return (__vector unsigned char)
+         __builtin_s390_vaccq((unsigned __int128)__a, (unsigned __int128)__b);
 }
 
 /*-- vec_adde_u128 ----------------------------------------------------------*/
@@ -8396,7 +8397,9 @@ vec_addc_u128(__vector unsigned char __a, __vector unsigned char __b) {
 static inline __ATTRS_ai __vector unsigned char
 vec_adde_u128(__vector unsigned char __a, __vector unsigned char __b,
               __vector unsigned char __c) {
-  return __builtin_s390_vacq(__a, __b, __c);
+  return (__vector unsigned char)
+         __builtin_s390_vacq((unsigned __int128)__a, (unsigned __int128)__b,
+                             (unsigned __int128)__c);
 }
 
 /*-- vec_addec_u128 ---------------------------------------------------------*/
@@ -8404,7 +8407,9 @@ vec_adde_u128(__vector unsigned char __a, __vector unsigned char __b,
 static inline __ATTRS_ai __vector unsigned char
 vec_addec_u128(__vector unsigned char __a, __vector unsigned char __b,
                __vector unsigned char __c) {
-  return __builtin_s390_vacccq(__a, __b, __c);
+  return (__vector unsigned char)
+         __builtin_s390_vacccq((unsigned __int128)__a, (unsigned __int128)__b,
+                               (unsigned __int128)__c);
 }
 
 /*-- vec_avg ----------------------------------------------------------------*/
@@ -8478,7 +8483,7 @@ vec_gfmsum(__vector unsigned int __a, __vector unsigned int __b) {
 static inline __ATTRS_o_ai __vector unsigned char
 vec_gfmsum_128(__vector unsigned long long __a,
                __vector unsigned long long __b) {
-  return __builtin_s390_vgfmg(__a, __b);
+  return (__vector unsigned char)__builtin_s390_vgfmg(__a, __b);
 }
 
 /*-- vec_gfmsum_accum -------------------------------------------------------*/
@@ -8507,7 +8512,8 @@ static inline __ATTRS_o_ai __vector unsigned char
 vec_gfmsum_accum_128(__vector unsigned long long __a,
                      __vector unsigned long long __b,
                      __vector unsigned char __c) {
-  return __builtin_s390_vgfmag(__a, __b, __c);
+  return (__vector unsigned char)
+         __builtin_s390_vgfmag(__a, __b, (unsigned __int128)__c);
 }
 
 /*-- vec_mladd --------------------------------------------------------------*/
@@ -8797,15 +8803,21 @@ vec_mulo(__vector unsigned int __a, __vector unsigned int __b) {
 /*-- vec_msum_u128 ----------------------------------------------------------*/
 
 #if __ARCH__ >= 12
+extern __ATTRS_o __vector unsigned char
+vec_msum_u128(__vector unsigned long long __a, __vector unsigned long long __b,
+              __vector unsigned char __c, int __d)
+  __constant_range(__d, 0, 15);
+
 #define vec_msum_u128(X, Y, Z, W) \
-  ((__vector unsigned char)__builtin_s390_vmslg((X), (Y), (Z), (W)));
+  ((__typeof__((vec_msum_u128)((X), (Y), (Z), (W)))) \
+   __builtin_s390_vmslg((X), (Y), (unsigned __int128)(Z), (W)))
 #endif
 
 /*-- vec_sub_u128 -----------------------------------------------------------*/
 
 static inline __ATTRS_ai __vector unsigned char
 vec_sub_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return __builtin_s390_vsq(__a, __b);
+  return (__vector unsigned char)((__int128)__a - (__int128)__b);
 }
 
 /*-- vec_subc ---------------------------------------------------------------*/
@@ -8834,7 +8846,8 @@ vec_subc(__vector unsigned long long __a, __vector unsigned long long __b) {
 
 static inline __ATTRS_ai __vector unsigned char
 vec_subc_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return __builtin_s390_vscbiq(__a, __b);
+  return (__vector unsigned char)
+         __builtin_s390_vscbiq((unsigned __int128)__a, (unsigned __int128)__b);
 }
 
 /*-- vec_sube_u128 ----------------------------------------------------------*/
@@ -8842,7 +8855,9 @@ vec_subc_u128(__vector unsigned char __a, __vector unsigned char __b) {
 static inline __ATTRS_ai __vector unsigned char
 vec_sube_u128(__vector unsigned char __a, __vector unsigned char __b,
               __vector unsigned char __c) {
-  return __builtin_s390_vsbiq(__a, __b, __c);
+  return (__vector unsigned char)
+         __builtin_s390_vsbiq((unsigned __int128)__a, (unsigned __int128)__b,
+                              (unsigned __int128)__c);
 }
 
 /*-- vec_subec_u128 ---------------------------------------------------------*/
@@ -8850,7 +8865,9 @@ vec_sube_u128(__vector unsigned char __a, __vector unsigned char __b,
 static inline __ATTRS_ai __vector unsigned char
 vec_subec_u128(__vector unsigned char __a, __vector unsigned char __b,
                __vector unsigned char __c) {
-  return __builtin_s390_vsbcbiq(__a, __b, __c);
+  return (__vector unsigned char)
+         __builtin_s390_vsbcbiq((unsigned __int128)__a, (unsigned __int128)__b,
+                                (unsigned __int128)__c);
 }
 
 /*-- vec_sum2 ---------------------------------------------------------------*/
@@ -8869,12 +8886,12 @@ vec_sum2(__vector unsigned int __a, __vector unsigned int __b) {
 
 static inline __ATTRS_o_ai __vector unsigned char
 vec_sum_u128(__vector unsigned int __a, __vector unsigned int __b) {
-  return __builtin_s390_vsumqf(__a, __b);
+  return (__vector unsigned char)__builtin_s390_vsumqf(__a, __b);
 }
 
 static inline __ATTRS_o_ai __vector unsigned char
 vec_sum_u128(__vector unsigned long long __a, __vector unsigned long long __b) {
-  return __builtin_s390_vsumqg(__a, __b);
+  return (__vector unsigned char)__builtin_s390_vsumqg(__a, __b);
 }
 
 /*-- vec_sum4 ---------------------------------------------------------------*/
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-error2.c b/clang/test/CodeGen/SystemZ/builtins-systemz-error2.c
index cf8ee6f7d002b..312a9a156d21e 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-error2.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-error2.c
@@ -1,11 +1,8 @@
 // REQUIRES: systemz-registered-target
 // RUN: %clang_cc1 -triple s390x-ibm-linux -S -emit-llvm %s -verify -o -
 
-typedef __attribute__((vector_size(16))) char v16i8;
-
-v16i8 f0(v16i8 a, v16i8 b) {
-  __builtin_tbegin ((void *)0);         // expected-error {{'__builtin_tbegin' needs target feature transactional-execution}}
-  v16i8 tmp = __builtin_s390_vaq(a, b); // expected-error {{'__builtin_s390_vaq' needs target feature vector}}
-  return tmp;
+__int128 f0(__int128 a, __int128 b) {
+  __builtin_tbegin ((void *)0);    // expected-error {{'__builtin_tbegin' needs target feature transactional-execution}}
+  return __builtin_s390_vaq(a, b); // expected-error {{'__builtin_s390_vaq' needs target feature vector}}
 }
 
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-vector.c b/clang/test/CodeGen/SystemZ/builtins-systemz-vector.c
index 877032a52a0ae..31b8cd11ea79f 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-vector.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-vector.c
@@ -21,6 +21,8 @@ volatile vec_ushort vus;
 volatile vec_uint vui;
 volatile vec_ulong vul;
 volatile vec_double vd;
+volatile signed __int128 si128;
+volatile unsigned __int128 ui128;
 
 volatile unsigned int len;
 volatile unsigned char amt;
@@ -111,14 +113,14 @@ void test_core(void) {
 }
 
 void test_integer(void) {
-  vuc = __builtin_s390_vaq(vuc, vuc);
-  // CHECK: call <16 x i8> @llvm.s390.vaq(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
-  vuc = __builtin_s390_vacq(vuc, vuc, vuc);
-  // CHECK: call <16 x i8> @llvm.s390.vacq(<16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}})
-  vuc = __builtin_s390_vaccq(vuc, vuc);
-  // CHECK: call <16 x i8> @llvm.s390.vaccq(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
-  vuc = __builtin_s390_vacccq(vuc, vuc, vuc);
-  // CHECK: call <16 x i8> @llvm.s390.vacccq(<16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+  si128 = __builtin_s390_vaq(si128, si128);
+  // CHECK: call i128 @llvm.s390.vaq(i128 %{{.*}}, i128 %{{.*}})
+  ui128 = __builtin_s390_vacq(ui128, ui128, ui128);
+  // CHECK: call i128 @llvm.s390.vacq(i128 %{{.*}}, i128 %{{.*}}, i128 %{{.*}})
+  ui128 = __builtin_s390_vaccq(ui128, ui128);
+  // CHECK: call i128 @llvm.s390.vaccq(i128 %{{.*}}, i128 %{{.*}})
+  ui128 = __builtin_s390_vacccq(ui128, ui128, ui128);
+  // CHECK: call i128 @llvm.s390.vacccq(i128 %{{.*}}, i128 %{{.*}}, i128 %{{.*}})
 
   vuc = __builtin_s390_vaccb(vuc, vuc);
   // CHECK: call <16 x i8> @llvm.s390.vaccb(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
@@ -209,8 +211,8 @@ void test_integer(void) {
   // CHECK: call <4 x i32> @llvm.s390.vgfmh(<8 x i16> %{{.*}}, <8 x i16> %{{.*}})
   vul = __builtin_s390_vgfmf(vui, vui);
   // CHECK: call <2 x i64> @llvm.s390.vgfmf(<4 x i32> %{{.*}}, <4 x i32> %{{.*}})
-  vuc = __builtin_s390_vgfmg(vul, vul);
-  // CHECK: call <16 x i8> @llvm.s390.vgfmg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}})
+  ui128 = __builtin_s390_vgfmg(vul, vul);
+  // CHECK: call i128 @llvm.s390.vgfmg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}})
 
   vus = __builtin_s390_vgfmab(vuc, vuc, vus);
   // CHECK: call <8 x i16> @llvm.s390.vgfmab(<16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <8 x i16> %{{.*}})
@@ -218,8 +220,8 @@ void test_integer(void) {
   // CHECK: call <4 x i32> @llvm.s390.vgfmah(<8 x i16> %{{.*}}, <8 x i16> %{{.*}}, <4 x i32> %{{.*}})
   vul = __builtin_s390_vgfmaf(vui, vui, vul);
   // CHECK: call <2 x i64> @llvm.s390.vgfmaf(<4 x i32> %{{.*}}, <4 x i32> %{{.*}}, <2 x i64> %{{.*}})
-  vuc = __builtin_s390_vgfmag(vul, vul, vuc);
-  // CHECK: call <16 x i8> @llvm.s390.vgfmag(<2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <16 x i8> %{{.*}})
+  ui128 = __builtin_s390_vgfmag(vul, vul, ui128);
+  // CHECK: call i128 @llvm.s390.vgfmag(<2 x i64> %{{.*}}, <2 x i64> %{{.*}}, i128 %{{.*}})
 
   vsc = __builtin_s390_vmahb(vsc, vsc, vsc);
   // CHECK: call <16 x i8> @llvm.s390.vmahb(<16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}})
@@ -308,14 +310,14 @@ void test_integer(void) {
   vul = __builtin_s390_vpopctg(vul);
   // CHECK: call <2 x i64> @llvm.ctpop.v2i64(<2 x i64> %{{.*}})
 
-  vuc = __builtin_s390_vsq(vuc, vuc);
-  // CHECK: call <16 x i8> @llvm.s390.vsq(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
-  vuc = __builtin_s390_vsbiq(vuc, vuc, vuc);
-  // CHECK: call <16 x i8> @llvm.s390.vsbiq(<16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}})
-  vuc = __builtin_s390_vscbiq(vuc, vuc);
-  // CHECK: call <16 x i8> @llvm.s390.vscbiq(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
-  vuc = __builtin_s390_vsbcbiq(vuc, vuc, vuc);
-  // CHECK: call <16 x i8> @llvm.s390.vsbcbiq(<16 x i8> %{{.*}}, <16 x i8> %{{.*}}, <16 x i8> %{{.*}})
+  si128 = __builtin_s390_vsq(si128, si128);
+  // CHECK: call i128 @llvm.s390.vsq(i128 %{{.*}}, i128 %{{.*}})
+  ui128 = __builtin_s390_vsbiq(ui128, ui128, ui128);
+  // CHECK: call i128 @llvm.s390.vsbiq(i128 %{{.*}}, i128 %{{.*}}, i128 %{{.*}})
+  ui128 = __builtin_s390_vscbiq(ui128, ui128);
+  // CHECK: call i128 @llvm.s390.vscbiq(i128 %{{.*}}, i128 %{{.*}})
+  ui128 = __builtin_s390_vsbcbiq(ui128, ui128, ui128);
+  // CHECK: call i128 @llvm.s390.vsbcbiq(i128 %{{.*}}, i128 %{{.*}}, i128 %{{.*}})
 
   vuc = __builtin_s390_vscbib(vuc, vuc);
   // CHECK: call <16 x i8> @llvm.s390.vscbib(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
@@ -354,10 +356,10 @@ void test_integer(void) {
   // CHECK: call <2 x i64> @llvm.s390.vsumgh(<8 x i16> %{{.*}}, <8 x i16> %{{.*}})
   vul = __builtin_s390_vsumgf(vui, vui);
   // CHECK: call <2 x i64> @llvm.s390.vsumgf(<4 x i32> %{{.*}}, <4 x i32> %{{.*}})
-  vuc = __builtin_s390_vsumqf(vui, vui);
-  // CHECK: call <16 x i8> @llvm.s390.vsumqf(<4 x i32> %{{.*}}, <4 x i32> %{{.*}})
-  vuc = __builtin_s390_vsumqg(vul, vul);
-  // CHECK: call <16 x i8> @llvm.s390.vsumqg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}})
+  ui128 = __builtin_s390_vsumqf(vui, vui);
+  // CHECK: call i128 @llvm.s390.vsumqf(<4 x i32> %{{.*}}, <4 x i32> %{{.*}})
+  ui128 = __builtin_s390_vsumqg(vul, vul);
+  // CHECK: call i128 @llvm.s390.vsumqg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}})
 
   len = __builtin_s390_vtm(vuc, vuc);
   // CHECK: call i32 @llvm.s390.vtm(<16 x i8> %{{.*}}, <16 x i8> %{{.*}})
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-error.c b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-error.c
index cd27ac79e15d4..af3b4f191879e 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-error.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2-error.c
@@ -23,14 +23,15 @@ volatile vec_uint vui;
 volatile vec_ulong vul;
 volatile vec_double vd;
 volatile vec_float vf;
+volatile unsigned __int128 ui128;
 
 volatile unsigned int len;
 int cc;
 
 void test_integer(void) {
-  __builtin_s390_vmslg(vul, vul, vuc, -1);   // expected-error-re {{argument value {{.*}} is outside the valid range}}
-  __builtin_s390_vmslg(vul, vul, vuc, 16);   // expected-error-re {{argument value {{.*}} is outside the valid range}}
-  __builtin_s390_vmslg(vul, vul, vuc, len);  // expected-error {{must be a constant integer}}
+  __builtin_s390_vmslg(vul, vul, ui128, -1); // expected-error-re {{argument value {{.*}} is outside the valid range}}
+  __builtin_s390_vmslg(vul, vul, ui128, 16); // expected-error-re {{argument value {{.*}} is outside the valid range}}
+  __builtin_s390_vmslg(vul, vul, ui128, len);// expected-error {{must be a constant integer}}
 }
 
 void test_float(void) {
diff --git a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2.c b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2.c
index 5e287e28ed201..3761f252d724b 100644
--- a/clang/test/CodeGen/SystemZ/builtins-systemz-vector2.c
+++ b/clang/test/CodeGen/SystemZ/builtins-systemz-vector2.c
@@ -23,6 +23,7 @@ volatile vec_uint vui;
 volatile vec_ulong vul;
 volatile vec_double vd;
 volatile vec_float vf;
+volatile unsigned __int128 ui128;
 
 volatile unsigned int len;
 const void * volatile cptr;
@@ -41,10 +42,10 @@ void test_core(void) {
 }
 
 void test_integer(void) {
-  vuc = __builtin_s390_vmslg(vul, vul, vuc, 0);
-  // CHECK: call <16 x i8> @llvm.s390.vmslg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <16 x i8> %{{.*}}, i32 0)
-  vuc = __builtin_s390_vmslg(vul, vul, vuc, 15);
-  // CHECK: call <16 x i8> @llvm.s390.vmslg(<2 x i64> %{{.*}}, <2 x i64> %{{.*}}, <16 x i8> %{{.*}}, i32 15)
+  ui128 = __builtin...
[truncated]

github-actions bot commented Dec 6, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 4070dffd34e99915b005c655086d92e42c004d25 d661b381ae029e1dc374aa80612399f24e5a8823 -- clang/lib/Headers/vecintrin.h clang/test/CodeGen/SystemZ/builtins-systemz-error2.c clang/test/CodeGen/SystemZ/builtins-systemz-vector.c clang/test/CodeGen/SystemZ/builtins-systemz-vector2-error.c clang/test/CodeGen/SystemZ/builtins-systemz-vector2.c clang/test/CodeGen/SystemZ/builtins-systemz-zvector.c clang/test/CodeGen/SystemZ/builtins-systemz-zvector2-error.c clang/test/CodeGen/SystemZ/builtins-systemz-zvector2.c llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.cpp llvm/lib/Target/SystemZ/SystemZISelLowering.h
View the diff from clang-format here.
diff --git a/clang/lib/Headers/vecintrin.h b/clang/lib/Headers/vecintrin.h
index 1f51e32c0d..886770b262 100644
--- a/clang/lib/Headers/vecintrin.h
+++ b/clang/lib/Headers/vecintrin.h
@@ -8388,8 +8388,8 @@ vec_addc(__vector unsigned long long __a, __vector unsigned long long __b) {
 
 static inline __ATTRS_ai __vector unsigned char
 vec_addc_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return (__vector unsigned char)
-         __builtin_s390_vaccq((unsigned __int128)__a, (unsigned __int128)__b);
+  return (__vector unsigned char)__builtin_s390_vaccq((unsigned __int128)__a,
+                                                      (unsigned __int128)__b);
 }
 
 /*-- vec_adde_u128 ----------------------------------------------------------*/
@@ -8397,9 +8397,8 @@ vec_addc_u128(__vector unsigned char __a, __vector unsigned char __b) {
 static inline __ATTRS_ai __vector unsigned char
 vec_adde_u128(__vector unsigned char __a, __vector unsigned char __b,
               __vector unsigned char __c) {
-  return (__vector unsigned char)
-         __builtin_s390_vacq((unsigned __int128)__a, (unsigned __int128)__b,
-                             (unsigned __int128)__c);
+  return (__vector unsigned char)__builtin_s390_vacq(
+      (unsigned __int128)__a, (unsigned __int128)__b, (unsigned __int128)__c);
 }
 
 /*-- vec_addec_u128 ---------------------------------------------------------*/
@@ -8407,9 +8406,8 @@ vec_adde_u128(__vector unsigned char __a, __vector unsigned char __b,
 static inline __ATTRS_ai __vector unsigned char
 vec_addec_u128(__vector unsigned char __a, __vector unsigned char __b,
                __vector unsigned char __c) {
-  return (__vector unsigned char)
-         __builtin_s390_vacccq((unsigned __int128)__a, (unsigned __int128)__b,
-                               (unsigned __int128)__c);
+  return (__vector unsigned char)__builtin_s390_vacccq(
+      (unsigned __int128)__a, (unsigned __int128)__b, (unsigned __int128)__c);
 }
 
 /*-- vec_avg ----------------------------------------------------------------*/
@@ -8512,8 +8510,8 @@ static inline __ATTRS_o_ai __vector unsigned char
 vec_gfmsum_accum_128(__vector unsigned long long __a,
                      __vector unsigned long long __b,
                      __vector unsigned char __c) {
-  return (__vector unsigned char)
-         __builtin_s390_vgfmag(__a, __b, (unsigned __int128)__c);
+  return (__vector unsigned char)__builtin_s390_vgfmag(__a, __b,
+                                                       (unsigned __int128)__c);
 }
 
 /*-- vec_mladd --------------------------------------------------------------*/
@@ -8805,12 +8803,11 @@ vec_mulo(__vector unsigned int __a, __vector unsigned int __b) {
 #if __ARCH__ >= 12
 extern __ATTRS_o __vector unsigned char
 vec_msum_u128(__vector unsigned long long __a, __vector unsigned long long __b,
-              __vector unsigned char __c, int __d)
-  __constant_range(__d, 0, 15);
+              __vector unsigned char __c, int __d) __constant_range(__d, 0, 15);
 
-#define vec_msum_u128(X, Y, Z, W) \
-  ((__typeof__((vec_msum_u128)((X), (Y), (Z), (W)))) \
-   __builtin_s390_vmslg((X), (Y), (unsigned __int128)(Z), (W)))
+#define vec_msum_u128(X, Y, Z, W)                                              \
+  ((__typeof__((vec_msum_u128)((X), (Y), (Z), (W))))__builtin_s390_vmslg(      \
+      (X), (Y), (unsigned __int128)(Z), (W)))
 #endif
 
 /*-- vec_sub_u128 -----------------------------------------------------------*/
@@ -8846,8 +8843,8 @@ vec_subc(__vector unsigned long long __a, __vector unsigned long long __b) {
 
 static inline __ATTRS_ai __vector unsigned char
 vec_subc_u128(__vector unsigned char __a, __vector unsigned char __b) {
-  return (__vector unsigned char)
-         __builtin_s390_vscbiq((unsigned __int128)__a, (unsigned __int128)__b);
+  return (__vector unsigned char)__builtin_s390_vscbiq((unsigned __int128)__a,
+                                                       (unsigned __int128)__b);
 }
 
 /*-- vec_sube_u128 ----------------------------------------------------------*/
@@ -8855,9 +8852,8 @@ vec_subc_u128(__vector unsigned char __a, __vector unsigned char __b) {
 static inline __ATTRS_ai __vector unsigned char
 vec_sube_u128(__vector unsigned char __a, __vector unsigned char __b,
               __vector unsigned char __c) {
-  return (__vector unsigned char)
-         __builtin_s390_vsbiq((unsigned __int128)__a, (unsigned __int128)__b,
-                              (unsigned __int128)__c);
+  return (__vector unsigned char)__builtin_s390_vsbiq(
+      (unsigned __int128)__a, (unsigned __int128)__b, (unsigned __int128)__c);
 }
 
 /*-- vec_subec_u128 ---------------------------------------------------------*/
@@ -8865,9 +8861,8 @@ vec_sube_u128(__vector unsigned char __a, __vector unsigned char __b,
 static inline __ATTRS_ai __vector unsigned char
 vec_subec_u128(__vector unsigned char __a, __vector unsigned char __b,
                __vector unsigned char __c) {
-  return (__vector unsigned char)
-         __builtin_s390_vsbcbiq((unsigned __int128)__a, (unsigned __int128)__b,
-                                (unsigned __int128)__c);
+  return (__vector unsigned char)__builtin_s390_vsbcbiq(
+      (unsigned __int128)__a, (unsigned __int128)__b, (unsigned __int128)__c);
 }
 
 /*-- vec_sum2 ---------------------------------------------------------------*/
diff --git a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
index e5e1e91916..ced7fdeffa 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
@@ -1187,9 +1187,10 @@ void SystemZDAGToDAGISel::loadVectorConstant(
   SelectCode(Op.getNode());
 }
 
-SDNode *SystemZDAGToDAGISel::loadPoolVectorConstant(APInt Val, EVT VT, SDLoc DL) {
+SDNode *SystemZDAGToDAGISel::loadPoolVectorConstant(APInt Val, EVT VT,
+                                                    SDLoc DL) {
   SDNode *ResNode;
-  assert (VT.getSizeInBits() == 128);
+  assert(VT.getSizeInBits() == 128);
 
   SDValue CP = CurDAG->getTargetConstantPool(
       ConstantInt::get(Type::getInt128Ty(*CurDAG->getContext()), Val),
@@ -1197,17 +1198,15 @@ SDNode *SystemZDAGToDAGISel::loadPoolVectorConstant(APInt Val, EVT VT, SDLoc DL)
 
   EVT PtrVT = CP.getValueType();
   SDValue Ops[] = {
-    SDValue(CurDAG->getMachineNode(SystemZ::LARL, DL, PtrVT, CP), 0),
-    CurDAG->getTargetConstant(0, DL, PtrVT),
-    CurDAG->getRegister(0, PtrVT),
-    CurDAG->getEntryNode()
-  };
+      SDValue(CurDAG->getMachineNode(SystemZ::LARL, DL, PtrVT, CP), 0),
+      CurDAG->getTargetConstant(0, DL, PtrVT), CurDAG->getRegister(0, PtrVT),
+      CurDAG->getEntryNode()};
   ResNode = CurDAG->getMachineNode(SystemZ::VL, DL, VT, MVT::Other, Ops);
 
   // Annotate ResNode with memory operand information so that MachineInstr
   // queries work properly. This e.g. gives the register allocation the
   // required information for rematerialization.
-  MachineFunction& MF = CurDAG->getMachineFunction();
+  MachineFunction &MF = CurDAG->getMachineFunction();
   MachineMemOperand *MemOp =
       MF.getMachineMemOperand(MachinePointerInfo::getConstantPool(MF),
                               MachineMemOperand::MOLoad, 16, Align(8));
@@ -1621,11 +1620,11 @@ void SystemZDAGToDAGISel::Select(SDNode *Node) {
       SDValue Src = Node->getOperand(0);
       Src = CurDAG->getNode(ISD::BITCAST, DL, MVT::v16i8, Src);
 
-      uint64_t Bytes[2] = { 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL };
+      uint64_t Bytes[2] = {0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL};
       SDNode *Mask = loadPoolVectorConstant(APInt(128, Bytes), MVT::v16i8, DL);
-      SDValue Ops[] = { Src, Src, SDValue(Mask, 0) };
-      SDValue Res = SDValue(CurDAG->getMachineNode(SystemZ::VPERM, DL,
-                                                   MVT::v16i8, Ops), 0);
+      SDValue Ops[] = {Src, Src, SDValue(Mask, 0)};
+      SDValue Res = SDValue(
+          CurDAG->getMachineNode(SystemZ::VPERM, DL, MVT::v16i8, Ops), 0);
 
       Res = CurDAG->getNode(ISD::BITCAST, DL, MVT::i128, Res);
       SDNode *ResNode = Res.getNode();
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
index d82910a0b2..19c49952d8 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -242,21 +242,21 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
   // Handle i128 if legal.
   if (isTypeLegal(MVT::i128)) {
     // No special instructions for these.
-    setOperationAction(ISD::SDIVREM,   MVT::i128, Expand);
-    setOperationAction(ISD::UDIVREM,   MVT::i128, Expand);
+    setOperationAction(ISD::SDIVREM, MVT::i128, Expand);
+    setOperationAction(ISD::UDIVREM, MVT::i128, Expand);
     setOperationAction(ISD::SMUL_LOHI, MVT::i128, Expand);
     setOperationAction(ISD::UMUL_LOHI, MVT::i128, Expand);
-    setOperationAction(ISD::ROTR,      MVT::i128, Expand);
-    setOperationAction(ISD::ROTL,      MVT::i128, Expand);
-    setOperationAction(ISD::MUL,       MVT::i128, Expand);
-    setOperationAction(ISD::MULHS,     MVT::i128, Expand);
-    setOperationAction(ISD::MULHU,     MVT::i128, Expand);
-    setOperationAction(ISD::SDIV,      MVT::i128, Expand);
-    setOperationAction(ISD::UDIV,      MVT::i128, Expand);
-    setOperationAction(ISD::SREM,      MVT::i128, Expand);
-    setOperationAction(ISD::UREM,      MVT::i128, Expand);
-    setOperationAction(ISD::CTLZ,      MVT::i128, Expand);
-    setOperationAction(ISD::CTTZ,      MVT::i128, Expand);
+    setOperationAction(ISD::ROTR, MVT::i128, Expand);
+    setOperationAction(ISD::ROTL, MVT::i128, Expand);
+    setOperationAction(ISD::MUL, MVT::i128, Expand);
+    setOperationAction(ISD::MULHS, MVT::i128, Expand);
+    setOperationAction(ISD::MULHU, MVT::i128, Expand);
+    setOperationAction(ISD::SDIV, MVT::i128, Expand);
+    setOperationAction(ISD::UDIV, MVT::i128, Expand);
+    setOperationAction(ISD::SREM, MVT::i128, Expand);
+    setOperationAction(ISD::UREM, MVT::i128, Expand);
+    setOperationAction(ISD::CTLZ, MVT::i128, Expand);
+    setOperationAction(ISD::CTTZ, MVT::i128, Expand);
 
     // Support addition/subtraction with carry.
     setOperationAction(ISD::UADDO, MVT::i128, Custom);
@@ -2848,9 +2848,8 @@ static void adjustForTestUnderMask(SelectionDAG &DAG, const SDLoc &DL,
   // Use VECTOR TEST UNDER MASK for i128 operations.
   if (C.Op0.getValueType() == MVT::i128) {
     // We can use VTM for EQ/NE comparisons of x & y against 0.
-    if (C.Op0.getOpcode() == ISD::AND &&
-        (C.CCMask == SystemZ::CCMASK_CMP_EQ ||
-         C.CCMask == SystemZ::CCMASK_CMP_NE)) {
+    if (C.Op0.getOpcode() == ISD::AND && (C.CCMask == SystemZ::CCMASK_CMP_EQ ||
+                                          C.CCMask == SystemZ::CCMASK_CMP_NE)) {
       auto *Mask = dyn_cast<ConstantSDNode>(C.Op1);
       if (Mask && Mask->getAPIntValue() == 0) {
         C.Opcode = SystemZISD::VTM;
@@ -2953,8 +2952,7 @@ static void adjustForTestUnderMask(SelectionDAG &DAG, const SDLoc &DL,
 }
 
 // Implement i128 comparison in vector registers.
-static void adjustICmp128(SelectionDAG &DAG, const SDLoc &DL,
-                          Comparison &C) {
+static void adjustICmp128(SelectionDAG &DAG, const SDLoc &DL, Comparison &C) {
   if (C.Opcode != SystemZISD::ICMP)
     return;
   if (C.Op0.getValueType() != MVT::i128)
@@ -2977,11 +2975,19 @@ static void adjustICmp128(SelectionDAG &DAG, const SDLoc &DL,
   // Normalize other comparisons to GT.
   bool Swap = false, Invert = false;
   switch (C.CCMask) {
-    case SystemZ::CCMASK_CMP_GT: break;
-    case SystemZ::CCMASK_CMP_LT: Swap = true; break;
-    case SystemZ::CCMASK_CMP_LE: Invert = true; break;
-    case SystemZ::CCMASK_CMP_GE: Swap = Invert = true; break;
-    default: llvm_unreachable("Invalid integer condition!");
+  case SystemZ::CCMASK_CMP_GT:
+    break;
+  case SystemZ::CCMASK_CMP_LT:
+    Swap = true;
+    break;
+  case SystemZ::CCMASK_CMP_LE:
+    Invert = true;
+    break;
+  case SystemZ::CCMASK_CMP_GE:
+    Swap = Invert = true;
+    break;
+  default:
+    llvm_unreachable("Invalid integer condition!");
   }
   if (Swap)
     std::swap(C.Op0, C.Op1);
@@ -3062,12 +3068,14 @@ static Comparison getCmp(SelectionDAG &DAG, SDValue CmpOp0, SDValue CmpOp1,
         CmpOp0.getResNo() == 0 && CmpOp0->hasNUsesOfValue(1, 0) &&
         isIntrinsicWithCCAndChain(CmpOp0, Opcode, CCValid))
       return getIntrinsicCmp(DAG, Opcode, CmpOp0, CCValid,
-                             cast<ConstantSDNode>(CmpOp1)->getZExtValue(), Cond);
+                             cast<ConstantSDNode>(CmpOp1)->getZExtValue(),
+                             Cond);
     if (CmpOp0.getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
         CmpOp0.getResNo() == CmpOp0->getNumValues() - 1 &&
         isIntrinsicWithCC(CmpOp0, Opcode, CCValid))
       return getIntrinsicCmp(DAG, Opcode, CmpOp0, CCValid,
-                             cast<ConstantSDNode>(CmpOp1)->getZExtValue(), Cond);
+                             cast<ConstantSDNode>(CmpOp1)->getZExtValue(),
+                             Cond);
   }
   Comparison C(CmpOp0, CmpOp1, Chain);
   C.CCMask = CCMaskForCondCode(Cond);
@@ -3486,8 +3494,7 @@ SDValue SystemZTargetLowering::lowerSELECT_CC(SDValue Op,
   // Check for absolute and negative-absolute selections, including those
   // where the comparison value is sign-extended (for LPGFR and LNGFR).
   // This check supplements the one in DAGCombiner.
-  if (C.Opcode == SystemZISD::ICMP &&
-      C.CCMask != SystemZ::CCMASK_CMP_EQ &&
+  if (C.Opcode == SystemZISD::ICMP && C.CCMask != SystemZ::CCMASK_CMP_EQ &&
       C.CCMask != SystemZ::CCMASK_CMP_NE &&
       C.Op1.getOpcode() == ISD::Constant &&
       cast<ConstantSDNode>(C.Op1)->getValueSizeInBits(0) <= 64 &&
@@ -4279,7 +4286,8 @@ SDValue SystemZTargetLowering::lowerXALUO(SDValue Op,
     unsigned BaseOp = 0;
     unsigned FlagOp = 0;
     switch (Op.getOpcode()) {
-    default: llvm_unreachable("Unknown instruction!");
+    default:
+      llvm_unreachable("Unknown instruction!");
     case ISD::UADDO:
       BaseOp = ISD::ADD;
       FlagOp = SystemZISD::VACC;
@@ -4367,7 +4375,8 @@ SDValue SystemZTargetLowering::lowerUADDSUBO_CARRY(SDValue Op,
     unsigned BaseOp = 0;
     unsigned FlagOp = 0;
     switch (Op.getOpcode()) {
-    default: llvm_unreachable("Unknown instruction!");
+    default:
+      llvm_unreachable("Unknown instruction!");
     case ISD::UADDO_CARRY:
       BaseOp = SystemZISD::VAC;
       FlagOp = SystemZISD::VACCC;
@@ -4898,8 +4907,8 @@ SystemZTargetLowering::lowerINTRINSIC_WO_CHAIN(SDValue Op,
                        Op.getOperand(1), Op.getOperand(2));
 
   case Intrinsic::s390_vaq:
-    return DAG.getNode(ISD::ADD, SDLoc(Op), Op.getValueType(),
-                       Op.getOperand(1), Op.getOperand(2));
+    return DAG.getNode(ISD::ADD, SDLoc(Op), Op.getValueType(), Op.getOperand(1),
+                       Op.getOperand(2));
   case Intrinsic::s390_vaccb:
   case Intrinsic::s390_vacch:
   case Intrinsic::s390_vaccf:
@@ -4915,8 +4924,8 @@ SystemZTargetLowering::lowerINTRINSIC_WO_CHAIN(SDValue Op,
                        Op.getOperand(1), Op.getOperand(2), Op.getOperand(3));
 
   case Intrinsic::s390_vsq:
-    return DAG.getNode(ISD::SUB, SDLoc(Op), Op.getValueType(),
-                       Op.getOperand(1), Op.getOperand(2));
+    return DAG.getNode(ISD::SUB, SDLoc(Op), Op.getValueType(), Op.getOperand(1),
+                       Op.getOperand(2));
   case Intrinsic::s390_vscbib:
   case Intrinsic::s390_vscbih:
   case Intrinsic::s390_vscbif:
@@ -6774,22 +6783,21 @@ SDValue SystemZTargetLowering::combineLOAD(
     SmallVector<SDValue, 2> ArgChains;
     for (auto UserAndIndex : Users) {
       SDNode *User = UserAndIndex.first;
-      unsigned Offset = User->getValueType(0).getStoreSize() * UserAndIndex.second;
-      SDValue Ptr =
-        DAG.getMemBasePlusOffset(LD->getBasePtr(), TypeSize::getFixed(Offset), DL);
-      SDValue EltLoad =
-        DAG.getLoad(User->getValueType(0), DL, LD->getChain(), Ptr,
-                    LD->getPointerInfo().getWithOffset(Offset),
-                    LD->getOriginalAlign(), LD->getMemOperand()->getFlags(),
-                    LD->getAAInfo());
+      unsigned Offset =
+          User->getValueType(0).getStoreSize() * UserAndIndex.second;
+      SDValue Ptr = DAG.getMemBasePlusOffset(LD->getBasePtr(),
+                                             TypeSize::getFixed(Offset), DL);
+      SDValue EltLoad = DAG.getLoad(
+          User->getValueType(0), DL, LD->getChain(), Ptr,
+          LD->getPointerInfo().getWithOffset(Offset), LD->getOriginalAlign(),
+          LD->getMemOperand()->getFlags(), LD->getAAInfo());
 
       DCI.CombineTo(User, EltLoad, true);
       ArgChains.push_back(EltLoad.getValue(1));
     }
 
     // Collect all chains via TokenFactor.
-    SDValue Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other,
-                                ArgChains);
+    SDValue Chain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, ArgChains);
     DAG.ReplaceAllUsesOfValueWith(SDValue(N, 1), Chain);
     DCI.AddToWorklist(Chain.getNode());
     return SDValue(N, 0);
@@ -6834,7 +6842,8 @@ bool SystemZTargetLowering::canLoadStoreByteSwapped(EVT VT) const {
   if (VT == MVT::i16 || VT == MVT::i32 || VT == MVT::i64)
     return true;
   if (Subtarget.hasVectorEnhancements2())
-    if (VT == MVT::v8i16 || VT == MVT::v4i32 || VT == MVT::v2i64 || VT == MVT::i128)
+    if (VT == MVT::v8i16 || VT == MVT::v4i32 || VT == MVT::v2i64 ||
+        VT == MVT::i128)
       return true;
   return false;
 }
@@ -6963,16 +6972,14 @@ SDValue SystemZTargetLowering::combineSTORE(
     if (isMovedFromParts(Op1, LoPart, HiPart)) {
       SDLoc DL(SN);
       SDValue Chain0 =
-        DAG.getStore(SN->getChain(), DL, HiPart, SN->getBasePtr(),
-                     SN->getPointerInfo(), SN->getOriginalAlign(),
-                     SN->getMemOperand()->getFlags(), SN->getAAInfo());
-      SDValue Chain1 =
-        DAG.getStore(SN->getChain(), DL, LoPart,
-                     DAG.getObjectPtrOffset(DL, SN->getBasePtr(),
-                                                TypeSize::getFixed(8)),
-                     SN->getPointerInfo().getWithOffset(8),
-                     SN->getOriginalAlign(),
-                     SN->getMemOperand()->getFlags(), SN->getAAInfo());
+          DAG.getStore(SN->getChain(), DL, HiPart, SN->getBasePtr(),
+                       SN->getPointerInfo(), SN->getOriginalAlign(),
+                       SN->getMemOperand()->getFlags(), SN->getAAInfo());
+      SDValue Chain1 = DAG.getStore(
+          SN->getChain(), DL, LoPart,
+          DAG.getObjectPtrOffset(DL, SN->getBasePtr(), TypeSize::getFixed(8)),
+          SN->getPointerInfo().getWithOffset(8), SN->getOriginalAlign(),
+          SN->getMemOperand()->getFlags(), SN->getAAInfo());
 
       return DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chain0, Chain1);
     }
@@ -8314,10 +8321,9 @@ MachineBasicBlock *SystemZTargetLowering::emitCondStore(MachineInstr &MI,
 }
 
 // Implement EmitInstrWithCustomInserter for pseudo [SU]Cmp128Hi instruction MI.
-MachineBasicBlock *
-SystemZTargetLowering::emitICmp128Hi(MachineInstr &MI,
-                                     MachineBasicBlock *MBB,
-                                     bool Unsigned) const {
+MachineBasicBlock *SystemZTargetLowering::emitICmp128Hi(MachineInstr &MI,
+                                                        MachineBasicBlock *MBB,
+                                                        bool Unsigned) const {
   MachineFunction &MF = *MBB->getParent();
   const SystemZInstrInfo *TII = Subtarget.getInstrInfo();
   MachineRegisterInfo &MRI = MF.getRegInfo();
@@ -8328,7 +8334,7 @@ SystemZTargetLowering::emitICmp128Hi(MachineInstr &MI,
   Register Op1 = MI.getOperand(1).getReg();
 
   MachineBasicBlock *StartMBB = MBB;
-  MachineBasicBlock *JoinMBB  = SystemZ::splitBlockAfter(MI, MBB);
+  MachineBasicBlock *JoinMBB = SystemZ::splitBlockAfter(MI, MBB);
   MachineBasicBlock *HiEqMBB = SystemZ::emitBlockAfter(StartMBB);
 
   //  StartMBB:
@@ -8345,11 +8351,12 @@ SystemZTargetLowering::emitICmp128Hi(MachineInstr &MI,
   //   JNE JoinMBB
   //   # fallthrough to HiEqMBB
   MBB = StartMBB;
-  int HiOpcode = Unsigned? SystemZ::VECLG : SystemZ::VECG;
-  BuildMI(MBB, MI.getDebugLoc(), TII->get(HiOpcode))
-    .addReg(Op1).addReg(Op0);
+  int HiOpcode = Unsigned ? SystemZ::VECLG : SystemZ::VECG;
+  BuildMI(MBB, MI.getDebugLoc(), TII->get(HiOpcode)).addReg(Op1).addReg(Op0);
   BuildMI(MBB, MI.getDebugLoc(), TII->get(SystemZ::BRC))
-    .addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_NE).addMBB(JoinMBB);
+      .addImm(SystemZ::CCMASK_ICMP)
+      .addImm(SystemZ::CCMASK_CMP_NE)
+      .addMBB(JoinMBB);
   MBB->addSuccessor(JoinMBB);
   MBB->addSuccessor(HiEqMBB);
 
@@ -8366,7 +8373,8 @@ SystemZTargetLowering::emitICmp128Hi(MachineInstr &MI,
   MBB = HiEqMBB;
   Register Temp = MRI.createVirtualRegister(&SystemZ::VR128BitRegClass);
   BuildMI(MBB, MI.getDebugLoc(), TII->get(SystemZ::VCHLGS), Temp)
-    .addReg(Op0).addReg(Op1);
+      .addReg(Op0)
+      .addReg(Op1);
   MBB->addSuccessor(JoinMBB);
 
   // Mark CC as live-in to JoinMBB.
diff --git a/llvm/lib/Target/SystemZ/SystemZISelLowering.h b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
index 3e614a1186..c473895a54 100644
--- a/llvm/lib/Target/SystemZ/SystemZISelLowering.h
+++ b/llvm/lib/Target/SystemZ/SystemZISelLowering.h
@@ -98,7 +98,12 @@ enum NodeType : unsigned {
   // Add/subtract with overflow/carry.  These have the same operands as
   // the corresponding standard operations, except with the carry flag
   // replaced by a condition code value.
-  SADDO, SSUBO, UADDO, USUBO, ADDCARRY, SUBCARRY,
+  SADDO,
+  SSUBO,
+  UADDO,
+  USUBO,
+  ADDCARRY,
+  SUBCARRY,
 
   // Set the condition code from a boolean value in operand 0.
   // Operand 1 is a mask of all condition-code values that may result of this
@@ -228,11 +233,14 @@ enum NodeType : unsigned {
   VSUM,
 
   // Compute carry/borrow indication for add/subtract.
-  VACC, VSCBI,
+  VACC,
+  VSCBI,
   // Add/subtract with carry/borrow.
-  VAC, VSBI,
+  VAC,
+  VSBI,
   // Compute carry/borrow indication for add/subtract with carry/borrow.
-  VACCC, VSBCBI,
+  VACCC,
+  VSBCBI,
 
   // Compare integer vector operands 0 and 1 to produce the usual 0/-1
   // vector result.  VICMPE is for equality, VICMPH for "signed greater than"
@@ -373,10 +381,12 @@ enum NodeType : unsigned {
   ATOMIC_CMP_SWAP_128,
 
   // Byte swapping load/store.  Same operands as regular load/store.
-  LRV, STRV,
+  LRV,
+  STRV,
 
   // Element swapping load/store.  Same operands as regular load/store.
-  VLER, VSTER,
+  VLER,
+  VSTER,
 
   // Prefetch from the second operand using the 4-bit control code in
   // the first operand.  The code is 1 for a load prefetch and 2 for

@uweigand uweigand added backend:SystemZ and removed clang Clang issues not falling into any other category backend:X86 clang:frontend Language frontend issues, e.g. anything involving "Sema" llvm:SelectionDAG SelectionDAGISel as well llvm:ir labels Dec 6, 2023
uweigand (Member, Author) commented Dec 6, 2023

@JonPsson1 - please have a look at the effects of i128 support in particular on atomics
@redstar - can you check the impact on the z/OS ABI? We may need to handle legal i128 there too, but there doesn't appear to be any in-tree test case for passing i128 on z/OS.

Any other comments welcome as well!

JonPsson1 (Contributor) commented:
I have looked through the changes and made some comments inline.

I built this with expensive checks enabled and all checks passed; SPEC also built successfully.

Commenting:

  @@ -293,7 +293,7 @@ SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
   setOperationAction(ISD::ATOMIC_LOAD_UMIN, MVT::i32, Custom);
   setOperationAction(ISD::ATOMIC_LOAD_UMAX, MVT::i32, Custom);
 
-  // Even though i128 is not a legal type, we still need to custom lower
+  // Even though i128 is not a legal type, we still need to custom lower   **// Update comment**

@@ -2144,7 +2145,7 @@ CanLowerReturn(CallingConv::ID CallConv,
     VerifyVectorTypes(Outs);
 
   // Special case that we cannot easily detect in RetCC_SystemZ since
-  // i128 is not a legal type.
+  // i128 is not a legal type.   **// Update comment**


+++ b/llvm/lib/Target/SystemZ/SystemZRegisterInfo.td
@@ -124,7 +124,7 @@ defm GRX32 : SystemZRegClass<"GRX32", [i32], 32,
                                R12L,R12H,R13L,R13H,R14L,R14H,R15L,R15H)
                              ]>;
 
-// The architecture doesn't really have any i128 support, so model the
+// The architecture doesn't really have any i128 support, so model the  **// Update comment**

I happened to notice some cases with room for improvement:

; Scalar load + insertion + replication could be just a vlrepb.
define i128 @fun0(i128 %a, i128 %sh) {
; CHECK-LABEL: fun0:
; CHECK:       # %bb.0:
; CHECK-NEXT:    l %r0, 12(%r4)       //
; CHECK-NEXT:    vlvgp %v1, %r0, %r0  //
; CHECK-NEXT:    vl %v0, 0(%r3), 3
; CHECK-NEXT:    vrepb %v1, %v1, 15   // ===> vlrepb %v1, 12(%r4)  ?
; CHECK-NEXT:    vslb %v0, %v0, %v1
; CHECK-NEXT:    vsl %v0, %v0, %v1
; CHECK-NEXT:    vst %v0, 0(%r2), 3
; CHECK-NEXT:    br %r14
  %res = shl i128 %a, %sh
  ret i128 %res
}

; %v1 is the shift amount in a VR already.
define i128 @fun1(i128 %a, i128 %sh, i128 %t) {
; CHECK-LABEL: fun1:
; CHECK:       # %bb.0:
; CHECK-NEXT:    vl %v1, 0(%r5), 3
; CHECK-NEXT:    vl %v2, 0(%r4), 3
; CHECK-NEXT:    vaq %v1, %v2, %v1
; CHECK-NEXT:    vlgvf %r0, %v1, 3     //
; CHECK-NEXT:    vlvgp %v1, %r0, %r0   //
; CHECK-NEXT:    vl %v0, 0(%r3), 3
; CHECK-NEXT:    vrepb %v1, %v1, 15    // ===> vrepb %v1, %v1, 15
; CHECK-NEXT:    vslb %v0, %v0, %v1
; CHECK-NEXT:    vsl %v0, %v0, %v1
; CHECK-NEXT:    vst %v0, 0(%r2), 3
; CHECK-NEXT:    br %r14
  %s = add i128 %sh, %t
  %res = shl i128 %a, %s
  ret i128 %res
}

As a side question: I forgot why we can get CCMask '5' here; it seems it should be CCMASK_CMP_NE ('6') if we reverse the LOC operation?

 VTM killed %5:vr128bit, killed %4:vr128bit, implicit-def $cc
  %6:gr64bit = LOCGR killed %3:gr64bit(tied-def 0), killed %2:gr64bit, 13, 8, implicit killed $cc

# *** IR Dump After Two-Address instruction pass (twoaddressinstruction) ***:
(SystemZInstrInfo::commuteInstructionImpl)

  VTM killed %5:vr128bit, killed %4:vr128bit, implicit-def $cc
  %6:gr64bit = COPY killed %2:gr64bit
  %6:gr64bit = LOCGR %6:gr64bit(tied-def 0), killed %3:gr64bit, 13, 5, implicit killed $cc

uweigand (Member, Author) commented:

I have looked through the changes and made some comments inline.

Thanks for the review!

Commenting:

Fixed, thanks!

I happened to notice some cases with room for improvement:

Good catch. I've not addressed these right now, this can be done as a follow-up. (The memory case is a bit tedious due to TableGen pattern limitations ...)

As a side question: I forgot why we can get CCMask '5' here: it seems it should be CCMASK_CMP_NE ('6'), if we reverse the LOC operation..?

No, 5 is correct here. Reversing XORs the mask with the set of valid bits, so we have 13 ^ 8 == 5.

Looking at the VTM instruction, we have the following valid condition codes (making up the 13, i.e. 0, 1, or 3):
0 - Selected bits all zeros; or all mask bits zero
1 - Selected bits a mix of zeros and ones
2 - n/a
3 - Selected bits all ones

The original mask is 8, i.e. condition code 0 ("selected bits all zeros"). Reversing this needs to check for condition codes 1 or 3, i.e. mask 5 ("selected bits a mix of zeros and ones" or "selected bits all ones").
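The XOR relation described above can be checked directly. A minimal host-side sketch (the mask values are taken from the explanation above; the function name is illustrative, not the actual LLVM helper):

```cpp
// Inverting a comparison on SystemZ keeps the same set of valid condition
// codes but selects the complementary subset within it:
//   NewMask = ValidMask ^ OldMask
unsigned invertCCMask(unsigned ValidMask, unsigned Mask) {
  return ValidMask ^ Mask;
}

// VTM produces CC 0, 1, or 3.  As a 4-bit mask (bit value 8 = CC0,
// 4 = CC1, 2 = CC2, 1 = CC3) that is 8 | 4 | 1 == 13.
// The original test checks CC0 ("selected bits all zeros"), i.e. mask 8;
// its inverse must accept CC1 and CC3, i.e. mask 4 | 1 == 5.
```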

On processors supporting vector registers and SIMD instructions,
enable i128 as legal type in VRs.  This allows many operations
to be implemented via native instructions directly in VRs
(including add, subtract, logical operations and shifts).
For a few other operations (e.g. multiply and divide, as well
as atomic operations), we need to move the i128 value back to
a GPR pair to use the corresponding instruction there.  Overall,
this is still beneficial.

The patch includes the following LLVM changes:
- Enable i128 as legal type
- Set up legal operations (in SystemZInstrVector.td)
- Custom expansion for i128 add/subtract with carry
- Custom expansion for i128 comparisons and selects
- Support for moving i128 to/from GPR pairs when required
- Handle 128-bit integer constant values everywhere
- Use i128 as intrinsic operand type where appropriate
- Updated and new test cases

In addition, clang builtins are updated to reflect the
intrinsic operand type changes (which also improves
compatibility with GCC).
uweigand (Member, Author) commented:

Fixed merge conflicts, updated as described above, and fixed support for i128 parameters in the z/OS XPLINK ABI.

JonPsson1 (Contributor) commented:

Thanks for explanations.

Updates to my comments LGTM.

@uweigand uweigand merged commit a65ccc1 into llvm:main Dec 15, 2023
8 of 9 checks passed
@uweigand uweigand deleted the systemz-i128 branch December 15, 2023 11:55
uweigand added a commit that referenced this pull request Sep 19, 2024
PR #74625 introduced a regression in the code generated for the
following set of intrinsics:
  vec_add_u128, vec_addc_u128, vec_adde_u128, vec_addec_u128
  vec_sub_u128, vec_subc_u128, vec_sube_u128, vec_subec_u128
  vec_sum_u128, vec_msum_u128
  vec_gfmsum_128, vec_gfmsum_accum_128

This is because the new code incorrectly assumed that a cast
from "unsigned __int128" to "vector unsigned char" would simply
be a bitcast re-interpretation; instead, this cast actually
truncates the __int128 to char and splats the result.

Fixed by adding an intermediate cast via a single-element
128-bit integer vector.

Fixes: #109113
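The difference between the two cast semantics can be sketched on the host with plain C++ (this only models the semantics; the actual fix uses the s390 vector extension's single-element `__vector unsigned __int128` type as the intermediate cast, and the function names below are illustrative):

```cpp
#include <array>
#include <cstring>

using Bytes16 = std::array<unsigned char, 16>;

// What the regressed code effectively did: the scalar-to-vector cast
// truncates the __int128 to the element width and splats the result
// across all 16 lanes, so only the low 8 bits of the value survive.
Bytes16 truncateAndSplat(unsigned __int128 v) {
  Bytes16 r;
  r.fill(static_cast<unsigned char>(v));
  return r;
}

// What the intrinsic wrappers actually need: reinterpret all 128 bits
// unchanged -- the role played by the intermediate single-element
// 128-bit vector cast in the fix.
Bytes16 bitcastToBytes(unsigned __int128 v) {
  Bytes16 r;
  std::memcpy(r.data(), &v, sizeof(r));
  return r;
}
```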
tmsri pushed a commit to tmsri/llvm-project that referenced this pull request Sep 19, 2024
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Oct 7, 2024 (cherry picked from commit baf9b7d)
tru pushed a commit to llvmbot/llvm-project that referenced this pull request Oct 11, 2024 (cherry picked from commit baf9b7d)