Sycl web #14302

iagarwa · 2024-06-26T14:35:40Z

Did get my changes here, so I created a draft.

… (#95025) In ContinuationIndenter::mustBreak, a break is required between a template declaration and the function/class declaration it applies to, if the template declaration spans multiple lines. However, this also includes template template parameters, which can cause extra erroneous line breaks in some declarations. This patch makes template template parameters not be counted as template declarations. Fixes llvm/llvm-project#93793 Fixes llvm/llvm-project#48746

…(#96384) Buildbot `clang-ppc64le-rhel` failed with:

```sh
error: 'MFPropsModifier' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]
note: add a deduction guide to suppress this warning
```

after #94854. This PR explicitly adds a deduction guide to suppress the warning.
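A minimal sketch of the pattern, using hypothetical `Props`, `PropsModifier`, and `NonSSAPass` types rather than LLVM's `MFPropsModifier`: an RAII class template whose pass type is deduced from its constructor, plus an explicit deduction guide of the kind this PR adds. Declaring any deduction guide signals to Clang that class template argument deduction (CTAD) is intended, which is what silences `-Wctad-maybe-unsupported`.

```cpp
#include <cassert>

// Hypothetical property set standing in for MachineFunctionProperties.
struct Props {
  bool IsSSA = true;
};

// Hypothetical RAII helper (not LLVM's MFPropsModifier): overrides a property
// for its lifetime and restores the previous value when it goes out of scope.
template <typename PassT> class PropsModifier {
public:
  PropsModifier(PassT &P, Props &Pr) : Target(Pr), Saved(Pr.IsSSA) {
    Target.IsSSA = P.preservesSSA();
  }
  ~PropsModifier() { Target.IsSSA = Saved; }
  PropsModifier(const PropsModifier &) = delete;
  PropsModifier &operator=(const PropsModifier &) = delete;

private:
  Props &Target;
  bool Saved;
};

// Explicit deduction guide: states that CTAD is intended for this template,
// so uses like `PropsModifier M(P, Pr);` no longer trigger the warning.
template <typename PassT>
PropsModifier(PassT &, Props &) -> PropsModifier<PassT>;

// Hypothetical pass that does not preserve SSA form.
struct NonSSAPass {
  bool preservesSSA() const { return false; }
};
```

With `PropsModifier M(P, Pr);` inside a scope, the property is overridden until the end of that scope and restored afterwards, without the caller having to spell out `PropsModifier<NonSSAPass>`.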

When unifying the ResolveExecutable implementations in #96256, I missed that RemoteAwarePlatform was able to resolve executables more aggressively. The host platform can rely on the current working directory to make relative paths absolute and resolve things like home directories. This should fix command-target-create-resolve-exe.test.

This formatter doesn't currently provide much value. It only formats `SourceLocation` and `QualType`. The only formatting it does for `QualType` is call `getAsString()` on it. The main motivator for the removal however is that the formatter implementation can be very slow (since it uses the expression evaluator in non-trivial ways). Not infrequently do we get reports about LLDB being slow when debugging Clang, and it turns out the user was loading `ClangDataFormat.py` in their `.lldbinit` by default. We should eventually develop proper formatters for Clang data-types, but these are currently not ready. So this patch removes them in the meantime to avoid users shooting themselves in the foot, and giving the wrong impression of these being reference implementations.

…k. NFC This was added by 507efbc ([MC] Fold A-B when A is a pending label or A/B are separated by a MCFillFragment) to account for pending labels and is now unneeded after the removal of pending labels (7500646).

The checks made when building a thunk, to decide whether an arg needed to be cast to/from an integer or redirected via a pointer, didn't match how arg types were changed in `canonicalizeThunkType`. This caused LLVM to ICE when using vector types as args, due to incorrect types in a call instruction. Instead of duplicating these checks, we should check whether the arg type differs between x64 and AArch64 and then cast or redirect as appropriate.

…at_provider (#95704) The original implementation of HelperFunctions::consumeHexStyle always sets Style when it returns true, but this is difficult for a compiler to understand, since it requires seeing that Str starts with either an "x" or an "X" when starts_with_insensitive("x") returns true. In particular, g++ 12 warns that HS may be used uninitialized in the format_provider::format caller. Change HelperFunctions::consumeHexStyle to return an optional HexPrintStyle, and make it more explicit that Str necessarily starts with an "X" when all other cases do not apply. This helps both the compiler and the human reader of the code. Co-authored-by: Sven Verdoolaege
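A minimal sketch of the refactoring, written against `std::string_view` and a local `HexStyle` enum instead of LLVM's `StringRef` and `HexPrintStyle` (both substitutions are assumptions made for self-containment; the `x-`/`X-`/`x+`/`X+` spellings are assumed to follow LLVM's format-string convention). Returning `std::optional` means every successful path produces a value explicitly, so there is no out-parameter for the compiler to track as possibly uninitialized.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Local enum modeled on the four hex styles; illustrative only.
enum class HexStyle { Lower, Upper, PrefixLower, PrefixUpper };

// Removes Prefix from the front of Str if present.
static bool consumeFront(std::string_view &Str, std::string_view Prefix) {
  if (Str.substr(0, Prefix.size()) != Prefix)
    return false;
  Str.remove_prefix(Prefix.size());
  return true;
}

// Returns the style encoded at the front of Str (consuming it), or
// std::nullopt if Str does not start with 'x' or 'X'.
static std::optional<HexStyle> consumeHexStyle(std::string_view &Str) {
  if (Str.empty() || (Str.front() != 'x' && Str.front() != 'X'))
    return std::nullopt;
  if (consumeFront(Str, "x-"))
    return HexStyle::Lower;
  if (consumeFront(Str, "X-"))
    return HexStyle::Upper;
  if (consumeFront(Str, "x+") || consumeFront(Str, "x"))
    return HexStyle::PrefixLower;
  // Every lowercase case has been handled, so Str must start with 'X' here.
  if (!consumeFront(Str, "X+"))
    consumeFront(Str, "X");
  return HexStyle::PrefixUpper;
}
```

The final comment carries the same reasoning the commit makes explicit: once the lowercase cases are exhausted, only an uppercase `X` can remain.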

#95197 and 7500646 eliminated all raw `new MCXXXFragment`. We can now place fragments in a bump allocator. In addition, remove the dead `Kind == FragmentType(~0)` condition. ~CodeViewContext may call `StrTabFragment->destroy()` and so needs to be reset before `FragmentAllocator.Reset()`. Tested by llvm/test/MC/COFF/cv-compiler-info.ll using asan. Pull Request: llvm/llvm-project#96402
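The ordering constraint can be illustrated with a toy arena. This is a hypothetical sketch, not the allocator MC actually uses: objects placed in a bump allocator are released all at once by `reset()`, which never runs destructors, so an owner that holds a fragment owning heap memory has to call `destroy()` on it before the arena is reset.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Toy bump allocator. Assumes Align is a power of two no larger than
// alignof(std::max_align_t) and that all allocations fit in Capacity bytes.
class BumpArena {
public:
  void *allocate(std::size_t Size, std::size_t Align) {
    std::size_t Offset = (Used + Align - 1) & ~(Align - 1);
    assert(Offset + Size <= Capacity && "arena exhausted");
    Used = Offset + Size;
    return Buffer + Offset;
  }
  // Makes all memory reusable at once; does NOT run any destructors.
  void reset() { Used = 0; }

private:
  static constexpr std::size_t Capacity = 4096;
  alignas(std::max_align_t) std::byte Buffer[Capacity];
  std::size_t Used = 0;
};

// Placement-new an object of type T into the arena.
template <typename T, typename... Args> T *create(BumpArena &A, Args &&...As) {
  return new (A.allocate(sizeof(T), alignof(T))) T(std::forward<Args>(As)...);
}

// Hypothetical fragment that owns heap memory, so its destructor matters.
struct StrTabFragment {
  std::vector<char> Data;
  static inline int Live = 0; // counts constructed-but-not-destroyed objects
  StrTabFragment() { ++Live; }
  ~StrTabFragment() { --Live; }
  void destroy() { this->~StrTabFragment(); }
};
```

Calling `reset()` while a `StrTabFragment` is still live would leak the vector's heap buffer, which is the kind of problem an asan run over the test is meant to catch.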

https://reviews.llvm.org/D67249 added content hash (see -fvalidate-ast-input-files-content) using llvm::hash_code (size_t). The hash value is 32-bit on 32-bit systems, which was unintentional. Fix #96379: #96136 switched the hash function to xxh3_64bit but did not update the ContentHash type, leading to mismatch between ASTReader and ASTWriter.
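A small sketch of the mismatch, with a 32-bit `size_t` simulated by `std::uint32_t` so that it reproduces on any host. The type aliases and `hashesMatch` are hypothetical names for illustration: a 64-bit hash stored through a 32-bit field loses its high bits, so it only compares equal when those bits happen to be zero.

```cpp
#include <cassert>
#include <cstdint>

using ContentHash32 = std::uint32_t; // simulated 32-bit size_t field
using ContentHash64 = std::uint64_t; // width of a 64-bit content hash

// Hypothetical round trip: store a freshly computed 64-bit hash through a
// 32-bit field, read it back, and compare with the recomputed value.
bool hashesMatch(ContentHash64 Computed) {
  ContentHash32 Stored = static_cast<ContentHash32>(Computed); // drops high bits
  ContentHash64 ReadBack = Stored;
  return ReadBack == Computed;
}
```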

This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This is part 3 of 4 PRs. It lays the groundwork for using the intrinsics in HLSL. Add HLSL frontend APIs for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh` llvm/llvm-project#70079 llvm/llvm-project#70080 llvm/llvm-project#70081 llvm/llvm-project#70083 llvm/llvm-project#70084 llvm/llvm-project#95966

The `gpu.func` op lowering accounts for memref arguments/results (both "normal" and bare-pointer supported), but the `gpu.return` op lowering did not. The lowering produced invalid IR that did not verify. This commit uses the same lowering strategy as for `func.return` in the `gpu.return` lowering. (The C++ implementation is copied. We may want to share some code between `func` and `gpu` lowerings in the future.)

Define subtarget features for atomic fmin/fmax support. The flat/global support is a real mess. We had float/double support at the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them. gfx11 removed the f64 versions again. gfx9 partially reintroduced them in gfx90a and gfx940, but only for f64.
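The support matrix described in this message can be restated as a lookup table. The `Gen` labels and `hasAtomicFMinMax` are hypothetical and only encode the commit text; they are not the actual AMDGPU subtarget feature names or definitions.

```cpp
#include <cassert>

// Hypothetical generation labels; not LLVM subtarget feature names.
enum class Gen { GFX6, GFX7, GFX8, GFX9, GFX90A, GFX940, GFX10, GFX11 };

// Flat/global atomic fmin/fmax availability, as stated in the commit message.
bool hasAtomicFMinMax(Gen G, bool IsF64) {
  switch (G) {
  case Gen::GFX6:
  case Gen::GFX7:
  case Gen::GFX10:
    return true;   // both f32 and f64
  case Gen::GFX8:
  case Gen::GFX9:
    return false;  // removed in gfx8; base gfx9 does not restore them
  case Gen::GFX90A:
  case Gen::GFX940:
    return IsF64;  // partially reintroduced, f64 only
  case Gen::GFX11:
    return !IsF64; // f64 versions removed again
  }
  return false;
}
```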

…IEs (#96484) If ParseStructureLikeDIE (or ParseEnum) encountered a declaration DIE, it would call FindDefinitionTypeForDIE. This returned a fully formed type, which it achieved by recursing back into ParseStructureLikeDIE with the definition DIE. This obscured the control flow and caused us to repeat some work (e.g. the UniqueDWARFASTTypeMap lookup), but it mostly worked until we tried to delay the definition search in #90663. After that patch, the two ParseStructureLikeDIE calls were no longer recursive; rather, the second call happened as part of the CompleteType() call. This opened the door to inconsistencies, as the second ParseStructureLikeDIE call was not aware that it was called to process a definition DIE for an existing type. To make that possible, this patch removes the recursive type resolution from this function and leaves just the "find definition DIE" functionality. After finding the definition DIE, we just go back to the original ParseStructureLikeDIE call and have it finish the parsing process with the new DIE. While this patch is motivated by the work on delaying the definition search, I believe it is also useful on its own.

…#96514) This PR fixes llvm/llvm-project#96513. Array-type constants were created incorrectly: instead of creating [1, 1, 1] or [1, 1, 1, 1, 1, ....] constants, the same [1] constant was always created, replacing the original composite constants. This in turn meant that only one such constant could exist in the code without emitting invalid code; the second constant would eventually be rewritten to the first, because the key used to address both was an array of a single element (like [1]). This PR fixes the issue and removes an unneeded copy-pasted clone of the function that creates an array constant.
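A hypothetical sketch of this class of bug (the `ConstantPool` type is illustrative, not the SPIR-V backend's code): if the cache key for a composite constant is built from a one-element array instead of the full element list, distinct arrays collide and one silently stands in for the other.

```cpp
#include <cassert>
#include <map>
#include <vector>

using Elements = std::vector<int>;

// Hypothetical constant cache mapping element lists to constant ids.
// Precondition for getBuggy: E is non-empty.
class ConstantPool {
public:
  // Buggy variant: keys every array on a one-element array holding only its
  // first element, so [1, 1, 1] and [1, 1, 1, 1, 1] map to the same id.
  int getBuggy(const Elements &E) { return intern(Elements{E.front()}); }
  // Fixed variant: the key is the full composite value.
  int getFixed(const Elements &E) { return intern(E); }

private:
  // Returns the existing id for Key, or assigns the next free id.
  int intern(const Elements &Key) {
    return Ids.try_emplace(Key, static_cast<int>(Ids.size())).first->second;
  }
  std::map<Elements, int> Ids;
};
```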

The handling of `PointerType` is similar to `HeapType`. The only difference is that the allocated flag is generated for `HeapType` and the associated flag for `PointerType`. The tests for pointers to allocatable strings are disabled for now. I will enable them once #95906 is merged. The debugging in GDB looks like this:

```fortran
integer, pointer :: par2(:)
integer, target, allocatable :: ar2(:)
integer, target :: sc
integer, pointer :: psc
allocate(ar2(4))
par2 => ar2
psc => sc
```

```
19          par2 => ar2
(gdb) p par2
$3 = <not associated>
(gdb) n
20          do i=1,5
(gdb) p par2
$4 = (0, 0, 0, 0)
(gdb) ptype par2
type = integer (4)
(gdb) p sc
$5 = 3
(gdb) p psc
$6 = (PTR TO -> ( integer )) 0x7fffffffda24
(gdb) p *psc
$7 = 3
```

…ing for generic types (#89217) This patch is intended to be the first of a series whose end goal is to adapt the atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64-bit readlane, writelane and readfirstlane ops pre-ISel. --------- Co-authored-by: vikramRH

Instead of iterating over all VFs when computing costs, simply iterate over the VFs available in the created VPlans. Split off from llvm/llvm-project#92555. This also prepares for moving the check if any vector instructions will be generated to be based on VPlan, to unblock recommitting llvm/llvm-project#92555.

Since we mark the pseudos as mayLoad but do not provide any MMOs, isSafeToMove conservatively returns false, stopping MachineLICM from hoisting the instructions. PseudoLA_TLS_{LD,GD} does not actually expand to a load, so stop marking that as mayLoad to allow it to be hoisted, and for the others make sure to add MMOs during lowering to indicate they're GOT loads and thus can be freely moved.

… (#95061) This patch augments the HIPAMD driver to allow it to target AMDGCN flavoured SPIR-V compilation. It's mostly straightforward, as we re-use some of the existing SPIRV infra; however, there are a few notable additions:

- we introduce an `amdgcnspirv` offload arch, rather than relying on using `generic` (this is already fairly overloaded) or simply using `spirv` or `spirv64` (we'll want to use these to denote unflavoured SPIRV, once we bring up that capability)
- initially it won't be possible to mix in SPIR-V and concrete AMDGPU targets, as it would require some relatively intrusive surgery in the HIPAMD Toolchain and the Driver to deal with two triples (`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively)
- in order to retain user-provided compiler flags and have them available at JIT time, we rely on embedding the command line via `-fembed-bitcode=marker`, which the bitcode writer had previously not implemented for SPIRV; we only allow it conditionally for AMDGCN flavoured SPIRV, and it is handled correctly by the Translator (it ends up as a string literal)

Once the SPIRV BE is no longer experimental we'll switch to using that rather than the translator. There's some additional work that'll come via a separate PR around correctly piping through AMDGCN's implementation of `printf`; for now we merely handle its flags correctly.

paperchalice and others added 30 commits June 22, 2024 17:34

[CodeGen][NewPM] Extract MachineFunctionProperties modification part …

8e9c6bf

…to an RAII class (#94854) Modifying MachineFunctionProperties in PassModel makes `PassT P; P.run(...);` not work properly. This is a necessary compromise.

Revert "[clang-format] Don't count template template parameter as dec…

34d44eb

…laration" (#96388) Reverts llvm/llvm-project#95025 ; many bots are broken

[MC] Remove remnant code related to pending labels

485d7ea

[InstCombine] (uitofp bool X) * Y --> X ? Y : 0 (#96216)

a4ca225

Fold `mul (uitofp i1 X), Y` to `select i1 X, Y, 0.0` when the `mul` is `nnan` and `nsz` Proof: https://alive2.llvm.org/ce/z/_stiPm
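Why the fold needs both flags can be checked with ordinary `double` arithmetic, modelling `uitofp i1 X` as a `bool`-to-`double` cast (a sketch of the semantics, not InstCombine code). When `X` is true the two forms agree exactly, since `1.0 * Y == Y`. When `X` is false, `0.0 * Y` is `-0.0` for negative `Y` (hence `nsz`) and NaN for NaN or infinite `Y` (hence `nnan`), while the `select` always yields `+0.0`.

```cpp
#include <cassert>
#include <cmath>

// Original form: mul (uitofp i1 X), Y
double mulForm(bool X, double Y) { return static_cast<double>(X) * Y; }

// Folded form: select i1 X, Y, 0.0
double selectForm(bool X, double Y) { return X ? Y : 0.0; }
```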

[clang][Interp] Fix CFStringMakeConstantString etc. evaluation

170c194

We're ultimately expected to return an APValue simply pointing to the CallExpr, not any useful value. Do that by creating a global variable for the call.

[clang-format] Don't count template template parameter as declaration…

6621505

… (#96396) Reapply 4a7bf42 which was reverted in 34d44eb Not sure why there are tests elsewhere in clang that rely on the output of clang-format, but they were wrong

[NFC][Clang][OMPX] Fix a typo in OMP.td (#96398)

fc23564

[MC] Move computeBundlePadding closer to its only caller. NFC

c9f6a5e

There is only one caller after #95188.

[clang] Fix -Wsign-compare in 32-bit builds

f5b93ae

[gn] port ade28a7 (clang-doc asset copy to share/clang)

3ba7599

[MC] MCSectionSubPair: replace const MCExpr * with uint32_t

05ba5c0

[InstCombine] Add tests for expanding foldSelectValueEquivalence; NFC

61c4d7b

[InstCombine] Improve coverage of foldSelectValueEquivalence for no…

b37a4b9

…n-constants If f(Y) simplifies to Y, replace with Y. This requires Y to be non-undef. Closes #94719

[MC] Change Subsection parameters from const MCExpr * to uint32_t

95f983f

Follow-up to 05ba5c0. uint32_t is preferred over const MCExpr * in the section stack uses because it should only be evaluated once. Change the parameter type to match.

[libc++] <experimental/simd> Add swap functions of simd reference (#8…

6ec1ddf

…6478)

[MC] Remove unused MCObjectStreamer::CurSubsectionIdx. NFC

e7622ab

[mlir][NVVM] Disallow results on kernel functions (#96399)

346c4a8

Functions that have the `nvvm.kernel` attribute should have 0 results.

AMDGPU: Start selecting buffer fat pointer atomicrmw fmin/fmax (#95593)

414c741

labath and others added 27 commits June 25, 2024 10:52

[AArch64][SVE] optimisation for SVE load intrinsics with no active la…

0bd9c49

…nes (#95269) This patch extends #73964 and adds optimisation of SVE load intrinsics when the predicate is zero.
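A scalar model of the reasoning, assuming a fixed 4-lane vector and the zeroing behaviour of SVE predicated loads (inactive lanes read as zero). `maskedLoad` is a hypothetical helper, not an ACLE intrinsic: with an all-false predicate the result does not depend on memory at all, which is what allows such a load to be replaced by a zero vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t Lanes = 4; // hypothetical fixed vector length
using Vec = std::array<int, Lanes>;
using Pred = std::array<bool, Lanes>;

// Zeroing predicated load: active lanes read memory, inactive lanes are 0
// and never touch memory.
Vec maskedLoad(const int *Ptr, const Pred &P) {
  Vec R{};
  for (std::size_t I = 0; I < Lanes; ++I)
    if (P[I])
      R[I] = Ptr[I];
  return R;
}
```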

[GlobalISel] Add build methods for FP environment intrinsics (#96607)

f9795f3

This change adds methods like buildGetFPEnv and similar for opcodes that represent manipulation on floating-point state.

[libc++] Use __is_nothrow_destructible (#95766)

16d02cd

This changes the behaviour in C++03 mode because we'll now use the builtin on Clang, but I don't think that's much of a problem.

[SetOperations] clang-format header (NFC)

29f4a05

This header used three-space indentation in a number of places. Reformat it completely.

[clang] Remove a stale FIXME

f09b024

This FIXME has already been addressed in #89358

[LoopUnroll] Use poison instead of undef for preheader value

eeb0884

[LV] Make create-induction-resume.ll more robust by adding store.

a2e9157

Without the store, the vector loop body is empty. Add a store to avoid that, while not impacting the induction resume values that are created.

[LoopUnroll] Use poison instead of undef for another preheader value

37c736e

[Xtensa] Lower GlobalAddress/BlockAddress/JumpTable (#95256)

cc8fdd6

This patch implements lowering of GlobalAddress, BlockAddress, JumpTable and BR_JT. The patch also adds legal support for the BR_CC operation for the i32 type.

[SCCP] Generate test checks (NFC)

4acc8ee

Some of these are just old, while others previously did not use UTC due to missing features that have since been implemented (such as signature matching).

[SCCP] Use poison instead of undef when zapping returns

16bb8c1

[Reassociate] Use poison instead of undef for dummy operands (NFCI)

35eef9f

These will be replaced later.

[NFC][lld][ELF] Remove unused sec param of `ObjFile<ELFT>::getReloc…

65f9601

…Target` (#96500)

[VPlanTest] Use poison instead of undef for dummy values (NFC)

9952e00

[VectorBuilderTest] Use poison instead of undef for dummy values (NFC)

68efc50

Merge from 'main' to 'sycl-web' (15 commits)

3f7b832

CONFLICT (content): Merge conflict in llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

Merge from 'sycl' to 'sycl-web' (10 commits)

fd7622a

CONFLICT (content): Merge conflict in clang/test/Driver/sycl-linker-wrapper-image.cpp

Merge from 'sycl' to 'sycl-web' (1 commits)

2f481f2

Merge from 'main' to 'sycl-web' (125 commits)

074e55c

CONFLICT (content): Merge conflict in clang/lib/Driver/Driver.cpp
CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/HIPAMD.cpp

Fix tests after cbf6e93 (#14294)

658b9a4

Test needs update after cbf6e93 2024-05-28 [clang codegen] Delete unnecessary GEP cleanup code. (#90303). Change made by @premanandrao

iagarwa closed this Jun 26, 2024

iagarwa had a problem deploying to WindowsCILock June 26, 2024 14:35 — with GitHub Actions Failure

iagarwa had a problem deploying to WindowsCILock June 26, 2024 14:36 — with GitHub Actions Failure
