Sycl web #14302

…to an RAII class (#94854) Modifying MachineFunctionProperties in PassModel makes `PassT P; P.run(...);` not work properly. This is a necessary compromise.

… (#95025) In ContinuationIndenter::mustBreak, a break is required between a template declaration and the function/class declaration it applies to, if the template declaration spans multiple lines. However, this also includes template template parameters, which can cause extra erroneous line breaks in some declarations. This patch makes template template parameters not be counted as template declarations. Fixes llvm/llvm-project#93793 Fixes llvm/llvm-project#48746

…(#96384) Buildbot `clang-ppc64le-rhel` failed with:
```sh
error: 'MFPropsModifier' may not intend to support class template argument deduction [-Werror,-Wctad-maybe-unsupported]
note: add a deduction guide to suppress this warning
```
after #94854. This PR adds a deduction guide explicitly to suppress the warning.
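
As a rough illustration of this kind of fix (not the actual LLVM code), the sketch below gives an RAII class template an explicit deduction guide, so that class template argument deduction is clearly intended. The names `PropsModifier`, `DemoPass`, `DemoFunction`, `clearedProps`, and `setProps` are hypothetical stand-ins, and the bit-mask property model is an assumption made for brevity.

```cpp
#include <cassert>

// Hypothetical stand-ins for a machine function and a pass.
struct DemoFunction { unsigned Props = 0; };
struct DemoPass {
  static unsigned clearedProps() { return 0b01; } // bits dropped on entry
  static unsigned setProps() { return 0b10; }     // bits set on exit
};

// RAII helper: adjusts properties on construction and on destruction.
template <typename PassT> class PropsModifier {
public:
  PropsModifier(PassT &, DemoFunction &F) : F(F) {
    F.Props &= ~PassT::clearedProps();
  }
  ~PropsModifier() { F.Props |= PassT::setProps(); }
  PropsModifier(const PropsModifier &) = delete;
  PropsModifier &operator=(const PropsModifier &) = delete;

private:
  DemoFunction &F;
};

// Explicit deduction guide: states that CTAD is supported on purpose.
template <typename PassT>
PropsModifier(PassT &, DemoFunction &) -> PropsModifier<PassT>;

unsigned runDemo() {
  DemoFunction F;
  F.Props = 0b01;
  DemoPass P;
  {
    PropsModifier Guard(P, F); // PassT is deduced as DemoPass
    assert(F.Props == 0 && "cleared on entry");
  }
  return F.Props; // set on exit
}
```

With the guide in place, `PropsModifier Guard(P, F);` compiles without naming the template argument.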

…laration" (#96388) Reverts llvm/llvm-project#95025 ; many bots are broken

When unifying the ResolveExecutable implementations in #96256, I missed that RemoteAwarePlatform was able to resolve executables more aggressively. The host platform can rely on the current working directory to make relative paths absolute and resolve things like home directories. This should fix command-target-create-resolve-exe.test.

This formatter doesn't currently provide much value. It only formats `SourceLocation` and `QualType`. The only formatting it does for `QualType` is call `getAsString()` on it. The main motivator for the removal however is that the formatter implementation can be very slow (since it uses the expression evaluator in non-trivial ways). Not infrequently do we get reports about LLDB being slow when debugging Clang, and it turns out the user was loading `ClangDataFormat.py` in their `.lldbinit` by default. We should eventually develop proper formatters for Clang data-types, but these are currently not ready. So this patch removes them in the meantime to avoid users shooting themselves in the foot, and giving the wrong impression of these being reference implementations.

Fold `mul (uitofp i1 X), Y` to `select i1 X, Y, 0.0` when the `mul` is `nnan` and `nsz` Proof: https://alive2.llvm.org/ce/z/_stiPm
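
The equivalence can be checked informally in plain C++. This is only a sketch of the arithmetic, not of the InstCombine code; the function names are made up for illustration. It assumes `Y` is finite, which matches the `nnan` requirement, since `0 * inf` would otherwise produce NaN.

```cpp
#include <cassert>
#include <cmath>

// Original form: mul (uitofp i1 X), Y
float mulForm(bool X, float Y) { return static_cast<float>(X) * Y; }

// Folded form: select i1 X, Y, 0.0
float selectForm(bool X, float Y) { return X ? Y : 0.0f; }

// True when both forms compare equal for X = true and X = false.
// operator== treats -0.0 and +0.0 as equal, which mirrors the nsz flag.
bool formsAgree(float Y) {
  assert(std::isfinite(Y) && "sketch assumes a finite, non-NaN Y");
  return mulForm(true, Y) == selectForm(true, Y) &&
         mulForm(false, Y) == selectForm(false, Y);
}
```

For a negative `Y` and `X == false`, the multiply yields `-0.0` while the select yields `+0.0`. The two differ only in the sign of zero, which is why the fold needs `nsz`.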

We're ultimately expected to return an APValue simply pointing to the CallExpr, not any useful value. Do that by creating a global variable for the call.

…k. NFC This was added by 507efbc ([MC] Fold A-B when A is a pending label or A/B are separated by a MCFillFragment) to account for pending labels and is now unneeded after the removal of pending labels (7500646).

The checks when building a thunk to decide if an arg needed to be cast to/from an integer or redirected via a pointer didn't match how arg types were changed in `canonicalizeThunkType`, this caused LLVM to ICE when using vector types as args due to incorrect types in a call instruction. Instead of duplicating these checks, we should check if the arg type differs between x64 and AArch64 and then cast or redirect as appropriate.

… (#96396) Reapply 4a7bf42 which was reverted in 34d44eb Not sure why there are tests elsewhere in clang that rely on the output of clang-format, but they were wrong

…at_provider (#95704) The original implementation of HelperFunctions::consumeHexStyle always sets Style when it returns true, but this is difficult for a compiler to understand since it requires seeing that Str starts with either an "x" or an "X" when starts_with_insensitive("x") returns true. In particular, g++ 12 warns that HS may be used uninitialized in the format_provider::format caller. Change HelperFunctions::consumeHexStyle to return an optional HexPrintStyle and to make the fact that Str necessarily starts with an "X" when all other cases do not apply more explicit. This helps both the compiler and the human reader of the code. Co-authored-by: Sven Verdoolaege <[email protected]>
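
The shape of the change can be sketched with standard library types. This is a standalone approximation, not the LLVM source: it uses `std::string_view` instead of `StringRef`, and `consumeFront` is a hypothetical helper written for this example.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

enum class HexPrintStyle { Upper, Lower, PrefixUpper, PrefixLower };

// Hypothetical helper: drops Prefix from the front of Str if present.
static bool consumeFront(std::string_view &Str, std::string_view Prefix) {
  if (Str.substr(0, Prefix.size()) != Prefix)
    return false;
  Str.remove_prefix(Prefix.size());
  return true;
}

// Returns the parsed style, or std::nullopt if Str has no hex style prefix.
std::optional<HexPrintStyle> consumeHexStyle(std::string_view &Str) {
  if (Str.empty() || (Str.front() != 'x' && Str.front() != 'X'))
    return std::nullopt;
  if (consumeFront(Str, "x-"))
    return HexPrintStyle::Lower;
  if (consumeFront(Str, "X-"))
    return HexPrintStyle::Upper;
  if (consumeFront(Str, "x+") || consumeFront(Str, "x"))
    return HexPrintStyle::PrefixLower;
  // Every case starting with 'x' is handled, so Str starts with 'X' here.
  if (!consumeFront(Str, "X+")) {
    bool Consumed = consumeFront(Str, "X");
    assert(Consumed && "Str must start with 'X' at this point");
    (void)Consumed;
  }
  return HexPrintStyle::PrefixUpper;
}
```

Because the result is carried in the return value, a caller has no separately declared style variable that could look uninitialized to the compiler.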

#95197 and 7500646 eliminated all raw `new MCXXXFragment`. We can now place fragments in a bump allocator. In addition, remove the dead `Kind == FragmentType(~0)` condition. ~CodeViewContext may call `StrTabFragment->destroy()` and need to be reset before `FragmentAllocator.Reset()`. Tested by llvm/test/MC/COFF/cv-compiler-info.ll using asan. Pull Request: llvm/llvm-project#96402

There is only one caller after #95188.

https://reviews.llvm.org/D67249 added content hash (see -fvalidate-ast-input-files-content) using llvm::hash_code (size_t). The hash value is 32-bit on 32-bit systems, which was unintentional. Fix #96379: #96136 switched the hash function to xxh3_64bit but did not update the ContentHash type, leading to mismatch between ASTReader and ASTWriter.

This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This is part 3 of 4 PRs. It sets the ground work for using the intrinsics in HLSL. Add HLSL frontend apis for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh` llvm/llvm-project#70079 llvm/llvm-project#70080 llvm/llvm-project#70081 llvm/llvm-project#70083 llvm/llvm-project#70084 llvm/llvm-project#95966

…n-constants If f(Y) simplifies to Y, replace with Y. This requires Y to be non-undef. Closes #94719

Follow-up to 05ba5c0. uint32_t is preferred over const MCExpr * in the section stack uses because it should only be evaluated once. Change the parameter type to match.

…6478)

Functions that have the `nvvm.kernel` attribute should have 0 results.

The `gpu.func` op lowering accounts for memref arguments/results (both "normal" and bare-pointer supported), but the `gpu.return` op lowering did not. The lowering produced invalid IR that did not verify. This commit uses the same lowering strategy as for `func.return` in the `gpu.return` lowering. (The C++ implementation is copied. We may want to share some code between `func` and `gpu` lowerings in the future.)

Define subtarget features for atomic fmin/fmax support. The flat/global support is a real mess. We had float/double support at the beginning in gfx6 and gfx7. gfx8 removed these. gfx10 reintroduced them. gfx11 removed the f64 versions again. gfx9 partially reintroduced them, in gfx90a and gfx940 but only for f64.

These have been replaced with atomicrmw fadd

Fix #96014 .

… == c) == (b != c)` (#94915) resolves llvm/llvm-project#92966 alive proof https://alive2.llvm.org/ce/z/bLAQBS

Fixes a crash uncovered by test 0007_0019.f90 in the Fujitsu test suite. Previously, in the `PrivCB`, we cloned the `omp.private` op without inserting it in the parent module of the original op. This causes issues whenever there is an op that needs to look up the parent module (e.g. for symbol lookup). This PR fixes the issue by cloning in the parent module instead of creating an orphaned op.

@test6

… (#96200) If the AVL in a VSETVLIInfo is the output VL of a vsetvli with the same VLMAX, we treat it as the AVL of said vsetvli. This allows us to remove a true dependency as well as treating VSETVLIInfos as equal in more places and avoid toggles. We do this in two places, needVSETVLI and computeInfoForInstr. However we don't do this in computeInfoForInstr's vsetvli equivalent, getInfoForVSETVLI. We also have a restriction only in computeInfoForInstr that the AVL can't be a register as we want to avoid extending live ranges. This patch does two interlinked things: 1) It adds this AVL "peeking" to getInfoForVSETVLI 2) It relaxes the constraint that the AVL can't be a register in computeInfoForInstr, since it removes a use of the output VL which can actually reduce register pressure. E.g. see the diff in @vector_init_vsetvli_N and @test6 Now that getInfoForVSETVLI and computeInfoForInstr are consistent, we can remove the check in needVSETVLI. We also need to update how we update LiveIntervals in insertVSETVLI, as we can now end up needing to extend the LiveRange of the AVL across blocks.

The `memref.subview` result type inference (`SubViewOp::inferResultType`) sometimes used to produce a dynamic offset when a static offset is possible. When a dynamic value (stride, size, etc.) is multiplied with zero, the result is always a "static 0". Based on this, the result type inference implementation can be improved to produce more static type information in memref types.
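
A minimal sketch of the idea, assuming the result offset is the source offset plus the sum of `offset_i * stride_i`. Here `std::nullopt` models a dynamic value; `Dim`, `mulDim`, `addDim`, and `inferOffset` are illustrative names, not MLIR APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// std::nullopt models a dynamic (not statically known) value.
using Dim = std::optional<int64_t>;

// A static zero on either side gives a static 0, even if the other
// operand is dynamic.
Dim mulDim(Dim A, Dim B) {
  if ((A && *A == 0) || (B && *B == 0))
    return Dim{0};
  if (A && B)
    return Dim{*A * *B};
  return std::nullopt;
}

// A sum is static only when both operands are static.
Dim addDim(Dim A, Dim B) {
  if (A && B)
    return Dim{*A + *B};
  return std::nullopt;
}

// Result offset = source offset + sum(offset_i * source stride_i).
Dim inferOffset(Dim SrcOffset, const std::vector<Dim> &Offsets,
                const std::vector<Dim> &SrcStrides) {
  assert(Offsets.size() == SrcStrides.size());
  Dim Result = SrcOffset;
  for (std::size_t I = 0; I < Offsets.size(); ++I)
    Result = addDim(Result, mulDim(Offsets[I], SrcStrides[I]));
  return Result;
}
```

With offsets `{0, 4}` and strides `{dynamic, 1}`, the first product is a static 0, so the inferred offset stays static instead of becoming dynamic.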

MCAssembler::layout ensures that every section has at least one fragment, which simplifies MCAsmLayout::getSectionAddressSize (see e73353c from 2010). It's better to ensure the condition is satisfied at create time (COFF, GOFF, Mach-O) to simplify more fragment processing.

Reorder code to exit early if the BranchOnCond isn't in an exiting block. This delays retrieving the parent region, which may not be present. Split off from llvm/llvm-project#92651.

Split off from llvm/llvm-project#92651 as suggested.

Rewrite cloneSESE to perform 2 depth-first passes with the first one cloning blocks and the second one updating the predecessors and successors. This is needed to preserve the correct predecessor/successor ordering with llvm/llvm-project#92651 and has been split off as suggested.

All sections should have been created before MCAssembler::layout so that every section has an ordinal. Registering the section in WinCOFFObjectWriter::executePostLayoutBinding is a hack.

… keywords in C++11 code (#96387)

…ecc (#96259) These registers include: - X19, used by LLVM as the base pointer - X15 on Windows, where it is used for stack allocation. It can still be used on Linux/Darwin. - Adjust FrameLowering scratch register code to not assume X9 is available if the calling convention is preserve_nonecc. The code will then pick an unused register as scratch, and allow X9 to continue being used for argument passing.

changeSection is preferred to call the changeSectionImpl hook, which registers the section symbol.

This makes use of the information from TableGen instead of duplicating it in the code.

…Impl Follow-up to https://reviews.llvm.org/D128958 * Move target-specific code away from the generic ELFWriter. * All sections should have been created before MCAssembler::layout. * Remove one `registerSection` use, which should be considered private to MCAssembler.

Similar to abbf3bc. switchSection calls registerSection internally.

This patch uses std::array for ValueProfData. Aside from reducing the line count and code duplication, the use of std::array here makes it easier to add a new type of value profiling without touching as many places.

When `BeginSymName` is not null, `createTempSymbol` is called but the created symbol is not attached to a fragment. This is used as a hack to make some DWARF tests work. In the future, we should repurpose `BeginSymbol` as the section symbol in ELF.

getELFSection ensures that the section symbol exists.

…95771) There is no need to iterate all predecessors of current block, check if current block is the invoke unwind destination of any predecessor. We can directly call `BasicBlock::isEHPad()` to check if current block is an exception handling block.

We only register top-level types and decls in ASTWriter, and we register the sub-level types and decls while writing types and decls. As a result, the IDs of sub-level types can differ if the decl-writing process changes the order of the to-be-emitted type queues. This is not ideal since it causes unnecessary changes, especially in the no-transitive-changes model. This patch mitigates the issue by registering special types before registering decls. This makes sure that special types in the second top level are registered earlier than the decls. It might still be problematic if there were more levels in the special types, but luckily we don't have such special types.

…:getHashValue The FIXME says to revert this when the underlying issue is fixed. The underlying issue has now been fixed in llvm/llvm-project#95734, so it should be fine to revert that one now.

…::finishImpl" This reverts commit 9d63506. There is a heap-use-after-free.

… that `ucmp`/`scmp` can return (#96410) This makes it possible to fold dumb comparisons like `ucmp(x, y) == 7`.
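
To illustrate why such comparisons fold, the sketch below models `ucmp`/`scmp` as ordinary C++ functions whose result is always -1, 0, or 1. The helper `ucmpEquals` is hypothetical and only demonstrates the range argument; it is not how the optimizer is implemented.

```cpp
#include <cassert>
#include <cstdint>

// Three-way comparisons: the result is always -1, 0, or 1.
int ucmp(uint32_t X, uint32_t Y) { return (X > Y) - (X < Y); }
int scmp(int32_t X, int32_t Y) { return (X > Y) - (X < Y); }

// Comparing the result against a constant outside [-1, 1] can be
// answered without looking at X or Y at all.
bool ucmpEquals(uint32_t X, uint32_t Y, int C) {
  if (C < -1 || C > 1)
    return false; // e.g. ucmp(x, y) == 7 is always false
  int R = ucmp(X, Y);
  assert(R >= -1 && R <= 1 && "result range invariant");
  return R == C;
}
```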

…finish Follow-up to https://reviews.llvm.org/D128958 * Move target-specific code away from the generic ELFWriter. * All sections should have been created before MCAssembler::layout. * Remove one `registerSection` use, which should be considered private to MCAssembler.

…. NFC ELFStreamer::finishImpl is not intended to be further overridden.

This function is final after efdb91e. Target-specific code should override MCTargetStreamer::finish instead, e.g. AArch64TargetELFStreamer::finish (fec1b6f).

When lower bound and exclusive upper bound of a loop are the same, and the zero-trip loop is not canonicalized away before the analysis, this leads to a meaningless range for the induction variable being inferred. This patch adds a check to make sure that the inferred range for the IV is meaningful before updating the analysis state. Fix llvm/llvm-project#94423
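
A simplified sketch of the guard, assuming a loop of the form `for (iv = Lower; iv < Upper; iv += Step)` with a positive step. `IVRange` and `inferIVRange` are illustrative names, not the MLIR analysis API, and the upper end of the range is a conservative bound.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Inclusive range of values the induction variable may take.
struct IVRange {
  int64_t Min;
  int64_t Max;
};

// Returns std::nullopt for a zero-trip loop, where [Lower, Upper - 1]
// would be an inverted, meaningless range.
std::optional<IVRange> inferIVRange(int64_t Lower, int64_t Upper,
                                    int64_t Step) {
  assert(Step > 0 && "sketch only handles positive steps");
  if (Lower >= Upper)
    return std::nullopt; // do not update the analysis state
  return IVRange{Lower, Upper - 1};
}
```

Without the check, equal bounds such as `Lower == Upper == 5` would give `Min = 5` and `Max = 4`.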

I just realized that the name `getLocalDeclID` looks like a member function of ASTReader, which is misleading. So I decided to refactor it into a static member function of LocalDeclID.

Before this commit, there used to be a workaround in the `func.func`/`gpu.func` op lowering when the bare-pointer calling convention is enabled. This workaround "patched up" the argument materializations for memref arguments. This can be done directly in the argument materialization functions (as the TODOs in the code base indicate). This commit effectively reverts back to the old implementation (a664c14) and adds additional checks to make sure that bare pointers are used only for function entry block arguments.

I initially assumed only kernels could be roots, but that is wrong. A function with no callers also needs to be a root to ensure it is correctly handled. They're very rare because we usually internalize everything, and internal functions with no callers would be deleted. When they are present, we need to also consider their dependencies and act accordingly. Previously, we could put a function "by default" in P0, but it could call another function with internal linkage defined in another module which was of course incorrect. Fixes SWDEV-467695
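
The root selection described above can be sketched on a toy call graph. The `Fn` struct, the string-keyed map, and `findRoots` are assumptions made for this example and do not reflect the pass's real data structures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph node: kernel flag plus the names of direct callees.
struct Fn {
  bool IsKernel;
  std::vector<std::string> Callees;
};

// A function is a root if it is a kernel or if nothing calls it.
std::set<std::string> findRoots(const std::map<std::string, Fn> &CG) {
  std::set<std::string> Called;
  for (const auto &Entry : CG)
    for (const std::string &Callee : Entry.second.Callees) {
      assert(CG.count(Callee) && "callee must be a known function");
      Called.insert(Callee);
    }
  std::set<std::string> Roots;
  for (const auto &Entry : CG)
    if (Entry.second.IsKernel || !Called.count(Entry.first))
      Roots.insert(Entry.first);
  return Roots;
}
```

Treating the caller-less function as a root means its callees are visited as dependencies, rather than the function being placed somewhere by default.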

Our DWARF parsing code treats structures and classes as interchangeable. CompilerContextKind is used when looking up DIEs for types. This makes sure they're always treated the same way. See also [#95905#discussion_r1645686628](llvm/llvm-project#95905 (comment)).

Notes are added to indicate the array declarations of the arrays in a found invalid pointer subtraction.

This reverts commit 4b9112e. A separate issue (#96353) has been opened to track it.

Add test showing unnecessary cost computations, as no vector VPlans are generated.

This is follow-up for #78901 after validation.

This pattern flattens vector.gather ops by unrolling the outermost dimension for rank > 2 vectors. There are two issues with this pattern for scalable vectors: 1. The unrolling doesn't take vscale into account. A constraint is added to disable this pattern for vectors with leading scalable dims. 2. The scalable dims are dropped when creating the new gather. Fixed by propagating the flags. Depends on #96049.

In some cases, no vector VPlans can be constructed due to failing VPlan legality checks (e.g. unable to perform sinking for first order recurrences or plans being incompatible with EVL). There's no need to compute costs in those cases, so check directly if there are no vector plans.

Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.
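
The lazy approach can be sketched as follows. `LazyPassNameMap` and its methods are hypothetical names for this example, not the LLVM interface; the point is only that registered callbacks run on the first lookup and never again.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class LazyPassNameMap {
public:
  using Callback = std::function<void(LazyPassNameMap &)>;

  // Registration is cheap: the callback is stored, not run.
  void registerCallback(Callback CB) { Pending.push_back(std::move(CB)); }

  void add(std::string ClassName, std::string PassName) {
    Names.emplace(std::move(ClassName), std::move(PassName));
  }

  // The first lookup runs all pending callbacks; later lookups do not.
  std::string lookup(const std::string &ClassName) {
    std::vector<Callback> ToRun;
    ToRun.swap(Pending);
    for (Callback &CB : ToRun)
      CB(*this);
    auto It = Names.find(ClassName);
    return It == Names.end() ? std::string() : It->second;
  }

private:
  std::vector<Callback> Pending;
  std::unordered_map<std::string, std::string> Names;
};

// Returns how many times the population callback ran.
int demoLazyLookup() {
  LazyPassNameMap Map;
  int Runs = 0;
  Map.registerCallback([&Runs](LazyPassNameMap &M) {
    ++Runs;
    M.add("DemoPassClass", "demo-pass");
  });
  assert(Runs == 0 && "nothing runs at registration time");
  Map.lookup("DemoPassClass");
  Map.lookup("DemoPassClass");
  return Runs;
}
```

If `lookup` is never called, the population cost is never paid, which is the property the commit relies on.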

This PR adds SPIR-V extension SPV_KHR_cooperative_matrix that "adds a new set of types known as "cooperative matrix" types, where the storage for and computations performed on the matrix are spread across a set of invocations such as a subgroup" (see https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_cooperative_matrix.asciidoc). This PR also fixes llvm/llvm-project#96170; a new test case is attached (llvm/test/CodeGen/SPIRV/transcoding/OpPtrCastToGeneric.ll).

…belongs to the same module This patch extracts the logic for deciding whether module units belong to the same module into a member function of ASTContext. This will help with refactoring the implementation in the future.

Some passes reference *this (inside decltype) which fails with MSVC. Fix this by not explicitly specifying the captures (otherwise we would get an unused lambda capture warning for cases where this is *not* used).

When the EvalEmitter is inactive, it will simply not evaluate any of the operations we emit via emit*. However, it will still allocate variables. So the variables will be allocated, but we won't evaluate their initializer, so later when we see the variable again, it is uninitialized. Stop creating variables in that case.

SIZEOF and C_SIZEOF were broken for assumed-ranks because `TypeAndShape::MeasureSizeInBytes` behaved as a scalar because the `TypeAndShape::shape_` member was the same for scalar and assumed-ranks. The easy fix would have been to add special handling in `MeasureSizeInBytes` for assumed-ranks using the TypeAndShape attributes, but I think this solution would leave `TypeAndShape::shape_` manipulation fragile to future developers. Hence, I went for the solution that turns shape_ into a `std::optional<Shape>`.

…#96321)" My attempt to fix the Windows build made things worse, revert entirely for now. This reverts commit e7137f2. This reverts commit 6eaf204. This reverts commit 957dc43.

CONFLICT (content): Merge conflict in llvm/lib/IR/Verifier.cpp

With the Makefile generator and particularly high build parallelism some intermediate dependencies may be generated redundantly and concurrently, leading to build failures. To fix this, arrange for libclc's add_custom_commands to depend on targets in addition to files. This follows CMake documentation's[^1] guidance on add_custom_command:

> Do not list the output in more than one independent target that may
> build in parallel or the instances of the rule may conflict. Instead,
> use the add_custom_target() command to drive the command and make the
> other targets depend on that one.

Eliminating the redundant commands also improves build times.

[^1]: https://cmake.org/cmake/help/v3.29/command/add_custom_command.html

…ameModule all the time Previously, we decided whether two module units are in the same module by comparing the names of their primary module interfaces. But always comparing the strings is inefficient, and it is better to avoid the expensive string operations where possible. In this patch, we introduce a `llvm::StringMap` to map a primary module name to a Module* and a `llvm::DenseMap<Module*, Module*>` to map a Module* to a representative Module*. The representative Module* is one of the module units belonging to a certain module. Module units that have the same representative Module* belong to the same module. We choose the representative Module* on the first module lookup for a certain primary module name, so later module units with the same primary module name get the same representative module. As a result, for every module there is only one hash of the primary module name.
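
A rough sketch of the two-map scheme, using standard containers in place of `llvm::StringMap` and `llvm::DenseMap`. The `Module` struct and `SameModuleCache` class are simplified assumptions for illustration and not the Clang implementation.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Simplified stand-in for a module unit.
struct Module {
  std::string PrimaryName;
};

class SameModuleCache {
public:
  bool isInSameModule(const Module *A, const Module *B) {
    if (A == B)
      return true;
    return representative(A) == representative(B);
  }

private:
  const Module *representative(const Module *M) {
    assert(M != nullptr);
    auto Cached = RepOf.find(M);
    if (Cached != RepOf.end())
      return Cached->second; // pointer lookup only, no string work
    // First query for this unit: the name is hashed here. The first unit
    // seen for a name becomes the representative of that module.
    auto Result = ByName.try_emplace(M->PrimaryName, M);
    const Module *Rep = Result.first->second;
    RepOf.emplace(M, Rep);
    return Rep;
  }

  std::unordered_map<std::string, const Module *> ByName;
  std::unordered_map<const Module *, const Module *> RepOf;
};
```

After the first query for a given unit, repeated queries compare pointers only.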

Prefer to check .empty() instead of .size() == 0

…ml (#68846) List the following C++23-era WG21 papers as Defect Reports in cxx_status.html as per WG21 meeting minutes.
- [P1949R7](https://wg21.link/p1949r7) (C++ Identifier Syntax using Unicode Standard Annex 31)
- [P2156R1](https://wg21.link/p2156r1) (Allow Duplicate Attributes)
- [P2036R3](https://wg21.link/p2036r3) (Change scope of lambda _trailing-return-type_)
- [P2468R2](https://wg21.link/p2468r2) (The Equality Operator You Are Looking For)
- [P2327R1](https://wg21.link/p2327r1) (De-deprecating `volatile` compound operations)
- [P2493R0](https://wg21.link/p2493r0) (Missing feature test macros for C++20 core papers)
- [P2513R3](https://wg21.link/p2513r3) (`char8_t` Compatibility and Portability Fix)
- [P2460R2](https://wg21.link/p2460r2) (Relax requirements on `wchar_t` to match existing practices)
- [P2579R0](https://wg21.link/p2579r0) (Mitigation strategies for [P2036](https://wg21.link/p2036) "Changing scope for lambda _trailing-return-type_")

Previously, this only supported 1-D vectors via vector.shuffle, with the new vector.deinterleave this can be updated to support n-D vectors.

Fix for Fujitsu test suite test: 0275_0032.f90. The MLIR to LLVM translation logic assumed that reduction arguments to an `omp.parallel` op are always the last set of arguments to the op. However, this is a wrong assumption since private args come afterward.

The PR adds py binding for `AsyncTokenType`

…tic (#96292) We have been running into source location exhaustion recently and want to use the statistics to monitor the usage in various files to be able to anticipate where the next problem will happen. I picked `Statistic` because it can be written into a structured JSON file and is easier to consume by further automation. This commit does not change any existing per-source-manager metrics exposed via `SourceManager::PrintStats()`. This does create some redundancy, but I also expect it to be non-controversial because it aligns with the intended use of `Statistic`.

…. (#96305) At the moment, vectorization is only enabled in streaming(-compatible) mode when enabled through an option. But the interfaces should check more than just 'hasSVE()', because a function with +sme in streaming mode should also vectorize with the option enabled. Additionally, a streaming-compatible function should only be able to use fixed-length autovec if SVE is available, otherwise the vector code will be scalarised by the backend.

Ideally, this would be based on target information (but we don't really have that), so this currently errs on the side of caution. If possible, gathers/scatters should be lowered to regular vector loads/stores before invoking enable-arm-streaming.

I'll be splitting the ctlz/cttz tests into separate test files shortly

Methods were removed in dc37dc8.

Introduce `nonblocking` and `nonallocating` attributes. RFC is here: https://discourse.llvm.org/t/rfc-nolock-and-noalloc-attributes/76837 This PR introduces the attributes, with some changes in Sema to deal with them as extensions to function (proto)types. There are some basic type checks, most importantly, a warning when trying to spoof the attribute (implicitly convert a function without the attribute to one that has it). A second, follow-on pull request will introduce new caller/callee verification. --------- Co-authored-by: Doug Wyatt <[email protected]> Co-authored-by: Shafik Yaghmour <[email protected]> Co-authored-by: Aaron Ballman <[email protected]> Co-authored-by: Sirraide <[email protected]>

Currently the AVLReg is printed raw like {AVLReg=2147483668, ...}, this changes it to {AVLReg=%20, ...} which should be easier to read.
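
The relationship between the raw number and the `%20` form can be sketched as below. This assumes the high bit of the 32-bit register number marks a virtual register, which is consistent with 2147483668 being 2^31 + 20. The names `VirtualRegFlag`, `isVirtual`, and `formatVirtReg` are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumption: the top bit of the register number marks a virtual register.
constexpr uint32_t VirtualRegFlag = 1u << 31;

bool isVirtual(uint32_t Reg) { return (Reg & VirtualRegFlag) != 0; }

// Formats a virtual register as "%<index>" instead of the raw number.
std::string formatVirtReg(uint32_t Reg) {
  assert(isVirtual(Reg) && "sketch only handles virtual registers");
  return "%" + std::to_string(Reg & ~VirtualRegFlag);
}
```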

Long story short the interaction of two optimizations happening in GlobalOpt results in a crash. For more details look at the issue llvm/llvm-project#96197. I will be fixing this in GlobalOpt but it is a conservative solution since it won't allow us to optimize resolvers which return a pointer to a function whose definition is in another TU when compiling without LTO:
```
__attribute__((target_version("simd"))) void bar(void);
__attribute__((target_version("default"))) void bar(void);
int foo() { bar(); }
```
fixes: #96197

… MinGW targets (#96062) libstdc++ requires this define to match what is predefined in GCC for the ABI of this platform; GCC hardcodes this define for all mingw configurations in gcc/config/i386/cygming.h. (It also defines __GXX_MERGED_TYPEINFO_NAMES=0, but that happens to match the defaults in libstdc++ headers, so there's no similar need to define it in Clang.) This fixes a Clang/libstdc++ interop issue discussed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110572.

[basic.link]/p10: > If two declarations of an entity are attached to different modules, > the program is ill-formed But we only implemented the check for ODR. In this patch, we tried to diagnose the redeclarations from different modules.

fixes #91216

- Use Feature1_5xVGPRs

- move type insertion from individual parse methods into ParseTypeFromDWARF - optimize sentinel (TYPE_IS_BEING_PARSED) insertion to avoid double map lookup - as this requires the map to not have nullptr values, I've replaced all `operator[]` queries with calls to `lookup`.
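
The single-lookup sentinel insertion might look roughly like the sketch below, which uses `std::unordered_map::try_emplace` in place of the LLDB containers. `TypeCache`, `Type`, and the integer DIE offsets are simplified assumptions; only the sentinel name is taken from the commit message.

```cpp
#include <cassert>
#include <deque>
#include <unordered_map>

struct Type {
  int Id;
};

// Sentinel marking a type whose parse is in progress.
static Type BeingParsedStorage{-1};
static Type *const TYPE_IS_BEING_PARSED = &BeingParsedStorage;

class TypeCache {
public:
  // Query without inserting, so the map never gains nullptr values
  // (operator[] would default-insert one).
  Type *lookup(int DieOffset) const {
    auto It = Map.find(DieOffset);
    return It == Map.end() ? nullptr : It->second;
  }

  Type *parse(int DieOffset) {
    // One lookup both checks for an entry and inserts the sentinel.
    auto Result = Map.try_emplace(DieOffset, TYPE_IS_BEING_PARSED);
    if (!Result.second)
      return Result.first->second; // already parsed, or in progress
    // References to unordered_map values stay valid across rehashing.
    Type *&Slot = Result.first->second;
    Type *Parsed = &Storage.emplace_back(Type{DieOffset});
    assert(Slot == TYPE_IS_BEING_PARSED);
    Slot = Parsed;
    return Parsed;
  }

private:
  std::unordered_map<int, Type *> Map;
  std::deque<Type> Storage; // element addresses are stable on emplace_back
};
```

A `find` followed by `operator[]` would hash the key twice, which is what `try_emplace` avoids.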

…ff for #95904

/llvm-project/clang/lib/Sema/SemaType.cpp:7625:8: error: unused variable 'Success' [-Werror,-Wunused-variable] bool Success = FX.insert(NewEC, Errs); ^ 1 error generated.

Ztso 1.0 was ratified in January 2023. Documentation: https://github.com/riscv/riscv-isa-manual/blob/main/src/ztso-st-ext.adoc

change the expected error msg.

'a' is an input/output constraint for restraining assembly variables to an indexed or indirect address operand. It previously was marked as supported but would throw an assertion for unknown constraint type in the back-end when this test case was compiled. This change marks it as unsupported until we can add full support for address operands constraining to the compiler code generation.

Make it easier to access additional state from it.

…nd no self-ref (#94642) In that scalar case, the Init should initialize the auto var before use. The Init might use uninitialized memory from other sources (e.g., heap) but auto-init did not help us in that case because the auto-init would have been overwritten by the Init before use. For non-scalars e.g., classes, the Init expr might be a ctor call that leaves uninitialized members, so we leave the auto-init there. The motivation is to have less IR for the optimizer to later remove, which may not be until a fairly late pass (DSE) or may not get optimized in lower optimization levels like O1 (no DSE) or sometimes due to derefinement. This is ~10% less left-over auto-init in O1 in a few examples checked.

…#96321) (#96462) On MSVC the `this` uses inside `decltype` require a lambda capture. On clang they result in an unused capture warning instead. Add the capture and suppress the warning with `(void)this`. ----- Initializing this map is somewhat expensive (especially for O0), so we currently only do it if certain flags are used. I would like to make use of it for crash dumps (#96078), where we don't know in advance whether it will be needed or not. This patch changes the initialization to a lazy approach, where a callback is registered that does the actual initialization. The callbacks will be run the first time the pass name is requested. This way there is no compile-time impact if the mapping is not used.

… (#95290) With this change, Clang will generate errors when trylock functions have improper return types. Today, it silently fails to apply the trylock attribute to these functions which may incorrectly lead users to believe they have correctly acquired locks before accessing guarded data. As a side effect of explicitly checking the success argument type, I seem to have fixed a false negative in the analysis that could occur when a trylock's success argument is an enumerator. I've added a regression test to warn-thread-safety-analysis.cpp named `TrylockSuccessEnumFalseNegative`. This change also improves the documentation with descriptions of the subtle gotchas that arise from the analysis interpreting the success arg as a boolean. Issue #92408

This also updates the status for C11 to be Partial, and because C17 is C11 plus DR resolutions, that makes C17 also Partial.

Out-of-range extractelement returns poison, and so do poison elements in the shufflevector mask.

Only the first proposed changes in the paper were adopted, and that wording was changing "operations" into "operators", which is purely an editorial change.

This paper proposes only changes to a footnote that had problematic implications for ABI; the changes were purely editorial.

This is very similar to https://reviews.llvm.org/D105553, in fact, I barely made any changes from MallocChecker's ownership visitor to this one. The new visitor emits a diagnostic note for a function where a change in stream ownership was expected (for example, it had a fclose() call), but the ownership remained unchanged. This is similar to messages regarding ordinary values ("Returning without writing to x").

…st (#96494) Without a newline, documentation was failing to build with this error: Warning, treated as error: /home/runner/work/llvm-project/llvm-project/clang-build/tools/clang/docs/ThreadSafetyAnalysis.rst:466:Error in "code-block" directive: maximum 1 argument(s) allowed, 10 supplied. Issue #92408

Clang has implemented __VA_OPT__ since Clang 12.

Ultimately doesn't matter because the bitcode reader interprets undef and poison interchangeably in this context.

Test Plan: llvm-lit llvm-project/lldb/test/API/python_api/find_in_memory/TestFindInMemory.py llvm-project/lldb/test/API/python_api/find_in_memory/TestFindRangesInMemory.py Reviewers: clayborg Tasks: lldb

This PR relands #95942, which was reverted in #96332 due to link failures. It fixes the issue by updating CMake dependencies. The bazel support, originally introduced in #96334, is also included in this PR. --------- Co-authored-by: Keith Smiley <[email protected]>

This paper was a clarification paper that made no normative changes to the wording, so we can lean on the C99 status for this.

…to be a constant (#96286) This PR is to fix llvm/llvm-project#96285 by: * improve pattern matching to recognize an aggregate constant to be a constant * do not emit Bitcast for an aggregate type

This reverts commit 9cd6ef4. See discussion on review thread.

After #94815, this is only used within ModuleToPostOrderCGSCCPassAdaptor::run(), so keep it local to that function.

… (#96108) Transformational functions from the intrinsic module ISO_C_BINDING are allowed in specification expressions, so tweak some general checks that would otherwise trigger error messages about inadmissible targets, dummy procedures in specification expressions, and pure procedures with impure dummy procedures.

When the second argument to these intrinsic functions is a scalar constant zero, emit a warning (if enabled) even if the first argument is not a constant.

Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.

This commit adds a unit test for SBBreakpoint::SetCallback as it wasn't being tested before.

Necessary for arm32 cross full build.

Catch some cases where assumed rank dummy arguments are not allowed.

This patch adds support for the new foreign type unit support in .debug_names. Features include: - don't manually index foreign TUs if we have info for them - only use the type unit entries that match the .dwo files when we have a .dwp file - fix type unit lookups for .dwo files - fix crashers that happen due to PeekDIEName() using wrong offsets where an entry had DW_IDX_comp_unit and DW_IDX_type_unit entries and when we had no type unit support, it would cause us to think it was a normal DIE in .debug_info from the main executable. --------- Co-authored-by: paperchalice <[email protected]>

Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc5.

These tests were accidentally removed in a7a1195. I only meant to remove bfloat tests, but I accidentally removed f32 and f64 as well.

We can't pass an empty string to addUndefined(). This fixes the crash that was encountered in llvm/llvm-project#93309 (turning the crash into a properly handled error; making it do the right thing is handled in llvm/llvm-project#96055).

The allocatable strings also use DIStringType but provide dwarf expressions to find the location and length of the string. With this change in place, the debugging of the allocatable strings looks like this:
    character(len=:), allocatable :: first
    character(len=:), allocatable :: second
    character(len=:), allocatable :: third
    first = 'Mount'
    second = 'Everest'
    third = first // " " // second
    print *, third
    (gdb) p third
    $1 = ""
    (gdb) n
    18 print *, third
    (gdb) p third
    $2 = 'Mount Everest'
    (gdb) ptype third
    type = character (13)

Since `assignAddresses` is executed more than once, error reporting during `assignAddresses` would be duplicated. Generalize #66854 to cover more errors. Note: address-related errors exposed in one invocation might not be errors in another invocation. Pull Request: llvm/llvm-project#96361

…… (#96244) …ne continuation Allow preprocessing directives to appear between a source line and its continuation, including conditional compilation directives (#if, #ifdef, &c.). Fixes llvm/llvm-project#95476.

It's an lvalue, so we need to use the classify() taking an expression.

When a data transfer statement references a unit number that hasn't been explicitly OPENed, the runtime I/O support library opens a local "fort.N" file where N is the unit number. If that name exists in the current working directory but is not a readable or writable file (as appropriate), the runtime needs to catch the error at the point of the READ or WRITE statement rather than leaving an open unit in the unit map without a valid file descriptor.

Extend the runtime validation of deallocated pointers so that it also works when pointers are allocated and/or deallocated outside Fortran. Previously, bogus runtime errors would be reported for pointers allocated via CFI_allocate() and deallocated in Fortran, and CFI_deallocate() did not check that it was deallocating a whole contiguous pointer that was allocated as such.

…5312) Currently, LLDB does not support taking a minidump over the 4.2 GB limit imposed by uint32. In fact, it currently writes the RVAs and the headers to the end of the file, which can become corrupted because the header offset only supports a 32-bit offset. This change reorganizes how the file structure is laid out. LLDB will precalculate the number of directories required and preallocate space at the top of the file to fill in later. Additionally, thread stacks require a 32-bit offset, so we provision empty descriptors and keep track of them to clean up once we write the 32-bit memory list. For [MemoryList64](https://learn.microsoft.com/en-us/windows/win32/api/minidumpapiset/ns-minidumpapiset-minidump_memory64_list), the RVA to the start of the section itself will remain in a 32-bit addressable space. We achieve this by predetermining the space the memory regions will take, and only writing up to 4.2 GB of data with some buffer to allow all the MemoryDescriptor64s to also still be 32-bit addressable. I did not add any explicit tests to this PR because allocating 4.2 GB+ to test is very expensive. However, we have 32-bit automation tests and I validated it in several ways, including with a 5 GB+ array/object, and would be willing to add this as a test case.

Support IR translation for scalable vector store

…ontraction ops (#95710) This patch introduces pattern rewrites for reducing the rank of named linalg contraction ops with unit spatial dim(s) to other named contraction ops. For example, `linalg.batch_matmul` with batch size 1 -> `linalg.matmul`, and `linalg.matmul` with unit LHS spatial dim -> `linalg.vecmat`, etc. These patterns don't support reducing the rank along the reduction dimension, as those don't convert to other named contraction ops.

This PR fixes #55781 by adding the `--no-wasm-opt` and `--wasm-opt` flags in clang to disable/enable the `wasm-opt` optimizations. The default is to enable `wasm-opt` as before in order to not break existing workflows. I think that adding a warning when no flag or the `--wasm-opt` flag is given but `wasm-opt` wasn't found in the path may be relevant here. It lets people using `wasm-opt` know whether it has been used on their produced binary or not. The only downside I see to this is that people already using the toolchain with the `-O` and `-Werror` flags but without `wasm-opt` in the path will see their toolchain break (with an easy fix: either adding `--no-wasm-opt` or adding `wasm-opt` to the path). I haven't implemented this here because I haven't figured out how to add such a warning, and I don't know if this warning should be added here or in another PR. CC @sunfishcode, who proposed in the associated issue to review this patch.

When printing JSON output with --dynamic-table I noticed that the output is invalid JSON. This patch overrides the printDynamicTable() function in the JSONELFDumper to return a list of dictionaries for the DynamicSection value. Before the output was:
```
{
  "FileSummary": {
    "File": "bin/llvm-readelf",
    "Format": "elf64-x86-64",
    "Arch": "x86_64",
    "AddressSize": "64bit",
    "LoadName": "<Not found>"
  }DynamicSection [ (35 entries)
  Tag                Type    Name/Value
  0x000000000000001D RUNPATH Library runpath: [$ORIGIN/../lib:]
  0x0000000000000001 NEEDED  Shared library: [libm.so.6]
  0x0000000000000001 NEEDED  Shared library: [libz.so.1]
  0x0000000000000001 NEEDED  Shared library: [libzstd.so.1]
```
Now the output looks like:
```
"DynamicSection": [
  {
    "Tag": 29,
    "Type": "RUNPATH",
    "Value": 6322,
    "Path": [
      "$ORIGIN/../lib"
    ]
  },
  {
    "Tag": 1,
    "Type": "NEEDED",
    "Value": 6109,
    "Library": "libm.so.6"
  },
```
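As a rough illustration of why the list-of-dictionaries shape matters, the sketch below round-trips the two entries quoted above through Python's `json` module and filters the `NEEDED` libraries. The key names and values are copied from the sample output; wrapping the list in a top-level object is an assumption made only so the snippet parses on its own.

```python
import json

# Entries copied from the sample "after" output in the commit message.
dynamic_section = [
    {"Tag": 29, "Type": "RUNPATH", "Value": 6322, "Path": ["$ORIGIN/../lib"]},
    {"Tag": 1, "Type": "NEEDED", "Value": 6109, "Library": "libm.so.6"},
]

# Assumption: the list lives under a "DynamicSection" key of an enclosing object.
text = json.dumps({"DynamicSection": dynamic_section}, indent=2)

# Because the output is now valid JSON, a consumer can parse and query it directly.
parsed = json.loads(text)
needed = [e["Library"] for e in parsed["DynamicSection"] if e["Type"] == "NEEDED"]
```

The old output could not reach the `json.loads` step at all, since the table text followed the closing brace of `FileSummary` without a key or separator.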

This is to unblock #95007. Will investigate why the assertion is failing on some arch.

This updates Clang's extension criteria to explicitly mention impacts on other projects within the monorepo. These changes were discussed in the following RFC: https://discourse.llvm.org/t/rfc-require-discussion-of-impact-to-monorepo-stakeholders-when-adding-new-clang-extensions/79613

This paper clarified the lifetime of compound literal objects in odd scopes, such as use at function prototype scope. We do not currently implement this paper, as the new test demonstrates.

This also fixes up some asserts in copyPhysReg, loadRegFromStackSlot and storeRegToStackSlot.

…(#96482) One reason to want to split this up is to simplify the code added in #93802, where it checks the SME streaming-mode requirements for a builtin by checking for the absence of SVE. If the target guards are separate, we can generate a table and simplify the Sema code that verifies the runtime mode. Another reason is to avoid an issue with a check in SveEmitter.cpp where it ensures that the 'VerifyRuntimeMode' is set correctly for functions that have both SVE and SME target guards:
    if (!Def->isFlagSet(VerifyRuntimeMode) && Def->getGuard().contains("sve") &&
        Def->getGuard().contains("sme"))
      llvm_unreachable("Missing VerifyRuntimeMode flag");
However, if we ever add a new feature with "sme" in the name, even though it is unrelated to FEAT_SME, then this code no longer works. Note that the arm_sve.td and arm_sme.td files could do with a bit of restructuring after this, but it seems better to follow that up in an NFC patch.

…390) Detected with ASAN. `Operation::getLoc()` was called after erasing the operation. Reverts 48cf6b6, which attempted to fix the use-after-free. (But the use-after-free is still there when the `hasFailed` branch is taken.)

…rt (#85466) Add support to the runtime for 6.0 spec features that allow num_threads clause to take a list, and also make use of the strict modifier. Provides new compiler interface functions for these features.

The passed SCC is the current SCC we're working on.

If you're building and vendoring lldb, you might need to also vendor these files.

The static verifier flagged dead code in the check since the loop will only execute once and never reach the iterator increment. The loop needs to iterate twice to correctly diagnose when a statement is after the teams. Since there are two iterations again, reset the iterator to the first teams directive when the double teams case is seen so the diagnostic can report both locations.

This applies to the AOT case where we embed models in the compiler. The change adds support for multiple models for the same agent, and allows the user to select one via a command line flag. "agent" refers to e.g. the inline advisor or the register allocator eviction advisor. To avoid build setup complexity, the support is delegated to the saved model. Since saved models define computational graphs, we can generate a composite model (this happens prior to building and embedding it in LLVM and is not shown in this change) that exposes an extra feature with a predefined name: `_model_selector`. The model, then, delegates internally to contained models based on that feature value. Model selection is expected to happen at model instantiation; there is no current scenario for switching them afterwards. If the model doesn't expose such a feature but the user passes one, we report an error. If the model exposes such a feature but the user doesn't pass one, we also report an error. Invalid model selector values are expected to be handled by the saved model. Internally, the model uses a pair of uint64 values - the high and low of the MD5 hash of the name. A tool composing models would, then, need to:
- expose the extra feature, `_model_selector`, shape (2,), uint64 data type
- test its value (`tf.cond` or `tf.case` in Tensorflow) against the MD5 hash, in the [high, low] order, of contained models based on a user-specified name (which the user will then use as flag value to the compiler)

Agents just need to add a flag to capture the name of a model and pass it to `ReleaseModeModelRunner` at construction. This can be passed in all cases without checking - in the case where the model is not composite and we pass an empty name, everything works as before. This change also factors out the string flags we pass to the `ReleaseModeModelRunner` for better maintainability (we risk confusing parameters that are strings otherwise).
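A tool that composes models needs the same pair of uint64 values the compiler derives from the model name. The sketch below shows one way such a pair could be computed. The function name `model_selector` is hypothetical, and the way the 16-byte digest is split into high and low words (first eight bytes as the low word, last eight as the high word, both little-endian) is an assumption that should be checked against LLVM's MD5 support code before use.

```python
import hashlib
import struct

def model_selector(name):
    """Return the (high, low) uint64 pair derived from the MD5 hash of `name`."""
    digest = hashlib.md5(name.encode("utf-8")).digest()  # always 16 bytes
    # Assumption: low word = first 8 bytes, high word = last 8 bytes,
    # each interpreted as an unsigned little-endian 64-bit integer.
    low, high = struct.unpack("<QQ", digest)
    return high, low

# Value a composite model would compare its `_model_selector` feature against,
# in the [high, low] order the commit message describes.
selector = list(model_selector("my_inliner_model"))  # the name is a made-up example
```

A composite model would compare the incoming `_model_selector` tensor of shape (2,) against one such pair per contained model.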

Static verifier noticed the current code has logically dead code parsing the clause where IsComma is assigned. Fix this and improve the error message received when a bad adjust-op is specified. This will now be handled like 'map' where a nice diagnostic is given with the correct values, then parsing continues on the next clause reducing unhelpful diagnostics.

…165) For unbuffered smem loads, it is illegal for the immediate offset to be negative if the resulting IOFFSET + (SGPR[Offset] or M0 or zero) is negative. New PR of llvm/llvm-project#79553.

This patch makes libc++ build with -fsized-deallocation. That flag is enabled by default in recent versions of Clang, so this patch will make libc++ forward-compatible with ToT Clang.

This is necessary for 32b platforms such as ARM and i386. Link: #94128

mig is a tool vendored with Xcode. Using apple_genrule makes sure that the bazel selected version of Xcode is preferred, and that the action is invalidated when that version changes.

- Adds a helper function for checking whether an argument is a [grid_constant](https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#supported-properties).
- Adds support for cvta.param using changes from llvm/llvm-project#95289
- Supports escaped grid_constant pointers conservatively, by casting all uses to the generic address space with cvta.param.

733b8b2 ([LAA] Simplify identification of speculatable strides [nfc]) refactored getStrideFromPointer() to compute directly on SCEVs, and return an SCEV expression instead of a Value. However, it left behind a call to getUniqueCastUse(), which is completely unnecessary. Remove this, showing a positive test update, and simplify the surrounding program logic.

…rounding modes. (#95736)
- Algorithm:
  - Step 1 - Range reduction: for a double precision input `x`, return `k` and `u` such that
    - k is an integer
    - u = x - k * pi / 128, and |u| < pi/256
  - Step 2 - Calculate `sin(u)` and `cos(u)` in double-double using Taylor polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.
  - Step 3 - Calculate `sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) * sin(u)` using look-up table for `sin(k*pi/128)` and `cos(k*pi/128)`.
  - Step 4 - Use Ziv's rounding test to decide if the result is correctly rounded.
  - Step 4' - If the Ziv's rounding test failed, redo step 1-3 using 128-bit precision.
- Currently, without FMA instructions, the large range reduction only works correctly for the default rounding mode (FE_TONEAREST).
- Provide `LIBC_MATH` flag so that users can set `LIBC_MATH = LIBC_MATH_SKIP_ACCURATE_PASS` to build the `sin` function without step 4 and 4'.
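Steps 1 to 3 can be sketched in plain double precision as follows. This is only an illustration of the structure, not the libc implementation: it uses ordinary doubles instead of double-double, skips the Ziv rounding test and the 128-bit fallback, and its naive range reduction is only meaningful for moderate `|x|`. All function and table names are hypothetical.

```python
import math

# Step 3 tables: sin and cos of k*pi/128 for one full period (k = 0..255).
SIN_TABLE = [math.sin(k * math.pi / 128.0) for k in range(256)]
COS_TABLE = [math.cos(k * math.pi / 128.0) for k in range(256)]

def range_reduce(x):
    """Step 1: return integer k and remainder u with x ~= k*pi/128 + u."""
    k = round(x * 128.0 / math.pi)
    u = x - k * (math.pi / 128.0)
    return k, u

def taylor_sin_cos(u):
    """Step 2: short Taylor polynomials; adequate because |u| is at most about pi/256."""
    u2 = u * u
    sin_u = u * (1 - u2 / 6 * (1 - u2 / 20 * (1 - u2 / 42)))
    cos_u = 1 - u2 / 2 * (1 - u2 / 12 * (1 - u2 / 30 * (1 - u2 / 56)))
    return sin_u, cos_u

def sin_sketch(x):
    """Step 3: combine table values with the polynomial results."""
    k, u = range_reduce(x)
    sin_u, cos_u = taylor_sin_cos(u)
    idx = k % 256  # sin and cos of k*pi/128 repeat every 256 steps
    return SIN_TABLE[idx] * cos_u + COS_TABLE[idx] * sin_u
```

The angle-addition identity in step 3 is what lets a 256-entry table plus a tiny polynomial cover the whole input range; the accuracy guarantees in the commit come from the extra precision and rounding test that this sketch leaves out.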

Reverting to unblock macOS buildbots which are currently failing on these tests. https://green.lab.llvm.org/job/llvm.org/view/LLDB/job/as-lldb-cmake/6377/

As suggested in https://github.com/llvm/llvm-project/pull/86609/files#r1556689262 an API for getting the number of branch weights directly from the MD node would be useful in a variety of checks, and keeps the logic within ProfDataUtils.

Using the hashes of binary and profiled functions to recover functions with changed names. Test Plan: added hashing-based-function-matching.test.
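The idea can be illustrated with a small sketch, assuming each function is summarized by a single hash value and that name matches take priority over hash matches. The function `match_by_hash` and its data layout are hypothetical and are not taken from the patch.

```python
def match_by_hash(binary_funcs, profiled_funcs):
    """Map profiled function names to binary function names.

    Both arguments map a function name to its hash. Functions whose names
    agree are matched directly; each remaining profiled function is paired
    with an unmatched binary function that has the same hash.
    """
    matches = {name: name for name in profiled_funcs if name in binary_funcs}
    by_hash = {}
    for name, h in binary_funcs.items():
        if name not in matches:
            by_hash.setdefault(h, name)  # keep the first unmatched function per hash
    for name, h in profiled_funcs.items():
        if name not in matches and h in by_hash:
            matches[name] = by_hash.pop(h)  # each binary function is used once
    return matches
```

For example, a profile entry `bar` whose hash equals that of the binary's `bar_v2` would be recovered as a rename, while a profile entry with no matching hash stays unmatched.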

This was added back in 7f6e331, but I forgot to update the docs that referenced it.

Reverts llvm/llvm-project#95821

This is a second attempt to land #95007.
Test Plan: llvm-lit llvm-project/lldb/test/API/python_api/find_in_memory/TestFindInMemory.py llvm-project/lldb/test/API/python_api/find_in_memory/TestFindRangesInMemory.py
Reviewers: clayborg
Tasks: lldb

Summary: This test fails due to alignment issues; it's likely that it's misaligned on other targets too and they just don't crash on it. @PiJoules maybe we should run this with ubsan?

…e_t (#96564) In #95312 I incorrectly set `m_expected_directories` to uint, this broke the windows build and is the incorrect type. `size_t` is more accurate because this value only ever represents the expected upper bound of the directory vector.

…emit Flang's resource directory" (#96557) Reverts llvm/llvm-project#90886 These changes broke linking to compiler-rt on Windows

This patch adds derf as an alternate spelling for the erf intrinsic. This spelling is supported by multiple other compilers and used by WRF.

This is a follow-up patch for #74199

…which has been addressed by #76555

We are running into the NVPTX backend generating wrong code for an input:
```
%0 = llvm.nvvm.mma.m?n?k?.row.col.??? (...)
if laneid == 0:
  ret
else:
  store %0
```
The backend reorders the instruction (as an effect of the `MachineSink` pass) to
```
if laneid == 0:
  ret
else:
  %0 = llvm.nvvm.mma.m?n?k?.row.col.??? (...)
  store %0
```
This is incorrect because `mma` is a warp instruction which needs all threads to sync before performing the operation instead of being guarded by a specific thread id. It should be similar to the shuffle instruction `shfl` in terms of warp level sync, and `shfl` is marked as `isConvergent = true`. Apply `isConvergent = true` to `mma` instructions.

The test failed after llvm/llvm-project@8ad32ce. In https://github.com/gcc-mirror/gcc/blob/master/gcc/common/config/i386/i386-cpuinfo.h, FEATURE_AVX512CD = 23 and FEATURE_AVX512VBMI = 26, so we should only add 2 features between them. New features should be inserted at the end.

CONFLICT (content): Merge conflict in libclc/CMakeLists.txt

The options regarding which blank lines are kept are also aggregated. The new option is `KeepEmptyLines`.

…731) This is a follow-up for #65215. The mentioned regression was fixed in MSVC 19.39 (VS 17.9.0), so it makes sense not to apply the fix for that (and newer) compiler versions. Like the original change, this patch is narrowly scoped to not affect any other compiler.

…Lists.txt (#96330) This is essentially a revert of 9853e9b which tried removing duplication in the Windows config files by moving it to the CMake. However, we want to decouple the CMake and the test suite as much as possible, so encoding additional (non-official) Lit parameters in the CMake only as a code reuse mechanism is not an approach we want to take.

`GetDirectCallee` can be null. Fixes #96498.

CONFLICT (content): Merge conflict in clang/lib/CodeGen/CodeGenModule.cpp

…gRef). NFC

…s manager (#96378) Follows #95879.

This reverts commit 245491a ("[MC] Disable MCAssembler based constant folding for DwarfDebug") and cb09b5f ("[MC] Disable MCAssembler based constant folding for compact unwind and emitJumpTableEntry"). Checking the relative order of FA and FB is now faster due to de19f7b ("[MC] Replace fragment ilist with singly-linked lists").

…rguments (#96207) This commit simplifies the handling of dropped arguments and updates some dialect conversion documentation that is outdated. When converting a block signature, a `BlockTypeConversionRewrite` object and potentially multiple `ReplaceBlockArgRewrite` are created. During the "commit" phase, uses of the old block arguments are replaced with the new block arguments, but the old implementation was written in an inconsistent way: some block arguments were replaced in `BlockTypeConversionRewrite::commit` and some were replaced in `ReplaceBlockArgRewrite::commit`. The new `BlockTypeConversionRewrite::commit` implementation is much simpler and no longer modifies any IR; that is done only in `ReplaceBlockArgRewrite` now. The `ConvertedArgInfo` data structure is no longer needed. To that end, materializations of dropped arguments are now built in `applySignatureConversion` instead of `materializeLiveConversions`; the latter function no longer has to deal with dropped arguments. Other minor improvements: - Improve variable name: `origOutputType` -> `origArgType`. Add an assertion to check that this field is only used for argument materializations. - Add more comments to `applySignatureConversion`. Note: Error messages around failed materializations for dropped basic block arguments changed slightly. That is because those materializations are now built in `legalizeUnresolvedMaterialization` instead of `legalizeConvertedArgumentTypes`. This commit is in preparation of decoupling argument/source/target materializations from the dialect conversion.

The fadd_f64 test was in the middle of some f32 tests.

/llvm-project/clang-tools-extra/clangd/Format.cpp:284:11: error: no member named 'KeepEmptyLinesAtTheStartOfBlocks' in 'clang::format::FormatStyle' Style.KeepEmptyLinesAtTheStartOfBlocks = true; ~~~~~ ^ 1 error generated.

This is another relatively small adjustment to shuffleToIdentity, which has had a few knock-on effects that required a few more changes. It attempts to detect free concats that will be legalized to multiple vector operations, for example when the lanes are '[a[0], a[1], b[0], b[1]]' and a and b are v2f64 under aarch64. In order to do this:
- isFreeConcat detects whether the input has piece-wise identities from multiple inputs that can become a concat.
- A tree of concat shuffles is created to concatenate the input values into a single vector. This is a little different to most other inputs as they are created from multiple values that are being combined together, and we cannot rely on the Lane0 insert location always being valid.
- The insert location is changed to the original location instead of updating per item, which ensures it is valid due to the order that we visit and create items.

Add remove_if() method, similar to the one already present on SetVector. It is intended to replace the following pattern:
    for (Foo *Ptr : Set)
      if (Pred(Ptr))
        Set.erase(Ptr);
With:
    Set.remove_if(Pred);
This pattern is commonly used for set intersection, where `Pred` is something like `!OtherSet.contains(Ptr)`. The implementation provided here is a bit more efficient than the naive loop, because it does not require looking up the bucket during the erase() operation again. However, my actual motivation for this is to have a way to perform this operation without relying on the current `std::set`-style guarantee that erase() does not invalidate iterators. I'd like to stop making use of tombstones in the small regime, which will make insertion operations a good bit more efficient. However, this will invalidate iterators during erase().
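The same interface can be sketched in Python to show the set-intersection use. This is an analogy rather than the LLVM implementation: the helper collects matches before removing them, so it never mutates the set while iterating over it, and the boolean return value (whether anything was removed) is an assumption about the interface.

```python
def remove_if(items, pred):
    """Remove every element of the set `items` for which pred(element) is true.

    Returns True if at least one element was removed (assumed interface).
    """
    doomed = [x for x in items if pred(x)]  # decide first, without mutating
    items.difference_update(doomed)         # then erase in one step
    return bool(doomed)

# Set intersection expressed through remove_if, as in the commit message.
live = {1, 2, 3, 4}
other = {2, 4, 6}
changed = remove_if(live, lambda x: x not in other)
```

Funnelling the erase through one method means callers no longer depend on whether erasing an element keeps outstanding iterators valid.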

…ID from imported modules To support the no-transitive-change model for named modules, we can't reuse type IDs and identifier IDs from imported modules arbitrarily. The theory behind the no-transitive-change model is that a user of a named module can only access the indirectly imported decls via the directly imported module, so that it is possible to control what matters to the users when writing the module. It would be unsafe if the users could reuse the type IDs and identifier IDs from the indirectly imported modules other than via the directly imported modules. So in this patch, we don't reuse the type ID and identifier ID in the AST writer to avoid the problematic case.

We can't deref() them, so return false here.

runDFS() currently performs three hash table lookups. One in the main loop, one when checking whether a successor has already been visited and another when adding parent and reverse children to the successor. We can avoid the two additional lookups by making the parent number part of the stack, and then making the parent / reverse children update part of the main loop. The main loop already has a check for already visited nodes, so we don't have to check this in advance -- we can simply push the node to the worklist and skip it later.
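The restructured loop can be sketched as below. It is a simplified model of the idea, not the LLVM code: node records are plain dictionaries, `successors` is a hypothetical adjacency map, and DFS number 0 stands for "not visited yet" (and for "no parent" on the root).

```python
def run_dfs(root, successors):
    """Number nodes in DFS preorder with one table lookup per worklist entry.

    `successors` maps a node to a list of its successors.
    Returns (info, order): per-node records and the visit order.
    """
    info = {}
    order = []
    worklist = [(root, 0)]  # (node, DFS number of the node that pushed it)
    while worklist:
        node, parent_num = worklist.pop()
        # The single lookup for this entry; creates the record on first sight.
        entry = info.setdefault(node, {"num": 0, "parent": 0, "rev_children": []})
        entry["rev_children"].append(parent_num)
        if entry["num"] != 0:
            continue  # already visited: only the reverse edge needed recording
        order.append(node)
        entry["num"] = len(order)
        entry["parent"] = parent_num
        for succ in successors.get(node, ()):
            worklist.append((succ, entry["num"]))  # push unconditionally, skip later
    return info, order
```

Pushing successors without checking them first trades a few extra worklist entries for fewer hash table lookups, which is the trade-off the commit message describes.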

mlir-config.h is included but not listed in dependencies

… (#95553) When running early-tailduplication we've seen problems with machine verifier errors due to register class mismatches after doing the machine SSA updates. A typical scenario is that there is a PHI node and another instruction that is using the same vreg:
    %othervreg:otherclass = PHI %vreg:origclass, %bb
    MInstr %vreg:origclass
but then after TailDuplicator::tailDuplicateAndUpdate we get
    %othervreg:otherclass = PHI %vreg:origclass, %bb, ...
    MInstr %othervreg:otherclass
Such rewrites are only valid if 'otherclass' is equal to (or a subclass of) 'origclass'. The solution here is based on adding a COPY instruction to make sure we satisfy constraints given by 'MInstr' in the example. So if 'otherclass' isn't equal to (or a subclass of) 'origclass' we insert a copy after the PHI like this:
    %othervreg:otherclass = PHI %vreg:origclass, %bb, ...
    %newvreg:origclass = COPY %othervreg:otherclass
    MInstr %newvreg:origclass
A special case is when it is possible to constrain the register class instead of inserting a COPY. We currently prefer to constrain the register class instead of inserting a COPY, even if it is a bit unclear if that always is better (considering register pressure for the constrained class etc.).
Fixes: llvm/llvm-project#62712
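The decision described above can be modelled with a toy sketch. It leaves out the "constrain the register class" special case, represents instructions as strings, and uses a hypothetical `subclasses` map in place of real register class information; none of these names come from the patch.

```python
def rewrite_use(orig_class, other_class, subclasses):
    """Return the instruction(s) that replace `MInstr %vreg:orig_class`.

    `subclasses` maps a register class to the set of its subclasses
    (hypothetical stand-in for target register class queries).
    """
    compatible = (other_class == orig_class
                  or other_class in subclasses.get(orig_class, set()))
    if compatible:
        # The PHI result already satisfies MInstr's constraint; use it directly.
        return ["MInstr %othervreg:" + other_class]
    # Otherwise bridge the two classes with a COPY placed after the PHI.
    return [
        "%newvreg:{0} = COPY %othervreg:{1}".format(orig_class, other_class),
        "MInstr %newvreg:" + orig_class,
    ]
```

With made-up classes `gpr` and `gpr_lo` (a subclass of `gpr`), a use constrained to `gpr` can read a `gpr_lo` value directly, whereas an unrelated class forces the extra COPY.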

Change this function to be LLVM-style in name.

Syntacore SCR3 is a microcontroller-class processor core. Overview: https://syntacore.com/products/scr3 Co-authored-by: Dmitrii Petrov <[email protected]>

…IEs (#96484) If ParseStructureLikeDIE (or ParseEnum) encountered a declaration DIE, it would call FindDefinitionTypeForDIE. This returned a fully formed type, which it achieved by recursing back into ParseStructureLikeDIE with the definition DIE. This obscured the control flow and caused us to repeat some work (e.g. the UniqueDWARFASTTypeMap lookup), but it mostly worked until we tried to delay the definition search in #90663. After this patch, the two ParseStructureLikeDIE calls were no longer recursive, but rather the second call happened as a part of the CompleteType() call. This opened the door to inconsistencies, as the second ParseStructureLikeDIE call was not aware it was called to process a definition DIE for an existing type. To make that possible, this patch removes the recursive type resolution from this function, and leaves just the "find definition DIE" functionality. After finding the definition DIE, we just go back to the original ParseStructureLikeDIE call, and have it finish the parsing process with the new DIE. While this patch is motivated by the work on delaying the definition searching, I believe it is also useful on its own.

…#96514) This PR fixes llvm/llvm-project#96513. Array type constants were created incorrectly: instead of creating [1, 1, 1] or [1, 1, 1, 1, 1, ....] constants, the same [1] constant was always created, substituting the original composite constants. This in turn meant that only one of the constants could exist in the code without emitting invalid code; the second constant would eventually be rewritten to the first, because the key used to address both was an array of a single element (like [1]). This PR fixes the issue and removes an unneeded copy-pasted clone of the function that creates an array constant.

…nes (#95269) This patch extends #73964 and adds optimisation of load SVE intrinsics when predicate is zero.

The handling of `PointerType` is similar to `HeapType`. The only difference is that allocated flag is generated for `HeapType` and associated flag for `PointerType`. The tests for pointer to allocatable strings are disabled for now. I will enable them once #95906 is merged. The debugging in GDB looks like this:
    integer, pointer :: par2(:)
    integer, target, allocatable :: ar2(:)
    integer, target :: sc
    integer, pointer :: psc
    allocate(ar2(4))
    par2 => ar2
    psc => sc
    19 par2 => ar2
    (gdb) p par2
    $3 = <not associated>
    (gdb) n
    20 do i=1,5
    (gdb) p par2
    $4 = (0, 0, 0, 0)
    (gdb) ptype par2
    type = integer (4)
    (gdb) p sc
    $5 = 3
    (gdb) p psc
    $6 = (PTR TO -> ( integer )) 0x7fffffffda24
    (gdb) p *psc
    $7 = 3

…ing for generic types (#89217) This patch is intended to be the first of a series with end goal to adapt atomic optimizer pass to support i64 and f64 operations (along with removing all unnecessary bitcasts). This legalizes 64 bit readlane, writelane and readfirstlane ops pre-ISel --------- Co-authored-by: vikramRH <[email protected]>

This change adds methods like buildGetFPEnv and similar for opcodes that represent manipulation on floating-point state.

This changes the behaviour in C++03 mode because we'll now use the builtin on Clang, but I don't think that's much of a problem.

This header used three-space indentation in a number of places. Reformat it completely.

This FIXME has already been addressed in #89358

Instead of iterating over all VFs when computing costs, simply iterate over the VFs available in the created VPlans. Split off from llvm/llvm-project#92555. This also prepares for moving the check if any vector instructions will be generated to be based on VPlan, to unblock recommitting llvm/llvm-project#92555.

Without the store, the vector loop body is empty. Add a store to avoid that, while not impacting the induction resume values that are created.

This patch implements lowering of the GlobalAddress, BlockAddress, JumpTable and BR_JT. The patch also adds legal support of the BR_CC operation for the i32 type.

Some of these are just old, while others previously did not use UTC due to missing features that have since been implemented (such as signature matching).

These will be replaced later.

…Target` (#96500)

Since we mark the pseudos as mayLoad but do not provide any MMOs, isSafeToMove conservatively returns false, stopping MachineLICM from hoisting the instructions. PseudoLA_TLS_{LD,GD} does not actually expand to a load, so stop marking that as mayLoad to allow it to be hoisted, and for the others make sure to add MMOs during lowering to indicate they're GOT loads and thus can be freely moved.

… (#95061) This patch augments the HIPAMD driver to allow it to target AMDGCN flavoured SPIR-V compilation. It's mostly straightforward, as we re-use some of the existing SPIRV infra, however there are a few notable additions:
- we introduce an `amdgcnspirv` offload arch, rather than relying on using `generic` (this is already fairly overloaded) or simply using `spirv` or `spirv64` (we'll want to use these to denote unflavoured SPIRV, once we bring up that capability)
- initially it won't be possible to mix-in SPIR-V and concrete AMDGPU targets, as it would require some relatively intrusive surgery in the HIPAMD Toolchain and the Driver to deal with two triples (`spirv64-amd-amdhsa` and `amdgcn-amd-amdhsa`, respectively)
- in order to retain user provided compiler flags and have them available at JIT time, we rely on embedding the command line via `-fembed-bitcode=marker`, which the bitcode writer had previously not implemented for SPIRV; we only allow it conditionally for AMDGCN flavoured SPIRV, and it is handled correctly by the Translator (it ends up as a string literal)

Once the SPIRV BE is no longer experimental we'll switch to using that rather than the translator. There's some additional work that'll come via a separate PR around correctly piping through AMDGCN's implementation of `printf`; for now we merely handle its flags correctly.

CONFLICT (content): Merge conflict in llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp

CONFLICT (content): Merge conflict in clang/test/Driver/sycl-linker-wrapper-image.cpp

CONFLICT (content): Merge conflict in clang/lib/Driver/Driver.cpp CONFLICT (content): Merge conflict in clang/lib/Driver/ToolChains/HIPAMD.cpp

Test needs update after cbf6e93 2024-05-28 [clang codegen] Delete unnecessary GEP cleanup code. (#90303). Change made by @premanandrao

Commits on Jun 22, 2024

Commits on Jun 23, 2024

Commits on Jun 24, 2024

Commits on Jun 25, 2024

Commits on Jun 26, 2024