Ninja build #7

…, undef -> X transforms" and subsequent patches This reverts most of the following patches due to reports of miscompiles. I've left the added test cases with comments updated to be FIXMEs. 1cf6f210a2e [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison. 469da663f2d [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison 122b0640fc9 [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison ac0af12ed2f [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison 9b1e95329af [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms (cherry picked from commit 00f3579aea6e3d4a4b7464c3db47294f71cef9e4)

We need to specify legal integer widths to trigger PR46712, so add those here. This doesn't appear to affect any existing tests, and it's not clear why a datalayout would not include any legal integer widths. While here, change some variable names that include 'tmp' to avoid warnings from the auto-generating script for CHECK lines. (cherry picked from commit efc30e591bb5a6e869fd8e084bd310ae516b0fae)

I'm not sure if the test is truly minimal, but we need to induce a situation where a value becomes a constant but is not immediately folded before getting to the 'or' transform. (cherry picked from commit d8b268680d0858aaf30cb1a278b64b11361bc780)

…gnment assumptions" due to the performance bugs filed in https://bugs.llvm.org/show_bug.cgi?id=46753. An SROA change soon may obviate some of these problems. This reverts commit 8d09f20798ac180b1749276bff364682ce0196ab. (cherry picked from commit 7bfaa40086359ed7e41c862ab0a65e0bb1be0aeb)

(cherry picked from commit 9adf7461f721170419058684a8d3f9228d641d59)

…by combineAdd and combineSub. There was a lot of duplicate code here for checking the VT and subtarget. Moving it into a helper avoids that. It also fixes a bug where combineAdd reused Op0/Op1 after a call to isHorizontalBinOp may have changed them. The new helper function has its own local copies of Op0/Op1 that aren't shared by other code. Fixes PR46455. Reviewed By: spatel, bkramer Differential Revision: https://reviews.llvm.org/D83971 (cherry picked from commit 5408024fa87e0b23b169fec07913bd4357acdbc4)

The flag is off by default. (cherry picked from commit 033ef8420cec57187fffac1f06322f73aa945c4c)

This is brought up in https://reviews.llvm.org/D83915. We would like to remove some features in PowerPC. We sent an RFC before, but we think it is a better idea to indicate the planned removal in the Release Notes for version 11 and the actual removal in those for version 12. Reviewed By: hubert.reinterpretcast Differential Revision: https://reviews.llvm.org/D83968

This function has a bug which will incorrectly reschedule instructions after an INLINEASM_BR (which can branch). (The bug may also allow scheduling past a throwing-CALL, I'm not certain.) I could fix that bug, but, as the removed FIXME notes, it's better to attempt rescheduling before converting to 3-addr form, as that may remove the need to convert in the first place. In fact, the code to do such reordering was added to this pass only a few months later, in 2011, via the addition of the function rescheduleMIBelowKill. That code does not contain the same bug. The removal of the sink3AddrInstruction function is not a no-op: in some cases it would move an instruction post-conversion, when rescheduleMIBelowKill would not move the instruction pre-conversion. However, this does not appear to be important: the machine instruction scheduler can reorder the after-conversion instructions, in any case. This patch fixes a kernel panic in 4.4 LTS x86_64 Linux kernels, when built with clang after 4b0aa5724feaa89a9538dcab97e018110b0e4bc3. Link: ClangBuiltLinux/linux#1085 Differential Revision: https://reviews.llvm.org/D83708 (cherry picked from commit 60433c63acb71935111304d71e41b7ee982398f8)

This suppresses `failed to compute relocation: R_PPC_REL32, Invalid data was encountered while parsing the file` and its 64-bit variants when running llvm-dwarfdump on a PowerPC object file with .eh_frame. Unfortunately it is difficult to test the computation: DWARFDataExtractor::getEncodedPointer does not use the relocated value, and even if it did, we would need to teach llvm-dwarfdump --eh-frame to do some of the linker's job to report a reasonable address. (cherry picked from commit b922004ea29d54534c4f09b9cfa655bf5f3360f0)

Code from D83800 by Yichao Yu (cherry picked from commit 3073a3aa1ef1ce8c9cac9b97a8e5905dd8779e16)

@test

…abels

```
define i32 @test(i1 %cond) {
entry:
  br i1 %cond, label %exit, label %exit
exit:
  %result = select i1 %cond, i32 123, i32 456
  ret i32 %result
}
```

In this test, after applying the transformation that replaces select with phis, the result will be:

```
define i32 @test(i1 %cond) {
entry:
  br i1 %cond, label %exit, label %exit
exit:
  %result = i32 phi [123, %exit], [123, %exit]
  ret i32 %result
}
```

That is, the select is transformed into an invalid phi, which will then be reduced to 123, and the second value will be lost. Note that this problem arises only if the select comes before the branch in the InstCombine worklist. Otherwise, InstCombine will replace the branch condition with false and the transformation will not be applied. The fix is to compare the two target labels of the conditional branch for equality. Patch By: Kirill Polushin Differential Revision: https://reviews.llvm.org/D84003 Reviewed By: mkazantsev (cherry picked from commit c98988107868db41c12b9d782fae25dea2a81c87)
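
The guard described above can be modeled outside of LLVM. The following is only a minimal Python sketch of the idea, not the InstCombine code: the `CondBranch` class and the `can_replace_select_with_phi` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class CondBranch:
    """Model of `br i1 cond, label true_dest, label false_dest` (hypothetical)."""
    cond: str
    true_dest: str
    false_dest: str


def can_replace_select_with_phi(branch: CondBranch, select_cond: str) -> bool:
    """Return True if a select on `select_cond` may become a phi (sketch only)."""
    # The select must be controlled by the same condition as the branch.
    if branch.cond != select_cond:
        return False
    # If both edges lead to the same block, the incoming edge no longer
    # identifies which value the select would have chosen, so bail out.
    if branch.true_dest == branch.false_dest:
        return False
    return True
```

With the branch from the test case (`label %exit, label %exit`) the sketch refuses the transform, while a branch with two distinct targets is still accepted.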

…ranch has the same labels An additional test that allows to check the correctness of handling the case of the same branch labels in the dominator when trying to replace select with phi-node. Patch By: Kirill Polushin Differential Revision: https://reviews.llvm.org/D84006 Reviewed By: mkazantsev (cherry picked from commit df6e185e8f895686510117301e568e5043909b66)

Summary: 1. gcc uses the `-march` and `-mtune` flags to choose the arch and pipeline model, but clang does not have a `-mtune` flag, so we use `-mcpu` to choose both. 2. Add SiFive e31 and u54 CPUs, which have a default march and pipeline model. 3. Specifying `-mcpu` with rocket-rv[32|64] selects the pipeline model only, and uses the driver's arch-choosing logic to get the default arch. Reviewers: lenary, asb, evandro, HsiangKai Reviewed By: lenary, asb, evandro Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D71124 (cherry picked from commit 294d1eae75bf8867821a4491f0d67445227f8470)

…urce register when the destination is a 64-bit register. Previously we only accepted a 32-bit source with a 64-bit dest. Accepting 64-bit as well is more consistent with gas behavior. I think maybe we should accept 16-bit registers as well, but I'm not sure. (cherry picked from commit 3c2a56a857227b6bc39285747269f02cd7a9dbe5)

… register. This matches GNU assembler behavior. Operand size is determined only from the destination register. (cherry picked from commit 71b49aa438b22b02230fff30e8874ff756336e6d)

Summary: Remove unused function Reviewed By: lbenes Differential Revision: https://reviews.llvm.org/D83898 (cherry picked from commit 47a3b85a97136fca4a388646cbaec10b71414b60)

The getAllOnesValue can only handle things that are bitcast from a ConstantInt, while here we bitcast through a pointer, so we may see more complex objects (like Array or Struct). Differential Revision: https://reviews.llvm.org/D83870 (cherry picked from commit 8b354cc8db413f596c95b4f3240fabaa3e2c931e)

…instead of .o This matches LLD and fixes https://sourceware.org/bugzilla/show_bug.cgi?id=26262#c1 .o is a bad choice for save-temps output because it is easy to overwrite the bitcode file (*.o):

```
# Use bfd for the example, -fuse-ld=gold is similar.
clang -flto -c a.c  # generate bitcode file a.o
clang -fuse-ld=bfd -flto a.o -o a -Wl,-plugin-opt=save-temps  # overwrite a.o
# The user repeats the command but gets surprised, because a.o is now a combined module.
clang -fuse-ld=bfd -flto a.o -o a -Wl,-plugin-opt=save-temps
```

Reviewed By: tejohnson Differential Revision: https://reviews.llvm.org/D84132 (cherry picked from commit 55fa315b0352b63454206600d6803fafacb42d5e)

(cherry picked from commit aa830e9768303ff8d27c015759294c4ce704d50c)

(cherry picked from commit 817767abeec8343b20de83f8b1b2c8c20bbbe00a)

…ogue The current PowerPC backend generates a wrong code sequence if the stack pointer has to be realigned when -fstack-clash-protection is enabled. When probing in the prologue, the backend should generate a subtraction instruction rather than a `stux` instruction to realign the stack pointer. This patch is part of the fix for https://bugs.llvm.org/show_bug.cgi?id=46759. Differential Revision: https://reviews.llvm.org/D84218 (cherry picked from commit 8912252252c87d8ef6623ecf9fdde444560ee4b9)

…ing dynalloc The current PowerPC backend generates a wrong code sequence if the stack pointer has to be realigned when `-fstack-clash-protection` is enabled. When probing a dynamic stack allocation, the current `PREPARE_PROBED_ALLOCA` takes `NegSizeReg` as input and returns `FinalStackPtr`. `FinalStackPtr=StackPtr+ActualNegSize` is calculated correctly; however, the code following `PREPARE_PROBED_ALLOCA` still uses the value of `NegSizeReg`, which does not contain `ActualNegSize` if `MaxAlign > TargetAlign`, to calculate the loop trip count and the residual number of bytes. This patch is part of the fix for https://bugs.llvm.org/show_bug.cgi?id=46759. Differential Revision: https://reviews.llvm.org/D84152 (cherry picked from commit c3f9697f1f227296818fbaf1a770a29842ea454c)
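
To see why using `NegSizeReg` instead of `ActualNegSize` gives wrong probe counts, here is a small worked sketch in Python. It assumes (this is not stated in the message) that realignment rounds the new stack pointer down to `MaxAlign` and that probing proceeds in fixed-size steps; the function name `probed_alloca_plan` and the `probe_size` parameter are hypothetical.

```python
def probed_alloca_plan(stack_ptr, neg_size, max_align, probe_size):
    """Return (final_sp, actual_neg_size, trip_count, residual). Sketch only."""
    # Assumption: the final stack pointer is rounded down to max_align,
    # which must be a power of two.
    final_sp = (stack_ptr + neg_size) & -max_align
    # FinalStackPtr = StackPtr + ActualNegSize, so solve for ActualNegSize.
    actual_neg_size = final_sp - stack_ptr
    # The probing loop has to cover the actual size, not the requested one.
    trip_count, residual = divmod(-actual_neg_size, probe_size)
    return final_sp, actual_neg_size, trip_count, residual
```

For example, with a 16-byte-aligned stack pointer `0x10010`, a requested size of 8192 bytes, `max_align` 64 and a 4096-byte probe step, the actual allocation is 8208 bytes: two full probes plus a 16-byte residual. Computing from the requested size would give a residual of 0 and leave those 16 bytes unaccounted for.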

This assert was added to verify the assumption that a GEP's SCEV will be of pointer type, based on the fact that it should be a SCEVAddExpr with (at least) the last operand being a pointer. Two notes: - A GEP's SCEV does not have to be a SCEVAddExpr after all simplifications; - In the current state, a GEP's SCEV does not have to have at least one pointer operand (all of them can become int during the transforms). However, we might want to get to a point where it is true. We are removing this assert for now and will try to enumerate the cases where the "is pointer" notion might be lost during the transforms. When all of them are fixed, we can return it. Differential Revision: https://reviews.llvm.org/D84294 Reviewed By: lebedev.ri (cherry picked from commit b96114c1e1fc4448ea966bce013706359aee3fa9)

since it's failing.

(cherry picked from commit 13ae440de4a408cf9d1a448def09769ecbecfdf7)

Fixes https://bugs.llvm.org/show_bug.cgi?id=46680. Just like insertions through IRBuilder, InsertNewInstBefore() should be using the deferred worklist mechanism, so that processing of newly added instructions is prioritized. There's one side-effect of the worklist order change which could be classified as a regression. An add op gets pushed through a select that at the time is not a umax. We could add a reverse transform that tries to push adds in the reverse direction to restore a min/max, but that seems like a sure way of getting infinite loops... Seems like something that should best wait on min/max intrinsics. Differential Revision: https://reviews.llvm.org/D84109 (cherry picked from commit d12ec0f752e7f2c7f7252539da2d124264ec33f7)
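
The deferred-worklist idea can be illustrated with a toy model. This is a Python sketch under the assumption that the worklist is a LIFO stack and that deferred entries are flushed onto it before the next pop; the `Worklist` class and its method names are hypothetical and do not mirror the real InstCombine worklist API.

```python
class Worklist:
    """Toy LIFO worklist with a deferred list for newly created instructions."""

    def __init__(self):
        self._stack = []
        self._deferred = []

    def push(self, inst):
        self._stack.append(inst)

    def add_deferred(self, inst):
        # New instructions are only recorded here; they reach the stack later.
        self._deferred.append(inst)

    def pop(self):
        # Flush deferred entries in reverse so that they are popped in the
        # order they were created, and before anything already on the stack.
        while self._deferred:
            self._stack.append(self._deferred.pop())
        return self._stack.pop() if self._stack else None
```

In this model, instructions added through `add_deferred` are always visited before older worklist entries, which is the prioritization the message describes.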

…VECTOR(X,0)) patterns. getTargetShuffleMask is used by the various "SimplifyDemanded" folds so we can't assume that the bypassed extract_subvector can be safely simplified - getFauxShuffleMask performs a more general decode that allows us to more safely catch many of these cases so the impact is minimal. (cherry picked from commit 5b5dc2442ac7a574a3b7d17c15ebeeb9eb3bec26)

…b asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions from the base subset (zbb subextension) of the experimental B extension of RISC-V. It also adds the corresponding codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79870 (cherry picked from commit e2692f0ee7f338fea4fc918669643315cefc7678)

…p asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions from the permutation subset (zbp subextension) of the experimental B extension of RISC-V. It also adds the corresponding codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79871 (cherry picked from commit 31b52b4345e36b169a2b6a89eac44651f59889dd)

…bp asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions belonging to both the permutation and the base subsets of the experimental B extension of RISC-V. It also adds the corresponding codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79873 (cherry picked from commit 6144f0a1e52e7f5439a67267ca65f2d72c21aaa6)
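
As an illustration of what "matching a pattern of instructions to a single bit-manip instruction" means, the Python sketch below spells out two multi-operation patterns and the single operation each is equivalent to. The message does not list the matched patterns, so treating rotate-left and and-with-inverted-operand as examples is an assumption; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol32_generic(x, n):
    """Rotate left written as shift/shift/or, the form a compiler has to match."""
    n &= 31
    return ((x << n) | (x >> ((32 - n) & 31))) & MASK32


def andn32_generic(a, b):
    """`a & ~b` written as not + and, restricted to 32 bits."""
    return a & (~b & MASK32)
```

A single rotate or and-not instruction would compute the same result as each of these multi-step sequences, which is the saving such pattern matching aims for.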

…s asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions from the single-bit subset (zbs subextension) of the experimental B extension of RISC-V. It also adds the corresponding codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79874 (cherry picked from commit d4be33374c07ea9a9362892876aa76b227298181)

…t asm instructions This patch provides optimization of bit manipulation operations by enabling the +experimental-b target feature. It adds matching of single block patterns of instructions to specific bit-manip instructions from the ternary subset (zbt subextension) of the experimental B extension of RISC-V. It also adds the corresponding codegen tests. This patch is based on Claire Wolf's proposal for the bit manipulation extension of RISCV: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-0.92.pdf Differential Revision: https://reviews.llvm.org/D79875 (cherry picked from commit c9c955ada8e65205312f2bc41b46eefa0e98b36c)

For comdats (e.g. caused by -ffunction-sections), Section is already set here; make sure it's null, for the weak external symbol to be undefined. This fixes PR46779. Differential Revision: https://reviews.llvm.org/D84507 (cherry picked from commit 9e81d8bbf19d72fca3d87b7334c613d1aa2a5795)

This fixes PR 42837. Differential Revision: https://reviews.llvm.org/D84465 (cherry picked from commit 4d09ed953b5b8c70d9ca0aeaed8f26a237b612c6)

…th undef if needed when concatenating smaller loads to match a larger load In the included test case the align 16 allowed the v23f32 load to be handled as load v16f32, load v4f32, and load v4f32 (one element not used). These loads all need to be concatenated together into a final vector. In this case we tried to concatenate the two v4f32 loads to match the type of the v16f32 load so we could do a second concat_vectors, but those loads alone only add up to v8f32. So we need two v4f32 undefs to pad it. It appears we've tried to hack around a similar issue in this code before by adding undef padding to loads in one of the earlier loops in this function: originally in r147964, by padding all loads narrower than previous loads to the same size, and later modified to pad only the last load in r293088. This patch removes that earlier code and just handles it on demand where we know we need it. Fixes PR46820 Differential Revision: https://reviews.llvm.org/D84463 (cherry picked from commit 8131e190647ac2b5b085b48a6e3b48c1d7520a66)
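
The padding arithmetic from the test case can be written down as a short Python sketch. It only models element counts, not the legalizer itself; `pad_concat_operands` is a hypothetical helper name.

```python
def pad_concat_operands(piece_elts, target_elts):
    """Return concat operands as (kind, element_count) pairs, padded with undef."""
    piece = piece_elts[0]
    # concat_vectors needs operands of one common type.
    assert all(p == piece for p in piece_elts)
    missing = target_elts - sum(piece_elts)
    # The gap must be a whole number of pieces to be filled with undef vectors.
    assert missing >= 0 and missing % piece == 0
    operands = [("load", p) for p in piece_elts]
    operands += [("undef", piece)] * (missing // piece)
    return operands
```

For the case in the message, two v4f32 loads concatenated to match a v16f32 load cover only 8 of 16 elements, so the sketch appends two v4f32 undef operands.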

…oads Unfortunately this is another regression from my canonicalization patch (1fed131660b2). The patch contained two implicit assumptions: 1. That we would have a permuted load only if we are loading a partial vector 2. That a partial vector load would necessarily be as wide as the splat However, assumption 2 is not correct since it is possible to do a wider load and only splat a half of it. This patch corrects this assumption by simply checking if the load is permuted and adjusting the offset if it is. (cherry picked from commit 7d076e19e31a2a32e357cbdcf0183f88fe1fb0fb)

I mixed up the precedence of operators in the assert and thought I had it right since there was no compiler warning. This just adds the parentheses in the expression as needed. (cherry picked from commit cdead4f89c0eecf11f50092bc088e3a9c6511825)

…irect branch (PR46857) SplitBlockPredecessors() can not split blocks that have such terminators, and in two other places we already ensure that we don't end up calling SplitBlockPredecessors() on such blocks. Do so in one more place. Fixes https://bugs.llvm.org/show_bug.cgi?id=46857 (cherry picked from commit 1da9834557cd4302a5183b8228ce063e69f82602)

(cherry picked from commit 30fa57662760e1489cf70cb411c55fbe9fc189fe)

As shown in D82998, the basic-aa-recphi option can cause miscompiles for GEPs with negative constants. The option checks for recursive phis that recurse through a constant GEP. If it finds one, it performs aliasing calculations using the other phi operands with an unknown size, to specify that an unknown number of elements after the initial value are potentially accessed. This works fine except where the constant is negative, as the size is still considered to be positive. So this patch expands the check to make sure that the constant is also positive. Differential Revision: https://reviews.llvm.org/D83576 (cherry picked from commit 311fafd2c90aed5b3fed9566503eebe629f1e979)
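
The reason a negative constant breaks the "unknown size after the initial value" model can be shown with a few lines of Python. This is only a sketch of the reasoning, not of BasicAA; both function names are hypothetical.

```python
def accessed_offsets(step, iterations):
    """Offsets (relative to the phi's start value) visited by a recursive phi."""
    return [k * step for k in range(iterations)]


def unknown_size_after_start_is_sound(step):
    """Modeling the accesses as [start, start + unknown) needs a positive step."""
    return step > 0
```

With a step of 4 every visited offset is at or after the start value, so the model holds. With a step of -4 the phi walks backwards, the visited offsets lie before the start value, and the check has to reject the case.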

…it as livein to the basic blocks created when expanding the pseudo XBEGIN causes several basic blocks to be inserted. If flags are live across it we need to make eflags live in the new basic blocks to avoid machine verifier errors. Fixes PR46827 Reviewed By: ivanbaev Differential Revision: https://reviews.llvm.org/D84479 (cherry picked from commit 647e861e080382593648b234668ad2f5a376ac5e)

… D83789 (cherry picked from commit bfc4294ef61d5cf69fffe6b64287a323c003d90f)

…HOP(X,Y)) An initial backend patch towards fixing the various poor HADD combines (PR34724, PR41813, PR45747 etc.). This extends isHorizontalBinOp to check if we have per-element horizontal ops (odd+even element pairs), but not in the expected serial order - in which case we build a "post shuffle mask" that we can apply to the HOP result, assuming we have fast-hops/optsize etc. The next step will be to extend the SHUFFLE(HOP(X,Y)) combines as suggested on PR41813 - accepting more post-shuffle masks even on slow-hop targets if we can fold it into another shuffle. Differential Revision: https://reviews.llvm.org/D83789 (cherry picked from commit 182111777b4ec215eeebe8ab5cc2a324e2f055ff)

(cherry picked from commit f75cf240d6ed528e1ce7770bbe09b417338b40ef)

v3i16 and v3f16 currently cannot be legalized and lowered so they should not be emitted by inst combining. Moved the check down to still allow extracting 1 or 2 elements via the dmask. Fixes image intrinsics being combined to return v3x16. Differential Revision: https://reviews.llvm.org/D84223 (cherry picked from commit 2c659082bda6319732118e746fe025d8d5f9bfac)

(cherry picked from commit 9853786ce39b9510eeb2688baaef7a364d58e113)

This isn't a natively supported operation, so convert it to a mask+compare. In addition to the operation itself, fix up some surrounding stuff to make the testcase work: we need concat_vectors on i1 vectors, we need legalization of i1 vector truncates, and we need to fix up all the relevant uses of getVectorNumElements(). Differential Revision: https://reviews.llvm.org/D83811 (cherry picked from commit b8f765a1e17f8d212ab1cd8f630d35adc7495556)

The default calling convention needs to save/restore the SVE callee saves according to the SVE PCS when the function takes or returns scalable types, even when the `aarch64_sve_vector_pcs` CC is not specified for the function. Reviewers: efriedma, paulwalker-arm, david-arm, rengolin Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84041 (cherry picked from commit 9bacf1588583014538a0217add18f370acb95788)

This patch addresses two issues: * Forces the availability of the base-pointer (x19) when the frame has both scalable vectors and variable-length arrays. Otherwise it will be expensive to access non-SVE locals. * In the presence of SVE stack objects, it will allocate the emergency scavenging slot close to the SP, so that it can be accessed from the SP or BP if available. If accessed from the frame-pointer, it will otherwise need an extra register to access the scavenging slot because of mixed scalable/non-scalable addressing modes. Reviewers: efriedma, ostannard, cameron.mcinally, rengolin, david-arm Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D70174 (cherry picked from commit bef56f7fe2382ed1476aa67a55626b364635b44e)

It's sort of tricky to hit this in practice, but not impossible. I have a synthetic C testcase if anyone is interested. The implementation is identical to the equivalent NEON register copies. Differential Revision: https://reviews.llvm.org/D84373 (cherry picked from commit 993c1a3219a8ae69f1d700183bf174d75f3815d4)

Fixed stack objects are preallocated and defined to be allocated before any of the regular stack objects. These are normally used to model stack arguments. The AAPCS does not support passing SVE registers on the stack by value (only by reference). The current layout also doesn't place them before all stack objects, but rather before all SVE objects. Removing this simplifies the code that emits the allocation/deallocation around callee-saved registers (D84042). This patch also removes all uses of fixedStack from framelayout-sve.mir, where this was used purely for testing purposes. Reviewers: paulwalker-arm, efriedma, rengolin Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84538 (cherry picked from commit 54492a5843a34684ce21ae201dd8ca3e509288fd)

Instead of aligning the last callee-saved-register slot to the stack alignment (16 bytes), just align the SVE callee-saved block. This also simplifies the code that allocates space for the callee-saves. This change is needed to make sure the offset to which the callee-saved register is spilled, corresponds to the offset used for e.g. unwind call frame instructions. Reviewers: efriedma, paulwalker-arm, david-arm, rengolin Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84042 (cherry picked from commit 26b4ef3694973ea2fa656d3d3a7f67f16f135654)

While deallocating the stackframe, the offset used to reload the callee-saved registers was not pointing to the SVE callee-saves, but rather to the whole SVE area.

    +--------------+
    | GPR callee   |
    |       saves  |
    +--------------+ <- FP
    | SVE callee   |
    |       saves  |
    +--------------+ <- Should restore SVE callee saves from here
    | SVE Spills   |
    |   and Locals |
    +--------------+ <- instead of from here.
    |              |
    :              :
    |              |
    +--------------+ <- SP

Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D84539 (cherry picked from commit cda2eb3ad2bbe923e74d6eb083af196a0622d800)

I have introduced a new TargetFrameLowering query function: isStackIdSafeForLocalArea that queries whether or not it is safe for objects of a given stack id to be bundled into the local area. The default behaviour is to always bundle regardless of the stack id; however, for AArch64 this is overridden so that it's only safe for fixed-size stack objects. There is future work here to extend this algorithm for multiple local areas so that SVE stack objects can be bundled together and accessed from their own virtual base-pointer. Differential Revision: https://reviews.llvm.org/D83859 (cherry picked from commit 14bc85e0ebb6c00c1672158ab6a692bfbb11e1cc)
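
The shape of such a query hook, a permissive default that one target overrides, can be sketched in Python. The class names echo the message, but the method spelling and the string-valued stack ids `"default"` and `"sve-vector"` are hypothetical stand-ins, not the real LLVM enum values or signatures.

```python
class TargetFrameLowering:
    def is_stack_id_safe_for_local_area(self, stack_id):
        # Default behaviour: always bundle, regardless of the stack id.
        return True


class AArch64FrameLowering(TargetFrameLowering):
    def is_stack_id_safe_for_local_area(self, stack_id):
        # Assumption for this sketch: scalable (SVE) objects carry their own
        # stack id, and only fixed-size objects may join the local area.
        return stack_id != "sve-vector"


def objects_for_local_area(tfl, objects):
    """Filter (name, stack_id) pairs down to those safe to bundle."""
    return [name for name, sid in objects if tfl.is_stack_id_safe_for_local_area(sid)]
```

A caller that lays out the local area can then stay target-independent and simply skip any object for which the hook returns False.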

…plitVecOp_EXTRACT_SUBVECTOR In DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR I have replaced calls to getVectorNumElements with getVectorMinNumElements, since this code path works for both fixed and scalable vector types. For scalable vectors the index will be multiplied by VSCALE. Fixes warnings in this test: sve-sext-zext.ll Differential revision: https://reviews.llvm.org/D83198 (cherry picked from commit 5d84eafc6b86a42e261af8d753c3a823e0e7c67e)
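
The difference between a minimum element count and the runtime element count can be made concrete with a small Python model. It is a sketch only: `VecType`, `runtime_num_elts`, and `runtime_extract_index` are hypothetical names, and `vscale` is passed in explicitly although in practice it is only known at run time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VecType:
    min_num_elts: int  # what a "minimum number of elements" query would report
    scalable: bool


def runtime_num_elts(ty, vscale):
    """Actual element count: the minimum, times vscale for scalable vectors."""
    return ty.min_num_elts * (vscale if ty.scalable else 1)


def runtime_extract_index(idx, ty, vscale):
    """For scalable vectors the subvector index is scaled by vscale as well."""
    return idx * (vscale if ty.scalable else 1)
```

Code that works on the minimum count and the unscaled index is valid for both kinds of vector, because the same factor applies to every quantity in the scalable case.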

Previous patches fixed up all the warnings in this test: llvm/test/CodeGen/AArch64/sve-sext-zext.ll and this change simply checks that no new warnings are added in future. Differential revision: https://reviews.llvm.org/D83205 (cherry picked from commit f43b5c7a76ab83dcc80e6769d41d5c4b761312b1)

I have added tests to: CodeGen/AArch64/sve-intrinsics-int-arith.ll for doing simple integer add operations on tuple types. Since these tests introduced new warnings due to incorrect use of getVectorNumElements() I have also fixed up these warnings in the same patch. These fixes are: 1. In narrowExtractedVectorBinOp I have changed the code to bail out early for scalable vector types, since we've not yet hit a case that proves the optimisations are profitable for scalable vectors. 2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced calls to getVectorNumElements with getVectorMinNumElements in cases that work with scalable vectors. For the other cases I have added asserts that the vector is not scalable because we should not be using shuffle vectors and build vectors in such cases. Differential revision: https://reviews.llvm.org/D84016 (cherry picked from commit 207877175944656bd9b52d36f391a092854572be)

…orizeChainsInBlock In vectorizeChainsInBlock we try to collect chains of PHI nodes that have the same element type, but the code is relying upon the implicit conversion from TypeSize -> uint64_t. For now, I have modified the code to ignore PHI nodes with scalable types. Differential Revision: https://reviews.llvm.org/D83542 (cherry picked from commit 9ad7c980bb47edd7db8f8db828b487cc7dfc9921)

…th scalable types When building code at -O0 we weren't falling back to DAG ISel correctly when encountering alloca instructions with scalable vector types. This is because the alloca has no operands that are scalable. I've fixed this by adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca instructions with scalable types. Differential Revision: https://reviews.llvm.org/D84746 (cherry picked from commit 23ad660b5d34930b2b5362f1bba63daee78f6aa4)
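
A toy version of the check shows why looking at operands alone misses the alloca case. This Python sketch assumes the pre-existing logic inspected the result and operand types; the `Ty` and `Inst` classes and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ty:
    name: str
    scalable: bool = False


@dataclass
class Inst:
    opcode: str
    result_type: Ty
    operand_types: List[Ty] = field(default_factory=list)
    allocated_type: Optional[Ty] = None  # only meaningful for alloca


def should_fall_back_to_dag_isel(inst):
    # Assumed existing behaviour: scalable result or operand types trigger it.
    if inst.result_type.scalable or any(t.scalable for t in inst.operand_types):
        return True
    # The added case: an alloca yields a plain pointer and takes a plain
    # integer count, so the scalable type is visible only as the allocated type.
    if inst.opcode == "alloca" and inst.allocated_type is not None:
        return inst.allocated_type.scalable
    return False
```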

Reviewers: kmclaughlin, efriedma, sdesmalen Subscribers: tschuett, hiraditya, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83357 (cherry picked from commit 809600d6642773f71245f76995dab355effc73af)

...with the non-template version, as the template version might increase the size of the compiler build. Methods affected: 1. `findAddrModeSVELoadStore` 2. `SelectPredicatedStore` Also, remove the `const` qualifier from the `unsigned` parameters of the methods to conform with other similar methods in the class. (cherry picked from commit dbeb184b7f54db2d3ef20ac153b1c77f81cf0b99)

Reviewers: c-rhodes, efriedma, sdesmalen Subscribers: huihuiz, tschuett, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77251 (cherry picked from commit adb28e0fb2b0e97ea9dce422c09b36979cf7cd2f)

…e header It turned out that D78704 included a private LLVM header, which is excluded from the LLVM install target. I'm substituting that `#include` with the public one by moving the necessary `#define` into that. There was a discussion about this at D78704 and on the cfe-dev mailing list. I'm also placing a note to remind others of this pitfall. Reviewed By: mgorny Differential Revision: https://reviews.llvm.org/D84929 (cherry picked from commit 63d3aeb529a7b0fb95c2092ca38ad21c1f5cfd74)

In cases where the alignment of the datatype is smaller than expected by the instruction, the address is aligned. The aligned address is used for the load, but wasn't used for the store conditional, which resulted in a run-time alignment exception. (cherry picked from commit 7b114446c320de542c50c4c02f566e5d18adee33)

Currently we skip alias sets with only reads or a single write and no reads, but still add the pointers to the list of pointers in RtCheck. This can lead to cases where we try to access a pointer that does not exist when grouping checks. In most cases, the way we access PositionMap masked that, as the value would default to index 0. But in the example in PR46854 it causes a crash. This patch updates the logic to avoid adding pointers for alias sets that do not need any checks. It makes things slightly more verbose, by first checking the numbers of reads/writes and bailing out early if we don't need checks for the alias set. I think this makes the logic a bit simpler to follow. Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D84608 (cherry picked from commit 2062b3707c1ef698deaa9abc571b937fdd077168)
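
The early bail-out can be sketched as a filter over alias sets. This Python model represents each alias set as a dict with `reads` and `writes` lists, which is a hypothetical encoding chosen for illustration, and `pointers_needing_checks` is not a real LLVM function.

```python
def pointers_needing_checks(alias_sets):
    """Collect pointers only from alias sets that actually need runtime checks."""
    result = []
    for alias_set in alias_sets:
        reads, writes = alias_set["reads"], alias_set["writes"]
        # Only reads, or a single write and no reads: nothing can conflict,
        # so bail out before recording any pointer from this set.
        if not writes or (len(writes) == 1 and not reads):
            continue
        result.extend(writes + reads)
    return result
```

Because skipped sets contribute no pointers, a later grouping step never looks up a pointer that was not registered for checking.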

BUG_REPORT_URL is currently used both in LLVM and in Clang but declared only in the latter. This means that it's missing in standalone clang builds and the driver ends up outputting: PLEASE submit a bug report to and include [...] (note the missing URL) To fix this, include LLVM_PACKAGE_BUGREPORT in LLVMConfig.cmake (similarly to how we pass PACKAGE_VERSION) and use it to fill BUG_REPORT_URL when building clang standalone. Differential Revision: https://reviews.llvm.org/D84987 (cherry picked from commit 21c165de2a1bcca9dceb452f637d9e8959fba113)

…itLoadInst Summary: This is in response to the review of https://reviews.llvm.org/D84873: The expensive check should be reordered last Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D84890 (cherry picked from commit 243376cdc7b719d443f42c8c4667e5d96af53dcc)

This fixes the modules build. (cherry picked from commit 1b3c25e7b61f44b80788f8758f0d7f0b013135b5)

…nted relocations (PR46816) This fixes the ExecutionEngine/MCJIT/stubs-sm-pic.ll test in no-asserts builds which is set to XFAIL on some platforms like 32-bit x86. More importantly, we probably don't want to silently error in these cases. Differential revision: https://reviews.llvm.org/D84390 (cherry picked from commit 6a3b07a4bf14be32569550f2e9814d8797d27d31)

This can easily be a product of combining strings with macros in resource files. This fixes mstorsjo/llvm-mingw#140. As string literals within llvm-rc are handled as StringRefs, each referencing an uninterpreted slice of the input file, with actual interpretation of the input string (codepage handling, unescaping etc) done only right before writing them out to disk, it's hard to concatenate them other than just bundling them up in a vector, without rearchitecting a large part of llvm-rc. This matches how the same already is supported in VersionInfoValue, with a std::vector<IntOrString> Values. MS rc.exe only supports concatenated string literals in version info values (already supported), string tables (implemented in this patch) and user data resources (easily implemented in a separate patch, but hasn't been requested by any end user yet), while GNU windres supports string immediates split into multiple strings anywhere (e.g. 100 ICON "myicon" ".ico"). Not sure if concatenation in other statements actually is used in the wild though, in resource files normally built by GNU windres. Differential Revision: https://reviews.llvm.org/D85183 (cherry picked from commit b989fcbae6f179ad887d19ceef83ace1c00b87cc)
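
The "bundle the raw slices in a vector and interpret them late" approach can be sketched in Python. Here `StringTableEntry`, `interpret`, and `render` are hypothetical names, and `interpret` merely strips the surrounding quotes as a stand-in for the real codepage handling and unescaping.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StringTableEntry:
    string_id: int
    raw_parts: List[str]  # uninterpreted slices of the input, quotes included


def interpret(raw):
    # Stand-in for codepage handling and unescaping: drop the quotes only.
    return raw[1:-1]


def render(entry):
    """Interpret each raw part right before output, then concatenate."""
    return "".join(interpret(part) for part in entry.raw_parts)
```

Keeping the parts separate until output means the parser never has to build a new string from slices it has not interpreted yet.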

In fixupIsDeadOrKill, we assume StartMI and EndMI not exist in same basic block, so we add an assertion in that function. This is wrong before RA, as before RA the true definition may exist in another block through copy like instructions. Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D83365 (cherry picked from commit 36f9fe2d3493717dbc6866d96b2e989839ce1a4c)

These might occur in seemingly generic assembly. Previously when targeting COFF, they were silently ignored, which certainly won't give the right result. Instead clearly error out, to make it clear that the assembly needs to be adjusted for this target. Also change a preexisting report_fatal_error into a proper error message, pointing out the offending source instruction. This isn't strictly an internal error, as it can be triggered by user input. Differential Revision: https://reviews.llvm.org/D85242 (cherry picked from commit f5e6fbac24f198d075a7c4bc0879426e79040bcf)

Add given input and mark it as tied. Doesn't create additional copy compared to matching input constraint to virtual register. Differential Revision: https://reviews.llvm.org/D85122 (cherry picked from commit d893278bba01b0e1209e8b8accbdd5cfa75a0932)

The CFA is calculated as (SP/FP + offset), but when there are SVE objects on the stack the SP offset is partly scalable and should instead be expressed as the DWARF expression: SP + offset + scalable_offset * VG where VG is the Vector Granule register, containing the number of 64bits 'granules' in a scalable vector. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84043 (cherry picked from commit fd6584a22043b254a323635c142b28ce80ae5b5b)
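
As a worked example of the expression above, a small sketch that evaluates SP + offset + scalable_offset * VG for a given vector length. `computeCFA` is a hypothetical helper for illustration, not part of the patch, and it assumes the scalable offset is expressed in bytes per granule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: evaluates SP + Offset + ScalableOffset * VG.
// VG is the number of 64-bit granules in one scalable vector, so a 256-bit
// SVE implementation has VG == 4.
std::uint64_t computeCFA(std::uint64_t SP, std::int64_t Offset,
                         std::int64_t ScalableOffset, std::uint64_t VG) {
  assert(VG >= 2 && "SVE vectors are at least 128 bits, i.e. two granules");
  // Unsigned arithmetic wraps, so negative offsets behave as expected.
  return SP + static_cast<std::uint64_t>(Offset) +
         static_cast<std::uint64_t>(ScalableOffset) * VG;
}
```

With SP = 0x1000, a fixed offset of 16 and a scalable offset of 8, the CFA is 0x1020 on a 128-bit implementation (VG = 2) and 0x1030 on a 256-bit one (VG = 4), which is why the value cannot be encoded as a constant offset.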

This patch adds a CFI entry for each SVE callee saved register that needs unwind info at an offset from the CFA. The offset is a DWARF expression because the offset is partly scalable. The CFI entries only cover a subset of the SVE callee-saves and only encodes the lower 64-bits, thus implementing the lowest common denominator ABI. Existing unwinders may support VG but only restore the lower 64-bits. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84044 (cherry picked from commit bb3344c7d8c2703c910dd481ada43ecaf11536a6)

This fixes an issue triggered by the following code, where emitEpilogue got confused when trying to restore the SVE registers after the call, whereas the call to bar() is implemented as a TCReturn: int non_sve(); int sve(svint32_t x) { return non_sve(); } Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84869 (cherry picked from commit f2916636f83dfeb4808a16045db0025783743471)

Fixed an incorrect pattern in lib/Target/AArch64/AArch64SVEInstrInfo.td for storing out <vscale x 2 x f32> unpacked scalable vectors. Added a couple of tests to test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll Differential Revision: https://reviews.llvm.org/D85441 (cherry picked from commit 0905d9f31ead399d054c5d2a2c353e690f5c8daa)

Introduced by fd6584a22043b254a323635c142b28ce80ae5b5b Following similar use of casts in AsmParser.cpp, for instance - ideally this type would use unsigned chars as they're more representative of raw data and don't get confused around implementation defined choices of char's signedness, but this is what it is & the signed/unsigned conversions are (so far as I understand) safe/bit preserving in this usage and what's intended, given the API design here. (cherry picked from commit e31cfc4cd3e393300002e9c519787c96e3b67bab)

The code wasn't taking into account that the two operands passed to ptest could be identical and was trying to erase them twice. Differential Revision: https://reviews.llvm.org/D85892 (cherry picked from commit 6c7957c9901714b7ad0a8d2743a8c431b57fd0c9)

Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D85659 (cherry picked from commit 4d52ebb9b9c72b656c1ccb6a1424841f246cd791)

…< C (PR47133) While x*undef is undef, shift-by-undef is poison, which we must avoid introducing. Also log2(iN undef) is *NOT* iN undef, because log2(iN undef) u< N. See https://bugs.llvm.org/show_bug.cgi?id=47133 (cherry picked from commit 12d93a27e7b78d58dd00817cb737f273d2dba8ae)

… after D83273 Previously the time complexity was O(|number of paths from the root to an implied feature| * CPU_FEATURE_MAX) where CPU_FEATURE_MAX is 92. The number of paths can be large (theoretically exponential). For an inline asm statement, there is a code path `clang::Parser::ParseAsmStatement -> clang::Sema::ActOnGCCAsmStmt -> ASTContext::getFunctionFeatureMap` leading to potentially many calls of getImpliedEnabledFeatures (41 for my -march=native case). We should improve the performance a bit in case the number of inline asm statements is large (Linux kernel builds). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D85257 (cherry picked from commit 0c7af8c83bd1acb0ca78f35ddde29b6fde4363a0)
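
To illustrate why the number of paths matters, here is a sketch of a traversal that visits each feature at most once, so its cost no longer depends on how many paths lead to a feature. It is not the code from the patch: `NumFeatures`, `FeatureSet` and `impliedFeatures` are hypothetical, and the table is a tiny stand-in for the real one.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NumFeatures = 6; // tiny stand-in for CPU_FEATURE_MAX

using FeatureSet = std::bitset<NumFeatures>;

// Hypothetical table: Implies[F] holds the features directly implied by F.
// Collects everything reachable from Root with a worklist and a "seen" set.
FeatureSet impliedFeatures(const std::vector<FeatureSet> &Implies,
                           std::size_t Root) {
  assert(Implies.size() == NumFeatures && Root < NumFeatures);
  FeatureSet Seen;
  std::vector<std::size_t> Worklist{Root};
  while (!Worklist.empty()) {
    std::size_t F = Worklist.back();
    Worklist.pop_back();
    for (std::size_t I = 0; I < NumFeatures; ++I) {
      // A feature is queued only the first time it is reached.
      if (Implies[F][I] && !Seen[I]) {
        Seen.set(I);
        Worklist.push_back(I);
      }
    }
  }
  return Seen;
}
```

Each feature is pushed at most once and each pop scans one row of the table, so the work is bounded by NumFeatures squared even when the implication graph contains many diamonds.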

(cherry picked from commit 13796d14238baabff972e15ceddb4ae61b1584b8)

… when the loc VT is a different size than the original element. For example a v4f16 argument is scalarized to 4 i32 values. So the values are spread out instead of being packed tightly like in the original vector. Fixes PR47000. (cherry picked from commit 08b2d0a963dbbf54317a137d69f430b347d1bfae)

Differential Revision: https://reviews.llvm.org/D85977

…ze of DIVariables When turning on -debug-info-kind=constructor we ran into a "fragment covers entire variable" error during thinlto. The fragment is currently always emitted if there is no type size, but sometimes the variable has a forward declared struct type which doesn't have a size. This changes the code to get the type size from the GlobalVariable instead. Differential Revision: https://reviews.llvm.org/D85572 (cherry picked from commit 54b6cca0f28484395ae43bcda4c9f929bc51cfe3)

This fixes the "Unable to insert indirect branch" fatal error sometimes seen when generating position-independent code. Patch by msizanoen1 Reviewed By: jrtc27 Differential Revision: https://reviews.llvm.org/D84833 (cherry picked from commit 5f9ecc5d857fa5d95f6ea36153be19db40576f8a)

D77531 has a typo for mfsprg; it should be mtsprg. This patch fixes the typo. (cherry picked from commit 95e18b2d9d5f93c209ea81df79c2e18ef77de506)

Replace the `ident_t` handling in Clang with the methods offered by the OMPIRBuilder. This cuts down on the clang code as well as the differences between the two, making further transitions easier. Tests have changed but there should not be a real functional change. The most interesting difference is probably that we stop generating local ident_t allocations for now and just use globals. Given that this happens only with debug info, the location part of the `ident_t` is probably bigger than the test anyway. As the location part is already a global, we can avoid the allocation, memcpy, and store in favor of a constant global that is slightly bigger. This can be revisited if there are complications. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D80735

We hit the compile-time problem reported in https://bugs.llvm.org/show_bug.cgi?id=46877, and the cause is the same as in D77319. So we need to remove the dead node we created, to avoid increasing the problem size of DAGCombiner. Reviewed By: Spatel Differential Revision: https://reviews.llvm.org/D86183 (cherry picked from commit 960cbc53ca170c8c605bf83fa63b49ab27a56f65)

Text by Saleem!

The version of `st1d` that operates with vector plus immediate addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was generating `<Zn>.s` instead of `<Zn>.d`. Differential Revision: https://reviews.llvm.org/D86633

When collecting `i1` values via `findAllDefs`, ignore Constant's operands, since Constant's operands might not be `i1`. Fixes https://bugs.llvm.org/show_bug.cgi?id=46923 which causes ICE ``` llvm-project/llvm/lib/IR/Constants.cpp:1924: static llvm::Constant *llvm::ConstantExpr::getZExt(llvm::Constant *, llvm::Type *, bool): Assertion `C->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits()&& "SrcTy must be smaller than DestTy for ZExt!"' failed. ``` Differential Revision: https://reviews.llvm.org/D85007 (cherry picked from commit cbea17568f4301582c1d5d43990f089ca6cff522)

…wering vector arguments When joining the legal parts of a vector argument into its original value during the lowering of formal arguments in SelectionDAGBuilder, the calling convention information was not being propagated to the handling of each individual part. The same did not happen when lowering calls, causing a mismatch. This patch fixes the issue by properly propagating the calling convention details. This fixes Bugzilla #47001. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86715 (cherry picked from commit 3d943bcd223e5b97179840c2f5885fe341e51747)

This fixes an issue where the restore point of callee-saves in the function epilogues was incorrectly calculated when the basic block consisted of only a RET instruction. This caused dealloc instructions to be inserted in between the block of callee-save restore instructions, rather than before it. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86099 (cherry picked from commit 5f47d4456d192eaea8c56a2b4648023c8743c927)

This patch adds type information for SVE ACLE vector types, by describing them as vectors, with a lower bound of 0, and an upper bound described by a DWARF expression using the AArch64 Vector Granule register (VG), which contains the runtime multiple of 64bit granules in an SVE vector. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86101 (cherry picked from commit 4e9b66de3f046c1e97b34c938b0920fa6401f40c)

…etter code generation. Patch by: Philip Guenther (cherry picked from commit d870e363263835bec96c83f51b20e64722cad742)

This is the follow-up patch for https://reviews.llvm.org/D86183. It handles the case NegX == NegY, where the node removed by RemoveDeadNode is still used when the new node is created.
```
if (NegX && (CostX <= CostY)) {
  Cost = std::min(CostX, CostZ);
  RemoveDeadNode(NegY);
  return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags); // <-- NegY is used here if NegY == NegX.
}
```
Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D86689 (cherry picked from commit deb4b2580715810ecd5cb7eefa5ffbe65e5eedc8)

Tests on Solaris/sparcv9 currently show about 250 failures when building with gcc, most of them like the following: FAIL: LLVM-Unit :: Support/./SupportTests/TaskQueueTest.UnOrderedFutures (4269 of 67884) ******************** TEST 'LLVM-Unit :: Support/./SupportTests/TaskQueueTest.UnOrderedFutures' FAILED ******************** Note: Google Test filter = TaskQueueTest.UnOrderedFutures [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from TaskQueueTest [ RUN ] TaskQueueTest.UnOrderedFutures 0 SupportTests 0x0000000100753b20 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 32 1 SupportTests 0x0000000100752974 llvm::sys::RunSignalHandlers() + 68 2 SupportTests 0x0000000100752b18 SignalHandler(int) + 372 3 libc.so.1 0xffffffff7eedc800 __sighndlr + 12 4 libc.so.1 0xffffffff7eecf23c call_user_handler + 852 5 libc.so.1 0xffffffff7eecf594 sigacthandler + 84 6 SupportTests 0x00000001006f8cb8 std::thread::_State_impl<std::thread::_Invoker<std::tuple<llvm::ThreadPool::ThreadPool(llvm::ThreadPoolStrategy)::'lambda'()> > >::_M_run() + 512 7 libstdc++.so.6.0.28 0xfffffffc628117cc execute_native_thread_routine + 16 8 libc.so.1 0xffffffff7eedc6a0 _lwp_start + 0 Since it's effectively impossible to debug such a `SEGV` in a `Release` build, I tried a `Debug` build instead, only to find that the failures had gone away. Further investigation revealed that most of the issue centers around `llvm/lib/Support/ThreadPool.cpp`. That file is built with `-O3 -fPIC` in a `Release` build. The failure vanishes if - compiling without `-fPIC` - compiling with `-O -fPIC` - linking with GNU `ld` instead of Solaris `ld` It has meanwhile been determined that `gcc` doesn't correctly heed some TLS code sequences. To make things worse, Solaris `ld` doesn't properly validate its assumptions against the input, generating wrong code. `gld` like `gcc` is more liberal here and correctly deals with the code it gets fed from `gcc`. 
There's PR target/96607: GCC feeds SPARC/Solaris linker with unrecognized TLS sequences <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96607> now. An attempt to build with `-DLLVM_ENABLE_PIC=Off` initially failed since neither `libRemarks.so` (D85626 <https://reviews.llvm.org/D85626>) nor `LLVMPolly.so` (D85627 <https://reviews.llvm.org/D85627>) heed that option. Even with that fixed, a few codegen failures remain. Next I tried to build just `ThreadPool.cpp` with `-O -fPIC`. While that fixed the vast majority of the failures, 16 `LLVM :: CodeGen/X86` failures remained. Given that that solution was both incomplete and fragile, I went for building the whole tree with `-O -fPIC` for `Release` and `RelWithDebInfo` builds. As detailed in Bug 47304, 2-stage builds also show large numbers of failures when building with `-O3` or `-O2`, which are likewise worked around by building with `-O` until they are sufficiently analyzed and fixed. This way, all failures relative to a `Debug` build go away. Tested on `sparcv9-sun-solaris2.11`. Differential Revision: https://reviews.llvm.org/D85630 (cherry picked from commit 15c66b10114d239c96282cf8fc5330186178974b)

…(PR47322) Replace the check for poison-producing instructions in SimplifyWithOpReplaced() with the generic helper canCreatePoison() that properly handles poisonous shifts and thus avoids the problem from PR47322. This additionally fixes a bug in IIQ.UseInstrInfo=false mode, which previously could have caused this code to ignore poison flags. Setting UseInstrInfo=false should reduce the possible optimizations, not increase them. This is not a full solution to the problem, as poison could be introduced more indirectly. This is just a minimal, easy to backport fix. Differential Revision: https://reviews.llvm.org/D86834 (cherry picked from commit a5be86fde5de2c253aa19704bf4e4854f1936f8c)

Summary: PPC only supports instruction selection of ISD::SETCC for v16i8, v8i16, v4i32, v2i64, v4f32 and v2f64. v1i128 is not supported, so ISD::SETCC on v1i128 crashes. This patch sets v1i128 to expand to avoid the crash. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D84238 (cherry picked from commit 802c043078ad653aca131648a130b59f041df0b5)

Since the parameter is not used anywhere, and the default size of 16 apparently causes PR47359, remove it. This ensures that IntervalMap will automatically determine the optimal size, using its NodeSizer struct. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D87044 (cherry picked from commit f26fc568402f84a94557cbe86e7aac8319d61387)

Fixes PR47375, in which an assertion was triggering because WebAssemblyTargetLowering::isVectorLoadExtDesirable was improperly assuming the use of simple value types. Differential Revision: https://reviews.llvm.org/D87110 (cherry picked from commit caee15a0ed52471bd329d01dc253ec9be3936c6d)

Quite a while ago, we legalized these nodes as we added custom handling for reciprocal estimates in the back end. We have since moved to target-independent combines but neglected to turn off legalization. As a result, we can now get selection failures on non-VSX subtargets as evidenced in the listed PR. Fixes: https://bugs.llvm.org/show_bug.cgi?id=47373 (cherry picked from commit 27714075848e7f05a297317ad28ad2570d8e5a43)

The test case in https://bugs.llvm.org/show_bug.cgi?id=47373 exposed two bugs in the PPC back end. The first one was fixed in commit 27714075848e7f05a297317ad28ad2570d8e5a43 but the test case had to be added without -verify-machineinstrs due to the second bug. This commit fixes the use-after-kill that is left behind by the PPC MI peephole optimization. (cherry picked from commit 69289cc10ffd1de4d3bf05d33948e6b21b6e68db)

…s match in addition to the source registers. Previously, if the sources matched, we asserted that the destinations matched. But GPR <-> mask register copies on X86 can violate this, since we use the same K-registers for multiple sizes. Fixes this ISPC issue: ispc/ispc#1851 Differential Revision: https://reviews.llvm.org/D86507 (cherry picked from commit 4783e2c9c603ed6aeacc76bb1177056a9d307bd1)

This patch is cherry-picked from 04b0a4e22e3b4549f9d241f8a9f37eebecb62a31, and amended to prevent an undefined reference to `llvm::EnableABIBreakingChecks' (cherry picked from commit 38778e1087b2825e91b07ce4570c70815b49dcdc)

SSE4_1 and SSE4_2 do imply SSSE3. So I guess I got confused when switching the code to being table based in D83273. Fixes PR47464 (cherry picked from commit e6bb4c8e7b3e27f214c9665763a2dd09aa96a5ac)

…t PowerPC in PPCTargetLowering Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D86165 (cherry picked from commit 88b368a1c47bca536f03041f7464235b94ea98a1)

…ubrange. This is to fix the CodeView build failure https://bugs.llvm.org/show_bug.cgi?id=47287 after the DISubrange upgrade in D80197. The assert condition is now removed, and Count is calculated when LowerBound is absent or zero and Count or UpperBound is constant. If Count is unknown, it is later handled as a VLA (currently Count is set to zero). Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D87406 (cherry picked from commit e45b0708ae81ace27de53f12b32a80601cb12bf3)
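
The rule described above can be sketched as below. This is a simplified model, not the CodeView emitter: the `Subrange` struct and `elementCount` are hypothetical, a field is present only when it is a compile-time constant, and returning zero for a non-zero lower bound is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified view of a subrange: each field is set only when
// it is a known constant.
struct Subrange {
  std::optional<std::int64_t> LowerBound;
  std::optional<std::int64_t> Count;
  std::optional<std::int64_t> UpperBound;
};

// Returns the element count, or 0 when it is unknown (treated as a VLA).
std::int64_t elementCount(const Subrange &S) {
  assert((!S.Count || *S.Count >= 0) && "a constant count is non-negative");
  // Assumption of this sketch: a non-zero lower bound is reported as unknown.
  if (S.LowerBound.value_or(0) != 0)
    return 0;
  if (S.Count)
    return *S.Count;
  if (S.UpperBound)
    return *S.UpperBound + 1; // inclusive upper bound, lower bound of zero
  return 0;
}
```

For example, a subrange with only an upper bound of 9 yields a count of 10, while a subrange with no constant fields yields 0.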

It was found that some packed immediate operands (e.g. `<half 1.0, half 2.0>`) were processed incorrectly, so one of the two packed values was lost. Introduced a new function to check whether an immediate 32-bit operand can be folded. Converted the condition on the current op_sel flags value to a fall-through. Fixes: SWDEV-247595 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87158 (cherry picked from commit d03c4034dc80c944ec4a5833ba8f87d60183f866)

Canonicalize icmp ne to icmp eq and implement all the folds only once.

This is a followup to D86834, which partially fixed this issue in InstSimplify. However, InstCombine repeats the same transform while dropping poison flags -- which does not cover cases where poison is introduced in some other way. The fix here is a bit more comprehensive, because things are quite entangled, and it's hard to only partially address it without regressing optimization. There are really two changes here: * Export the SimplifyWithOpReplaced API from InstSimplify, with an added AllowRefinement flag. For replacements inside the TrueVal we don't actually care whether refinement occurs or not, the replacement is always legal. This part of the transform is now done in InstSimplify only. (It should be noted that the current AllowRefinement check is not sufficient -- that's an issue we need to address separately.) * Change the InstCombine fold to work by temporarily dropping poison generating flags, running the fold and then restoring the flags if it didn't work out. This will ensure that the InstCombine fold is correct as long as the InstSimplify fold is correct. Differential Revision: https://reviews.llvm.org/D87445
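
The second change (drop the flags, try the fold, restore on failure) can be sketched as follows. This is a toy model rather than InstCombine code: `Instr` and `foldWithoutPoisonFlags` are hypothetical, and the fold is an arbitrary callback returning an optional result.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical miniature instruction: only poison-generating flags matter.
struct Instr {
  bool NoSignedWrap = false;
  bool NoUnsignedWrap = false;
  bool Exact = false;
};

// Runs Fold with all flags cleared. The flags stay cleared if the fold
// produced a result, otherwise the original flags are put back.
std::optional<int> foldWithoutPoisonFlags(
    Instr &I, const std::function<std::optional<int>(const Instr &)> &Fold) {
  assert(Fold && "a fold callback is required");
  const Instr Saved = I;
  I = Instr{}; // drop every poison-generating flag
  std::optional<int> Result = Fold(I);
  if (!Result)
    I = Saved; // the fold did not work out: restore the flags
  return Result;
}
```

Because the callback only ever sees the flag-free instruction, any result it returns cannot depend on a flag that the replacement would invalidate.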

960cbc53 immediately removes nodes that won't be used to avoid compilation time explosion. This patch adds the removal to constants to fix PR47517. Reviewed By: RKSimon, steven.zhang Differential Revision: https://reviews.llvm.org/D87614 (cherry picked from commit 2508ef014e8b01006de4e5ee6fd451d1f68d550f)

The code that decomposes the GEP into ADD/MUL doesn't work properly for vector GEPs. It can create bad COPY instructions or possibly assert. For now just bail out to SelectionDAG. Fixes PR45906 (cherry picked from commit 4208ea3e19f8e3e8cd35e6f5a6c43f4aa066c6ec)

This check fires during self-host.
> The approach is simple: if a pass reports that it's not modifying a
> Function/Module, compute a loose hash of that Function/Module and compare it
> with the original one. If we report no change but there's a hash change, then we
> have an error.
>
> This approach misses a lot of change but it's not super intrusive and can
> detect most of the simple mistakes.
>
> Differential Revision: https://reviews.llvm.org/D80916
This reverts commit 3667d87a33d3c8d4072a41fd84bb880c59347dc0.
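
For readers unfamiliar with the reverted check, a rough sketch of the idea is given here. It is not the LLVM implementation: the IR is modelled as a plain string, `std::hash` stands in for the loose structural hash, and `IRText`, `Pass` and `runAndCheck` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for a Function/Module: its printed form.
using IRText = std::string;

// A pass returns true when it reports that it modified the IR.
using Pass = std::function<bool(IRText &)>;

// Returns false when the pass reported "no change" but the hash differs.
bool runAndCheck(const Pass &P, IRText &IR) {
  assert(P && "a pass callback is required");
  const std::size_t Before = std::hash<IRText>{}(IR);
  const bool ReportedChange = P(IR);
  const std::size_t After = std::hash<IRText>{}(IR);
  return ReportedChange || Before == After;
}
```

The check is one-sided: a pass that reports a change is never flagged, and a modification that happens to leave the hash unchanged goes unnoticed.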

…ion" 2508ef01 doesn't totally fix the issue, since it did not handle the case where the unused temporary negated result is the same as the result, which was found by address sanitizer. (cherry picked from commit e1669843f2aaf1e4929afdd8f125c14536d27664)

This adds documentation for the options added / changed by D71913, which enabled aggressive WPD under LTO. The lld release notes already mentioned it, but I expanded the note. Differential Revision: https://reviews.llvm.org/D86958

This seems to have caused incorrect register allocation in some cases, breaking tests in the Zig standard library (PR47278). As discussed on the bug, revert back to green for now.
> Record internal state based on register units. This is often more
> efficient as there are typically fewer register units to update
> compared to iterating over all the aliases of a register.
>
> Original patch by Matthias Braun, but I've been rebasing and fixing it
> for almost 2 years and fixed a few bugs causing intermediate failures
> to make this patch independent of the changes in
> https://reviews.llvm.org/D52010.
This reverts commit 66251f7e1de79a7c1620659b7f58352b8c8e892e, and follow-ups 931a68f26b9a3de853807ffad7b2cd0a2dd30922 and 0671a4c5087d40450603d9d26cf239f1a8b1367e. It also adjusts some test expectations. (cherry picked from commit a21387c65470417c58021f8d3194a4510bb64f46)

By Ahsan Saghir!

…or STV_DEFAULT only This patch restricts the behaviour of referencing via .Lfoo$local local aliases, introduced in https://reviews.llvm.org/D73230, to STV_DEFAULT globals only. Hidden symbols via -fvisibility=hidden (https://gcc.gnu.org/wiki/Visibility) are an important scenario. Benefits:
- Improves the size of object files by using fewer STT_SECTION symbols.
- The code reads a bit better (it was not obvious to me without going back to the code reviews why the canBenefitFromLocalAlias function currently doesn't consider visibility).
- There is also a side benefit in restoring the effectiveness of the --wrap linker option and making the behavior of --wrap consistent between LTO and normal builds for references within a translation-unit. Note: this --wrap behavior (which is specific to LLD) should not be considered reliable. See comments on https://reviews.llvm.org/D73230 for more.
Differential Revision: https://reviews.llvm.org/D85782 (cherry picked from commit 4cb016cd2d8467c572b2e5c5d34f376ee79e4ac1)

2508ef01 fixed a bug in constant removal during negation, but a sanitizer run showed that an issue remained, so it was reverted. Temporary nodes are removed during negation if they turn out to be useless. Before the removal, they are checked for uses by other nodes, so the removal was moved after getNode. However, in rare cases the node to be removed is the same as the result of getNode. We missed that case, and this patch fixes it. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D87614 (cherry picked from commit a2fb5446be960ad164060b3c05fc268f7f72d67a)

Matches C++20 API addition. Differential Revision: https://reviews.llvm.org/D83449 (cherry picked from commit a0385bd7acd6e1d16224b4257f4cb50e59f1d75e)

…y inserted after an INLINEASM_BR. findPHICopyInsertPoint special cases placement in a block with a callbr or invoke in it. In that case, we must ensure that the copy is placed before the INLINEASM_BR or call instruction, if the register is defined prior to that instruction, because it may jump out of the block. Previously, the code placed it immediately after the last def _or use_. This is wrong, if the use is the instruction which may jump. We could correctly place it immediately after the last def (ignoring uses), but that is non-optimal for register pressure. Instead, place the copy after the last def, or before the call/inlineasm_br, whichever is later. Differential Revision: https://reviews.llvm.org/D87865 (cherry picked from commit f7a53d82c0902147909f28a9295a9d00b4b27d38)
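
The placement rule in the last sentence amounts to taking the later of two positions. A sketch under a deliberately simple model, where instructions are numbered by their index in the block and the result is the index to insert before (`copyInsertIndex` is a hypothetical name, not the signature of findPHICopyInsertPoint):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// LastDef: index of the last instruction that defines the register.
// Branch:  index of the call/INLINEASM_BR that may jump out of the block.
// Returns the index the copy should be inserted before.
std::size_t copyInsertIndex(std::size_t LastDef, std::size_t Branch,
                            std::size_t BlockSize) {
  assert(LastDef < BlockSize && Branch < BlockSize &&
         "both instructions must be in the block");
  // "After the last def" is LastDef + 1; "before the branch" is Branch.
  return std::max(LastDef + 1, Branch);
}
```

With the last def at index 1 and the branch at index 4, the copy goes immediately before the branch (index 4) rather than right after the def, which keeps the live range short. With the def at index 5, it goes after the def (index 6).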

…uilder SelectionDAGBuilder was inconsistently mangling values based on ABI Calling Conventions when getting them through copyFromRegs in SelectionDAGBuilder, causing duplicate value type conversions for function arguments. The checking for the mangling requirement was based on the value's originating instruction and was performed outside of, and in spite of, the regular Calling Convention Lowering. The issue could be observed in a scenario such as:
```
%arg1 = load half, half* %const, align 2
%arg2 = call fastcc half @someFunc()
call fastcc void @otherFunc(half %arg1, half %arg2)
; Here, %arg2 was incorrectly mangled twice, as the CallConv data from
; the call to @someFunc() was taken into consideration for the check
; when getting the value for processing the call to @otherFunc(...),
; after the proper conversion had taken place when lowering the return
; value of the first call.
```
This patch fixes the issue by disregarding the Calling Convention information for such copyFromRegs, making sure the ABI mangling is properly contained in the Calling Convention Lowering. This fixes Bugzilla #47454. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87844 (cherry picked from commit 53d238a961d14eae46f6f2b296ce48026c7bd0a1)

This is the relevant portions of an assert fixed by b98f902f1877c3d679f77645a267edc89ffcd5d6.

This fixes a verifier error in the testcase from bug 47619. The stack passed s3 value was widened to 4-bytes, and producing a 4-byte memory access with a < 1 byte result type. We need to either widen the result type or narrow the access size. This copies the code directly from the AMDGPU handling, which narrows the load size. I don't like that every target has to handle this, but this is currently broken on the 11 release branch and this is the simplest fix. This reverts commit 42bfa7c63b85e76fe16521d1671afcafaf8f64ed. (cherry picked from commit 6cb0d23f2ea6fb25106b0380797ccbc2141d71e1)

…res 8 bytes to store This is a fix for PR47630. The regression was caused by D78011. After that change, the code started to call `emitGlobalConstantLargeInt` even for constants which require eight bytes to store. Differential revision: https://reviews.llvm.org/D88261 (cherry picked from commit c6c5629f2fb4ddabd376fbe7c218733283e91d09)

This commit fixes a regression (from LLVM 10 to LLVM 11 RC3) in the LLVM C API. Previously, commit 1ee6ec2bf removed the mask operand from the ShuffleVector instruction, storing the mask data separately in the instruction instead; this reduced the number of operands of ShuffleVector from 3 to 2. AFAICT, this change unintentionally caused a regression in the LLVM C API. Specifically, it is no longer possible to get the mask of a ShuffleVector instruction through the C API. This patch introduces new functions which together allow a C API user to get the mask of a ShuffleVector instruction, restoring the functionality which was previously available through LLVMGetOperand(). This patch also adds tests for this change to the llvm-c-test executable, which involved adding support for InsertElement, ExtractElement, and ShuffleVector itself (as well as constant vectors) to echo.cpp. Previously, vector operations weren't tested at all in echo.ll. I also fixed some typos in comments and help-text nearby these changes, which I happened to spot while developing this patch. Since the typo fixes are technically unrelated other than being in the same files, I'm happy to take them out if you'd rather they not be included in the patch. Differential Revision: https://reviews.llvm.org/D88190 (cherry picked from commit 51cad041e0cb26597c7ccc0fbfaa349b8fffbcda)

It is not a good idea to expose raw constants in the LLVM C API. Replace this with an explicit getter. Differential Revision: https://reviews.llvm.org/D88367 (cherry picked from commit 55f727306e727ea9f013d09c9b8aa70dbce6a1bd)

The test would fail in no-asserts release builds using MSVC for 64-bit Windows: Unexpected error message: TestBuffer:1:1: error: implicit format conflict between 'FOO' (%u) and '18\0' (%x), need an explicit format specifier Error message(s) not found: {implicit format conflict between 'FOO' (%u) and 'BAZ' (%x), need an explicit format specifier} It seems a string from a previous test case is finding its way into the latter one. This doesn't reproduce on master anymore after 998709b7d, so let's just hack around it here for the branch.

Differential Revision: https://reviews.llvm.org/D88479

…ating invalid MIR. During lowering of G_UMULO and friends, the previous code moved the builder's insertion point to be after the legalizing instruction. When that happened, if there happened to be a "G_CONSTANT i32 0" immediately after, the CSEMIRBuilder would try to find that constant during the buildConstant(zero) call, and since it dominates itself would return the iterator unchanged, even though the def of the constant was *after* the current insertion point. This resulted in the compare being generated *before* the constant which it was using. There's no need to modify the insertion point before building the mul-hi or constant. Delaying moving the insert point ensures those are built/CSEd before the G_ICMP is built. Fixes PR47679 Differential Revision: https://reviews.llvm.org/D88514 (cherry picked from commit 1d54e75cf26a4c60b66659d5d9c62f4bb9452b03)

We shift the significand right on a truncation, but that needs to be made NaN-safe: always set at least 1 bit in the significand. https://llvm.org/PR43907 See D88238 for the likely follow-up (but needs some plumbing fixes before it can proceed). Differential Revision: https://reviews.llvm.org/D87835 (cherry picked from commit e34bd1e0b03d20a506ada156d87e1b3a96d82fa2)
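
A bit-level sketch of the problem, narrowing a double NaN to float width. This is not the APFloat code: `truncateNaNBits` is a hypothetical helper, and choosing the quiet bit as the bit to set is an assumption of the sketch (the message above only requires that at least one bit is set).

```cpp
#include <cassert>
#include <cstdint>

// Narrows the bit pattern of a double NaN to a float bit pattern by shifting
// the 52-bit significand right to 23 bits.
std::uint32_t truncateNaNBits(std::uint64_t DoubleBits) {
  const std::uint64_t Exponent = (DoubleBits >> 52) & 0x7FF;
  const std::uint64_t Significand = DoubleBits & 0xFFFFFFFFFFFFFull;
  assert(Exponent == 0x7FF && Significand != 0 && "input must be a NaN");
  const std::uint32_t Sign = static_cast<std::uint32_t>(DoubleBits >> 63) << 31;
  std::uint32_t Narrow = static_cast<std::uint32_t>(Significand >> 29);
  // If every payload bit was shifted out, the result would encode infinity.
  if (Narrow == 0)
    Narrow = 1u << 22; // assumption: set the quiet bit to stay a NaN
  return Sign | 0x7F800000u | Narrow;
}
```

A NaN whose only payload bit is the lowest one (0x7FF0000000000001) loses that bit in the shift. Without the guard the result would be 0x7F800000, which is +Inf rather than a NaN.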

By Lang Hames!

As suggested by Yvan.

This partially reverts a2fb5446 (actually, 2508ef01), which removed a negated FP constant immediately if it had no uses. However, as discussed in bug 47517, there are cases where NegX is folded into a constant elsewhere while NegY is removed by that line of code, and NegX is equal to NegY. In these cases, NegX is deleted before it is used and a crash happens. So revert the code and add the necessary test case. (cherry picked from commit b326d4ff946d2061a566a3fcce9f33b484759fe0)

Tail duplication of a block with an INLINEASM_BR may result in a PHI node on the indirect branch. This is okay, but it also introduces a copy for that PHI node *after* the INLINEASM_BR, which is not okay. See: ClangBuiltLinux/linux#1125 Differential Revision: https://reviews.llvm.org/D88823 (cherry picked from commit d2c61d2bf9bd1efad49acba2f2751112522686aa)

Force RIP-relative jump tables and global values Force RIP-relative all zeros / all ones constants These things were causing crashes due to use of absolute addressing

This is extremely slow yet unnecessary with manual finalization. In LLVM 6 this wasn't a problem.

It makes emitting object extremely slow. GDB doesn't work properly with it anyway. GDB also often crashes because it cannot read the format.

Hide operator !=

Emit pair of shifts of double size if possible

Allow all integer widths in the pattern, allow ashr Handle signed and mixed cases, allowing to replace truncation

Detect clamping ashr shift amount to max legal value

Replace VSELECT instructions which zero their result on exceeding the legal SHL/SRL shift amount.

…ADD/SUB/AND op. Prefer vector-vector shifts if available (AVX2+). Improves code generated for rotate and funnel shifts. Otherwise it would generate a shuffle + slower vector-scalar shift.

The test directory severely inflates the size of the AUR clone, and we're not even using the tests

Treat Zen3 as Zen2 until upstream adds Zen3 support

Commits on Jul 15, 2020

First commit on the release/11.x branch.

zmodem committed Jul 15, 2020

f4821a9

Commits on Sep 5, 2020

ReleaseNotes: Add RISC-V updates

asb committed Sep 5, 2020

b00850c

Commits on Mar 28, 2021

C++20 fixes

xddxd authored and Nekotekina committed Mar 28, 2021

5d8643e

Commits on Apr 18, 2021

Ninja build

xddxd committed Apr 18, 2021

004a051
