[AutoBump] Merge with 02654f73 (Aug 30) (18) #371

Open
wants to merge 99 commits into
base: bump_to_54916e57

mgehre-amd
Collaborator

No description provided.

dklimkin and others added 30 commits August 30, 2024 10:21
…m#105617)

Follow-up on 8ac140f.

The test `SemaTemplate/default-parm-init.cpp` was introduced since the
fix llvm#80288 and mainly did the following things:

- Ensure the default arguments are properly substituted inside both
the primary template and its explicit / out-of-line specializations.
- Ensure the strategy doesn't mess up the substitution of a lambda
expression as a default argument.

The 1st covers the bug in llvm#68490, yet it does some redundant work: each
of the member functions is duplicated, once for the `sizeof` operator and
once for `alignof`, although the underlying principle is essentially the
same. So this patch removes the duplication and reduces the 8 functions
to 4 functions that exercise the same thing.

The 2nd is presumably testing that the fix in llvm#80288 doesn't impact a
complicated substitution. However, that seems unnecessary and unrelated to
the original issue. More importantly, we have never had any problems
with that. Hence, this patch removes that test.

The test for default arguments is merged into
`SemaTemplate/default-arguments.cpp` with a new namespace, and hopefully
this could reduce the entropy of our testing cases.
…S_D instruction

Reviewed By: heiher, SixWeining

Pull Request: llvm#106332
The PR llvm#105996 broke taking the
address of a vector:

**compound-literal.c**
```C
typedef int v4i32 __attribute((vector_size(16)));
v4i32 *y = &(v4i32){1,2,3,4};
```
That is because the current interpreter handles vector unary operators as a
fallback when the generic code path fails, but the new interpreter did
not. We need to handle `UO_AddrOf` in
`Compiler<Emitter>::VisitVectorUnaryOperator`.

Signed-off-by: yronglin <[email protected]>
CompilerInstance can reuse the same SourceManager across multiple
frontend actions. During this process it calls
`SourceManager::clearIDTables` to reset any caches based on FileIDs.

It didn't reset IncludeLocMap, resulting in wrong include locations for
workflows that triggered multiple frontend actions through the same
CompilerInstance.
This patch adds a check for the `multiples` of `tosa.tile`. The `multiples` in
`tosa.tile` indicates how many times the tensor should be replicated
along each dimension. Zero and negative values are invalid, except for
-1, which represents a dynamic value. Therefore, each element of
`multiples` should be a positive integer or -1. Fixes llvm#106167.
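
The rule stated above (each entry is a positive integer or -1) can be sketched as a small standalone predicate. This is only an illustration under stated assumptions: `areValidTileMultiples` is a hypothetical helper name, and the real check lives in the MLIR verifier for `tosa.tile`, which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; not the actual MLIR verifier.
// Returns true when every entry is either a positive integer or -1, where
// -1 stands for a dynamic value.
bool areValidTileMultiples(const std::vector<int64_t> &multiples) {
  for (int64_t m : multiples) {
    if (m != -1 && m <= 0)
      return false; // zero and other negative values are rejected
  }
  return true;
}
```

Under this sketch, `{2, 1, -1}` is accepted, while `{2, 0}` and `{-2}` are rejected.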
An optimizable cast can also be removed by VPlan simplifications. Remove
the restriction from planContainsAdditionalSimplifications, as this
causes it to miss relevant simplifications, triggering false positives
for the cost decision verification.

Also adds debug output for printing additional cost-precomputations.

Fixes llvm#106641.
…ectorsCombine. (llvm#104774)

UZP2 requires both operands to match the result type, but the combine tries to replace a truncate by passing the pre-truncated operands directly to a UZP2 with the truncated result type. This patch nop-casts the operands to keep the DAG consistent. There should be no changes to the generated code, which is fine as is.

This patch also enables more target specific getNode() validation for fixed length vector types.
…ecks (llvm#104478)

The CMake docs state that `check_c_source_compiles()` checks whether the
supplied code "can be compiled as a C source file and linked as an
executable (so it must contain at least a `main()` function)."

https://cmake.org/cmake/help/v3.30/module/CheckCSourceCompiles.html

In practice, this command is a wrapper around `try_compile()`:

- https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/CheckCSourceCompiles.cmake#L54
- https://gitlab.kitware.com/cmake/cmake/blob/2904ce00d2ed6ad5dac6d3459af62d8223e06ce0/Modules/Internal/CheckSourceCompiles.cmake#L101

When `CMAKE_SOURCE_DIR` is compiler-rt/lib/builtins/,
`CMAKE_TRY_COMPILE_TARGET_TYPE` is set to `STATIC_LIBRARY`, so the
checks for `float16` and `bfloat16` support work as intended in a
Clang + compiler-rt runtime build for example, as it runs CMake
recursively from that directory.

However, when using llvm/ or compiler-rt/ as CMake source directory, as
`CMAKE_TRY_COMPILE_TARGET_TYPE` defaults to `EXECUTABLE`, these checks
will indeed fail if the code doesn't have a `main()` function. This
results in LLVM using x86 SIMD registers when generating calls to
builtins that, with Arch Linux's compiler-rt package for example,
actually use a GPR for their argument or return value as they use
`uint16_t` instead of `_Float16`.

This had been caught in post-commit review:
https://reviews.llvm.org/D145237#4521152. Use of the internal
`CMAKE_C_COMPILER_WORKS` variable is not what hides the issue, however.

PR llvm#69842 tried to fix this by unconditionally setting
`CMAKE_TRY_COMPILE_TARGET_TYPE` to `STATIC_LIBRARY`, but it apparently
caused other issues, so it was reverted. This PR just adds a `main()`
function in the checks, as per the CMake docs.
…lvm#106707)

* Revert "Fix MSVC "not all control paths return a value" warning. NFC."
Dep to revert c9b6e01

* Revert "[AMDGPU] Graph-based Module Splitting Rewrite (llvm#104763)"
Breaks tests.
Make dsymutil return a non-zero exit code when crashing during linking.
Need to use the FinalShuffle function for all vectorized results to
correctly produce the vectorized value.

Fixes llvm#106655
By choosing an initial value whose mask is only used by the blend we can
remove the need for the mask entirely.
Recursion here causes stack overflow on large inputs. Fixing by
unrolling via a stack.
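
As a generic illustration of replacing recursion with an explicit stack, here is a minimal sketch. The `Node` type and `sumValues` function are hypothetical and are not taken from the patch; the point is only that the pending work lives in a heap-allocated worklist, so nesting depth no longer consumes call-stack frames.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node type for illustration; not the data structure from the patch.
struct Node {
  int value = 0;
  std::vector<Node *> children;
};

// Sums all values reachable from `root` without recursion. Nodes still to be
// visited are kept in an explicit worklist instead of on the call stack.
long sumValues(const Node *root) {
  long total = 0;
  std::vector<const Node *> worklist;
  if (root)
    worklist.push_back(root);
  while (!worklist.empty()) {
    const Node *n = worklist.back();
    worklist.pop_back();
    total += n->value;
    for (const Node *child : n->children)
      worklist.push_back(child);
  }
  return total;
}
```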
Code lowering always generates fir.if else blocks for source level if
statements, whether needed or not. Change this to only generate else
blocks that are needed.
Trivially extend dd0cf23 ([LICM] Reassociate & hoist sub expressions) to
handle unsigned predicates as well.

Alive2 proofs: https://alive2.llvm.org/ce/z/GdDBtT.
Ever since 6859685 (or, precisely,
84428da) relative jumps emitted by the
AVR codegen are off by two bytes - this pull request fixes it.

## Abstract

As compared to absolute jumps, relative jumps - such as rjmp, rcall or
brsh - have an implied `pc+2` behavior; that is, `jmp 100` is `pc =
100`, but `rjmp 100` gets understood as `pc = pc + 100 + 2`.

This is not reflected in the AVR codegen:


https://github.com/llvm/llvm-project/blob/f95026dbf66e353128a3a3d7b55f3e52d5985535/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp#L89

... which always emits relative jumps that are two bytes too far - or
rather it _would_ emit such jumps if not for this check:


https://github.com/llvm/llvm-project/blob/f95026dbf66e353128a3a3d7b55f3e52d5985535/llvm/lib/Target/AVR/MCTargetDesc/AVRAsmBackend.cpp#L517

... which causes most of the relative jumps to be actually resolved
late, by the linker, which applies the offsetting logic on its own,
hiding the issue within LLVM.

[Some time
ago](llvm@697a162)
we've had a similar "jumps are off" problem that got solved by touching
`shouldForceRelocation()`, but I think that has worked only by accident.
It exploited the fact that absolute vs relative jumps in the parsed
assembly can be distinguished through a "side channel" check relying on
the existence of labels (i.e. absolute jumps happen to named labels, but
relative jumps are anonymous, so to say). This was an alright idea back
then, but it got broken by 6859685.

I propose a different approach:
- when emitting relative jumps, offset them by `-2` (well, `-1`,
strictly speaking, because those instructions rely on right-shifted
offset),
- when parsing relative jumps, treat `.` as `+2` and read `rjmp .+1234`
as `rjmp (1234 + 2)`.

This approach seems to be sound and now we generate the same assembly as
avr-gcc, which can be confirmed with:

```cpp
// avr-gcc test.c -O3 && avr-objdump -d a.out

int main() {
    asm(
"      foo:\n\t"
"        rjmp  .+2\n\t"
"        rjmp  .-2\n\t"
"        rjmp  foo\n\t"
"        rjmp  .+8\n\t"
"        rjmp  end\n\t"
"        rjmp  .+0\n\t"
"      end:\n\t"
"        rjmp .-4\n\t"
"        rjmp .-6\n\t"
"      x:\n\t"
"        rjmp x\n\t"
"        .short 0xc00f\n\t"
);
}
```

avr-gcc is also how I got the opcodes for all new tests like `inst-brbc.s`, so we should be good.
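
The offset arithmetic described above can be sketched in a few lines. This is not the code from `AVRAsmBackend.cpp`; `encodeRjmp` and `encodeRjmpDot` are hypothetical helpers, and the sketch assumes the standard `rjmp` encoding (`1100 kkkk kkkk kkkk`, with `k` a signed 12-bit word offset), even byte offsets, and no range checking.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: encodes `rjmp` located at byte address `insnAddr` so
// that it lands on byte address `targetAddr`. Assumes the offset is even and
// fits in 12 signed bits (no range check in this sketch).
uint16_t encodeRjmp(int32_t insnAddr, int32_t targetAddr) {
  int32_t byteOffset = targetAddr - insnAddr - 2; // compensate the implied pc+2
  int32_t wordOffset = byteOffset / 2;            // the operand is stored in words
  return static_cast<uint16_t>(
      0xC000u | (static_cast<uint32_t>(wordOffset) & 0x0FFFu));
}

// Hypothetical helper for the `.+N` / `.-N` form: `.` is read as "+2", so
// `rjmp .+N` targets the byte at (instruction address + N + 2).
uint16_t encodeRjmpDot(int32_t dotOffset) {
  return encodeRjmp(0, dotOffset + 2);
}
```

Under these assumptions, a jump to its own label (like `x: rjmp x` in the test above) encodes as `0xcfff`, the same word as `rjmp .-2`, and `rjmp .+30` encodes as `0xc00f`.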
…san_disable (llvm#106727)

This better matches lsan_enable and disable, which we are trying to
emulate.
Fixes issue found here
llvm#106691 (comment)

The issue wasn't in the code change itself, just the unittest; the
trailing marker wasn't properly cleaned up.
llvm#100692 changed clang template deduction, and an error is now emitted
when building flang with top-of-tree clang while mapping std::pow in
intrinsics-library.cpp for constant folding: `error: address of
overloaded function 'pow' is ambiguous`

See https://lab.llvm.org/buildbot/#/builders/4/builds/1670

I am not expert enough to understand whether the new error is justified
here, but it is easy to help the compiler with explicit
wrappers to fix the builds.
Harini0924 and others added 29 commits August 30, 2024 10:15
…lob expansion in lit's internal shell" (llvm#106763)

Reverts llvm#106325
Broke some Buildbots.
If the operand node has the same scalars as one of the vectorized nodes,
the compiler could miss this and incorrectly request minbitwidth data
for the wrong node. It may lead to a compiler crash, because the
vectorized node might have a different minbw result.

Fixes llvm#106667
Noticed in clang-formatting of llvm#106750
The worst possible case for a double literal goes like:

```
  mov ...
  movk ..., lsl #16
  movk ..., lsl #32
  movk ..., lsl #48
  fmov ...
```

The limit of 5 in the code gives the impression that `Insn` includes all
instructions including the `fmov`, but that's not true. It only counts
the integer moves. This led me astray on some other work in this area.
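
To make the counting concrete, here is a simplified model that counts one integer move per non-zero 16-bit chunk of the double's bit pattern, with the `fmov` deliberately left out of the count. `countIntegerMoves` is a hypothetical name, not the function touched by this change, and the real expansion may find shorter sequences than this model suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(uint64_t), "assumes a 64-bit double");

// Hypothetical, simplified model: one `mov`/`movk` per non-zero 16-bit chunk
// of the bit pattern, and at least one move to set the register at all.
// The trailing `fmov` is NOT counted, mirroring the observation above.
unsigned countIntegerMoves(double value) {
  uint64_t bits;
  std::memcpy(&bits, &value, sizeof bits); // reinterpret the double's bits
  unsigned moves = 0;
  for (unsigned shift = 0; shift < 64; shift += 16) {
    if ((bits >> shift) & 0xFFFFu)
      ++moves;
  }
  return moves == 0 ? 1 : moves;
}
```

In this model `0.1` (bit pattern `0x3FB999999999999A`) hits the worst case of four integer moves, so the full sequence including `fmov` is five instructions.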
…106753)

Similar to what we do in foldVMV_V_V with the passthru, if we end up
changing the Src's VL in tryReduceVL we need to make sure it dominates.

Fixes llvm#106735
To support detecting MD5 checksum mismatches, deal with SupportFiles
rather than a plain FileSpecs in the SourceManager.
This patch implements sandboxir::ConstantFP mirroring llvm::ConstantFP.
There is no need to support Python 2.7 anymore; Python 3.3+ has
`subprocess.DEVNULL`. This is good practice and also prevents file
handles from staying open unnecessarily.

Also remove a couple of unused or unneeded `__future__` imports.
After landing support for actual vectorization of the "clustered" loads,
we need to better estimate the cost of masked gathers versus clustered loads.
This includes estimating the address calculation and a better
estimate of the gathered loads. This estimate now also relies on the
SLPCostThreshold option, which allows modifying the behavior of the compiler.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm#105858
Argument is another possible starting point for the pointer traversal,
and PtrUseVisitor should be able to handle it.
…mber of elements in operands.

Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm#106449
Consolidating code so that we have one copy instead of multiple copies
reasoning about the identity element. Note that we're (deliberately) not
passing the FMF flags to the common utility, to preserve behavior in this change.
…optional" (llvm#106778)

Reverts llvm#104668

This commit triggers an edge case that can cause circular
`unrealized_conversion_cast` ops.
llvm#106760 may fix it, but it
has other issues. Reverting this PR for now, until I find a solution for
that problem.
Primary goal is having one way of doing this, to ensure that we don't
end up with accidental divergence.
This patch updates the source cache dump command to print both the
actual (on-disk) checksum and the expected (line table) checksum. To
achieve that we now read and store the on-disk checksum in the cached
object. The same information will be used in a future patch to print a
warning when the checksums differ.
Follow up fix to llvm#106332

`LoongArchMatInt.cpp:96:33: runtime error: shift exponent 64 is too
large for 64-bit type`
https://lab.llvm.org/buildbot/#/builders/169/builds/2681
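
For context, shifting a 64-bit value by 64 or more is undefined behavior in C++, which is what the sanitizer reports. A common way to guard such a shift is sketched below; `lowBitsMask` is a hypothetical helper and is not the actual change made in `LoongArchMatInt.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns a mask with the low `numBits` bits set.
// The full-width case is handled explicitly, because `uint64_t{1} << 64`
// would be undefined behavior.
uint64_t lowBitsMask(unsigned numBits) {
  if (numBits >= 64)
    return ~uint64_t{0};
  return (uint64_t{1} << numBits) - 1;
}
```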
…#106792)

Reverts llvm#103371

There is a `heap-use-after-free`, as commented on
206b5af.

Maybe `if (Next == E || BB != Next->getParent()) {` is enough,
but I am not sure what the intent was there.
…m#98214)

We split up all the headers into top-level modules when we broke up
cycles with the C compatibility headers. However, this resulted in a
large number of small modules, which is awkward and clearly against the
philosophy of Clang modules. This was necessary to make things work.

This patch regroups a few headers from two leaf modules: stop_token and
pstl. It should be pretty uncontroversial that grouping these headers
into a single module doesn't introduce any cyclic dependency, yet it's a
first step towards reducing the number of top-level modules we have in
our modulemap.
…lvm#106494)

This patch contains two parts:
- The first reverts the patch llvm#101428.
- The second removes the `atomic_fetch_and_*()` to `atomic_<op>()`
  conversion (when the return value is not used), but preserves the
  `__sync_fetch_and_add()` to locked insn conversion with cpu v1/v2.
This updates the expected differences document to capture the
difference in multi-argument overload resolution between Clang and DXC.

Fixes llvm#99530