Generated LLVM code causes register spills #52933
Comments
I'm taking a look at this, but this has a C reproducer https://godbolt.org/z/bGPT4WxPe and GCC does a lot better |
Thank you @gbaraldi for taking this up. For the C code, I'd just make the single change below. With the change, gcc is still good, but not as good as with the original UB. I feel it only packs the results in XMM registers instead of the register spills and reloads... and might also not be scalable?
struct wtf {
- int a[20];
+ int a[30];
};
Perhaps there's some hidden dependency that I'm not seeing? :( |
Yeah, that was a typo. Could you try increasing it to 50 or 100? Though I'm not surprised if at that size it gets so bad. I opened llvm/llvm-project#78506 upstream |
Thank you for raising the upstream issue, @gbaraldi. Actually, it seems like it takes a lot to break GCC! This is the general process to check the assembly for any N:
rs.awk
$0 ~ /^##.*/ {gsub("^## ",""); print}
$0 ~ /^#~.*/ {gsub("^#~ ",""); gsub("N",N); if($0 ~ /?/) {for (i=0; i<N; i++) {a = $0; gsub("?",i,a); print a}} else {print}}
## #include <stddef.h>
## struct wtf {
#~ int a[N];
## };
## struct wtf __attribute__ ((noinline)) foo(struct wtf *b, int i)
## {
## struct wtf new;
#~ int idx? = (? + i) % N ;
#~ int val? = b->a[idx?] ;
#~ new.a[?] = val? ;
## return new;
## }
## |
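For reference, here is a sketch of what the rs.awk template above should expand to for N = 3. It was worked out by hand from the template rules, not captured from an actual run: `##` lines are printed once, `#~` lines have `N` substituted and are repeated for each index in place of `?`. Indentation and the comment are added here for readability. Going by the POSIX invocation quoted later in the thread, the non-POSIX command would presumably be `awk -v N=3 -f rs.awk rs.awk` with GNU Awk.

```c
#include <stddef.h>

struct wtf {
    int a[3];
};

/* Assumes i >= 0 so that (k + i) % 3 is a valid index (added note,
   not part of the template). noinline is a GCC/Clang attribute. */
struct wtf __attribute__ ((noinline)) foo(struct wtf *b, int i)
{
    struct wtf new;
    /* All index computations first ... */
    int idx0 = (0 + i) % 3;
    int idx1 = (1 + i) % 3;
    int idx2 = (2 + i) % 3;
    /* ... then all loads ... */
    int val0 = b->a[idx0];
    int val1 = b->a[idx1];
    int val2 = b->a[idx2];
    /* ... then all stores. */
    new.a[0] = val0;
    new.a[1] = val1;
    new.a[2] = val2;
    return new;
}
```

Because each `#~` line is expanded for every index before the next line is processed, the generated source groups all index computations, then all loads, then all stores. The local is named `new` as in the template, so the output compiles as C but not as C++.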
I get a
error |
Sorry, I think you have a POSIX compatible
$ awk --version
GNU Awk 5.2.1, API 3.2, PMA Avon 8-g1, (GNU MPFR 4.2.0, GNU MP 6.2.1)
Copyright (C) 1989, 1991-2022 Free Software Foundation.
And the POSIX version fails here too:
$ awk -P -v N=10 -f rs.awk rs.awk
awk: rs.awk:2: error: Invalid preceding regular expression: /?/ |
my |
Sorry, seems like I got the for loop wrong. The correct version is
$0 ~ /^##.*/ {gsub("^## ",""); print}
- $0 ~ /^#~.*/ {gsub("^#~ ",""); gsub("N",N); if($0 ~ /?/) {for (i=1; i<N; i++) {a = $0; gsub("?",i,a); print a}} else {print}}
+ $0 ~ /^#~.*/ {gsub("^#~ ",""); gsub("N",N); if($0 ~ /?/) {for (i=0; i<N; i++) {a = $0; gsub("?",i,a); print a}} else {print}}
## #include <stddef.h>
## struct wtf {
#~ int a[N];
## };
## struct wtf __attribute__ ((noinline)) foo(struct wtf *b, int i)
## {
## struct wtf new;
#~ int idx? = (? + i) % N ;
#~ int val? = b->a[idx?] ;
#~ new.a[?] = val? ;
## return new;
## }
##
I've also corrected it above. |
This issue stems from PR #52438 and concerns the following function on my PC with Windows 11/WSL on Intel hardware (see below). But I suspect that the problem may also apply to other hardware types.
Consider the LLVM code generated for the following case where the parameter shift is a variable, i.e., shift without constant propagation:
The generated LLVM code has the following structure:
Note that in the ifelse, load, and new/store portions, the instructions for all indices are bunched together. Next, consider the generated native code:
The same structure is also carried forward into assembly (not shown here) despite there being no memory aliases! This significantly increases the register pressure and leads to a large number of register spills and reloads (8 bytes each). The spills and reloads degrade the performance of the function. Furthermore, the effects get worse as the size of the tuple increases.
ifelse, load, and store portions index by index.
This topic was brought up on Julia Discourse, where it was suggested that looking into new aliasing annotations for LLVM might be useful; see link for details.
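To make the two instruction orderings concrete, here is a minimal C sketch. It is not the Julia function from this issue: the names `tup`, `wrap`, `rotate_bunched`, and `rotate_interleaved`, the length `N = 4`, and the use of a ternary as a stand-in for ifelse are all assumptions made for illustration. Both functions compute the same result and differ only in source order. An optimizing compiler is free to reorder either one, so this shows the shapes being discussed rather than a guaranteed fix.

```c
#include <stddef.h>

#define N 4 /* hypothetical tuple length for this sketch */

struct tup {
    int a[N];
};

/* Wrap-around index written as a select; assumes 0 <= shift < N. */
static int wrap(int k, int shift)
{
    int j = k + shift;
    return j < N ? j : j - N;
}

/* Bunched: all selects, then all loads, then all stores.
   Every loaded value stays live until the stores begin. */
struct tup rotate_bunched(const struct tup *b, int shift)
{
    struct tup out;
    int i0 = wrap(0, shift), i1 = wrap(1, shift);
    int i2 = wrap(2, shift), i3 = wrap(3, shift);
    int v0 = b->a[i0], v1 = b->a[i1];
    int v2 = b->a[i2], v3 = b->a[i3];
    out.a[0] = v0;
    out.a[1] = v1;
    out.a[2] = v2;
    out.a[3] = v3;
    return out;
}

/* Interleaved: select, load, and store one index before moving on,
   so only one temporary is live at a time in source order. */
struct tup rotate_interleaved(const struct tup *b, int shift)
{
    struct tup out;
    out.a[0] = b->a[wrap(0, shift)];
    out.a[1] = b->a[wrap(1, shift)];
    out.a[2] = b->a[wrap(2, shift)];
    out.a[3] = b->a[wrap(3, shift)];
    return out;
}
```

In the bunched form the number of simultaneously live values grows with the tuple length, which matches the register pressure described above. The interleaved form keeps it constant, provided the compiler can prove that the stores to `out` do not alias the loads from `b`.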
Note 1: The case with constant propagation does not have this issue. The resulting code is excellent.
The LLVM and native code for this case can be obtained as follows:
Note 2 ¹: Special solutions are sometimes found for some cases (integer tuples) but not for others.
The LLVM and native code for these cases can be obtained as follows:
Note: unicode superscripts denote edits.