Can't optimize loop assignment into memset #45466
Comments
I think it has just inlined the BTW the following still compiled to a

for r in data {
    *r = 0;
}
|
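For reference, the loop quoted in this comment can be wrapped in a complete function as below. This is only a sketch: the name `memzero_iter` and the `&mut [u8]` signature are assumptions made for illustration, not taken from the thread. The comment reports that this iterator form was still optimized, unlike the version in the original report.

```rust
/// Hypothetical wrapper around the iterator-style loop from the comment.
/// The function name and signature are assumptions, not from the thread.
pub fn memzero_iter(data: &mut [u8]) {
    // Iterating over `&mut [u8]` yields `&mut u8`, so no index
    // arithmetic (and no overflow-checked counter) is involved.
    for r in data {
        *r = 0;
    }
}
```

Comparing the optimized output of this function with an index-based loop is one way to see whether the regression affects a given compiler version.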
https://gist.github.com/quininer/361b79cd3396ca23a9de3fbc09da1b8a
|
Thank you! |
Can we bisect this? Seems like it would be somewhat painful, but probably the best way to find the problem. cc @dotdash -- any chance you want to swoop in and fix this? =) cc @Mark-Simulacrum @est31 -- do you folks have any nifty way to bisect this? I guess since it doesn't error it would be difficult, unless we added some sort of snooping thing. We could do it by hand, presumably. |
triage: P-high |
We will revisit next week. |
Rust 1.20 doesn't feature the bug. Rust 1.21 does. My setup was like (deleting the target dir in each step):
Bisection output:
Can't get any closer because the regression is apparently beyond the deletion event horizon :/. 90 days is roughly 3 months, so it would make sense. Not entirely sure though because c5e2051 was available... Either way, this is the commit range I could pinpoint it to: c5e2051...3f977ba |
Most suspicious commit: |
Confirmed that #43595 is at fault - reverting it on top of master fixes the bug. |
However, there is an LLVM problem behind all of this. After #43595, before loop optimizations, we have this code:

; Function Attrs: uwtable
define void @_ZN7suspect7memzero17h76c06c0c84a550b5E(i8* nocapture nonnull %data.ptr, i64 %data.meta) {
start:
%0 = icmp eq i64 %data.meta, 0
br i1 %0, label %bb5, label %bb2.i.preheader
bb2.i.preheader: ; preds = %start
br label %bb2.i
bb2.i: ; preds = %bb2.i.preheader, %bb7
%iter.sroa.0.010 = phi i64 [ %3, %bb7 ], [ 0, %bb2.i.preheader ]
%1 = tail call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %iter.sroa.0.010, i64 1) #5
%2 = extractvalue { i64, i1 } %1, 1
br i1 %2, label %bb5.loopexit, label %bb7
bb5.loopexit: ; preds = %bb7, %bb2.i
br label %bb5
bb5: ; preds = %bb5.loopexit, %start
ret void
bb7: ; preds = %bb2.i
%3 = extractvalue { i64, i1 } %1, 0
%4 = getelementptr inbounds i8, i8* %data.ptr, i64 %iter.sroa.0.010
store i8 0, i8* %4, align 1
%5 = icmp ult i64 %3, %data.meta
br i1 %5, label %bb2.i, label %bb5.loopexit
}
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64)

indvars removes the add with overflow, but leaves behind a constant-false conditional branch (br i1 false) in bb2.i:

; ModuleID = '<stdin>'
source_filename = "<stdin>"
define void @_ZN7suspect7memzero17h76c06c0c84a550b5E(i8* nocapture nonnull %data.ptr, i64 %data.meta) {
start:
%0 = icmp eq i64 %data.meta, 0
br i1 %0, label %bb5, label %bb2.i.preheader
bb2.i.preheader: ; preds = %start
br label %bb2.i
bb2.i: ; preds = %bb7, %bb2.i.preheader
%iter.sroa.0.010 = phi i64 [ %1, %bb7 ], [ 0, %bb2.i.preheader ]
%1 = add nuw i64 %iter.sroa.0.010, 1
br i1 false, label %bb5.loopexit, label %bb7
bb5.loopexit: ; preds = %bb7, %bb2.i
br label %bb5
bb5: ; preds = %bb5.loopexit, %start
ret void
bb7: ; preds = %bb2.i
%2 = getelementptr inbounds i8, i8* %data.ptr, i64 %iter.sroa.0.010
store i8 0, i8* %2, align 1
%3 = icmp ult i64 %1, %data.meta
br i1 %3, label %bb2.i, label %bb5.loopexit
}
; Function Attrs: nounwind readnone
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) #0
attributes #0 = { nounwind readnone }

Which the immediately succeeding loop idiom recognition can't remove. Replacing the br i1 false with an unconditional br label %bb7 by hand (the line marked MANUALLY CHANGED):

start:
%0 = icmp eq i64 %data.meta, 0
br i1 %0, label %bb5, label %bb2.i.preheader
bb2.i.preheader: ; preds = %start
br label %bb2.i
bb2.i: ; preds = %bb7, %bb2.i.preheader
%iter.sroa.0.010 = phi i64 [ %1, %bb7 ], [ 0, %bb2.i.preheader ]
%1 = add nuw i64 %iter.sroa.0.010, 1
br label %bb7 ; MANUALLY CHANGED
bb5.loopexit: ; preds = %bb7, %bb2.i
br label %bb5
bb5: ; preds = %bb5.loopexit, %start
ret void
bb7: ; preds = %bb2.i
%2 = getelementptr inbounds i8, i8* %data.ptr, i64 %iter.sroa.0.010
store i8 0, i8* %2, align 1
%3 = icmp ult i64 %1, %data.meta
br i1 %3, label %bb2.i, label %bb5.loopexit
}
; Function Attrs: nounwind readnone
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) #0
attributes #0 = { nounwind readnone }

optimizes to a memset:

; ModuleID = '<stdin>'
source_filename = "<stdin>"
define void @_ZN7suspect7memzero17h76c06c0c84a550b5E(i8* nocapture nonnull %data.ptr, i64 %data.meta) {
start:
%0 = icmp eq i64 %data.meta, 0
br i1 %0, label %bb5, label %bb2.i.preheader
bb2.i.preheader: ; preds = %start
call void @llvm.memset.p0i8.i64(i8* %data.ptr, i8 0, i64 %data.meta, i32 1, i1 false)
br label %bb2.i
bb2.i: ; preds = %bb7, %bb2.i.preheader
%iter.sroa.0.010 = phi i64 [ %1, %bb7 ], [ 0, %bb2.i.preheader ]
%1 = add nuw i64 %iter.sroa.0.010, 1
br label %bb7
bb5.loopexit: ; preds = %bb7
br label %bb5
bb5: ; preds = %bb5.loopexit, %start
ret void
bb7: ; preds = %bb2.i
%2 = getelementptr inbounds i8, i8* %data.ptr, i64 %iter.sroa.0.010
%3 = icmp ult i64 %1, %data.meta
br i1 %3, label %bb2.i, label %bb5.loopexit
}
; Function Attrs: nounwind readnone
declare { i64, i1 } @llvm.uadd.with.overflow.i64(i64, i64) #0
; Function Attrs: argmemonly nounwind
declare void @llvm.memset.p0i8.i64(i8* nocapture writeonly, i8, i64, i32, i1) #1
attributes #0 = { nounwind readnone }
attributes #1 = { argmemonly nounwind }

This means that LLVM's |
LLVM patch that fixes this:
OTOH, adding random LLVM passes all around is bad for compilation time, so I'm not sure how much we want this |
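To make the IR in this comment easier to follow, here is a rough Rust rendering of the loop shape LLVM sees before loop optimizations. It is a hand-written illustration, not the actual libcore source: the name `memzero_desugared` is hypothetical, and `usize::checked_add` stands in for the `llvm.uadd.with.overflow` call in the IR.

```rust
/// Hand-desugared sketch of the loop in the IR above.
/// Hypothetical name; not the real library code.
pub fn memzero_desugared(data: &mut [u8]) {
    let len = data.len();
    // start: `icmp eq i64 %data.meta, 0` skips the loop for empty slices.
    if len == 0 {
        return;
    }
    let mut i: usize = 0;
    loop {
        // bb2.i: overflow-checked increment of the counter.
        // The `None` arm is the extra loop exit. Since i < len here,
        // it can never be taken, which is why indvars folds the
        // branch condition to `false` but keeps the branch itself.
        let next = match i.checked_add(1) {
            Some(n) => n,
            None => break,
        };
        // bb7: the byte store that should become part of a memset.
        data[i] = 0;
        // bb7: `icmp ult i64 %next, %data.meta` decides whether to loop again.
        if next >= len {
            break;
        }
        i = next;
    }
}
```

The loop has two exits, the overflow check and the length comparison. In the IR above, the memset only appears once the first of these is gone, as the manually edited version shows.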
Discussed in @rust-lang/compiler meeting: seems like we should assemble a PR and do a perf run. |
[needs perf run] Simplify CFG after IndVarSimplify Fixes #45466
[needs perf run] Try to improve LLVM pass ordering Fixes #45466
https://godbolt.org/g/9uWCF8
This was optimized as expected through rustc 1.20, so this is a performance regression.
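The source behind the godbolt link is not reproduced in this thread. Judging from the mangled symbol in the IR (`suspect::memzero`, taking a byte pointer and a length, and storing zero at a counted index), the function was presumably close to the following sketch. The exact body is an assumption.

```rust
/// Assumed reconstruction of the reported function; the real source
/// lives behind the godbolt link and may differ.
pub fn memzero(data: &mut [u8]) {
    // Index-based loop over a `Range<usize>`; this is the form whose
    // loop counter shows up in the IR discussed in the comments.
    for i in 0..data.len() {
        data[i] = 0;
    }
}
```

Building this with optimizations and inspecting the assembly for a call to `memset` is one way to check whether a given compiler version is affected.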