-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only break critical edges where actually needed #33544
Conversation
let term_span = term.span; | ||
let term_scope = term.scope; | ||
let succs = term.successors_mut(); | ||
// if succs.len() > 1 || (succs.len() > 0 && is_invoke) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just delete this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I have feelings for this line... Nah, actually I'm just bad at double checking my stuff.
b16904b
to
3a475e4
Compare
// Returns true if the terminator is a call that would use an invoke in LLVM. | ||
fn term_is_invoke(term: &Terminator) -> bool { | ||
match term.kind { | ||
TerminatorKind::Call { cleanup: Some(_), .. } | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this always return false?
2cd4ca9
to
6e04944
Compare
r=me if travis passes. |
@bors r=Aatch |
📌 Commit 6e04944 has been approved by |
☔ The latest upstream changes (presumably #33425) made this pull request unmergeable. Please resolve the merge conflicts. |
Currently, to prepare for MIR trans, we break _all_ critical edges, although we only actually need to do this for edges originating from a call that gets translated to an invoke instruction in LLVM. This has the unfortunate effect of undoing a bunch of the things that SimplifyCfg has done. A particularly bad case arises when you have a C-like enum with N variants and a derived PartialEq implementation. In that case, the match on the (&lhs, &rhs) tuple gets translated into nested matches with N arms each and a basic block each, resulting in N² basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but breaking the critical edges means that we go back to N². In nickel.rs, there is such an enum with roughly N=800. So we get about 640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to reduce that to the final "disr_a == disr_b". So before this patch, we had 2.5M lines of IR with 640K basic blocks, which took about about 3.6s in LLVM to get optimized and translated. After this patch, we get about 650K lines with about 1.6K basic blocks and spent a little less than 0.2s in LLVM. cc rust-lang#33111
6e04944
to
00f6513
Compare
@bors r=Aatch |
📌 Commit 00f6513 has been approved by |
Currently, all switches in MIR are exhausitive, meaning that we can have a lot of arms that all go to the same basic block, the extreme case being an if-let expression which results in just 2 possible cases, be might end up with hundreds of arms for large enums. To improve this situation and give LLVM less code to chew on, we can detect whether there's a pre-dominant target basic block in a switch and then promote this to be the default target, not translating the corresponding arms at all. In combination with rust-lang#33544 this makes unoptimized MIR trans of nickel.rs as fast as using old trans and greatly improves the times for optimized builds, which are only 30-40% slower instead of ~300%. cc rust-lang#33111
… r=Aatch Only break critical edges where actually needed Currently, to prepare for MIR trans, we break _all_ critical edges, although we only actually need to do this for edges originating from a call that gets translated to an invoke instruction in LLVM. This has the unfortunate effect of undoing a bunch of the things that SimplifyCfg has done. A particularly bad case arises when you have a C-like enum with N variants and a derived PartialEq implementation. In that case, the match on the (&lhs, &rhs) tuple gets translated into nested matches with N arms each and a basic block each, resulting in N² basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but breaking the critical edges means that we go back to N². In nickel.rs, there is such an enum with roughly N=800. So we get about 640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to reduce that to the final "disr_a == disr_b". So before this patch, we had 2.5M lines of IR with 640K basic blocks, which took about about 3.6s in LLVM to get optimized and translated. After this patch, we get about 650K lines with about 1.6K basic blocks and spent a little less than 0.2s in LLVM. cc rust-lang#33111 r? @Aatch
[MIR trans] Optimize trans for biased switches Currently, all switches in MIR are exhausitive, meaning that we can have a lot of arms that all go to the same basic block, the extreme case being an if-let expression which results in just 2 possible cases, be might end up with hundreds of arms for large enums. To improve this situation and give LLVM less code to chew on, we can detect whether there's a pre-dominant target basic block in a switch and then promote this to be the default target, not translating the corresponding arms at all. In combination with rust-lang#33544 this makes unoptimized MIR trans of nickel.rs as fast as using old trans and greatly improves the times for optimized builds, which are only 30-40% slower instead of ~300%. cc rust-lang#33111
… r=Aatch Only break critical edges where actually needed Currently, to prepare for MIR trans, we break _all_ critical edges, although we only actually need to do this for edges originating from a call that gets translated to an invoke instruction in LLVM. This has the unfortunate effect of undoing a bunch of the things that SimplifyCfg has done. A particularly bad case arises when you have a C-like enum with N variants and a derived PartialEq implementation. In that case, the match on the (&lhs, &rhs) tuple gets translated into nested matches with N arms each and a basic block each, resulting in N² basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but breaking the critical edges means that we go back to N². In nickel.rs, there is such an enum with roughly N=800. So we get about 640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to reduce that to the final "disr_a == disr_b". So before this patch, we had 2.5M lines of IR with 640K basic blocks, which took about about 3.6s in LLVM to get optimized and translated. After this patch, we get about 650K lines with about 1.6K basic blocks and spent a little less than 0.2s in LLVM. cc rust-lang#33111 r? @Aatch
[MIR trans] Optimize trans for biased switches Currently, all switches in MIR are exhausitive, meaning that we can have a lot of arms that all go to the same basic block, the extreme case being an if-let expression which results in just 2 possible cases, be might end up with hundreds of arms for large enums. To improve this situation and give LLVM less code to chew on, we can detect whether there's a pre-dominant target basic block in a switch and then promote this to be the default target, not translating the corresponding arms at all. In combination with rust-lang#33544 this makes unoptimized MIR trans of nickel.rs as fast as using old trans and greatly improves the times for optimized builds, which are only 30-40% slower instead of ~300%. cc rust-lang#33111
… r=Aatch Only break critical edges where actually needed Currently, to prepare for MIR trans, we break _all_ critical edges, although we only actually need to do this for edges originating from a call that gets translated to an invoke instruction in LLVM. This has the unfortunate effect of undoing a bunch of the things that SimplifyCfg has done. A particularly bad case arises when you have a C-like enum with N variants and a derived PartialEq implementation. In that case, the match on the (&lhs, &rhs) tuple gets translated into nested matches with N arms each and a basic block each, resulting in N² basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but breaking the critical edges means that we go back to N². In nickel.rs, there is such an enum with roughly N=800. So we get about 640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to reduce that to the final "disr_a == disr_b". So before this patch, we had 2.5M lines of IR with 640K basic blocks, which took about about 3.6s in LLVM to get optimized and translated. After this patch, we get about 650K lines with about 1.6K basic blocks and spent a little less than 0.2s in LLVM. cc rust-lang#33111 r? @Aatch
[MIR trans] Optimize trans for biased switches Currently, all switches in MIR are exhausitive, meaning that we can have a lot of arms that all go to the same basic block, the extreme case being an if-let expression which results in just 2 possible cases, be might end up with hundreds of arms for large enums. To improve this situation and give LLVM less code to chew on, we can detect whether there's a pre-dominant target basic block in a switch and then promote this to be the default target, not translating the corresponding arms at all. In combination with rust-lang#33544 this makes unoptimized MIR trans of nickel.rs as fast as using old trans and greatly improves the times for optimized builds, which are only 30-40% slower instead of ~300%. cc rust-lang#33111
⌛ Testing commit 00f6513 with merge ab08af1... |
… r=Aatch Only break critical edges where actually needed Currently, to prepare for MIR trans, we break _all_ critical edges, although we only actually need to do this for edges originating from a call that gets translated to an invoke instruction in LLVM. This has the unfortunate effect of undoing a bunch of the things that SimplifyCfg has done. A particularly bad case arises when you have a C-like enum with N variants and a derived PartialEq implementation. In that case, the match on the (&lhs, &rhs) tuple gets translated into nested matches with N arms each and a basic block each, resulting in N² basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but breaking the critical edges means that we go back to N². In nickel.rs, there is such an enum with roughly N=800. So we get about 640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to reduce that to the final "disr_a == disr_b". So before this patch, we had 2.5M lines of IR with 640K basic blocks, which took about about 3.6s in LLVM to get optimized and translated. After this patch, we get about 650K lines with about 1.6K basic blocks and spent a little less than 0.2s in LLVM. cc rust-lang#33111 r? @Aatch
[MIR trans] Optimize trans for biased switches Currently, all switches in MIR are exhausitive, meaning that we can have a lot of arms that all go to the same basic block, the extreme case being an if-let expression which results in just 2 possible cases, be might end up with hundreds of arms for large enums. To improve this situation and give LLVM less code to chew on, we can detect whether there's a pre-dominant target basic block in a switch and then promote this to be the default target, not translating the corresponding arms at all. In combination with rust-lang#33544 this makes unoptimized MIR trans of nickel.rs as fast as using old trans and greatly improves the times for optimized builds, which are only 30-40% slower instead of ~300%. cc rust-lang#33111
💔 Test failed - auto-mac-64-opt-rustbuild |
Currently, to prepare for MIR trans, we break all critical edges,
although we only actually need to do this for edges originating from a
call that gets translated to an invoke instruction in LLVM.
This has the unfortunate effect of undoing a bunch of the things that
SimplifyCfg has done. A particularly bad case arises when you have a
C-like enum with N variants and a derived PartialEq implementation.
In that case, the match on the (&lhs, &rhs) tuple gets translated into
nested matches with N arms each and a basic block each, resulting in N²
basic blocks. SimplifyCfg reduces that to roughly 2*N basic blocks, but
breaking the critical edges means that we go back to N².
In nickel.rs, there is such an enum with roughly N=800. So we get about
640K basic blocks or 2.5M lines of LLVM IR. LLVM takes a while to
reduce that to the final "disr_a == disr_b".
So before this patch, we had 2.5M lines of IR with 640K basic blocks,
which took about about 3.6s in LLVM to get optimized and translated.
After this patch, we get about 650K lines with about 1.6K basic blocks
and spent a little less than 0.2s in LLVM.
cc #33111
r? @Aatch