Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MIR] Consider converting to extended basic blocks #39685

Closed
nikomatsakis opened this issue Feb 9, 2017 · 27 comments
Closed

[MIR] Consider converting to extended basic blocks #39685

nikomatsakis opened this issue Feb 9, 2017 · 27 comments
Assignees
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-mir-opt Area: MIR optimizations C-cleanup Category: PRs that clean code up or issues documenting cleanup. I-needs-decision Issue: In need of a decision. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Copy link
Contributor

After some discussion with @stoklund and @eddyb, I was thinking that perhaps it would be better for MIR to transition to supporting extended basic blocks (one entry, multiple exits) instead of the current basic blocks. Concretely, the idea would be to remove the Call, Assert, and Drop terminators and make them into statements with an outgoing panic edge.

Specifically for call, this would simplify things dramatically because the return value is only for use on the "return" edge, and handling that in the current dataflow and so forth is a pain in the neck (this extended basic block form makes that edge more special, which should help). It also helps us have fewer basic blocks.

We could conceivably go further, and remove terminators altogether, and instead just have statements, where the final statement should be of "terminator type". However, I'm somewhat disinclined to do that, at least not right away, because the majority of the advantage of this format derives from converting call/assert/drop (which "feel" linear in code) and not from converting if or match or other control-flow, since almost always those would terminate the current basic block anyway (since the control-flow joins afterwards). Moreover, the current setup assures that we have a terminator in the right spot, and unifying them with statements would mean we have to maintain that invariant ourselves.

cc @rust-lang/compiler @nagisa

@nikomatsakis nikomatsakis added A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-cleanup Category: PRs that clean code up or issues documenting cleanup. I-needs-decision Issue: In need of a decision. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 9, 2017
@nikomatsakis nikomatsakis changed the title [MIR] Consider converting to extended basic-blocks [MIR] Consider converting to extended basic blocks Feb 9, 2017
@jonas-schievink
Copy link
Contributor

I like that idea. It makes MIR feel similar to the IR used by tracing JITs like LuaJIT, which has guard instructions that abort the current trace if a condition isn't true.

@dotdash
Copy link
Contributor

dotdash commented Feb 9, 2017

I'm in favor of this. At first I was reminded of the problems we had (have?) with certain LLVM optimizations when marking &mut arguments as noalias, but that is because there is an implicit unwind edge and as I understand it, we would still have an explicit unwind edge in MIR, right?

@Aatch
Copy link
Contributor

Aatch commented Feb 9, 2017

I'm also in favour. Making Call, Assert and Drop special statements would probably be enough, since, as you say, simpler control-flow would likely terminate the current block anyway.

The one negative is that we currently have a 1-to-1 relationship from MIR basic blocks to LLVM basic blocks, but that isn't insurmountable.

@nagisa
Copy link
Member

nagisa commented Feb 9, 2017

I consider this as an option for #32105. On one hand this might make it easier to analyse MIR, on another hand it seems like a very small incremental improvement to me.

@Aatch
Copy link
Contributor

Aatch commented Feb 9, 2017

@nagisa the change to how Call is handled is worth it in my opinion. We have to handle it as a special case in a number of places.

@nikomatsakis
Copy link
Contributor Author

The one negative is that we currently have a 1-to-1 relationship from MIR basic blocks to LLVM basic blocks, but that isn't insurmountable.

To me, this feels pretty unimportant.

@eddyb
Copy link
Member

eddyb commented Feb 22, 2017

That 1-to-1 relationship is a lie because of the [x; n] rvalue, which creates a loop on the fly.

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Feb 22, 2017

I've been thinking about this some. I think the right set of things to include as "terminators" is:

  • Goto
  • Resume
  • Return
  • Unreachable

The right set of "branching statements" is then:

  • SwitchInt (with implicit fallthrough)
  • Drop, DropAndReplace
  • Call
  • Assert

To express the concept of an exhaustive switch, we can then have a SwitchInt as the last statement in the block, followed by an Unreachable terminator.

Having SwitchInt always have a fallthrough resolves the current awkward situation where the list of targets is one greater than the list of values without requiring an otherwise: Option<BasicBlock>, which is an annoying discrepancy. I think it also happens to mirror LLVM's setup, but that doesn't seem all that important.

It also allows us to put a "scope end" after the SwitchInt, though only on one branch. In other words:

if <foo> { ... }

would generate something like

BB0:
   temp = <foo>;
   switch temp { 0 => BB1 }
   lifetimeend(temp)
   // if true goes here
   ...
   goto BB2;

BB1:
   // if false
   lifetimeend(temp);
   goto BB2;

BB2:
   ...

Well, ok, after writing this up I'm not as convinced as I was that SwitchInt belongs in the list of "branching statements", but it still seems like a reasonable-ish thing. But I could go either way.

@nikomatsakis
Copy link
Contributor Author

Heh, I guess we could go one further, and have all only three terminators (return, unreachable, and resume), and then have Goto be a kind of "branching statement" as well. This would mean that the terminators are very simple things that never target other basic blocks or carry data, which has a certain appeal.

@nikomatsakis
Copy link
Contributor Author

So the more I think about this the more I like my extended variant, where we only have return/unreachable/resume terminators, and all branches that go towards other blocks are statements.

@eddyb eddyb self-assigned this Mar 11, 2017
@eddyb
Copy link
Member

eddyb commented Mar 11, 2017

Can't assign @Mark-Simulacrum but I'll be mentoring them on this, at least for the time being.

@nikomatsakis
Copy link
Contributor Author

Whatever happened here? I guess @Mark-Simulacrum you got a lot of the way there but didn't have time to push it over the finish line?

@eddyb
Copy link
Member

eddyb commented Jun 15, 2017

@nikomatsakis It ended up getting pretty messy, with effectively the current BB structure being cached in Mir so that whatever needed it could get it back. I'm less confident it's worth it.

@Mark-Simulacrum
Copy link
Member

Yeah, I still have the branch locally (and it's available publicly as well, IIRC). It's not in cache right now, but IIRC the last problem I ran into was the dataflow code not supporting statements which can unwind. I didn't have the time back then to dig in and fix it, but I could probably try now after a rebase. I don't know whether this is something that would be a good idea right now. I feel like dataflow might need to be rewritten for NLL, which might also help make this easier, but I could be wrong.

@eddyb
Copy link
Member

eddyb commented Jun 15, 2017

Oh, right, dataflow would need one BB-like state per statement which can unwind.
That's part of why I became really doubtful about EBBs.

A possibly more feasible approach would be embedding call-like terminators in statements, such that the terminator form always has an unwind target, while the statement form never has one.
While that may sound a lot like some previous suggestions, from what I've seen during @Mark-Simulacrum's attempt at EBBs, it should be possible to keep common code paths for them.

@arielb1
Copy link
Contributor

arielb1 commented Jun 15, 2017

@eddyb

Why? You don't need a dataflow "variable" for each statement that can unwind, just a dataflow "equation" for every edge out of an EBB. OTOH this does make things more complicated.

@eddyb
Copy link
Member

eddyb commented Jun 16, 2017

@arielb1 What I mean is that the same amount of state ("equation" I suppose?) that you would store for a whole BB needs to be stored for every statement with an extra (unwind) edge in an EBB, making it quite unappealing - not just more complicated, but less efficient overall to track everything.

@steveklabnik
Copy link
Member

Triage: no comments in two years; is this still relevant?

@pnkfelix
Copy link
Member

(A discussion about MIR 2.0 is coming up at the all-hands in February 2019 and I imagine that this question about the design of our basic blocks will arise there.)

@eddyb
Copy link
Member

eddyb commented Feb 13, 2019

I think I have a design that won't cause all the issues we hit before (cc @oli-obk):

Each MIR block gets an (optional?) "cleanup exit" for everything that could panic inside the block, "coalescing" in a sense all the panic cleanup edges.

This takes advantage of the fact that many calls/asserts could be performed without adding or removing cleanups, so they can likely be grouped by the cleanup they use.

The main result is that dataflow only needs to keep 2 final states per block (the regular exit state and the merged cleanup exits' states) instead of N, as with full EBBs (which would've forced us to effectively recover the CFG shape we have today, all the time).

Transformation might be harder than with full EBBs but at least terminators wouldn't have side-effects (since we're left with unconditional jumps and conditional branches, I think?).

@oli-obk oli-obk added the A-mir-opt Area: MIR optimizations label Jul 27, 2020
@bjorn3
Copy link
Member

bjorn3 commented Jul 29, 2020

Cranelift originally used EBB's but then switched to BB's. Most literature is written for BB's. Converting the algorithms to work on EBB's introduces additional complexity. bytecodealliance/cranelift#796

@oli-obk
Copy link
Contributor

oli-obk commented Jul 29, 2020

We could add a new Rvalue for invoking guaranteed non-panicking functions (size_of, transmute, ...). This would reduce the number of basic blocks significantly, but I don't know how cranelift or llvm will handle such function calls if we insert actual function calls and don't just generate code.

@hanna-kruppe
Copy link
Contributor

Not sure what problems you expect on the LLVM side, call is an ordinary instruction in LLVM IR, not a terminator. Only when you want to handle exceptions, you use invoke which is a terminator.

@bjorn3
Copy link
Member

bjorn3 commented Jul 29, 2020

In Cranelift and LLVM non-unwinding function calls are regular instructions (called call in both cases). In LLVM only unwinding function calls are terminators (called invoke). If any backend uses terminators for all function calls, it can split the basic block after the call.

@nikomatsakis
Copy link
Contributor Author

I'd...be reluctant to have two forms of Call, but I guess if we saw significant wins it might be worth it.

@nikomatsakis
Copy link
Contributor Author

I'm sort of inclined to close this issue though, since I think we're not going to convert to EBBs.

@oli-obk
Copy link
Contributor

oli-obk commented Jul 31, 2020

Yea, let's close this issue, we can always crate mir optimizations that replace non-panicking intrinsics with some MIR statement logic if that is reasonable to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html A-mir-opt Area: MIR optimizations C-cleanup Category: PRs that clean code up or issues documenting cleanup. I-needs-decision Issue: In need of a decision. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests