Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate fn signature from fn body in HIR #35078

Closed
nikomatsakis opened this issue Jul 27, 2016 · 32 comments
Closed

Separate fn signature from fn body in HIR #35078

nikomatsakis opened this issue Jul 27, 2016 · 32 comments
Labels
A-incr-comp Area: Incremental compilation

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jul 27, 2016

Currently, getting access to a HIR item gives you access to both the function signature and its body. This makes the incremental compilation graph more coarse than it should be. To fix this, the best thing would be to separate things out so that the item is just the external signature, and you have to separately load (e.g., from the HIR map) the body for a fn item. This way we could record that you actually accessed the fn body and track that separately.

@nikomatsakis nikomatsakis added the A-incr-comp Area: Incremental compilation label Jul 27, 2016
@eddyb
Copy link
Member

eddyb commented Jul 31, 2016

The body is a block and those are in the HIR map already, so I think the only missing thing is keeping an ID instead of a reference, and looking up that ID every time.
Not being able to use references without having implicit dependencies does seem like a bit of a limitation, which will impact the future design of HIR.

@flodiebold
Copy link
Member

I'm currently looking into this, and I think I've got the general idea, but I've got a few questions to make sure I'm on the right track:

  • I've found three places where functions or methods are defined in the HIR: TraitItem_::MethodTraitItem, ImplItemKind::Method, and Item_::ItemFn. The plan would basically be to replace the P<Block> in each of these by a NodeId, right? (By the way, is there a reason ImplItemKind isn't named ImplItem_?)
  • We'll need some other place to store the top-level blocks. Would the Crate be right for this, like for items?
  • Or maybe removing the P<Block> from the item is the wrong approach. Could we instead just make it private, and make sure that accesses are recorded that way?
  • Should I wrap the NodeId of the block in a BlockId, similar to the ItemId?
  • How should visitors be handled? I could add a method visit_fn_block similar to visit_nested_item, though I'm not sure if it should by default visit the block or not.
  • Anything else I'm completely missing?

@nikomatsakis
Copy link
Contributor Author

@flodiebold great! Sorry I haven't had a chance to respond yet. Let me start by giving a few quick answers, I only have a few minutes just now.

The plan would basically be to replace the P in each of these by a NodeId, right?

I would expect it to be a different kind of id (e.g., MethodBodyId or DefId), but a NodeId might also be fine.

By the way, is there a reason ImplItemKind isn't named ImplItem_?

The _ is just an older naming convention. I prefer Kind.

We'll need some other place to store the top-level blocks. Would the Crate be right for this, like for items?

Yes, I think so.

Could we instead just make it private, and make sure that accesses are recorded that way?

Well, we could, but we don't use privacy much for LIR; I would be inclined to move it into separate array myself.

Should I wrap the NodeId of the block in a BlockId, similar to the ItemId?

See above. I think it probably makes sense, just so that BodyId (or whatver) can be a small integer that directly indexes a vector.

How should visitors be handled? I could add a method visit_fn_block similar to visit_nested_item, though I'm not sure if it should by default visit the block or not.

Yeah, that's a tricky one. I'd say add the visitor and -- to start -- visit the block by default. That should preserve the existing behavior, and you can test and make sure everything is working. Then we can debate changing the default; it'd require going through the visitors a bit to see what they do.

@eddyb
Copy link
Member

eddyb commented Oct 17, 2016

If I may make a suggestion, there is a more useful abstraction here than function bodies, that I keep needing time and time again, that is, definitions that have associated MIR (just "bodies"?).
This includes all function definitions, closure expressions, array lengths in types and enum discriminants.
It's a bit annoying to visit them, and you end up with ugliness like this.

@nikomatsakis
Copy link
Contributor Author

@eddyb interesting. It seems like all the same benefits for incr comp would apply to most of those cases too. I wonder if the right type for these "bodies" would be expressions and not blocks, so that they can be used everywhere.

@eddyb
Copy link
Member

eddyb commented Oct 20, 2016

So I got stuck in a place where I really really needed an unified way to refer to those bodies, so I tried moving all of them to hir::Expr. Which went well (yay deduplication) except for middle::region.

For some reason, if I have |...| expr instead of |...| {expr} (which is what the parser always did), there's a missing Misc CodeExtent (for expr.id?) so rustc_mir::build::scope ICEs sometimes.

It's not clear to me as to why this would be the case, since expr.id is marked as a terminating scope, and then resolve_expr calls visitor.new_node_extent_with_dtor(expr.id) which for terminating scopes it creates a DestructionScope - ooooh. I think I got it? The MIR builder is looking for Misc instead.

I might be able to fix this now, although I'm still not sure what the correct behavior should be, really.
For the record: at the time of the writing, the commit I'm talking about is eddyb@cc18ef0.

@nikomatsakis
Copy link
Contributor Author

@eddyb I think I expect there to be both a Misc and a DestructionScope, one nested inside the other

@eddyb
Copy link
Member

eddyb commented Oct 20, 2016

Update: it was these few lines which are both no longer necessary and require a Block for each function (but closures now don't have one so it kept going up reaching the outer function).

@nikomatsakis
Copy link
Contributor Author

@flodiebold (so, I want to clarify something. I think that we should consider factoring out statics, constants, closure bodies, etc as a second step -- it makes sense to just focus on fn bodies first, we can refactor as a follow-up if we choose.)

@eddyb
Copy link
Member

eddyb commented Oct 21, 2016

This is more complex than I expected, but I hope to have one or two commits soon that can be cherry-picked.

@eddyb
Copy link
Member

eddyb commented Oct 23, 2016

@flodiebold Here you go: eddyb@203ccaf - should apply relatively cleanly on master.

I wanted to add a visit_body method to intravisit::Visitor but I have to do other things first.

@flodiebold
Copy link
Member

Great! I'm taking a look.

@flodiebold
Copy link
Member

@eddyb Ok, I've cherry-picked the commit without big conflicts. But I'm now getting two spurious 'unreachable statement' warnings in libsyntax when compiling stage 2, failing the build :( The lines the compiler says are unreachable are here and here. check-stage1-rpass works, and I haven't managed to make a small reproduction so far. So this is rather hard to debug for me, and I hope you have an idea... It seems the problem also occurs in your branch, so it's not just a merge problem.

@eddyb
Copy link
Member

eddyb commented Oct 23, 2016

@flodiebold Oops, I never tried building stage2. The crater run I just started will probably catch it.

Yeah, those cases are almost unreachable, looks like match and if "diverges" checks are a bit wrong.
EDIT: to be clear, I'm working on it, hoping to have something in the next hour.

@eddyb
Copy link
Member

eddyb commented Oct 23, 2016

@flodiebold Alright, that was only a leak from "then" to "else". Fixed commit is eddyb@75269d2.

EDIT: still broken (this time in rustc_borrowck), investigating... (although you shouldn't be affected)
EDIT2: Another problem in rustc_driver (this time a correct warning, yay). Fix is in eddyb@d23057e.

@petrochenkov
Copy link
Contributor

Can patterns from fn parameters be moved into the function body somehow? (Probably in HIR?)
They are implementation details and not parts of a function interface and dependent code shouldn't recompile when they change.
Something like:

fn f((a, b): (u8, u8)) {
    println!("{} {}", a, b);
}

=>

fn f(__arg0: (u8, u8)) {
    let (a, b) = __arg0;
    println!("{} {}", a, b);
}

@eddyb
Copy link
Member

eddyb commented Oct 24, 2016

@petrochenkov Yeah, the argument types and the patterns can be split into two lists.
In fact, I needed this! Since the types are part of the signature and thus "outside" the body, while the patterns are "inside" (e.g. per-item type tables include their types, they're inferred as part of the body, etc.).

@eddyb
Copy link
Member

eddyb commented Oct 26, 2016

@flodiebold @nikomatsakis Good news! You can just rebase on top of #37412 now.

Manishearth added a commit to Manishearth/rust that referenced this issue Oct 26, 2016
Prohibit patterns in trait methods without bodies

They are not properly type checked
```rust
trait Tr {
    fn f(&a: u8); // <- This compiles
}
```
, mostly rejected by the parser already and generally don't make much sense.
This PR is kind of a missing part of rust-lang#35015.

Needs crater run.
cc rust-lang#35078 (comment) rust-lang#35015 rust-lang/rfcs#1685 rust-lang#35203
r? @eddyb
bors added a commit that referenced this issue Oct 29, 2016
Prohibit patterns in trait methods without bodies

They are not properly type checked
```rust
trait Tr {
    fn f(&a: u8); // <- This compiles
}
```
, mostly rejected by the parser already and generally don't make much sense.
This PR is kind of a missing part of #35015.

Given the [statistics from crater](#37378 (comment)), the effect of this PR is mostly equivalent to improving `unused_mut` lint.

cc #35078 (comment) #35015 rust-lang/rfcs#1685 #35203
r? @eddyb
@flodiebold
Copy link
Member

Ok, I'm making some progress. Two snags I ran into:

  • I couldn't make visitors visit bodies by default, because in the visitor trait I don't have access to the crate or HIR map. So I left out the default completely, meaning I had to add a function to every single Visitor impl. Maybe I could make skipping the block the default once I've got it working.
  • Since closures are often treated the same way as functions (e.g. with visit_fn, or FnLike), I had to separate their body as well, which has caused some additional complications (I think there are quite a few visitors which wouldn't need to go into nested items, but do need to go into nested closures). I'm not sure that was the best idea, but I couldn't think of a better solution.

Currently, my main problem is the following: Compilation of librand fails because I think the compiler is trying to const-eval a function from libcore, for which it needs the HIR of the body, and that's not saved in the crate metadata anymore. So I'll probably need to add the function bodies somewhere in the crate metadata, but I'm not yet sure how to approach that.

My current state is here.

@eddyb
Copy link
Member

eddyb commented Oct 29, 2016

@flodiebold The only thing that's needed is the body, you might be able to change astencode to only work on bodies.

@nikomatsakis
Copy link
Contributor Author

@flodiebold

Ok, I'm making some progress.

Great to hear! In the meantime, I've been working on #36349, and I'm making good progress there I think. Some of the things you are describing overlap.

I couldn't make visitors visit bodies by default, because in the visitor trait I don't have access to the crate or HIR map.

Yeah, this is sort of annoying. =) In principle we should be able to use specialization here to make this a touch more ergonomic, though some key features are missing.

So I did something to these traits too =). What I did is to separate out items and impl-items into the category of "Item-like" (I should probably factor our trait-items too). Then the visit_all_items method on Crate becomes visit_all_item_likes. It no longer takes a normal intravisit::Visitor but instead takes an ItemLikeVisitor, which has just two methods (visit_item, visit_impl_item). The intention is to group visitors into three categories:

  • Shallow visitors: these just want to look at each item (and sometimes impl-items).
    • They should implement the ItemLikeVisitor trait directly. This trait has no defaults, so as we add new methods in the future, we can revisit those cases and see if changes are needed.
  • Deep visitors: these want to walk over the entire tree, but they are not particularly sensitive to the order that they do it in.
    • They can implement intravisit::Visitor as today and invoke as_deep_visitor(), which is a new method that converts a Visitor into an ItemLikeVisitor.
    • the important point is that in the future we may break new things into "item-like", and that will change the visit order for these deep-nested walkers, but that should not be a problem.
  • Nested visitors: these want to walk the tree, and they want to walk items in lexical order (e.g., they want to walk nested items in the context of their parents, impl items in the context of the impl etc).
    • These should use intravisit::Visitor as today, but should not use visit_all_item_likes; instead they should use visit::walk_krate (basically as today).
    • One annoying thing is that, as you said, there is no "simple" way to implement all the "visit nested" routines.

To address this very last bullet point, and this seems relevant to what you were writing, I was contemplating having a fn hir_map(&self) -> &'v HirMap sort of thing in the trait. I originally thought it would return an Option and default to None, and then you would override if you wanted to enable visiting nested things. But now I am thinking maybe it should be orthogonal, so that the default for visiting nested function bodies can be different (if we want it to be).

Then again, it'd be better to avoid walking nested things like that by default, just to make the code more intentional.

My preference in this refactoring, btw, would be to have the nested case implement some other trait (NestedVisitor) that is required to provide a HirMap but otherwise makes nesting invisible. Or something.

@michaelwoerister
Copy link
Member

cc me

@flodiebold
Copy link
Member

Still working on this. I've got something which passes check-stage1 and the incremental tests (with the FIXMEs in change_private_impl_method fixed 😄 ). There's still quite a few things to clean up and refactor now before it's ready though, in particular the visitors (I think the NestedVisitor thing is a good idea) and the handling of inlined items (I solved that in a very hacky way).

@nikomatsakis
Copy link
Contributor Author

@flodiebold nice! I hope to have my PR up today -- but I'm cleaning up the last bit, which is taking longer than I wanted. If you want to check out my current branch, it's here

eddyb added a commit to eddyb/rust that referenced this issue Nov 10, 2016
[6/n] rustc: transition HIR function bodies from Block to Expr.

_This is part of a series ([prev](rust-lang#37408) | [next](rust-lang#37676)) of patches designed to rework rustc into an out-of-order on-demand pipeline model for both better feature support (e.g. [MIR-based](https://github.com/solson/miri) early constant evaluation) and incremental execution of compiler passes (e.g. type-checking), with beneficial consequences to IDE support as well.
If any motivation is unclear, please ask for additional PR description clarifications or code comments._

<hr>

The main change here is that functions and closures both use `Expr` instead of `Block` for their bodies.
For closures this actually allows a honest representation of brace-less closure bodies, e.g. `|x| x + 1` is now distinguishable from `|x| { x + 1 }`, therefore this PR is `[syntax-breaking]` (cc @Manishearth).

Using `Expr` allows more logic to be shared between constant bodies and function bodies, with some small such changes already part of this PR, and eventually easing rust-lang#35078 and per-body type tables.

Incidentally, there used to be some corners cut here and there and as such I had to (re)write divergence tracking for type-checking so that it is capable of understanding basic structured control-flow:

``` rust
fn a(x: bool) -> i32 {
    // match also works (as long as all arms diverge)
    if x { panic!("true") } else { return 1; }
    0 // "unreachable expression" after this PR
}
```

And since liveness' "not all control paths return a value" moved to type-checking we can have nice things:

``` rust
// before & after:
fn b() -> i32 { 0; } // help: consider removing this semicolon

// only after this PR
fn c() -> i32 { { 0; } } // help: consider removing this semicolon
fn d() { let x: i32 = { 0; }; } // help: consider removing this semicolon
fn e() { f({ 0; }); } // help: consider removing this semicolon
```
@nikomatsakis
Copy link
Contributor Author

@flodiebold hey, how's it going? I think this is one of the major blockers.

@flodiebold
Copy link
Member

I've had very little time over the last week, but this weekend I have time and hope to make good progress, maybe even get a pull request ready :) I assume your changes will also be merged until then, so I can rebase on top of that.

@nikomatsakis
Copy link
Contributor Author

@flodiebold yes, hopefully! I'm tracking down an annoying windows problem, but now that I know what it is (paths too long) we can hopefully resolve it and get the PR landed.

@nikomatsakis
Copy link
Contributor Author

ok, in that case I'll leave off.

bors added a commit that referenced this issue Nov 18, 2016
Separate impl items from the parent impl

This change separates impl item bodies out of the impl itself. This gives incremental more resolution. In so doing, it refactors how the visitors work, and cleans up a bit of the collect/check logic (mostly by moving things out of collect that didn't really belong there, because they were just checking conditions).

However, this is not as effective as I expected, for a kind of frustrating reason. In particular, when invoking `foo.bar()` you still wind up with dependencies on private items. The problem is that the method resolution code scans that list for methods with the name `bar` -- and this winds up touching *all* the methods, even private ones.

I can imagine two obvious ways to fix this:

- separating fn bodies from fn sigs (#35078, currently being pursued by @flodiebold)
- a more aggressive model of incremental that @michaelwoerister has been advocating, in which we hash the intermediate results (e.g., the outputs of collect) so that we can see that the intermediate result hasn't changed, even if a particular impl item has changed.

So all in all I'm not quite sure whether to land this or not. =) It still seems like it has to be a win in some cases, but not with the test cases we have just now. I can try to gin up some test cases, but I'm not sure if they will be totally realistic. On the other hand, some of the early refactorings to the visitor trait seem worthwhile to me regardless.

cc #36349 -- well, this is basically a fix for that issue, I guess

r? @michaelwoerister

NB: Based atop of @eddyb's PR #37402; don't land until that lands.
@flodiebold
Copy link
Member

flodiebold commented Nov 19, 2016

@nikomatsakis So, I'm rebasing and integrating my visitor changes with yours. Obviously, I also want to use nested_visit_map; but there are a lot of visitors which want to visit bodies, but not nested items, so just implementing nested_visit_map would change their behavior. The solution I'm trying out currently is changing nested_visit_map to return an Option<(&Map, NestedVisitMode)>, where NestedVisitMode can currently be either OnlyBodies or All. I think that will make it easy to extend if there are more nested things later. What do you think?

@nikomatsakis
Copy link
Contributor Author

@flodiebold so I think it is plausible that visitors should visit the item body by default. At least i'd be fine with landing a change that did this by default. After all, if the visitor does not try to recurse, it won't register as a read, and I think it will be the common case -- most checkers want to check the signature+body together.

@nikomatsakis
Copy link
Contributor Author

@flodiebold well, in any case, I'd be happy to land a PR where existing behavior is preserved and then we can tinker later, if we see a need.

@flodiebold
Copy link
Member

Ok, I guess I'll just make the PR.

bors added a commit that referenced this issue Nov 29, 2016
Separate function bodies from their signatures in HIR

Also give them their own dep map node.

I'm still unhappy with the handling of inlined items (1452edc1), but maybe you have a suggestion how to improve it.

Fixes #35078.

r? @nikomatsakis
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation
Projects
None yet
Development

No branches or pull requests

5 participants