three-point error messages in borrowck #45988

nikomatsakis · 2017-11-14T19:06:12Z

The NLL RFC goes to some lengths to discuss how to phrase NLL error messages. In particular, we want to adopt the so-called "three point" style:

error[E0506]: cannot write to `i` while borrowed
 --> <anon>:4:5
   |
 3 |     let x = &i;
   |              - (shared) borrow of `i` occurs here
 4 |     i += 1;
   |     ^^^^^^ write to `i` occurs here, while borrow is still active
 5 |     println!("{}", x);
   |                    - borrow is later used here

This is going to require a bit of work. Right now, we easily have two of those points: the borrow checker has identified a borrow, and it knows where it occurred (that's the "borrow occurs here" point). It has also identified an access, and it knows where that occurs (that's the "write occurs here" point). What it does not know is the "borrow is later used here" point. That's a bit harder for us to find.

Right now, what we have readily available is just the region of the borrow. This indicates all the points in the control-flow graph where the borrow is still "live" (potentially in use), but it does not indicate why the borrow is potentially in use. It's the why (i.e., what use caused us to consider this point as part of the region) that we are interested in here.

Still, we have all the information we theoretically need. We have the region inference context, which contains the full set of constraints that we used to find the regions, and we have the MIR itself. (This is intentional: in the lexical checker, we only kept the final results of inference, and not the constraints that led to those results, which sometimes hobbled our ability to issue errors we wanted.)

However, it's still a bit of an open question how best to make use of those constraints to find the point in question. We could probably do some kind of graph-search heuristics that would lead us to the right point most of the time. I've also been considering another idea: we could re-run inference, but this time use different data structures. Rather than treating regions as simple sets of points, we would consider a region to be a set of (P, U) tuples. Here P is the liveness point, but U is some use of a variable (i.e., probably itself a pair (Pu, X) of a point of use and a variable name). The idea is that the region R contains the point P because of the use U. We'd have to extend liveness to track not only which variables are live, but what use U makes them live at that point.

The rest of inference is effectively unchanged, except that we track these pairs, so that whenever we add a point to a region, we also track the variable that made that point live (ultimately, every point in any region stems from some live use). This would be more expensive -- more data! -- but we're only doing it in the case of error. We could also do a more targeted form of inference, limiting ourselves to the regions that are in some way connected to the borrow region.

If we did that, then ultimately the borrow region would be inferred to contain some number of (P, U) tuples where P is the point of access and the use U is the third point we want to highlight.

What is appealing about this is that it is a very precise, very general analysis. What is unappealing is that it requires reproducing a lot of code. But we might be able to factor out most of it with generics so that it's not so much re-use.

If we don't do this, I'm not sure quite sure what else we would do. I can imagine some heuristics search -- basically making a graph that shows which regions are connected to what and then searching for a use of some variable X that contains a region P that is connected in some way to the borrow region -- but I haven't thought of anything else that yields the right results without essentially reproducing the analysis in some way. That heuristic search might or might not be good enough; I'd really prefer not to give wrong results to the user.

The text was updated successfully, but these errors were encountered:

nikomatsakis · 2017-11-14T19:06:41Z

cc @spastorino @Nashenas88 -- one of you might be the right ones to take this on.

Nashenas88 · 2017-11-15T02:03:54Z

I'd love to work on this. Whenever I get back to the visualizer work, this would be extremely useful to be familiar with.

Nashenas88 · 2017-11-15T02:49:50Z

One question about the UI here: If the borrow is moved between many variables and data structures, would it be useful to explain that movement? One of my biggest frustrations with some existing errors are when two seemingly unrelated types cause lifetime issues amongst each other. In those cases the relationship isn't immediately clear, and the errors is very vague. We are much more verbose when it comes to explaining unsatisfied trait bounds. Here's an example taken from Faraday, reproduced below:

error[E0277]: the trait bound `plugins::PluginTransform + 'static: std::marker::Sync` is not satisfied  
   --> src/project.rs:397:30
    |
397 |         self.pods.par_iter().map(|pod| -> Result<()> {  
    |                              ^^^ trait `plugins::PluginTransform + 'static: std::marker::Sync` not satisfied
    |
    = note: `plugins::PluginTransform + 'static` cannot be shared between threads safely
    = note: required because it appears within the type `Box<plugins::PluginTransform + 'static>`
    = note: required because of the requirements on the impl of `std::marker::Sync` for `std::ptr::Unique<Box<plugins::PluginTransform + 'static>>`
    = note: required because it appears within the type `alloc::raw_vec::RawVec<Box<plugins::PluginTransform + 'static>>`
    = note: required because it appears within the type `std::vec::Vec<Box<plugins::PluginTransform + 'static>>`
    = note: required because it appears within the type `plugins::Manager`
    = note: required because it appears within the type `std::option::Option<plugins::Manager>`
    = note: required because it appears within the type `project::Project`
    = note: required because it appears within the type `&project::Project`
    = note: required because it appears within the type `&&project::Project`
    = note: required because it appears within the type `[closure@src/project.rs:397:34: 424:10 op:&plugins::Operation, self:&&project::Project, export_dir:&&std::path::Path]`

If we use a modified version of one of the existing tests as an example:

#[derive(Debug)]
struct Wrap<'p> { p: &'p mut i32 }

impl<'p> Drop for Wrap<'p> {
    fn drop(&mut self) {
        *self.p += 1;
    }
}

#[derive(Debug)]
struct Foo<'p> { a: String, b: Wrap<'p> }

fn main() {
    let mut x = 0;
    let wrap = Wrap { p: &mut x };
    let s = String::from("str");
    let foo = Foo { a: s, b: wrap };
    x = 1; //~ ERROR
    println!("{:?}", foo);
}

I can imagine a more helpful error message saying something along the lines of:

error[E0506]: cannot write to `x` while borrowed
 --> main:15:5
   |
15 |     let wrap = Wrap { p: &mut x };
   |                               - borrow of `x` occurs here (into `wrap`)
17 |     let foo = Foo { a: s, b: wrap };
   |                              ---- borrow moved here (into `foo`)
18 |     x = 1;
   |     ^^^^^^ write to `x` occurs here, while borrow is still active
 5 |     println!("{:?}", foo);
   |                      --- borrow is later used here

I think that if we're going to write another inference pass to handle this, it might not be so challenging to extend to this given that we already have all the data necessary. Though, I can also see this rabbit holing into tons of special cases 😅 .

Would this be beyond the scope of the RFC?

nikomatsakis · 2017-11-15T10:11:55Z

@Nashenas88 I've wondered the same thing. I did not propose tracing the full path, because I didn't want to overwhelm the user, but I imagine it might be useful, yes. Maybe a good idea would be to design the system so that we have the information, then we can separately tweak when we think it makes sense to display it.

nikomatsakis · 2017-12-04T19:48:04Z

Some notes from a conversation with @spastorino:

https://gist.github.com/nikomatsakis/ed89304174dd607d887bc10a35e30ab7

nikomatsakis · 2017-12-08T09:43:31Z

OK, now that the basic causal work has landed, let me try to explain the next steps. Let's start with this example:

#![allow(warnings)]
fn read(_: &i32) { }
fn main() {
    let mut x = 22;
    let p = &x;
    x += 1; // point A
    read(p); // point B
}

Currently this will give an error like:

lunch-box. rustc --stage1 -Znll -Zborrowck=mir ~/tmp/issue-45988.rs
error[E0506]: cannot assign to `x` because it is borrowed
 --> /home/nmatsakis/tmp/issue-45988.rs:6:5
  |
5 |     let p = &x;
  |             -- borrow of `x` occurs here
6 |     x += 1; // point A
  |     ^^^^^^ assignment to borrowed `x` occurs here

as you can see, the error already identifies the point A, which is the point where the assignment occurs, and it identifies the borrow of x. Our goal is to extend this error with a bit of extra information; specifically, the point B. The key idea is that the point B explains why the borrow of x includes the point A -- because the reference p will later be used:

lunch-box. rustc --stage1 -Znll -Zborrowck=mir ~/tmp/issue-45988.rs
error[E0506]: cannot assign to `x` because it is borrowed
 --> /home/nmatsakis/tmp/issue-45988.rs:6:5
  |
5 |     let p = &x;
  |             -- borrow of `x` occurs here
6 |     x += 1; // point A
  |     ^^^^^^ assignment to borrowed `x` occurs here
7 |     read(p); // point B
  |        - borrowed reference later used here

There is a function in the source that has this purpose, called explain_why_borrow_contains_point. It is given a context, which identifies the point B in its context.loc field. It is also given a borrow argument: borrow.region contains the region for the borrow ofx. This region is known to include the point B, which is precisely why the error is occurring here.

Now, the region checker has been augmented to track causal information, and in fact explain_why_borrow_contains_point is using this to dump debugging information. We can iterate through that causal information to find the root cause -- in this case it will be a LiveVar cause. This tells us that the region R contains the point A because the variable p was live at A.

We're now almost ready to print out the extra information we wanted -- but not quite! We know that p was live at the point A, but that doens't give us B. The point is that a variable is live at some point if it may be used later, but we still have to find that later use. One approach would be to instrument the liveness analysis to track causal information, but that seems like a lot of work. An alternative would be to just search the MIR CFG starting from the point A. We are looking for the first statement that uses p -- but we stop the search if ever we find a statement that assigns to p. You can do this via a depth-first search, which is easier to code, or a breadth-first search, which may yield more intuitive results (since it will find the "closest" use).

The dfs.rs module in region inference may be of interest, since it preforms a simple depth-first search -- I'm not saying you want to re-use the code per se, more that it could be used as a model. Though it occurs to me now that there also exists various iteration code in librustc_data_structures you could use; for example librustc_data_structures::control_flow_graph::iterate::reverse_post_order will return a vector with all the basic blocks starting from a particular block (but then you still have to walk their statements).

The other thing you will need is some code to decide when a particular statement uses a local variable. The liveness analysis has this visitor that it uses for that. We may want to extract that code so it can be readily re-used; and in fact we probably want to respect the distinction between "regular" use and drops. If the cause is LiveVar, we only want to find a "regular use". If the cause is DropVar, we only want to find a "drop use".

There are some other root causes too, of course, but for now we can just ignore those and not try to add extra explanatory information. For example, if the cause is LiveOther, then there is no extra information we can find, so we just stop.

spastorino · 2017-12-09T00:10:04Z

Just in case so we don't step on each other, I'm working on this thing and it's close to be finished.

Closes rust-lang#45988

nikomatsakis added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-nll labels Nov 14, 2017

nikomatsakis added the E-needs-mentor label Nov 14, 2017

nikomatsakis mentioned this issue Nov 14, 2017

non-lexical lifetimes (NLL) tracking issue #43234

Closed

23 tasks

nikomatsakis mentioned this issue Dec 7, 2017

NLL inference doesn't work when it probably should #46545

Closed

nikomatsakis mentioned this issue Dec 8, 2017

make causal tracking lazy #46590

Closed

spastorino mentioned this issue Dec 12, 2017

Add Three point error handling to borrowck nikomatsakis/rust#27

Merged

nikomatsakis pushed a commit to spastorino/rust that referenced this issue Dec 16, 2017

Add three point error handling to borrowck

f99ab56

Closes rust-lang#45988

nikomatsakis pushed a commit to nikomatsakis/rust that referenced this issue Dec 18, 2017

Add three point error handling to borrowck

51628d7

Closes rust-lang#45988

nikomatsakis pushed a commit to nikomatsakis/rust that referenced this issue Dec 19, 2017

Add three point error handling to borrowck

1bb1871

Closes rust-lang#45988

nikomatsakis pushed a commit to nikomatsakis/rust that referenced this issue Dec 20, 2017

Add three point error handling to borrowck

0285f81

Closes rust-lang#45988

bors closed this as completed in 3a185a5 Dec 21, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

three-point error messages in borrowck #45988

three-point error messages in borrowck #45988

nikomatsakis commented Nov 14, 2017

nikomatsakis commented Nov 14, 2017

Nashenas88 commented Nov 15, 2017

Nashenas88 commented Nov 15, 2017 •

edited

Loading

nikomatsakis commented Nov 15, 2017

nikomatsakis commented Dec 4, 2017

nikomatsakis commented Dec 8, 2017

spastorino commented Dec 9, 2017

three-point error messages in borrowck #45988

three-point error messages in borrowck #45988

Comments

nikomatsakis commented Nov 14, 2017

nikomatsakis commented Nov 14, 2017

Nashenas88 commented Nov 15, 2017

Nashenas88 commented Nov 15, 2017 • edited Loading

nikomatsakis commented Nov 15, 2017

nikomatsakis commented Dec 4, 2017

nikomatsakis commented Dec 8, 2017

spastorino commented Dec 9, 2017

Nashenas88 commented Nov 15, 2017 •

edited

Loading