-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[move-compiler] Parser resilience #16673
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
4 Ignored Deployments
|
10fedc7
to
77b4a74
Compare
@@ -1,4 +1,4 @@ | |||
module a::m { | |||
macro fun foo<$T>($x: $T): $T { $x } | |||
fun bar(): u64 { foo<u64>!(42) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sor tof change seems rather suspect to me. What happened to cause this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an error in the body of the function which causes the returned value to be incorrect. Previously, the parser error caused compilation to stop but now additional typing error would be reported (wrong return type). I made this type of change in other places as well to make it more clear which compilation error we are testing for.
│ ^ | ||
│ │ | ||
│ Unexpected '+' | ||
│ Expected ',' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit on these: we can pretty precisely tell the user which things we expected: anything in the start set plus the end token. This gets too large sometimes to bother reporting (the exp
start set, for example), but here we could say "Expected ,
or )
". I added a macro format_oxford_list
under shared/mod.rs
that I wrote originally precisely for these sorts of errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bump on this, seems helpful to say ,
or )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
...rnal-crates/move/crates/move-compiler/tests/move_2024/parser/labeled_lambda_body_invalid.exp
Outdated
Show resolved
Hide resolved
error[E01002]: unexpected token | ||
┌─ tests/move_2024/parser/macro_identifier_invalid_no_following_characters.move:3:1 | ||
│ | ||
3 │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to suppress these errors when we find EOF so that we don't get these weird ghost errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not much we can do here (and in some other places) without an additional machinery of some kind as EOF is the only part of the stop set that can be a match here.
One idea is ignore "unexpected EOF if another error has already been recorded. We could do this (as already suggested in the other comment by @cgswords) simply checking if there are any diags in the compilation environment already (should we suppress other errors the same way?). What do you think, @cgswords and @tnowacki ?
┌─ tests/move_2024/parser/labeled_lambda_body_invalid.move:7:9 | ||
│ | ||
7 │ call!(|x| -> u64 'a: 0); // parsing error needs a block | ||
│ ^^^^^^^^^^^^^^^^^^^^^^^ | ||
│ │ │ | ||
│ │ Found 0 argument(s) here | ||
│ Invalid call of 'a::m::call'. The call expected 1 argument(s) but got 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth it to build some machinery to avoid reporting these sorts of "double errors"? We already have an error, since we messed up parsing. Is it useful to report this second one, too?
If we decide not to, a few idea of how to avoid it:
(1) We keep track of bad-parse locations and, when reporting errors, compare the errors to see if it's the same problem (specifically look through the env diagnostics and see if we had a parsing error, and decide not to reporting a naming error at the same location). This could prove to be rather finnicky.
(2) We could mark the arguments as having errors (vec[E::UnexpectedError]
) and check for that sort of thing when deciding to report errors later:
if args.length() != fn_args.length() && args.iter().any(|arg| arg.is_error) {
.. report arity error
}
This would require hand-addressing these in a bunch of spots in the compiler.
(3) When reporting diagnostics, only report the first one we receive for each location. This would actually still produce this error, but not the one below it. That might be an improvement.
Alternatively, maybe we should just live with all these errors. We have, however, gotten some feedback that errors can be too noisy to find the root cause and I'm afraid this is going to make the problem worse.
This is possibly worth discussing as a group (cc @tnowacki )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an idea generally of avoiding a lot of double errors. And it mostly involves changing UnresovledError
to UnresolvedError(Vec<Exp>)
. This lets us drop expressions less often, and also let's us know when we might need to suppress errors (since we can wrap things as Errors more often without fearing dropping them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example, we could mark it possibly as call!(UnresolvedError(vec![]))
conceptually, which gives us an easy spot to suppress arity errors and the like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we even need to go that far, we just need to do a bit of error tuning. For this case, we could see if any of the call args contains an error and avoid reporting arity errors in that case.
6 │ S { mut f } = s1; | ||
│ ^^^^^^^^^^^ Missing assignment for field 'f' in 'a::m::S' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this error should not be reported. Why aren't we grabbing f
as a field?
public fun foo(_s: S) { | ||
let f = 0; | ||
S { mut f } = s1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public fun foo(_s: S) { | |
let f = 0; | |
S { mut f } = s1; | |
public fun foo(s: S) { | |
let f = 0; | |
S { mut f } = s; |
b455200
to
45da52e
Compare
@@ -1439,6 +1439,7 @@ fn parse_sequence(context: &mut Context) -> Result<Sequence, Box<Diagnostic>> { | |||
let mut seq: Vec<SequenceItem> = vec![]; | |||
let mut last_semicolon_loc = None; | |||
let mut eopt = None; | |||
let mut parsing_error = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood here but it feels a bit strange. For the errors we had in question, I would have imagined parse_sequence_item
to have returned an UnresolvedError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have imagined parse_sequence_item to have returned an UnresolvedError?
This would indeed be preferred, but I'd rather not go into this rabbit hole just yet. The reason it's a bit complicated is that on failure in parsing SequenceItem
, we need to fast forward the parser to the end of the item (so that parsing at Sequence
level does not produce a bunch of weird errors trying to restart parsing). The stop set for sequence item set at the Sequence
level would include ;
and }
but we can't simply stop at }
if we are parsing something like this:
fun test00(): S<u64> {
0x42<u64>::m::S { u: 0 }
}
If we do, we will stop at }
finishing pack rather than at the }
finishing sequence. What we'd need to do is to count opening and closing braces, but at the level we need to do it, we technically don't know that it should be done (as we only have a stop set as the source of information).
Currently we handle this by having an error returned from SequenceItem
parsing and doing the correct fast-forwarding at SequenceLevel
where we know how SequenceItem
can end and what tokens to look for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we just have match parse_sequence_item(context) { ... | Err(diag) => { seq.push(UnresolvedError); ...
? Is more my point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doh... Now that I tried it, though, I realized that it wouldn't quite work. The reason for it is that in order to suppress the typing error for non-parsable function body, there can only be one sequence element and this element must be UnresolvedError
. Unfortunately, if we place UnresolvedError
in seq
here and the last no-semicolon statement is empty, expansion's translation pass will insert another element to the sequence for the non-semicolon statement.
I can still insert UnresolvedError
into the sequence and at the end move it to no-semicolon placeholder if the placeholder is empty and there is only one list element, but this arguably wouldn't be much prettier...
Added a comment along those lines, but open to additional suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, on the second thought, perhaps translation to expansion should be changed to not produce a final unit-typed expression if the only sequence member is unresolved. I made a change along those lines, please let me know what you think.
45da52e
to
9cc054a
Compare
12930f9
to
c8dacc3
Compare
c8dacc3
to
796a5b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things are looking good! Approving if we need to get it in quickly, but I think @cgswords has been following this much more thoroughly than me. So I think it would benefit getting his eyes before landing
d1d3193
to
0899e7e
Compare
Dang, some of these matching error changes are a bit unfortunate -- i wonder if we want to add =>
to the stop set for patterns. I'm also still curious if we can get rid of EOFs in some cases by checking if we are standing at the }
that matches the module start. That said, stamping this so that it doesn't stall IDE work. We can come back and tune those later in smaller PRs.
Description
This PR adds resilience to parser errors to the compiler. The high-level idea is to (eventually) always return a correctly (though potentially only partially) parsed AST node at every level. For example,
fun foo
would be parsed as a correct function even though it does not have parameter list or a body.Test Plan
Tests need to be adjusted to check that everything works.
If your changes are not user-facing and do not break anything, you can skip the following section. Otherwise, please briefly describe what has changed under the Release Notes section.
Type of Change (Check all that apply)
Release notes
Developers might see more compiler diagnostics as selected parsing errors no longer prevent compilation and diagnostics from the compiler reaching later compilation stages where additional diagnostics may be generated.