This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

Parse and Error Recovery API #1815

Merged
merged 18 commits into from
Nov 25, 2021

MichaReiser
Contributor

@MichaReiser MichaReiser commented Nov 23, 2021

Summary

The primary goal of this PR is to refine the parse_* and recover_ APIs so that they guide developers in building correct syntax trees and avoid infinite loops. The secondary goal is to add means to improve error recovery and to simplify adding missing slots when a child is missing.

Correctness

The primary goal of this PR is to refine the API of the parse_* and recover methods to guide parser authors to create correct syntax trees. One shortcoming of the current API is that it's unclear to the caller when they must handle an error and when they need not, because many methods return an Option<CompletedMarker> whose meaning differs between rule implementations:

  • a) Try to parse a node of type X
  • b) Parse a node of type X, add an error if it's missing
  • c) Parse a node of type X, add an error and perform some error recovery

The problem is that the caller must handle a missing node in case a) if it expects the node to be a required child, but doesn't have to do anything in cases b) or c). In other places there are three variants of the same parse rule only to support the three cases a), b), and c).

Our API (and the compiler) should guide developers to do the necessary recovery when needed and provide means to propagate errors in case they can't handle the error on their own. It should further not be required to implement the same rule multiple times to support the different cases. That's why this PR introduces a new ParsedSyntax that must be handled and redefines the contract of parse rules. The contract of a parsing rule is:

  • Returns Present(completed) if it was able to parse a (partial) node. Partial means that the node counts as parsed even if the rule only parsed the for ( head of a for statement and all other children are missing.
  • Returns Absent if the node wasn't present. The rule doesn't add any error in that case and the rule isn't allowed to progress the parser position.
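The contract above can be sketched with a toy parser. This is only an illustration under stated assumptions: ToyParser, parse_for_head, and the string node kinds are hypothetical stand-ins, and the real ParsedSyntax wraps a CompletedMarker rather than a string.

```rust
/// Toy stand-in for the PR's `ParsedSyntax` (assumption: the real type wraps a `CompletedMarker`).
#[derive(Debug, PartialEq)]
#[must_use = "the caller must decide what to do when the node is absent"]
pub enum ParsedSyntax {
    /// The rule parsed a (possibly partial) node; the payload is a hypothetical node kind.
    Present(&'static str),
    /// The node is not in the source: no error was added and no token was consumed.
    Absent,
}

/// Hypothetical minimal parser over whitespace-separated tokens.
pub struct ToyParser<'a> {
    pub tokens: Vec<&'a str>,
    pub pos: usize,
    pub errors: Vec<String>,
}

impl<'a> ToyParser<'a> {
    pub fn new(source: &'a str) -> Self {
        ToyParser {
            tokens: source.split_whitespace().collect(),
            pos: 0,
            errors: Vec::new(),
        }
    }

    /// Returns true if the current token equals `token`.
    fn at(&self, token: &str) -> bool {
        self.tokens.get(self.pos).map_or(false, |t| *t == token)
    }

    /// Consumes the current token if it equals `token`.
    fn eat(&mut self, token: &str) -> bool {
        if self.at(token) {
            self.pos += 1;
            true
        } else {
            false
        }
    }
}

/// Follows the contract: `Absent` leaves position and errors untouched,
/// `Present` is returned even when only the `for` keyword could be parsed.
pub fn parse_for_head(p: &mut ToyParser) -> ParsedSyntax {
    if !p.at("for") {
        return ParsedSyntax::Absent;
    }
    p.pos += 1; // consume `for`
    if !p.eat("(") {
        // The node is still present, just partial, so the rule records the error itself.
        p.errors.push("expected `(` after `for`".to_string());
    }
    ParsedSyntax::Present("FOR_STATEMENT")
}
```

Because the enum is marked must_use in this sketch, a caller that ignores the result gets a compiler warning, which is the kind of guidance the description asks for.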

The PR further addresses (and unifies) the recover APIs and enforces handling whether the recovery was successful or not, to avoid infinite loops. It does so by introducing a new RecoveryResult; a recovery is only successful if:

  • The parser isn't at the EOF
  • The recovery consumed at least one token (it did some recovery)
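The two conditions can be sketched as a standalone function. This is a simplified model, not the PR's implementation: the token slice, the usize payload, and the AlreadyRecovered variant name are assumptions made for illustration (the review thread only shows RecoveryError::Eof).

```rust
/// Why a recovery attempt did not succeed.
#[derive(Debug, PartialEq)]
pub enum RecoveryError {
    /// The parser was already at the end of the file.
    Eof,
    /// Hypothetical name: the parser already sits on a recovery token, nothing was consumed.
    AlreadyRecovered,
}

/// On success, the number of tokens that would be wrapped in a single unknown node.
pub type RecoveryResult = Result<usize, RecoveryError>;

/// Skips tokens until one from `recovery_set` (or the end of input) is reached.
pub fn recover(tokens: &[&str], pos: &mut usize, recovery_set: &[&str]) -> RecoveryResult {
    if *pos >= tokens.len() {
        return Err(RecoveryError::Eof);
    }
    let start = *pos;
    while *pos < tokens.len() && !recovery_set.iter().any(|t| *t == tokens[*pos]) {
        *pos += 1;
    }
    if *pos == start {
        // No progress: reporting this as an error lets the caller break out of its loop.
        return Err(RecoveryError::AlreadyRecovered);
    }
    Ok(*pos - start)
}
```

A list-parsing loop that breaks on Err cannot spin forever, because every Ok is guaranteed to have advanced the position by at least one token.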

Missing slots

#1724 requires that the parser adds a "missing slot" for every optional or required child that isn't present in the source text. This requires that the parse_* methods expose the information if they parsed a node or not.

This PR adds helper methods to the ParsedSyntax introduced above to accomplish that.

  • make_required: Adds an error and a missing slot if the parse method failed to parse the expected node, doesn't do anything otherwise
  • make_optional: Adds a missing slot if the parse method didn't return a node
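A rough sketch of how the two helpers behave, assuming a simplified event sink. Sink, Event, and the string payload are hypothetical; in the real parser the node's events are already recorded by the time the helper runs, which is why Present is a no-op here.

```rust
/// Hypothetical event emitted into the tree builder.
#[derive(Debug, PartialEq)]
pub enum Event {
    /// Placeholder for a child that is not present in the source text.
    Missing,
}

/// Hypothetical sink collecting events and diagnostics.
#[derive(Default)]
pub struct Sink {
    pub events: Vec<Event>,
    pub errors: Vec<String>,
}

/// Toy `ParsedSyntax`; the payload stands in for a `CompletedMarker`.
pub enum ParsedSyntax {
    Present(&'static str),
    Absent,
}

impl ParsedSyntax {
    /// Required child: an absent node yields an error plus a missing slot.
    pub fn make_required(self, sink: &mut Sink, error: impl FnOnce() -> String) {
        match self {
            ParsedSyntax::Present(_) => {} // node already recorded, nothing to do
            ParsedSyntax::Absent => {
                sink.errors.push(error());
                sink.events.push(Event::Missing);
            }
        }
    }

    /// Optional child: an absent node yields only a missing slot, no error.
    pub fn make_optional(self, sink: &mut Sink) {
        if let ParsedSyntax::Absent = self {
            sink.events.push(Event::Missing);
        }
    }
}
```

Taking the error as a closure means the diagnostic is only built when the node is actually missing.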

It further introduces two helpers, precede_required and precede_optional, that are useful if a rule can only be parsed after another parse rule succeeded:

let lhs = expr(p)?;
let binary = lhs.precede_required(p); // inserts binary as a parent of lhs but only if lhs is present in the source

The benefit of this is that this avoids creating a marker that then must be explicitly abandoned in case the lhs isn't present in the source code.

Recovery

A rule often doesn't know enough about its surroundings to decide on the best error recovery strategy. Rules may then be forced to only "eat" the next token and wrap it in an "unknown" node (if that is even allowed in that context). The problem is that there's no guarantee that the following token is valid in this context either, with the result that the parser inserts many diagnostics.

Ideally, the parser groups as many invalid tokens as possible into a single Unknown node and only adds a single diagnostic. For example, an array expression can use a more aggressive recovery if it failed to parse the next element and can eat all tokens up to the next ,, ], }, ; into an Unknown* node. However, that only works if, for example, parsing an expression doesn't perform any error recovery of its own.

This is why this PR proposes to move error recovery to the call sites, which have the context required to perform good error recovery.
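To illustrate call-site recovery, here is a self-contained sketch of a list rule that owns the recovery for its elements. Everything in it is hypothetical (the Element type, integer-only elements, string tokens); the point is only that a run of invalid tokens ends up in one unknown element with one diagnostic.

```rust
/// Hypothetical array element: a number, or a group of tokens that could not be parsed.
#[derive(Debug, PartialEq)]
pub enum Element {
    Number(i64),
    Unknown(Vec<String>),
}

/// Parses elements up to `]`. The list rule, not the element rule, performs recovery
/// because it knows that `,` and `]` are safe tokens to stop at.
pub fn parse_array_elements(tokens: &[&str]) -> (Vec<Element>, Vec<String>) {
    let mut elements = Vec::new();
    let mut diagnostics = Vec::new();
    let mut pos = 0;
    while pos < tokens.len() && tokens[pos] != "]" {
        if tokens[pos] == "," {
            pos += 1; // separator
            continue;
        }
        if let Ok(number) = tokens[pos].parse::<i64>() {
            elements.push(Element::Number(number));
            pos += 1;
            continue;
        }
        // Recovery: the current token is neither `,` nor `]`, so at least one token is consumed.
        let start = pos;
        while pos < tokens.len() && tokens[pos] != "," && tokens[pos] != "]" {
            pos += 1;
        }
        let skipped = &tokens[start..pos];
        diagnostics.push(format!(
            "expected an array element but found `{}`",
            skipped.join(" ")
        ));
        elements.push(Element::Unknown(
            skipped.iter().map(|t| t.to_string()).collect(),
        ));
    }
    (elements, diagnostics)
}
```

If the element rule wrapped each bad token in its own unknown node instead, the same input would produce one diagnostic per token.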

Conditional Syntax

There are different syntaxes that are only valid in a certain context:

  • with: Loose mode only
  • typescript: TypeScript files only
  • import/export: top of a module
  • experimental syntax

The difficulty is that the parser must wrap such conditional syntax inside an Unknown* node, but unknown nodes don't exist for every node type. For example, the whole function declaration must be wrapped in an UnknownStatement if any parameter has a TypeScript type annotation. However, this can't be done in the parse_parameter rule because there's no UnknownParameter node type. That's why the rule must propagate the error to the caller until it reaches the FunctionDeclaration implementation, which can then handle the case.

This PR introduces a ConditionalParsedSyntax that must be handled to address this need. It should only be returned by parse rules that may return conditional syntax (and can't convert the node to an Unknown node).
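A minimal sketch of the idea, assuming a two-variant enum. Valid appears in the examples in this description, while the Invalid variant name, the excluded_in_strict_mode function, and the string node kinds are assumptions made for illustration only.

```rust
/// Toy version of `ConditionalParsedSyntax`; payloads are hypothetical node kinds.
#[derive(Debug, PartialEq)]
pub enum ConditionalParsedSyntax {
    /// The syntax is allowed in the current parsing context.
    Valid(&'static str),
    /// The syntax was parsed but is not allowed here; an error has already been recorded.
    Invalid(&'static str),
}

/// Hypothetical check: marks the node invalid (and records an error) in strict mode.
pub fn excluded_in_strict_mode(
    strict_mode: bool,
    node_kind: &'static str,
    errors: &mut Vec<String>,
) -> ConditionalParsedSyntax {
    if strict_mode {
        errors.push(format!("{} is not allowed in strict mode", node_kind));
        ConditionalParsedSyntax::Invalid(node_kind)
    } else {
        ConditionalParsedSyntax::Valid(node_kind)
    }
}

impl ConditionalParsedSyntax {
    /// For rules that have a matching unknown node: invalid syntax becomes that kind.
    pub fn or_invalid_to_unknown(self, unknown_kind: &'static str) -> &'static str {
        match self {
            ConditionalParsedSyntax::Valid(kind) => kind,
            ConditionalParsedSyntax::Invalid(_) => unknown_kind,
        }
    }
}
```

A rule without a matching unknown node would return the enum unchanged, so the caller is forced to match on it and propagate or convert the invalid case.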

Usage

The PR rewrote some parsing rules to show how the API is intended to be used. I also rewrote the assignment target parsing to use the new API [in this commit](https://github.com/rome/tools/pull/1805/commits/b9f616e634e39a6bf5f0942d7ca8cd5193f402bc) (part of #1805)

Proposal

Rename the parsing rules from assignment_expression to parse_assignment_expression, etc.

Examples

Parsing a list with error recovery

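pub(super) fn object_expr(p: &mut Parser) -> CompletedMarker {
    let m = p.start();
    p.expect_required(T!['{']);
    let props_list = p.start();
    let mut first = true;

    while !p.at(EOF) && !p.at(T!['}']) {
        if first {
            first = false;
        } else {
            p.expect(T![,]);
            // allow a trailing comma before the closing brace
            if p.at(T!['}']) {
                break;
            }
        }

        // recover at the call site: skip up to the next safe token or line break
        let recovered_member = object_member(p).or_recover(
            p,
            ParseRecovery::new(JS_UNKNOWN_MEMBER, token_set![T![,], T!['}'], T![;], T![:]])
                .with_recovery_on_line_break(),
            JsParseErrors::expected_object_member,
        );

        // a failed recovery made no progress, so stop to avoid an infinite loop
        if recovered_member.is_err() {
            break;
        }
    }
    // (the excerpt ends here; completing the `props_list` and `m` markers is not shown)
}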

Rule with conditional syntax

pub fn with_stmt(p: &mut Parser) -> ParsedSyntax {
    if !p.at(T![with]) {
        return Absent;
    }

    let m = p.start();
    p.bump_any(); // with
    parenthesized_expression(p);
    stmt(p, None);
    let with_stmt = m.complete(p, JS_WITH_STATEMENT);

    // or SloppyMode.exclusive_syntax(...) but this reads better with the error message, saying that
    // it's only forbidden in strict mode
    let conditional = StrictMode.excluding_syntax(p, with_stmt, |p, marker| {
        p.err_builder("`with` statements are not allowed in strict mode")
            .primary(marker.range(p), "")
    });

    conditional.or_invalid_to_unknown(p, JS_UNKNOWN_STATEMENT)
}

TypeScript parse method

Nothing special: the rule returns a regular ParsedSyntax.

fn parse_ts_parameter_types(p: &mut Parser) -> ParsedSyntax {
    if p.at(T![<]) {
        Present(ts_type_params(p).unwrap())
    } else {
        Absent
    }
}

JS that may contain TS

fn function(p: &mut Parser, kind: SyntaxKind) -> ConditionalParsedSyntax {
    let m = p.start();

    let mut uses_ts_syntax = kind == JS_FUNCTION_DECLARATION && p.eat(T![declare]);

    let in_async = p.at(T![ident]) && p.cur_src() == "async";
    if in_async {
        p.bump_remap(T![async]);
    }

    p.expect_required(T![function]);

    let in_generator = p.eat(T![*]);
    let guard = &mut *p.with_state(ParserState {
        labels: HashMap::new(),
        in_function: true,
        in_async,
        in_generator,
        ..p.state.clone()
    });

    let id = opt_binding_identifier(guard);

    if let Some(mut identifier_marker) = id {
        identifier_marker.change_kind(guard, JS_IDENTIFIER_BINDING);
    } else if kind == JS_FUNCTION_DECLARATION {
        let err = guard
            .err_builder(
                "expected a name for the function in a function declaration, but found none",
            )
            .primary(guard.cur_tok().range, "");

        guard.error(err);
    }

    let type_parameters =
        parse_ts_parameter_types(guard).exclusive_for(&TypeScript, guard, |p, marker| {
            p.err_builder("type parameters can only be used in TypeScript files")
                .primary(marker.range(p), "")
        });

    uses_ts_syntax |= type_parameters.is_present();

    if let Valid(type_parameters) = type_parameters {
        type_parameters.make_optional(guard);
    }

    parameter_list(guard);

    let return_type = parse_ts_return_type(guard).exclusive_for(&TypeScript, guard, |p, marker| {
        p.err_builder("return types can only be used in TypeScript files")
            .primary(marker.range(p), "")
    });

    uses_ts_syntax |= return_type.is_present();

    if let Valid(return_type) = return_type {
        return_type.make_optional(guard);
    }

    if kind == JS_FUNCTION_DECLARATION {
        function_body_or_declaration(guard);
    } else {
        function_body(guard).make_required(guard, JsParseErrors::expected_function_body);
    }

    let function = m.complete(guard, kind);

    if uses_ts_syntax {
        // change kind to TS specific kind?
        // No need to add an error here because the return type / type parameters nodes already
        // have an error
        TypeScript.exclusive_syntax_no_error(guard, function)
    } else {
        Valid(function.into())
    }
}

Test Plan

cargo test and cargo xtask coverage

@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Nov 23, 2021

Deploying with Cloudflare Pages

Latest commit: 543e071
Status: ✅  Deploy successful!
Preview URL: https://d369e3f0.tools-8rn.pages.dev


@github-actions

github-actions bot commented Nov 23, 2021

Test262 comparison coverage results on ubuntu-latest

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 17608 | 17608 | 0 |
| Passed | 16787 | 16787 | 0 |
| Failed | 820 | 820 | 0 |
| Panics | 1 | 1 | 0 |
| Coverage | 95.34% | 95.34% | 0.00% |

@github-actions

github-actions bot commented Nov 23, 2021

Test262 comparison coverage results on windows-latest

| Test result | main count | This PR count | Difference |
| --- | --- | --- | --- |
| Total | 17608 | 17608 | 0 |
| Passed | 16787 | 16787 | 0 |
| Failed | 820 | 820 | 0 |
| Panics | 1 | 1 | 0 |
| Coverage | 95.34% | 95.34% | 0.00% |

1 │ (5 + 5) => {}
│ ^

Contributor Author

This file is actually a good example. We don't want to progress character by character when doing error recovery. Instead, the ParameterList should skip all tokens until it finds a safe token (a comma, a closing parenthesis, or maybe the start of another pattern).

Contributor

@ematipico ematipico left a comment

I like the proposed changes overall. I have been having a hard time working with an Option<CompletedMarker> because it doesn't tell us whether something is a real error or is expected.

These changes will make things easier for us when we want to work on the recovery strategies.

I left some questions about the traits that are proposed. I believe we should tighten the types to avoid misuse of the trait implementations.

@MichaReiser MichaReiser marked this pull request as ready for review November 24, 2021 13:12
@MichaReiser MichaReiser changed the title from "Error recovery" to "Parse and Error Recovery API" Nov 24, 2021
Contributor

@yassere yassere left a comment

I don’t have much hands-on experience working with these parse methods so I don’t have much feedback, but this does seem like a good idea to me.

Do you think it’s necessary for ergonomics that the parse methods return an actual Result rather than a custom enum where things like Absent and Unsupported are simply variants? It feels a bit odd that a parse method that doesn’t find an optional node would consider that an error state and return an Err(AbsentError) (if I’m understanding it correctly, so please correct me if I'm wrong). But if bubbling up errors using ? is going to be common here, then that makes sense.

@MichaReiser
Contributor Author

> I don’t have much hands-on experience working with these parse methods so I don’t have much feedback, but this does seem like a good idea to me.
>
> Do you think it’s necessary for ergonomics that the parse methods return an actual Result rather than a custom enum where things like Absent and Unsupported are simply variants? It feels a bit odd that a parse method that doesn’t find an optional node would consider that an error state and return an Err(AbsentError) (if I’m understanding it correctly, so please correct me if I'm wrong). But if bubbling up errors using ? is going to be common here, then that makes sense.

I agree on the ergonomics and my first version actually used custom enums but I had to reimplement many of the generic Result methods like 'ok', 'map' etc.

The try operator can be useful in some rare cases but I don't think it's super important here.

Let me revisit the result/custom enum decision tomorrow.

I also came to the conclusion that we most certainly want a custom enum for 'ConditionalSyntax'

@MichaReiser
Contributor Author

MichaReiser commented Nov 25, 2021

I reworked the PR following @yassere's suggestions:

  • Replaced ParseResult with ParsedSyntax which is our own enum. Eliminates the need for an AbsentError and using Present(completed) and Absent in parse rules certainly gives more information than Ok(completed) and Err(AbsentError).
  • Replaced ConditionalSyntaxParseResult with our own enum ConditionalParsedSyntax, mainly for the same reasons as above
  • Introduced a new InvalidParsedSyntax for syntax that is invalid in the current parsing context.
  • Refined the documentation and paid attention to consistently using the same concepts (syntax, conditional syntax, invalid syntax)

Thank you all for your feedback. The API now feels much better and more consistent than what I initially started with.

Contributor

@ematipico ematipico left a comment

Overall, these new changes are refreshing and will make the parser easier to use once we get used to it.

Now that we have an infrastructure that allows us to understand when we have errors, optional children and unsupported language features, I think all these features should live inside their own crate. Mainly because this will be used by other parsers.

Imagine creating a JSON parser where we can use ConditionalParsedSyntax to support comments inside a JSON file.

I would suggest the following changes once the PR is merged:

  • create a new crate called parser_core
  • move ConditionalParsedSyntax, ParsedSyntax, RecoveryError, ParseRecovery, parse_error.rs and ExpectedNodeDiagnosticBuilder inside parser_core
  • move all markers (and probably events too) inside parser_core
    • document parser_core and create a small example (in a README.md file or inside a doc comment) of how to use the new parsing infrastructure
  • create a trait called ParserCore which will contain all the existing functions for parsing a file (eat, expect, at, etc.)
  • rslint_parser will become a js_parser and will implement ParserCore

/// specified in the recovery set, the EOF, or a line break (depending on configuration).
/// Returns `Ok(unknown_node)` if recovery was successful, and `Err(RecoveryError::Eof)` if the parser
/// is at the end of the file (before starting recovery).
pub fn recover(&self, p: &mut Parser) -> RecoveryResult {
Contributor

The old recovery strategy was using .state.no_recovery. I assume it is not needed anymore?

Contributor Author

I don't think so but I might be wrong. My hope is that we can delete the whole state.no_recovery thing but there might be a use case where we still need it and we should then add the check here as well.

@@ -234,6 +234,7 @@ pub fn check_lhs(p: &mut Parser, expr: JsAnyExpression, marker: &CompletedMarker
}

/// Check if the var declaration in a for statement has multiple declarators, which is invalid
#[allow(deprecated)]
Contributor

Should we add a TODO here? I presume we will need to remove this at some point

Contributor Author

I inlined every allow attribute next to the problematic statement.

Yes, this should go away when we apply the changes to the AST that @jamiebuilds suggests

Contributor Author

and the main point for now is that we don't add any new uses... I'm less concerned about the existing ones.

@@ -226,10 +230,10 @@ fn class_member(p: &mut Parser) -> CompletedMarker {
if declare && !has_access_modifier {
// declare() and declare: foo
if is_method_class_member(p, offset) {
-literal_member_name(p); // bump declare as identifier
+literal_member_name(p).ok().unwrap(); // bump declare as identifier
Contributor

Should we leave this unwrap() here? If yes, why the .ok()?

Contributor Author

ParsedSyntax doesn't have an unwrap method. We need the ok to first convert it into an Option.

We can remove the unwrap once the enclosing function returns a ParsedSyntax too. I just didn't want to rewrite all parse rules in this commit :D

@MichaReiser
Contributor Author

> Overall, these new changes are refreshing and will make the parser easier to use once we get used to it.
>
> Now that we have an infrastructure that allows us to understand when we have errors, optional children and unsupported language features, I think all these features should live inside their own crate. Mainly because this will be used by other parsers.
>
> Imagine creating a JSON parser where we can use ConditionalParsedSyntax to support comments inside a JSON file.
>
> I would suggest the following changes once the PR is merged:
>
> * create a new crate called `parser_core`
> * move `ConditionalParsedSyntax`, `ParsedSyntax`, `RecoveryError`, `ParseRecovery`, `parse_error.rs` and `ExpectedNodeDiagnosticBuilder` inside `parser_core`
> * move all markers (and probably events too) inside `parser_core`
>   * document `parser_core` and create a small example (in a `README.md` file or inside a doc comment) of how to use the new parsing infrastructure
> * create a trait called `ParserCore` which will contain all the existing functions for parsing a file (`eat`, `expect`, `at`, etc.)
> * `rslint_parser` will become a `js_parser` and will implement `ParserCore`

Moving out the core logic makes sense, but I don't think I'll tackle this right now because I don't know all the constraints yet. What I have started doing is moving JS-specific features into the syntax folder.

… to `or_missing`

Rename `precede_required` to `precede_or_missing_with_error` and `precede_optional` to `precede_or_missing`
@MichaReiser MichaReiser merged commit 22cd68b into main Nov 25, 2021
@MichaReiser MichaReiser deleted the feature/error-recovery branch November 25, 2021 13:27