-
Notifications
You must be signed in to change notification settings - Fork 660
Conversation
Deploying with Cloudflare Pages
|
Test262 comparison coverage results on ubuntu-latest
|
Test262 comparison coverage results on windows-latest
|
│ | ||
1 │ (5 + 5) => {} | ||
│ ^ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is actually a good example. We don't want to progress character by character when doing error recovery. Instead, the ParameterList
should skip all tokens until it fins a safe token (,
, )
or maybe the start of another pattern).
3550273
to
ef4e2bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the proposed changes overall. I have been having hard time working with a Option<CompletedMarker>
because it doesn't tell us when something is a real error or it is expected.
These changes will make things easier to us when we want to work on the recover strategies.
I left some questions around the trait
that are proposed. I believe we should tighten the types to avoid to misuse the implementation of the traits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t have much hands-on experience working with these parse methods so I don’t have much feedback, but this does seem like a good idea to me.
Do you think it’s necessary for ergonomics that the parse methods return an actual Result
rather than a custom enum where things like Absent
and Unsupported
are simply variants? It feels a bit odd that a parse method that doesn’t find an optional node would consider that an error state and return an Err(AbsentError)
(if I’m understanding it correctly, so please correct me if I'm wrong). But if bubbling up errors using ?
is going to be common here, then that makes sense.
I agree on the ergonomics and my first version actually used custom enums but I had to reimplement many of the generic Result methods like 'ok', 'map' etc. The try operator can be useful in some rare cases but I don't think it's super important here. Let me revisit the result/custom enum decision tomorrow. I also came to the conclusion that we most certainly want a custom enum for 'ConditionalSyntax' |
I reworked the PR following @yassere suggestions
Thank you all for your feedback. The API feels now much better and more consistent than with what I initially started. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this new changes are refreshing and will make the usage of the parser better once we get used to it.
Now that we have an infrastructure that allows us to understand when we have errors, optional children and unsupported language features, I think all these features should live inside their own crate. Mainly because this will be used by other parsers.
Imagine creating a JSON parser where we can use ConditionalParsedSyntax
to support comments inside a JSON file.
I would suggest the following changes once the PR is merged:
- create a new crate called
parser_core
- move
ConditionalParsedSyntax
,ParsedSyntax
,RecoveryError
,ParseRecovery
,parse_error.rs
andExpectedNodeDiagnosticBuilder
insideparser_core
- move all markers (and probably events too) inside
parser_core
-
- document
parser_core
and create a small example (in aREADME.md
file or inside a doc comment) of how to use the new parsing infrastructure
- document
- create a trait called
ParserCore
which will contain all the existing functions for parsing a file (eat
,expect
,at
, etc.) rslint_parser
will become ajs_parser
and will implementParserCore
/// specified in the recovery set, the EOF, or a line break (depending on configuration). | ||
/// Returns `Ok(unknown_node)` if recovery was successful, and `Err(RecoveryError::Eof)` if the parser | ||
/// is at the end of the file (before starting recovery). | ||
pub fn recover(&self, p: &mut Parser) -> RecoveryResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old recover strategy was using .state.no_recovery
. I assume is not needed anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so but I might be wrong. My hope is that we can delete the whole state.no_recovery
thing but there might be a use case where we still need it and we should then add the check here as well.
@@ -234,6 +234,7 @@ pub fn check_lhs(p: &mut Parser, expr: JsAnyExpression, marker: &CompletedMarker | |||
} | |||
|
|||
/// Check if the var declaration in a for statement has multiple declarators, which is invalid | |||
#[allow(deprecated)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a TODO here? I presume we will need to remove this at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I inlined all allow
next to the problematic statement.
Yes, this should go away when we apply the changes to the AST that @jamiebuilds suggests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the main for now is that we don't add any new uses... I'm less concerned about the existing ones.
…ing and prototype object parsing
…o guarantee progress).
@@ -226,10 +230,10 @@ fn class_member(p: &mut Parser) -> CompletedMarker { | |||
if declare && !has_access_modifier { | |||
// declare() and declare: foo | |||
if is_method_class_member(p, offset) { | |||
literal_member_name(p); // bump declare as identifier | |||
literal_member_name(p).ok().unwrap(); // bump declare as identifier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we leave this unwrap()
here? If yes, why the .ok()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParsedSyntax
doesn't have an unwrap
method. We need the ok
to first convert it into an Option
.
We can remove the unwrap
once the enclosing function returns a ParsedSyntax
to. I just didn't want to rewrite all parse rules in this commit :D
076b7f5
to
5e930e8
Compare
Moving out the core logic makes sense but I don't think I'll tackle this right now because i don't know all the constraints yet. But what I started doing is to move |
… to `or_missing` Rename `precede_required` to `precede_or_missing_with_error` and `precede_optional` to `precede_or_missing`
Summary
The preliminary goal of this PR is to refine the
parse_*
andrecover_
APIs so that they guide developers in building correct syntax trees and avoid infinite loops. The secondary goal is to add means to improve error recovery and simplify addingmissing
slots if a child is missing.Correctness
The preliminary goal of this PR is to refine the API of the
parse_*
andrecover
methods to guide parser authors to create correct syntax trees. One shortcoming of the current API is that it's unclear for the caller when they must handle an error and when not because many methods return anOption<CompletedMarker>
but it has a different meaning for different rule implementationsThe problem is that handling a missing node is required for a) if the calling rule expects this to be a required parent but doesn't have to do anything in case of b) or c). There are other situations where there are three different variants of the same parse rule only to support the three different cases a), b), and c).
Our API (and the compiler) should guide developers to do the necessary recovery when needed and provide means to propagate errors in case they can't handle the error on their own. It should further not be required to implement the same rule multiple times to support the different cases. That's why this PR introduces a new
ParsedSyntax
that must be handled and redefines the contract of parse rules. The contract of a parsing rule is:Present(completed)
if it was able to parse a (partial) node. Partial means, even if it means that it only parsed thefor (
head of a for statement and all other children are missingAbsent
if the node wasn't present. The rule doesn't add any error in that case and the rule isn't allowed to progress the parser position.The PR further addresses the
recover
APIs (and unifies them) and enforce handling whatever the recovery is successful or not to avoid infinite loops. It does so by introducing a newRecoveryResult
and a recovery function is only successful if:Missing slots
#1724 requires that the parser adds a "missing slot" for every optional or required child that isn't present in the source text. This requires that the
parse_*
methods expose the information if they parsed a node or not.This PR adds helper methods to the before introduced
ParsedSyntax
to accomplish that.make_required
: Adds an error and a missing slot if the parse method failed to parse the expected node, doesn't do anything otherwisemake_optional
: Adds a missing slot if the parse method didn't return a nodeIt further introduces two helpers
precede_required
, andprecede_optional
that are useful if a parsing rule can only be parsed if another parse rule succeeded:The benefit of this is that this avoids creating a marker that then must be explicitly
abandoned
in case thelhs
isn't present in the source code.Recovery
A rule often doesn't know enough about its surroundings to decide for the best error recovery strategy. Rules may then be forced to only "eat" the next token and wrap it in an "unknown" node (if that is even allowed in that context). The problem is, there's no guarantee that the next token is valid in this context which has the result, that the parser will insert many diagnostics.
Ideally, the parser groups as many invalid tokens as possible into a single
Unknown
node and only adds a single diagnostic. For example, an array expression can use a more aggressive recovery if it failed to parse the next element and can eat all tokens up to the next,
,]
,}
,;
into anUnknown*
node. However, that only works if e.g. parsing an expression doesn't perform any error recovery as well.This is why this PR proposes to move error recovery to the call sites, that have the required context to perform good error recovery.
Conditional Syntax
There are different syntaxes that are only valid in a certain context:
with
: Loose mode onlyThe difficulty is that the parser must wrap such conditional syntax inside of an
Unknown
* node but unknown nodes don't exist for every node type. For example, the whole function declaration must be wrapped in anUnknownStatement
if any parameter has a typescript type annotation. However, this can't be done in theparse_parameter
rule because there's noUnknownParameter
node type. That's why the rule must propagate the error to the caller until it reaches theFunctionDeclaration
implementation that then can handle the case.This PR introduces a
ConditionalParsedSyntax
that must be handled to address this need. It should only be returned by parse rules that may return conditional syntax (and can't convert the node to anUnknown
node).Usage
The PR rewrote some parsing rules to show how the API is intended to be used. I also rewrote the assignment target parsing to use the new API [in this commit](https://github.com/rome/tools/pull/1805/commits
/b9f616e634e39a6bf5f0942d7ca8cd5193f402bc) (part of #1805)
Proposal
Rename the parsing rules from
assignment_expression
toparse_assignment_expression
, etc..Examples
Parsing a list with error recovery
tools/crates/rslint_parser/src/syntax/object.rs
Lines 23 to 50 in f57b7e3
Rule with conditional syntax
tools/crates/rslint_parser/src/syntax/stmt.rs
Lines 611 to 632 in 7212349
Type Script parse method
Nothing special, return a regular
ParsedSyntax
tools/crates/rslint_parser/src/syntax/function.rs
Lines 163 to 169 in 7212349
JS with may contain TS
tools/crates/rslint_parser/src/syntax/function.rs
Lines 44 to 120 in 7212349
Test Plan
cargo test
andcargo xtask coverage