☂️ AST Façade Improvements #1725
Comments
@ematipico, can you take a look at the codegen extensions and tick off any that you've already implemented as part of #1722?
In #1715, I created a … I am now going to prepare a list of the differences in the AST that I found.
Regarding the … Maybe we could have …
The approach we decided on uses `Unknown*` nodes that are part of the corresponding union type. But there might be non-unique cases, as you outlined for the default clause, though I'm not sure whether we should just parse this as an unknown case.
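To make the `Unknown*` approach concrete, here is a minimal sketch of a union that lists its unknown node as just another alternative. `SyntaxKind`, `SyntaxNode`, `JsAnyStatement`, and `JsUnknownStatement` are simplified stand-ins assumed for illustration, not the parser crate's actual definitions.

```rust
/// Simplified stand-ins for the parser's kinds and untyped nodes (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyntaxKind {
    JsIfStatement,
    JsExpressionStatement,
    JsUnknownStatement,
    JsNumberLiteral,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SyntaxNode {
    pub kind: SyntaxKind,
}

#[derive(Debug, PartialEq, Eq)]
pub struct JsIfStatement {
    pub syntax: SyntaxNode,
}

#[derive(Debug, PartialEq, Eq)]
pub struct JsExpressionStatement {
    pub syntax: SyntaxNode,
}

/// Wraps whatever the parser could not make sense of while recovering from an error.
#[derive(Debug, PartialEq, Eq)]
pub struct JsUnknownStatement {
    pub syntax: SyntaxNode,
}

/// The union lists the unknown node as just another alternative.
#[derive(Debug, PartialEq, Eq)]
pub enum JsAnyStatement {
    JsIfStatement(JsIfStatement),
    JsExpressionStatement(JsExpressionStatement),
    JsUnknownStatement(JsUnknownStatement),
}

impl JsAnyStatement {
    pub fn cast(syntax: SyntaxNode) -> Option<Self> {
        let stmt = match syntax.kind {
            SyntaxKind::JsIfStatement => JsAnyStatement::JsIfStatement(JsIfStatement { syntax }),
            SyntaxKind::JsExpressionStatement => {
                JsAnyStatement::JsExpressionStatement(JsExpressionStatement { syntax })
            }
            SyntaxKind::JsUnknownStatement => {
                JsAnyStatement::JsUnknownStatement(JsUnknownStatement { syntax })
            }
            // Kinds that are not statements at all still fail to cast.
            _ => return None,
        };
        Some(stmt)
    }
}
```

Because the unknown variant is part of the union, consumers handle error-recovered statements in the same `match` as every other statement instead of a separate `Error` path.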
OK. I will start the implementation with an enum, and we can revisit it later if we think it's not needed.
Do we already have a solution for the union types? It would be nice to get the code gen as close to done as possible.
I think the best approach is to explode the enum during the code gen. Let's take this example: …
In this example, … During the code gen, we can detect enums of enums, make a substitution, and explode the enum:

```rust
// before
impl AstNode for MyNode {
    fn cast(syntax: SyntaxNode) -> Option<Self> {
        // this here would fail because the kind is `FOO`
        let res = match syntax.kind() {
            ALT_ONE => MyNode::AltOne(AltOne { syntax }),
            ALT_TWO => MyNode::AltTwo(AltTwo { syntax }),
            _ => return None,
        };
        Some(res)
    }
}

// after
impl AstNode for MyNode {
    fn cast(syntax: SyntaxNode) -> Option<Self> {
        // the nested enums are exploded, so `FOO` now matches directly
        let res = match syntax.kind() {
            FOO => MyNode::Foo(Foo { syntax }),
            BAR => MyNode::Bar(Bar { syntax }),
            // lorem and ipsum too
            _ => return None,
        };
        Some(res)
    }
}
```
That makes sense. I was initially worried that it would make it hard to convert a statement to a declaration and then pass it to a function, but that's easy: just call `Decl::cast(stmt)`.
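A minimal sketch of what the exploded code gen output could look like, assuming simplified stand-in types: `SyntaxKind`, `SyntaxNode`, and the `Decl`/`Stmt` unions are hypothetical, not generated code from the repository. `Stmt` holds the declaration variants directly, a generated `From<Decl> for Stmt` keeps the nested view convertible, and narrowing a statement to a declaration is another `Decl::cast` (the sketch passes the statement's syntax node explicitly).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyntaxKind {
    FnDecl,
    ClassDecl,
    ExprStmt,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SyntaxNode {
    pub kind: SyntaxKind,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct FnDecl { pub syntax: SyntaxNode }
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ClassDecl { pub syntax: SyntaxNode }
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ExprStmt { pub syntax: SyntaxNode }

/// Inner union as written in the grammar: `Decl = FnDecl | ClassDecl`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Decl {
    FnDecl(FnDecl),
    ClassDecl(ClassDecl),
}

/// Outer union `Stmt = Decl | ExprStmt`, exploded so that it holds
/// `Decl`'s variants directly instead of a nested `Stmt::Decl(Decl)`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Stmt {
    FnDecl(FnDecl),
    ClassDecl(ClassDecl),
    ExprStmt(ExprStmt),
}

impl Decl {
    pub fn cast(syntax: SyntaxNode) -> Option<Self> {
        match syntax.kind {
            SyntaxKind::FnDecl => Some(Decl::FnDecl(FnDecl { syntax })),
            SyntaxKind::ClassDecl => Some(Decl::ClassDecl(ClassDecl { syntax })),
            _ => None,
        }
    }
}

impl Stmt {
    pub fn syntax(&self) -> &SyntaxNode {
        match self {
            Stmt::FnDecl(it) => &it.syntax,
            Stmt::ClassDecl(it) => &it.syntax,
            Stmt::ExprStmt(it) => &it.syntax,
        }
    }
}

/// Generated conversion from the inner union into the flattened outer union.
impl From<Decl> for Stmt {
    fn from(decl: Decl) -> Self {
        match decl {
            Decl::FnDecl(it) => Stmt::FnDecl(it),
            Decl::ClassDecl(it) => Stmt::ClassDecl(it),
        }
    }
}
```

Callers now need a single `match` on `Stmt`, and code that only cares about declarations can still narrow with `Decl::cast`.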
This is not yet implemented. For example, the code gen generates:

```rust
impl BinExpr {
    pub fn l_angle_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![<]) }
    pub fn r_angle_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![>]) }
    pub fn less_than_equal_token(&self) -> Option<SyntaxToken> {
        support::token(&self.syntax, T![<=])
    }
    pub fn greater_than_equal_token(&self) -> Option<SyntaxToken> {
        support::token(&self.syntax, T![>=])
    }
    pub fn equality_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![==]) }
    pub fn strict_equality_token(&self) -> Option<SyntaxToken> {
        support::token(&self.syntax, T![===])
    }
    pub fn inequality_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![!=]) }
    pub fn strict_inequality_token(&self) -> Option<SyntaxToken> {
        support::token(&self.syntax, T![!==])
    }
    pub fn plus_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![+]) }
    pub fn minus_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![-]) }
    pub fn star_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![*]) }
    pub fn divide_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![/]) }
    pub fn reminder_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![%]) }
    pub fn exponent_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![**]) }
    pub fn left_shift_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![<<]) }
    pub fn right_shift_token(&self) -> Option<SyntaxToken> {
        support::token(&self.syntax, T![>>])
    }
    pub fn unsigned_right_shift_token(&self) -> Option<SyntaxToken> {
        support::token(&self.syntax, T![>>>])
    }
    pub fn amp_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![&]) }
    pub fn bitwise_or_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![|]) }
    pub fn bitwise_xor_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![^]) }
    pub fn nullish_coalescing_token(&self) -> Option<SyntaxToken> {
        support::token(&self.syntax, T![??])
    }
    pub fn logical_or_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![||]) }
    pub fn logical_and_token(&self) -> Option<SyntaxToken> {
        support::token(&self.syntax, T![&&])
    }
    pub fn in_token(&self) -> Option<SyntaxToken> { support::token(&self.syntax, T![in]) }
    pub fn instanceof_token(&self) -> Option<SyntaxToken> {
        support::token(&self.syntax, T![instanceof])
    }
}
```

instead of a single accessor for the operator:

```rust
impl BinExpr {
    pub fn operator(&self) -> Option<SyntaxToken> {
        // desired shape only: one accessor covering the whole operator token union
        // (how the accepted operator kinds are passed is still open)
        support::token(&self.syntax)
    }
}
```
@MichaReiser I am tackling the issue in your last comment, and I believe the implementation would need to be different. The `operator` function would look like this:

```rust
impl BinExpr {
    pub fn operator(&self) -> Option<SyntaxToken> {
        // new function `find_token`, given a slice of all the possible expected token kinds
        support::find_token(&self.syntax, &[T![&&], T![==]])
    }
}
```

And the `find_token` function would be:

```rust
pub(super) fn find_token(
    parent: &SyntaxNode,
    possible_kinds: &[SyntaxKind],
) -> Option<SyntaxToken> {
    // return the first direct child token whose kind is one of the expected kinds
    parent
        .children_with_tokens()
        .filter_map(|it| it.into_token())
        .find(|it| possible_kinds.contains(&it.kind()))
}
```
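A self-contained sketch of the proposed `find_token` helper that can be run in isolation. `SyntaxKind`, `SyntaxToken`, `SyntaxNode`, and `SyntaxElement` are simplified stand-ins that only mimic the `children_with_tokens`/`into_token` shape used above; they are not rowan's real types.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyntaxKind {
    Amp2, // `&&`
    Eq2,  // `==`
    Plus, // `+`
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SyntaxToken {
    pub kind: SyntaxKind,
    pub text: String,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SyntaxNode {
    pub children: Vec<SyntaxElement>,
}

/// A child is either a nested node or a token.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum SyntaxElement {
    Node(SyntaxNode),
    Token(SyntaxToken),
}

impl SyntaxElement {
    pub fn into_token(self) -> Option<SyntaxToken> {
        match self {
            SyntaxElement::Token(token) => Some(token),
            SyntaxElement::Node(_) => None,
        }
    }
}

impl SyntaxNode {
    /// Direct children (nodes and tokens) in source order.
    pub fn children_with_tokens(&self) -> impl Iterator<Item = SyntaxElement> + '_ {
        self.children.iter().cloned()
    }
}

/// Returns the first direct child token whose kind is one of `possible_kinds`.
pub fn find_token(parent: &SyntaxNode, possible_kinds: &[SyntaxKind]) -> Option<SyntaxToken> {
    parent
        .children_with_tokens()
        .filter_map(|it| it.into_token())
        .find(|it| possible_kinds.contains(&it.kind))
}
```

For `a && b`, the node's children are the left operand, the `&&` token, and the right operand, so `find_token(&node, &[SyntaxKind::Eq2, SyntaxKind::Amp2])` returns the `&&` token while a lookup for `+` returns `None`.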
Refines the AST Facade for literals to match the grammar proposed in #1719. This change is part of #1725.

* Introduce new `JsAnyLiteral`
* Split `Literal` into `JsStringLiteral`, `JsBooleanLiteral`, `JsNullLiteral`, `JsRegexLiteral`, and `JsBigIntLiteral`. This allows implementing custom methods on the corresponding literal nodes.
* Renames the `number` and … kinds to `JS_NUMBER_LITERAL_TOKEN` and `JS_STRING_LITERAL_TOKEN` to avoid conflicts with the TS `number` keyword (and keep symmetry).
* Removed some unused keywords and special handling inside of the code gen.
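A minimal sketch of why splitting `Literal` helps: each literal node can carry methods that only make sense for it. The node structs below only store the literal token's source text, and the helper names (`inner_string_text`, `value`, `digits`) are hypothetical, not the facade's actual API.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct JsStringLiteral { pub token_text: String }
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct JsNumberLiteral { pub token_text: String }
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct JsBooleanLiteral { pub token_text: String }
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct JsNullLiteral;
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct JsRegexLiteral { pub token_text: String }
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct JsBigIntLiteral { pub token_text: String }

/// The union over all literal nodes.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum JsAnyLiteral {
    JsStringLiteral(JsStringLiteral),
    JsNumberLiteral(JsNumberLiteral),
    JsBooleanLiteral(JsBooleanLiteral),
    JsNullLiteral(JsNullLiteral),
    JsRegexLiteral(JsRegexLiteral),
    JsBigIntLiteral(JsBigIntLiteral),
}

impl JsStringLiteral {
    /// The string's text without its surrounding quotes (escapes are left untouched).
    pub fn inner_string_text(&self) -> &str {
        let text = self.token_text.as_str();
        let quoted = text.len() >= 2
            && ((text.starts_with('"') && text.ends_with('"'))
                || (text.starts_with('\'') && text.ends_with('\'')));
        // Quote characters are single bytes, so slicing off one byte per side is safe.
        if quoted { &text[1..text.len() - 1] } else { text }
    }
}

impl JsBooleanLiteral {
    pub fn value(&self) -> bool {
        self.token_text == "true"
    }
}

impl JsBigIntLiteral {
    /// The digits without the trailing `n` suffix.
    pub fn digits(&self) -> &str {
        self.token_text.strip_suffix('n').unwrap_or(self.token_text.as_str())
    }
}
```

With a single `Literal` node, each of these helpers would need a runtime check for the literal's kind first.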
Changes the JS grammar of the variable declaration to match our new AST facade as defined in #1725 (and proposed in #1719).

* Split `VarDecl` into `JsVariableDeclarationStatement` and `JsVariableDeclaration` to correctly account for variable declarations inside for loops (which aren't statements).
* Change the parser to emit a `let` token instead of an ident if `let` is used inside a variable declaration (`let` is a valid identifier, which is why the lexer returns an `ident` token).
* Rename `Declarator` to `JsVariableDeclarator`.
* Split out the `init` into a `JsEqualValueClause` that holds the `=` token and the initializer expression.
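The `let` rule can be approximated with one token of lookahead: `let` followed by an identifier, `[`, or `{` starts a declaration; otherwise it stays an identifier. The sketch below applies that rule as a pass over a hypothetical token list (`Kind`, `Token`, and `remap_let` are made up for illustration). It is a simplification that ignores statement position and other contextual rules, and it is not how rslint_parser implements the check.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Ident,
    LetKw,
    LBrack,
    LCurly,
    Eq,
    Number,
    Semicolon,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Token {
    pub kind: Kind,
    pub text: &'static str,
}

/// Rewrites `let` identifiers that start a variable declaration into `LetKw` tokens.
pub fn remap_let(tokens: &mut [Token]) {
    for i in 0..tokens.len() {
        let is_let_ident = tokens[i].kind == Kind::Ident && tokens[i].text == "let";
        // `let x`, `let [a] = ...`, and `let {a} = ...` all begin a declaration.
        let starts_declaration = matches!(
            tokens.get(i + 1).map(|t| t.kind),
            Some(Kind::Ident | Kind::LBrack | Kind::LCurly)
        );
        if is_let_ident && starts_declaration {
            tokens[i].kind = Kind::LetKw;
        }
    }
}
```

In `let x = 1;` the first token becomes `LetKw`, whereas in `let = 1;` (valid sloppy-mode JavaScript) `let` stays an ordinary identifier.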
This restructures the AST of object expressions and related nodes (members, etc.) and is part of #1725.

* Renames `ObjectExpr` to `JsObjectExpression`
* Renames `props` to `members`
* Renames the `key` of all members to `name`
* Renames `IdentProp` to `JsShorthandPropertyObjectMember`, which directly stores an `ident` token
* Renames `LiteralProp` to `JsPropertyMember` and changes the `key` from `Name` to `JsStaticObjectMemberName`
* Introduces `JsGetterObjectMember` (unclear if it should be shared with classes in the future)
* Introduces `JsSetterObjectMember` (unclear if it should be shared with classes in the future)
* Introduces `JsMethodObjectMember` (unclear if it should be shared with classes in the future)
* Renames `SpreadProp` to `JsSpread`

The PR fixes some issues with `setter/getter` parsing.

The main concern around sharing `Js*Member`s with classes is that objects are more restrictive when it comes to naming, because classes support private properties, which object members don't. I think the better approach is to introduce a `JsAnyGetter = JsGetterObjectMember | JsGetterClassMember` union that implements helpers to retrieve the shared properties.
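A minimal sketch of the proposed `JsAnyGetter` union with a shared helper. Both member structs are reduced to the fields the helper needs, and `name()` is a hypothetical helper; private names only exist on class members, which is exactly the difference the union hides from callers.

```rust
/// Object getters: `{ get size() { ... } }` (hypothetical, simplified).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct JsGetterObjectMember {
    pub name: String,
}

/// Class getters can additionally be private: `class A { get #secret() { ... } }`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct JsGetterClassMember {
    pub name: String,
    pub is_private: bool,
}

/// `JsAnyGetter = JsGetterObjectMember | JsGetterClassMember`
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum JsAnyGetter {
    JsGetterObjectMember(JsGetterObjectMember),
    JsGetterClassMember(JsGetterClassMember),
}

impl JsAnyGetter {
    /// The getter's name as written, including the `#` for private class members.
    pub fn name(&self) -> String {
        match self {
            JsAnyGetter::JsGetterObjectMember(member) => member.name.clone(),
            JsAnyGetter::JsGetterClassMember(member) if member.is_private => {
                format!("#{}", member.name)
            }
            JsAnyGetter::JsGetterClassMember(member) => member.name.clone(),
        }
    }
}

impl From<JsGetterObjectMember> for JsAnyGetter {
    fn from(member: JsGetterObjectMember) -> Self {
        JsAnyGetter::JsGetterObjectMember(member)
    }
}

impl From<JsGetterClassMember> for JsAnyGetter {
    fn from(member: JsGetterClassMember) -> Self {
        JsAnyGetter::JsGetterClassMember(member)
    }
}
```

This keeps the object and class member nodes separate, so each can keep its own naming restrictions, while lint rules that only care about "some getter" work against `JsAnyGetter`.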
This PR refactors the AST structures for class declarations and class expressions as defined in #1725.

* Class declaration & expression
  * Rename from `ClassDecl`/`ClassExpr` to `JsClassDeclaration`/`JsClassExpression`
  * Renamed `name` to `id`
  * Introduced a `JsExtendsClause` that wraps the `extends` keyword and the parent class
  * Renamed `body` to `members` and removed the `ClassBody` node
* Constructor
  * Rename from `Constructor` to `JsConstructorClassMember`
  * Rename `name` to `id`
  * Parse `"constructor"` and `'constructor'` methods as `JsConstructorClassMember`s
* Method
  * Renamed `Method` to `JsMethodClassMember`
* Properties
  * Renamed `ClassProp` to `JsPropertyClassMember`
* Getter
  * Renamed `Getter` to `JsGetterClassMember`
  * Removed the parameter list (getters can't have parameters)
* Setter
  * Renamed `Setter` to `JsSetterClassMember`
  * Replaced the parameter list with a single `parameter`
* Private properties
  * Deleted the `PrivateProp` and instead introduced a `JsPrivatePropertyClassMemberName` type, since all members can be private (except constructors)
* Replaced `JsEmptyStatement` with `JsEmptyMember` (these aren't proper statements)
* TS Accessibility: Rename to access modifier to match TypeScript's class documentation
* Rename `JsComputedObjectMemberName` and `JsStaticObjectMemberName` to `JsComputedMemberName` and `JsLiteralMember` because they can be shared between classes and objects
* Rename `TsReturnType` to `TsTypeAnnotation` so that it can also be used for properties, parameters, etc.

This PR also adds a set of new tests for classes to verify the member parsing. I did so because I had to refactor much of the parsing logic to make sense of it (and to reduce some repetitive code and fix some parsing issues related to semicolons). I further extracted the class parsing from `decl.rs` into its own `class.rs`. You can see my refactorings if you look at the individual commits.
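A minimal sketch of the naming rule behind parsing `"constructor"` and `'constructor'` methods as `JsConstructorClassMember`s. `is_constructor_name` is a hypothetical helper that takes the member name's source text; it follows the ECMAScript rule that only non-static, non-computed members named `constructor` define the class constructor, but it does not decode escape sequences inside string names.

```rust
/// Returns `true` if a class method with this (non-computed) name is the class constructor.
///
/// Only the identifier `constructor` and the string literals `"constructor"` and
/// `'constructor'` qualify, and only for non-static members: `static constructor() {}`
/// is an ordinary static method. Computed names such as `["constructor"]` never reach
/// this check because they are not literal member names.
pub fn is_constructor_name(name_text: &str, is_static: bool) -> bool {
    if is_static {
        return false;
    }
    matches!(name_text, "constructor" | "\"constructor\"" | "'constructor'")
}
```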
Refactors the AST structure for member expressions as part of #1725.

* Renames `BracketExpr` to `JsComputedMemberExpression`
* Renames `DotExpr` to `JsStaticMemberExpression`
* Introduces the new `JsReferenceIdentifierMember` and `JsReferencePrivateMember`, which are references to member names (analogous to `JsReferenceIdentifierExpression`, which references an identifier, and `JsBindingIdentifier`, which defines an identifier)
* Merges `PrivatePropAccess` into `JsStaticMemberExpression` (enabled by introducing `JsReferenceMember`)
* Introduces `SuperExpression` so that `super.test()` works
* Adds a new check that verifies that calling `super()` only works inside of a constructor (I leave checking whether we're inside of a subclass to another PR)
* Deletes `SuperCall`, as this can now be modelled using `CallExpr`
* Deletes some no longer used nodes/kinds
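One way to sketch the `super()` check: walk up from the call and look for the nearest enclosing function-like node. Arrow functions are skipped because they don't bind their own `super`; any other function, method, or class field initializer ends the search. The flat `Tree` and the `Kind` names are hypothetical stand-ins, and, like the PR, the sketch doesn't check whether the class has an `extends` clause.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    JsModule,
    JsClassDeclaration,
    JsConstructorClassMember,
    JsMethodClassMember,
    JsPropertyClassMember,
    JsFunctionDeclaration,
    JsArrowFunctionExpression,
    JsCallExpression,
}

/// A flat tree: each node stores its kind and the index of its parent.
pub struct Tree {
    pub nodes: Vec<(Kind, Option<usize>)>,
}

impl Tree {
    /// Kinds of all ancestors of `start`, innermost first.
    fn ancestors(&self, start: usize) -> impl Iterator<Item = Kind> + '_ {
        std::iter::successors(self.nodes[start].1, move |&idx| self.nodes[idx].1)
            .map(move |idx| self.nodes[idx].0)
    }
}

/// Returns whether a `super(...)` call at node `call` is inside a class constructor.
pub fn is_super_call_allowed(tree: &Tree, call: usize) -> bool {
    for kind in tree.ancestors(call) {
        match kind {
            Kind::JsConstructorClassMember => return true,
            // Arrow functions inherit `super` from their enclosing function.
            Kind::JsArrowFunctionExpression => continue,
            // Any other function-like boundary (or class field initializer) ends the search.
            Kind::JsMethodClassMember
            | Kind::JsPropertyClassMember
            | Kind::JsFunctionDeclaration => return false,
            _ => {}
        }
    }
    false
}
```

`super()` inside an arrow function inside a constructor is accepted, while the same call inside a regular method or at the top level is rejected.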
Implementing the shape validation described in #1867 requires that it's possible to tell the required shape of a node by just looking at its kind. This hasn't been possible so far for list nodes, which all use the same `LIST` kind. The kind exposes neither the expected type of the child nodes nor whether it is a separated or a node list.

This PR replaces the generic `LIST` kind with specific list types; for example, it introduces `JsArrayElementList` for the list storing the elements of an array expression.

A consequence of having explicit list nodes is that these should now also implement the `AstNode` interface; they are nodes, after all. The consequences of doing so are:

* `AstSeparatedList` and `AstNodeList` must be traits that each specific list implements together with the `AstNode` interface.
* It's no longer possible to use a `missing` placeholder to represent an empty list, because a concrete node is required to implement `AstNode.syntax()`.

Overall, the change seems to align nicely with how the rest of the AST facade works and makes list nodes less special: they're just regular nodes that additionally implement methods for iterating over their elements.

The main downside of this change is that it's now required to import the `AstSeparatedList` and/or `AstNodeList` traits to use any of the list methods. I guess that's just how Rust works and shouldn't be too much of a surprise.

Part of #1725 #1867
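A minimal sketch of the trait layout described above: the concrete `JsArrayElementList` is a regular `AstNode` that also implements an `AstSeparatedList` trait whose default methods do the iteration. All types are simplified, hypothetical stand-ins (for instance, the element type is fixed to a number literal expression), and an `AstNodeList` trait would follow the same pattern without separators.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyntaxKind {
    JsArrayElementList,
    JsNumberLiteralExpression,
    Comma,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SyntaxNode {
    pub kind: SyntaxKind,
    pub text: String,
    pub children: Vec<SyntaxElement>,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum SyntaxElement {
    Node(SyntaxNode),
    Token(SyntaxKind),
}

pub trait AstNode: Sized {
    fn cast(syntax: SyntaxNode) -> Option<Self>;
    fn syntax(&self) -> &SyntaxNode;
}

/// Shared behaviour of every separated list; a list only provides `syntax_list`.
pub trait AstSeparatedList {
    type Node: AstNode;

    fn syntax_list(&self) -> &SyntaxNode;

    /// The list's nodes, skipping the separator tokens.
    fn elements(&self) -> Vec<Self::Node> {
        self.syntax_list()
            .children
            .iter()
            .filter_map(|child| match child {
                SyntaxElement::Node(node) => <Self::Node as AstNode>::cast(node.clone()),
                SyntaxElement::Token(_) => None,
            })
            .collect()
    }

    /// The separator tokens, e.g. the commas between array elements.
    fn separators(&self) -> Vec<SyntaxKind> {
        self.syntax_list()
            .children
            .iter()
            .filter_map(|child| match child {
                SyntaxElement::Token(kind) => Some(*kind),
                SyntaxElement::Node(_) => None,
            })
            .collect()
    }
}

pub struct JsNumberLiteralExpression {
    syntax: SyntaxNode,
}

impl AstNode for JsNumberLiteralExpression {
    fn cast(syntax: SyntaxNode) -> Option<Self> {
        if syntax.kind == SyntaxKind::JsNumberLiteralExpression { Some(Self { syntax }) } else { None }
    }
    fn syntax(&self) -> &SyntaxNode {
        &self.syntax
    }
}

/// The list is a regular AST node that additionally implements the list trait.
pub struct JsArrayElementList {
    syntax: SyntaxNode,
}

impl AstNode for JsArrayElementList {
    fn cast(syntax: SyntaxNode) -> Option<Self> {
        if syntax.kind == SyntaxKind::JsArrayElementList { Some(Self { syntax }) } else { None }
    }
    fn syntax(&self) -> &SyntaxNode {
        &self.syntax
    }
}

impl AstSeparatedList for JsArrayElementList {
    type Node = JsNumberLiteralExpression;
    fn syntax_list(&self) -> &SyntaxNode {
        &self.syntax
    }
}
```

An empty list is still a concrete `JsArrayElementList` node with no children, which is why a `missing` placeholder no longer fits.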
Closing this because most AST changes have been implemented. There are some more proposed changes that were discussed in #1868 and are probably worth doing at some point, but implementing them isn't of the utmost importance at the moment.
The next steps are discussed in #1848, and I propose to first write a few analyzers to gain some real-world experience with potential issues before making any decisions.
Description
Part of #1718
AST Facade for lists
Introduce `AstList<N>` and `AstSeparatedList<N>` and change the codegen to produce an `AstList` for `T*` and an `AstSeparatedList` for `(T (',' T)* ','?)`, where `','` could be any token. See this discussion for more details.

* `AstList`: View over a homogeneous list of child nodes. For example, a list of the statements in a block statement. (Wrap many-child inside of a list node #1728)
* `AstSeparatedList`: View over a list that contains nodes separated by a token. For example, the array element nodes that are separated by comma tokens. See this prototype for inspiration. (AST Separated List #1746)

Replace `Error` with `Unknown*` nodes

A possible way of defining the `Unknown*` nodes in Ungrammar could look as follows: …

* … `SyntaxNode`, `SyntaxToken`, and `SyntaxElement` nodes and instead, resolve them to the types from the parser crate. #1751
* … `Stmt`, `Expr`, `ClassElement`)
* … `Error` handler #1759
* … `Error` kind

Change return type of mandatory nodes
The ungrammar encodes the optionality of the children, and the codegen should take that into consideration when generating the field accessors.
Mandatory children should now return `Result`s instead of `Option`s because it's an error if they're missing. Following a draft of the API: …

* `SyntaxResult` #1731

Codegen extensions
* Add support for labels so that we can influence how the child accessors are named. For example, we want to name the left and right expressions of `BinaryExpression` `left` and `right`, and not `expression` and `expression`. Ungrammar supports the `label: Type` syntax. All we need to do is respect the naming when generating the fields.
* Implode alternatives of tokens #1741
Support inline token-union types. For example, the binary expression operator can be any of `==, +, -, ...`. That's why it's defined as `operator: ( '==' | '+' | '-' ...)`. The source gen should only generate a single field for the operator that returns a `SyntaxToken`.
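A minimal sketch of how labels, `SyntaxResult`, and token unions could combine in the generated accessors of `BinaryExpression`. The flat child list, `SyntaxError::MissingElement`, and the three-operator union are simplified assumptions for illustration, not the generated code.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyntaxKind {
    JsNumberLiteral,
    Plus,
    Minus,
    Eq2,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum SyntaxElement {
    Node(SyntaxKind, String),
    Token(SyntaxKind, String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum SyntaxError {
    MissingElement,
}

pub type SyntaxResult<T> = Result<T, SyntaxError>;

/// Children in source order; a missing child is simply absent.
pub struct BinaryExpression {
    pub children: Vec<SyntaxElement>,
}

/// `operator: ('==' | '+' | '-')`
const OPERATORS: &[SyntaxKind] = &[SyntaxKind::Eq2, SyntaxKind::Plus, SyntaxKind::Minus];

impl BinaryExpression {
    /// `left: Expression`: the label names the accessor.
    pub fn left(&self) -> SyntaxResult<&SyntaxElement> {
        self.nth_node(0)
    }

    /// One accessor for the whole token union instead of one per operator.
    pub fn operator(&self) -> SyntaxResult<&SyntaxElement> {
        self.children
            .iter()
            .find(|child| matches!(child, SyntaxElement::Token(kind, _) if OPERATORS.contains(kind)))
            .ok_or(SyntaxError::MissingElement)
    }

    /// `right: Expression`
    pub fn right(&self) -> SyntaxResult<&SyntaxElement> {
        self.nth_node(1)
    }

    /// Mandatory child: missing means `Err`, not `None`.
    fn nth_node(&self, n: usize) -> SyntaxResult<&SyntaxElement> {
        self.children
            .iter()
            .filter(|child| matches!(child, SyntaxElement::Node(..)))
            .nth(n)
            .ok_or(SyntaxError::MissingElement)
    }
}
```

The positional lookup is deliberately naive: if `left` were missing but `right` present, `left()` would return the right operand. A real implementation needs to know which slot each child occupies.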
* Flatten unions of unions during the code generation #1743

It's convenient to define unions in terms of other unions. For example, `JsClassMemberName = JsObjectMemberName | JsPrivateClassMemberName`, where `JsObjectMemberName` is a union type as well. This is convenient for maintaining the grammar but makes the Facade more awkward to use because it requires two matches: first on the outer union and then on the inner union. We can avoid this if we flatten unions inside of the source gen and automatically generate `From<InnerUnion>` implementations.

New AST Structure and naming
Requirements for the AST

* It's possible to visit every element using the AST facade (including separator tokens).
* Visiting a node with a missing mandatory child yields a `SyntaxError::MissingElement` (for example, a `try` statement missing both the `catch` and `finally` clauses).
* Children whose optionality depends on the presence/absence of other siblings are grouped in a node to guarantee that visiting any of them yields a `SyntaxError::MissingElement` if any of the other siblings are missing.

Prefix nodes with 'Js'

* … `Js` from codegen and enforce usage of labels in codegen #1733

Align the AST structure with Rome-classic's AST. This is not a complete list of the work to be done. The end goal is to implement the grammar defined in RFC: JS Grammar #1719
* … `Stmt` and `Expr` to `AnyJsStatement` and `AnyJsExpression` (refactor (rslint_parser): Rename Script & Module, Stmt, and Expr #1767)
* … `JsScript` and `JsModule` (refactor (rslint_parser): Rename Script & Module, Stmt, and Expr #1767)
* … `Block`, `Debugger`, `Empty`, `Expression`, `Return`, `Labeled`, `With` (refactor(rslint_parser): Restructure simple statements #1769)
* … `Continue`, `Break`, `DoWhile`, `While` (refactor(rslint_parser): Update while loops to match new AST Facade #1774)
* … `For`, `ForIn`, `ForOf` (… Rename `ForStmt`, `CallExpr` and `NewExpr` #1886)
* … `If`, `Switch`, `Try`, `Throw` (refactor(rslint_parser): IfStatement Ast Node #1771, refactor(rslint_parser): Try Statement Facade #1770)
* … `FunctionDeclaration`, `FunctionExpression`, `ArrowFunctionExpression` (without patterns) (Function Expression/Declaration AST Facade #1786)
* … (… `ForStmt`, `CallExpr` and `NewExpr` #1886)
* … (… `ForStmt`, `CallExpr` and `NewExpr` #1886)
* `OptionalCallExpression`...`LiteralExpression` (Rename `*Literal` nodes to `*LiteralExpression` #1800)
* remove TS parsing from JS parsing (📎 Gracefully remove TS parsing from JS parsing #1835)

Documentation of the AST

* … `FunctionHead`, etc.) @ematipico