feat: ungrammar codegen #1715

ematipico · 2021-10-22T18:20:57Z

Summary

This PR implements #1722

This took longer than expected to implement. The unexpected part wasn't the actual implementation of ungrammar or the changes in the codegen, but the fact that the rslint_parser logic has some tweaks that I uncovered along the way, and it took a while to understand them and patch them up.

The issue is the following:

It's convenient to define unions in terms of other unions. For example, JsClassMemberName = JsObjectMemberName | JsPrivateClassMemberName, where JsObjectMemberName is a union type as well. This keeps the grammar easy to maintain but makes the Facade more awkward to use, because it requires two matches: first on the outer union and then on the inner union. We can avoid this if we flatten unions inside the source gen and automatically generate From implementations.
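
A minimal sketch of what a flattened union with generated From implementations could look like. The node structs and the two variants of JsObjectMemberName (JsLiteralMemberName, JsComputedMemberName) are hypothetical placeholders, not the real generated types, and describe is an illustrative helper:

```rust
// Hypothetical node types; the real ones would wrap a syntax node.
#[derive(Debug, Clone, PartialEq)]
pub struct JsLiteralMemberName(pub String);
#[derive(Debug, Clone, PartialEq)]
pub struct JsComputedMemberName(pub String);
#[derive(Debug, Clone, PartialEq)]
pub struct JsPrivateClassMemberName(pub String);

// Inner union, as it is written in the grammar.
#[derive(Debug, Clone, PartialEq)]
pub enum JsObjectMemberName {
    JsLiteralMemberName(JsLiteralMemberName),
    JsComputedMemberName(JsComputedMemberName),
}

// Outer union after flattening: the inner union's variants are inlined.
#[derive(Debug, Clone, PartialEq)]
pub enum JsClassMemberName {
    JsLiteralMemberName(JsLiteralMemberName),
    JsComputedMemberName(JsComputedMemberName),
    JsPrivateClassMemberName(JsPrivateClassMemberName),
}

// Conversion the codegen could emit for the inlined union.
impl From<JsObjectMemberName> for JsClassMemberName {
    fn from(name: JsObjectMemberName) -> Self {
        match name {
            JsObjectMemberName::JsLiteralMemberName(n) => Self::JsLiteralMemberName(n),
            JsObjectMemberName::JsComputedMemberName(n) => Self::JsComputedMemberName(n),
        }
    }
}

impl From<JsPrivateClassMemberName> for JsClassMemberName {
    fn from(name: JsPrivateClassMemberName) -> Self {
        Self::JsPrivateClassMemberName(name)
    }
}

// A consumer needs a single match instead of matching on two nested unions.
pub fn describe(name: &JsClassMemberName) -> &'static str {
    match name {
        JsClassMemberName::JsLiteralMemberName(_) => "literal",
        JsClassMemberName::JsComputedMemberName(_) => "computed",
        JsClassMemberName::JsPrivateClassMemberName(_) => "private",
    }
}
```

With this shape, a value of the inner union converts with .into() and callers match once on the outer union.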

In order to fix the issue and maintain compatibility with the current logic of the parser/green nodes, I had to add more types to the current enums. I commented the parts that I added in order to fix the issue.

This is not the final fix.

In order to correctly fix the issue, we have to remove the "middle" enums. An example of the fix would be:

From:

Smt = 
	ArrayExpr
	| ObjectExpr
	| Decl

Decl = 
	ClassDecl
	| FnDecl

To:

Smt = 
	ArrayExpr
	| ObjectExpr
-	| Decl
+	| ClassDecl
+	| FnDecl

- Decl = 
-	ClassDecl
-	| FnDecl
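
The transformation above could be sketched as a small pass over the unions before code is generated. This is only an illustration under assumptions: Unions is a hypothetical map from union name to its alternatives, not the data structure the codegen actually uses.

```rust
use std::collections::BTreeMap;

/// Hypothetical grammar model: union name -> names of its alternatives.
pub type Unions = BTreeMap<String, Vec<String>>;

/// Returns the alternatives of `name` with every nested union inlined.
pub fn flatten(unions: &Unions, name: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut visiting = vec![name.to_string()];
    collect(unions, name, &mut visiting, &mut out);
    out
}

fn collect(unions: &Unions, name: &str, visiting: &mut Vec<String>, out: &mut Vec<String>) {
    let alternatives = match unions.get(name) {
        Some(alternatives) => alternatives,
        None => return,
    };
    for alternative in alternatives {
        if unions.contains_key(alternative) {
            // Nested union: inline its alternatives, skipping cycles.
            if !visiting.contains(alternative) {
                visiting.push(alternative.clone());
                collect(unions, alternative, visiting, out);
                visiting.pop();
            }
        } else if !out.contains(alternative) {
            // Plain node: keep it once, in first-seen order.
            out.push(alternative.clone());
        }
    }
}
```

Applied to the example, flattening Smt inlines ClassDecl and FnDecl in place of Decl, which matches the "To" grammar.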

While changing the codegen, I took the liberty of breaking the code down a bit and moving some logic into different files.

Note around JSON test

I had to remove the -42 from the JSON test because it was marked as a UNARY_EXPR, which is a node that we didn't implement. I'm not sure whether this has changed since. As we are using a temporary solution for our JSON parser, I figured that we could skip it for now and take a look at it later.

Test Plan

cargo lint
cargo test
cargo format

cloudflare-workers-and-pages · 2021-10-22T18:34:59Z

Deploying with Cloudflare Pages

Latest commit:	`caf97d8`
Status:	✅ Deploy successful!
Preview URL:	https://83a05fe0.tools-8rn.pages.dev

xtask/js.ungram

jamiebuilds · 2021-10-27T00:46:50Z

I haven't fully read through the PR yet so idk how much it's already doing this, but question:

Can we enforce rules on how the Ungrammar should be structured, such as requiring field_names: in places we need it, during the codegen step?

ematipico · 2021-10-27T19:31:59Z

> Can we enforce rules on how the Ungrammar should be structured, such as requiring field_names: in places we need it, during the codegen step?

If you mean that each node should have a prefix like statements:Stmt (which will generate statements() -> Option<Stmt>), then yes, we can enforce it during the code generation.
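
A rough sketch of how such a check could look during code generation. The Rule enum here is a simplified, hypothetical model written for this example, not the ungrammar crate's own types, and unlabeled_nodes is an illustrative helper:

```rust
/// Simplified, hypothetical model of a grammar rule.
pub enum Rule {
    /// Reference to another node without a label, e.g. `Stmt`.
    Node(String),
    /// A token, e.g. `'if'`.
    Token(String),
    /// A labelled rule, e.g. `statements:Stmt`.
    Labeled { label: String, rule: Box<Rule> },
    Seq(Vec<Rule>),
    Rep(Box<Rule>),
    Opt(Box<Rule>),
}

/// Collects the names of node references that are not covered by a label.
pub fn unlabeled_nodes(rule: &Rule) -> Vec<String> {
    let mut missing = Vec::new();
    visit(rule, &mut missing);
    missing
}

fn visit(rule: &Rule, missing: &mut Vec<String>) {
    match rule {
        // Everything under a label already has a field name.
        Rule::Labeled { .. } => {}
        Rule::Node(name) => missing.push(name.clone()),
        Rule::Token(_) => {}
        Rule::Seq(rules) => {
            for inner in rules {
                visit(inner, missing);
            }
        }
        Rule::Rep(inner) | Rule::Opt(inner) => visit(inner, missing),
    }
}
```

The codegen could abort with an error listing the returned names whenever the list is not empty.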

MichaReiser

Boah! That's amazing work and my head is spinning after reviewing it.

There's certainly a lot going on here. Overall looks very good. I noticed some grammar rules that do not reflect the JS AST correctly. I would say, ignore all these comments if they exactly match how RSLint defines the tree today. We can re-visit those when we change the grammar

crates/rslint_parser/src/ast/generated/tokens.rs

crates/formatter/src/format_json.rs

crates/formatter/src/ts/class/class_declarator.rs

crates/formatter/src/ts/class/constructor.rs

MichaReiser · 2021-10-29T13:09:50Z

crates/formatter/src/ts/expr_or_spread.rs

+			ExprOrSpread::ObjectExpr(object_expression) => {
+				object_expression.to_format_element(formatter)
+			}
+			ExprOrSpread::ArrayExpr(array_expression) => {
+				array_expression.to_format_element(formatter)
+			}


What's the reason that we now need to handle Literal, ObjectExpr, and ArrayExpr here (all of them are Expr)?

ematipico · 2021-10-29

Same issue I explained before. The enums are not flattened, so I needed a workaround to make it work: I added additional nodes to the enum.

MichaReiser · 2021-10-29

I guess it depends on what behaviour we want. The interesting part is that RSLint doesn't flatten its enum, because Expr is an enum on its own: ExprOrSpread = Expr | SpreadElement and Expr = Literal | ObjectProp | ...

You would need to handle all the Expr branches here if we flatten the enum (but you could remove Expr).

What we have now is some weird mix... and it's not clear to me why. Would you be able to look a bit deeper into what the exact problem is if you remove Literal and ObjectExpr (and Array) again?

ematipico · 2021-10-29

I will try to explain what's going on in the RSLint parser and how its manual implementation works.

Let's start with a code example:

let v = (value , second_value) => true

This snippet generates the following green tree:

[email protected]
[email protected]
[email protected] "let"
[email protected] " "
[email protected]
[email protected]
[email protected]
[email protected] "v"
[email protected] " "
[email protected] "="
[email protected] " "
[email protected]
[email protected]
[email protected] "("
[email protected]
[email protected]
[email protected] "value"
[email protected] " "
[email protected] ","
[email protected] " "
[email protected]
[email protected]
[email protected] "second_value"
[email protected] ")"
[email protected] " "
[email protected] "=>"
[email protected] " "
[email protected]
[email protected] "true"

Now, the root node in our example is represented by our struct Script. Script is designed like this:

Script = stmts:Stmt*

This means that it can contain zero or more Stmt nodes. Now let's have a look at Stmt.

Stmt = BlockStmt | EmptyStmt | ExprStmt | IfStmt | DoWhileStmt | WhileStmt | ForStmt | ForInStmt | ForOfStmt | ContinueStmt | BreakStmt | ReturnStmt | WithStmt | LabelledStmt | SwitchStmt | ThrowStmt | TryStmt | DebuggerStmt | Decl

This is the original implementation that I created by following rslint_parser codegen.

Now, from SCRIPT we need to traverse to VAR_DECL. How? We take the .kind() of Script's child and we get VAR_DECL. As we declared that Script can contain children of type Stmt, we need to make sure that VAR_DECL is actually a Stmt.

VAR_DECL is not inside Stmt, so it should fail. But it doesn't because the manual implementation has an escape hatch, which is the following:

impl Stmt {
	fn cast(syntax: SyntaxNode) -> Option<Self> {
		let stmt = match syntax.kind() {
			// all the possibilities
			LABELLED_STMT => Stmt::LabelledStmt(LabelledStmt { syntax }),
			// this is the escape hatch
			_ => Stmt::Decl(Decl::cast(syntax)?),
		};
		Some(stmt)
	}
}

Please check the code here: https://github.com/rome/tools/blob/main/crates/rslint_parser/src/ast/stmt_ext.rs#L198

Doing so allows the API to apply the cast() function to Decl, which also contains VarDecl, whose kind is VAR_DECL. That way, we are able to correctly pull out the VarDecl child node.
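
To make the fall-through concrete, here is a self-contained sketch of the same pattern. The SyntaxKind, SyntaxNode and node types below are stripped-down stand-ins invented for this example; the real rslint_parser types carry much more.

```rust
// Stand-in syntax kinds; the real enum has many more entries.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SyntaxKind {
    LabelledStmt,
    VarDecl,
    ClassDecl,
    Literal,
}

// Stand-in for a syntax node: only its kind matters here.
#[derive(Debug, Clone, PartialEq)]
pub struct SyntaxNode {
    pub kind: SyntaxKind,
}

impl SyntaxNode {
    pub fn new(kind: SyntaxKind) -> Self {
        SyntaxNode { kind }
    }
    pub fn kind(&self) -> SyntaxKind {
        self.kind
    }
}

#[derive(Debug)]
pub struct LabelledStmt { pub syntax: SyntaxNode }
#[derive(Debug)]
pub struct VarDecl { pub syntax: SyntaxNode }
#[derive(Debug)]
pub struct ClassDecl { pub syntax: SyntaxNode }

#[derive(Debug)]
pub enum Decl {
    VarDecl(VarDecl),
    ClassDecl(ClassDecl),
}

impl Decl {
    pub fn cast(syntax: SyntaxNode) -> Option<Self> {
        match syntax.kind() {
            SyntaxKind::VarDecl => Some(Decl::VarDecl(VarDecl { syntax })),
            SyntaxKind::ClassDecl => Some(Decl::ClassDecl(ClassDecl { syntax })),
            _ => None,
        }
    }
}

#[derive(Debug)]
pub enum Stmt {
    LabelledStmt(LabelledStmt),
    Decl(Decl),
}

impl Stmt {
    pub fn cast(syntax: SyntaxNode) -> Option<Self> {
        match syntax.kind() {
            SyntaxKind::LabelledStmt => Some(Stmt::LabelledStmt(LabelledStmt { syntax })),
            // Escape hatch: any other kind is handed to the nested union.
            _ => Decl::cast(syntax).map(Stmt::Decl),
        }
    }
}
```

In this sketch a node of kind VarDecl is accepted by Stmt::cast even though Stmt never lists that kind itself, while a kind that neither union knows (Literal here) is still rejected.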

The same thing happens with ExprOrSpread, where there's an escape hatch: https://github.com/rome/tools/blob/main/crates/rslint_parser/src/ast/expr_ext.rs#L423-L429

I am not a fan of this approach because it makes our API more opaque and difficult to maintain.

MichaReiser · 2021-10-30

Thanks for that detailed explanation. I wasn't aware that the can_cast and cast implementations treat union types differently.

We should probably take some time (in a separate issue/PR) to figure out how union types should work. The suggestion to flatten union types would probably solve that issue as well, but we need to see how ergonomic it is to work with them (we need to make sure that casting between the two is easy). I'm no longer 100% convinced that it is a good idea.

Replicating RSLint's behaviour should be straightforward in the codegen (in a new PR?). It mainly means that it must generate different code in can_cast and cast depending on whether the type is a "simple type" or a union type:

simple_type: Match on kind

union_type: Call into child union.

The main problem that I see is that this only works if a union type only ever contains at most one other union type.
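
A possible shape for the can_cast side of that codegen, as a sketch only. Alternative and can_cast_body are hypothetical names for this example, and the output is a plain string rather than whatever token representation the real codegen emits:

```rust
/// One alternative of a union in a simplified, hypothetical grammar model.
pub enum Alternative {
    /// A simple type, identified by its syntax kind (e.g. `IF_STMT`).
    Simple { kind: String },
    /// A nested union (e.g. `Decl`), which knows its own kinds.
    Union { name: String },
}

/// Builds the body of a generated `can_cast(kind: SyntaxKind) -> bool`.
pub fn can_cast_body(alternatives: &[Alternative]) -> String {
    let mut kinds = Vec::new();
    let mut checks = Vec::new();
    for alternative in alternatives {
        match alternative {
            // simple_type: match on kind
            Alternative::Simple { kind } => kinds.push(kind.as_str()),
            // union_type: call into the child union
            Alternative::Union { name } => checks.push(format!("{}::can_cast(kind)", name)),
        }
    }
    if !kinds.is_empty() {
        // Put the cheap kind match first, then the delegated checks.
        checks.insert(0, format!("matches!(kind, {})", kinds.join(" | ")));
    }
    if checks.is_empty() {
        "false".to_string()
    } else {
        checks.join(" || ")
    }
}
```

Chaining the delegated checks with || is one way to cope with more than one nested union, assuming their kinds do not overlap; whether the same works cleanly for cast would need to be checked separately.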

xtask/src/codegen/generate_nodes.rs

xtask/src/codegen/kinds_src.rs

xtask/src/codegen/syntax.rs

xtask/src/lib.rs

MichaReiser

Thanks for addressing the comments and the nice explanation of the "Union" issue. It would probably be good to follow up with a solution to the union problem asap as this is a regression compared to our RSLint facade.

I didn't do a full review of the JS syntax but we can go through it when we re-visit the AST structure in #1725

crates/formatter/src/ts/mod.rs

xtask/src/main.rs

ematipico force-pushed the feature/codegen-with-ungrammar branch from 82db3de to 978a1c1 on October 22, 2021 18:43

MichaReiser force-pushed the feature/codegen-with-ungrammar branch 2 times, most recently from 809a25d to 978a1c1 on October 25, 2021 16:38

jamiebuilds reviewed Oct 27, 2021

xtask/js.ungram (outdated, resolved)

MichaReiser linked an issue Oct 29, 2021 that may be closed by this pull request

☂️ Use ungrammar to automate the generation of code #1722

Closed

ematipico marked this pull request as ready for review October 29, 2021 13:07

ematipico mentioned this pull request Oct 29, 2021

☂️ AST Façade Improvements #1725

Closed

47 tasks

MichaReiser reviewed Oct 29, 2021

ematipico force-pushed the feature/codegen-with-ungrammar branch from e19dea4 to f71cfc8 on October 29, 2021 17:55

MichaReiser approved these changes Nov 1, 2021

crates/formatter/src/ts/mod.rs (outdated, resolved)

xtask/src/main.rs (outdated, resolved)

MichaReiser mentioned this pull request Nov 1, 2021

☂️ Weekly goals - week 43 #1717

Closed

ematipico added 16 commits November 1, 2021 08:43

chore: wip for ungrammar codegen

a6e5e30

fix: token identifier generation

3196859

feat: support expressions

92f5117

feat: statements

78d1ba0

feat: patterns

358ce8f

feat: typescript types

a0ead85

feat: put generated code in new files

9df187a

chore: rename few methods

3bcd6b7

feat: working codegen

3478dd8

fix: rename methods

8d66498

chore: better exclusion logic

119e952

fix: array elements

1e25ec7

fix: class declarations

5b53a60

fix: json array expression

67849d0

fix: literal prop formatting

f0c3d24

chore: breakdown in more files

1c470ce

ematipico added 4 commits November 1, 2021 08:43

chore: format

12f453d

chore: code review

f1c4342

chore: clippy

f0dc89e

chore: code review

caf97d8

ematipico force-pushed the feature/codegen-with-ungrammar branch from 8d53f9b to caf97d8 on November 1, 2021 11:44

ematipico merged commit ac7c574 into main Nov 1, 2021

ematipico deleted the feature/codegen-with-ungrammar branch November 1, 2021 12:03
