Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

feat: ungrammar codegen #1715

Merged
merged 20 commits into from
Nov 1, 2021
Merged

feat: ungrammar codegen #1715

merged 20 commits into from
Nov 1, 2021

Conversation

ematipico
Copy link
Contributor

@ematipico ematipico commented Oct 22, 2021

Summary

This PR implements #1722

It took more than expected to implement this. The unexpected part wasn't the actual implementation of ungrammar and the changes in the codegen, but the fact that rslint_parser logic has some tweaks that I uncovered along the way and it took a while to understand them and to patch them up.

The issue is the following:

It's convenient to define unions in respect of other unions. For example, JsClassMemberName = JsObjectMemberName | JsPrivateClassMemberName where JsObjectMemberName is an union type as well. This is convenient to maintain the grammar but makes the Facade more awkward to use because it requires two matches: first on the outer union and then on the inner union. We can avoid this if we flatten unions inside of the source gen and automatically generate From implementations.

In order to fix the issue and maintain compatibility with the current logic of the parser/green nodes, I had to add more types to the current enums. I commented the parts that I added in order to fix the issue.

This is not the final fix.

In order to correctly fix the issue, we have to remove the "middle" enums. An example of the fix would be:

From:

Smt = 
	ArrayExpr
	| ObjectExpr
	| Decl

Decl = 
	ClassDecl
	| FnDecl

To:

Smt = 
	ArrayExpr
	| ObjectExpr
-	| Decl
+	| ClassDecl
+	| FnDecl

- Decl = 
-	ClassDecl
-	| FnDecl

While changing the code of the codegen, I took the liberty to breakdown a bit the code and move some logic into different files.

Note around JSON test

I had to remove the -42 from the JSON test because it was marked as a UNARY_EXPR, which is a node that we didn't implement. Not sure this changed now. As we are using a temporary solution for our JSON parser, I figured that we could, for now, skip it and take a lot at it later.

Test Plan

cargo lint
cargo test
cargo format

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Oct 22, 2021

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: caf97d8
Status: ✅  Deploy successful!
Preview URL: https://83a05fe0.tools-8rn.pages.dev

View logs

@ematipico ematipico force-pushed the feature/codegen-with-ungrammar branch from 82db3de to 978a1c1 Compare October 22, 2021 18:43
@MichaReiser MichaReiser force-pushed the feature/codegen-with-ungrammar branch 2 times, most recently from 809a25d to 978a1c1 Compare October 25, 2021 16:38
xtask/js.ungram Outdated Show resolved Hide resolved
@jamiebuilds
Copy link
Contributor

I haven't fully read through the PR yet so idk how much it's already doing this, but question:

Can we enforce rules on how the Ungrammar should be structured, such as requiring field_names: in places we need it, during the codegen step?

@ematipico
Copy link
Contributor Author

ematipico commented Oct 27, 2021

Can we enforce rules on how the Ungrammar should be structured, such as requiring field_names: in places we need it, during the codegen step?

If you mean that each node should have a prefix like statements:Stmt (where this will generate statementes() -> Option<Stmt> then yes, we can enforce it during the code generation.

@MichaReiser MichaReiser linked an issue Oct 29, 2021 that may be closed by this pull request
@ematipico ematipico marked this pull request as ready for review October 29, 2021 13:07
@ematipico ematipico mentioned this pull request Oct 29, 2021
47 tasks
Copy link
Contributor

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boah! That's amazing work and my head is spinning after reviewing it.

There's certainly a lot going on here. Overall looks very good. I noticed some grammar rules that do not reflect the JS AST correctly. I would say, ignore all these comments if they exactly match how RSLint defines the tree today. We can re-visit those when we change the grammar

crates/formatter/src/format_json.rs Outdated Show resolved Hide resolved
crates/formatter/src/ts/class/class_declarator.rs Outdated Show resolved Hide resolved
crates/formatter/src/ts/class/constructor.rs Outdated Show resolved Hide resolved
Comment on lines +10 to +15
ExprOrSpread::ObjectExpr(object_expression) => {
object_expression.to_format_element(formatter)
}
ExprOrSpread::ArrayExpr(array_expression) => {
array_expression.to_format_element(formatter)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason that we now need to handle Literal, ObjectExpr, and ArrayExpr here (all of them are Expr).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue I explained before. The enums are not flatten so I needed a workaround to make it work, so I added additional nodes to the enum.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on what behaviour we want. The interesting part is that RSLint doesn't flatten its enum. Because Expr is an enum on it's own: ExprOrSpread = Expr | SpreadElement and Expr = Literal | ObjectProp | ...

You would need to handle all the Expr branches here if we flatten the enum (but you could remove Expr.

What we have now is some weird mix... and it's not clear to me why. Would you be able to look a bit deeper into what the exact problem is if you remove Literal and ObjectExpr (and Array) again.

Copy link
Contributor Author

@ematipico ematipico Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to explain what's going in RSLint parser and how its manual implementation works.

Let's start a code example:

let v = (value  , second_value) =>    true

This snippet generates the following green tree:

Now, the root node in our example is represented by our struct Script. Script is designed like this:

Script = stmts:Stmt*

This means that it can have from zero to n number of Stmt. Now let's have a look at Stmt.

Stmt =
	BlockStmt
	| EmptyStmt
	| ExprStmt
	| IfStmt
	| DoWhileStmt
	| WhileStmt
	| ForStmt
	| ForInStmt
	| ForOfStmt
	| ContinueStmt
	| BreakStmt
	| ReturnStmt
	| WithStmt
	| LabelledStmt
	| SwitchStmt
	| ThrowStmt
	| TryStmt
	| DebuggerStmt
	| Decl

This is the original implementation that I created by following rslint_parser codegen.

Now, from SCRIPT we need to traverse to VAR_DECL. How? We pick the .kind() of Scripts child and we get VAR_DECL. As we instructed that Script can contain a children of Stmt we need to make sure that VAR_DECL is actually a Stmt.

VAR_DECL is not inside Stmt, so it should fail. But it doesn't because the manual implementation has an escape hatch, which is the following:

impl Stmt {
	cast(syntax) {
		match syntax.kind() {
			// all the possibilities
			LABELLED_STMT => LabelledStmt::LabelledStmt(LabelledStmt {syntax})
			_ => Decl::cast(syntax.kind()?) // this is the escape hatch
		}
	}
}

Please check the code here: https://github.com/rome/tools/blob/main/crates/rslint_parser/src/ast/stmt_ext.rs#L198

Doing so, it would allow the API to apply the cast() function to Decl, which also contains VarDecl - where its kind is VAR_DECL. And doing so, we are able to pull correctly the VarDecl child node.

The same thing happens with ExprOrSpread, where there's an escape hatch: https://github.com/rome/tools/blob/main/crates/rslint_parser/src/ast/expr_ext.rs#L423-L429

I am not a fun of this approach because it makes our API more opaque and difficult to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that detailed explanation. I wasn't aware that the can_cast and cast implementation tread union types differently.

We should probably take some time (in a separate issue/PR) to figure out how union types work because the suggestion to flatten union types would probably solve that issue as well, but we need to see how economic it is to work with them (need to make sure that casting between the two is easy). I'm no longer 100% convinced if it is a good idea.

Replicating RSLint's behaviour should be straight forward in the codegen (in a new PR?): It mainly means that it must generate different code in can_cast and cast depending if the type is a "simple type" or a union type:

  • simple_type: Match on kind
  • union_type: Call into child union.

The main problem that I see is that this only works if union type only ever contains at most one other union type.

xtask/src/codegen/generate_nodes.rs Outdated Show resolved Hide resolved
xtask/src/codegen/kinds_src.rs Show resolved Hide resolved
xtask/src/codegen/kinds_src.rs Show resolved Hide resolved
xtask/src/codegen/syntax.rs Outdated Show resolved Hide resolved
xtask/src/lib.rs Outdated Show resolved Hide resolved
@ematipico ematipico force-pushed the feature/codegen-with-ungrammar branch from e19dea4 to f71cfc8 Compare October 29, 2021 17:55
Copy link
Contributor

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments and the nice explanation of the "Union" issue. It would probably be good to follow up with a solution to the union problem asap as this is a regression compared to our RSLint facade.

I didn't do a full review of the JS syntax but we can go through it when we re-visit the AST structure in #1725

crates/formatter/src/ts/mod.rs Outdated Show resolved Hide resolved
xtask/src/main.rs Outdated Show resolved Hide resolved
@ematipico ematipico force-pushed the feature/codegen-with-ungrammar branch from 8d53f9b to caf97d8 Compare November 1, 2021 11:44
@ematipico ematipico merged commit ac7c574 into main Nov 1, 2021
@ematipico ematipico deleted the feature/codegen-with-ungrammar branch November 1, 2021 12:03
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

☂️ Use ungrammar to automate the generation of code
3 participants