Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

feat(rome_js_formatter): Parenthesizing expressions #3057

Merged
merged 8 commits into from
Aug 18, 2022

Conversation

MichaReiser
Copy link
Contributor

@MichaReiser MichaReiser commented Aug 12, 2022

Summary

Part of #3046

This PR adds the capability of adding parentheses to improve readability or removing parentheses if they aren't strictly necessary.

The core architecture is around the NeedsParentheses trait that every JsAnyExpression now implements. Its main method is needs_parentheses which must return true if parentheses are necessary if removing the parentheses would change the semantics OR result in a syntax error. needs_parentheses may also return true in cases where parentheses aren't strictly necessary but improve readability.

The new NeedsParentheses trait is used in the FormatNodeRule trait that gets called when formatting every node. It now adds parentheses if needs_parentheses returns true.

FormatJsParenthesizedExpression has been changed to always remove parentheses and rely on FormatNodeRule to re-insert parentheses if necessary. This has the downside that any trailing comments of ( and leading comments of ) are pushed outside the parentheses. I think that's OK and is something we can address when working on #2768.

Note: This PR implements needs_parentheses for all binary like expression but they aren't used yet. I struggled to integrate the logic into the existing binary like formatting and intend to tackle it with anyway needed rewrite to closer match prettier's formatting. I plan to do this next.

Performance

This PR significantly reduces Rome performance because it is now necessary to call expression.resolve() or expression.resolve_parent to skip over parenthesized expressions when testing if a child or parent is of a certain kind.

The only alternative that I see to this approach would be to remove all ParenthesizedExpressions before formatting but this will have its own performance cost, requires source maps to map back to the original code in case a node must be printed in verbatim (to recover the parentheses), and is something we still have the chance to do in the future.

My main concern is more about the added complexity and it's easy to forget a resolve_expression or resolve_expression_parent call. I don't have good answer to this and there are likely still a few places where I missed to add these calls myself. My recommendation is that we can fix these over time.

group                                    main                                    parentheses
-----                                    ----                                    -----------
formatter/checker.ts                     1.00   219.6±15.94ms    11.8 MB/sec     1.04   227.3±18.85ms    11.4 MB/sec
formatter/compiler.js                    1.00    124.8±1.59ms     8.4 MB/sec     1.12    140.0±9.95ms     7.5 MB/sec
formatter/d3.min.js                      1.00    103.2±7.78ms     2.5 MB/sec     1.16    119.6±8.73ms     2.2 MB/sec
formatter/dojo.js                        1.00      8.1±0.02ms     8.5 MB/sec     1.00      8.1±0.62ms     8.5 MB/sec
formatter/ios.d.ts                       1.00    144.8±1.88ms    12.9 MB/sec     1.09   158.0±15.89ms    11.8 MB/sec
formatter/jquery.min.js                  1.00     29.7±1.87ms     2.8 MB/sec     1.03     30.7±2.13ms     2.7 MB/sec
formatter/math.js                        1.01   235.5±14.90ms     2.7 MB/sec     1.00   232.7±18.55ms     2.8 MB/sec
formatter/parser.ts                      1.02      5.2±0.40ms     9.3 MB/sec     1.00      5.1±0.32ms     9.5 MB/sec
formatter/pixi.min.js                    1.00    127.5±6.82ms     3.4 MB/sec     1.05   134.3±10.27ms     3.3 MB/sec
formatter/react-dom.production.min.js    1.04     37.3±2.29ms     3.1 MB/sec     1.00     35.7±1.18ms     3.2 MB/sec
formatter/react.production.min.js        1.00  1861.6±130.45µs     3.3 MB/sec    1.12      2.1±0.13ms     3.0 MB/sec
formatter/router.ts                      1.00      4.1±0.31ms    15.1 MB/sec     1.13      4.6±0.19ms    13.3 MB/sec
formatter/tex-chtml-full.js              1.00    274.8±6.53ms     3.3 MB/sec     1.14   314.6±22.55ms     2.9 MB/sec
formatter/three.min.js                   1.00   135.7±10.68ms     4.3 MB/sec     1.07    145.5±9.52ms     4.0 MB/sec
formatter/typescript.js                  1.00   911.6±60.87ms    10.4 MB/sec     1.03   938.6±58.19ms    10.1 MB/sec
formatter/vue.global.prod.js             1.00     42.9±0.15ms     2.8 MB/sec     1.25     53.4±3.34ms     2.3 MB/sec

Prettier Difference

I decided to diverge from Prettier's formatting in two cases:

ObjectExpression, FunctionExpression, or ClassExpression at the beginning of a statement.

// input
{}.b

// Prettier
({}.b)

// Rome
({}).b 

Prettier parenthesizes the whole expression whereas Rome parenthesizes the object/function/class expression only.

The main reason for diverging is that parenthesizing the whole expression requires that the logic is implemented in any expression that starts with another child expression (tagged template, binary expressions, computed member/assignment, sequence, conditional, ... ) and it requires traversing the first expression until it reaches an object expression or any expression that doesn't start with another expression. This is rather expensive and Rome's approach avoids the expensive traversal except for object/function and class expressions, of which there should be fewer.

Parenthesized optional chain

// Input
(a?.b).c
a?.b.c

// Prettier
(a?.b).c
a?.b.c

// Rome
a?.b.c
a?.b.c

Prettier keeps the parentheses if they were present in the source document but never adds them if they were missing. This is the only case where I found that the fact that parens are present in the source document changed how Prettier parenthesizes formatted expression.

My take is that Rome should apply a consistent formatting regardless if parentheses were or were not present in the source document.

Test Plan

I manually reviewed the snapshot changes.

Average compatibility: 81.82 -> 81.54%
Compatible lines: 78.64 -> 78.34%

This PR is regressing the compatibility. Mainly due to the following reasons:

  • Prettier has custom handling to avoid removing parentheses around closure type cast comments which Rome has not. This is something we can add support for in the future. The way I envision this to work is that we add a tag to every JsParenthesizedExpression that has a type cast comment (CC: @leops for a use case for node annotations)
  • Our handling of comments differs from Prettier's which results in a large number of line changes. We'll tackle this as part of Comment formatting - stability issues #2768
  • Prettier incompatibility of other syntax like JsCallArguments, JsTemplate, JsBinaryLikeExpression, member like expressions and more.

I'm in favour of merging this PR regardless of the regressions because these issues should be tackled when working on comments, or when improving the formatting of other syntaxes.

@MichaReiser MichaReiser temporarily deployed to aws August 12, 2022 16:04 Inactive
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Aug 12, 2022

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: 9823c4b
Status: ✅  Deploy successful!
Preview URL: https://af726b97.tools-8rn.pages.dev
Branch Preview URL: https://feat-needs-parentheses.tools-8rn.pages.dev

View logs

@github-actions
Copy link

Parser conformance results on ubuntu-latest

js/262

Test result main count This PR count Difference
Total 45878 45878 0
Passed 44938 44938 0
Failed 940 940 0
Panics 0 0 0
Coverage 97.95% 97.95% 0.00%

jsx/babel

Test result main count This PR count Difference
Total 39 39 0
Passed 36 36 0
Failed 3 3 0
Panics 0 0 0
Coverage 92.31% 92.31% 0.00%

symbols/microsoft

Test result main count This PR count Difference
Total 5946 5946 0
Passed 396 396 0
Failed 5550 5550 0
Panics 0 0 0
Coverage 6.66% 6.66% 0.00%

ts/babel

Test result main count This PR count Difference
Total 588 588 0
Passed 519 519 0
Failed 69 69 0
Panics 0 0 0
Coverage 88.27% 88.27% 0.00%

ts/microsoft

Test result main count This PR count Difference
Total 16257 16257 0
Passed 12397 12397 0
Failed 3860 3860 0
Panics 0 0 0
Coverage 76.26% 76.26% 0.00%

@github-actions
Copy link

github-actions bot commented Aug 12, 2022

@MichaReiser MichaReiser temporarily deployed to aws August 13, 2022 13:10 Inactive
@MichaReiser MichaReiser temporarily deployed to aws August 13, 2022 16:01 Inactive
@MichaReiser MichaReiser temporarily deployed to aws August 13, 2022 16:02 Inactive
@MichaReiser MichaReiser changed the title feat(rome_js_formatter): Needs Parentheses feat(rome_js_formatter): Parenthesizing expressions Aug 13, 2022
@MichaReiser MichaReiser temporarily deployed to aws August 13, 2022 16:21 Inactive
@@ -16,19 +17,30 @@ impl FormatNodeRule<JsNumberLiteralExpression> for FormatJsNumberLiteralExpressi
let JsNumberLiteralExpressionFields { value_token } = node.as_fields();
let value_token = value_token?;

if let Some(static_member_expression) = node.parent::<JsStaticMemberExpression>() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now handled as part of needs_parens

matches!(arrow.body()?, JsAnyFunctionBody::JsFunctionBody(_))
}
_ => false,
let is_function_like = match resolve_call_argument_expression(&first) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file are that checks testing if an argument ar of a specific expression now skip over parenthesized expressions.

let formatted =
FormatLiteralStringToken::new(&value_token, StringLiteralParentKind::Expression);

// Prevents that a string literal expression becomes a directive
let needs_parens =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now handled as part of needs_parentheses

}

// Parenthesize the inner expression if it's a binary or pre-update
// operation with an ambiguous operator (+ and ++ or - and --)
let is_ambiguous_expression = match &argument {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now handled as part of needs_parentheses

@@ -17,15 +21,60 @@ impl FormatNodeRule<JsxTagExpression> for FormatJsxTagExpression {
if_group_breaks(&text(")"))
])]
],
WrapState::AlwaysWrap => write![
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled by needs_parens


-(a?.b).c();
-(a?.b[c]).c();
+a?.b.c();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rome always formats optional chains without parentheses for consistency reasons

+() => ({})``;
+({})``;
+a = () => ({}).x;
+({}) && a, b;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rome parentheses the object expression and not the whole expression if it is the first of the statement.

+ topRankedZoneFit.areaPercentageRemaining - previousZoneFitNow.areaPercentageRemaining
).toFixed(2);
-).toFixed(2);
+const areaPercentageDiff =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to our member chain formatting generating different IR

@@ -1,38 +1,30 @@
-(
- aaaaaaaaaaaaaaaaaaaaaaaaa &&
+(aaaaaaaaaaaaaaaaaaaaaaaaa &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to our binary expression formatting being different. This will be tackled separately

+ .ffffffff.gggggggg2 = class extends (
aaaaaaaa.bbbbbbbb.cccccccc.dddddddd.eeeeeeee.ffffffff.gggggggg1
) {
+ .ffffffff.gggggggg2 = class extends aaaaaaaa.bbbbbbbb.cccccccc.dddddddd
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because member assignment aren't formatted the same way as static member expressions (which they should)

@MichaReiser
Copy link
Contributor Author

!bench_formatter

@MichaReiser MichaReiser temporarily deployed to aws August 13, 2022 16:27 Inactive
@MichaReiser MichaReiser mentioned this pull request Aug 13, 2022
17 tasks
@github-actions
Copy link

Formatter Benchmark Results

group                                    main                                   pr
-----                                    ----                                   --
formatter/checker.ts                     1.00    362.9±1.62ms     7.2 MB/sec    1.09    396.9±3.94ms     6.6 MB/sec
formatter/compiler.js                    1.00    225.4±1.89ms     4.6 MB/sec    1.11    250.2±1.69ms     4.2 MB/sec
formatter/d3.min.js                      1.00    176.7±1.40ms  1519.4 KB/sec    1.14    200.6±2.17ms  1337.8 KB/sec
formatter/dojo.js                        1.00     12.9±0.06ms     5.3 MB/sec    1.08     13.8±0.02ms     5.0 MB/sec
formatter/ios.d.ts                       1.00    258.5±2.59ms     7.2 MB/sec    1.06    273.6±1.29ms     6.8 MB/sec
formatter/jquery.min.js                  1.00     49.4±0.20ms  1711.6 KB/sec    1.11     54.7±0.11ms  1547.9 KB/sec
formatter/math.js                        1.00    387.4±1.70ms  1711.7 KB/sec    1.08    418.8±2.41ms  1583.4 KB/sec
formatter/parser.ts                      1.00      8.7±0.10ms     5.5 MB/sec    1.07      9.3±0.01ms     5.2 MB/sec
formatter/pixi.min.js                    1.00    202.3±2.37ms     2.2 MB/sec    1.16    234.8±1.84ms  1914.2 KB/sec
formatter/react-dom.production.min.js    1.00     60.1±0.48ms  1962.4 KB/sec    1.11     66.8±0.21ms  1763.1 KB/sec
formatter/react.production.min.js        1.00      3.2±0.01ms  1990.4 KB/sec    1.09      3.4±0.01ms  1828.4 KB/sec
formatter/router.ts                      1.00      6.8±0.00ms     9.1 MB/sec    1.08      7.3±0.02ms     8.4 MB/sec
formatter/tex-chtml-full.js              1.00    496.3±1.85ms  1880.3 KB/sec    1.09    541.6±3.21ms  1723.0 KB/sec
formatter/three.min.js                   1.00    234.0±1.53ms     2.5 MB/sec    1.13    263.7±1.69ms     2.2 MB/sec
formatter/typescript.js                  1.00  1512.6±14.24ms     6.3 MB/sec    1.08  1627.0±10.80ms     5.8 MB/sec
formatter/vue.global.prod.js             1.00     80.6±1.02ms  1531.0 KB/sec    1.10     89.0±0.96ms  1385.6 KB/sec

}

/// Formats the node's fields.
fn fmt_fields(&self, item: &N, f: &mut JsFormatter) -> FormatResult<()>;

/// Returns whether the node requires parens.
fn needs_parentheses(&self, item: &N) -> bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the default implementation for needs_parentheses would be:

fn needs_parentheses(&self, item: &N) -> bool where N: NeedsParentheses {
  item.needs_parentheses()
}

fn needs_parentheses(&self, item: &N) -> bool where N: !NeedsParentheses {
  false
}

However, negative type constraints do not exist. If anyone has an idea how we can get a similar behaviour please let me know (CC: @leops, @xunilrj)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to move the methods of NeedsParentheses into FormatNodeRule with a default implementation (and resolve_parent to an AstNode extension trait I guess) ? This could remove the need for an additional level of indirection

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I prefer having an explicit NeedsParentheses trait is that it feels more ergonomic to call node.needs_parentheses rather than FormatAnyExpressionRule::needs_parentheses(&node).

I also do like the fact that resolve_parent isn't exposed for nodes where it isn't necessary to go through that extra loop. For example, you won't need to call resolve_parent on a statement.

@ematipico ematipico added this to the 0.9.0 milestone Aug 14, 2022
@ematipico ematipico added A-Formatter Area: formatter L-JavaScript Langauge: JavaScript labels Aug 14, 2022
@MichaReiser
Copy link
Contributor Author

Is there a specific reason why the "needs parentheses" information is tracked inside the formatter phase?

Have you considered some other solutions, such as a visitor approach, or even computing this information during the parsing phase and storing it inside our green tree?

The main reasons for computing whether an expression needs to be parenthesized during the formatting phase are:

  • The formatter adds more parentheses than necessary to not change precedence. For example, Rome adds parentheses around shift operations to ease readability.
  • The information which nodes need parentheses would become outdated during tree mutations or, it would be necessary to update this information when mutating the tree. Computing this information isn't cheap because it involves many tree navigation (multiple allocations).
  • The knowledge of which expressions need parentheses aren't needed for all use cases of the parser and computing the information during the parse phase would add an unnecessary cost (performance penalty). Besides, an over-approximation
  • of which nodes require parentheses is stored in the green tree by having ParenthesizedExpression. Agreed, the tree may have more parenthesized expressions than necessary, but all the required once are in place.
  • Our CST doesn't support storing additional data, e.g. by using annotations on the tree.

I would like to know what are the implications of your decisions in the future developments

The main implication is that it increases complexity as well as the risk for subtle bugs when we forget to call resolve_expression or resolve_expression_parent.

There are a few alternatives to this design, some of which I want to explore after I rewrote binary expression formatting:

  • Pre-compute needs_parentheses: An option is to pre-compute needs_parentheses when extracting the suppression comments and storing the information on the FormatContext. This would remove the overhead for resolving the parent for every node. I'm doubtful that this will yield a big performance and it increases complexity a bit, because knowing if something needs parentheses then needs a FormatContext. We can try this out as a dedicated PR if we're interested.
  • Remove parenthesized expressions: The idea here is to pre-process the tree to remove all parenthesized expressions (except the ones that have closure type cast comments, or have any skipped token trivia attached to the left or right parens). This will have its own performance overhead but removes the need for calling resolve_expression and resolve_expression_parent which I see as the main drawback of the current solution. This is something I want to explore after I finished my work on the binary expression formatting. The main challenge is that I'll need to figure out a way to map positions from the transformed tree back to the original tree (source maps?) to be able to print a node in verbatim node if it's formatting (or any child) fails because of a missing child node.

@MichaReiser
Copy link
Contributor Author

!bench_formatter

@MichaReiser MichaReiser temporarily deployed to aws August 15, 2022 08:37 Inactive
@MichaReiser MichaReiser marked this pull request as ready for review August 15, 2022 08:37
@MichaReiser MichaReiser requested a review from a team August 15, 2022 08:37
@MichaReiser MichaReiser temporarily deployed to aws August 16, 2022 13:34 Inactive
@MichaReiser MichaReiser temporarily deployed to aws August 16, 2022 16:11 Inactive
crates/rome_js_formatter/src/lib.rs Outdated Show resolved Hide resolved
crates/rome_js_formatter/src/parentheses.rs Show resolved Hide resolved
}

/// Formats the node's fields.
fn fmt_fields(&self, item: &N, f: &mut JsFormatter) -> FormatResult<()>;

/// Returns whether the node requires parens.
fn needs_parentheses(&self, item: &N) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to move the methods of NeedsParentheses into FormatNodeRule with a default implementation (and resolve_parent to an AstNode extension trait I guess) ? This could remove the need for an additional level of indirection

@ematipico
Copy link
Contributor

If the diverging is intentional and we won't go back to that, could you please add a task/reminder for ourselves to update the website with this decision? We have to update the section where we track the difference section

@MichaReiser
Copy link
Contributor Author

I plan to merge this PR by tomorrow except there are some major conceptual concerns around it.

@MichaReiser MichaReiser temporarily deployed to aws August 17, 2022 16:04 Inactive
Copy link
Contributor

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a module comment in the parenthesis file, explaining the new concepts introduced, mainly:

  • what “resolving” means, since it has a meaning based on the context and it’s a concept that is now widespread and we will use somewhere else
  • what “needs parenthesis” means, mostly to not assume that people already knows

@MichaReiser MichaReiser temporarily deployed to aws August 17, 2022 16:27 Inactive
@MichaReiser MichaReiser temporarily deployed to aws August 18, 2022 07:11 Inactive
@MichaReiser MichaReiser temporarily deployed to aws August 18, 2022 07:14 Inactive
@MichaReiser MichaReiser temporarily deployed to aws August 18, 2022 07:21 Inactive
@MichaReiser MichaReiser merged commit 14eab14 into main Aug 18, 2022
@MichaReiser MichaReiser deleted the feat/needs-parentheses branch August 18, 2022 07:30
IWANABETHATGUY pushed a commit to IWANABETHATGUY/tools that referenced this pull request Aug 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Formatter Area: formatter L-JavaScript Langauge: JavaScript
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants