Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Important performance regression since commit 7fd45d6 on a very big file #1212

Closed
sroddy opened this issue Apr 28, 2023 · 3 comments · Fixed by #1217
Closed

Important performance regression since commit 7fd45d6 on a very big file #1212

sroddy opened this issue Apr 28, 2023 · 3 comments · Fixed by #1217

Comments

@sroddy
Copy link

sroddy commented Apr 28, 2023

I've discovered this issue when I tried to bump the version of dart_style from 2.2.5 to 2.3.0.

My build_runner command gets stuck (I've left it running for more than 10 minutes before killing it) when trying to format the output of a particularly big generated code file.
I've bisected the git log and figured out that the first commit in which the issue is present is after the merge of #1182 (commit 7fd45d6). Before that commit, the step takes 2/5 seconds at most.
I've put a breakpoint and managed to dump the input string of the DartFormatter.format function inside the build step that causes it to hang.
I would prefer to avoid attaching that string here, as it contains some proprietary code, but I can send it via email if it can help you fix the issue.

Thanks

@sroddy
Copy link
Author

sroddy commented Apr 29, 2023

I've managed to reproduce the regression on the attached snippet of code, this is the output (elapsed time) on my machine before and after:

regression.dart.zip

@munificent
Copy link
Member

Thanks, I'll take a look.

@munificent
Copy link
Member

OK, I know what's going on. The change to allow inline case bodies (c9e5191) incidentally fixed a long-standing bug where metadata splitting behavior across parameters wasn't correct. It was always the intent that if a metadata annotation on a parameter splits, then all metadata annotations across all parameters in the parameter list should split.

Unfortunately, because of the bug, the line splitter didn't enforce that rule. The above commit fixes the behavior. But in order to fix that, all parameters in a parameter list must go through the line splitting algorithm process instead of line splitting each parameter independently. Line splitting is the slowest part of formatting and can get very slow if you throw more code at it at once.

Your program happens to contain a particularly long parameter list (85 parameters), so line splitting takes too long and it gives up.

I'll spend some time and see if I can come up with a better way to handle this.

munificent added a commit that referenced this issue May 8, 2023
Also, don't indent parameters with metadata annotations.

Before this change, the formatter indents any parameter that contains
metadata annotations that end up splitting:

    function(
        @required
            int n,
        @deprecated('Some long deprecation string.')
            String s) {
      ...
    }

It also treats all parameters in a parameter list as a unit when
deciding how to split their metadata. If any annotation in any parameter
splits, then they all do. That's why there's a split after `@required`
in the above example.

The intent of both of these is so that a parameter with unsplit metadata
doesn't get confused for an annotation on the next parameter:

    function(
        @FIRST('some string') int n,
        @second('another string forces a split')
        String s) {
      ...
    }

Note here that there are two parameters, `n`, and `s`, and not just a
single parameter `s` with two annotations.

Unfortunately line splitting all of the parameters as a unit is very bad
for performance when there is a large number of parameters (#1212).
Also, it's not very helpful. In practice, parameter metadata is rare and
most parameters that have any annotations only have one.

And the indentation is just strange looking and inconsistent with how
annotations are formatted elsewhere. It also means that parameters with
split metadata don't align with parameters that don't have metadata.

This change determines whether each parameter's annotations should split
independently from the other parameters' and removes that indentation.
The above example becomes:

    function(
        @required int n,
        @deprecated('Some long deprecation string.')
        String s) {
      ...
    }

This improves performance on large parameter lists and I think looks
better on real-world examples. I ran it on a large corpus (2,112,352
lines in 6,911 files) and I think the impact is small enough to not go
through the full change process:

293 insertions + 443 deletions = 736 changes
1 changed line for every 2870.04 lines of code
0.3484 changed lines for every 1,000 lines of code

The full diff is: https://gist.github.com/munificent/1dc7361438934a3587f6149049682e29

Fix #1212.
munificent added a commit that referenced this issue May 9, 2023
Also, don't indent parameters with metadata annotations.

Before this change, the formatter indents any parameter that contains
metadata annotations that end up splitting:

    function(
        @required
            int n,
        @deprecated('Some long deprecation string.')
            String s) {
      ...
    }

It also treats all parameters in a parameter list as a unit when
deciding how to split their metadata. If any annotation in any parameter
splits, then they all do. That's why there's a split after `@required`
in the above example.

The intent of both of these is so that a parameter with unsplit metadata
doesn't get confused for an annotation on the next parameter:

    function(
        @FIRST('some string') int n,
        @second('another string forces a split')
        String s) {
      ...
    }

Note here that there are two parameters, `n`, and `s`, and not just a
single parameter `s` with two annotations.

Unfortunately line splitting all of the parameters as a unit is very bad
for performance when there is a large number of parameters (#1212).
Also, it's not very helpful. In practice, parameter metadata is rare and
most parameters that have any annotations only have one.

And the indentation is just strange looking and inconsistent with how
annotations are formatted elsewhere. It also means that parameters with
split metadata don't align with parameters that don't have metadata.

This change determines whether each parameter's annotations should split
independently from the other parameters' and removes that indentation.
The above example becomes:

    function(
        @required int n,
        @deprecated('Some long deprecation string.')
        String s) {
      ...
    }

This improves performance on large parameter lists and I think looks
better on real-world examples. I ran it on a large corpus (2,112,352
lines in 6,911 files) and I think the impact is small enough to not go
through the full change process:

293 insertions + 443 deletions = 736 changes
1 changed line for every 2870.04 lines of code
0.3484 changed lines for every 1,000 lines of code

The full diff is: https://gist.github.com/munificent/1dc7361438934a3587f6149049682e29

Fix #1212.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants