Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate making JSDoc parsing lazier and more optimal #52959

Closed
DanielRosenwasser opened this issue Feb 25, 2023 · 8 comments
Closed

Investigate making JSDoc parsing lazier and more optimal #52959

DanielRosenwasser opened this issue Feb 25, 2023 · 8 comments
Assignees
Labels
Domain: Performance Reports of unusually slow behavior Experiment A fork with an experimental idea which might not make it into master

Comments

@DanielRosenwasser
Copy link
Member

Not parsing JSDoc seems to unlock significant parsing wins.

Surprisingly, we parse JSDoc unconditionally, even in .ts files for batch compilation.

So there are a few things I want to investigate here:

  1. Can we optimize JSDoc parsing itself? Why is JSDoc parsing so slow in the first place? Even if we decided to always parse JSDoc comments, it should be fast because full time-to-interactivity in the language service is bottle-necked on parse time.

    My naive theory is that JSDoc scanning today is "chatty", returning uninteresting tokens, making every request for the next token unnecessarily slow. But of course, this is without benchmarking. I would love to understand what exactly is the issue here.

  2. If it's not strictly required, can we make JSDoc parse in a lazier or optional fashion? It seems unavoidable in JavaScript files; but maybe in TypeScript files, JSDoc is only attempted to be attached in cases where we detect that trivia intersects with tag names we're interested in (e.g. @deprecated, @see, @link, maybe more). Maybe we only do this for the language service.

@DanielRosenwasser DanielRosenwasser added Experiment A fork with an experimental idea which might not make it into master Domain: Performance Reports of unusually slow behavior labels Feb 25, 2023
@DanielRosenwasser DanielRosenwasser added this to the TypeScript 5.1.0 milestone Feb 25, 2023
@DanielRosenwasser
Copy link
Member Author

I tried a fun benchmark here. I made a TypeScript file that contains 3 functions (foo, bar, and baz), each prefixed by JSDoc comments containing the contents of War and Peace.

/** CHAPTER I */

/** "Well, Prince, so Genoa and Lucca are now just family estates of the */
/** Buonapartes. But I warn you, if you don't tell me that this means war, */
/** if you still try to defend the infamies and horrors perpetrated by that */
/** Antichrist--I really believe he is Antichrist--I will have nothing more */
/** to do with you and you are no longer my friend, no longer my 'faithful */
/** slave,' as you call yourself! But how do you do? I see I have frightened */
/** you--sit down and tell me all the news." */
// ...
function foo() {
    console.log("lol");
}

I then replaced all the characters in those comments with spaces.

/**           */

/**                                                                      */
/**                                                                        */
/**                                                                         */
/**                                                                         */
/**                                                                        */
/**                                                                          */
/**                                          */
// ...
function foo() {
    console.log("lol");
}

And because one could make the argument that whitespace doesn't need to be retained so this is an unfair benchmark, I made a file where all the contents are simply replaced with As.

/** AAAAAAAAA */

/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
/** AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA */
// ...
function foo() {
    console.log("lol");
}

Of course, that many JSDoc comments is very uncommon. The first thing I actually tried was all the contents of War and Peace in a single comment above each function.

/**
 * CHAPTER I
 * 
 * "Well, Prince, so Genoa and Lucca are now just family estates of the
 * Buonapartes. But I warn you, if you don't tell me that this means war,
 * if you still try to defend the infamies and horrors perpetrated by that
 * Antichrist--I really believe he is Antichrist--I will have nothing more
 * to do with you and you are no longer my friend, no longer my 'faithful
 * slave,' as you call yourself! But how do you do? I see I have frightened
 * you--sit down and tell me all the news."
 *
/// ...
 */
function foo() {
    console.log("lol");
}

And finally, the AAAAAA version of that.

/**
 * AAAAAAAAA
 * 
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 * AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
/// ...
 */
function foo() {
    console.log("lol");
}

So here's the output of running these, though I'll limit the summary to the parse time

$ node ../built/local/tsc.js -p tsconfig-split.json
Parse time:                 0.95s

$ node ../built/local/tsc.js -p tsconfig-split-space.json
Parse time:                 0.47s

$ node ../built/local/tsc.js -p tsconfig-split-AAA.json
Parse time:                 0.53s

$ node ../built/local/tsc.js -p tsconfig-single.json
Parse time:                 0.99s

$ node ../built/local/tsc.js -p tsconfig-single-AAA.json
Parse time:                 0.32s
$ node ../built/local/tsc.js -p tsconfig-split.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-split.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191546
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    9
Symbols:                       10
Types:                         66
Instantiations:                 0
Memory used:               88895K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.04s
Parse time:                 0.95s
Program time:               0.99s
Bind time:                  0.02s
Check time:                 0.02s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 1.04s

$ node ../built/local/tsc.js -p tsconfig-split-space.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-split-space.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191548
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    9
Symbols:                       10
Types:                         66
Instantiations:                 0
Memory used:               71089K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.01s
Parse time:                 0.47s
Program time:               0.48s
Bind time:                  0.02s
Check time:                 0.02s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 0.52s

$ node ../built/local/tsc.js -p tsconfig-split-AAA.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-split-AAA.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191546
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    9
Symbols:                       10
Types:                         66
Instantiations:                 0
Memory used:               85167K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.01s
Parse time:                 0.53s
Program time:               0.54s
Bind time:                  0.02s
Check time:                 0.02s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 0.58s

$ node ../built/local/tsc.js -p tsconfig-single.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-single.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191550
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    6
Symbols:                        9
Types:                         66
Instantiations:                 0
Memory used:               57249K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.04s
Parse time:                 0.99s
Program time:               1.03s
Bind time:                  0.00s
Check time:                 0.01s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 1.04s

$ node ../built/local/tsc.js -p tsconfig-single-AAA.json
error TS2318: Cannot find global type 'Array'.
error TS2318: Cannot find global type 'Boolean'.
error TS2318: Cannot find global type 'Function'.
error TS2318: Cannot find global type 'IArguments'.
error TS2318: Cannot find global type 'Number'.
error TS2318: Cannot find global type 'Object'.
error TS2318: Cannot find global type 'RegExp'.
error TS2318: Cannot find global type 'String'.
/workspaces/TypeScript/benchy/war-and-peace-single-AAA.ts
Files:                          1
Lines of Library:               0
Lines of Definitions:           0
Lines of TypeScript:       191554
Lines of JavaScript:            0
Lines of JSON:                  0
Lines of Other:                 0
Identifiers:                    9
Symbols:                       10
Types:                         66
Instantiations:                 0
Memory used:               37265K
Assignability cache size:       0
Identity cache size:            0
Subtype cache size:             0
Strict subtype cache size:      0
I/O Read time:              0.01s
Parse time:                 0.32s
Program time:               0.33s
Bind time:                  0.00s
Check time:                 0.01s
printTime time:             0.00s
Emit time:                  0.00s
Total time:                 0.34s

So this is what I mean by the parser being too "chatty" with the scanner. There's probably half a second of time being spent ping-ponging between them for every identifier.

One idea I would suggest is trying to proactively grab chunks of text from the comment whenever "normalization" doesn't need to occur. This might occur when whitespace between words is only a single space character, possibly no backticks, etc. - basically, whenever the parser handles these conditions, maybe the scanner would instead. This is the same thing we do on string literals, where we can eagerly grab chunks of data. For the common case, my hope is that this would approximate the results of the "AAAAAA" example.

@bradzacher
Copy link
Contributor

bradzacher commented Feb 27, 2023

From what I've seen this is a one source of wasted time for @typescript-eslint parsing. It's rare that anyone actually consumes JSDoc info in lint rules so having it computed up-front is just wasted effort.

So I would be solidly in favour of lazy-by-default or lazy-behind-option.

@DanielRosenwasser
Copy link
Member Author

Yeah, I think the right things to do are

  1. Make it efficient.
  2. Make it opt-out if necessary.

@sandersn
Copy link
Member

I ran into the need for this from the other direction: trying to improve just the jsdoc parser. I noticed that the jsdoc scanner provides way more granular tokens than the jsdoc parser needs, even though it's (basically) the only caller of the jsdoc scanner.

I put up a draft PR of my work so far at #53007

@bradzacher
Copy link
Contributor

Adding a CPU profile onto this just to show a more real-world example of the impact to @typescript-eslint

This profile was taken from a lint run on the @typescript-eslint codebase

$ git clone git@github.com:typescript-eslint/typescript-eslint.git
$ cd typescript-eslint
$ yarn
$ node --cpu-prof ./node_modules/.bin/eslint .

TS_ESLINT_PARSE.cpuprofile.zip (extract and open this in something like https://speedscope.app)

We can see that across the entire run the parseJSDoc function consumes a total of ~1.63s across the run, which equates to ~3.8% of the total lint run!! This is huge, considering that we also lint with types in the repo, which ofc adds quite a bit of overhead. For a non-type-aware lint run you'd find the %age would be much, much higher.

@sandersn
Copy link
Member

sandersn commented Mar 7, 2023

Here's a summary of five things I tried:

The last two are surprising to me, since there's a constant overhead per line for yielding and slicing the whitespace/asterisk strings. It shouldn't be as large as the overhead of yielding individual tokens for comment text though. The slight increases in the last two PRs makes me think that the first big-token PR may also increase time in the parser, but offset by the larger gains in the scanner.

@jakebailey
Copy link
Member

#52921 is now merged, so tsc will now be skipping JSDoc parsing when able. #55739 is a PR which opens the door to that being used in the API with a bit more configurability.

@sandersn
Copy link
Member

My latest profiles of xstate (the latest commit in their next branch) and angular 2 (the ancient version we perf test on) don't show much time in JSDoc, even with JSDoc parsing forced on. I think we can close this bug, at least for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Domain: Performance Reports of unusually slow behavior Experiment A fork with an experimental idea which might not make it into master
Projects
None yet
Development

No branches or pull requests

4 participants