-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scan bigger/fewer jsdoc tokens #53081
Scan bigger/fewer jsdoc tokens #53081
Conversation
@typescript-bot perf test this |
Heya @sandersn, I've started to run the perf test suite on this PR at f8f9721. You can monitor the build here. Update: The results are in! |
@sandersn Here they are:
CompilerComparison Report - main..53081
System
Hosts
Scenarios
TSServerComparison Report - main..53081
System
Hosts
Scenarios
StartupComparison Report - main..53081
System
Hosts
Scenarios
Developer Information: |
The PR appears to have some errors so I don't know if the above is believable, unfortunately. |
I can't reproduce those errors locally, although I haven't tried very hard yet, just |
@@ -4407,6 +4407,7 @@ declare namespace ts { | |||
reScanInvalidIdentifier(): SyntaxKind; | |||
scanJsxToken(): JsxTokenSyntaxKind; | |||
scanJsDocToken(): JSDocSyntaxKind; | |||
scanBigJsDocToken(): JSDocSyntaxKind; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make these internal.
src/compiler/scanner.ts
Outdated
// TODO: Make sure this is right (and in the right place) | ||
tokenValue = text.substring(tokenPos, pos); | ||
} | ||
return token = SyntaxKind.Identifier; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that this is currently a proof-of-concept, but you should create an internal SyntaxKind.JSDocTextSpan
that is used only in the parser
Outside backticks, @ only starts a new tag after whitespace and before non-whitespace. The scanner now checks for this instead of the parser.
@typescript-bot perf test this |
Heya @sandersn, I've started to run the perf test suite on this PR at 9f4daf5. You can monitor the build here. Update: The results are in! |
The checks are now done in the scanner during scanning of jsdoc comment text. A couple of baselines change because the top-level @-detection for new tags works the same way between top-level comment text and tag comment text. This makes the parser detect more tags than before, specifically in the case where non-whitespace precedes the first tag on the first line. I think this is a fix since our parser tries to find tags anywhere on the line, and previously didn't for the *first line only*.
When whitespace crosses an earlier indent margin, that is, when a line is indented further than a previous one, we have to slice that whitespace, but we also know that subsequent text will be comment text, so can switch to SavingComments.
@sandersn Here they are:
CompilerComparison Report - main..53081
System
Hosts
Scenarios
TSServerComparison Report - main..53081
System
Hosts
Scenarios
StartupComparison Report - main..53081
System
Hosts
Scenarios
Developer Information: |
@typescript-bot perf test this faster |
Heya @jakebailey, I've started to run the abridged perf test suite on this PR at 2ab197b. You can monitor the build here. Update: The results are in! |
I ran the current state on my perf machine with DetailsLoading benchmark and comparing to baseline.
System
Hosts
Scenarios
|
src/compiler/types.ts
Outdated
@@ -109,6 +109,11 @@ export const enum SyntaxKind { | |||
BacktickToken, | |||
/** Only the JSDoc scanner produces HashToken. The normal scanner produces PrivateIdentifier. */ | |||
HashToken, | |||
/** | |||
* Only the special JSDoc comment text scanner produces JSDocCommentTextTokes. One of these tokens spans all text after a tag comment's start and before the next @ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just now noticing the parallel between this comment and the previous ones. Given people can call our scanner themselves, maybe I was wrong and this should be public so that no undocumented kinds are returned from token
?
@sandersn Here they are:
CompilerComparison Report - main..53081
System
Hosts
Scenarios
TSServerComparison Report - main..53081
System
Hosts
Scenarios
StartupComparison Report - main..53081
System
Hosts
Scenarios
Developer Information: |
Would you be able to investigate what's getting slower on every benchmark that's not getting a speed boost? |
Specifically parsing in Compiler-Unions gets slightly (but significantly!) slower. |
} | ||
indent += whitespace.length; | ||
Debug.assert(state !== JSDocState.SavingComments && state !== JSDocState.SavingBackticks, "whitespace shouldn't come from the scanner while saving comment text"); | ||
const whitespace = scanner.getTokenText(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This slices whitespace, then you slice out of the whitespace. Swap this to grab the beginning/end of the whitespace, calculate the length, and slice right out of the original sourceText
only if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I noted it and will try it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: #53121
Except for the textToToken map exemption
A profile of compiler-unions shows that parseTrailingComments is slower, and that scanJSDocCommentTextToken adds to the time previously taken by scanJSDocToken. That doesn't make sense, since those calls should be replacing calls to scanJSDocToken. I'm going to (1) re-run perf measurements on the most recent commit (maybe it's a fluke!) (2) step through comment parsing in compiler-unions to see if I can figure out if there's some problem there I'm missing. @typescript-bot perf test this |
Heya @sandersn, I've started to run the perf test suite on this PR at d7c2377. You can monitor the build here. Update: The results are in! |
My suspicion is that we're changing object allocation so much that GC is happening at a different time and causing other steps to take longer. But, I have no evidence for this besides |
Notably, with --predictable, the latest commit on my machine does not show a significant difference in total time for Compiler or Compiler-Unions. I stepped through a few example parses in compiler-unions and didn't see anything suspicious either. |
@sandersn Here they are:
CompilerComparison Report - main..53081
System
Hosts
Scenarios
TSServerComparison Report - main..53081
System
Hosts
Scenarios
StartupComparison Report - main..53081
System
Hosts
Scenarios
Developer Information: |
After the latest commit, compiler-unions parsing is not significantly different on node 18, a tiny bit slower on 16 and slower on 14. But node 14 also transfers half of xstate's parser speedup to the binder. I think we're seeing noise. At worst the code is better on most codebases and slightly worse on a few. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM at this point, assuming we're still happy with the performance characteristics.
src/compiler/parser.ts
Outdated
loop: while (true) { | ||
switch (tok) { | ||
case SyntaxKind.JSDocCommentTextToken: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a stupid nit, but I was tricked into thinking this token could be returned from token()
before any other token because it comes first in the list. Maybe I would have not misunderstood had the case not been added at the top, but, I don't think it actually matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better place for it? I put it at the top because I thought it would be the most common and--who knows--maybe switch performance is order dependent. Except that with the improved scanner, it's not really more common than any other token.
Probably down with default
is the best place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably would have stuck it at the bottom, but, I don't really mind either way. Just a silly brain thing from me trying to understand the state machine.
@typescript-bot perf test this |
Heya @sandersn, I've started to run the perf test suite on this PR at c91de88. You can monitor the build here. Update: The results are in! |
@sandersn Here they are:
CompilerComparison Report - main..53081
System
Hosts
Scenarios
TSServerComparison Report - main..53081
System
Hosts
Scenarios
StartupComparison Report - main..53081
System
Hosts
Scenarios
Developer Information: |
Works on #52959, more detail later