Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spell check large file blocks all other extensions #1

Closed
rebornix opened this issue Apr 5, 2017 · 4 comments
Closed

Spell check large file blocks all other extensions #1

rebornix opened this issue Apr 5, 2017 · 4 comments

Comments

@rebornix
Copy link

rebornix commented Apr 5, 2017

Hey Bartosz, our PM pointed me to an awesome spell checking extension on the marketplace and the amazing part is we already knew each other :) I played with your extension, it's pretty fast and the results are really reasonable (aka I really like it , even though I need to run node-gyp rebuild myself 😃, tada).

We talked about async interface in microsoft/vscode#20266 (comment) and at that time I think it's not a big deal as we can use setTimeout to make spellchecker's api async. But this doesn't mitigate the performance issue.

  • In your current implementation, we call spellcheck.checkspelling directly, which is a synchronous function. It means this onDidTextDocumentChange handler is synchronous, when nodejs is running this function, no other events will be handled in the event loop.
  • Then let's say we put it into setTimeout or whatever delaying function, this onDidTextDocumentChange handler is finished immediately, which is good. However when nodejs picks up the spellcheck.checkspelling, it still blocks all other events.

So this time let me describe it more precisely, if the spellchecking api is running in the main thread (the event loop), the main thread is blocked. For small documents, it's okay as spellchecking is pretty fast. But for medium to large files (eg, https://github.com/Microsoft/vscode-docs/blob/master/release-notes/v1_8.md ), it takes several seconds to finish the spellchecking, then none other extension is reactive.

A proper reproduce step:

To mitigate this issue and the node-gyp rebuild problem, I tried two ways

You may want to go with the first option or similar solutions to solve the node-gyp problem and even better not block the extension host :)

Lastly I'm thinking about how to ignore ranges in a document, your approach is really inspiring.

@bartosz-antosik
Copy link
Owner

Hello Peng!

Thanks for kind words about the extension. It is something between a Lab Rat and Proof of Concept. Also it reflects my vision of spell checker - light, unobtrusive with as little setup as possible (well, I don't mean node-gyp rebuild of course).

So I have to agree and disagree. First I agree with the obvious - spelling takes about O(n) time. So it's only a matter of finding document large enough to prove it. Delay that you experience does not happen in onDidTextDocumentChange event. It happens in onDidChangeVisibleTextEditors. It is one of two places (along with onDidSaveTextDocument) where the WHOLE document is spelled. I found no other place to do initial spelling because there is some problem with onDidOpenTextDocument (document parameter of the event contains strange data like languageID == 'plaintext' no mater what type of document it refers to, text size is always 1 character etc. I should maybe report it BTW.) But the whole thing leads to the other thing that I have to disagree about.

I am pretty confident that time spent in onDidTextDocumentChange does not block the event loop. It does very little there - spelling at most the number of lines that have been changed (+ 1) by the chain of edits passed as parameter.

But I think have to disagree with the solutions proposed. What the whole situation calls for is:

onIdle event!

Having spelling asynchronous in either way you propose could lead to a racing condition - the document could be changed by user/extension while it is being spell checked. It's impossible to handle as the onDidTextDocumentChange would be happening asynchronously to the spelling process.

If there would be an onIdle event it would all magically fit at once!

Spelling inside onIdle would be limited to, say, a word, n words or a line of text. Spelling would start on it own once any other events are done processing and it would be continued in subsequent onIdle events as long as there is what to spell. All done in amortized time, consuming bit more CPU when there is what to spell, nothing later. That would not block messaging loop at all. On the other hand - if there would happen an edit in the mean time then the onDidTextDocumentChange would do exactly what it does now - see if the edits are inside spelled area or ahead of it, decide whether to adjust diagnostics if the edit inserts/deletes something in-between already spelled area etc.

What do you think?

Side Notes:

  1. Running spellchecking in a command line tool still leads to the situation where binaries for at least three platforms have to be maintained. Still far more to deal with than simply rebuild node-spellchecker with VSCode.

  2. What do you mean by "Besides I put it inside VS Code." regarding asynchronous node spellchecker?

  3. You can turn off filling in suggestions at spelling time which speeds spelling WHOLE document a lot (although less than what I have observed earlier so it may depend on what is the average number of suggestions per word, it may be useful to calculate & display it) see:

suggestionsInHints: false

Spell check start of "c:\Users\ban.necne\Documents\tex\v1_8.md" [markdown].
Spell check completed in 1.291s, found 273 errors.

suggestionsInHints: true

Spell check start of "c:\Users\ban.necne\Documents\tex\v1_8.md" [markdown].
Spell check completed in 3.696s, found 273 errors.

  1. Event informing about cursor jumps would be useful (maybe a parameter in onDidTextDocumentChange?). I really don't like what most spell checkers do when they start underlining as a mistake every word that is longer than 3 characters no matter if it's just the beginning of the word being typed correctly. So this idea of spelling the "trail" of edits instead of spelling everything that is being typed. It complicates things but seems nicer. But the problem are cursor jumps. If you jump away last word typed is not spelled until you edit something somewhere. Would be more convenient to spell this "trail" on cursor jump because sometimes you jump around the recently edited place.

  2. I was also thinking about setInterval as something which could amortize the time, but it would clash with the edits same way as running spelling asynchronously would.

  3. I know nearly nothing about TypeScript/JavaScript & node.js and it's asynchronous model but I (think) I know asynchronous programming quite well in other environments so please do not laugh at me if I am very off.
    Hello Peng!

Thanks for kind words about the extension. It is something between a Lab Rat and Proof of Concept. Also it reflects my vision of spell checker - light, unobtrusive with as little setup as possible (well, I don't mean node-gyp rebuild of course).

So I have to agree and disagree. First I agree with the obvious - spelling takes about O(n) time. So it's only a matter of finding document large enough. Delay that you experience does not happen in onDidTextDocumentChange event. It happens in onDidChangeVisibleTextEditors. It is one of two places (along with onDidSaveTextDocument) where the WHOLE document is spelled. I found no other place to do initial spelling because there is some problem with onDidOpenTextDocument (document parameter of the event contains strange data like languageID == 'plaintext' no mater what type of document it refers to, text size is always 1 character etc. I should maybe report it BTW.) But the whole thing leads to the other thing that I have to disagree about.

I am pretty confident that time spent in onDidTextDocumentChange does not block the event loop. It does very little there - spelling at most the number of lines that have been changed (+ 1) by the chain of edits passed as parameter.

But I think I have to disagree with the solutions proposed. What the whole situation calls for is:

onIdle event!

Having spelling asynchronous in either way you propose could lead to a racing condition - the document could be changed by user/extension while it is being spell checked. It's impossible to handle as the onDidTextDocumentChange would be happening asynchronously to the spelling process.

If there would be an onIdle event it would all magically fit at once!

Spelling inside onIdle would be limited to, say, a word, n words or a line of text. Or, best way, until there is new message present (method to test?) Spelling would start on it own once any other events are done processing and it would be continued in subsequent onIdle events as long as there is what to spell. All done in amortized time, consuming bit more CPU when there is what to spell, nothing later. That would not block messaging loop at all. On the other hand - if there would happen an edit in the mean time then the onDidTextDocumentChange would do exactly what it does now - see if the edits are inside spelled area or ahead of it, decide whether to adjust diagnostics if the edit inserts/deletes something in-between already spelled area etc.

What do you think?

Side Notes:

  1. Running spellchecking in a command line tool still leads to the situation where binaries for at least three platforms have to be maintained. Still far more to deal with than simply rebuild node-spellchecker with VSCode.

  2. What do you mean by "Besides I put it inside VS Code." regarding asynchronous node spellchecker?

  3. You can turn off filling in suggestions at spelling time which speeds spelling WHOLE document a lot (although less than what I have observed earlier so it may depend on what is the average number of suggestions per word, it may be useful to calculate & display it) see:

suggestionsInHints: false

Spell check start of "c:\Users\ban.necne\Documents\tex\v1_8.md" [markdown].
Spell check completed in 1.291s, found 273 errors.

suggestionsInHints: true

Spell check start of "c:\Users\ban.necne\Documents\tex\v1_8.md" [markdown].
Spell check completed in 3.696s, found 273 errors.

  1. Event informing about cursor jumps would be useful (maybe a parameter in onDidTextDocumentChange?). I really don't like what most spell checkers do when they start underlining as a mistake every word that is longer than 3 characters no matter if it's just the beginning of the word being typed correctly. So this idea of spelling the "trail" of edits instead of spelling everything that is being typed. It complicates things but seems nicer. But the problem are cursor jumps. If you jump away last word typed is not spelled until you edit something somewhere. Would be more convenient to spell this "trail" on cursor jump because sometimes you jump around the recently edited place.

  2. I was also thinking about setInterval(...) as something which could amortize the time, but it would clash with the edits same way as running spelling asynchronously, I suppose, would.

  3. I know nearly nothing about TypeScript/JavaScript & node.js and it's asynchronous model but I (think) I know asynchronous programming quite well in other environments so please do not laugh at me if I am very off.

@rebornix
Copy link
Author

rebornix commented Apr 6, 2017

Such a long reply, thank you thank you. Even though you mentioned agree and disagree but I found we were thinking about almost the same issues. Let's go through it one by one.

Perf
We did the same thing about onDidTextDocumentChange, only spell check changed lines incrementally. My test results are single digital ms for most proper lines. Sync method should be faster as there is no cost of creating threads. Yes I agree it doesn't block the event loop of extensin host. And since it's in extension host, we can say it's perfect.

Spell checking the whole document is where might block. 2-3 seconds on a single document is not a big deal sometimes but it becomes worse when users are viewing/debugging/searching in files.

onIdle
Sorry I don't any way of getting this event in nodejs. I don't know if it can solve the problem perfectly. Imagine you have a eslint or whatever linter that has a file watcher, I don't know if there is enough idle time.

But postponing spellchecking to idle phase is similar to multithread, it still has racing issue. Let's say we are running spellchecking when nodejs event loop thinks it's idle, and a content change event occurs. This content change event deletes several lines before the line we are spellchecking. It means after getting the spell checking result, we don't know the correct line number of spelling errors, because the content change event is still waiting in event loop, we actually don't know if it exists or not ---> we don't if the model is dirty.

racing
You already provided a solution to solve racing problem

see if the edits are inside spelled area or ahead of it, decide whether to adjust diagnostics if the edit inserts/deletes something in-between already spelled area etc

that's exactly what I did right now, it works reasonably. We can do the same thing for the time-costing spell checking event for the whole document

  1. run it in a separate process/thread/whatever, cache a reference to it.
  2. whenever there is content change, save the position diff
  3. when the long spellchecking process finishes, calculate the correct positions based on MisspelledRanges and position diffs we save on step 2.

Conclusion

I think we have a good agreement that we need to somehow make the first spell checking process fast, non-blocking and safe. I'm pretty close to solve the racing problem in a proper way. Once I finish that I'll share it with you and see if it's possible to adopt part of them in this extension. I'd like to find a balance between async/multithreads and sync.

@bartosz-antosik
Copy link
Owner

It's been changed to do spelling in Idle time. Should not block extension host's main thread in the way that @rebornix has described (vim mode + spelling large document).

@bartosz-antosik
Copy link
Owner

@rebornix In this very long answer (for which I am sorry, but it gathers few important things I was thinking about, so it is in parts for me) I have asked one question, which may have got unnoticed:

What do you mean by "Besides I put it inside VS Code." regarding asynchronous node spellchecker?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants