Markdown files with more than 500 lines become perceptibly slow #2916
Comments
You're editing source code in the markdown. We know that language injections are not implemented in the most efficient way (no incremental parsing). Can you identify whether the markdown parser or the language injection is the problem? |
@theHamsta, I created a new file with just 500 lines or so, and without any code blocks. Here's the result. So, I believe the markdown parser is likely the one causing the slowdown. markdown_slow_2.mp4 |
@medwatt the markdown parser can be used also from other editors do you have the time to check whether |
@theHamsta, this is the first time I am hearing of these. However, I installed the helix editor to test. I am having problems installing the markdown parser. According to the documentation, all I need to do is put the following in the languages.toml file in the config folder:
[[grammar]]
name = "markdown"
source = { git = "https://github.com/ikatyang/tree-sitter-markdown" }
Launching helix gives the following:
One thing I noticed though is the delay is even worse in helix. For example, holding down a key for a while prints out the characters after a much longer delay than neovim. With helix, however, there's no choppiness; it's smooth but takes longer. With neovim, the delay is shorter, but very choppy. |
That's the wrong parser, though: we use https://github.com/MDeiml/tree-sitter-markdown |
The question is how to get helix to install the parser? I have no idea how helix works. But as I said, the delay is much worse in helix with the same file already, so I don't think there's a point checking it further. |
@theHamsta, maybe you can shed some light on why parsing a markdown file is more intensive than parsing Lua, for instance, given that Lua's grammar is more complex than markdown's. I also use treesitter for verilog, and there is a noticeable delay when opening a verilog file for the first time, even when the file has only a few lines. |
The verilog parser is very complex: it takes a long time to generate, and the resulting parser is enormous. I would not be surprised if it's slow to parse. Lua, for example, always stays below the 2ms threshold that I've set on the build I'm experimenting with right now. Markdown is very complex to parse because it has no strict grammar and requires an external parser to keep track of the stack of all the nested pairs. |
I have long planned to use tree-sitter's timeout feature for parsing to guarantee that we don't stay in the parse+query cycle too long (and maybe do off-thread background parsing with the intermediate result in case of a timeout). With the tree-sitter timeout set I can type without lags (but of course I have not answered whether long parsing also implies longer querying afterwards). |
If you want, you can experiment with https://github.com/theHamsta/neovim/tree/nvtx You can set the parsing timeout here: https://github.com/theHamsta/neovim/blob/7d313d9395befb743aae2309633b78e160db8c68/src/nvim/lua/treesitter.c#L337 It will stop parsing (and also highlighting) when parsing takes too long. You will lose highlights from time to time, but typing is fast 😄 . This also solves the problem people have when opening files that are multiple MB large. Let's see why markdown is slower. |
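To make the timeout idea above concrete, here is a minimal sketch in Python. It is not the Neovim patch linked above and does not call tree-sitter; `parse_with_timeout`, `Highlighter`, and `FakeClock` are hypothetical names, and the "parser" is a toy loop over text blocks. It also assumes the editor keeps the previous tree when a parse times out, which is one possible policy (the experiment described above drops highlighting instead).

```python
import time
from typing import Callable, List, Optional, Tuple

Node = Tuple[str, str]  # (node type, source text): a toy stand-in for a syntax node


def parse_with_timeout(
    blocks: List[str],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[List[Node]]:
    """Parse block by block; return None once the time budget is exceeded."""
    deadline = clock() + timeout_s
    tree: List[Node] = []
    for block in blocks:
        if clock() > deadline:
            return None  # give up instead of blocking the editor any longer
        tree.append(("paragraph", block))  # placeholder for real parsing work
    return tree


class Highlighter:
    """Assumed policy: keep the last complete tree when a parse times out."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.tree: Optional[List[Node]] = None

    def on_edit(self, blocks: List[str]) -> bool:
        """Re-parse after an edit; return True if the tree was refreshed."""
        new_tree = parse_with_timeout(blocks, self.timeout_s, self.clock)
        if new_tree is None:
            return False  # stale highlights, but typing stays responsive
        self.tree = new_tree
        return True


class FakeClock:
    """Deterministic clock for experiments: advances `step` seconds per call."""

    def __init__(self, step: float) -> None:
        self.now = 0.0
        self.step = step

    def __call__(self) -> float:
        self.now += self.step
        return self.now
```

With `Highlighter(timeout_s=0.025, clock=FakeClock(0.01))`, a two-block document parses within budget while a three-block document times out and leaves the earlier tree in place.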
@theHamsta, thanks for doing these tests. You mentioned Here's a screenshot of some file of mine being highlighted correctly. Here's the same section when I start scrolling. This doesn't always happen, so it's not easy to reproduce. It happens from time to time, and my current solution is to restart neovim. Can you say what might be causing this issue? |
This seems to be a different issue. I've also seen it when experimenting with the timeout. Some extmarks got out of sync, but that is something Neovim does wrong, not tree-sitter taking a long time parsing. @medwatt by default tree-sitter has an unlimited amount of time to finish its parsing; to enable a timeout you have to edit the source code of Neovim. It could become a feature in the future to protect ourselves from slow parsers or very big files. A flamegraph of one session where I pressed the same key within two different link regions (it goes really slow at the second) It seems to spend some time in to reproduce add a few |
@theHamsta, I think, for a temporary solution, it would be a good idea to expose the option to set a custom timeout for slow parsers. |
This is a really weird bug. I don't think it's a problem with my parser, since it does not appear when just parsing the document as a whole. It appears to only happen with incremental parsing. Also I noticed that if I stop holding down d (letting my computer catch up) and then start again it does not slow down again. I also don't think it's a problem with tree-sitter, because then the surrounding document should not have any influence on parsing speed (only the stack at the current position). But if I delete all the other paragraphs I don't get any slowdown. It's also not a problem with language injections (if I remove the injection queries, I still get the same effect). Rather it's probably something to do with highlighting (after disabling tree-sitter highlighting the problem disappears). But weirdly, if I disable all markdown highlight queries the problem still appears. If I had to guess I would say it's a problem with neovim, but that's just a hunch. |
@MDeiml no, it is not Neovim or the highlighting. It is tree-sitter doing the parsing. It consumes almost all of the time, with queries and injections negligible parser_parse is just invoking but sure, Neovim could handle this in a better way |
I think the way Atom handles this is that it lets the parsing time out while doing the parsing in a background thread that can be canceled by the foreground thread as soon as the foreground thread wants to parse again. I think every call to |
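The cancel-from-the-foreground pattern described above can be sketched as follows. This is an illustration under assumptions, not Atom's or Neovim's code: `BackgroundParser` and `cancellable_parse` are hypothetical names, the parser is a toy loop, and cancellation is modeled with a `threading.Event` that the worker polls between blocks.

```python
import threading
from typing import List, Optional, Tuple

Node = Tuple[str, str]  # (node type, source text): toy stand-in for a syntax node


def cancellable_parse(blocks: List[str],
                      cancel: threading.Event) -> Optional[List[Node]]:
    """Toy parser that polls a cancellation flag between blocks."""
    tree: List[Node] = []
    for block in blocks:
        if cancel.is_set():
            return None  # a newer edit superseded this parse
        tree.append(("paragraph", block))  # placeholder for real parsing work
    return tree


class BackgroundParser:
    """Foreground requests parses; each new request cancels the one in flight."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cancel: Optional[threading.Event] = None
        self._thread: Optional[threading.Thread] = None
        self._generation = 0
        self.tree: Optional[List[Node]] = None

    def request_parse(self, blocks: List[str]) -> None:
        with self._lock:
            if self._cancel is not None:
                self._cancel.set()  # tell the in-flight parse to stop
            cancel = threading.Event()
            self._cancel = cancel
            self._generation += 1
            generation = self._generation

        def work() -> None:
            result = cancellable_parse(blocks, cancel)
            with self._lock:
                # Publish only if no newer request arrived in the meantime.
                if result is not None and generation == self._generation:
                    self.tree = result

        self._thread = threading.Thread(target=work, daemon=True)
        self._thread.start()

    def wait(self) -> None:
        """Block until the most recently requested parse has finished."""
        if self._thread is not None:
            self._thread.join()
```

The generation counter is a design choice of this sketch: it guarantees that a parse which finished late cannot overwrite the tree produced for a newer edit.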
Still, the problem does not appear when highlighting is disabled, and does appear when highlighting is enabled. (For both tests I left TSPlayground open to verify that the document does actually get parsed). Within Could this be the culprit? |
The
|
@theHamsta Could you maybe record a flamegraph of inserting some ds, letting neovim catch up with work, and then inserting some more? For me it doesn't hang the second time, so there should be some difference. (Sry for bothering, but you seem to have a nice profiling setup :)) |
My profiling setup is not so great at the moment: Intel VTune crashes whenever it tries to finalize the result, and it would be the best tool for filtering certain time periods out of perf traces (I will try on a different machine). I couldn't see anything fundamentally different in the instances when it takes longer, which for me seems to depend mostly on document position. |
@maxbrunsfeld do you have any advice on how to debug this? We have the problem that for https://github.com/MDeiml/tree-sitter-markdown incremental parsing can take a long time, 40ms-60ms, see (cold start parsing takes 30ms), which causes a lag in the editor, as keys can be fed at a faster rate. The edits are in each case single letters, entered by just pressing one key in the README of our repo. Since the largest fraction of the time is spent in |
Maybe it would be good to reproduce this programmatically using the tree-sitter rust API: parsing the text once and then do edits to understand what's going on (profiling or with debugger attached) |
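A rough shape for such a reproduction harness, written in Python rather than against the tree-sitter Rust API so that it stays self-contained: `replay_edits` and `toy_parse` are hypothetical names, and the `parse(text, old_tree)` callable is a placeholder where a real binding's parse call (plus the corresponding tree edit) would go.

```python
import time
from typing import Any, Callable, List, Optional, Tuple

# Assumed signature: parse(text, previous_tree_or_None) -> new_tree
ParseFn = Callable[[str, Optional[Any]], Any]


def replay_edits(
    parse: ParseFn,
    text: str,
    edits: List[Tuple[int, str]],
) -> Tuple[str, List[float]]:
    """Parse once, then apply single insertions and time each re-parse.

    `edits` is a list of (offset, inserted_text) pairs applied in order.
    Returns the final text and the per-edit re-parse durations in seconds.
    """
    tree = parse(text, None)  # cold-start parse, not included in the samples
    samples: List[float] = []
    for offset, inserted in edits:
        text = text[:offset] + inserted + text[offset:]
        start = time.perf_counter()
        tree = parse(text, tree)  # a real parser would reuse `tree` here
        samples.append(time.perf_counter() - start)
    return text, samples


def toy_parse(text: str, old_tree: Optional[Any]) -> List[str]:
    """Stand-in parser: one node per blank-line-separated block."""
    return text.split("\n\n")
```

Calling `replay_edits(toy_parse, document, [(offset, "d")] * 20)` mimics holding down one key at a fixed position; comparing the samples across different offsets would show whether the slowdown depends on document position.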
I came across this issue after asking a query on /r/neovim. It's the same issue described by @medwatt in the first comment. I've enabled
|
Remove the
Are you sure they're actually executed? They will show up even if they're skipped by |
That's disappointing. It reminds me of this post on undeadly.org about markdown. I'm not sure if neorg and its treesitter parser can handle large documents without introducing input latency in the terminal. If not, I'll probably switch to writing articles in HTML. Thanks! |
@ayushnix I'm sure the problems with markdown input latency can be solved by a timeout for tree-sitter parsing. It was working smoothly when I added the timeout (except that highlighting was lost sometimes, because there is no code yet for switching to background parsing or for reusing the previous parsing result). We're talking about max 42ms during incremental parsing, which is slow enough to build up latency when you type multiple letters at once, but still manageable for an editor that has to provide the highlighting. In other words: it's too slow for "on every keystroke", but fast enough to catch up once it has moved to a background parse after reaching the timeout. The problem we're experiencing here is that after a fast input of 5 letters, we experience 5 times the parsing latency, while with a timeout it would be possible to cancel the parses for the first 4 letters and finish at the last letter with a background thread. Usually, the 5 incremental parses should go really fast, as the parser state should not have changed much. But even when that does not work, the editor should guard itself against excessive parsing times. There is no fundamental limitation that makes a Markdown parser slow. It's just that nested pairs of delimiters are difficult to express with tree-sitter and almost always require an external parser that can count the nesting state. You can test whether https://github.com/ikatyang/tree-sitter-markdown has the same limitation. It's also possible that the parser of @MDeiml has some properties that make the incremental parsing logic fail to build efficiently on the previous result. |
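The "5 letters, 5 times the latency" argument above can be checked with a little arithmetic. The sketch below uses the 42ms figure from the comment; the 10ms gap between keystrokes is an assumed value for illustration, and the model assumes that a cancelled background parse stops instantly. Function names are hypothetical.

```python
from typing import List, Tuple


def synchronous_finish_times(arrivals_ms: List[int], parse_ms: int) -> List[int]:
    """Every keystroke triggers a blocking parse; parses queue up one by one."""
    finish: List[int] = []
    free_at = 0  # time at which the editor is next able to start a parse
    for t in arrivals_ms:
        start = max(t, free_at)
        free_at = start + parse_ms
        finish.append(free_at)
    return finish


def coalesced_finish_times(arrivals_ms: List[int],
                           parse_ms: int) -> List[Tuple[int, int]]:
    """Each keystroke cancels the in-flight background parse and starts a new one.

    Returns (keystroke index, finish time) for the parses that run to completion.
    """
    done: List[Tuple[int, int]] = []
    for i, t in enumerate(arrivals_ms):
        is_last = i == len(arrivals_ms) - 1
        if is_last or arrivals_ms[i + 1] >= t + parse_ms:
            done.append((i, t + parse_ms))
    return done


arrivals = [0, 10, 20, 30, 40]  # assumed: five keystrokes, 10 ms apart
blocking = synchronous_finish_times(arrivals, 42)
coalesced = coalesced_finish_times(arrivals, 42)
```

Under these assumptions the blocking editor finishes its fifth parse at 210ms, 170ms after the last keystroke, while the coalescing editor completes only the last parse, at 82ms, which is one parse latency after the last keystroke.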
@ayushnix can you provide some evidence that the injections have any effect at all? With neovim/neovim#18761 you can visualize what fraction of the latency is caused by markdown parsing and what by the injections. In the document I tested, the latency came purely from the markdown parser, with injections causing only a negligible fraction of the whole incremental parsing time. |
I'm actually experimenting at the moment to see if I can get this faster. This would include optionally parsing only the inlines that are visible (parsing inlines only depends on the inlines in the same block and not on other blocks) and a few changes around paragraphs, which are kinda important and really slow at the moment. But if I can get something faster to work it's gonna take a while, since it probably needs some features in upstream neovim. |
But I'm quite confident that I should be able to get this at least somewhat fast, since parsing the block structure could probably be done pretty fast because it's well defined; it's mainly inlines like links and emphasis that make the parser slow. I should be able to split the two. |
I think I found the cause of this issue. I think tree-sitter has problems with reusing trees that are very "flat", i.e. trees where most nodes have a lot of siblings. This is not a problem with usual programming languages, since they're often structured hierarchically, but with markdown (as it is now) most nodes are children of the root node. When parsing a file after introducing some edits, all siblings of nodes that changed are also parsed again. I'm not sure why, maybe @maxbrunsfeld could give some insights? I was able to solve this by just introducing more hierarchy artificially. More concretely I added a I might try later to get a quick fix in this way for the current version, but as I said I'm currently working on rewriting the parser so I'd rather work on that. |
A quick fix would be very much appreciated, since the rewrite sounds like something we can't just drop in in place of the current one (needing substantial infrastructure work to support such "split parsers"). Of course, I understand that this is much less interesting work ;) |
A quick fix would probably be to let nvim time out long parses. We will always face the situation that parsing is slow when the file is too big (at least for initial parsing). Although a change in Neovim might not be that quick. |
We'll never know until someone puts a PR for it up for discussion... |
I tried to implement the fix on the main branch, but I didn't get the same speedup. Not sure why. |
Well, I guess the "how" of the implementation is the thing that's taking some time... Maybe I'll find some time tomorrow for it. There are quite a few ways to deal with this, and I don't know which is best either until I've tried them out. |
Nvm there was a hidden error and I was getting garbage data. |
tree-sitter/tree-sitter-haskell#41 (comment) It seems that my previous comment about hierarchical structure was the right hunch. Reducing conflicts should be the main priority for slow parsers, but "sectioning off" the conflicts seems to work as well. Unfortunately neither is possible for inline markdown elements like emphasis. |
I think it must be something more specific than that; otherwise it would reproduce in, for example, a large C file with hundreds of small functions, since those functions would all be sibling nodes. I'm curious what's going on, and I'll try to reproduce the slowness with the |
Ok, I can reproduce the problem from the command line. I believe the problem is a certain conflict in your grammar, between
I determined this by creating a small markdown file,
a b
c d
e f
g h
i j
I then parsed this file from the command line with debug graphs enabled:
tree-sitter parse test.md -D
This creates a long sequence of SVG graphs. In this graph, you can zoom in on a particular point, when the parser reaches the end of a paragraph, and see that the parse stack splits into two branches:
Any node that is created in an ambiguous state like this is considered fragile - it cannot be reused during incremental parsing if any of its contents have changed. In this case, the ambiguity is still in effect while the
To observe the performance impact of this ☝️ more directly, you can perform an edit and an incremental re-parse at the command line, inserting a character on line 4 (the third paragraph).
It re-parses correctly, but if you run with
@MDeiml Can you think of a way to not have this conflict with |
Thank you! I have a fix for this conflict in paragraphs where I parse ahead quite a bit to determine if a newline is a soft line break. This means that a lot of paragraphs can now be reused. But a similar problem now appears with emphasis, which appears in a lot of paragraphs as a top-level inline node. I'm not sure it's possible to parse those without conflicts, as that would require potentially infinite lookahead. But maybe it's possible to create a "fast path" for the most common use case of no nested inlines. |
I think it's probably ok for emphasis to have that conflict, since most (all?) top-level nodes in the document are not emphasis nodes. |
That's true, but pretty much every top level node has children that cannot be reused, which means that parsing is still slow in very very large documents, though I can get it to very acceptable speeds for e.g. the README for this repo. I have a question though, shouldn't it be possible to reuse fragile trees (whole trees not nodes) if all edits were outside their range set with I am currently working on a version of this parser where inline elements (emphasis) and block elements (paragraphs) are split into two grammars. This means that every inline range is parsed separately. I noticed that almost all of the inline ranges are not reused, which makes sense as most contain emphasis and are thus fragile. But all that needs to be done is to shift the node positions, so I'd be keen to just not reparse the unaffected trees. |
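The "just shift the node positions" idea for unaffected inline ranges can be illustrated with a small sketch. This is not tree-sitter's or Neovim's implementation; `Range` and `shift_ranges` are hypothetical names, offsets are plain byte offsets, and ranges that merely touch the edit are conservatively treated as needing a re-parse.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Range:
    start: int  # inclusive byte offset
    end: int    # exclusive byte offset


def shift_ranges(
    ranges: List[Range],
    edit_start: int,
    old_end: int,
    new_end: int,
) -> Tuple[List[Range], List[Range]]:
    """Split ranges into (reusable, needs_reparse) after a single edit.

    The edit replaced bytes [edit_start, old_end) with text that now ends at
    new_end. Reusable ranges are returned in post-edit coordinates;
    ranges that need a re-parse are returned in their old coordinates.
    """
    delta = new_end - old_end
    reusable: List[Range] = []
    needs_reparse: List[Range] = []
    for r in ranges:
        if r.end < edit_start:
            reusable.append(r)  # entirely before the edit: unchanged
        elif r.start > old_end:
            # Entirely after the edit: same content, shifted offsets.
            reusable.append(Range(r.start + delta, r.end + delta))
        else:
            needs_reparse.append(r)  # overlaps or touches the edit
    return reusable, needs_reparse
```

For example, inserting one character at offset 15 (`edit_start=15, old_end=15, new_end=16`) into ranges `[0, 10)`, `[12, 20)`, `[22, 30)` keeps the first as is, marks the second for re-parsing, and shifts the third to `[23, 31)`.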
There's a lot of activity in this thread. @MDeiml are you able to provide a status update? Especially now that neovim/neovim#22309 is merged. |
Sure! The split parser I mentioned is implemented and has been used in neovim for quite a while. This helped a lot in other editors, but neovim had some problems with injected languages, which were improved in neovim/neovim#22309 |
Note that we also recently bumped the tree-sitter version in our official builds to include the vastly improved error handling; I suspect that this also helps quite a bit in this regard. So a test on the latest nightly with one of the bad example files would be very much appreciated. |
Describe the bug
I noticed recently when editing a large markdown file that has more than 1000 lines that the delay between keystrokes while typing becomes perceptible. This only happens when treesitter for markdown is enabled.
Here's a video demonstrating the difference in the typing experience between an empty file and a large file.
markdown_slow.mp4
To Reproduce
Expected behavior
It is expected that there should be no lag when typing irrespective of the number of lines in the file.
Output of
:checkhealth nvim-treesitter
Output of
nvim --version
Additional context
No response