Very slow *editing* of large files when tree-sitter is used #3072
Comments
Author of the markdown grammar here. It seems that at the moment language injection cannot be parsed incrementally. The markdown grammar uses injection a lot so it becomes quite apparent for it. This is what I get from line 846 here: helix/helix-core/src/syntax.rs Lines 841 to 848 in 8681fb6
Notice how a layer can only be reused if its ranges did not change. But ranges often do change when editing a file, so the test here could probably be laxer.
I didn't look too deep, but (from my knowledge of tree-sitter) any layer should be fine to "reuse" here as long as none is reused twice. So something akin to
should work. |
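For readers following along, here is a rough, self-contained sketch of the idea described above. It is not the snippet from the comment and not Helix's actual code: `Layer`, `match_layers`, and the `lax` flag are hypothetical names made up for illustration. Strict matching reuses an old layer only when language and ranges are identical; lax matching falls back to any not-yet-taken old layer of the same language, so no layer is reused twice.

```rust
use std::ops::Range;

/// Hypothetical stand-in for an injection layer (not Helix's real type).
#[derive(Debug, Clone, PartialEq)]
struct Layer {
    language: &'static str,
    ranges: Vec<Range<usize>>,
}

/// For every newly discovered injection, return the index of an old layer
/// that may be reused, or `None` if a fresh layer is needed.
/// Each old layer is handed out at most once.
fn match_layers(
    old: &[Layer],
    new: &[(&'static str, Vec<Range<usize>>)],
    lax: bool,
) -> Vec<Option<usize>> {
    let mut taken = vec![false; old.len()];
    new.iter()
        .map(|(lang, ranges)| {
            // Prefer a layer whose language and ranges both match exactly.
            let exact = old.iter().enumerate().position(|(i, layer)| {
                !taken[i] && layer.language == *lang && layer.ranges == *ranges
            });
            // In lax mode, fall back to any unused layer of the same language.
            let found = exact.or_else(|| {
                if lax {
                    old.iter()
                        .enumerate()
                        .position(|(i, layer)| !taken[i] && layer.language == *lang)
                } else {
                    None
                }
            });
            if let Some(i) = found {
                taken[i] = true;
            }
            found
        })
        .collect()
}
```

With strict matching, a single changed range forces a fresh layer; with lax matching the old layer is kept as a starting point for an incremental parse.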
We actually follow tree-sitter-cli/atom logic to update all the markers according to user edits beforehand: https://github.com/helix-editor/helix/blob/8681fb6d9ee06610fac2baafb4ca34f93f9c7f4d/helix-core/src/syntax.rs#L644-L726= So the injection layers are tracked reliably. We also seem to only use HTML injections & code blocks: https://github.com/helix-editor/helix/blob/master/runtime/queries/markdown/injections.scm |
Ah ok, I guess you are still using the old version of my parser then (the |
Ah, good point! I'll see if I can update us to the new branch 👍 |
On an 8.4MiB markdown file (I keep daily notes in a single big document), helix 22.05 takes > 10s to open it and uses 1GiB of memory after doing so. I'm assuming this is tree-sitter related and the same issue described here, but let me know if you think not. |
The following script generates a ~5MiB file that takes >5 seconds to open with Helix. After opening, memory usage (reported via ps_mem) is ~660MiB.
#!/bin/sh
for i in $(seq 10000); do
cat <<EOF >> big_markdown.md
# Title
## Subtitle
Some notes about things.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore
eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt
in culpa qui officia deserunt mollit anim id est laborum
* A list
* With a sub-item
* Item 2
* Item 3
* Item 4
* Item 5
* Item 5.1
EOF
done |
@asb if you disable grammar for markdown does that happen? |
@kchibisov I'm not sure what option to change to disable the grammar (I was trying out helix for the first time). Certainly, if |
I think the simplest way to fix this is to only run the grammar for injected languages if they are visible (i.e. run it lazily). Markdown is split into two grammars: block and inline. The block-level grammar is quite fast and produces the ranges on which the inline-level grammar is injected. So the computational cost mostly grows with the amount of inline content parsed. I'd guess that the memory cost is also from the parsed inline-level trees, so that problem would also be solved in the same way, but I'd have to check that. |
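A minimal sketch of what "run it lazily" could look like, assuming the block-level parse has already produced the byte ranges for inline content. Everything here is hypothetical (`LazyInjections`, `ensure_visible`, and the `String` used as a stand-in for a parsed inline tree); it only illustrates that work and memory then scale with what has been scrolled into view rather than with the whole file.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical cache of inline-level parses, keyed by the start byte of
/// each injection range produced by the block-level grammar.
struct LazyInjections {
    ranges: Vec<Range<usize>>,
    /// Stand-in for parsed inline trees; a real editor would store trees here.
    parsed: HashMap<usize, String>,
    /// Counts how often the (expensive) inline parse was actually run.
    parse_calls: usize,
}

impl LazyInjections {
    fn new(ranges: Vec<Range<usize>>) -> Self {
        Self { ranges, parsed: HashMap::new(), parse_calls: 0 }
    }

    /// Parse only those injections that overlap the viewport and are not cached yet.
    fn ensure_visible(&mut self, text: &str, viewport: Range<usize>) {
        for r in &self.ranges {
            let overlaps = r.start < viewport.end && viewport.start < r.end;
            if overlaps && !self.parsed.contains_key(&r.start) {
                // Placeholder for running the inline parser on this range.
                if let Some(slice) = text.get(r.clone()) {
                    self.parsed.insert(r.start, slice.to_string());
                    self.parse_calls += 1;
                }
            }
        }
    }
}
```

A real implementation would also need to invalidate cached entries when an edit touches their range, which this sketch leaves out.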
I'm not sure how best to implement something like this, though. Currently the parsing is handled by the |
Having trouble when opening rust.org.gz (3000 lines) |
I think that this could build upon the work in #2857 to keep track of the nodes and lazily highlight injections. |
Just wanted to mention #4146 here. Maybe that fixed some problems, but probably not all of them. |
That indeed improved the situation, and editing markdown is at least possible with tree-sitter. Large files still blow up helix, leading to seconds of input lag. Not sure where it should be tracked. |
With The machine I am working on is a MacBook Pro M1 with 16GB of RAM. |
Wondering if #6218 fixed this |
It probably fixed or improved viewing the file, but insert mode will still be just as slow on large files. |
Yeah both of these are an issue. You're parsing a very large file but performance is also not optimized on long lines. Or at least it wasn't before but maybe @pascalkuthe's rewrite improved things slightly. |
Right now it shouldn't make much of a difference. However, performance for huge lines will be improved in a future follow-up PR to #5420, where I will implement line chunking. This was omitted from the MVP, but the architecture lends itself well to this optimization (and was designed with it in mind; I even had a mostly finished implementation, but the changes were accidentally deleted by an overwrite from a second helix instance). |
My original issue with markdown is resolved, but I think I'll leave a real-world example of a big file that is well formatted and not some dump from a logging utility, since I mentioned a C++ file in the original report but never showed it. This is the file I originally referred to in my C++ remark: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/X86/X86ISelLowering.cpp The |
I use Still happening with 23.03, for what it's worth. |
Summary
Certain markdown files are very slow to edit, in particular when they have a lot of lists.
For example, the CHANGELOG.md file from the helix repo is enough to trigger it. The problem is likely in the grammar here, since neovim with tree-sitter works fine on it.
This particular problem could be solved with a slightly optimized but less accurate grammar. However, the issue manifests itself on any large file of any other type; it just requires a much larger file than ~600 LoC (I've used a 50K LoC C++ file that I have around).
To avoid delay and input lag in those particular scenarios, it would make sense to either run tree-sitter only on a part of the buffer (I'm not familiar with how it works) or update it after typing has ended. Right now there is a latency of about 2 seconds on a 50K-line C++ file, which is impossible to work with.
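The "update it after typing has ended" option is essentially a debounce. A minimal sketch, assuming a hypothetical `ReparseDebouncer` that the editor's event loop would poll; none of these names exist in Helix, and the delay value is up to the caller:

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: defers a reparse until the user has stopped
/// typing for at least `delay`.
struct ReparseDebouncer {
    delay: Duration,
    /// Time of the most recent edit that has not been reparsed yet.
    last_edit: Option<Instant>,
}

impl ReparseDebouncer {
    fn new(delay: Duration) -> Self {
        Self { delay, last_edit: None }
    }

    /// Record an edit; this restarts the quiet period.
    fn on_edit(&mut self, now: Instant) {
        self.last_edit = Some(now);
    }

    /// Returns true exactly once per burst of edits, after the quiet period has elapsed.
    fn should_reparse(&mut self, now: Instant) -> bool {
        match self.last_edit {
            Some(t) if now.duration_since(t) >= self.delay => {
                self.last_edit = None;
                true
            }
            _ => false,
        }
    }
}
```

The trade-off is that highlighting goes briefly stale while typing, in exchange for keystrokes never waiting on the parser.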
Reproduction Steps
Try to edit CHANGELOG.md from the helix repo or any large file.
Helix log
No response
Platform
Linux
Terminal Emulator
alacritty
Helix Version
helix 22.05 (3cced1e)