Markdown files with more than 500 lines become perceptibly slow #2916

Open
medwatt opened this issue May 7, 2022 · 56 comments
Labels
bug Something isn't working performance

Comments

@medwatt

medwatt commented May 7, 2022

Describe the bug

I noticed recently, when editing a large markdown file that has more than 1000 lines, that the delay between keystrokes while typing becomes perceptible. This only happens when treesitter for markdown is enabled.

Here's a video demonstrating the difference in the typing experience between an empty file and a large file.

markdown_slow.mp4

To Reproduce

  1. Create a large markdown file with many code blocks (how large the file is would probably depend on your computer)
  2. Start typing while inside a code block

Expected behavior

It is expected that there should be no lag when typing irrespective of the number of lines in the file.

Output of :checkhealth nvim-treesitter

nvim-treesitter: require("nvim-treesitter.health").check()
========================================================================
## Installation
  - OK: `tree-sitter` found 0.20.6 (parser generator, only needed for :TSInstallFromGrammar)
  - OK: `node` found v17.8.0 (only needed for :TSInstallFromGrammar)
  - OK: `git` executable found.
  - OK: `cc` executable found. Selected from { vim.NIL, "cc", "gcc", "clang", "cl", "zig" }
    Version: cc (GCC) 11.2.0
  - OK: Neovim was compiled with tree-sitter runtime ABI version 14 (required >=13). Parsers must be compatible with runtime ABI.

## Parser/Features H L F I J
  - hack           ✓ . . . . 
  - norg           . . . . . 
  - svelte         ✓ . ✓ ✓ ✓ 
  - astro          ✓ ✓ ✓ ✓ ✓ 
  - bash           ✓ ✓ ✓ . ✓ 
  - beancount      ✓ . ✓ . . 
  - lalrpop        ✓ ✓ . . . 
  - php            ✓ ✓ ✓ ✓ ✓ 
  - swift          ✓ ✓ . . . 
  - markdown       ✓ . ✓ . ✓ 
  - cooklang       ✓ . . . . 
  - tlaplus        ✓ ✓ ✓ . ✓ 
  - fish           ✓ ✓ ✓ ✓ ✓ 
  - toml           ✓ ✓ ✓ ✓ ✓ 
  - wgsl           ✓ . ✓ . . 
  - proto          ✓ . ✓ . . 
  - m68k           ✓ ✓ ✓ . ✓ 
  - pug            ✓ . . . ✓ 
  - elvish         ✓ . . . ✓ 
  - solidity       ✓ . . . . 
  - glsl           ✓ ✓ ✓ ✓ ✓ 
  - tsx            ✓ ✓ ✓ ✓ ✓ 
  - regex          ✓ . . . . 
  - turtle         ✓ ✓ ✓ ✓ ✓ 
  - go             ✓ ✓ ✓ ✓ ✓ 
  - html           ✓ ✓ ✓ ✓ ✓ 
  - yang           ✓ . ✓ ✓ . 
  - graphql        ✓ . . ✓ ✓ 
  - d              ✓ . ✓ ✓ ✓ 
  - vue            ✓ . ✓ ✓ ✓ 
  - r              ✓ ✓ . ✓ ✓ 
  - make           ✓ . . . ✓ 
  - http           ✓ . . . ✓ 
  - prisma         ✓ . . . . 
  - query          ✓ ✓ ✓ ✓ ✓ 
  - java           ✓ ✓ . ✓ ✓ 
  - cmake          ✓ . ✓ . . 
  - llvm           ✓ . . . . 
  - ruby           ✓ ✓ ✓ ✓ ✓ 
  - css            ✓ . ✓ ✓ ✓ 
  - perl           ✓ . ✓ . . 
  - pioasm         ✓ . . . ✓ 
  - json5          ✓ . . . ✓ 
  - julia          ✓ ✓ ✓ ✓ ✓ 
  - pascal         ✓ ✓ ✓ ✓ ✓ 
  - vim            ✓ ✓ . . ✓ 
  - json           ✓ ✓ ✓ ✓ . 
  - cpp            ✓ ✓ ✓ ✓ ✓ 
  - slint          ✓ . . ✓ . 
  - zig            ✓ . ✓ ✓ ✓ 
  - bibtex         ✓ . ✓ ✓ . 
  - gowork         ✓ . . . ✓ 
  - yaml           ✓ ✓ ✓ ✓ ✓ 
  - jsdoc          ✓ . . . . 
  - hcl            ✓ . ✓ ✓ ✓ 
  - heex           ✓ ✓ ✓ ✓ ✓ 
  - glimmer        ✓ . . . . 
  - sparql         ✓ ✓ ✓ ✓ ✓ 
  - dot            ✓ . . . ✓ 
  - latex          ✓ . ✓ . ✓ 
  - gdscript       ✓ ✓ . ✓ ✓ 
  - devicetree     ✓ ✓ ✓ ✓ ✓ 
  - lua            ✓ ✓ ✓ ✓ ✓ 
  - foam           ✓ ✓ ✓ ✓ ✓ 
  - godot_resource ✓ ✓ ✓ . . 
  - scheme         ✓ . ✓ . ✓ 
  - clojure        ✓ ✓ ✓ . ✓ 
  - gomod          ✓ . . . ✓ 
  - comment        ✓ . . . . 
  - elixir         ✓ ✓ ✓ ✓ ✓ 
  - phpdoc         ✓ . . . . 
  - erlang         . . . . . 
  - verilog        ✓ ✓ ✓ . ✓ 
  - rego           ✓ . . . ✓ 
  - dockerfile     ✓ . . . ✓ 
  - fortran        ✓ . ✓ ✓ . 
  - jsonc          ✓ ✓ ✓ ✓ ✓ 
  - haskell        ✓ . . . ✓ 
  - embedded_template✓ . . . ✓ 
  - javascript     ✓ ✓ ✓ ✓ ✓ 
  - fennel         ✓ ✓ . . ✓ 
  - gleam          ✓ ✓ ✓ ✓ ✓ 
  - commonlisp     ✓ ✓ ✓ . . 
  - kotlin         ✓ ✓ ✓ . ✓ 
  - rst            ✓ ✓ . . ✓ 
  - dart           ✓ ✓ . ✓ ✓ 
  - ocaml          ✓ ✓ ✓ . ✓ 
  - cuda           ✓ ✓ ✓ ✓ ✓ 
  - nix            ✓ ✓ ✓ . ✓ 
  - ninja          ✓ . ✓ ✓ . 
  - help           ✓ . . . . 
  - ocaml_interface✓ ✓ ✓ . ✓ 
  - rust           ✓ ✓ ✓ ✓ ✓ 
  - org            . . . . . 
  - ocamllex       ✓ . . . ✓ 
  - typescript     ✓ ✓ ✓ ✓ ✓ 
  - ql             ✓ ✓ . ✓ ✓ 
  - hjson          ✓ ✓ ✓ ✓ ✓ 
  - scala          ✓ . ✓ . ✓ 
  - fusion         ✓ ✓ ✓ ✓ . 
  - hocon          ✓ . . . ✓ 
  - scss           ✓ . . ✓ . 
  - todotxt        ✓ . . . . 
  - eex            ✓ . . . ✓ 
  - c              ✓ ✓ ✓ ✓ ✓ 
  - python         ✓ ✓ ✓ ✓ ✓ 
  - ledger         ✓ . ✓ ✓ ✓ 
  - vala           ✓ . . . . 
  - surface        ✓ . ✓ ✓ ✓ 
  - elm            ✓ . . . ✓ 
  - supercollider  ✓ ✓ ✓ ✓ ✓ 
  - rasi           ✓ ✓ ✓ ✓ . 
  - c_sharp        ✓ ✓ ✓ . ✓ 
  - teal           ✓ ✓ ✓ ✓ ✓ 

  Legend: H[ighlight], L[ocals], F[olds], I[ndents], In[j]ections
         +) multiple parsers found, only one will be used
         x) errors found in the query, try to run :TSUpdate {lang}

Output of nvim --version

NVIM v0.7.0
Build type: Release
LuaJIT 2.1.0-beta3
Compiled by builduser

Features: +acl +iconv +tui
See ":help feature-compile"

   system vimrc file: "$VIM/sysinit.vim"
  fall-back for $VIM: "/usr/share/nvim"

Run :checkhealth for more info

Additional context

No response

@medwatt medwatt added the bug Something isn't working label May 7, 2022
@theHamsta
Member

You're editing source code in the markdown. We know that language injections are not implemented in the most efficient way (no incremental parsing). Can you identify whether the markdown parser or the language injection is the problem?

@medwatt
Author

medwatt commented May 7, 2022

@theHamsta, I created a new file with just 500 lines or so, and without any code blocks. Here's the result. So, I believe the markdown parser is likely the one causing the slowdown.

markdown_slow_2.mp4

@medwatt medwatt changed the title Markdown files with more than 1000 lines become perceptibly slow Markdown files with more than 500 lines become perceptibly slow May 7, 2022
@theHamsta
Member

@medwatt the markdown parser can also be used from other editors. Do you have the time to check whether helix or the tree-sitter web playground has the same problem? I will try to get the timing for parsing and querying from Neovim to see where we have the problem.

@medwatt
Author

medwatt commented May 7, 2022

@theHamsta, this is the first time I am hearing of these. However, I installed the helix editor to test. I am having problems installing the markdown parser. According to the documentation, all I need to do is put the following in the languages.toml file in the config folder:

[[grammar]]
name = "markdown"
source = { git = "https://github.com/ikatyang/tree-sitter-markdown" }

Launching helix gives the following:

Bad language config: unknown field `grammar`, expected `language`
Press <ENTER> to continue with default language config

One thing I noticed though is the delay is even worse in helix. For example, holding down a key for a while prints out the characters after a much longer delay than neovim. With helix, however, there's no choppiness; it's smooth but takes longer. With neovim, the delay is shorter, but very choppy.

@clason
Contributor

clason commented May 7, 2022

That's the wrong parser, though: we use https://github.com/MDeiml/tree-sitter-markdown

@medwatt
Author

medwatt commented May 7, 2022

https://github.com/MDeiml/tree-sitter-markdown

The question is how to get helix to install the parser? I have no idea how helix works. But as I said, the delay is much worse in helix with the same file already, so I don't think there's a point checking it further.

@theHamsta
Member

theHamsta commented May 7, 2022

This is what happens when you press down a key with high repetition rate in a markdown file
grafik

As you can see, the compute time does indeed seem to be spent in parser_parse (the big chunk of 35ms is the markdown file, the smaller chunks are injected languages). I'll check whether setting a timeout for the parser changes anything, and whether it is really the parsing, or rather the querying, that causes the problem here.

@medwatt
Author

medwatt commented May 7, 2022

@theHamsta, maybe you can shed some light on why parsing a markdown file is more intensive than parsing Lua, for instance, given that Lua's grammar is more complex than markdown's.

I also use treesitter for verilog, and there is a noticeable delay when opening a verilog file for the first time, even when the file has a few lines.

@theHamsta
Member

The verilog parser is very complex: it takes long to generate and the resulting parser is enormous. I would not be surprised if it's slow to parse. For comparison, Lua always stays below the 2ms threshold that I've set on the build I'm experimenting with right now. Markdown is very complex to parse because it has no strict grammar and requires an external parser to keep track of the stack of all the nested pairs.

@theHamsta
Member

I have long had the plan to use tree-sitter's timeout feature for parsing to guarantee that we don't stay in the parse+query cycle too long (and maybe do off-thread background parsing with the intermediate result in case of a timeout). With the tree-sitter timeout set I can type without lags (but of course I have not yet answered whether long parsing also implies longer querying afterwards).

@theHamsta
Member

theHamsta commented May 7, 2022

If you want, you can experiment with https://github.com/theHamsta/neovim/tree/nvtx

You can set the parsing timeout here: https://github.com/theHamsta/neovim/blob/7d313d9395befb743aae2309633b78e160db8c68/src/nvim/lua/treesitter.c#L337

It will stop parsing (and also highlighting) when parsing takes too long. You will lose highlights from time to time, but typing is fast 😄. It also solves the problem people have when opening files that are multiple MB large.

Let's see why markdown is slower.

@theHamsta
Member

theHamsta commented May 7, 2022

Some findings:

  • it depends a lot where in the document you insert text
  • it is indeed the C function parser_parse of markdown (which invokes the parsing) which takes the major part of the range (this example repeats over all the timeline)
    grafik
  • nvtx is really cool when you visualize the timings of injected languages (I haven't marked querying yet, but it should be the empty spots)
    grafik

@medwatt
Author

medwatt commented May 7, 2022

@theHamsta, thanks for doing these tests. You mentioned parsing timeout, and from what I understood, it's something that is not active by default. I wonder then what causes treesitter to go mad sometimes when I start scrolling.

Here's a screenshot of a file being highlighted correctly.

Here's the same section when I start scrolling.

This doesn't always happen, so it's not easy to reproduce. It happens from time to time and my current solution is to restart neovim. Can you say what might be causing this issue?

@theHamsta
Member

theHamsta commented May 7, 2022

This seems to be a different issue. I've also seen it when experimenting with the timeout. Some extmarks got out of sync, but that is something Neovim does wrong, not tree-sitter taking a long time parsing.

@medwatt by default tree-sitter has an unlimited amount of time to finish its parsing; to enable a timeout you have to edit the source code of Neovim. It could become a feature in the future to protect ourselves from slow parsers or very big files.

A flamegraph of one session where I pressed the same key within two different link regions (it goes really slow at the second)
flamegraph

It seems to spend some time in ts_stack_pop_count (is the stack very deep for that language?)

to reproduce add a few ds to the first paragraph in the README within the link
grafik
@MDeiml any ideas?

@medwatt
Author

medwatt commented May 7, 2022

@theHamsta, I think, for a temporary solution, it would be a good idea to expose the option to set a custom timeout for slow parsers.

@MDeiml
Contributor

MDeiml commented May 7, 2022

This is a really weird bug. I don't think it's a problem with my parser, since it does not appear when just parsing the document as a whole. It appears to only happen with incremental parsing. Also I noticed that if I stop holding down d (letting my computer catch up) and then start again it does not slow down again.

I also don't think it's a problem with tree-sitter, because then the surrounding document should not have any influence on parsing speed (only the stack at the current position). But if I delete all the other paragraphs I don't get any slowdown.

It's also not a problem with language injections (if I remove the injection queries, I still get the same effect).

Rather it's probably something to do with highlighting (after disabling tree-sitter highlighting the problem disappears).

But weirdly, if I disable all markdown highlight queries the problem still appears.

If I had to guess I would say it's a problem with neovim, but that's just a hunch.

@theHamsta
Member

theHamsta commented May 7, 2022

@MDeiml no, it is not Neovim or the highlighting. It is tree-sitter doing the parsing. It consumes almost all of the time, with queries and injections negligible.
grafik

parser_parse is just invoking ts_parse; the small ranges are Scanner::scan (so I suppose that is when it's doing ts_parser_parse). I suppose that when Scanner::scan stops, it is doing ts_tree_get_changed_ranges (will verify that in a minute).
grafik
It is not your external parser (it is only active in 6% of the traced ranges).

grafik

but sure Neovim could handle this in a better way

Now with ts_parser_parse traced
grafik

@theHamsta
Member

I think the way Atom handles this is that it lets the parsing time out while doing the parsing in a background thread that can be canceled by the foreground thread as soon as the foreground thread wants to parse again. I think every call to ts_parser_parse makes progress even when it times out.
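
The cancel-and-restart idea described in this comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not Atom's or Neovim's actual implementation: `parse_in_chunks` and `BackgroundParser` are hypothetical names, and the "parser" is a stand-in that sleeps for 1 ms per chunk instead of calling tree-sitter.

```python
import threading
import time


def parse_in_chunks(text, cancel, chunk=1000):
    """Stand-in parser (hypothetical): checks a cancel flag between chunks.

    Returns the number of characters processed, or None if cancelled.
    """
    done = 0
    while done < len(text):
        if cancel.is_set():
            return None  # abandoned: a newer edit has arrived
        done = min(done + chunk, len(text))
        time.sleep(0.001)  # pretend each chunk costs about 1 ms of work
    return done


class BackgroundParser:
    """Runs at most one parse at a time on a worker thread."""

    def __init__(self):
        self._cancel = None
        self._thread = None
        self.result = None  # (text, chars_parsed) of the last finished parse

    def request_parse(self, text):
        # Cancel the in-flight parse (if any) before starting a new one.
        if self._thread is not None:
            self._cancel.set()
            self._thread.join()  # waits for at most one chunk of work
        cancel = threading.Event()
        self._cancel = cancel

        def work():
            parsed = parse_in_chunks(text, cancel)
            if parsed is not None:
                self.result = (text, parsed)

        self._thread = threading.Thread(target=work)
        self._thread.start()

    def wait(self):
        # Block until the most recent parse has finished or been cancelled.
        if self._thread is not None:
            self._thread.join()
        return self.result
```

Calling `request_parse` once per keystroke means only the parse for the latest buffer contents normally runs to completion. The sketch does not model the claim that a timed-out parse keeps its partial progress.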

@MDeiml
Contributor

MDeiml commented May 7, 2022

Still, the problem does not appear when highlighting is disabled, and does appear when highlighting is enabled. (For both tests I left TSPlayground open to verify that the document does actually get parsed).

Within ts_parse neovim passes a callback for reading new data:
https://github.com/neovim/neovim/blob/9005ffbe7757eca8ad809c81db76aec930db8e68/src/nvim/lua/treesitter.c#L292-L323

Could this be the culprit?

@theHamsta
Member

The input_cb only makes up a small fraction of the time. Without highlighting, the tree doesn't get updated on every keystroke. For a long time the playground didn't work without the triggers from the highlighter. I think that we now at least parse the tree once.

Time (%)  Total Time (ns)  Instances    Avg (ns)      Med (ns)    Min (ns)    Max (ns)    StdDev (ns)    Style              Range           
 --------  ---------------  ---------  ------------  ------------  ---------  -----------  ------------  -------  ---------------------------
     32,0   18.276.135.814        292  62.589.506,0  65.104.053,0        917  249.261.151  19.474.199,0  PushPop  LanguageTree:parse markdown
     28,0   16.266.869.258      7.150   2.275.086,0      27.403,0      3.295   69.144.939  11.178.170,0  PushPop  parser_parse               
     25,0   14.375.366.952      7.150   2.010.540,0      25.540,0      2.410   62.835.704   9.868.962,0  PushPop  ts_parser_parse            
      4,0    2.699.118.363  7.714.968         349,0         331,0        121      364.863         516,0  PushPop  markdown scan              
      3,0    1.866.829.630      7.150     261.095,0         245,0        116    7.792.537   1.310.739,0  PushPop  ts_tree_get_changed_ranges 
      1,0    1.059.199.678        275   3.851.635,0   3.121.339,0  2.742.390   99.952.165   6.289.228,0  PushPop  LanguageTree:parse lua     
      1,0      867.959.021        275   3.156.214,0   3.002.615,0  2.636.921   30.037.728   1.651.085,0  PushPop  _get_injections markdown   
      0,0      404.102.225      1.448     279.076,0      50.607,0      6.101  182.794.010   4.880.470,0  PushPop  on_line                    
      0,0      323.440.617        275   1.176.147,0     579.766,0    404.118   67.359.675   4.486.126,0  PushPop  _get_injections lua        
      0,0      205.169.011    523.333         392,0         338,0        126      128.838         483,0  PushPop  input_cb                   
      0,0      189.857.909        275     690.392,0     323.011,0    257.144   79.581.512   4.807.582,0  PushPop  LanguageTree:parse vim     
      0,0      135.103.159        275     491.284,0     411.432,0    329.545    9.388.716     603.138,0  PushPop  LanguageTree:parse html    
      0,0      122.241.394        275     444.514,0     122.033,0     80.412   73.084.122   4.411.825,0  PushPop  _get_injections vim        
      0,0       60.690.523        275     220.692,0     160.789,0    110.730    6.923.854     417.566,0  PushPop  _get_injections html       
      0,0       58.474.838         13   4.498.064,0   2.416.024,0     21.448   15.370.281   5.726.295,0  PushPop  tslua_parse_query          

input_cb doesn't spend a lot of time, but I suppose you think it causes the parsing to take longer than necessary based on the output it's producing. I saw that input_cb (and also read system calls) get called more often in injections.

@MDeiml
Contributor

MDeiml commented May 8, 2022

@theHamsta Could you maybe record a flamegraph of inserting some ds, letting neovim catch up with work, and then inserting some more? For me it doesn't hang the second time, so there should be some difference. (Sry for bothering, but you seem to have a nice profiling setup :))

@theHamsta
Member

theHamsta commented May 8, 2022

My profiling setup is not so great at the moment: Intel VTune is crashing whenever it tries to finalize the result, and it would be the best tool for filtering certain time periods out of perf traces (I will try on a different machine). I couldn't see anything fundamentally different in the instances where it takes longer, which for me seems to depend mostly on document position.

@theHamsta
Member

theHamsta commented May 8, 2022

@maxbrunsfeld do you have any advice on how to debug this? We have the problem that for https://github.com/MDeiml/tree-sitter-markdown incremental parsing can take a long time, 40ms-60ms (cold start parsing takes 30ms), which causes a lag in the editor as keys can be fed at a faster rate. The edits are in each case single letters, made by just pressing one key in the README of our repo. Since the largest fraction of the time is spent in ts_parser_parse (see #2916 (comment) for the timeline), it should also be reproducible using Atom (when no timeout is set for parsing). At the moment Neovim parses fully synchronously without any timeout set for the parser, and every keystroke triggers a parsing event.

@theHamsta
Member

Maybe it would be good to reproduce this programmatically using the tree-sitter Rust API: parsing the text once and then doing edits to understand what's going on (with profiling or with a debugger attached).

@ayushnix

I came across this issue after asking a query on /r/neovim.

It's the same issue described by @medwatt in the first comment. I've enabled filetype.lua using g.do_filetype_lua = 1 and disabled filetype.vim using g.did_load_filetypes = 0. I'm using the markdown treesitter parser by @MDeiml. Here's the --startuptime log file when a markdown file is opened

--startuptime log file

 times in msec
 clock   self+sourced   self:  sourced script
 clock   elapsed:              other lines

000.023 000.023: --- NVIM STARTING ---
000.553 000.530: locale set
001.155 000.602: inits 1
001.192 000.037: window checked
001.543 000.351: parsing arguments
005.880 004.337: expanding arguments
005.923 000.043: inits 2
006.889 000.966: init highlight
006.894 000.005: waiting for UI
009.039 002.145: done waiting for UI
009.093 000.054: init screen for UI
009.134 000.041: init default mappings
009.202 000.068: init default autocommands
012.028 000.224 000.224: sourcing /usr/share/nvim/runtime/ftplugin.vim
012.525 000.122 000.122: sourcing /usr/share/nvim/runtime/indent.vim
012.779 000.052 000.052: sourcing /usr/share/nvim/archlinux.vim
012.797 000.157 000.105: sourcing /etc/xdg/nvim/sysinit.vim
026.088 013.177 013.177: sourcing /home/user/.config/nvim/init.lua
026.125 003.244: sourcing vimrc file(s)
026.926 000.044 000.044: sourcing /home/user/.local/share/nvim/site/pack/packer/start/LuaSnip/ftdetect/snippets.vim
027.294 000.039 000.039: sourcing /usr/share/vim/vimfiles/ftdetect/PKGBUILD.vim
027.397 000.059 000.059: sourcing /usr/share/vim/vimfiles/ftdetect/meson.vim
027.489 000.048 000.048: sourcing /usr/share/vim/vimfiles/ftdetect/vagrantfile.vim
027.854 001.363 001.173: sourcing /usr/share/nvim/runtime/filetype.lua
027.968 000.048 000.048: sourcing /usr/share/nvim/runtime/filetype.vim
028.539 000.220 000.220: sourcing /usr/share/nvim/runtime/syntax/synload.vim
028.826 000.756 000.537: sourcing /usr/share/nvim/runtime/syntax/syntax.vim
030.913 000.047 000.047: sourcing /usr/share/nvim/runtime/plugin/gzip.vim
030.999 000.037 000.037: sourcing /usr/share/nvim/runtime/plugin/health.vim
031.138 000.091 000.091: sourcing /usr/share/nvim/runtime/plugin/man.vim
031.229 000.039 000.039: sourcing /usr/share/nvim/runtime/plugin/matchit.vim
031.600 000.325 000.325: sourcing /usr/share/nvim/runtime/plugin/matchparen.vim
031.701 000.049 000.049: sourcing /usr/share/nvim/runtime/plugin/netrwPlugin.vim
032.075 000.037 000.037: sourcing /home/user/.local/share/nvim/rplugin.vim
032.094 000.349 000.312: sourcing /usr/share/nvim/runtime/plugin/rplugin.vim
032.293 000.152 000.152: sourcing /usr/share/nvim/runtime/plugin/shada.vim
032.387 000.037 000.037: sourcing /usr/share/nvim/runtime/plugin/spellfile.vim
032.483 000.047 000.047: sourcing /usr/share/nvim/runtime/plugin/tarPlugin.vim
032.568 000.037 000.037: sourcing /usr/share/nvim/runtime/plugin/tohtml.vim
032.667 000.051 000.051: sourcing /usr/share/nvim/runtime/plugin/tutor.vim
032.766 000.048 000.048: sourcing /usr/share/nvim/runtime/plugin/zipPlugin.vim
033.070 000.050 000.050: sourcing /usr/share/vim/vimfiles/plugin/fzf.vim
033.270 000.147 000.147: sourcing /usr/share/vim/vimfiles/plugin/redact_pass.vim
063.140 010.130 010.130: sourcing /home/user/.local/share/nvim/site/pack/packer/start/onedark.nvim/colors/onedark.lua
108.306 074.803 064.673: sourcing /home/user/.config/nvim/plugin/packer_compiled.lua
109.021 004.420: loading rtp plugins
109.678 000.245 000.245: sourcing /home/user/.local/share/nvim/site/pack/packer/start/LuaSnip/plugin/luasnip.vim
110.296 000.385 000.385: sourcing /home/user/.local/share/nvim/site/pack/packer/start/indent-blankline.nvim/plugin/indent_blankline.vim
111.682 001.048 001.048: sourcing /home/user/.local/share/nvim/site/pack/packer/start/nvim-treesitter/plugin/nvim-treesitter.lua
112.090 000.168 000.168: sourcing /home/user/.local/share/nvim/site/pack/packer/start/vim-cool/plugin/cool.vim
112.269 001.401: loading packages
112.641 000.231 000.231: sourcing /home/user/.local/share/nvim/site/pack/packer/start/Comment.nvim/after/plugin/Comment.lua
112.650 000.151: loading after plugins
112.663 000.012: inits 3
117.340 004.678: reading ShaDa
124.668 000.452 000.452: sourcing /usr/share/nvim/runtime/autoload/htmlcomplete.vim
124.819 000.754 000.302: sourcing /usr/share/nvim/runtime/ftplugin/html.vim
125.160 001.425 000.671: sourcing /usr/share/nvim/runtime/ftplugin/markdown.vim
127.838 000.333 000.333: sourcing /usr/share/nvim/runtime/syntax/javascript.vim
130.206 002.177 002.177: sourcing /usr/share/nvim/runtime/syntax/vb.vim
136.619 006.283 006.283: sourcing /usr/share/nvim/runtime/syntax/css.vim
137.933 011.319 002.527: sourcing /usr/share/nvim/runtime/syntax/html.vim
138.344 011.844 000.524: sourcing /usr/share/nvim/runtime/syntax/markdown.vim
204.324 073.714: opening buffers
205.968 001.644: BufEnter autocommands
205.977 000.009: editing files in windows
206.806 000.829: VimEnter autocommands
206.814 000.008: UIEnter autocommands
207.221 000.287 000.287: sourcing /usr/share/nvim/runtime/autoload/provider/clipboard.vim
207.231 000.131: before starting main loop
271.764 064.532: first screen update
271.772 000.008: --- NVIM STARTED ---

Whenever I edit a markdown file with more than 300 or 500 lines with some code blocks, the input latency increases dramatically. When it's more than 1000 lines, I have to wait for almost a second for a keypress to show up on my screen. If I delete characters, the cursor disappears and text is deleted with a delay of almost a second.

I'm not sure how to disable syntax highlighting for fenced code blocks when using the markdown treesitter parser or if it'll help. If I disable the markdown treesitter parser, there's a significant improvement in input latency.

I've noticed from the startuptime logs that vimscript runtime syntax files are sourced for code blocks, including markdown.vim, even though I've installed treesitter parsers for all the languages mentioned in the log and I've also disabled vim regex syntax highlighting in my neovim config.

@clason
Contributor

clason commented Jun 17, 2022

I'm not sure how to disable syntax highlighting for fenced code blocks when using the markdown treesitter parser or if it'll help. If I disable the markdown treesitter parser, there's a significant improvement in input latency.

Remove the injections.scm from your runtime path.

I've noticed from the startuptime logs that vimscript runtime syntax files are sourced for code blocks, including markdown.vim, even though I've installed treesitter parsers for all the languages mentioned in the log and I've also disabled vim regex syntax highlighting in my neovim config.

Are you sure they're actually executed? They will show up even if they're skipped by finishing early (which is the usual mechanism for Vim to "skip" files).

@ayushnix

That's disappointing. It reminds me of this post on undeadly.org about markdown.

I'm not sure if neorg and its treesitter parser can handle large documents without introducing input latency in the terminal. If not, I'll probably switch to writing articles in HTML.

Thanks!

@theHamsta
Member

@ayushnix I'm sure the problems with markdown input latency can be solved by a timeout for tree-sitter parsing. It was working smoothly when I added the timeout (except that highlighting was sometimes lost, since there is no code yet to handle the timeout, possibly by switching to background parsing or by reusing the previous parsing result). We're talking about at most 42ms during incremental parsing, which is slow enough to build up latency when you type multiple letters at once, but still manageable for an editor to provide the highlighting. In other words: it's too slow for "on every keystroke", but fast enough to catch up once it has moved to a background parse after reaching the timeout. The problem we're experiencing here is that after a fast input of 5 letters, we experience 5 times the parsing latency, while with a timeout it would be possible to cancel the first 4 letters and finish at the last letter with a background thread. Usually, the 5 incremental parses should go really fast, as the parser state should not have changed much. But even when that does not work, the editor should protect itself against excessive parsing times.

There is no fundamental limitation that forces a Markdown parser to be slow. It's just that nested pairs of delimiters are difficult to express with tree-sitter and almost always require an external parser that can count the nesting state. You can test whether https://github.com/ikatyang/tree-sitter-markdown has the same limitation. It's also possible that the parser of @MDeiml has some properties that make the incremental parsing logic fail to build efficiently on the previous result.
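
The "5 letters, 5 times the latency" arithmetic from this comment can be made concrete with a small model. It is a sketch under assumptions: the 42 ms parse time comes from the comment, while the 10 ms gap between keystrokes, the zero cost of cancelling a parse, and the function names are all made up for illustration.

```python
def blocking_latencies(n_keys, key_interval_ms, parse_ms):
    """Delay until each key's parse finishes when every key blocks on a parse."""
    latencies = []
    free_at = 0  # time at which the editor finishes its current parse
    for i in range(n_keys):
        arrival = i * key_interval_ms
        start = max(arrival, free_at)  # wait for the previous parse to end
        free_at = start + parse_ms
        latencies.append(free_at - arrival)
    return latencies


def coalesced_latencies(n_keys, key_interval_ms, parse_ms):
    """Same input, but a parse is cancelled as soon as the next key arrives.

    None marks a parse that was cancelled before it could finish.
    Cancellation is assumed to cost nothing.
    """
    latencies = []
    for i in range(n_keys):
        arrival = i * key_interval_ms
        finish = arrival + parse_ms
        is_last = i + 1 == n_keys
        if not is_last and (i + 1) * key_interval_ms < finish:
            latencies.append(None)
        else:
            latencies.append(finish - arrival)
    return latencies


print(blocking_latencies(5, 10, 42))   # [42, 74, 106, 138, 170]
print(coalesced_latencies(5, 10, 42))  # [None, None, None, None, 42]
```

In the blocking model the fifth key is fully processed 170 ms after it was typed, which is 210 ms (5 × 42 ms) after the first key. With cancellation, only the last parse runs and finishes 42 ms after the last key.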

@theHamsta
Member

@ayushnix can you provide some evidence that the injections have any effect at all? With neovim/neovim#18761 you can visualize what fraction of the latency is caused by markdown parsing and what fraction by the injections. In the document I tested, I was experiencing latency purely from the markdown parser, with the injections causing only a negligible fraction of the whole incremental parsing.

@MDeiml
Contributor

MDeiml commented Jun 17, 2022

I'm actually experimenting at the moment with whether I can get this faster. This would include optionally parsing only the inlines that are visible (parsing inlines only depends on the inlines in the same block and not on other blocks) and a few changes around paragraphs, which are kinda important and really slow at the moment. But if I can get something faster to work it's gonna take a while, since it probably needs some features in upstream neovim.

@MDeiml
Contributor

MDeiml commented Jun 17, 2022

But I'm quite confident that I should be able to get this at least somewhat fast, since parsing the block structure could probably be done pretty fast as it's well defined; it's mainly inlines like links and emphasis that make the parser slow. I should be able to split the two.

@MDeiml
Contributor

MDeiml commented Jun 18, 2022

I think I found the cause of this issue. I think tree-sitter has problems with reusing trees that are very "flat", i.e. trees where most nodes have a lot of siblings. This is not a problem with usual programming languages, since they're often structured hierarchically, but with markdown (as it is now) most nodes are children of the root node.

When parsing a file after introducing some edits, all siblings of nodes that changed are also parsed again. I'm not sure why, maybe @maxbrunsfeld could give some insights?

I was able to solve this by just introducing more hierarchy artificially. More concretely I added a section node, which starts with a heading and stretches until the next heading. With this I can get syntax highlighting in a ~3000 line file without any noticeable delay.

I might try later to get a quick fix in this way for the current version, but as I said I'm currently working on rewriting the parser so I'd rather work on that.
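
The section idea from this comment (a section starts at a heading and stretches until the next heading) can be illustrated outside the grammar. The sketch below is not the parser's implementation: it assumes the blocks have already been flattened into hypothetical `(kind, text)` pairs and ignores heading levels.

```python
def group_into_sections(blocks):
    """Group a flat list of (kind, text) blocks into sections.

    A section starts at a heading and extends until the next heading.
    Blocks before the first heading form a leading section of their own.
    """
    sections = []
    current = []
    for kind, text in blocks:
        if kind == "heading" and current:
            sections.append(current)  # close the section in progress
            current = []
        current.append((kind, text))
    if current:
        sections.append(current)
    return sections


flat = [
    ("paragraph", "intro"),
    ("heading", "# A"),
    ("paragraph", "a1"),
    ("paragraph", "a2"),
    ("heading", "# B"),
    ("paragraph", "b1"),
]
for section in group_into_sections(flat):
    print([text for _, text in section])
```

Six sibling blocks become three sections, so an edit inside "a1" leaves the other two sections as separate subtrees rather than siblings of the edited node.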

@clason
Contributor

clason commented Jun 18, 2022

A quick fix would be very much appreciated, since the rewrite sounds like something we can't just drop in in place of the current one (needing substantial infrastructure work to support such "split parsers").

Of course, I understand that this is much less interesting work ;)

@theHamsta
Member

A quick fix would probably be to let nvim time out long parses. We will always face the situation that parsing is slow when the file is too big (at least for initial parsing). Although a change in Neovim might not be that quick.

@clason
Contributor

clason commented Jun 18, 2022

We'll never know until someone puts a PR for it up for discussion...

@MDeiml
Contributor

MDeiml commented Jun 18, 2022

I tried to implement the fix on the main branch, but I didn't get the same speedup. Not sure why.

@theHamsta
Member

We'll never know until someone puts a PR for it up for discussion...

Well, I guess the "how" of the implementation is the thing that's taking some time... Maybe I'll find some time tomorrow for it. There are quite a few possible ways to deal with this, and even I won't know which is best until I've tried them out.

@MDeiml
Contributor

MDeiml commented Jun 21, 2022

I noticed something else while writing Rust bindings for my parser. If I use a single tree-sitter parser object and ts_parser_set_language, then parsing again after edits seems to happen almost instantly. If I use one parser for each language, then parsing after edits takes as long as the first parse.

I don't know if this is specific to my use case, but maybe it would make sense to investigate something similar for neovim, as it seems it also uses one parser per language.

Nvm there was a hidden error and I was getting garbage data.

@MDeiml
Contributor

MDeiml commented Jun 21, 2022

tree-sitter/tree-sitter-haskell#41 (comment)

It seems that my previous comment about hierarchical structure was the right hunch. Reducing conflicts should be the main priority for slow parsers, but "sectioning off" the conflicts seems to work as well. Unfortunately neither is possible for inline markdown elements like emphasis.

@maxbrunsfeld

I think I found the cause of this issue. I think tree-sitter has problems with reusing trees that are very "flat", i.e. trees where most nodes have a lot of siblings. This is not a problem with usual programming languages, since they're often structured hierarchically, but with markdown (as it is now) most nodes are children of the root node.

I think it must be something more specific than that; otherwise it would reproduce in, for example, a large C file with hundreds of small functions, since those functions would all be sibling nodes.

I'm curious what's going on, and I'll try to reproduce the slowness with the tree-sitter CLI, using the parse --edit command.

@maxbrunsfeld

maxbrunsfeld commented Jun 21, 2022

Ok, I can reproduce the problem from the command line. I believe the problem is a certain conflict in your grammar, between _soft_line_break and _paragraph_end_newline. It causes every paragraph to be considered "fragile", and not re-usable.

I determined this by creating a small markdown file, test.md, with five two-word paragraphs:

a b

c d

e f

g h

i j

I then parsed this file from the command line with debug graphs enabled:

tree-sitter parse test.md -D
(document [0, 0] - [10, 0]
  (paragraph [0, 0] - [1, 0])
  (paragraph [2, 0] - [3, 0])
  (paragraph [4, 0] - [5, 0])
  (paragraph [6, 0] - [7, 0])
  (paragraph [8, 0] - [9, 0]))

This creates a long sequence of SVG graphs. In this graph, you can zoom in on a particular point, when the parser reaches the end of a paragraph, and see that the parse stack splits into two branches:

[Screenshot: debug graph at the end of a paragraph, where the parse stack splits into two branches]

Any node that is created in an ambiguous state like this is considered fragile - it cannot be reused during incremental parsing if any of its contents have changed. In this case, the ambiguity is still in effect while the paragraph and block nodes are created.

To observe the performance impact of this ☝️ more directly, you can perform an edit and an incremental re-parse at the command line, inserting a character on line 4 (the third paragraph).

tree-sitter parse test.md --edit '4,1 0 1'
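
To make the edit string concrete, here is a small Python sketch that applies the same kind of edit to the test file's text. It assumes the string reads as "row,col", then the number of characters to delete, then the text to insert, with 0-based rows and columns; `apply_edit` is a hypothetical helper and not part of the CLI, and it counts characters rather than bytes, which only matches for ASCII input.

```python
def apply_edit(text, spec):
    """Apply an edit written as "row,col delete_count inserted_text" (0-based)."""
    position, delete_count, inserted = spec.split(" ", 2)
    row, col = (int(part) for part in position.split(","))
    lines = text.split("\n")
    # Convert (row, col) into a character offset into the whole text.
    offset = sum(len(line) + 1 for line in lines[:row]) + col
    end = offset + int(delete_count)
    return text[:offset] + inserted + text[end:]


source = "a b\n\nc d\n\ne f\n\ng h\n\ni j\n"
edited = apply_edit(source, "4,1 0 1")

# Row 4 is the third paragraph; a "1" is inserted after its first character.
print(edited.split("\n")[4])
```

Only the third paragraph's text changes, so ideally the other four paragraph nodes would be reused on the re-parse.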

It re-parses correctly, but if you run with -d (for terminal logging) or -D (to generate another SVG log), you can see that the parser decides not to reuse any block/paragraph nodes.

...
cant_reuse_node_is_fragile tree:_block
cant_reuse_node_is_fragile tree:paragraph
...

@MDeiml Can you think of a way to not have this conflict with _paragraph_end_newline? Can you tell the difference between a paragraph ending and a "soft" line break by the number of newlines?

@MDeiml
Contributor

MDeiml commented Jun 21, 2022

Thank you! I have a fix for this conflict in paragraphs where I parse ahead quite a bit to determine if a newline is a soft line break. This means that a lot of paragraphs can now be reused.
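
A minimal sketch of the lookahead idea, written in Python rather than in the parser's own scanner, and not the actual fix: decide what each newline means by peeking at the next line. It only treats blank lines as paragraph ends; other constructs that can end a paragraph in real markdown (headings, fences, and so on) are deliberately ignored. `classify_newlines` and its labels are hypothetical names.

```python
def classify_newlines(lines):
    """Label the newline after each line by looking one line ahead."""
    kinds = []
    for index, line in enumerate(lines):
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        if not line.strip():
            kinds.append("blank")
        elif not next_line.strip():
            # A blank line (or the end of input) follows: the paragraph ends here.
            kinds.append("paragraph_end")
        else:
            # More paragraph text follows: this newline is a soft line break.
            kinds.append("soft_line_break")
    return kinds


print(classify_newlines(["a b", "c d", "", "e f"]))
```

Because the decision is made before the newline token is emitted, the parser no longer has to keep two candidate interpretations alive across the end of the paragraph.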

But a similar problem now appears with emphasis, which appears in a lot of paragraphs as a top-level inline node. I'm not sure it's possible to parse those without conflicts, as that would require potentially infinite lookahead. But maybe it's possible to create a "fast path" for the most common use case of no nested inlines.

@maxbrunsfeld

I think it's probably ok for emphasis to have that conflict, since most (all?) top-level nodes in the document are not emphasis nodes.

@MDeiml
Contributor

MDeiml commented Jun 22, 2022

That's true, but pretty much every top level node has children that cannot be reused, which means that parsing is still slow in very very large documents, though I can get it to very acceptable speeds for e.g. the README for this repo.

I have a question though, shouldn't it be possible to reuse fragile trees (whole trees not nodes) if all edits were outside their range set with ts_parser_set_included_range? I have to admit I don't really understand this concept of fragility so I might be wrong, but even with conflicts parsing should be deterministic.

I am currently working on a version of this parser where inline elements (emphasis) and block elements (paragraphs) are split into two grammars. This means that every inline range is parsed separately. I noticed that almost all of the inline ranges are not reused, which makes sense as most contain emphasis and are thus fragile. But all that needs to be done is to shift the node positions, so I'd be keen to just not reparse the unaffected trees.
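
To spell out what "just shift the node positions" could look like, here is a toy Python model, not tree-sitter's API. Each inline range is a hypothetical `InlineTree` with character offsets; after an edit, trees strictly before it are kept, trees strictly after it are shifted by the size change, and anything touching the edit is marked for re-parsing. Whether fragile trees can safely be handled this way is exactly the open question above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InlineTree:
    """Hypothetical stand-in for one separately parsed inline range."""
    start: int
    end: int  # exclusive


def shift_after_edit(trees, edit_start, old_end, new_end):
    """Split trees into (kept or shifted, needs re-parse) for a single edit."""
    delta = new_end - old_end
    kept, to_reparse = [], []
    for tree in trees:
        if tree.end < edit_start:
            kept.append(tree)  # strictly before the edit: unchanged
        elif tree.start > old_end:
            # Strictly after the edit: same content, new position.
            kept.append(InlineTree(tree.start + delta, tree.end + delta))
        else:
            # Overlaps or touches the edit: conservatively re-parse.
            to_reparse.append(tree)
    return kept, to_reparse


trees = [InlineTree(0, 3), InlineTree(5, 8), InlineTree(10, 13)]
# Insert one character at offset 6, inside the second range.
kept, to_reparse = shift_after_edit(trees, edit_start=6, old_end=6, new_end=7)
print(kept, to_reparse)
```

In this model only one of the three ranges is re-parsed, regardless of whether the others contain emphasis.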

@lewis6991
Member

There's a lot of activity in this thread.

@MDeiml are you able to provide a status update? Especially now neovim/neovim#22309 is merged.

@MDeiml
Contributor

MDeiml commented Feb 24, 2023

Sure!

The split parser I mentioned has been implemented and used in neovim for quite a while. This helped a lot in other editors, but neovim had some problems with injected languages, which were improved in neovim/neovim#22309, so I imagine that this helped quite a bit. Since then I have improved the "block" parser of the two so that it uses almost no "conflicts". Conflicts are still what makes the "inline" parser quite slow.

@clason
Contributor

clason commented Feb 24, 2023

Note that we also recently bumped the tree-sitter version in our official builds to include the vastly improved error handling; I suspect that this also helps quite a bit in this regard.

So a test on the latest nightly with one of the bad example files would be very much appreciated.
