Markdown files with more than 500 lines become perceptibly slow #2916

medwatt · 2022-05-07T12:39:47Z

Describe the bug

I noticed recently, when editing a large markdown file with more than 1000 lines, that the delay between keystrokes while typing becomes perceptible. This only happens when treesitter for markdown is enabled.

Here's a video demonstrating the difference in the typing experience between an empty file and a large file.

markdown_slow.mp4

To Reproduce

Create a large markdown file with many code blocks (how large the file is would probably depend on your computer)
Start typing while inside a code block
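
A large test file for the steps above could be generated with a script along these lines. This is only a sketch: the function name, the section count, and the file contents are arbitrary choices for illustration, and how large the file needs to be will depend on the machine.

```python
import sys

# Built at runtime so the script itself contains no literal code fence.
FENCE = "`" * 3


def build_markdown(sections: int, lines_per_block: int = 5) -> str:
    """Return Markdown with one heading, one paragraph and one fenced Lua block per section."""
    parts = []
    for i in range(sections):
        parts.append(f"## Section {i}")
        parts.append("")
        parts.append(f"Some prose with a [link](https://example.com/{i}) and *emphasis*.")
        parts.append("")
        parts.append(FENCE + "lua")
        parts.extend(f"local x{i}_{j} = {j}" for j in range(lines_per_block))
        parts.append(FENCE)
        parts.append("")
    return "\n".join(parts)


if __name__ == "__main__":
    # 150 sections of 12 lines each gives a file well above 1000 lines.
    sys.stdout.write(build_markdown(sections=150))
```

Run it as `python gen_markdown.py > large_test.md` (the script name is hypothetical), open the result in Neovim, and type inside one of the code blocks.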

Expected behavior

It is expected that there should be no lag when typing irrespective of the number of lines in the file.

Output of `:checkhealth nvim-treesitter`

nvim-treesitter: require("nvim-treesitter.health").check()
========================================================================
## Installation
  - OK: `tree-sitter` found 0.20.6 (parser generator, only needed for :TSInstallFromGrammar)
  - OK: `node` found v17.8.0 (only needed for :TSInstallFromGrammar)
  - OK: `git` executable found.
  - OK: `cc` executable found. Selected from { vim.NIL, "cc", "gcc", "clang", "cl", "zig" }
    Version: cc (GCC) 11.2.0
  - OK: Neovim was compiled with tree-sitter runtime ABI version 14 (required >=13). Parsers must be compatible with runtime ABI.

## Parser/Features H L F I J
  - hack           ✓ . . . . 
  - norg           . . . . . 
  - svelte         ✓ . ✓ ✓ ✓ 
  - astro          ✓ ✓ ✓ ✓ ✓ 
  - bash           ✓ ✓ ✓ . ✓ 
  - beancount      ✓ . ✓ . . 
  - lalrpop        ✓ ✓ . . . 
  - php            ✓ ✓ ✓ ✓ ✓ 
  - swift          ✓ ✓ . . . 
  - markdown       ✓ . ✓ . ✓ 
  - cooklang       ✓ . . . . 
  - tlaplus        ✓ ✓ ✓ . ✓ 
  - fish           ✓ ✓ ✓ ✓ ✓ 
  - toml           ✓ ✓ ✓ ✓ ✓ 
  - wgsl           ✓ . ✓ . . 
  - proto          ✓ . ✓ . . 
  - m68k           ✓ ✓ ✓ . ✓ 
  - pug            ✓ . . . ✓ 
  - elvish         ✓ . . . ✓ 
  - solidity       ✓ . . . . 
  - glsl           ✓ ✓ ✓ ✓ ✓ 
  - tsx            ✓ ✓ ✓ ✓ ✓ 
  - regex          ✓ . . . . 
  - turtle         ✓ ✓ ✓ ✓ ✓ 
  - go             ✓ ✓ ✓ ✓ ✓ 
  - html           ✓ ✓ ✓ ✓ ✓ 
  - yang           ✓ . ✓ ✓ . 
  - graphql        ✓ . . ✓ ✓ 
  - d              ✓ . ✓ ✓ ✓ 
  - vue            ✓ . ✓ ✓ ✓ 
  - r              ✓ ✓ . ✓ ✓ 
  - make           ✓ . . . ✓ 
  - http           ✓ . . . ✓ 
  - prisma         ✓ . . . . 
  - query          ✓ ✓ ✓ ✓ ✓ 
  - java           ✓ ✓ . ✓ ✓ 
  - cmake          ✓ . ✓ . . 
  - llvm           ✓ . . . . 
  - ruby           ✓ ✓ ✓ ✓ ✓ 
  - css            ✓ . ✓ ✓ ✓ 
  - perl           ✓ . ✓ . . 
  - pioasm         ✓ . . . ✓ 
  - json5          ✓ . . . ✓ 
  - julia          ✓ ✓ ✓ ✓ ✓ 
  - pascal         ✓ ✓ ✓ ✓ ✓ 
  - vim            ✓ ✓ . . ✓ 
  - json           ✓ ✓ ✓ ✓ . 
  - cpp            ✓ ✓ ✓ ✓ ✓ 
  - slint          ✓ . . ✓ . 
  - zig            ✓ . ✓ ✓ ✓ 
  - bibtex         ✓ . ✓ ✓ . 
  - gowork         ✓ . . . ✓ 
  - yaml           ✓ ✓ ✓ ✓ ✓ 
  - jsdoc          ✓ . . . . 
  - hcl            ✓ . ✓ ✓ ✓ 
  - heex           ✓ ✓ ✓ ✓ ✓ 
  - glimmer        ✓ . . . . 
  - sparql         ✓ ✓ ✓ ✓ ✓ 
  - dot            ✓ . . . ✓ 
  - latex          ✓ . ✓ . ✓ 
  - gdscript       ✓ ✓ . ✓ ✓ 
  - devicetree     ✓ ✓ ✓ ✓ ✓ 
  - lua            ✓ ✓ ✓ ✓ ✓ 
  - foam           ✓ ✓ ✓ ✓ ✓ 
  - godot_resource ✓ ✓ ✓ . . 
  - scheme         ✓ . ✓ . ✓ 
  - clojure        ✓ ✓ ✓ . ✓ 
  - gomod          ✓ . . . ✓ 
  - comment        ✓ . . . . 
  - elixir         ✓ ✓ ✓ ✓ ✓ 
  - phpdoc         ✓ . . . . 
  - erlang         . . . . . 
  - verilog        ✓ ✓ ✓ . ✓ 
  - rego           ✓ . . . ✓ 
  - dockerfile     ✓ . . . ✓ 
  - fortran        ✓ . ✓ ✓ . 
  - jsonc          ✓ ✓ ✓ ✓ ✓ 
  - haskell        ✓ . . . ✓ 
  - embedded_template✓ . . . ✓ 
  - javascript     ✓ ✓ ✓ ✓ ✓ 
  - fennel         ✓ ✓ . . ✓ 
  - gleam          ✓ ✓ ✓ ✓ ✓ 
  - commonlisp     ✓ ✓ ✓ . . 
  - kotlin         ✓ ✓ ✓ . ✓ 
  - rst            ✓ ✓ . . ✓ 
  - dart           ✓ ✓ . ✓ ✓ 
  - ocaml          ✓ ✓ ✓ . ✓ 
  - cuda           ✓ ✓ ✓ ✓ ✓ 
  - nix            ✓ ✓ ✓ . ✓ 
  - ninja          ✓ . ✓ ✓ . 
  - help           ✓ . . . . 
  - ocaml_interface✓ ✓ ✓ . ✓ 
  - rust           ✓ ✓ ✓ ✓ ✓ 
  - org            . . . . . 
  - ocamllex       ✓ . . . ✓ 
  - typescript     ✓ ✓ ✓ ✓ ✓ 
  - ql             ✓ ✓ . ✓ ✓ 
  - hjson          ✓ ✓ ✓ ✓ ✓ 
  - scala          ✓ . ✓ . ✓ 
  - fusion         ✓ ✓ ✓ ✓ . 
  - hocon          ✓ . . . ✓ 
  - scss           ✓ . . ✓ . 
  - todotxt        ✓ . . . . 
  - eex            ✓ . . . ✓ 
  - c              ✓ ✓ ✓ ✓ ✓ 
  - python         ✓ ✓ ✓ ✓ ✓ 
  - ledger         ✓ . ✓ ✓ ✓ 
  - vala           ✓ . . . . 
  - surface        ✓ . ✓ ✓ ✓ 
  - elm            ✓ . . . ✓ 
  - supercollider  ✓ ✓ ✓ ✓ ✓ 
  - rasi           ✓ ✓ ✓ ✓ . 
  - c_sharp        ✓ ✓ ✓ . ✓ 
  - teal           ✓ ✓ ✓ ✓ ✓ 

  Legend: H[ighlight], L[ocals], F[olds], I[ndents], In[j]ections
         +) multiple parsers found, only one will be used
         x) errors found in the query, try to run :TSUpdate {lang}

Output of `nvim --version`

NVIM v0.7.0
Build type: Release
LuaJIT 2.1.0-beta3
Compiled by builduser

Features: +acl +iconv +tui
See ":help feature-compile"

   system vimrc file: "$VIM/sysinit.vim"
  fall-back for $VIM: "/usr/share/nvim"

Run :checkhealth for more info

Additional context

No response

theHamsta · 2022-05-07T12:41:48Z

You're editing source code in the markdown. We know that language injections are not implemented in the most efficient way (no incremental parsing). Can you identify whether the markdown parser or the language injection is the problem?

medwatt · 2022-05-07T12:59:50Z

@theHamsta, I created a new file with just 500 lines or so, and without any code blocks. Here's the result. So, I believe the markdown parser is likely the one causing the slowdown.

markdown_slow_2.mp4

theHamsta · 2022-05-07T13:26:43Z

@medwatt the markdown parser can also be used from other editors. Do you have time to check whether helix or the tree-sitter web playground has the same problem? I will try to get the timings for parsing and querying from Neovim to see where the problem is.

medwatt · 2022-05-07T14:09:35Z

@theHamsta, this is the first time I am hearing of these. However, I installed the helix editor to test. I am having problems installing the markdown parser. According to the documentation, all I need to do is put the following in the languages.toml file in the config folder:

[[grammar]]
name = "markdown"
source = { git = "https://github.com/ikatyang/tree-sitter-markdown" }

Launching helix gives the following:

Bad language config: unknown field `grammar`, expected `language`
Press <ENTER> to continue with default language config

One thing I noticed though is the delay is even worse in helix. For example, holding down a key for a while prints out the characters after a much longer delay than neovim. With helix, however, there's no choppiness; it's smooth but takes longer. With neovim, the delay is shorter, but very choppy.

clason · 2022-05-07T14:14:40Z

That's the wrong parser, though: we use https://github.com/MDeiml/tree-sitter-markdown

medwatt · 2022-05-07T14:17:16Z

> https://github.com/MDeiml/tree-sitter-markdown

The question is how to get helix to install the parser? I have no idea how helix works. But as I said, the delay is much worse in helix with the same file already, so I don't think there's a point checking it further.

theHamsta · 2022-05-07T14:54:46Z

This is what happens when you press down a key with high repetition rate in a markdown file

As you can see, the compute time does indeed seem to be spent in parser_parse (the big chunk of 35ms is the markdown file; the smaller chunks are injected languages). I'll check whether setting a timeout for the parser changes anything, or whether it is really the parsing, or rather the querying, that is causing problems here.

medwatt · 2022-05-07T15:05:49Z

@theHamsta, maybe you can shed some light on why parsing a markdown file is more intensive than parsing Lua, for instance, given that Lua's grammar is more complex than markdown's.

I also use treesitter for verilog, and there is a noticeable delay when opening a verilog file for the first time, even when the file has a few lines.

theHamsta · 2022-05-07T15:22:17Z

The verilog parser is very complex: it takes a long time to generate and the resulting parser is enormous. I would not be surprised if it's slow to parse. Lua, for example, always stays below the 2ms threshold that I've set on the build I'm experimenting with right now. Markdown is very complex to parse because it has no strict grammar and requires an external parser to keep track of the stack of all the nested pairs.

theHamsta · 2022-05-07T15:33:06Z

I have long planned to use tree-sitter's timeout feature for parsing to guarantee that we don't stay in the parse+query cycle too long (and maybe do off-thread background parsing with the intermediate result in case of a timeout). With a tree-sitter timeout set I can type without lag (but of course I have not yet answered whether long parsing also implies longer querying afterwards).

theHamsta · 2022-05-07T15:44:08Z

If you want, you can experiment with https://github.com/theHamsta/neovim/tree/nvtx

You can set the parsing timeout here: https://github.com/theHamsta/neovim/blob/7d313d9395befb743aae2309633b78e160db8c68/src/nvim/lua/treesitter.c#L337

It will stop parsing (and also highlighting) when parsing takes too long. You will lose highlights from time to time, but typing is fast 😄. It also solves the problem people have when opening files that are multiple MB in size.

Let's see why markdown is slower.

theHamsta · 2022-05-07T17:34:32Z

Some findings:

it depends a lot on where in the document you insert text
it is indeed the C function parser_parse of markdown (which invokes the parsing) which takes the major part of the range (this example repeats over all the timeline)
nvtx is really cool when you visualize the timings of injected languages (I haven't marked querying yet, but it should be the empty spots)

medwatt · 2022-05-07T18:43:23Z

@theHamsta, thanks for doing these tests. You mentioned parsing timeout, and from what I understood, it's something that is not active by default. I wonder then what causes treesitter to go mad sometimes when I start scrolling.

Here's a screenshot of one of my files being highlighted correctly.

Here's the same section when I start scrolling.

This doesn't always happen, so it's not easy to reproduce. It happens from time to time, and my current solution is to restart neovim. Can you say what might be causing this issue?

theHamsta · 2022-05-07T19:04:47Z

This seems to be a different issue. I've also seen it when experimenting with the timeout. Some extmarks get out of sync, but that is something Neovim does wrong, not tree-sitter taking a long time to parse.

@medwatt by default tree-sitter has an unlimited amount of time to finish its parsing; to enable a timeout you have to edit the source code of Neovim. It could become a feature in the future to protect ourselves from slow parsers or very big files.

A flamegraph of one session where I pressed the same key within two different link regions (it goes really slow at the second)

It seems to spend some time in ts_stack_pop_count (is the stack very deep for that language?)

To reproduce, add a few `d`s to the first paragraph of the README, within the link.

@MDeiml any ideas?

medwatt · 2022-05-07T19:47:54Z

@theHamsta, I think, for a temporary solution, it would be a good idea to expose the option to set a custom timeout for slow parsers.

MDeiml · 2022-05-07T19:54:48Z

This is a really weird bug. I don't think it's a problem with my parser, since it does not appear when just parsing the document as a whole. It appears to only happen with incremental parsing. Also I noticed that if I stop holding down d (letting my computer catch up) and then start again it does not slow down again.

I also don't think it's a problem with tree-sitter, because then the surrounding document should not have any influence on parsing speed (only the stack at the current position). But if I delete all the other paragraphs I don't get any slowdown.

It's also not a problem with language injections (if I remove the injection queries, I still get the same effect).

Rather it's probably something to do with highlighting (after disabling tree-sitter highlighting the problem disappears).

But weirdly, if I disable all markdown highlight queries the problem still appears.

If I had to guess I would say it's a problem with neovim, but that's just a hunch.

theHamsta · 2022-05-07T20:49:37Z

@MDeiml no, it is not Neovim or the highlighting. It is tree-sitter doing the parsing. It consumes almost all of the time, with queries and injections negligible.

parser_parse is just invoking ts_parse; the small ranges are Scanner::scan (so I suppose that's when it's doing ts_parser_parse). I suppose that when Scanner::scan stops, it is doing ts_tree_get_changed_ranges (will verify that in a minute).

It is not your external parser (it is only active 6% of the traced ranges)

but sure Neovim could handle this in a better way

Now with ts_parser_parse traced

theHamsta · 2022-05-07T21:02:32Z

I think the way Atom handles this is that it lets the parsing time out while parsing in a background thread that can be canceled by the foreground thread as soon as the foreground thread wants to parse again. I think every call to ts_parser_parse makes progress even when it times out.

MDeiml · 2022-05-07T21:11:14Z

Still, the problem does not appear when highlighting is disabled, and does appear when highlighting is enabled. (For both tests I left TSPlayground open to verify that the document does actually get parsed).

Within ts_parse neovim passes a callback for reading new data:
https://github.com/neovim/neovim/blob/9005ffbe7757eca8ad809c81db76aec930db8e68/src/nvim/lua/treesitter.c#L292-L323

Could this be the culprit?

theHamsta · 2022-05-07T21:33:32Z

The input_cb only makes up a small fraction of the time. Without highlighting, the tree doesn't get updated on every keystroke. The playground wasn't working for a long time without the triggers from the highlighter. I think that we now at least parse the tree once.

Time (%)  Total Time (ns)  Instances    Avg (ns)      Med (ns)    Min (ns)    Max (ns)    StdDev (ns)    Style              Range           
 --------  ---------------  ---------  ------------  ------------  ---------  -----------  ------------  -------  ---------------------------
     32,0   18.276.135.814        292  62.589.506,0  65.104.053,0        917  249.261.151  19.474.199,0  PushPop  LanguageTree:parse markdown
     28,0   16.266.869.258      7.150   2.275.086,0      27.403,0      3.295   69.144.939  11.178.170,0  PushPop  parser_parse               
     25,0   14.375.366.952      7.150   2.010.540,0      25.540,0      2.410   62.835.704   9.868.962,0  PushPop  ts_parser_parse            
      4,0    2.699.118.363  7.714.968         349,0         331,0        121      364.863         516,0  PushPop  markdown scan              
      3,0    1.866.829.630      7.150     261.095,0         245,0        116    7.792.537   1.310.739,0  PushPop  ts_tree_get_changed_ranges 
      1,0    1.059.199.678        275   3.851.635,0   3.121.339,0  2.742.390   99.952.165   6.289.228,0  PushPop  LanguageTree:parse lua     
      1,0      867.959.021        275   3.156.214,0   3.002.615,0  2.636.921   30.037.728   1.651.085,0  PushPop  _get_injections markdown   
      0,0      404.102.225      1.448     279.076,0      50.607,0      6.101  182.794.010   4.880.470,0  PushPop  on_line                    
      0,0      323.440.617        275   1.176.147,0     579.766,0    404.118   67.359.675   4.486.126,0  PushPop  _get_injections lua        
      0,0      205.169.011    523.333         392,0         338,0        126      128.838         483,0  PushPop  input_cb                   
      0,0      189.857.909        275     690.392,0     323.011,0    257.144   79.581.512   4.807.582,0  PushPop  LanguageTree:parse vim     
      0,0      135.103.159        275     491.284,0     411.432,0    329.545    9.388.716     603.138,0  PushPop  LanguageTree:parse html    
      0,0      122.241.394        275     444.514,0     122.033,0     80.412   73.084.122   4.411.825,0  PushPop  _get_injections vim        
      0,0       60.690.523        275     220.692,0     160.789,0    110.730    6.923.854     417.566,0  PushPop  _get_injections html       
      0,0       58.474.838         13   4.498.064,0   2.416.024,0     21.448   15.370.281   5.726.295,0  PushPop  tslua_parse_query

input_cb doesn't spend a lot of time, but I suppose you think it causes the parsing to take longer than necessary because of the output it's producing. I saw that input_cb (and also read system calls) get called more often in injections.
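
As a quick sanity check on the table above, the ts_parser_parse row can be re-derived with a few lines of Python. This is only a sketch: the numbers are copied from that row (in ns, with `.` as thousands separator and `,` as decimal separator), and the helper name `parse_number` is made up for illustration.

```python
def parse_number(field: str) -> float:
    """Convert a number such as '2.010.540,0' ('.' thousands, ',' decimal) to a float."""
    return float(field.replace(".", "").replace(",", "."))


# ts_parser_parse row, copied from the profiling table (all times in ns).
ROW = {
    "total": "14.375.366.952",
    "instances": "7.150",
    "avg": "2.010.540,0",
    "med": "25.540,0",
    "max": "62.835.704",
}

total_ns = parse_number(ROW["total"])
instances = parse_number(ROW["instances"])
avg_ns = parse_number(ROW["avg"])
med_ns = parse_number(ROW["med"])
max_ns = parse_number(ROW["max"])

# The reported average should match total / instances.
recomputed_avg_ns = total_ns / instances
# How far the mean is pulled above the median by slow calls.
mean_to_median = avg_ns / med_ns

print(f"recomputed mean: {recomputed_avg_ns / 1e6:.2f} ms per call")
print(f"median:          {med_ns / 1e3:.1f} us per call")
print(f"mean / median:   {mean_to_median:.0f}x")
print(f"slowest call:    {max_ns / 1e6:.1f} ms")
```

A mean this far above the median suggests that most calls are fast and a minority of slow calls dominate the total, which would fit the earlier observation that the slowdown depends on where in the document the text is inserted.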

MDeiml · 2022-05-08T06:41:08Z

@theHamsta Could you maybe record a flamegraph of inserting some ds, letting neovim catch up with work, and then inserting some more? For me it doesn't hang the second time, so there should be some difference. (Sry for bothering, but you seem to have a nice profiling setup :))

theHamsta · 2022-05-08T14:37:50Z

My profiling setup is not so great at the moment: Intel VTune crashes whenever it tries to finalize the result, and it would be the best tool for filtering certain time periods out of perf traces (I will try on a different machine). I couldn't see anything fundamentally different in the instances where it takes longer, which for me seems to depend mostly on document position.

theHamsta · 2022-05-08T14:51:41Z

@maxbrunsfeld do you have any advice on how to debug this? We have the problem that for https://github.com/MDeiml/tree-sitter-markdown incremental parsing can take a long time, 40ms-60ms (cold-start parsing takes 30ms), which causes lag in the editor because keys can be fed at a faster rate. The edits are in each case single letters, entered by pressing one key in the README of our repo. Since the largest fraction of the time is spent in ts_parser_parse (see #2916 (comment) for the timeline), it should also be reproducible using Atom (when no timeout is set for parsing). At the moment Neovim parses fully synchronously without any timeout set for the parser, and every keystroke triggers a parsing event.

theHamsta · 2022-05-08T20:31:29Z

Maybe it would be good to reproduce this programmatically using the tree-sitter rust API: parsing the text once and then do edits to understand what's going on (profiling or with debugger attached)

ayushnix · 2022-06-17T08:23:42Z

I came across this issue after asking a query on /r/neovim.

It's the same issue described by @medwatt in the first comment. I've enabled filetype.lua using g.do_filetype_lua = 1 and disabled filetype.vim using g.did_load_filetypes = 0. I'm using the markdown treesitter parser by @MDeiml. Here's the --startuptime log file when a markdown file is opened

--startuptime log file

times in msec
 clock   self+sourced   self:  sourced script
 clock   elapsed:              other lines

000.023  000.023: --- NVIM STARTING ---
000.553  000.530: locale set
001.155  000.602: inits 1
001.192  000.037: window checked
001.543  000.351: parsing arguments
005.880  004.337: expanding arguments
005.923  000.043: inits 2
006.889  000.966: init highlight
006.894  000.005: waiting for UI
009.039  002.145: done waiting for UI
009.093  000.054: init screen for UI
009.134  000.041: init default mappings
009.202  000.068: init default autocommands
012.028  000.224  000.224: sourcing /usr/share/nvim/runtime/ftplugin.vim
012.525  000.122  000.122: sourcing /usr/share/nvim/runtime/indent.vim
012.779  000.052  000.052: sourcing /usr/share/nvim/archlinux.vim
012.797  000.157  000.105: sourcing /etc/xdg/nvim/sysinit.vim
026.088  013.177  013.177: sourcing /home/user/.config/nvim/init.lua
026.125  003.244: sourcing vimrc file(s)
026.926  000.044  000.044: sourcing /home/user/.local/share/nvim/site/pack/packer/start/LuaSnip/ftdetect/snippets.vim
027.294  000.039  000.039: sourcing /usr/share/vim/vimfiles/ftdetect/PKGBUILD.vim
027.397  000.059  000.059: sourcing /usr/share/vim/vimfiles/ftdetect/meson.vim
027.489  000.048  000.048: sourcing /usr/share/vim/vimfiles/ftdetect/vagrantfile.vim
027.854  001.363  001.173: sourcing /usr/share/nvim/runtime/filetype.lua
027.968  000.048  000.048: sourcing /usr/share/nvim/runtime/filetype.vim
028.539  000.220  000.220: sourcing /usr/share/nvim/runtime/syntax/synload.vim
028.826  000.756  000.537: sourcing /usr/share/nvim/runtime/syntax/syntax.vim
030.913  000.047  000.047: sourcing /usr/share/nvim/runtime/plugin/gzip.vim
030.999  000.037  000.037: sourcing /usr/share/nvim/runtime/plugin/health.vim
031.138  000.091  000.091: sourcing /usr/share/nvim/runtime/plugin/man.vim
031.229  000.039  000.039: sourcing /usr/share/nvim/runtime/plugin/matchit.vim
031.600  000.325  000.325: sourcing /usr/share/nvim/runtime/plugin/matchparen.vim
031.701  000.049  000.049: sourcing /usr/share/nvim/runtime/plugin/netrwPlugin.vim
032.075  000.037  000.037: sourcing /home/user/.local/share/nvim/rplugin.vim
032.094  000.349  000.312: sourcing /usr/share/nvim/runtime/plugin/rplugin.vim
032.293  000.152  000.152: sourcing /usr/share/nvim/runtime/plugin/shada.vim
032.387  000.037  000.037: sourcing /usr/share/nvim/runtime/plugin/spellfile.vim
032.483  000.047  000.047: sourcing /usr/share/nvim/runtime/plugin/tarPlugin.vim
032.568  000.037  000.037: sourcing /usr/share/nvim/runtime/plugin/tohtml.vim
032.667  000.051  000.051: sourcing /usr/share/nvim/runtime/plugin/tutor.vim
032.766  000.048  000.048: sourcing /usr/share/nvim/runtime/plugin/zipPlugin.vim
033.070  000.050  000.050: sourcing /usr/share/vim/vimfiles/plugin/fzf.vim
033.270  000.147  000.147: sourcing /usr/share/vim/vimfiles/plugin/redact_pass.vim
063.140  010.130  010.130: sourcing /home/user/.local/share/nvim/site/pack/packer/start/onedark.nvim/colors/onedark.lua
108.306  074.803  064.673: sourcing /home/user/.config/nvim/plugin/packer_compiled.lua
109.021  004.420: loading rtp plugins
109.678  000.245  000.245: sourcing /home/user/.local/share/nvim/site/pack/packer/start/LuaSnip/plugin/luasnip.vim
110.296  000.385  000.385: sourcing /home/user/.local/share/nvim/site/pack/packer/start/indent-blankline.nvim/plugin/indent_blankline.vim
111.682  001.048  001.048: sourcing /home/user/.local/share/nvim/site/pack/packer/start/nvim-treesitter/plugin/nvim-treesitter.lua
112.090  000.168  000.168: sourcing /home/user/.local/share/nvim/site/pack/packer/start/vim-cool/plugin/cool.vim
112.269  001.401: loading packages
112.641  000.231  000.231: sourcing /home/user/.local/share/nvim/site/pack/packer/start/Comment.nvim/after/plugin/Comment.lua
112.650  000.151: loading after plugins
112.663  000.012: inits 3
117.340  004.678: reading ShaDa
124.668  000.452  000.452: sourcing /usr/share/nvim/runtime/autoload/htmlcomplete.vim
124.819  000.754  000.302: sourcing /usr/share/nvim/runtime/ftplugin/html.vim
125.160  001.425  000.671: sourcing /usr/share/nvim/runtime/ftplugin/markdown.vim
127.838  000.333  000.333: sourcing /usr/share/nvim/runtime/syntax/javascript.vim
130.206  002.177  002.177: sourcing /usr/share/nvim/runtime/syntax/vb.vim
136.619  006.283  006.283: sourcing /usr/share/nvim/runtime/syntax/css.vim
137.933  011.319  002.527: sourcing /usr/share/nvim/runtime/syntax/html.vim
138.344  011.844  000.524: sourcing /usr/share/nvim/runtime/syntax/markdown.vim
204.324  073.714: opening buffers
205.968  001.644: BufEnter autocommands
205.977  000.009: editing files in windows
206.806  000.829: VimEnter autocommands
206.814  000.008: UIEnter autocommands
207.221  000.287  000.287: sourcing /usr/share/nvim/runtime/autoload/provider/clipboard.vim
207.231  000.131: before starting main loop
271.764  064.532: first screen update
271.772  000.008: --- NVIM STARTED ---

Whenever I edit a markdown file with more than 300 or 500 lines with some code blocks, the input latency increases dramatically. When it's more than 1000 lines, I have to wait for almost a second for a keypress to show up on my screen. If I delete characters, the cursor disappears and text is deleted with a delay of almost a second.

I'm not sure how to disable syntax highlighting for fenced code blocks when using the markdown treesitter parser or if it'll help. If I disable the markdown treesitter parser, there's a significant improvement in input latency.

I've noticed from the startuptime logs that vimscript runtime syntax files are sourced for code blocks, including markdown.vim, even though I've installed treesitter parsers for all the languages mentioned in the log and I've also disabled vim regex syntax highlighting in my neovim config.

clason · 2022-06-17T08:25:37Z

> I'm not sure how to disable syntax highlighting for fenced code blocks when using the markdown treesitter parser or if it'll help. If I disable the markdown treesitter parser, there's a significant improvement in input latency.

Remove the injections.scm from your runtime path.

> I've noticed from the startuptime logs that vimscript runtime syntax files are sourced for code blocks, including markdown.vim, even though I've installed treesitter parsers for all the languages mentioned in the log and I've also disabled vim regex syntax highlighting in my neovim config.

Are you sure they're actually executed? They will show up even if they're skipped by finishing early (which is the usual mechanism for Vim to "skip" files).

ayushnix · 2022-06-17T12:00:47Z

That's disappointing. It reminds me of this post on undeadly.org about markdown.

I'm not sure if neorg and its treesitter parser can handle large documents without introducing input latency in the terminal. If not, I'll probably switch to writing articles in HTML.

Thanks!

theHamsta · 2022-06-17T12:26:31Z

@ayushnix I'm sure the problems with markdown input latency can be solved by a timeout for tree-sitter parsing. It was working smoothly when I added the timeout (except that highlighting was sometimes lost, because there is no code yet to handle a timed-out parse, possibly by switching to background parsing or by reusing the previous parsing result). We're talking about at most 42ms during incremental parsing, which is slow enough to build up latency when you type multiple letters at once, but still manageable for an editor that has to provide highlighting. In other words: it's too slow for "on every keystroke", but fast enough to catch up once the parse has moved to the background after reaching the timeout. The problem we're experiencing here is that after a fast input of 5 letters we experience 5 times the parsing latency, while with a timeout it would be possible to cancel the parses for the first 4 letters and finish at the last letter in a background thread. Usually, the 5 incremental parses should go really fast, as the parser state should not have changed much. But even when that does not work, the editor should protect itself against excessive parsing times.
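
The arithmetic in the comment above can be sketched as a toy queueing model. The 42ms figure is taken from the comment; the function names, the assumption that a cancelled parse costs nothing, and the assumption that all five keys arrive at once are simplifications for illustration, not a description of how Neovim or tree-sitter actually schedule work.

```python
def synchronous_done_ms(arrivals_ms, parse_ms):
    """Time at which the tree is up to date when every keystroke triggers a
    blocking parse and the parses are processed in arrival order."""
    free_at = 0.0
    for t in arrivals_ms:
        start = max(t, free_at)    # wait for the previous parse to finish
        free_at = start + parse_ms
    return free_at


def cancelling_done_ms(arrivals_ms, parse_ms):
    """Time at which the tree is up to date when a newer keystroke cancels the
    in-flight parse at no cost, so only the parse started by the last
    keystroke determines the final state (toy assumption)."""
    return max(arrivals_ms) + parse_ms


PARSE_MS = 42.0       # worst-case incremental parse quoted in the thread
burst = [0.0] * 5     # five letters typed (almost) at once

print("synchronous:", synchronous_done_ms(burst, PARSE_MS), "ms")
print("cancelling: ", cancelling_done_ms(burst, PARSE_MS), "ms")
```

In this model the synchronous case needs five full parses before the screen catches up, while the cancelling case needs one.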

There is no fundamental reason why a Markdown parser should be slow. It's just that nested pairs of delimiters are difficult to express with tree-sitter and almost always require an external parser that can count the nesting state. You can test whether https://github.com/ikatyang/tree-sitter-markdown has the same limitation. It's also possible that the parser of @MDeiml has some properties that make the incremental parsing logic fail to build efficiently on the previous result.

theHamsta · 2022-06-17T12:30:27Z

@ayushnix can you provide some evidence that the injections have any effect at all? With neovim/neovim#18761 you can visualize what fraction of the latency is caused by markdown parsing and what by the injections. In the document I tested, the latency came purely from the markdown parser, with injections causing only a negligible fraction of the whole incremental parsing time.

MDeiml · 2022-06-17T14:10:24Z

I'm actually experimenting at the moment to see if I can get this faster. This would include optionally parsing only the inlines that are visible (parsing inlines depends only on the inlines in the same block, not on other blocks) and a few changes around paragraphs, which are kinda important and really slow at the moment. But if I can get something faster to work, it's gonna take a while, since it probably needs some features in upstream neovim.

MDeiml · 2022-06-17T14:13:14Z

But I'm quite confident that I should be able to get this at least somewhat fast, since parsing the block structure could probably be done pretty quickly because it's well defined; it's mainly inlines like links and emphasis that make the parser slow. I should be able to split the two.

MDeiml · 2022-06-18T15:54:56Z

I think I found the cause of this issue. I think tree-sitter has problems with reusing trees that are very "flat", i.e. trees where most nodes have a lot of siblings. This is not a problem with usual programming languages, since they're often structured hierarchically, but with markdown (as it is now) most nodes are children of the root node.

When parsing a file after introducing some edits, all siblings of nodes that changed are also parsed again. I'm not sure why, maybe @maxbrunsfeld could give some insights?

I was able to solve this by just introducing more hierarchy artificially. More concretely I added a section node, which starts with a heading and stretches until the next heading. With this I can get syntax highlighting in a ~3000 line file without any noticeable delay.
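
The "section" idea can be illustrated on a toy data structure. This is not the actual grammar change, only a sketch of the resulting tree shape; the `(kind, text)` block representation and the function name are hypothetical.

```python
def group_into_sections(blocks):
    """Group a flat list of (kind, text) blocks into sections.

    A new section starts at every heading and stretches until the next one;
    blocks before the first heading form a section of their own.
    """
    sections = []
    for kind, text in blocks:
        if kind == "heading" or not sections:
            sections.append([])
        sections[-1].append((kind, text))
    return sections


flat = [
    ("paragraph", "intro"),
    ("heading", "A"),
    ("paragraph", "a1"),
    ("paragraph", "a2"),
    ("heading", "B"),
    ("paragraph", "b1"),
]

nested = group_into_sections(flat)
print("children of root, flat:     ", len(flat))
print("children of root, sectioned:", len(nested))
print("siblings of 'a1', sectioned:", len(nested[1]))
```

With sections, an edited paragraph has only the blocks of its own section as siblings, and the root has one child per section instead of one per block.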

I might try later to get a quick fix in this way for the current version, but as I said I'm currently working on rewriting the parser so I'd rather work on that.

clason · 2022-06-18T16:17:00Z

A quick fix would be very much appreciated, since the rewrite sounds like something we can't just drop in in place of the current one (needing substantial infrastructure work to support such "split parsers").

Of course, I understand that this is much less interesting work ;)

theHamsta · 2022-06-18T16:38:12Z

A quick fix would probably be to let nvim time out long parses. We will always face the situation that parsing is slow when the file is too big (at least for the initial parse). Although a change in Neovim might not be that quick.

clason · 2022-06-18T16:40:41Z

We'll never know until someone puts a PR for it up for discussion...

MDeiml · 2022-06-18T17:05:56Z

I tried to implement the fix on the main branch, but I didn't get the same speedup. Not sure why.

theHamsta · 2022-06-18T21:03:00Z

> We'll never know until someone puts a PR for it up for discussion...

Well, I guess the "how" of the implementation is the thing that's taking some time... Maybe I'll find some time for it tomorrow. There are quite a few possible ways to deal with this, and I don't know which is best either until I've tried them out.

MDeiml · 2022-06-21T12:10:56Z

I noticed something else while writing rust bindings for my parser. If I use a single tree-sitter parser object and ts_parser_set_language, then parsing again after edits seems to happen almost instantly. If I use one parser for each language, then parsing after edits takes just as long as the first parse.

~~I don't know if this is specific to my use case, but maybe it would make sense to investigate something similar for neovim, as it seems it also uses one parser per language.~~

Nvm there was a hidden error and I was getting garbage data.

MDeiml · 2022-06-21T14:42:51Z

tree-sitter/tree-sitter-haskell#41 (comment)

It seems that my previous comment about hierarchical structure was the right hunch. Reducing conflicts should be the main priority for slow parsers, but "sectioning off" the conflicts seems to work as well. Unfortunately neither is possible for inline markdown elements like emphasis.

maxbrunsfeld · 2022-06-21T19:51:32Z

> I think I found the cause of this issue. I think tree-sitter has problems with reusing trees that are very "flat", i.e. trees where most nodes have a lot of siblings. This is not a problem with usual programming languages, since they're often structured hierarchically, but with markdown (as it is now) most nodes are children of the root node.

I think it must be something more specific than that; otherwise it would reproduce in, for example, a large C file with hundreds of small functions, since those functions would all be sibling nodes.

I'm curious what's going on, and I'll try to reproduce the slowness with the tree-sitter CLI, using the parse --edit command.

maxbrunsfeld · 2022-06-21T20:17:59Z

Ok, I can reproduce the problem from the command line. I believe the problem is a certain conflict in your grammar, between _soft_line_break and _paragraph_end_newline. It causes every paragraph to be considered "fragile", and not re-usable.

I determined this by creating a small markdown file, test.md, with five two-word paragraphs:

a b

c d

e f

g h

i j

I then parsed this file from the command line with debug graphs enabled:

tree-sitter parse test.md -D

(document [0, 0] - [10, 0]
  (paragraph [0, 0] - [1, 0])
  (paragraph [2, 0] - [3, 0])
  (paragraph [4, 0] - [5, 0])
  (paragraph [6, 0] - [7, 0])
  (paragraph [8, 0] - [9, 0]))

This creates a long sequence of SVG graphs. In this graph, you can zoom in on a particular point, when the parser reaches the end of a paragraph, and see that the parse stack splits into two branches:

graph

Any node that is created in an ambiguous state like this is considered fragile - it cannot be reused during incremental parsing if any of its contents have changed. In this case, the ambiguity is still in effect while the paragraph and block nodes are created.

To observe the performance impact of this ☝️ more directly, you can perform an edit and an incremental re-parse at the command line, inserting a character on line 4 (the third paragraph).

tree-sitter parse test.md --edit '4,1 0 1'

It re-parses correctly, but if you run with -d (for terminal logging) or -D (to generate another SVG log), you can see that the parser decides not to reuse any block/paragraph nodes.

...
cant_reuse_node_is_fragile tree:_block
cant_reuse_node_is_fragile tree:paragraph
...
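The reuse decision visible in this log can be imitated with a toy model. This is only a sketch of the observed behavior, not tree-sitter's implementation: the `Node` class, the decision labels, and the row-based overlap test are all assumptions made for illustration. The row ranges match the five paragraphs in test.md, and the edit is on row 4.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    kind: str
    start_row: int
    end_row: int  # exclusive
    fragile: bool


def reuse_decision(node: Node, edit_row: int) -> str:
    """Decide whether an old node may be reused (hypothetical labels)."""
    if node.start_row <= edit_row < node.end_row:
        return "has_changes"  # the edit falls inside this node
    if node.fragile:
        return "is_fragile"  # created in an ambiguous parse state
    return "reused"


def paragraphs(fragile: bool) -> List[Node]:
    # Five one-row paragraphs, as in the parse output for test.md.
    return [Node("paragraph", r, r + 1, fragile) for r in (0, 2, 4, 6, 8)]


def count_reused(nodes: List[Node], edit_row: int) -> int:
    return sum(reuse_decision(n, edit_row) == "reused" for n in nodes)
```

With every paragraph fragile, `count_reused(paragraphs(True), 4)` is 0, so each keystroke costs a full re-parse. Without the conflict, four of the five paragraphs survive the same edit.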

@MDeiml Can you think of a way to not have this conflict with _paragraph_end_newline? Can you tell the difference between a paragraph ending and a "soft" line break by the number of newlines?

MDeiml · 2022-06-21T20:46:58Z

Thank you! I have a fix for this conflict in paragraphs where I parse ahead quite a bit to determine if a newline is a soft line break. This means that a lot of paragraphs can now be reused.
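The lookahead idea can be sketched as follows. This is a deliberately simplified illustration, not the grammar's actual scanner: `classify_newline` is a hypothetical helper, and it only looks for a blank line or end of input, ignoring other constructs that can end a paragraph in markdown (such as headings or fenced code blocks).

```python
def classify_newline(text: str, i: int) -> str:
    """Classify the newline at index i, assumed to end a paragraph line.

    Returns 'paragraph_end' if the next line is blank or missing,
    otherwise 'soft_break'.
    """
    if text[i] != "\n":
        raise ValueError("index i must point at a newline")
    next_nl = text.find("\n", i + 1)
    # Look ahead at the whole following line (or the rest of the input).
    following = text[i + 1:] if next_nl == -1 else text[i + 1:next_nl]
    return "paragraph_end" if following.strip() == "" else "soft_break"
```

Because the decision is made up front from the lookahead, the parser never has to keep both interpretations alive, which is what made the paragraph nodes fragile.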

But a similar problem now appears with emphasis, which appears in a lot of paragraphs as top-level inline nodes. I'm not sure it's possible to parse those without conflicts, as that would require potentially infinite lookahead. But maybe it's possible to create a "fast path" for the most common use case of no nested inlines.

maxbrunsfeld · 2022-06-21T21:04:03Z

I think it's probably ok for emphasis to have that conflict, since most (all?) top-level nodes in the document are not emphasis nodes.

MDeiml · 2022-06-22T08:08:15Z

That's true, but pretty much every top level node has children that cannot be reused, which means that parsing is still slow in very very large documents, though I can get it to very acceptable speeds for e.g. the README for this repo.

I have a question, though: shouldn't it be possible to reuse fragile trees (whole trees, not nodes) if all edits were outside their range, as set with ts_parser_set_included_ranges? I have to admit I don't really understand this concept of fragility, so I might be wrong, but even with conflicts parsing should be deterministic.

I am currently working on a version of this parser where inline elements (emphasis) and block elements (paragraphs) are split into two grammars. This means that every inline range is parsed separately. I noticed that almost all of the inline ranges are not reused, which makes sense as most contain emphasis and are thus fragile. But all that needs to be done is to shift the node positions, so I'd be keen to just not reparse the unaffected trees.
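What "just shift the node positions" could look like is sketched below for a single insertion. It is a hypothetical illustration using byte ranges only; `apply_insert` is an invented name, and a real implementation would also have to update row and column positions inside each tree.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start_byte, end_byte), end exclusive


def apply_insert(
    ranges: List[Range], pos: int, length: int
) -> Tuple[List[Range], List[Range]]:
    """Split inline ranges into (kept or shifted, needs re-parse).

    Assumption: an insertion exactly at a range boundary is treated as
    lying outside that range.
    """
    kept: List[Range] = []
    reparse: List[Range] = []
    for start, end in ranges:
        if end <= pos:
            kept.append((start, end))  # entirely before the edit
        elif start >= pos:
            kept.append((start + length, end + length))  # only shifted
        else:
            reparse.append((start, end + length))  # edit is inside
    return kept, reparse
```

For three ranges and a one-byte insertion inside the second, only that one range lands in the re-parse list; the other two keep their old trees.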

lewis6991 · 2023-02-24T10:27:20Z

There's a lot of activity in this thread.

@MDeiml are you able to provide a status update? Especially now that neovim/neovim#22309 is merged.

MDeiml · 2023-02-24T10:37:29Z

Sure!

The split parser I mentioned has been implemented and used in neovim for quite a while now. This helped a lot in other editors, but neovim had a problem with injected languages, which was improved in neovim/neovim#22309, so I imagine that this helped quite a bit. Since then I have improved the "block" parser of the two so that it uses almost no "conflicts". Conflicts are still what makes the "inline" parser quite slow.

clason · 2023-02-24T10:39:54Z

Note that we also recently bumped the tree-sitter version in our official builds to include the vastly improved error handling; I suspect that this also helps quite a bit in this regard.

So a test on the latest nightly with one of the bad example files would be very much appreciated.

medwatt added the bug Something isn't working label May 7, 2022

medwatt changed the title ~~Markdown files with more than 1000 lines become perceptibly slow~~ Markdown files with more than 500 lines become perceptibly slow May 7, 2022

gregdezeeuw mentioned this issue May 12, 2022

Support Wikilink syntax jakewvincent/mkdnflow.nvim#10

Closed

theHamsta mentioned this issue May 26, 2022

chore: Experiments with profiling tree-sitter with Tracy and NVTX neovim/neovim#18761

Closed

theHamsta mentioned this issue Jul 13, 2022

treesitter impact of redraw performance neovim/neovim#14762

Closed

ayushnix mentioned this issue Aug 4, 2022

tree-sitter grammar jgm/djot#21

Open

pascalkuthe mentioned this issue Oct 7, 2022

Slow Markdown Highlighting helix-editor/helix#4139

Closed

lewis6991 added the performance label Feb 24, 2023

gen740 mentioned this issue Jun 1, 2023

Strong performance impact (fancy enabled) gen740/SmoothCursor.nvim#38

Closed

ayushnix mentioned this issue Jun 21, 2023

replace import with media query for dark and light variables ayushnix/kangae#4

Merged
