Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to split markdown parser #3048

Merged
merged 3 commits into from
Jun 26, 2022

Conversation

MDeiml
Copy link
Contributor

@MDeiml MDeiml commented Jun 22, 2022

My new split markdown parser works quite well now (better than the non-split one). This PR changes the config to point at the new branch because I want to keep the old branch for now as other projects are using it.

It also adds some other changes that are needed for the two parsers to work:

  1. An exclude_children! directive, that might also be very useful for something like Injected markdown incorrectly highlights indented docstrings #2212
  2. Splitting markdown queries into block and inline ones and add the injection for inline into block grammar
  3. Add an include_dir option to parser configs (needed, because the two grammars don't live in the repos root directory)

@clason
Copy link
Contributor

clason commented Jun 22, 2022

Do you want to add yourself as a maintainer so people know whom to ping when things go wrong?

@MDeiml
Copy link
Contributor Author

MDeiml commented Jun 23, 2022

Sure good idea

@clason
Copy link
Contributor

clason commented Jun 23, 2022

Also, please don't forget to mark this as a breaking change (feat(markdown)!: switch to split parser) as this will invalidate downstream queries.

@MDeiml
Copy link
Contributor Author

MDeiml commented Jun 24, 2022

I also noticed a bug (which probably should be solved separately from this PR): A node from an injected language used for highlighting can have a range that might include bytes that lie outside of the included range. E.g. with this

> *Foo
> Bar*

The emphasis spans over the second >, but that one is not part of the included range for the inline content. It is still highlighted as emphasis.

We would need to calculate the intersection between included range and node range for every node used in an injected language and use that instead.

@theHamsta
Copy link
Member

@MDeiml wow, awesome that we now have an exclude_children! directive. We can of course merge it. But I'm wondering whether we should just make it always the default as this is the way upstream tree-sitter behaves (except "injection.combined" is set https://tree-sitter.github.io/tree-sitter/syntax-highlighting#language-injection).

@MDeiml
Copy link
Contributor Author

MDeiml commented Jun 24, 2022

I also think excluding children is the better default, but making this the default would probably need changes (and some discussion) in neovim itself. Maybe it's better to first test this directive here and later make a PR to neovim?

lockfile.json Outdated Show resolved Hide resolved
@MDeiml MDeiml force-pushed the use_split_markdown_grammar branch from a462a34 to b35b32c Compare June 24, 2022 15:48
@MDeiml MDeiml force-pushed the use_split_markdown_grammar branch from b35b32c to bf1ebcf Compare June 24, 2022 20:04
Copy link
Member

@kyazdani42 kyazdani42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me otherwise :)

queries/markdown_inline/highlights.scm Outdated Show resolved Hide resolved
@clason clason closed this Jun 25, 2022
@clason clason reopened this Jun 25, 2022
@MDeiml MDeiml force-pushed the use_split_markdown_grammar branch from 3d4058a to 34ae4ac Compare June 25, 2022 16:53
lockfile.json Outdated Show resolved Hide resolved
@theHamsta
Copy link
Member

theHamsta commented Jun 25, 2022

I'm having similar performance issues as with the old parser. Was the intention of the split to improve performance?

Wrong alarm. I accidentally used the old markdown parser (was left installed as an artifact of testing a different PR at alternative install location). It seems to be slow still with markdown_inline but that might also be our injection implementation that does no incremental parsing.

@MDeiml MDeiml force-pushed the use_split_markdown_grammar branch from 34ae4ac to e795da2 Compare June 26, 2022 14:29
@MDeiml
Copy link
Contributor Author

MDeiml commented Jun 26, 2022

It's a bit faster, but still slow. I think parsing time after edits is still linear with file size (should be linear with size of edit), because all inline ranges with more complicated elements like emphasis or code spans still get reparsed. This could be fixed, by detecting which inline ranges actually got edited and then just reusing the old trees for all the other ones. But that is something that tree-sitter itself should be able to detect, so maybe I'm gonna create a PR in for that in tree-sitter.

@clason
Copy link
Contributor

clason commented Jun 26, 2022

Just on the off-chance: will tree-sitter/tree-sitter#1783 be of any help?

@MDeiml
Copy link
Contributor Author

MDeiml commented Jun 26, 2022

I don't think so. There is no "syntactically wrong markdown", so error recovery is not relevant.

@clason clason merged commit 002084b into nvim-treesitter:master Jun 26, 2022
megalithic added a commit to megalithic/dotfiles that referenced this pull request Jun 26, 2022
- adding back nvim-markdown for syntax highlighting while queries
  downstream get updated to support the markdown split for
  markdown_inline
- REF:  nvim-treesitter/nvim-treesitter#3048
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants