Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interest in supporting correct hightlighting and multiple schemas? #421

Closed
eemeli opened this issue Mar 6, 2021 · 16 comments
Closed

Interest in supporting correct hightlighting and multiple schemas? #421

eemeli opened this issue Mar 6, 2021 · 16 comments

Comments

@eemeli
Copy link

eemeli commented Mar 6, 2021

Summary

I would like to improve YAML editor support, and I'd rather not write & maintain my own language server.

Relevant information

I'm the main developer behind eemeli/yaml, and I'm getting incrementally closer to having its v2 release "done". As a part of this update, I've refactored the parsing to include an explicit lexer as a pre-stage to its (also rewritten) concrete syntax tree builder. One reason for doing that was to enable correct highlighting of YAML in editors, as parts of the language are notoriously difficult to tokenise with regular expressions.

Separately from that, I'm interested in enabling a simultaneous parse of a document as both YAML 1.1 and YAML 1.2, in order to e.g. be able to warn a user that their input off will get parsed as a boolean false in one schema, and a string 'off' in another.

As the support for semantic tokens was added to the latest LSP 3.16 release, it should be possible for a YAML language server to provide both of the above features. I would like for such a language server to exist, but I'd really rather not have to "compete" with this implementation. However, I am aware that these features may be considered beyond the scope of this project and/or require too much refactoring to be worthwhile.

Are you open to considering such developments? I'd be happy to talk about this elsewhere as well; you can reach me at [email protected].

@joshuawilson
Copy link
Member

@gorkem wdyt?

@gorkem
Copy link
Collaborator

gorkem commented Mar 6, 2021

Inevitable, I was expecting this conversation :)

These features are NOT beyond the scope of this extension. Semantic highlights was at the back of my mind. YAML 1.1 and YAML 1.2 difference came up before.

@JPinkney mentioned that we should move to https://github.com/eemeli/yaml from current parser https://github.com/redhat-developer/yaml-ast-parser . Replacing the parser is a bit of refactoring but I suspect it is manageable. I definitely would like to help.

@evidolob @JPinkney any ideas on how big the refactoring would be?

@JPinkney
Copy link
Contributor

JPinkney commented Mar 8, 2021

It's actually shouldn't be too bad to switch parsers. A lot of the changes are just going to be in recursivelyBuildAst. FWIW I attempted to switch parsers a couple months ago and within probably ~5 hours of work I had ~65% of tests working (had to nuke my computer last week so I lost the changes). I think in general it's a good idea though. That parser is so much better tested than https://github.com/redhat-developer/yaml-ast-parser. If we did do these changes we could probably close about 5-10 issues on vscode-yaml/yaml-language-server that would no longer occur

@evidolob
Copy link
Collaborator

evidolob commented Mar 9, 2021

I'm also like an idea to switch parser, my only concern is that eemeli/yaml should work with syntax invalid files, as we provide editing support of YAML and, most of the time, in editor we have not valid YAML,
eemeli/yaml should provide as match as possible valid AST for such YAML.

@gorkem
Copy link
Collaborator

gorkem commented Mar 9, 2021

@eemeli Does eemeli/yaml parser do tolerant parsing?

@eemeli
Copy link
Author

eemeli commented Mar 9, 2021

@evidolob It should be able to give you the most-yamlish-possible lexing, CST and AST of any input that you throw at it, while also identifying the error or errors that make it not actually valid YAML. If there are cases where its output is suboptimal, I'd be very interested in having them identified so that I can improve on it.

@evidolob
Copy link
Collaborator

evidolob commented Mar 9, 2021

Then we can prototype this, I can start next week. Or let @JPinkney to start it 🙂

@eemeli
Copy link
Author

eemeli commented Mar 9, 2021

Might be best to wait until next week, tbh. I'll release a new prerelease version by next weekend, and the lower-level lexer/parser API will change a bit.

@eemeli
Copy link
Author

eemeli commented Mar 13, 2021

Just released the new version 2.0.0-4 on npm under the next tag:

npm install --save-exact yaml@next

The documentation site is also updated; in particular the section on Parsing YAML describes the Lexer/Parser/Composer layers of the API.

@gorkem
Copy link
Collaborator

gorkem commented Mar 25, 2021

@eemeli I am looking at the YAMLError and I do not see a way to detect where the error token ends. Is there a way to figure it out?

@eemeli
Copy link
Author

eemeli commented Mar 26, 2021

@gorkem At the moment, the only place where end positions are explicitly available are the AST node ranges. They're also implicitly in the CST, but there you need to add up string lengths to find them.

The visitor for finding the node or nodes at an offset is pretty simple, but I'm wondering now if something like it ought to be made even more easily available? Or if additional info needs to be included in the CST?

function nodesAtOffset(doc: Document, offset: number) {
  const nodes: Node[] = []
  visit(doc, (_key, node) => {
    if (
      isNode(node) &&
      node.range &&
      node.range[0] <= offset &&
      node.range[1] > offset
    )
      nodes.push(node)
  })
  return nodes
}

That doesn't of course quite directly answer your question, because those are nodes rather than tokens. :/

@gorkem
Copy link
Collaborator

gorkem commented Mar 26, 2021

Another option is to implement a Composer that spits out SingleYamlDocument, we need to generate them anyway for schema service so this will make it more efficient and allow us to have more control over the Error objects created I suppose.

@eemeli
Copy link
Author

eemeli commented Apr 18, 2021

@gorkem [email protected] now includes ranges for errors.

@gorkem
Copy link
Collaborator

gorkem commented Apr 19, 2021

Just trying out, looks like it has fixed some of the issues that I was observing. I need to investigate the rest

@gorkem
Copy link
Collaborator

gorkem commented Apr 19, 2021

published my branch #442

@evidolob
Copy link
Collaborator

evidolob commented Oct 7, 2021

#442 was merged

@evidolob evidolob closed this as completed Oct 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants