Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser doesn't support whitespace before prolog #248

Open
elsassph opened this issue Oct 24, 2020 · 6 comments
Open

Parser doesn't support whitespace before prolog #248

elsassph opened this issue Oct 24, 2020 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@elsassph
Copy link

The parser fails completely, and without errors, when there is whitespace before the XML prolog.
I need to preserve positions so I can't trim those spaces; it should be possible to allow them.

{empty lines here}
<?xml version="1.0" encoding="utf-8" ?>
<component>
</component>
@bd82
Copy link
Member

bd82 commented Oct 25, 2020

As far as I can tell leading whitespace is not allowed before the prolog according to the specifications.

@elsassph
Copy link
Author

That's fair but it's tolerated by many readers/engines consuming XML.

@bd82
Copy link
Member

bd82 commented Oct 27, 2020

Related issues:

I think it would make sense to only show a syntax error in that case but be fault tolerant, meaning
successfully parse the all the document and construct a valid CST.

This may depend on: Chevrotain/chevrotain#646
Although a workaround without a change to Chevrotain may be possible.

@bd82 bd82 added the help wanted Extra attention is needed label Oct 27, 2020
@elsassph
Copy link
Author

I'd suggest that at least recovering from the error would be nice.

@bd82
Copy link
Member

bd82 commented Oct 27, 2020

I'd suggest that at least recovering from the error would be nice.

A full productive solution in this library would also have to deal with creating the syntax error and perhaps even making
this grammar variation optional.

However, you may be able to accomplish this yourself by extending the XMLParser class and overriding the document rule to allow
consuming multiple misc rules before the prolog section.

However, Grammar inheritance is a little tricky with Chevrotain, so I'm not sure you can use it without first making some small changes to the XMLParser itself.

Another option could be to create a subclass with a new grammar rule entry point documentWithMiscPrefix

    $.RULE("documentWithMiscPrefix", () => {
      $.MANY(() => {
        $.SUBRULE($.misc);
      });

     // continue parsing using the original logic
     $.SUBRULE($.document);

This may bypass some of the grammar inheritance issues, but I am not 100% sure.

To make this XMLParser easily extensible you would need to enable deferring the performSelfAnalysis call at the end of the constructor.

  • e.g by an optional constructor flag argument.

Perhaps this is the most time efficient solution, As I personally don't know when I will get around to implementing a "proper" solution to this issue, however you could pretty easily contribute a PR making the XMLParser more extensible and then
You can implement whichever partial workaround in your own code base.

@elsassph
Copy link
Author

Thanks for the pointers. Though I'd love to contribute, this is a bit over our bandwidth right now; this is for another OSS project where switching to this parser is already a major effort 😄 We'll see if it harms our users in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants