Parser doesn't support whitespace before prolog #248

elsassph · 2020-10-24T21:53:37Z

The parser fails completely, and without errors, when there is whitespace before the XML prolog.
I need to preserve positions so I can't trim those spaces; it should be possible to allow them.

{empty lines here}
<?xml version="1.0" encoding="utf-8" ?>
<component>
</component>

bd82 · 2020-10-25T08:57:01Z

As far as I can tell leading whitespace is not allowed before the prolog according to the specifications.

https://www.w3.org/TR/xml/#sec-well-formed (see Document and Prolog rules).
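
To make the rule concrete, here is a rough text-level sketch of what those productions imply. The helper name `leadingWhitespaceBeforeXmlDecl` is made up for illustration; it is not part of @xml-tools/parser and it does not parse anything. It only measures whitespace sitting in front of an XML declaration.

```javascript
// Sketch only, not library code. Relevant XML 1.0 productions:
//   document ::= prolog element Misc*
//   prolog   ::= XMLDecl? Misc* (doctypedecl Misc*)?
//   Misc     ::= Comment | PI | S
// XMLDecl is optional, but nothing may precede it when it is present.

// S ::= (#x20 | #x9 | #xD | #xA)+
// The lookahead requires whitespace after "<?xml" so that ordinary
// processing instructions such as "<?xml-stylesheet ...?>" do not match.
const LEADING_WS_BEFORE_XML_DECL = /^[ \t\r\n]+(?=<\?xml[ \t\r\n])/;

// Hypothetical helper: returns the number of whitespace characters found
// before an XML declaration, or 0 when the document start is fine.
function leadingWhitespaceBeforeXmlDecl(text) {
  const match = LEADING_WS_BEFORE_XML_DECL.exec(text);
  return match === null ? 0 : match[0].length;
}
```

Whitespace before the root element with no XML declaration returns 0 here, since that case is covered by `Misc` and is well-formed.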

elsassph · 2020-10-25T11:48:50Z

That's fair but it's tolerated by many readers/engines consuming XML.

bd82 · 2020-10-27T13:41:06Z

Related issues:

I think it would make sense to report only a syntax error in that case while staying fault tolerant, meaning
the parser would still successfully parse the whole document and construct a valid CST.

This may depend on: Chevrotain/chevrotain#646
Although a workaround without a change to Chevrotain may be possible.

elsassph · 2020-10-27T15:54:47Z

I'd suggest that at least recovering from the error would be nice.

bd82 · 2020-10-27T17:22:52Z

> I'd suggest that at least recovering from the error would be nice.

A full production-ready solution in this library would also have to deal with creating the syntax error and perhaps even making
this grammar variation optional.

However, you may be able to accomplish this yourself by extending the XMLParser class and overriding the document rule to allow
consuming multiple misc rules before the prolog section.

https://github.com/SAP/xml-tools/blob/master/packages/parser/lib/parser.js#L17-L19

However, grammar inheritance is a little tricky with Chevrotain, so I'm not sure you can use it without first making some small changes to the XMLParser itself.

See: https://github.com/SAP/chevrotain/blob/master/examples/parser/inheritance/inheritance.js

Another option could be to create a subclass with a new grammar rule entry point, documentWithMiscPrefix:

    $.RULE("documentWithMiscPrefix", () => {
      // consume zero or more misc items before the prolog
      $.MANY(() => {
        $.SUBRULE($.misc);
      });

      // continue parsing using the original logic
      $.SUBRULE($.document);
    });

This may bypass some of the grammar inheritance issues, but I am not 100% sure.

To make this XMLParser easily extensible, you would need to enable deferring the performSelfAnalysis call at the end of the constructor, e.g., by an optional constructor flag argument.

Perhaps this is the most time-efficient solution, as I personally don't know when I will get around to implementing a "proper" solution to this issue. However, you could pretty easily contribute a PR making the XMLParser more extensible, and then
you can implement whichever partial workaround you prefer in your own code base.
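
A rough sketch of the constructor flag idea follows. It uses toy stand-in classes, not the real Chevrotain parser or XMLParser, and it assumes that all rules must be registered before performSelfAnalysis runs. The flag name `deferSelfAnalysis` and the class names are hypothetical.

```javascript
// Toy stand-in for a Chevrotain-style base class. NOT the real API; it only
// mimics the assumed "no new rules after self-analysis" constraint.
class ToyParser {
  constructor() {
    this.rules = new Map();
    this.analyzed = false;
  }

  RULE(name, impl) {
    if (this.analyzed) {
      throw new Error(`Rule "${name}" defined after performSelfAnalysis`);
    }
    this.rules.set(name, impl);
    return impl;
  }

  performSelfAnalysis() {
    this.analyzed = true;
  }
}

// Stand-in for XMLParser with an optional flag (hypothetical name) that lets
// subclasses postpone the self-analysis step.
class ToyXmlParser extends ToyParser {
  constructor({ deferSelfAnalysis = false } = {}) {
    super();
    this.RULE("misc", () => "misc");
    this.RULE("document", () => "document");
    if (!deferSelfAnalysis) {
      this.performSelfAnalysis();
    }
  }
}

// Subclass that adds the extra entry point, then finishes the analysis itself.
class ToyXmlParserWithMiscPrefix extends ToyXmlParser {
  constructor() {
    super({ deferSelfAnalysis: true });
    this.RULE("documentWithMiscPrefix", () => "documentWithMiscPrefix");
    this.performSelfAnalysis();
  }
}
```

With the default flag value, existing callers keep the current behavior; only a subclass that opts in takes over responsibility for calling performSelfAnalysis.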

elsassph · 2020-10-27T21:59:00Z

Thanks for the pointers. Though I'd love to contribute, this is a bit over our bandwidth right now; this is for another OSS project where switching to this parser is already a major effort 😄 We'll see if it harms our users in practice.

bd82 added the help wanted label Oct 27, 2020

Sec-ant mentioned this issue Nov 27, 2023

Add new languages Sec-ant/prettier-plugin-embed#40

Merged
