Validate files with RelaxNG schemas using MSV #1294

Closed
wants to merge 15 commits into from


@datho7561 (Contributor) commented Sep 16, 2022

This experimental PR allows validating XML documents against *.rng schemas, by associating the XML with the schema using <?xml-model href="..."?>.
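The association relies on the `xml-model` processing instruction from the W3C note "Associating Schemas with XML documents 1.0", whose data carries pseudo-attributes such as `href` and `type`. As a rough illustration only (not the lemminx implementation), a stdlib-only Java sketch that pulls the `href` out of the first `<?xml-model?>` in the prolog could look like this. `XmlModelHref` and `findHref` are hypothetical names, the regex is a simplification that ignores entity references inside pseudo-attribute values, and a real client would still have to resolve a relative `href` against the document's URI.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlModelHref {
    // Matches href="..." or href='...' among the PI's pseudo-attributes (simplified).
    private static final Pattern HREF =
            Pattern.compile("\\bhref\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns the href of the first <?xml-model?> PI before the root element, or null. */
    public static String findHref(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        // Prolog PIs are direct children of the Document node, before the root element.
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                break; // this sketch only looks in the prolog
            }
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE
                    && "xml-model".equals(((ProcessingInstruction) n).getTarget())) {
                Matcher m = HREF.matcher(((ProcessingInstruction) n).getData());
                if (m.find()) {
                    // Group 1 is the double-quoted value, group 2 the single-quoted one.
                    return m.group(1) != null ? m.group(1) : m.group(2);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<?xml-model href=\"./myBookSchema.rng\" type=\"application/xml\"?>\n"
                + "<book/>";
        System.out.println(findHref(xml)); // prints ./myBookSchema.rng
    }
}
```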

Features:

  • Automatically associates *.rng files with the relaxng.xsd schema
  • Marks the reference to the schema as an error when the schema is missing or broken

Limitations:

  • Many of the error ranges and error messages generated by MSV are displayed as-is
  • This PR is missing unit tests
  • From my understanding, MSV doesn't support RelaxNG compact syntax (*.rnc)
  • MSV will need a CQ before this PR can be merged
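On the error-range limitation: validators that report through SAX typically hand each problem over as a `SAXParseException` carrying a single 1-based line/column point (the Javadoc describes the column as the end of the text where the problem occurred), whereas an editor diagnostic needs a 0-based start/end range. The sketch below is one hypothetical heuristic, not what lemminx or MSV actually does: it widens the point back to the nearest `<` on the same line, so an error reported at the end of a start tag highlights the whole tag. `SaxErrorRange` and `Range` are illustrative names.

```java
import org.xml.sax.SAXParseException;

public class SaxErrorRange {
    /** Hypothetical zero-based, single-line range: [startChar, endChar). */
    public record Range(int line, int startChar, int endChar) {}

    /**
     * Widens the single 1-based (line, column) point of a SAXParseException into a
     * zero-based range that starts at the nearest '<' before the point on the same line.
     * If there is no '<', the range is empty and sits at the point itself.
     */
    public static Range toRange(SAXParseException e, String text) {
        String[] lines = text.split("\r?\n", -1);
        // getLineNumber()/getColumnNumber() return -1 when unknown; clamp to the document.
        int line = Math.min(Math.max(0, e.getLineNumber() - 1), lines.length - 1);
        String content = lines[line];
        // A 1-based column just past the offending text becomes a 0-based exclusive end.
        int end = Math.min(Math.max(0, e.getColumnNumber() - 1), content.length());
        int start = content.lastIndexOf('<', end - 1);
        if (start < 0) {
            start = end;
        }
        return new Range(line, start, end);
    }

    public static void main(String[] args) {
        String text = "<book>\n    <page number=\"1\">\n</book>";
        // Pretend the validator flagged the end of the <page> start tag.
        SAXParseException e = new SAXParseException("example error", null, null, 2, 22);
        System.out.println(toRange(e, text)); // Range[line=1, startChar=4, endChar=21]
    }
}
```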

Closes #828

@rgrunber (Contributor)

The 2022.7 release is recent, but the release before it, https://clearlydefined.io/definitions/maven/mavencentral/net.java.dev.msv/msv-core/2013.6.1, has a Licensed score of 60, which is acceptable according to the project guidelines. It should be possible to use MSV once it has been reviewed.

@datho7561 (Contributor, Author)

@svanteschubert I experimented with adding RelaxNG validation support to lemminx using MSV. We're still trying to decide whether MSV or Jing is a better fit for lemminx. If you have time, could you try it out and let me know what you think?

@svanteschubert commented Sep 21, 2022

@datho7561 Hi David, good work! I have skimmed through your pull request and although there might be still a devil in the detail, it does look good to me!

Some comments on MSV: Michael Stahl and I have been taking care of MSV for a while now. I did the recent MSV release and will very likely do further ones in the future (as long as I am working with MSV, of course). I do like MSV's idea of abstracting the grammar XML details.

The first release was a bit painful, as the last Oracle release (2013) neglected new features of the 2011 Red Hat release, such as default attribute values. The Red Hat release was based on the fork by the former code owner at Oracle, Kohsuke Kawaguchi (KK), since Oracle had taken down the original sources; unfortunately, that fork was missing some copyright headers.
In 2013, Oracle finally (and suddenly) published its own release on Maven, based on the original Sun sources, with newly uploaded source code that included correct copyright headers for MSV-Core and some other modules (for details, see https://github.com/svanteschubert/msv-merge-project).
Unfortunately, because Oracle based its release on its original sources without taking Red Hat's 2011 updates into account (which are also available as zipped sources in Maven repositories and are identical to KK's sources), it left out those features and fixes.
I have reached out to Oracle without success so far; perhaps you have connections?

Long story short, I merged them manually and reviewed the result, which was very time-consuming.
In addition, I got all sub-projects building correctly with Maven and converted their documentation to Markdown on GitHub Pages.

That said, I would be curious about the possible advantages of Jing and an in-depth comparison. As I stated earlier, though, the idea of an abstraction API for all XML grammars is a big plus for me, and we use MSV to validate ODF in the ODF Validator.

For now, Michael Stahl and I (as co-editor in the OASIS ODF TC) will co-maintain MSV. It was suggested that TDF (The Document Foundation) take over MSV from xmlark to give it a more prominent home with a mailing list.
KK has not yet "donated" (transferred) his GitHub repository to us; we simply were not aware of that GitHub feature and will follow up on it.

Personally, I would like to (and likely will) add some high-level regression tests to MSV, such as loading a big ODF RNG grammar and the UN/CEFACT and UBL XSD grammars and dumping the run-time model (or adding some save functionality).
That is one of the biggest drawbacks of the old validators: they were meant only for validation, not for using the XML model for code generation (as we are doing) or for loading, editing, and saving the grammar with MSV as the run-time model.
For instance, namespace prefixes are thrown away because they are not needed for validation, yet they are needed for code generation or for a later save round-trip.

In addition, MSV does not really abstract full-featured XSD correctly. I have heard about XSD test suites and would be curious whether and how MSV could be extended into a full-featured XSD validator as well, but that is more curiosity than an existing business case that could pay for it, at least for now. Let's see. Good luck with your further work! Let's keep in touch!

@datho7561 (Contributor, Author)

... although there might be still a devil in the detail, it does look good to me!

Good to hear. Thank you!

... I do like MSV's idea of abstracting the grammar XML details.

This is a neat design choice. I'm a little concerned that it might make some features we want to implement, like completion, a bit more complex. However, it would be nice to reuse the same code for the different schema languages.

... I have reached out to Oracle without success so far; perhaps you have connections?

I don't know; I'd have to ask around.

Good luck with your further work! Let's keep in touch!

Thank you! You too!

@datho7561 (Contributor, Author)

@angelozerr I did some testing, and RelaxNG validation appears to be working in the binary: no exceptions are thrown in the logs, and the errors appear correctly.

@datho7561 (Contributor, Author)

Here is some example code to play with:

myBook.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="./myBookSchema.rng" type="application/xml"?>
<book>
    <page number="1">
        <paragraph>
            <word>
                <letter>
                    e
                </letter>
            </word>
        </paragraph>
    </page>
</book>

myBookSchema.rng:

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">

    <start>
        <element name="book">
            <oneOrMore>
                <ref name="pageType"></ref>
            </oneOrMore>
        </element>
    </start>

    <define name="letterType">
        <element name="letter">
            <text></text>
        </element>
    </define>

    <define name="wordType">
        <element name="word">
            <oneOrMore>
                <ref name="letterType"></ref>
            </oneOrMore>
        </element>
    </define>

    <define name="paraType">
        <element name="paragraph">
            <oneOrMore>
                <ref name="wordType"></ref>
            </oneOrMore>
        </element>
    </define>

    <define name="pageType">
        <element name="page">
            <attribute name="number">
                <text/>
            </attribute>
            <oneOrMore>
                <ref name="paraType"></ref>
            </oneOrMore>
        </element>
    </define>

</grammar>
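To try the two files above outside lemminx, the JDK's `javax.xml.validation` API can in principle drive any validator registered for `XMLConstants.RELAXNG_NS_URI`. The JDK itself ships no RelaxNG implementation, so `SchemaFactory.newInstance` throws `IllegalArgumentException` unless a RelaxNG-capable provider is on the classpath; this sketch assumes such a provider and does not claim that MSV or Jing registers itself this way. `RelaxNgCheck` and its methods are hypothetical names. The error handler collects every recoverable error rather than stopping at the first one, which is closer to what an editor wants to display.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class RelaxNgCheck {
    /** Returns a RelaxNG SchemaFactory, or null when no provider is on the classpath. */
    public static SchemaFactory relaxNgFactory() {
        try {
            return SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
        } catch (IllegalArgumentException noProvider) {
            // Assumption: a third-party provider must be added; the JDK has none.
            return null;
        }
    }

    /** Validates xml against rng and returns every recoverable error found. */
    public static List<String> validate(SchemaFactory factory, File rng, File xml) throws Exception {
        List<String> problems = new ArrayList<>();
        Schema schema = factory.newSchema(rng);
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { } // ignored in this sketch
            @Override public void error(SAXParseException e) { problems.add(format(e)); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        validator.validate(new StreamSource(xml));
        return problems;
    }

    /** Formats an error as "line:column message". */
    public static String format(SAXParseException e) {
        return e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage();
    }

    public static void main(String[] args) throws Exception {
        SchemaFactory factory = relaxNgFactory();
        if (factory == null) {
            System.out.println("No RelaxNG SchemaFactory on the classpath");
            return;
        }
        validate(factory, new File("myBookSchema.rng"), new File("myBook.xml"))
                .forEach(System.out::println);
    }
}
```

Run from the directory containing myBook.xml and myBookSchema.rng. Since myBook.xml appears to conform to the schema, a conforming validator should print nothing; removing the `number` attribute from `<page>` is an easy way to provoke an error.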

@angelozerr (Contributor)

Thanks so much, @datho7561, for investigating RelaxNG with MSV, but unfortunately we will use Jing, which provides better results in terms of validation.

See the current PR, #841.

@angelozerr closed this Oct 12, 2022
@svanteschubert

we will use Jing, which provides better results in terms of validation.

@angelozerr Out of curiosity, since I have not dived into Jing yet but have taken over MSV: what better validation results are you referring to?

@angelozerr (Contributor)

@svanteschubert Sorry, I did not explain why Jing is better than MSV. Here are the advantages of Jing:

  • Jing is used by Oxygen
  • Jing has received contributions from Oxygen
  • Jing was created by James Clark, one of the authors of RelaxNG
  • Jing supports multiple error ranges without being polluted by so many BadText errors
  • Jing also supports validation with the RelaxNG compact syntax

@svanteschubert

Strange, I assumed RNC would work, but a quick test did indeed fail.
As your PR is fresh, do you have an API reference for Jing's error ranges at hand?
I will check all this in more detail after my autumn vacation. :-)

Thanks in advance, looking forward to using your RNG features with Lemminx! :-)

Keep up the good work!
Svante

Development

Successfully merging this pull request may close these issues.

Initialize RelaxNG support with validation/completion/hover
4 participants