
Validating your data

Tomaž Erjavec edited this page Sep 14, 2020 · 7 revisions

Validating your data with the Parla-CLARIN XML schema

The XML schema for Parla-CLARIN is in the Schema directory. The source TEI ODD schema has been converted into several XML schema formats, any of which can be used to validate parliamentary XML documents:

  • parla-clarin.rng: XML schema in RelaxNG XML syntax
  • parla-clarin.rnc: XML schema in RelaxNG compact syntax
  • parla-clarin.xsd: XML schema in W3C XML schema format (also uses two auxiliary files, dcr.tmp and xml.tmp)
  • parla-clarin.dtd: XML schema in the old DTD format (which does not necessarily enforce all the constraints of the RelaxNG schema)

Although any of the above schemas can be used to validate your XML documents, the RelaxNG schema is the one that most faithfully models the TEI ODD.

For Unix users, xmllint can be used for validation, although jing is better for validating against RelaxNG. For Windows users, it is probably easiest to validate with the Oxygen XML editor, although the command-line xmllint can also be used.
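If you script validation on Unix, a small wrapper can follow the advice above: use jing when it is installed and fall back to xmllint otherwise. This is a minimal bash sketch; the function names `pick_validator` and `validate_rng` are hypothetical and not part of Parla-CLARIN, and it relies only on the jing and xmllint invocations shown on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Parla-CLARIN): prefer jing for RelaxNG
# validation and fall back to xmllint if jing is not installed.

# Print the validator to use: "jing", "xmllint" or "none".
pick_validator() {
  if command -v jing >/dev/null 2>&1; then
    echo jing
  elif command -v xmllint >/dev/null 2>&1; then
    echo xmllint
  else
    echo none
  fi
}

# Validate one XML file against a RelaxNG (.rng) schema.
# Usage: validate_rng SCHEMA.rng FILE.xml
# Returns the validator's exit status (0 = valid), or 127 if neither tool is found.
validate_rng() {
  local schema=$1 file=$2
  case $(pick_validator) in
    jing)    jing "$schema" "$file" ;;
    xmllint) xmllint --noout --relaxng "$schema" "$file" ;;
    *)       echo "neither jing nor xmllint found on PATH" >&2; return 127 ;;
  esac
}
```

For example, from the parla-clarin directory: `validate_rng Schema/parla-clarin.rng Examples/Parla-CLARIN-Exemplar.xml && echo valid`.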

Here is how to use xmllint for validation under Unix, assuming you are in the parla-clarin directory, your file is Examples/Parla-CLARIN-Exemplar.xml, and you want to use the W3C XML schema (.xsd):

$ xmllint --noout --schema Schema/parla-clarin.xsd Examples/Parla-CLARIN-Exemplar.xml

And here is how to do it if you want to use the RelaxNG schema (.rng):

$ xmllint --noout --relaxng Schema/parla-clarin.rng Examples/Parla-CLARIN-Exemplar.xml
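The two xmllint commands above differ only in the option that names the schema: `--schema` for the W3C XML schema and `--relaxng` for the RelaxNG schema. A small wrapper can pick the option from the schema file's extension. This is a sketch only; `xmllint_validate` is a hypothetical name, and it handles just the two schema types used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: validate FILE.xml with xmllint, choosing the schema
# option from the schema file's extension.
# Usage: xmllint_validate SCHEMA FILE.xml
xmllint_validate() {
  local schema=$1 file=$2 opt
  case $schema in
    *.xsd) opt=--schema ;;    # W3C XML schema
    *.rng) opt=--relaxng ;;   # RelaxNG, XML syntax
    *)
      echo "unsupported schema type: $schema" >&2
      return 2
      ;;
  esac
  # --noout: do not print the parsed document, only validation messages
  xmllint --noout "$opt" "$schema" "$file"
}
```

For example: `xmllint_validate Schema/parla-clarin.xsd Examples/Parla-CLARIN-Exemplar.xml`.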

Here is how to use jing, which validates much faster than xmllint and expects a RelaxNG schema:

$ jing Schema/parla-clarin.rng Examples/Parla-CLARIN-Exemplar.xml

If your corpus consists of a number of XML files that are all XIncluded from the corpus root file, you first need xmllint to build the complete XML document by resolving the XIncludes. So, if your root file is Examples/siParl2.0/siParl.tei/siParl-sample.xml, here is how you can validate it:

$ xmllint --noout --xinclude --relaxng Schema/parla-clarin.rng Examples/siParl2.0/siParl.tei/siParl-sample.xml

or

$ xmllint --xinclude Examples/siParl2.0/siParl.tei/siParl-sample.xml | jing Schema/parla-clarin.rng /dev/stdin
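In a shell pipeline, the exit status is normally that of the last command (jing), so a failure in the XInclude step can go unnoticed. The same two steps can also be run with an explicit temporary file so that each step's status is checked. This is a minimal bash sketch assuming both xmllint and jing are on your PATH; `validate_corpus` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand the XIncludes of a corpus root file with
# xmllint, then validate the complete document with jing.
# Usage: validate_corpus SCHEMA.rng ROOT.xml
validate_corpus() {
  local schema=$1 root=$2 tmp status
  tmp=$(mktemp) || return 1
  # Step 1: resolve XIncludes; stop if xmllint exits with an error
  if ! xmllint --xinclude "$root" > "$tmp"; then
    echo "XInclude expansion failed for $root" >&2
    rm -f "$tmp"
    return 1
  fi
  # Step 2: validate the expanded document against the RelaxNG schema
  jing "$schema" "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example: `validate_corpus Schema/parla-clarin.rng Examples/siParl2.0/siParl.tei/siParl-sample.xml`.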