Validating your data
The XML schema for Parla-CLARIN is found in the Schema directory. The source TEI ODD schema has been converted into several types of XML schemas, any of which can be used to validate parliamentary XML documents:
- parla-clarin.rng: XML schema in RelaxNG XML syntax
- parla-clarin.rnc: XML schema in RelaxNG compact syntax
- parla-clarin.xsd: XML schema in W3C XML Schema format (also uses two auxiliary files, dcr.tmp and xml.tmp)
- parla-clarin.dtd: XML schema in the old DTD format (does not necessarily enforce all the constraints of the RelaxNG schema)
Although any of the above schemas can be used to validate your XML documents, the one that best models the TEI ODD is the RelaxNG one.
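Only the .xsd and .rng schemas are used in the commands on this page. As a sketch covering all four formats, the shell function below picks a validator from the schema's file extension: xmllint's --dtdvalid option for the DTD, and jing's -c option for the compact syntax, which xmllint cannot read. It assumes that both xmllint and jing (discussed below) are on your PATH; the name validate_with is hypothetical and not part of Parla-CLARIN.

```shell
# Hypothetical helper: validate FILE against SCHEMA, choosing the tool and
# option from the schema's file extension. Assumes xmllint and jing are on PATH.
validate_with() {
  schema=$1
  file=$2
  case "$schema" in
    *.rng) xmllint --noout --relaxng "$schema" "$file" ;;
    *.rnc) jing -c "$schema" "$file" ;;  # xmllint cannot read the compact syntax
    *.xsd) xmllint --noout --schema "$schema" "$file" ;;
    *.dtd) xmllint --noout --dtdvalid "$schema" "$file" ;;
    *)
      echo "validate_with: unknown schema type: $schema" >&2
      return 2
      ;;
  esac
}
```

For example, to check the exemplar against the compact-syntax schema:
$ validate_with Schema/parla-clarin.rnc Examples/Parla-CLARIN-Exemplar.xml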
For Unix users, xmllint can be used for validation, although jing is better for validating against RelaxNG. For Windows users, the easiest option is probably the Oxygen XML editor, although the command-line xmllint can also be used.
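If a script has to run on machines where only one of the two tools is installed, a minimal sketch is to prefer jing for RelaxNG and fall back to xmllint. The function name validate_rng is hypothetical, and the schema path assumes you run it from the parla-clarin directory.

```shell
# Hypothetical helper: validate a file against the RelaxNG schema, preferring
# jing when it is installed and falling back to xmllint otherwise.
validate_rng() {
  schema=Schema/parla-clarin.rng
  if command -v jing >/dev/null 2>&1; then
    jing "$schema" "$1"
  elif command -v xmllint >/dev/null 2>&1; then
    xmllint --noout --relaxng "$schema" "$1"
  else
    echo "validate_rng: neither jing nor xmllint is installed" >&2
    return 127
  fi
}
```

Usage:
$ validate_rng Examples/Parla-CLARIN-Exemplar.xml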
Here is how to use xmllint for validation under Unix, assuming you are in the parla-clarin directory, your file is Examples/Parla-CLARIN-Exemplar.xml, and you want to use the W3C XML schema (.xsd):
$ xmllint --noout --schema Schema/parla-clarin.xsd Examples/Parla-CLARIN-Exemplar.xml
And here is how to do it if you want to use the RelaxNG schema (.rng):
$ xmllint --noout --relaxng Schema/parla-clarin.rng Examples/Parla-CLARIN-Exemplar.xml
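Both commands exit with a non-zero status when validation fails, so they are easy to script. A minimal sketch, assuming you run it from the parla-clarin directory: the hypothetical function validate_each runs the RelaxNG check on every file it is given, prints a verdict per file, and itself fails if any file is invalid (handy in a pre-commit hook or CI job). xmllint's own error messages still appear on stderr.

```shell
# Hypothetical helper: validate each FILE against the RelaxNG schema with
# xmllint, print a verdict per file, and return non-zero if any file failed.
# Assumes it is run from the parla-clarin directory.
validate_each() {
  failed=0
  for f in "$@"; do
    if xmllint --noout --relaxng Schema/parla-clarin.rng "$f"; then
      printf 'valid:   %s\n' "$f"
    else
      printf 'INVALID: %s\n' "$f"
      failed=$((failed + 1))
    fi
  done
  printf '%d file(s) failed validation\n' "$failed"
  [ "$failed" -eq 0 ]
}
```

Usage:
$ validate_each Examples/*.xml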
Here is how to use jing, which expects a RelaxNG schema and is much faster at validation than xmllint:
$ jing Schema/parla-clarin.rng Examples/Parla-CLARIN-Exemplar.xml
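jing also accepts several instance files after the schema, so for a batch of files the Java program starts, and the schema should be compiled, only once. A small sketch, where the function name validate_all_jing and the default directory Examples are assumptions for illustration; it validates every .xml file directly inside the given directory, not in its subdirectories.

```shell
# Hypothetical helper: validate every .xml file directly inside DIR
# (default: Examples) in a single jing run. Assumes jing is on PATH and
# that you run it from the parla-clarin directory.
validate_all_jing() {
  dir=${1:-Examples}
  set -- "$dir"/*.xml
  # An unmatched glob stays literal, so check that the first match exists.
  if [ ! -e "$1" ]; then
    echo "validate_all_jing: no .xml files in $dir" >&2
    return 1
  fi
  jing Schema/parla-clarin.rng "$@"
}
```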
If your corpus is composed of a number of XML files, all of which are XIncluded from the corpus root file, you first need xmllint to assemble the complete XML document by resolving the XIncludes. So, if your root file is Examples/siParl2.0/siParl.tei/siParl-sample.xml, here is how you can validate it:
$ xmllint --noout --xinclude --relaxng Schema/parla-clarin.rng Examples/siParl2.0/siParl.tei/siParl-sample.xml
or
$ xmllint --xinclude Examples/siParl2.0/siParl.tei/siParl-sample.xml | jing Schema/parla-clarin.rng
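The same two-step approach can be wrapped in a small function that writes the assembled document to a temporary file, validates that file with jing, and then removes it. This is only a sketch: the name validate_corpus is hypothetical, mktemp is assumed to be available (it is on common Linux and macOS systems), and jing's error messages will point into the assembled document rather than into the individual component files.

```shell
# Hypothetical helper: resolve the XIncludes of a corpus root file with
# xmllint, write the assembled document to a temporary file, validate that
# file with jing, then delete it. Assumes xmllint, jing and mktemp are
# available and that you run it from the parla-clarin directory.
validate_corpus() {
  root=$1
  tmp=$(mktemp) || return 1
  if xmllint --xinclude "$root" > "$tmp"; then
    # Errors reported here refer to line numbers in the assembled document.
    jing Schema/parla-clarin.rng "$tmp" && status=0 || status=$?
  else
    echo "validate_corpus: could not assemble $root" >&2
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Usage:
$ validate_corpus Examples/siParl2.0/siParl.tei/siParl-sample.xml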