Changing the XML schema

Tomaž Erjavec edited this page Jul 19, 2019 · 5 revisions

Changing the Parla-CLARIN XML schema

Anyone who wants to change the proposed Parla-CLARIN XML schema, whether for their own use or to contribute to its development, needs to know how a TEI ODD schema is designed. This is explained in Chapter 22, Documentation Elements, and Chapter 23, Using the TEI, of the TEI Guidelines.

The official Parla-CLARIN ODD is stored, together with the XML schemas derived from it, in the Schema directory as parla-clarin-odd.xml. Once the ODD has been modified, it needs to be converted to the XML schemas (the formal specifications) and to HTML (the documentation).

The conversion is performed using the TEI XSLT Stylesheets, available on GitHub. Note that they are assumed to be cloned into the parla-clarin/bin/Stylesheets directory (which is, however, listed in .gitignore). Note also that the bin/ directory contains a Makefile that implements the conversion to the schemas and HTML in a particular Linux environment.

To use the XSLT stylesheets you need the Saxon program, either as a standalone Java program or as part of the Oxygen XML editor.
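Before running any conversion it can help to confirm that these prerequisites are in place. The following is a minimal sketch, not part of the Parla-CLARIN repository: the function name missing_scripts is hypothetical, the clone URL assumes that the Stylesheets meant here are the TEIC/Stylesheets repository on GitHub, and checking for java on the PATH assumes that Saxon is run as a standalone Java program rather than through Oxygen.

```shell
#!/bin/sh
# Sketch of a pre-flight check, assumed to be run from parla-clarin/bin.

# Print the name of every Stylesheets script used on this page that is
# missing or not executable.
# $1: path to the Stylesheets clone (defaults to ./Stylesheets)
missing_scripts() {
    dir=${1:-Stylesheets}
    for script in teitorelaxng teitornc teitoxsd teitodtd teitohtml; do
        [ -x "$dir/bin/$script" ] || echo "$script"
    done
}

missing=$(missing_scripts Stylesheets)
if [ -n "$missing" ]; then
    # $missing is left unquoted so the names are joined on one line.
    echo "Missing from Stylesheets/bin:" $missing >&2
    # Assumption: the TEI Stylesheets are the TEIC/Stylesheets repository.
    echo "Try: git clone https://github.com/TEIC/Stylesheets.git Stylesheets" >&2
fi

# Assumption: Saxon is run as a standalone Java program, so java must be on the PATH.
command -v java >/dev/null 2>&1 || echo "java not found on PATH" >&2
```

The script only reports problems on standard error and never aborts, so it can be run safely at any point.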

Generating the schemas

Assuming you are in the parla-clarin/bin directory and the TEI Stylesheets have been cloned into the Stylesheets directory, the conversion to the various XML schema formats is performed (on Unix) by convenience shell programs that call the appropriate XSLT stylesheets:

$ Stylesheets/bin/teitorelaxng ../Schema/parla-clarin-odd.xml ../Schema/parla-clarin.rng
$ Stylesheets/bin/teitornc     ../Schema/parla-clarin-odd.xml ../Schema/parla-clarin.rnc
$ Stylesheets/bin/teitoxsd     ../Schema/parla-clarin-odd.xml ../Schema/parla-clarin.xsd
$ Stylesheets/bin/teitodtd     ../Schema/parla-clarin-odd.xml ../Schema/parla-clarin.dtd
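The four commands above differ only in the script name and the output extension, so they can be driven from a single loop. This is a sketch under the same assumptions (working directory parla-clarin/bin, Stylesheets cloned into ./Stylesheets). The names run and convert_all and the RUN switch are hypothetical and are not part of the TEI Stylesheets or the Parla-CLARIN Makefile.

```shell
#!/bin/sh
# Sketch: regenerate all four schema formats from the ODD in one pass.
# RUN is a hypothetical switch: by default the commands are only printed
# for review; set RUN=1 in the environment to execute them.

ODD=../Schema/parla-clarin-odd.xml
OUT=../Schema/parla-clarin

# Execute a command when RUN=1, otherwise just print it.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Pair each conversion script with the extension of the schema it writes,
# stopping at the first conversion that fails.
convert_all() {
    for pair in teitorelaxng:rng teitornc:rnc teitoxsd:xsd teitodtd:dtd; do
        script=${pair%%:*}   # text before the colon
        ext=${pair##*:}      # text after the colon
        run "Stylesheets/bin/$script" "$ODD" "$OUT.$ext" || return 1
    done
}

convert_all
```

Printing by default makes it easy to compare the generated commands with the ones listed above before anything in the Schema directory is overwritten.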

Generating the documentation

The Parla-CLARIN TEI ODD also contains the documentation of the schema, which can be converted to HTML using the TEI Stylesheets. The Stylesheets allow conversions to be parametrised by specifying a project-specific profile, which in our case is located in the bin/profile directory.

Assuming you are in the parla-clarin/bin directory and the TEI Stylesheets have been cloned into the Stylesheets directory, and that the output HTML is to be stored in the parla-clarin/docs/index.html file, then (on Linux) the conversion is invoked as follows:

$ Stylesheets/bin/teitohtml --profiledir=/absolutepath/parla-clarin/bin --profile=profile --odd ../Schema/parla-clarin-odd.xml ../docs/index.html

The above means that we call the teitohtml script of the TEI Stylesheets, which then invokes the appropriate XSLT. The profile directory is located in parla-clarin/bin and is called simply profile. The --odd option says that we are processing an ODD document rather than just any TEI document: an ODD has some special characteristics, in particular that we want to generate the documentation of all the defined elements, classes, and attributes. The input is our Parla-CLARIN TEI ODD, and the output goes to parla-clarin/docs/index.html, where GitHub finds it and displays it at https://clarin-eric.github.io/parla-clarin/.
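Because --profiledir is given as an absolute path, the command differs from one machine to another. One way to avoid typing the path by hand is to let the shell resolve it, as in the sketch below. It assumes that the working directory is parla-clarin/bin, so that the current directory is the one containing profile. The function name html_command is hypothetical.

```shell
#!/bin/sh
# Sketch: print the teitohtml call with --profiledir filled in from the
# current directory, assumed to be parla-clarin/bin.
# Note: a path containing spaces would need extra quoting before the
# printed command could be pasted into a shell.

html_command() {
    bindir=$(pwd -P)   # absolute, symlink-free path of the current directory
    echo "Stylesheets/bin/teitohtml --profiledir=$bindir --profile=profile --odd ../Schema/parla-clarin-odd.xml ../docs/index.html"
}

html_command
```

The script only prints the command, which can then be checked and run by hand.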