Alexander Koller edited this page May 18, 2023 · 9 revisions

You can use the classes of the package de.up.ling.irtg.codec to read and write various objects from and to a variety of file formats. An input codec will read a string representation of some object from a file (or some other input stream) and return the object, whereas an output codec will encode an object as a string representation and write it to a file (or some other output stream).

You can convert an entire corpus from one codec format to another using the CodecConverter script.

You can add your own input and output codecs to Alto by extending the abstract base classes InputCodec and OutputCodec, respectively, and putting your classes on the classpath.
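
As a rough illustration of the pattern described above, here is a minimal, self-contained sketch of an input/output codec pair for a toy format (whitespace-separated tokens). The base classes `SketchInputCodec` and `SketchOutputCodec` are hypothetical stand-ins defined in the example itself; they are not Alto's `InputCodec` and `OutputCodec`, whose exact method signatures and registration mechanism should be checked in the `de.up.ling.irtg.codec` sources before writing a real codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class CodecSketch {
    // Hypothetical stand-in for an input codec base class (NOT Alto's InputCodec).
    public static abstract class SketchInputCodec<E> {
        // Decodes one object from a stream.
        public abstract E read(InputStream in) throws IOException;

        // Convenience wrapper: decodes an object from an in-memory string.
        public E read(String s) {
            try {
                return read(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Hypothetical stand-in for an output codec base class (NOT Alto's OutputCodec).
    public static abstract class SketchOutputCodec<E> {
        // Encodes one object and writes it to a stream.
        public abstract void write(E object, OutputStream out) throws IOException;

        // Convenience wrapper: encodes an object into a string.
        public String asString(E object) {
            try {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                write(object, buf);
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Toy input codec: reads whitespace-separated tokens into a list.
    public static class TokenListInputCodec extends SketchInputCodec<List<String>> {
        @Override
        public List<String> read(InputStream in) {
            List<String> tokens = new ArrayList<>();
            Scanner sc = new Scanner(in, "UTF-8");
            while (sc.hasNext()) {
                tokens.add(sc.next());
            }
            return tokens;
        }
    }

    // Toy output codec: writes the tokens separated by single spaces.
    public static class TokenListOutputCodec extends SketchOutputCodec<List<String>> {
        @Override
        public void write(List<String> tokens, OutputStream out) throws IOException {
            out.write(String.join(" ", tokens).getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        // Round trip: decode a string, then encode the resulting object again.
        List<String> tokens = new TokenListInputCodec().read("the  boy\nsleeps");
        System.out.println(tokens);
        System.out.println(new TokenListOutputCodec().asString(tokens));
    }
}
```

The sketch keeps the stream-based method abstract and derives the string-based convenience methods from it, so a concrete codec only has to implement a single method per direction.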

For reference, here are the codecs that are currently defined in Alto. Each line lists the class defining the codec, the kind of object that this codec will encode or decode, and a short description.

Input codecs

| Codec class | Object type | Description |
| --- | --- | --- |
| IrtgInputCodec | InterpretedTreeAutomaton | Standard input codec for IRTGs |
| PcfgIrtgInputCodec | InterpretedTreeAutomaton | Reads a PCFG as an IRTG |
| NltkPcfgInputCodec | InterpretedTreeAutomaton | Reads a PCFG in NLTK format as an IRTG |
| BolinasHrgInputCodec | InterpretedTreeAutomaton | Reads HRG grammars for the Bolinas parser as IRTGs |
| TemplateIrtgInputCodec | TemplateInterpretedTreeAutomaton | Template IRTG |
| TreeAutomatonInputCodec | TreeAutomaton | Standard input codec for tree automata |
| TiburonTreeAutomatonInputCodec | TreeAutomaton | Reads a tree automaton in Tiburon format |
| BottomUpTreeAutomatonInputCodec | TreeAutomaton | Reads bottom-up tree automata (Hanneforth style) |
| IsiAmrInputCodec | SGraph | Reads graphs in the ISI AMR-Bank format |
| PtbTreeInputCodec | Tree | Reads trees in Penn Treebank format |

Output codecs

There are fewer output codecs than input codecs, because many classes (such as InterpretedTreeAutomaton and TreeAutomaton) simply have toString methods that Alto calls to generate their string representations. Furthermore, most input codecs convert grammars of various formalisms into IRTGs, which is much easier than converting IRTGs back into the other formalisms. The output codecs below are mostly useful when several string representations are available for the same class of objects.
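
To make the last point concrete, the following self-contained sketch encodes one and the same tree in two different string formats. Everything in it is an assumption made for illustration: `Node` is a minimal tree type rather than Alto's Tree class, and the two formats (term notation and a bracketed notation) are not claimed to match the exact output of any Alto codec.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class TreeFormatsSketch {
    // Minimal tree type for illustration only (NOT Alto's Tree class).
    public static final class Node {
        final String label;
        final List<Node> children;

        public Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Encoding 1: term notation, e.g. f(a,b).
    public static String toTerm(Node t) {
        if (t.children.isEmpty()) {
            return t.label;
        }
        return t.label + t.children.stream()
                .map(TreeFormatsSketch::toTerm)
                .collect(Collectors.joining(",", "(", ")"));
    }

    // Encoding 2: bracketed notation, e.g. (f a b).
    public static String toBrackets(Node t) {
        if (t.children.isEmpty()) {
            return t.label;
        }
        return t.children.stream()
                .map(TreeFormatsSketch::toBrackets)
                .collect(Collectors.joining(" ", "(" + t.label + " ", ")"));
    }

    // Hypothetical registry: all encoders that apply to the same object type.
    public static Map<String, Function<Node, String>> encoders() {
        Map<String, Function<Node, String>> result = new LinkedHashMap<>();
        result.put("term", TreeFormatsSketch::toTerm);
        result.put("brackets", TreeFormatsSketch::toBrackets);
        return result;
    }

    public static void main(String[] args) {
        Node tree = new Node("S",
                new Node("NP", new Node("john")),
                new Node("VP", new Node("sleeps")));

        // Print every available encoding of the same tree.
        for (Map.Entry<String, Function<Node, String>> e : encoders().entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().apply(tree));
        }
    }
}
```

A single toString method can only pick one of these encodings, which is why separate output codecs become worthwhile once an object type has more than one useful representation.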

Note that you can right-click on any value in the Alto GUI (i.e., the contents of the "value" panel in the derivation view) to display a context menu which lets you copy a string representation of that value to the clipboard. The context menu will contain all output codecs that are suitable for the type of that value.

| Codec class | Object type | Description |
| --- | --- | --- |
| BolinasGraphOutputCodec | SGraph | Writes graphs in a format that Bolinas can read |
| SgraphAmrOutputCodec | SGraph | Writes graphs in the ISI AMR-Bank format |
| TikzSgraphOutputCodec | SGraph | Encodes graphs as LaTeX graph-drawing code |
| TikzQtreeOutputCodec | Tree | Encodes trees as LaTeX tree-drawing code |