SrgFreeling
This page describes the interface between the Spanish Resource Grammar and the morphophonological analyzer Freeling that it relies on.
See e.g. Bender and Good (2005), as well as the discussion at https://delphinqa.ling.washington.edu/t/reparsing-and-updating-a-treebank-keeping-previous-decisions/873/8
Freeling is a complex tool with many functions. The SRG relies on it so that the lexicon needs to store only a single form of each word (its lemma) rather than all of its inflected forms. For example, the lexicon contains a single entry for the verb dormir ('to sleep'):
dormir_v := v_-_native_le &
[ STEM < "dormir" >,
SYNSEM.LKEYS.KEYREL.PRED "_dormir_v_rel" ].
In contrast, the form duerme (3rd person singular, present tense, indicative mood) will not be found in the lexicon.
Instead, the grammar has a lexical rule:
vmip3s0 :=
%suffix (vmip3s vmip3s)
pres-ind_ilr &
[ SYNSEM.LOCAL [ CAT.HEAD.AUX -,
AGR.PNG.PN 3sg ] ].
The above lexical rule is not associated with any orthographic change. This may be confusing, because in DELPH-IN an inflectional lexical rule usually implies an orthographic change. That convention, however, assumes there is no external morphophonological analyzer. Here, Freeling provides the analysis of each word form, so the grammar must not apply any further orthographic changes on top of what Freeling has already analyzed. Below is Freeling's output for the Spanish sentence El gato duerme ('The cat sleeps').
el gato duerme.
el el DA0MS0 1
gato gato NCMS000 1
duerme dormir VMIP3S0 0.989241
. . Fp 1
The above output was obtained with Freeling's own `analyze` binary, which is installed on your computer when you install Freeling 4.1. This is not what the SRG uses. Instead, the SRG uses the Freeling Python API, which can also be found in the Freeling installation directory (such as `/usr/share/freeling`). The API includes the files `pyfreeling_api.py` and `_pyfreeling.so`, as well as a sample program, `sample.py`, which gives a few examples of how to use it. Important: this API is not available through PyPI. Misleadingly, PyPI does host a package named `pyfreeling`, which you can install and import, but that is not the one you want.
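If you want to inspect `analyze` output programmatically, for instance to compare it with what the Python API returns, the four-column format shown earlier (form, lemma, tag, probability) can be read with a few lines of Python. The sketch below also decodes the verb tag, showing that VMIP3S0 encodes exactly the features listed for duerme: main verb, indicative, present, 3rd person, singular. It is a hypothetical helper, not part of the SRG code: it assumes a blank line separates sentences, and its tag tables cover only the verb positions of Freeling's EAGLES-style Spanish tagset.

```python
from __future__ import annotations

from typing import NamedTuple

# Verb tag positions (0-based 1..5) of the EAGLES-style Spanish tagset; gender is skipped.
VERB_TYPE = {"M": "main", "A": "auxiliary", "S": "semiauxiliary"}
MOOD = {"I": "indicative", "S": "subjunctive", "M": "imperative",
        "N": "infinitive", "G": "gerund", "P": "participle"}
TENSE = {"P": "present", "I": "imperfect", "F": "future",
         "S": "past", "C": "conditional", "0": None}
PERSON = {"1": 1, "2": 2, "3": 3, "0": None}
NUMBER = {"S": "singular", "P": "plural", "0": None}


class Analysis(NamedTuple):
    form: str
    lemma: str
    tag: str
    prob: float


def parse_analyze_output(text: str) -> list[list[Analysis]]:
    """Read `form lemma tag prob` lines; a blank line ends a sentence (assumed)."""
    sentences: list[list[Analysis]] = []
    current: list[Analysis] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            if current:
                sentences.append(current)
                current = []
            continue
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns: {line!r}")
        form, lemma, tag, prob = fields[:4]  # any extra columns are ignored
        current.append(Analysis(form, lemma, tag, float(prob)))
    if current:
        sentences.append(current)
    return sentences


def verb_features(tag: str) -> dict:
    """Decode a 7-character verb tag such as VMIP3S0 into its features."""
    tag = tag.upper()
    if len(tag) != 7 or not tag.startswith("V"):
        raise ValueError(f"not a verb tag: {tag!r}")
    return {
        "type": VERB_TYPE[tag[1]],
        "mood": MOOD[tag[2]],
        "tense": TENSE[tag[3]],
        "person": PERSON[tag[4]],
        "number": NUMBER[tag[5]],
    }
```

For the El gato duerme output, `verb_features("VMIP3S0")` yields main, indicative, present, 3, singular; the same features that the `vmip3s0` lexical rule has to supply.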
The goal is to map Freeling's output to the YY input format, which the ACE parser can process. For the sentence mi perro duerme ('my dog sleeps'), the YY input is:
(42, 0, 1, <0:2>, 1, "mi" "mi", 0, "dp1css") (43, 1, 2, <4:8>, 1, "perro" "perro", 0, "ncms000") (44, 2, 3, <9:15>, 1, "dormir" "duerme", 0, "vmip3s0")
To process the above input, ACE must be called with the `-y --yy-rules` options. ACE can then find the entry for the lemma dormir in `lexicon.tdl` even though the word in the input is duerme. Furthermore, it instantiates the lexical rule `vmip3s0` named in the token and includes it in the lexical rule chain.
As a result, the lexical chart will contain edges for the lexical entry of the verb dormir and for the appropriate lexical rule, which provides the person, number, and tense information. These edges should then combine into a lexical item that the parser can use in the subsequent syntactic parsing stage.
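The mapping from a Freeling analysis to a YY token of the shape shown earlier can be sketched as follows. This is a hypothetical helper, not the code in `util/`: it assumes one Freeling word per YY token, uses chart vertices i and i+1 for the i-th word, finds character offsets by searching for each word in the sentence, and lower-cases the tag so that it names the lexical rule (VMIP3S0 becomes `vmip3s0`). Words that Freeling splits or rewrites, such as clitic sequences, would need extra handling.

```python
from __future__ import annotations


def yy_quote(s: str) -> str:
    # Assumption: quotes and backslashes inside YY strings are backslash-escaped.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def yy_token(tid: int, start: int, end: int, cfrom: int, cto: int,
             lemma: str, surface: str, tag: str, path: int = 1, ipos: int = 0) -> str:
    """One token: (id, start, end, <from:to>, path, "form" "surface", ipos, "lrule")."""
    return (f"({tid}, {start}, {end}, <{cfrom}:{cto}>, {path}, "
            f"{yy_quote(lemma)} {yy_quote(surface)}, {ipos}, {yy_quote(tag.lower())})")


def sentence_to_yy(sentence: str, analyses: list[tuple[str, str, str]],
                   first_id: int = 0) -> str:
    """analyses: (surface, lemma, tag) triples in sentence order."""
    tokens, pos = [], 0
    for i, (surface, lemma, tag) in enumerate(analyses):
        cfrom = sentence.find(surface, pos)  # locate the word to get character offsets
        if cfrom < 0:
            raise ValueError(f"{surface!r} not found after offset {pos}")
        cto = cfrom + len(surface)
        pos = cto
        # The lemma becomes the YY form (looked up in the lexicon); the tag names the rule.
        tokens.append(yy_token(first_id + i, i, i + 1, cfrom, cto, lemma, surface, tag))
    return " ".join(tokens)
```

For example, `yy_token(44, 2, 3, 9, 15, "dormir", "duerme", "VMIP3S0")` reproduces the third token of the example line exactly.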
The interface between Freeling and the grammar is implemented by several Python modules in the `util/` folder. `util/populate_tokens.py` can be given a folder of TSDB profiles. It calls the Freeling API and populates the `i-tokens` field of each profile's `item` file with (hopefully) appropriate YY input. ACE can then be called through the PyDelphin library, selecting the `i-tokens` field as the parser's input:
delphin process --options="-y --yy-rules" -g ~/delphin/srg/ace/srg.dat --full-forest --select i-tokens path-to-test-suite
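Conceptually, populating `i-tokens` rewrites one column of the profile's `item` file, where each line is a record whose fields are separated by `@` and whose column order is declared in the profile's `relations` file. A minimal sketch of that step, assuming the usual TSDB escaping of backslashes, newlines, and `@`; `populate_column` is a hypothetical name, and in practice PyDelphin's `delphin.tsdb` module can take care of reading and writing profiles.

```python
from __future__ import annotations


def tsdb_escape(value: str) -> str:
    # Escape backslashes first, then newlines and the '@' field separator.
    return value.replace("\\", "\\\\").replace("\n", "\\n").replace("@", "\\s")


def populate_column(records: list[str], fields: list[str],
                    values_by_id: dict[str, str], column: str = "i-tokens") -> list[str]:
    """Return item records with `column` set for every i-id in values_by_id."""
    col = fields.index(column)   # column order comes from the relations file
    id_col = fields.index("i-id")
    updated = []
    for record in records:
        parts = record.rstrip("\n").split("@")  # literal '@' is escaped, so this is safe
        value = values_by_id.get(parts[id_col])
        if value is not None:
            parts[col] = tsdb_escape(value)
        updated.append("@".join(parts))
    return updated
```

Items without an entry in `values_by_id` are written back unchanged, so a partial Freeling run does not clobber existing tokens.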
In some cases, Freeling's output can be overridden. In the old version of the SRG, this was done with the file `sppp.dat` and a C++ program that acted as an interface between that file, Freeling, and the SRG.
In the current version, this is done with the files `freeling_api/srg-freeling.dat`, `util/override_freeling.py`, `util/parse_sppp_dat.py`, and `srg_freeling2yy.py`.
- Check the SRG GitHub issues.
- Missing Freeling tag: Freeling outputs a tag that has no corresponding rule in `inflr.tdl`. No lexical rule edge is instantiated, so there is no parse. Solution: if the tag is generally useful, add a new lexical rule to `inflr.tdl`. If the tag essentially duplicates another tag, add it to the `TAGS` dictionary in `util/override_freeling.py` instead.
- Lexical rule supertype inherits from `basic-lex-rule`: `basic-lex-rule` does not implement token mapping, so the lexical rule will not be instantiated for the given token. Solution: make the specific lexical rule (as named in Freeling's output) inherit from `tmt-lex-rule` instead. `tmt-lex-rule` is the same as `basic-lex-rule` but respects token mapping.
- Nonsensical Freeling output (wrong tags): this may happen if the sentence contains a typo. Freeling is statistical, so it will output some analysis even when the probability of the tag sequence is low.
- Sequence of tags: Freeling sometimes outputs multiple tags for one word form. This can be desirable (as with clitics), but not all required mappings or postprocessing routines may have been implemented in the SRG-to-Freeling interface. Each such case should be investigated separately.
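The missing-tag problem above can be checked for before parsing. A hypothetical sketch, assuming (as in the examples on this page) that lexical rule names are lower-cased Freeling tags, and that `TAGS` maps a redundant tag onto a tag whose rule already exists; the real dictionary in `util/override_freeling.py` may be shaped differently, and the `SOMETAG` entry is a placeholder.

```python
from __future__ import annotations

# Placeholder contents: maps a redundant Freeling tag onto an existing one.
TAGS = {"SOMETAG": "VMIP3S0"}


def rule_for_tag(tag: str, tags: dict[str, str], known_rules: set[str]) -> str | None:
    """Return the lexical rule name for a Freeling tag, or None if none exists."""
    canonical = tags.get(tag.upper(), tag.upper())  # apply the override, if any
    rule = canonical.lower()                         # rule names are lower-cased tags
    return rule if rule in known_rules else None     # None: no edge, hence no parse


def missing_tags(seen: list[str], tags: dict[str, str], known_rules: set[str]) -> list[str]:
    """Distinct tags from Freeling output that no lexical rule would handle."""
    return sorted({t for t in seen if rule_for_tag(t, tags, known_rules) is None})
```

Here `known_rules` stands for the rule names defined in `inflr.tdl`; each tag reported by `missing_tags` is a candidate either for a new rule or for a `TAGS` entry.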