
"expression" lemmas #1

Open
martinpopel opened this issue Jun 9, 2017 · 2 comments

Comments

@martinpopel
Member

In UD_Latin-PROIEL v2.0, about 0.5% of words have an artificial lemma.
In train+dev, there are

  • 410 greek.expression
  • 149 expression
  • 138 calendar
  • 11 monetary
  • 9 calendar.expression
  • 3 monetary.expression

For example,

19      esse    sum     AUX     V-      Tense=Pres|VerbForm=Inf|Voice=Act       20      cop     _       ref=1.1.2
20      ἀδύνατον        greek.expression        X       F-      _       17      xcomp   _       ref=1.1.2
21      Curium  Curius  PROPN   Ne      Case=Acc|Gender=Masc|Number=Sing        22      obj:dir _       ref=1.1.2
22      tribuniciis     tribunicius     ADJ     A-      Case=Abl|Degree=Pos|Number=Plur 21      amod    _       ref=1.1.1
23      a       calendar        ADV     Df      _       21      amod    _       ref=1.1.1
24      d       expression      ADV     Df      _       23      flat    _       ref=1.1.1
25      xvi     xvi     ADV     Df      _       23      flat    _       ref=1.1.1
26      Kalend  Kalend  ADV     Df      _       23      flat    _       ref=1.1.1
27      Sextilis        Sextilis        ADV     Df      _       23      flat    _       ref=1.1.1
15      HS      monetary        ADV     Df      _       14      advmod  _       ref=1.6.1
16      CCCIↃↃↃX̅X̅X̅      expression      ADV     Df      _       15      flat    _       ref=1.6.1
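The counts above could be reproduced with a sketch along these lines. It assumes the treebank is available as tab-separated CoNLL-U text; the function name `count_artificial_lemmas` and the hard-coded label set are illustrative, not part of any UD tool.

```python
from collections import Counter

# Artificial lemma labels listed in this issue.
ARTIFICIAL_LEMMAS = {
    "greek.expression", "expression", "calendar",
    "monetary", "calendar.expression", "monetary.expression",
}

def count_artificial_lemmas(conllu_text):
    """Return (Counter of artificial lemmas, total number of word tokens)."""
    counts = Counter()
    total = 0
    for line in conllu_text.split("\n"):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        total += 1
        if cols[2] in ARTIFICIAL_LEMMAS:
            counts[cols[2]] += 1
    return counts, total

# Three token lines taken from the examples in this issue
# (whitespace converted back to tabs).
rows = [
    "15 HS monetary ADV Df _ 14 advmod _ ref=1.6.1",
    "16 CCCIↃↃↃX̅X̅X̅ expression ADV Df _ 15 flat _ ref=1.6.1",
    "19 esse sum AUX V- Tense=Pres|VerbForm=Inf|Voice=Act 20 cop _ ref=1.1.2",
]
sample = "\n".join("\t".join(r.split()) for r in rows)
counts, total = count_artificial_lemmas(sample)
print(dict(counts), total)
```

Dividing the sum of the counts by `total` over train+dev would give the share of words with an artificial lemma.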

The guidelines say that "The LEMMA field should not be used to encode features or other similar properties of the word (use FEATS and MISC instead; see format)."
Moreover, the word form should be uniquely defined by the lemma and FEATS (except for capitalization and other orthographic synonyms).
Thus I suggest:

  • Keep the lemma equal to the form in these cases.
  • For foreign phrases, use the standard feature Foreign=Yes and if they span multiple words, use the flat deprel.
  • For calendar and monetary expressions, design language-specific guidelines that are consistent with the universal guidelines. (I think no change is needed here except for fixing the lemmas.)
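The first two suggestions could be sketched as a per-line rewrite like the one below. This is only an illustration: `fix_token_line` and `add_feature` are hypothetical helpers, the treatment of `greek.expression` as the only foreign label is an assumption, and the sketch does not touch heads or deprels.

```python
FOREIGN_LEMMAS = {"greek.expression"}
ARTIFICIAL_LEMMAS = FOREIGN_LEMMAS | {
    "expression", "calendar", "monetary",
    "calendar.expression", "monetary.expression",
}

def add_feature(feats, name, value):
    """Add name=value to a FEATS string, keeping features sorted by name."""
    pairs = {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))
    pairs[name] = value
    ordered = sorted(pairs.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{k}={v}" for k, v in ordered)

def fix_token_line(line):
    """Replace an artificial lemma with the form; mark Greek words Foreign=Yes."""
    cols = line.split("\t")
    if len(cols) != 10 or cols[2] not in ARTIFICIAL_LEMMAS:
        return line  # comments, blank lines, and ordinary tokens pass through
    if cols[2] in FOREIGN_LEMMAS:
        cols[5] = add_feature(cols[5], "Foreign", "Yes")
    cols[2] = cols[1]  # lemma := form
    return "\t".join(cols)

greek = "\t".join("20 ἀδύνατον greek.expression X F- _ 17 xcomp _ ref=1.1.2".split())
latin = "\t".join("19 esse sum AUX V- Tense=Pres|VerbForm=Inf|Voice=Act 20 cop _ ref=1.1.2".split())
print(fix_token_line(greek))
print(fix_token_line(latin))
```

The foreign check runs before the lemma is overwritten, since the artificial lemma is the only signal in the current data that a token is Greek.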

I admit, I feel a bit uneasy about the suggestion to use a flat structure for all foreign phrases, because in the case of UD_Latin-PROIEL it would mean a loss of information. Currently, some Greek words are annotated with the "correct" dependencies, e.g.:

9       ignoscendum     ignosco VERB    V-      Case=Acc|Gender=Neut|Number=Sing|VerbForm=Gdv   3       ccomp   _       ref=1.1.4
10      esse    sum     AUX     V-      Tense=Pres|VerbForm=Inf|Voice=Act       9       cop     _       ref=1.1.4
11      ἐπεὶ    greek.expression        X       F-      _       9       advmod  _       ref=1.1.4
12      οὐχ     greek.expression        X       F-      _       14      flat:foreign    _       ref=1.1.4
13      ἱερήϊον greek.expression        X       F-      _       14      obj:dir _       ref=1.1.4
14      οὐδὲ    greek.expression        X       F-      _       11      advmod  _       ref=1.1.4
15      βοεΐην  greek.expression        X       F-      _       14      obj:dir _       ref=1.1.4

Feel free to open a "universal" issue to discuss the cases where the foreign phrase is expected to be understood by the readers, so that it is really code switching.
I think in such cases we can keep the correct dependencies (and deprels) and just use Foreign=Yes.
However, the current UD_Latin-PROIEL is not consistent in this, as the example above shows: it uses flat:foreign, but only for some words in the Greek phrases, and it goes against the guidelines, which prescribe that "all subsequent words in the expression are attached to the first one".
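A rough way to surface this inconsistency is to check, for each run of consecutive foreign tokens, whether every non-initial token is attached to the first one with flat:foreign. The sketch below assumes that foreign tokens can be recognised by the `greek.expression` lemma or by Foreign=Yes; `flat_violations` is a hypothetical helper, and it tests only the flat convention, not the code-switching alternative discussed above.

```python
def is_foreign(cols):
    """Assumption: a token is foreign if it has the artificial Greek lemma or Foreign=Yes."""
    return cols[2] == "greek.expression" or "Foreign=Yes" in cols[5].split("|")

def flat_violations(sentence_lines):
    """List (id, head, deprel) of foreign tokens not attached flat to the first word of their run."""
    violations = []
    first_id = None  # ID of the first token in the current foreign run
    for line in sentence_lines:
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip comments, multiword-token ranges, empty nodes
        if not is_foreign(cols):
            first_id = None  # a non-foreign token ends the run
            continue
        if first_id is None:
            first_id = cols[0]  # the head of the run may attach anywhere
            continue
        if cols[6] != first_id or cols[7] != "flat:foreign":
            violations.append((cols[0], cols[6], cols[7]))
    return violations

# Tokens 10-15 of the Greek example quoted in this issue.
rows = [
    "10 esse sum AUX V- Tense=Pres|VerbForm=Inf|Voice=Act 9 cop _ ref=1.1.4",
    "11 ἐπεὶ greek.expression X F- _ 9 advmod _ ref=1.1.4",
    "12 οὐχ greek.expression X F- _ 14 flat:foreign _ ref=1.1.4",
    "13 ἱερήϊον greek.expression X F- _ 14 obj:dir _ ref=1.1.4",
    "14 οὐδὲ greek.expression X F- _ 11 advmod _ ref=1.1.4",
    "15 βοεΐην greek.expression X F- _ 14 obj:dir _ ref=1.1.4",
]
sentence = ["\t".join(r.split()) for r in rows]
for token_id, head, deprel in flat_violations(sentence):
    print(token_id, head, deprel)
```

On this excerpt, tokens 12 to 15 are all reported: token 12 uses flat:foreign but is attached to 14 rather than 11, and the others carry ordinary deprels.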

@daghaug
Contributor

daghaug commented Jun 9, 2017 via email

@nschneid

nschneid commented Jun 9, 2017

Regarding conventions for date and value expressions, see UniversalDependencies/docs#455


3 participants