converging UD_Russian and UD_Russian-SynTagRus annotation #10

Open
olesar opened this issue May 28, 2017 · 11 comments

olesar (Contributor) commented May 28, 2017

  1. Compound numerals (incl. constructions with тысяча 'thousand' and миллион 'million').
    Cases like "сорок пять" ('forty-five') should be annotated as сорок >flat пять, according to http://universaldependencies.org/u/dep/flat.html.
    In UD2.0 files: ru: сорок >compound пять, сорок <nummod пять
    ru-syntagrus: сорок <nummod:gov пять
    The "universal" approach is somewhat problematic: in двадцать один, двадцать два, двадцать три, двадцать четыре ('twenty-one' to 'twenty-four') the last numeral determines the case of the noun (cf. nummod:gov), so the first numeral word would get different labels depending on what its dependent is.
    Note: for 1-4 the rules seem to be all right, but some overgeneralization happens.
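The point about двадцать один vs. двадцать два can be made concrete with a small sketch. It assumes the flat analysis from the UD guidelines (first numeral word as head, the rest attached to it with flat) and a deliberately simplified case rule: a final один agrees with the noun (nummod), any other final numeral governs it (nummod:gov). Oblique cases, where Russian numerals agree with the noun, are ignored, and the function name `numeral_relations` is hypothetical, not part of either treebank's tooling.

```python
# Hypothetical sketch, not the converter used for either treebank.
# Assumption: nominative/accusative context; a final "один" agrees with the
# noun (nummod), any other final numeral governs its case (nummod:gov).

def numeral_relations(numerals, noun_id):
    """Attach a multi-word cardinal to its noun.

    numerals: list of (token_id, lemma) pairs in surface order,
              e.g. [(1, "двадцать"), (2, "два")]
    noun_id:  id of the counted noun
    Returns {token_id: (head_id, deprel)}.
    """
    first_id = numerals[0][0]
    last_lemma = numerals[-1][1]
    # The relation of the whole numeral depends on its LAST word,
    # even though the FIRST word heads the flat chain.
    to_noun = "nummod" if last_lemma == "один" else "nummod:gov"
    relations = {first_id: (noun_id, to_noun)}
    for token_id, _ in numerals[1:]:
        relations[token_id] = (first_id, "flat")
    return relations


print(numeral_relations([(1, "двадцать"), (2, "один")], 3))
# {1: (3, 'nummod'), 2: (1, 'flat')}
print(numeral_relations([(1, "двадцать"), (2, "два")], 3))
# {1: (3, 'nummod:gov'), 2: (1, 'flat')}
```

The same head word двадцать thus ends up with nummod or nummod:gov depending only on its flat dependent, which is exactly the awkwardness described above.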
olesar (Contributor, Author) commented May 28, 2017

  1. There are cases like два девяносто (standing for 'two (roubles) 90 (kopecks)') and три двадцать (standing for 'three (hours) 20 min'). These need attention.

olesar (Contributor, Author) commented May 28, 2017

  1. NUM + NUM.Gen: меньше пяти, больше пяти ('fewer than five', 'more than five').
    In UD2.0 files: ru:
    6 более БОЛЕЕ ADV RBR Degree=Cmp 8 advmod _ _
    7 двух ДВА NUM CD Animacy=Inan|Case=Gen|Gender=Fem 8 compound _ _
    8 тысяч ТЫСЯЧА NOUN NN Animacy=Inan|Case=Gen|Gender=Fem|Number=Plur 9 nummod:gov _ _
    9 человек ЧЕЛОВЕК NOUN NN Animacy=Anim|Case=Gen|Gender=Masc|Number=Plur 5 nsubj _ SpaceAfter=No

    ru-syntagrus:
    5 более более ADV _ Degree=Cmp 7 nummod:gov 7:nummod:gov _
    6 пяти пять NUM _ Case=Gen 7 nummod 7:nummod _
    7 лет год NOUN _ Animacy=Inan|Case=Gen|Gender=Masc|Number=Plur 4 obl 4:obl SpaceAfter=No

    1 Более более ADV _ Degree=Cmp 4 nsubj 4:nsubj _
    2 двух два NUM _ Case=Gen 3 nummod 3:nummod _
    3 месяцев месяц NOUN _ Animacy=Inan|Case=Gen|Gender=Masc|Number=Plur 1 nmod 1:nmod _
    4 прошло проходить VERB _ Aspect=Perf|Gender=Neut|Mood=Ind|Number=Sing|Tense=Past|VerbForm=Fin|Voice=Act 0 root 0:root _

    21 больше много NUM _ _ 23 nummod:gov 23:nummod:gov _
    22 300 300 NUM _ _ 23 nummod 23:nummod _
    23 заявок заявка NOUN _ Animacy=Inan|Case=Gen|Gender=Fem|Number=Plur 20 obl 20:obl _
    (NB: different POS; больше is tagged NUM (lemma много) here, while более is ADV in the examples above.)
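To see how widespread the POS mismatch is, one could count how each comparative is tagged and attached across a treebank. A minimal sketch, assuming standard tab-separated CoNLL-U input (the fragments above lost their tabs in transit); the helper name `comparative_profile` is hypothetical.

```python
from collections import Counter

# The four comparative quantifiers discussed in this issue.
COMPARATIVES = {"более", "больше", "менее", "меньше"}


def comparative_profile(conllu_text):
    """Count (form, UPOS, DEPREL) triples for comparative quantifiers.

    Expects CoNLL-U text: comment lines start with '#', token lines have ten
    tab-separated columns (ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC).
    """
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip malformed lines, multiword-token ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, upos, deprel = cols[1].lower(), cols[3], cols[7]
        if form in COMPARATIVES:
            counts[(form, upos, deprel)] += 1
    return counts


# Two ru-syntagrus rows from the examples above, re-joined with tabs.
sample = "\n".join([
    "5\tболее\tболее\tADV\t_\tDegree=Cmp\t7\tnummod:gov\t7:nummod:gov\t_",
    "21\tбольше\tмного\tNUM\t_\t_\t23\tnummod:gov\t23:nummod:gov\t_",
])
for (form, upos, deprel), n in sorted(comparative_profile(sample).items()):
    print(form, upos, deprel, n)
# более ADV nummod:gov 1
# больше NUM nummod:gov 1
```

Run over both treebanks, the counts would show at a glance which (UPOS, DEPREL) combinations each one uses for these words.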

olesar (Contributor, Author) commented May 28, 2017

более/больше ('more') and менее/меньше ('less') should be attached to the numeral as their head, cf. террористов там было не более двух ('there were no more than two terrorists there').
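One way to express this proposal as a conversion rule: if a comparative hangs on a noun that also has a numeral modifier, move the comparative onto that numeral. A hedged sketch: the relation advmod for the reattached comparative is an assumption (the proposal only says which head to use), the function name is hypothetical, and tokens are assumed to be dicts with integer `id`/`head`.

```python
# Comparative quantifiers discussed in this issue.
COMPARATIVES = {"более", "больше", "менее", "меньше"}


def reattach_comparatives(tokens):
    """Move comparatives from the counted noun onto its numeral.

    tokens: list of dicts with keys id, form, upos, head, deprel
            (id and head as ints, as in the CoNLL-U ID/HEAD columns).
    The new relation 'advmod' is an assumption, not a settled decision.
    """
    children = {}
    for t in tokens:
        children.setdefault(t["head"], []).append(t)
    for t in tokens:
        if t["form"].lower() not in COMPARATIVES:
            continue
        # Numeral siblings quantifying the same noun; t itself is excluded
        # because it may be tagged NUM too (cf. больше above).
        numerals = [s for s in children.get(t["head"], [])
                    if s is not t and s["upos"] == "NUM"
                    and s["deprel"].startswith("nummod")]
        if numerals:
            t["head"] = numerals[0]["id"]
            t["deprel"] = "advmod"
    return tokens


# "более пяти лет" as annotated in ru-syntagrus (ids 5-7 from the example)
sent = [
    {"id": 5, "form": "более", "upos": "ADV", "head": 7, "deprel": "nummod:gov"},
    {"id": 6, "form": "пяти", "upos": "NUM", "head": 7, "deprel": "nummod"},
    {"id": 7, "form": "лет", "upos": "NOUN", "head": 4, "deprel": "obl"},
]
reattach_comparatives(sent)
print(sent[0]["head"], sent[0]["deprel"])  # 6 advmod
```

Sentences where the comparative itself heads the noun (Более двух месяцев прошло) are left untouched by this rule, since no numeral shares the comparative's head.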

olesar (Contributor, Author) commented May 28, 2017

  1. Compound ordinal numerals like сорок пятый ('forty-fifth').
    These pose a problem as well, since the last word agrees with the noun head.
    In UD2.0 files: ru: NA
    ru-syntagrus:
    12 сорок сорок NUM _ Case=Nom 13 nummod:gov 13:nummod:gov _
    13 второго второй ADJ _ Case=Gen|Degree=Pos|Gender=Masc|Number=Sing 14 amod 14:amod _
    14 года год NOUN _ Animacy=Inan|Case=Gen|Gender=Masc|Number=Sing 11 nmod 11:nmod _

dan-zeman (Member) commented:

I think that the ordinals are compounds:

compound(второго, сорок)
amod(года, второго)
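In the ru-syntagrus example above, сорок already hangs on второго, only with nummod:gov, so the compound analysis amounts to a relabelling. A sketch under stated assumptions: the word lists are hypothetical and deliberately narrow (a tens numeral plus a unit ordinal, as in сорок второй), so hundreds such as сто первый are not covered and would need manual review; the function name is hypothetical too.

```python
# Heuristic word lists (assumptions, not taken from either treebank).
TENS = {"двадцать", "тридцать", "сорок", "пятьдесят", "шестьдесят",
        "семьдесят", "восемьдесят", "девяносто"}
UNIT_ORDINALS = {"первый", "второй", "третий", "четвертый", "четвёртый",
                 "пятый", "шестой", "седьмой", "восьмой", "девятый"}


def fix_compound_ordinals(tokens):
    """Relabel nummod(:gov) from an ordinal ADJ to its tens numeral as compound.

    ru-syntagrus has nummod:gov(второго, сорок); the proposal is
    compound(второго, сорок), keeping amod(года, второго) unchanged.
    tokens: list of dicts with int id/head plus lemma, upos and deprel.
    """
    by_id = {t["id"]: t for t in tokens}
    for t in tokens:
        head = by_id.get(t["head"])
        if (head is not None
                and t["upos"] == "NUM" and t["lemma"] in TENS
                and t["deprel"].startswith("nummod")
                and head["upos"] == "ADJ" and head["lemma"] in UNIT_ORDINALS):
            t["deprel"] = "compound"
    return tokens


# ru-syntagrus tokens 12-14 from the example above ("сорок второго года")
sent = [
    {"id": 12, "lemma": "сорок", "upos": "NUM", "head": 13, "deprel": "nummod:gov"},
    {"id": 13, "lemma": "второй", "upos": "ADJ", "head": 14, "deprel": "amod"},
    {"id": 14, "lemma": "год", "upos": "NOUN", "head": 11, "deprel": "nmod"},
]
fix_compound_ordinals(sent)
print(sent[0]["head"], sent[0]["deprel"])  # 13 compound
```

Restricting the head to ordinal lemmas keeps the rule away from elliptical phrases where a numeral modifies an ordinary adjective standing in for a noun.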

dan-zeman (Member) commented:

Related to numerals is UniversalDependencies/docs#455.

olesar (Contributor, Author) commented Jun 1, 2017

nmod (or dep?) attached to an ADJ or ADV head --> obl

martinpopel (Member) commented Jun 1, 2017

If the ADJ or ADV is the head of a copula construction, then you are right: such an ADJ|ADV should not have nmod children, but obl.
In the remaining cases, we should be careful: the ADJ could be the head of a noun phrase with an elided noun, and then an nmod child is correct.

BTW: This is exactly the case where the nmod vs. obl distinction is needed, because it cannot be reconstructed fully automatically (at least not easily).
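This distinction can be written as a guarded relabelling rule. A minimal sketch, assuming that an ADJ/ADV "heads a copula construction" when it has a cop or nsubj dependent (the nsubj test is meant to catch the Russian present tense, where the copula is not expressed); ADJ heads without such dependents are left alone because they may stand for a noun phrase with an elided noun. The function name is hypothetical, and this is not the conversion actually applied to either treebank.

```python
def relabel_nmod_under_predicates(tokens):
    """Relabel nmod as obl when its head is a predicative ADJ or ADV.

    tokens: list of dicts with int id/head plus upos and deprel.
    Assumption: a head counts as predicative if it has a cop or nsubj
    dependent. Other ADJ heads may be elliptical noun phrases, so their
    nmod children are kept.
    """
    by_id = {t["id"]: t for t in tokens}
    predicative = {t["head"] for t in tokens
                   if t["deprel"].split(":")[0] in {"cop", "nsubj"}}
    for t in tokens:
        head = by_id.get(t["head"])
        if (t["deprel"] == "nmod" and head is not None
                and head["upos"] in {"ADJ", "ADV"}
                and head["id"] in predicative):
            t["deprel"] = "obl"
    return tokens


# "Он был доволен работой" ('He was satisfied with the work')
sent = [
    {"id": 1, "upos": "PRON", "head": 3, "deprel": "nsubj"},
    {"id": 2, "upos": "AUX", "head": 3, "deprel": "cop"},
    {"id": 3, "upos": "ADJ", "head": 0, "deprel": "root"},
    {"id": 4, "upos": "NOUN", "head": 3, "deprel": "nmod"},
]
relabel_nmod_under_predicates(sent)
print(sent[3]["deprel"])  # obl
```

By the same test, the ru-syntagrus analysis of Более двух месяцев прошло keeps месяцев as an nmod of Более, since Более has no cop or nsubj dependent of its own.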

olesar (Contributor, Author) commented Jun 1, 2017

acl with participles (single participles vs. participial groups), advcl vs. acl. Needs attention.
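To review these cases, one might first split them the way the note does: single participles versus participial groups, each under acl or advcl. A sketch, assuming tokens are dicts carrying the CoNLL-U FEATS string and that "group" simply means the participle has dependents of its own; the function name is hypothetical.

```python
def participle_report(tokens):
    """Group acl/advcl participles into single words vs. participial groups.

    tokens: list of dicts with int id/head, plus form, feats, deprel.
    Returns {(deprel, "single" | "group"): [forms]}.
    """
    has_dependents = {t["head"] for t in tokens}
    report = {}
    for t in tokens:
        feats = t["feats"].split("|")
        base = t["deprel"].split(":")[0]  # fold subtypes, e.g. acl:relcl -> acl
        if "VerbForm=Part" in feats and base in {"acl", "advcl"}:
            kind = "group" if t["id"] in has_dependents else "single"
            report.setdefault((base, kind), []).append(t["form"])
    return report


# "книга, прочитанная мной" ('the book read by me'), toy annotation
sent = [
    {"id": 1, "form": "книга", "feats": "_", "head": 0, "deprel": "root"},
    {"id": 2, "form": "прочитанная", "feats": "Aspect=Perf|VerbForm=Part|Voice=Pass",
     "head": 1, "deprel": "acl"},
    {"id": 3, "form": "мной", "feats": "_", "head": 2, "deprel": "obl"},
]
print(participle_report(sent))  # {('acl', 'group'): ['прочитанная']}
```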

olesar (Contributor, Author) commented Jun 1, 2017

discourse/parataxis is tagged differently in the two treebanks
==> Cross-Check Task, scheduled March 2018
==> Olga makes two lists
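The two lists for the cross-check could be produced by collecting, per treebank, the lemmas attached as discourse or parataxis and intersecting them across treebanks. A minimal sketch; the function names are hypothetical, tokens are assumed to be dicts with lemma and deprel, and the example input is invented for illustration, not taken from either treebank.

```python
from collections import defaultdict


def relation_lemmas(tokens, relations=("discourse", "parataxis")):
    """Map each relation (subtypes folded in) to the set of lemmas using it."""
    lemmas = defaultdict(set)
    for t in tokens:
        base = t["deprel"].split(":")[0]
        if base in relations:
            lemmas[base].add(t["lemma"])
    return lemmas


def conflicting_lemmas(tokens_a, tokens_b):
    """Lemmas that are discourse in one treebank but parataxis in the other."""
    a, b = relation_lemmas(tokens_a), relation_lemmas(tokens_b)
    return (a["discourse"] & b["parataxis"]) | (a["parataxis"] & b["discourse"])


# Toy input only: one lemma labelled differently in the two treebanks.
ru = [{"lemma": "конечно", "deprel": "discourse"}]
syntagrus = [{"lemma": "конечно", "deprel": "parataxis"},
             {"lemma": "впрочем", "deprel": "parataxis"}]
print(conflicting_lemmas(ru, syntagrus))  # {'конечно'}
```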

olesar (Contributor, Author) commented Jun 1, 2017

vocative: check parataxis & NOUN & Animacy=Anim in ru-SynTagRus
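This check amounts to a filter over ru-SynTagRus tokens. A sketch, assuming token dicts with form, upos, feats and deprel (the function name is hypothetical); its output is a candidate list for manual review, not an automatic relabelling to vocative.

```python
def vocative_candidates(tokens):
    """Forms of animate nouns attached as parataxis (possible vocatives)."""
    return [t["form"] for t in tokens
            if t["deprel"].split(":")[0] == "parataxis"
            and t["upos"] == "NOUN"
            and "Animacy=Anim" in t["feats"].split("|")]


# "Мама, иди сюда" ('Mom, come here'), toy annotation
sent = [
    {"form": "Мама", "upos": "NOUN",
     "feats": "Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing", "deprel": "parataxis"},
    {"form": "иди", "upos": "VERB", "feats": "Mood=Imp", "deprel": "root"},
    {"form": "сюда", "upos": "ADV", "feats": "_", "deprel": "advmod"},
]
print(vocative_candidates(sent))  # ['Мама']
```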


4 participants