Inconsistent annotations for LS numbers #464

rhdunn · 2023-10-28T14:20:20Z

Validation issues:

ERROR: Sentence answers-20111108024148AAO8oFI_ans-0010 token 12 -- invalid X form '1'
ERROR: Sentence email-enronsent24_01-0014 token 5 -- invalid X form '20'
ERROR: Sentence email-enronsent24_01-0057 token 4 -- invalid X form '20'
ERROR: Sentence email-enronsent24_01-0114 token 4 -- invalid X form '20'
ERROR: Sentence answers-20111108090913AAf83Jh_ans-0007 token 1 -- invalid X form '1'
ERROR: Sentence answers-20111108090913AAf83Jh_ans-0011 token 1 -- invalid X form '2'
ERROR: Sentence answers-20111108090913AAf83Jh_ans-0017 token 1 -- invalid X form '3'
ERROR: Sentence answers-20111108090913AAf83Jh_ans-0021 token 1 -- invalid X form '4'
ERROR: Sentence answers-20111108073322AA27tkh_ans-0012 token 2 -- invalid X form '2'

There are several issues here:

These should be NUM instead of X to be consistent with the other LS annotations.
They should be attached to the following sentence to be consistent with how the other LS+NUM tokens are grouped.
The LS tokens are missing NumType=Ord|NumForm=Digit features -- there may be other cases like this.

Note: I'm using NumType=Ord here instead of Card as these are ordered values -- first, second, third, etc. -- not counted values.
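
The three points above can be sketched as a small standalone check. This is a rough illustration, not the validator that produced the errors: it assumes the list markers carry the XPOS tag LS, that the input is standard 10-column CoNLL-U, and the names `check_ls_numbers`, `Finding`, and `REQUIRED_FEATS` are made up for the example. It only covers UPOS and features, not the sentence attachment.

```python
from typing import Iterator, NamedTuple

# Features the issue proposes for digit list markers (assumption: exact match wanted).
REQUIRED_FEATS = {"NumType": "Ord", "NumForm": "Digit"}


class Finding(NamedTuple):
    sent_id: str
    token_id: str
    form: str
    problem: str


def parse_feats(feats: str) -> dict:
    """Turn a CoNLL-U FEATS string such as 'A=b|C=d' into a dict ('_' means empty)."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_ls_numbers(conllu_text: str) -> Iterator[Finding]:
    """Yield a Finding for each digit token tagged XPOS=LS that is not NUM
    or lacks NumType=Ord / NumForm=Digit."""
    sent_id = "?"
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):].strip()
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line; skip it in this sketch
        tok_id, form, _lemma, upos, xpos, feats = cols[:6]
        if xpos != "LS" or not form.isdigit():
            continue
        if upos != "NUM":
            yield Finding(sent_id, tok_id, form, f"UPOS is {upos}, expected NUM")
        have = parse_feats(feats)
        missing = [f"{k}={v}" for k, v in REQUIRED_FEATS.items() if have.get(k) != v]
        if missing:
            yield Finding(sent_id, tok_id, form, "missing " + "|".join(missing))
```

Calling `list(check_ls_numbers(open(path, encoding="utf-8").read()))` on a treebank file (with `path` pointing at a `.conllu` file) would list the candidate tokens. Non-digit markers such as i) are deliberately ignored here.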

rhdunn · 2023-10-28T14:24:52Z

Looking across the different treebanks, EWT splits list markers such as (1) and i) into separate tokens, whereas GUM and GENTLE keep each marker as a single token.

GUM and GENTLE also keep multi-section list markers such as 2.1. as a single token. I don't think EWT has examples of that in its data set.

nschneid · 2023-10-28T14:43:06Z

> These should be NUM instead of X to be consistent with the other LS annotations.

Thanks. A Grew-match query for these:

https://universal.grew.fr/?custom=653d1c67a7052
