
implement Polarity=Neg #526

Closed
nschneid opened this issue May 5, 2024 · 25 comments

@nschneid
Contributor

nschneid commented May 5, 2024

https://universaldependencies.org/u/feat/Polarity.html (there is no English-specific documentation, though)

Looking at the comparison of English treebanks, GUM/GENTLE/GUMReddit cover both prefixes and function words, while some of the others only cover function words.

@nschneid
Contributor Author

nschneid commented May 5, 2024

GUM appears to implement this by rule.

Breakdown of results:

Some false positives:

  • universal
  • underwater, underway
  • undue? I wouldn't analyze this as morphologically complex, though historically it probably was
  • dismounted, disbanded, discomfort, disentangle: Maybe "discomfort" fits, but I'm not sure the others are really negative polarity; the dis- prefix is more about reversal. I think I'd just remove Polarity from all the verbs. This would also cover "undo", "uncover", etc. (The guidelines even say "In English, verbs are negated using the particle not".)
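GUM's actual rule is not shown in this thread, but a minimal sketch of a prefix-based rule makes the failure mode concrete. Everything here is an assumption for illustration: the prefix list, the stoplist, the minimum stem length, and the function name `prefix_polarity` are hypothetical; only the VERB exclusion follows the proposal above.

```python
from typing import Optional

# Hypothetical prefix list; GUM's real rule may differ.
NEG_PREFIXES = ("non", "dis", "un")
# Hypothetical stoplist built from the false positives reported above.
EXCEPTIONS = {"universal", "underwater", "underway", "undue"}
MIN_STEM = 3  # require at least three letters after the prefix


def prefix_polarity(lemma: str, upos: str, use_exceptions: bool = True) -> Optional[str]:
    """Return "Neg" if the lemma looks negatively prefixed, else None."""
    lower = lemma.lower()
    if upos == "VERB":
        # Proposed above: reversative verbs (undo, uncover, disband) get no Polarity.
        return None
    if use_exceptions and lower in EXCEPTIONS:
        return None
    for prefix in NEG_PREFIXES:
        if lower.startswith(prefix) and len(lower) - len(prefix) >= MIN_STEM:
            return "Neg"
    return None


if __name__ == "__main__":
    for lemma, upos in [("unborn", "ADJ"), ("universal", "ADJ"),
                        ("underway", "ADV"), ("uncover", "VERB")]:
        print(lemma, prefix_polarity(lemma, upos, use_exceptions=False))
```

With `use_exceptions=False`, "universal" and "underway" come back as "Neg", which is exactly the kind of false positive listed above; a purely string-based rule needs a hand-maintained stoplist to avoid them.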

@AngledLuffa
Contributor

Do we like this feature with more Polarity in the treebanks? PUD has nowhere_ADV, no_DET, nothing_NOUN, non-fiction_NOUN, non-white_ADJ, nobody_NOUN, and a whole bunch of un- such as unprecedented_ADJ, unbelievable_ADJ, unable_ADJ, ...

@nschneid
Contributor Author

nschneid commented May 6, 2024

@AngledLuffa Sorry, I don't understand the question.

@AngledLuffa
Contributor

Are we adding Polarity to more words in more treebanks? In PUD, for example, should I add Polarity=Neg to the examples I listed? (They are all Neg in GUM.)

@nschneid
Contributor Author

nschneid commented May 6, 2024

I guess it would be nice, since the guidelines for Polarity do mention that it can apply in a language to a mixture of bound affixes and function words.

I wonder if "without" should also be included (as the negation of "with").

@AngledLuffa
Contributor

From GUM:

8       Miyako  Miyako  X       FW      Foreign=Yes     10      compound        10:compound     Entity=(79-organization-giv:act-cf3-3-appos|XML=<hi rend:::"italic"><foreign xml:lang:::"jap">
9       no      no      X       FW      Foreign=Yes|Polarity=Neg        10      compound        10:compound     _
10      Hana    Hana    X       FW      Foreign=Yes     7       appos   7:appos Entity=79)|XML=</foreign></hi>

@AngledLuffa
Contributor

unfortunately?

unborn?

uncovering_VERB? not this one?

unassuming?

unearthed? I vote not this one either

undisputed? This looks weird:

8       Spanish Spanish ADJ     JJ      Degree=Pos      9       amod    9:amod  Proper=True
9       dominance       dominance       NOUN    NN      Number=Sing     13      nsubj   13:nsubj|14:nsubj:xsubj _
10      in      in      ADP     IN      _       12      case    12:case _
11      the     the     DET     DT      Definite=Def|PronType=Art       12      det     12:det  _
12      region  region  NOUN    NN      Number=Sing     9       nmod    9:nmod:in       _
13      remained        remain  VERB    VBD     Mood=Ind|Tense=Past|VerbForm=Fin        0       root    0:root  _
14      undisputed      undisputed      ADJ     JJ      Degree=Pos|Polarity=Neg 13      xcomp   13:xcomp        SpaceAfter=No
15      .       .       PUNCT   .       _       13      punct   13:punct        _

unemployment? probably not

underground? could break it up into un-der-ground

unearth

@AngledLuffa
Contributor

Could you double-check the linked change, please?

@nschneid
Contributor Author

nschneid commented May 6, 2024

As I suggested above (but waiting for @amir-zeldes's input), I would not use Polarity on any VERBs, and would omit the under- ones.

I don't have a problem with unborn, unfortunately, unassuming, undisputed, unemployment as their meaning is more or less 'not'/'no' + stem.

@AngledLuffa
Contributor

My proposed change to PUD doesn't include any under- words or any VERBs.

@amir-zeldes
Contributor

universal ... underwater, underway

Thanks for catching, will fix.

I would not use Polarity on any VERBs

I'm open to this, but would like to understand the reasoning better. Etymologically, verbal "un" is indeed distinct, but in the context of synchronic English, I don't see a big difference between "uncovered" as a participle and "uncover" as a verb, which means 'make not covered'. Are we sure VERBs can't be negative in that sense? Do we remove the feature on uncovered/VBN/VERB but keep it if we have the same structure on "unbaked", because there is no lemma "unbake" and we are forced to tag JJ/ADJ?

Miyako no hana

Nice catch! 🤣

@nschneid
Contributor Author

nschneid commented May 6, 2024

For a dynamic event, I would expect Polarity=Neg to signal that it didn't happen. Of course many languages have verbal morphology for this. For derivational morphemes on verbs that signal reversal of a state or causing a negative outcome etc., it may make sense to have some feature, but I would not expect it to be called Polarity=Neg, especially if such morphemes are limited in their lexical or semantic productivity (it cannot unrain, for example).

"uncovered" as an adjective can mean 'not covered' (negation of adjective), or a participle derived from 'uncover' can mean 'uncovering has happened'. It's a structural ambiguity of the morphology: un[[cover]ed] vs. [un[cover]]ed. So I would not assume the two uses of "un-", one of which takes a verb stem and one of which takes an adjective stem, are the same in terms of morphological features.

@amir-zeldes
Contributor

I'm not sure about this from a typological perspective...

For a dynamic event, I would expect Polarity=Neg to signal that it didn't happen

Negations can have all sorts of aspects - "not" and "never" both negate verbs and are Polarity=Neg, but "never" means something more than just "X doesn't happen". Similarly if you have a modal, you get various scope readings on the verbal complex (must [not happen] vs. [must not] happen - the default reading is flipped for German <> English, which can lead to hilarious misunderstandings). I thought "Neg" just means "some kind of negation", which means different things for "no one", "nowhere", "not"...

if such morphemes are limited in their lexical or semantic productivity (it cannot unrain, for example)

It's not unusual for negative morphology to be unproductive or limited. In some languages it's limited just to a special form of the copula (e.g. Church Slavonic), or just to existence predicates (Arabic/Hebrew/Coptic lexical negative existentials). I don't think that's a criterion for what carries this feature.

It's a structural ambiguity of the morphology: un[[cover]ed] vs. [un[cover]]ed. So I would not assume the two uses of "un-", one of which takes a verb stem and one of which takes an adjective stem, are the same in terms of morphological features

Yes, I agree it's ambiguous and not the same, but I'm not sure that means that one of them is not Polarity=Neg. I guess the question is how broad we want Polarity to be? I could certainly imagine someone would be interested to know which verbs do this, and would discover for example that "rain" doesn't do it in English. But without the annotation, that's not exposed to the user. Maybe something to discuss with the group? I'm not 100% for including it on these verbs, just feeling a bit uncertain about throwing something out which seems non-arbitrary/meaningful.

@nschneid
Contributor Author

nschneid commented May 7, 2024

It's not unusual for negative morphology to be unproductive or limited.

Limited to certain grammatical constructions is one thing. I'm expressing skepticism about the ones limited based on the lexical semantics.

To put it another way: if you asked me to negate the verb "covering", I would say "not covering". "Uncovering" is contrary to covering in the same way that "opening" is contrary to "closing". That's not regular negation, even if it can be expressed morphologically for certain verbs. "Uncovering" can be negated too: "not uncovering".

Do we know of other languages using Polarity=Neg in a very broad way? If you think more input is necessary I suggest opening an issue in the docs repo.

@rueter

rueter commented May 8, 2024

I'm not sure about this from a typological perspective...

For a dynamic event, I would expect Polarity=Neg to signal that it didn't happen

Negations can have all sorts of aspects - "not" and "never" both negate verbs and are Polarity=Neg, but "never" means something more than just "X doesn't happen". Similarly if you have a modal, you get various scope readings on the verbal complex (must [not happen] vs. [must not] happen - the default reading is flipped for German <> English, which can lead to hilarious misunderstandings). I thought "Neg" just means "some kind of negation", which means different things for "no one", "nowhere", "not"...

I would suggest a perhaps clearer example of a modal verb: may [not happen] vs. [may not] happen.

And the next time I listen to M. Jackson, I'll think about "unrain" vs. "unbreak my heart".

This work definitely needs a lot more investigation, and discussion.

if such morphemes are limited in their lexical or semantic productivity (it cannot unrain, for example)

It's not unusual for negative morphology to be unproductive or limited. In some languages it's limited just to a special form of the copula (e.g. Church Slavonic), or just to existence predicates (Arabic/Hebrew/Coptic lexical negative existentials). I don't think that's a criterion for what carries this feature.

@nschneid
Contributor Author

nschneid commented Jun 2, 2024

The Core Group discussed and concluded that Polarity=Neg should apply only to the most grammatical (nonlexical) forms of negation, and those which are not pro-forms covered by PronType=Neg. So this limits its application in English to the word "not".

Maybe we should implement MSeg to capture additional morphological structure—whether it's prefixes of "pure negation" like un-/in-/im-/non-, or other morphemes: de-, re-, anti-, pro-, etc. etc. (When they are not tokenized off; cf. #152.)
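If negative prefixes are recorded through MSeg instead of Polarity, a consumer could read them back out of the MISC column. A minimal sketch, assuming MSeg holds hyphen-separated morphs (e.g. `MSeg=un-cover-ed`); that format, the prefix sets (taken from the comment above), and the helper names are assumptions. Segmentation alone cannot tell negating un- from reversative un-.

```python
from typing import Dict, List

# Prefix inventories mentioned in the comment above (illustrative, not exhaustive).
PURE_NEG_PREFIXES = {"un", "in", "im", "non"}
OTHER_PREFIXES = {"de", "re", "anti", "pro"}
KNOWN_PREFIXES = PURE_NEG_PREFIXES | OTHER_PREFIXES


def parse_misc(misc: str) -> Dict[str, str]:
    """Split a CoNLL-U MISC column into key/value pairs (first '=' separates)."""
    if misc == "_":
        return {}
    pairs = {}
    for item in misc.split("|"):
        key, sep, value = item.partition("=")
        if sep:
            pairs[key] = value
    return pairs


def prefixes_from_mseg(misc: str) -> List[str]:
    """Return the leading prefixes of an (assumed) hyphen-separated MSeg value."""
    morphs = parse_misc(misc).get("MSeg", "").split("-")
    found = []
    for morph in morphs[:-1]:  # the final morph is never a prefix
        if morph.lower() not in KNOWN_PREFIXES:
            break  # stop at the first non-prefix morph (the stem)
        found.append(morph.lower())
    return found


def has_pure_negation_prefix(misc: str) -> bool:
    """True if any leading prefix is one of the 'pure negation' prefixes."""
    return any(p in PURE_NEG_PREFIXES for p in prefixes_from_mseg(misc))
```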

@nschneid
Contributor Author

nschneid commented Jun 2, 2024

Per https://universaldependencies.org/u/feat/Polarity.html, also implemented the feature for interjections "yes" and "no"
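Taken together, the Core Group decision and the interjection rule reduce English Polarity to a short lemma-based rule. A minimal sketch, assuming interjection "yes" receives Polarity=Pos and interjection "no" Polarity=Neg, and that contracted "n't" is lemmatized as "not"; the function names are hypothetical. Any earlier, broader Polarity value is dropped, and words carrying PronType=Neg are left without Polarity.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a CoNLL-U FEATS column ("A=B|C=D" or "_") into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def format_feats(pairs: Dict[str, str]) -> str:
    """Serialize features sorted case-insensitively by name, as UD expects."""
    if not pairs:
        return "_"
    ordered = sorted(pairs.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in ordered)


def english_polarity(lemma: str, upos: str, feats: str) -> str:
    """Re-derive Polarity for one English token under the narrow rule."""
    pairs = parse_feats(feats)
    pairs.pop("Polarity", None)  # discard any earlier, broader annotation
    if pairs.get("PronType") == "Neg":
        return format_feats(pairs)  # negative pro-forms get no Polarity
    low = lemma.lower()
    if low == "not":
        pairs["Polarity"] = "Neg"
    elif upos == "INTJ" and low in ("yes", "no"):
        pairs["Polarity"] = "Pos" if low == "yes" else "Neg"
    return format_feats(pairs)
```

Under this sketch, the Japanese "no" in the GUM example above (UPOS X, Foreign=Yes) loses its Polarity, since only the interjection "no" qualifies.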

@amir-zeldes
Contributor

Hm, I think this makes this annotation rather useless for English, but I'll respect and implement the decision. I take it this does not apply to "never" or other items which can stand in a paradigm with "not"?

I'm moving the old GUM polarity to a misc attribute Negation=Yes so that we don't lose that useful information. MSeg already identifies relevant affixes, but does not disambiguate them, and some negating elements are not affixes.
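The migration described here (keep Polarity=Neg only where the narrow rule allows it, and record the old value as MISC Negation=Yes) could look roughly like the following sketch. The function name and the exact keep-condition (lemma "not", or interjection "no") are assumptions based on the decisions above; existing MISC items keep their order and the new attribute is appended.

```python
from typing import Tuple


def demote_polarity(lemma: str, upos: str, feats: str, misc: str) -> Tuple[str, str]:
    """Move a no-longer-licensed Polarity=Neg from FEATS to MISC Negation=Yes."""
    feat_items = [] if feats == "_" else feats.split("|")
    keep = lemma.lower() == "not" or (upos == "INTJ" and lemma.lower() == "no")
    if "Polarity=Neg" not in feat_items or keep:
        return feats, misc  # nothing to move
    feat_items.remove("Polarity=Neg")
    misc_items = [] if misc == "_" else misc.split("|")
    if "Negation=Yes" not in misc_items:
        misc_items.append("Negation=Yes")
    return "|".join(feat_items) or "_", "|".join(misc_items)
```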

@nschneid
Contributor Author

"never" will be PronType=Neg, which is mutually exclusive with Polarity=Neg.
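Because the two features are mutually exclusive, a corpus can be scanned for tokens that carry both. A minimal sketch over raw CoNLL-U text; the function names and the sample sentence are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a CoNLL-U FEATS column ("A=B|C=D" or "_") into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def find_neg_conflicts(conllu: str) -> List[Tuple[str, str]]:
    """Return (ID, FORM) of word lines having both PronType=Neg and Polarity=Neg."""
    conflicts = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, multiword ranges, empty nodes
        feats = parse_feats(cols[5])
        if feats.get("PronType") == "Neg" and feats.get("Polarity") == "Neg":
            conflicts.append((cols[0], cols[1]))
    return conflicts


if __name__ == "__main__":
    rows = [
        ["1", "I", "I", "PRON", "PRP", "Case=Nom|Number=Sing|Person=1|PronType=Prs", "3", "nsubj", "3:nsubj", "_"],
        ["2", "never", "never", "ADV", "RB", "Polarity=Neg|PronType=Neg", "3", "advmod", "3:advmod", "_"],
        ["3", "left", "leave", "VERB", "VBD", "Mood=Ind|Tense=Past|VerbForm=Fin", "0", "root", "0:root", "_"],
    ]
    print(find_neg_conflicts("\n".join("\t".join(r) for r in rows)))
```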

@amir-zeldes
Contributor

OK, I've applied Polarity=Neg based on this spec, let me know if we're adding PronTypes as well or if you want to wait for more discussion.

nschneid added a commit to UniversalDependencies/docs that referenced this issue Jun 18, 2024
nschneid referenced this issue in UniversalDependencies/docs Jun 18, 2024

@martinpopel
Member

martinpopel commented Jun 24, 2024

I agree Polarity=Neg should apply only to the most grammatical (nonlexical) forms of negation.
I suggest we further require (in the documentation) that for each language and UPOS, the Polarity feature should be either lexical or inflectional, but not both.*

For example, the English lemmas not and no have Polarity=Neg in all their forms, so it is a lexical feature.
Czech verbs (e.g. nepřišel "he did not come") and adjectives (e.g. nevelký "not big") have Polarity=Neg as an inflectional feature, which means that the lemma does not include the negative prefix (lemma(nepřišel)=přijít, lemma(nevelký)=velký).

Thus, if we decided that e.g. the English prefix un in adjectives (or another prefix in another language) is "the most grammatical form of negation", we should have e.g. form=unnecessary, lemma=necessary, Polarity=Neg.

Fifteen years ago, I wrote an English lemmatizer that tried to identify negative adjectives, adverbs and nouns (but not verbs), but I admit there is a "canny valley" of non easy cases making me feel uneasy. Thus I don't suggest we go this way in UD.

*) Personally, I would prefer Polarity being always inflectional, but there are some existing counter examples in UD.
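One way to audit the proposed constraint (lexical or inflectional per language and UPOS, but not both) is a heuristic over (lemma, UPOS, Polarity) triples: a lemma observed with more than one Polarity value (including none) is treated as inflectional, like velký/nevelký, while a lemma that always carries the same value is treated as lexical, like English "not". This is a sketch under that assumption, not a rule stated in the thread, and the function names are hypothetical; a lemma seen only once always looks lexical, so results on small corpora need manual review.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Optional, Set, Tuple

# (lemma, UPOS, Polarity value or None when the token has no Polarity)
Token = Tuple[str, str, Optional[str]]


def polarity_kinds(tokens: Iterable[Token]) -> Dict[str, Set[str]]:
    """Heuristically label Polarity per UPOS as 'lexical' and/or 'inflectional'."""
    seen: DefaultDict[Tuple[str, str], Set[Optional[str]]] = defaultdict(set)
    for lemma, upos, polarity in tokens:
        seen[(lemma, upos)].add(polarity)
    kinds: DefaultDict[str, Set[str]] = defaultdict(set)
    for (_lemma, upos), values in seen.items():
        if values == {None}:
            continue  # this lemma never carries Polarity
        # More than one value for one lemma (e.g. velký/nevelký): inflectional.
        # One constant value (e.g. English "not"): lexical.
        kinds[upos].add("inflectional" if len(values) > 1 else "lexical")
    return dict(kinds)


def mixed_upos(tokens: Iterable[Token]) -> Set[str]:
    """UPOS tags that violate the proposed 'lexical or inflectional, not both' rule."""
    return {upos for upos, labels in polarity_kinds(tokens).items() if len(labels) > 1}
```

Adding a hypothetical ADJ token like ("unnecessary", "ADJ", "Neg") to Czech-style inflectional data would make ADJ show up in `mixed_upos`, which is the mixed situation the proposal rules out.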

@nschneid
Contributor Author

@martinpopel I made an English-specific documentation page: https://universaldependencies.org/en/feat/Polarity.html

If you have input for the universal page please post that to https://github.com/UniversalDependencies/docs

@AngledLuffa
Contributor

The Pronouns dataset already has Polarity=Neg on its "not" tokens.

@AngledLuffa
Contributor

PUD has a couple of oddities with its "not" tokens.

Most of them already have Polarity=Neg, but this one doesn't:

# sent_id = w01053067
# text = It is possible to establish the phase of the moon on a particular day two thousand years ago but not whether it was obscured by clouds or haze.
19      but     but     CCONJ   CC      _       24      cc      24:cc   _
20      not     not     PART    RB      _       24      advmod  24:advmod       _
21      whether whether SCONJ   IN      _       24      mark    24:mark _

The other weirdness is that quite a few have the UPOS tag ADV:

12      its     its     PRON    PRP$    Gender=Neut|Number=Sing|Person=3|Poss=Yes|PronType=Prs  15      nmod:poss       15:nmod:poss    _
13      prominent       prominent       ADJ     JJ      Degree=Pos      15      amod    15:amod _
14      topographic     topographic     ADJ     JJ      Degree=Pos      15      amod    15:amod _
15      features        feature NOUN    NNS     Number=Plur     18      nsubj   18:nsubj        _
16      do      do      AUX     VBP     Mood=Ind|Tense=Pres|VerbForm=Fin        18      aux     18:aux  _
17      not     not     ADV     RB      Polarity=Neg    18      advmod  18:advmod       _
18      parallel        parallel        VERB    VB      VerbForm=Inf    2       advcl   2:advcl:in_that _
19      the     the     DET     DT      Definite=Def|PronType=Art       20      det     20:det  _
20      coast   coast   NOUN    NN      Number=Sing     18      obj     18:obj  SpaceAfter=No

Presumably those should all be changed to PART? In general, "not" in "do not VERB" is tagged PART in EWT, and there are no cases of not_ADV in either EWT or GUM:

# sent_id = weblog-blogspot.com_rigorousintuition_20060511134300_ENG_20060511_134300-0295
# text = Sub-cultures (as the lable implies) do not develope in a vacuum.
8       do      do      AUX     VBP     Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin   10      aux     10:aux  _
9       not     not     PART    RB      Polarity=Neg    10      advmod  10:advmod       _
10      develope        develope        VERB    VB      VerbForm=Inf    0       root    0:root  _

... side note: there's an extra e in "develop" in this EWT sentence
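The two PUD oddities above (a "not" without Polarity=Neg, and "not" tagged ADV rather than PART) can be found mechanically. A minimal sketch over raw CoNLL-U; the function names are hypothetical, matching is done on the LEMMA column, and the output is a list for review rather than an automatic fix.

```python
from typing import Dict, List, Tuple


def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a CoNLL-U FEATS column ("A=B|C=D" or "_") into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def not_anomalies(conllu: str) -> List[Tuple[str, str]]:
    """List (ID, problem) for "not" tokens lacking Polarity=Neg or not tagged PART."""
    problems = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip malformed lines, multiword ranges, empty nodes
        if cols[2].lower() != "not":  # LEMMA column, so "n't" lemmatized as "not" counts
            continue
        if parse_feats(cols[5]).get("Polarity") != "Neg":
            problems.append((cols[0], "missing Polarity=Neg"))
        if cols[3] != "PART":
            problems.append((cols[0], f"UPOS is {cols[3]}, expected PART"))
    return problems
```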

AngledLuffa added a commit to UniversalDependencies/UD_English-PUD that referenced this issue Jun 25, 2024
nschneid added a commit that referenced this issue Jun 26, 2024
…iated spellings (UniversalDependencies/docs#517 - also fix neaten.py cause of false negative in #532); some typos (including "develope", #526)