Overuse of INTJ #429

Closed
nschneid opened this issue Sep 24, 2023 · 16 comments

Comments

@nschneid
Contributor

Per https://universaldependencies.org/u/pos/INTJ.html, INTJ should not be used for words that come from another category, like adjectives/adverbs.

https://universal.grew.fr/?custom=650f87d83dd5e - includes "good", "great", "fine", "well", "Christ", ...
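The Grew query behind that link is not reproduced here. The following is a minimal Python sketch of a similar survey over a local CoNLL-U file: it counts the word forms tagged INTJ so that items such as "good" or "great" stand out. It assumes the standard 10-column CoNLL-U layout. The function names `intj_forms` and `row` and the sample sentence are hypothetical and for illustration only.

```python
from collections import Counter


def intj_forms(conllu_text: str) -> Counter:
    """Count lowercased word forms whose UPOS is INTJ in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip sentence-level comments and blank sentence separators.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only ordinary word lines: 10 columns and an integer ID
        # (drops multiword-token ranges like "3-4" and empty nodes like "5.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        if cols[3] == "INTJ":  # column 4 is UPOS
            counts[cols[1].lower()] += 1  # column 2 is FORM
    return counts


def row(*cols: str) -> str:
    """Build one tab-separated CoNLL-U token line."""
    return "\t".join(cols)


# Hypothetical sample sentence, invented for illustration.
sample = "\n".join([
    "# text = Great, thanks.",
    row("1", "Great", "great", "INTJ", "UH", "_", "3", "discourse", "_", "SpaceAfter=No"),
    row("2", ",", ",", "PUNCT", ",", "_", "1", "punct", "_", "_"),
    row("3", "thanks", "thanks", "NOUN", "NNS", "Number=Plur", "0", "root", "_", "SpaceAfter=No"),
    row("4", ".", ".", "PUNCT", ".", "_", "3", "punct", "_", "_"),
    "",
])

print(intj_forms(sample).most_common())
```

Run over a whole treebank, the most frequent INTJ forms that also occur as ADJ or ADV elsewhere are the candidates this issue is about.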

@amir-zeldes
Contributor

Agreed about good, great, fine; I think "well" is established as a discoursy interjection in the sense that doesn't mean 'good' in any way (sentence initial well, xpos UH), I would keep that INTJ (not the same lexical item). For "Christ" we get into the general area of profanities, which often do not behave like their morphological POS - I'm not sure we want them not to be INTJ just because of etymology. They are definitely UH in xpos.

@nschneid
Contributor Author

I think "well" is established as a discoursy interjection in the sense that doesn't mean 'good' in any way (sentence initial well, xpos UH)

Agreed, I wasn't thinking of that one, though.

For "Christ" we get into the general area of profanities, which often do not behave like their morphological POS - I'm not sure we want them not to be INTJ just because of etymology.

https://universaldependencies.org/u/pos/INTJ.html specifically rules out "God". Not to sound religious but I don't think we can distinguish "God" from "Christ".

@amir-zeldes
Contributor

Hm, it's hard with that being in the guidelines the way it is, but do we think that's true for swearwords as well? I mean, we could argue about what POS some of them are, but it would lead to a colorful GH issue ;) If we stick to PTB UH for them, which I think is standard, we would need a conversion table to know what their 'etymological POS' is, and I'm not sure how much sense that makes. Morphosyntactically, profanities and other oaths do fit the definition of emotional, syntactically unintegrated language...

@nschneid
Contributor Author

There may be some borderline cases but here's an interpretation that makes sense to me: If a word is mainly used for swearing, and it's syntactically extrinsic to the semantics-bearing part of the sentence (not a predicate or argument etc.), then it's INTJ. Same for discourse particles from which verbs have been derived (INTJ for the main use of "OK" even though it can also be a VERB), and discourse particles whose meaning is quite distinct from the non-discourse one ("well", "like" as INTJ). If it's a word that is mainly an ADV or NOUN etc. and also has a secondary use as a discourse particle, then it's not INTJ. We can tell that it is being used as a discourse particle because it attaches as discourse, but we don't need to posit a separate lexical entry.

"Please", "sorry", "right" feel kinda borderline. I guess "Sorry/discourse" just expresses that you're sorry/ADJ so we can call it ADJ. "Right?" is like saying "Is that right/ADJ?", so ADJ as well. "Please/discourse" is farther from evoking an act of pleasing, so I'd call it INTJ.
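The interpretation above can be read as a small decision procedure. Below is a minimal Python sketch of it, assuming a hypothetical per-lemma lexicon that records whether a word is mainly used for swearing, mainly used as a discourse particle, or has a discourse sense distinct from its base sense. Treating only the `discourse` relation as "syntactically extrinsic" is also an assumption. `Usage`, `choose_upos`, and `EXTRINSIC_DEPRELS` are illustrative names, not part of any UD tool.

```python
from dataclasses import dataclass

# Assumption: deprels that mark a token as syntactically extrinsic
# (not a predicate or argument). The comment above only names "discourse".
EXTRINSIC_DEPRELS = {"discourse"}


@dataclass
class Usage:
    """One token occurrence plus hypothetical lexicon flags for its lemma."""
    lemma: str
    deprel: str
    primary_upos: str                       # main non-discourse category, e.g. "ADJ"
    mainly_swearing: bool = False           # lemma is mainly used for swearing
    mainly_discourse: bool = False          # main use is as a particle, e.g. "OK"
    distinct_discourse_sense: bool = False  # particle sense far from base sense, e.g. "well"


def choose_upos(u: Usage) -> str:
    """Sketch of the proposed INTJ criterion for a single token."""
    if u.deprel not in EXTRINSIC_DEPRELS:
        # Used as a predicate, argument, modifier, etc.: keep the ordinary category.
        return u.primary_upos
    if u.mainly_swearing or u.mainly_discourse or u.distinct_discourse_sense:
        return "INTJ"
    # A mainly ADJ/ADV/NOUN word with a secondary discourse use is not INTJ.
    return u.primary_upos


print(choose_upos(Usage("sorry", "discourse", "ADJ")))                                   # ADJ
print(choose_upos(Usage("please", "discourse", "VERB", distinct_discourse_sense=True)))  # INTJ
print(choose_upos(Usage("well", "advmod", "ADV", distinct_discourse_sense=True)))        # ADV
```

The flags are properties of the lemma, and the deprel is a property of the token. That split reflects the point that a discourse attachment alone identifies the use, so no separate lexical entry is needed for "sorry" or "right".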

@amir-zeldes
Contributor

I have to say it's not that I necessarily think the above is a bad way to slice up the space, but it seems like 'just another arbitrary system', where we already have one (PTB UH). I would probably be happier just deciding "if it's UH, it's INTJ" since that's an established practice in English and be done with it. Having/maintaining another lexicalized list of items just for upos seems rather unappealing... We could just decide that all of those swear and discourse items have a second sense/lexical entry which deserves tagging as INTJ - if it's good enough for "well" and "ok" then why not also for the gander?
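Under an "if it's UH, it's INTJ" policy, the XPOS and UPOS columns must agree on every token. The following minimal Python sketch lists the tokens where they disagree, assuming standard 10-column CoNLL-U input. The function name `uh_intj_mismatches` and the sample sentence are hypothetical.

```python
def uh_intj_mismatches(conllu_text: str) -> list[tuple[str, str, str]]:
    """Return (form, upos, xpos) for word lines where UH and INTJ disagree."""
    mismatches = []
    for line in conllu_text.splitlines():
        # Skip comments and blank sentence separators.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Ignore multiword-token ranges and empty nodes.
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        form, upos, xpos = cols[1], cols[3], cols[4]
        # Under the UH-equals-INTJ policy these two tests must agree.
        if (xpos == "UH") != (upos == "INTJ"):
            mismatches.append((form, upos, xpos))
    return mismatches


def row(*cols: str) -> str:
    """Build one tab-separated CoNLL-U token line."""
    return "\t".join(cols)


# Hypothetical sample mirroring the "God" case discussed in this issue.
sample = "\n".join([
    "# text = Oh God!",
    row("1", "Oh", "oh", "INTJ", "UH", "_", "2", "discourse", "_", "_"),
    row("2", "God", "God", "NOUN", "UH", "_", "0", "root", "_", "SpaceAfter=No"),
    row("3", "!", "!", "PUNCT", ".", "_", "2", "punct", "_", "_"),
    "",
])

print(uh_intj_mismatches(sample))
```

An empty result means the treebank already follows the UH-equals-INTJ policy. Any output lists the items that the policy would retag.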

@nschneid
Contributor Author

Well, it's a question of how closely we want to follow the INTJ guidelines. If other languages follow them and exclude "God" etc. then we risk having English be incompatible and making crosslinguistic comparison harder. TBF this is a pretty small set of lexical items in practice (that would be UH but not INTJ).

(Incidentally, I don't see a mention of "God" in the PTB guidelines; are we sure the annotators consistently tag it UH when it's vocative?)

@amir-zeldes
Contributor

For "[Oo]h [Gg]od", ON has 50 UH : 5 NNP, and three of the latter are actually referring to God ("Oh God, we ask you for...")

I get the issue with crosslinguistic comparison, but I feel like language internal consistency should also not be overlooked, and I worry about chaos/arbitrary decisions and incompatibility between TBs. I think ultimately people will decide on a language-specific basis whether there is a separate lexical item for an intj version of something. For example, some UD Russian TBs tag the archaic "боже" (vocative of "God") as an INTJ and lemmatize it to itself (UD_Russian-Taiga). But others still consider it to be a form of lemma God, and annotate it as a noun with vocative case (even though modern Russian has no productive vocative), for example in UD_Russian-SynTagRus, with the nominative lemma бог "God".

So at the end of the day, if Russian can choose for there to be a 'special' interjection use of God, I think it's likely other languages will vary too, and I'm not sure that's wrong (though I am sure it shouldn't oscillate within the same UD language...)

@nschneid
Contributor Author

I agree that flexibility in the universal guidelines is sometimes necessary. If we want to say that it should be up to the language, then the guidelines shouldn't articulate a hard-and-fast rule. @dan-zeman do you think this calls for a more flexible guideline? (Come to think of it, why does https://universaldependencies.org/u/pos/INTJ.html say "God" is a NOUN and not a PROPN?)

@amir-zeldes
Contributor

"God" is a NOUN and not a PROPN

In PTB non-INTJ usage, it seems to be determined just by capitalization... I suppose both tags are possible, certainly NOUN for common-noun uses ("a/some god").

@dan-zeman
Member

I believe we need INTJ for words that cannot be anything else. A vocative use of god is just a vocative use of a noun. (Why do you think it should be PROPN?) It is irrelevant whether the speaker actually intends to talk with the god or is simply swearing. First, I'm not sure I'd want (and be able) to distinguish those two cases. And second, those differences are pragmatic, but syntactically it's still a vocative. Same for Russian боже - yes, the vocative morphology is no longer productive, but it only means that for most other nouns the nominative form is used where vocative would be appropriate; syntactically it is still a noun and it is annotated with Case=Voc in SynTagRus.

@amir-zeldes
Contributor

amir-zeldes commented Sep 28, 2023

Why do you think it should be PROPN?

In the use as a referring expression? Because it is capitalized, refers to a unique individual (at least in the sense of Judeo-Christian God), and appears without an article, like other names. If it's just one of many (usually lowercase) gods, "the/a god" etc., then it should indeed be NOUN IMO.

it only means that for most other nouns the nominative form is used where vocative would be appropriate

This would be true if the language had lost vocative for most nouns, but had a stable class of nouns that still clearly distinguish vocative. I don't think that's true for Russian - outside of this lexicalized exclamative use, even if you are talking to God, there is no special form, for example I just found this on Pinterest:

I think it's just in a specific type of exclamation, often preceded by "my".

it is annotated with Case=Voc in SynTagRus

Right, but not in Taiga, for example, so there is some variation in how annotators perceive it. That's part of my point: language-internal guidelines will often have to decide whether there is a separate INTJ item that happens to look like a NOUN etc. For English, for example, exclamative God alternates with the euphemistic non-NOUN "gosh", and you can say "oh my gosh!", but you can't say "Dear Gosh, hear my prayer", so this is another indication that exclamative God is perhaps a different lexical item.

@dan-zeman
Member

OK, then the Russians should really make up their minds whether the frozen vocative боже can still be analyzed as a noun; I cannot judge its frequency (when I speak Russian at all, it's not to a god) and I'm probably biased, being a speaker of a language where the vocative is still productive and bože sounds absolutely normal.

In English, I understand that gosh is probably an interjection. But I don't think it's an argument for god/God to not be a noun (common or proper).

@amir-zeldes
Contributor

In English, I understand that gosh is probably an interjection. But I don't think it's an argument for god/God to not be a noun (common or proper).

Well, I'm saying it could be used as an argument for there being two lexical items "god", one of which is a noun (and is not in a paradigm with gosh), and one of which has the same POS as "gosh", with which it is completely interchangeable paradigmatically.

@nschneid
Contributor Author

Per today's Core Group discussion, the wording on the INTJ page was probably a bit too specific; the intention was to emphasize the general guidelines about prototypical vs. productively extended usages. Revised to make this clearer: https://universaldependencies.org/u/pos/INTJ.html

@jnivre

jnivre commented Dec 13, 2023 via email

@amir-zeldes
Contributor

amir-zeldes commented Dec 13, 2023 via email


4 participants