Add obl:agent for English passives? And rationale for Voice=Pass in English #290

Closed
nschneid opened this issue Jan 10, 2022 · 71 comments

@nschneid
Contributor

nschneid commented Jan 10, 2022

https://universaldependencies.org/u/dep/obl-agent.html

I assume English has the :pass subtype on subjects/auxes, but not :agent on by-phrases, for historical reasons. We should probably add it for crosslinguistic comparability.

Most, but not all, of the by-phrases recovered by this query are passive agents:

Issue #289 should be addressed first to correct errors in the :pass annotations.

@nschneid
Contributor Author

@amir-zeldes
Copy link
Contributor

Sure, I'd be on board. The query may be too low recall though, as not all passive participles have aux:pass.

@nschneid
Contributor Author

nschneid commented Jan 10, 2022

True, searching for xpos=VBN has higher recall.

And Voice=Pass seems to be missing for a lot of passive verbs with VBN and a by-phrase (and probably passive verbs without a by-phrase as well).

Basically there should be rules to validate that all the elements of the passive construction (the verb, subject if present, passive aux if present, by-phrase if present) are annotated properly.
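A minimal Python sketch of one such validation rule, assuming a hypothetical `Token` class that carries only the CoNLL-U columns the check needs (this is not the official UD validator): it flags every passive-specific dependent (`nsubj:pass`, `csubj:pass`, `aux:pass`, `obl:agent`) whose head lacks `Voice=Pass`.

```python
from dataclasses import dataclass, field

# Deprels that only occur in a passive construction (per the discussion in this issue).
PASSIVE_DEPRELS = {"nsubj:pass", "csubj:pass", "aux:pass", "obl:agent"}


@dataclass
class Token:
    """Hypothetical minimal token: just the CoNLL-U columns this check needs."""
    id: int
    form: str
    xpos: str
    head: int
    deprel: str
    feats: dict = field(default_factory=dict)


def passive_inconsistencies(tokens):
    """Return (head_id, deprel) pairs where a passive-specific dependent's head lacks Voice=Pass."""
    by_id = {t.id: t for t in tokens}
    problems = []
    for t in tokens:
        head = by_id.get(t.head)  # None for the root (head 0)
        if t.deprel in PASSIVE_DEPRELS and head is not None:
            if head.feats.get("Voice") != "Pass":
                problems.append((head.id, t.deprel))
    return problems


# "The escort was processed by the factory", with Voice=Pass missing on "processed".
sentence = [
    Token(1, "The", "DT", 2, "det"),
    Token(2, "escort", "NN", 4, "nsubj:pass"),
    Token(3, "was", "VBD", 4, "aux:pass"),
    Token(4, "processed", "VBN", 0, "root"),
    Token(5, "by", "IN", 7, "case"),
    Token(6, "the", "DT", 7, "det"),
    Token(7, "factory", "NN", 4, "obl:agent"),
]
print(passive_inconsistencies(sentence))
# [(4, 'nsubj:pass'), (4, 'aux:pass'), (4, 'obl:agent')]
```

This only checks one direction: a passive with no subject and no by-phrase has nothing for it to latch onto, and elliptical or coordinate structures would still need a manual look.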

@amir-zeldes
Contributor

Not sure this is possible with perfect accuracy, since there are some pretty weird elliptical and coordinate structures; but checking manually makes sense if we can afford it. The cases you identified are now fixed in GUM via amir-zeldes/gum@851c6d5 !

amir-zeldes added a commit to amir-zeldes/gum that referenced this issue Jan 14, 2022
@nschneid
Contributor Author

The head of an obl:agent should be marked with Voice=Pass, right? GUM

@amir-zeldes
Contributor

That makes sense to me. Currently I think for some reason passive voice in GUM is only in finite conditions, i.e. with an auxiliary. Do we want passive voice on all ostensibly passive participles? If so we could just apply it indiscriminately to xpos=VBN.

@nschneid
Contributor Author

I am thinking passive = the subject (if there is one) is :pass, and an obl:agent by-phrase is possible.

I would leave out the VBNs with a perfect aux, and fixed VBNs ("as opposed to").

Not sure about pre-head amod VBNs: "radiation-induced thyroid cancers" can't have a by-phrase in that position, but it can be paraphrased with acl as "thyroid cancers induced by radiation". "the affected/amod children are...", "the children affected/amod are...", "the children affected/acl by the policy are..."

And there are some misc cases in EWT and GUM, many of them errors

Not sure about non-fixed connectives like "supposed to", "based on", "given"

@amir-zeldes
Contributor

The more I think about this the more unsure I am - I mean, these are morphological annotations, right? I agree it's off-putting to have active perfects be tagged as passive voice anywhere, but we are not really tagging periphrastic constructions here (that's what deprel is for). I think basically anything tagged VBN is morphologically passive, and the passive participles in English are used for various things. But like you I also don't like tagging present perfects as Pass.

I'd be curious for more input on this and some cross-linguistic information about how others are handling this, any thoughts @dan-zeman / @ftyers / others?

@dan-zeman
Member

I am not sure what specifically those "any thoughts" should be on, as the title of the issue is obl:agent, and that is a deprel, nothing morphological. It should go to the by nominal iff it occurs in a passive clause. Whether and how you establish that it's a passive clause is a different question (and maybe difficult, I don't know).

If you are rather asking about Voice=Pass, then I don't think it should be used in English. Passive predicates are periphrastic and neither the auxiliary nor the past participle are specifically passive. This is different from Czech and other Slavic languages, where the passive participle is morphologically distinct from active participles.

@sylvainkahane
Copy link

@amir-zeldes In French, obl:agent is also used for the causative construction where the verb is infinitive: http://universal.grew.fr/?custom=6328a0b7b1184
In French, we don't have a feature Voice on participles, but I think we could add it.

@dan-zeman Even if the same form of the past participle is used for active and passive in English or French, I wouldn't say that the passive is periphrastic, because in many cases we have a passive construction without a copula, for instance in participial clauses: the question asked by Mary.

@amir-zeldes
Contributor

If you are rather asking about Voice=Pass, then I don't think it should be used in English. Passive predicates are periphrastic and neither the auxiliary nor the past participle are specifically passive. This is different from Czech and other Slavic languages, where the passive participle is morphologically distinct from active participles.

Hi Dan - yes, we are asking about the morphological feature now. But I don't think I agree with the statement above - the Slavic passive participles, like the English ones, can be used in periphrastic predicative passive constructions, in postponed acls and in adjectival attributive ones. They are distinct from the active participles in the same way as their English counterparts. Consider this example showing all 3, adapted from UD_Polish-LFG:

  • Polish: Wybrany/Pass konwojent, delegowany/Pass przez fabrykę, został już obrobiony/Pass.
  • English: The chosen/Pass? escort, delegated/Pass? by the factory, has already been processed/Pass?.

In UD Polish, all three passive participles are tagged Voice=Pass, and we are now debating whether the same should happen in English. The main case that differentiates English and Polish is that in English, there are also periphrastic active tenses involving the morphologically passive participle, i.e. we have "I have read the book" with little/no sense of passiveness (whereas I'm guessing the equivalent Polish "??mam książkę przeczytaną" would be understood as containing passivization, if anyone ever said that).

In French, we don't have a feature Voice on participles, but I think we could add it.

I think the same basic arguments that apply to English would also work for French, since the morphosyntax is fairly similar, and should probably be made consistent across Romance and Germanic languages if possible. In French it seems to be inconsistent: for example, FQB, Sequoia and FTB all use Voice=Pass on participles, but the other corpora don't.

@dan-zeman
Member

Yes, you can use a passive participle attributively, but it is still a passive participle, and that's what the feature Voice=Pass says. It labels the form, not the construction it is used in.

Essentially, you have three classes of participles in Slavic:

  • attributive active participle (two versions, present and past): dělající, udělavší / robiący, zrobiwszy / делающий, сделавший / delajući, dodelavši / doing, having done
  • passive participle (present and past only distinguished in Russian): dělaný / robiony / делаемый, сделанный / delan / done
  • the form that Slavic linguistics calls simply the l-participle; it is active and it is normally used predicatively: dělal / robił / делал / delao / did, has done

So, Voice=Act vs. Pass, and Tense=Pres vs. Past are the obvious feature candidates to distinguish the forms in the first two classes. Marking the l-participle is a bit trickier (unless we want to introduce Slavic-specific VerbForm=Lpart, which none of the languages does). The way we go in Czech is that the first two classes are ADJ (regardless of whether in the actual sentence they're used attributively or predicatively), and the l-participle is VERB. It has VerbForm=Part, Voice=Act, and Tense=Past. The tense feature is a bit controversial because besides the past tense, it can be also used in the conditional mood construction, and again we use the feature to mark the form, not its actual usage in the current sentence. But since the past tense is the default perception of this form, we go with it. (In some other Slavic languages, the l-participle can be used also in the periphrastic future tense (besides the past), so it is probably not a good idea to mark Tense there. And in East Slavic languages the l-participle is used without an auxiliary in the past tense, so they decided to mark it as VerbForm=Fin; there was a long discussion about it in UniversalDependencies/docs#281.)

@nschneid
Contributor Author

nschneid commented Sep 20, 2022

I think this is worth discussing in the broader group in case there is an overarching principle we can add to the morphology guidelines.

E.g., for the position @dan-zeman is articulating: "Morphological features explain the form of the word, not necessarily its full function. Most of the features are for locating the form of the word in a slot of a morphological paradigm, and are canonical labels for the slot—they will not always reflect the morphosyntactic function in a broader construction. Thus, for example, Voice=Pass would be appropriate for a predicate that is inflected for passive as distinct from other forms; just because a certain form of a verb is required by a broader passive construction does not guarantee that it should be labeled Voice=Pass when it is used in that way."

OTOH I wonder if this is too strict, given the downsides that a) if a form has two major and clearly distinct grammatical uses, it may be difficult to choose the canonical one, and b) it will hurt crosslinguistic parallelism. From one perspective it would be nice to explicitly distinguish English passive uses (which are not guaranteed to have a subject or obl:agent) from perfect uses of the past participle, so that we know that the verb form reflects passiveness even in English (just not unambiguously).

@amir-zeldes
Contributor

Essentially, you have three classes of participles in Slavic

Yes, so English has basically the same two first types, but not the third, and only the second is passive (actually some languages even still have the fourth aorist s-participle, like Polish wstawszy "having stood up", but that isn't passive either).

Voice=Act vs. Pass, and Tense=Pres vs. Past are the obvious feature candidates to distinguish the forms in the first two classes

Agreed, so this would motivate doing the same for English as well, except that in English the passive one also shows up in a specific non-passive construction, on top of showing up in exactly the same passive constructions. This makes sense because the perfect tenses are a later addition, and other than that the English -ed/-en participles are etymologically 1:1 cognates of the Slavic -t-/-n- passive participles (znan-y is the exact cognate of known, etc.)

it may be difficult to choose the canonical one

@nschneid I agree this complicates things. The way I see it our options are:

  1. Never use Voice=Pass in English
  2. Use it for all occurrences of those forms (xpos=VBN)
  3. Use it for those forms except when they are part of the active perfect constructions

Personally I would be fine with 3. Option 1 seems too extreme to me, and I think it would be nice to be able to do things like count passives in each language in UD this way. You can't rely on deprels for this due to subjectless cases. Option 2 seems unintuitive to me because there is nothing really passive about the active perfect tenses. So that leaves 3, unless I'm missing other ideas?

@dan-zeman
Member

I think this is worth discussing in the broader group in case there is an overarching principle we can add to the morphology guidelines.

Good point. It was actually discussed in the core group in September 2016, primarily for POS tags, but also for features, and at least for POS tags there was a decision about what should be added to the overview of morphology, but for some reason the clarifying text did not make it there. I just added it now, together with a similar clarification for features.

English has basically the same two first types

Yes, but it was my understanding that they were being distinguished by Tense=Pres vs. Tense=Past, that is, tense was picked here as the approximate canonical label. Obviously, it is a simplification as well, and Pres is to be interpreted rather as simultaneous (while Past is anterior), but that is also done for non-finite verb forms in other UD languages. One could also consider using Aspect=Prog for the former and Aspect=Perf for the latter but that is not done in English AFAIK.

@nschneid
Contributor Author

Thanks for resurrecting the 2016 text in UniversalDependencies/docs@0ac5aa4 & UniversalDependencies/docs@0d64110! I still think it's worth discussing in the core group and potentially issuing a formal clarification, because different individuals/treebanks may have different thresholds for considering a use to be sufficiently different to warrant a separate label.

@amir-zeldes
Contributor

Agreed, thank you both. So much has changed since 2016 and the correct thing to do seems non-obvious to me so I would welcome more discussion. Pres and Past come from names that have been used for these participles before, but they are also not a great fit (a present passive uses the "Past" participle, and a past progressive uses the "Pres" participle). We can decide to use one (or both) features all of the time, never use them, or use them contextually. Currently it seems the Tense feature is the only one used across the board in English, and the Voice one is only used if there is a subject/auxiliary, but I suspect this happened without too much reflection.

@nschneid
Copy link
Contributor Author

Per the core group decision, I'll update the guidelines to say that it's OK to use Voice=Pass in English. @dan-zeman would have preferred to treat the distinction between passive and non-passive past participles as strictly syntactic (not morphological syncretism) but acknowledged it is in a gray zone. It would be useful in practice to have the two uses of past participles disambiguated.

Once Voice=Pass is added consistently, do we still need the :pass subtype on passive nsubj, csubj, and aux where such dependents are present? It seems redundant. Many other languages use :pass subtypes (also expl:pass in some languages). But maybe that predated morphological features as a place to encode voice. Are there languages where passive voice is expressed periphrastically and cannot be pinned on the predicate, and if so, do we want to keep using :pass in English for crosslinguistic compatibility?

@jnivre
Copy link

jnivre commented Jan 16, 2023 via email

@dan-zeman
Member

But maybe that predated morphological features as a place to encode voice.

No. Voice=Pass has existed as a morphological feature since the launch of UD v1. So have the relations nsubjpass, csubjpass and auxpass, which were later (in UD v2) converted to subtypes. It is just that Voice=Pass was not used as a morphological feature in English, because there is no passive-specific morphology in English.

@dan-zeman
Member

Are there languages where passive voice is expressed periphrastically

Of course! English is one of them :-) Also German, Spanish, probably the other Romance languages...

@nschneid
Contributor Author

I meant where there is not a verb form strongly associated with the passive. (In English there are two main uses of past participles—the passive and the perfect. To my knowledge an auxiliary is always required with the perfect, but that is not true of the passive.)

Anyway, I'm not suggesting to forbid :pass in other languages, just wondering if we should drop it for English if it's redundant. Or keep it in the interest of conservatism + crosslinguistic compatibility.

@dan-zeman
Member

Anyway, I'm not suggesting to forbid :pass in other languages, just wondering if we should drop it for English if it's redundant. Or keep it in the interest of conservatism + crosslinguistic compatibility.

Since it has been strongly recommended for 8 years, and it will not disappear from the 60+ languages where it is currently used, I would keep it in English as well.

@nschneid
Contributor Author

A second question is whether we should restrict Voice=Pass to cases where the verb is the main predicate in a non-copular clause, or also include attributive uses:

  • simple amod: the retired/unemployed/respected teacher
  • compound amod: the brow - beaten teacher
  • copular predicate: They are very well made

@amir-zeldes
Contributor

I also agree we should keep nsubj:pass and aux:pass to minimize breaking changes that would confuse users and retain comparability with other resources.

I'm not passionate about putting Voice=Pass on adjectives, but I'm convinceable, esp. if it turns out this is common/standard in other languages.

@nschneid
Contributor Author

OK. Neither EWT nor GUM currently have it on any amod VERBs. So for now let's just make sure it's used consistently in other contexts.

@nschneid nschneid changed the title Add obl:agent for English passives? Add obl:agent for English passives? And rationale for Voice=Pass in English Jan 29, 2023
@nschneid
Contributor Author

nschneid commented Oct 22, 2023

Coming back to passives. Gonna see if I can incorporate obl:agent and Voice=Pass into EWT before the data freeze. :)

Here is a query that seems to work for obl:agent after poking around a bit for exceptions (such as a by-oblique marking a means/medium, extent, or location): https://universal.grew.fr/?custom=6535622a699c3

This excludes by-obliques headed by an ADJ derived from a verb. These are unambiguously adjectives thanks to a prefix or suffix: "payable by me", "unrestrained by anything in the Constitution", "unaffected by sweeping cuts", (GUM) "unobstructed by the Earth's atmosphere". The by-phrase marks the same semantic role as the obl:agent of the verb the adjective derives from, but since it's an adjective not a verb, we can stick with plain obl.
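A rough Python sketch of the candidate selection described above, in the spirit of the Grew query but not a translation of it. The `Tok` tuple is hypothetical; the function keeps by-marked `obl` dependents of VBN verbs and skips ADJ heads, but it does not attempt to filter out means/medium, extent, or location readings, which would still have to be excluded by hand.

```python
from collections import namedtuple

# Hypothetical minimal token: id, lemma, UPOS, XPOS, head id, deprel.
Tok = namedtuple("Tok", "id lemma upos xpos head deprel")


def obl_agent_candidates(tokens):
    """Ids of obl dependents marked by 'by' whose head is a VBN VERB.

    Heads tagged ADJ ('payable by me', 'unaffected by sweeping cuts') are skipped,
    so those by-phrases stay plain obl. By-phrases of means, extent or location
    are NOT filtered out here.
    """
    by_id = {t.id: t for t in tokens}
    by_marked = {t.head for t in tokens if t.deprel == "case" and t.lemma == "by"}
    candidates = []
    for t in tokens:
        head = by_id.get(t.head)
        if (t.deprel == "obl" and t.id in by_marked and head is not None
                and head.upos == "VERB" and head.xpos == "VBN"):
            candidates.append(t.id)
    return candidates


# "the children affected by the policy"
children = [
    Tok(1, "the", "DET", "DT", 2, "det"),
    Tok(2, "child", "NOUN", "NNS", 0, "root"),
    Tok(3, "affect", "VERB", "VBN", 2, "acl"),
    Tok(4, "by", "ADP", "IN", 6, "case"),
    Tok(5, "the", "DET", "DT", 6, "det"),
    Tok(6, "policy", "NOUN", "NN", 3, "obl"),
]
# "payable by me": ADJ head, so the by-phrase stays plain obl
payable = [
    Tok(1, "payable", "ADJ", "JJ", 0, "root"),
    Tok(2, "by", "ADP", "IN", 3, "case"),
    Tok(3, "I", "PRON", "PRP", 1, "obl"),
]
print(obl_agent_candidates(children))  # [6]
print(obl_agent_candidates(payable))   # []
```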

Voice=Pass is harder (see discussion above). I think we're agreed that it should apply to VBNs triggered by the passive construction (not triggered by the perfect, and not "been" marking a passive).

Here are cases currently excluded as Voice=Pass in GUM that we might add—VBNs functioning as advcl, acl, amod, xcomp etc.: https://universal.grew.fr/?custom=65357aaef253f

Some have a passive or obl:agent dependent.

Some of the xcomp ones have, or clearly would license, a by-phrase: e.g. (GUM) "you become thrilled by everything", "became known", (EWT) "Hamas remains targeted by Israel". I feel it's hard to argue these are not passive, and if we add Voice=Pass here maybe we should add it across the board wherever the VBN is not triggered by the perfect construction.

nschneid added a commit that referenced this issue Oct 23, 2023
nschneid added a commit that referenced this issue Oct 29, 2023
@nschneid
Contributor Author

nschneid commented Oct 29, 2023

@amir-zeldes curious for your take on the following which currently have cop + VERB:

  • They are very well made and realistic.
  • What matters is how well trained he is
  • County Kerry is the worst affected.
  • Sounds like your cat is stressed out .

How would you deal with these in GUM, considering that the predicates are compounds with adjective-like properties?

@amir-zeldes
Contributor

I think I would view them as passive after all, it seems conspicuous to me that you can have a by-phrase ("worst affected by the storm")

@nschneid
Contributor Author

  • ?They are very well-made by artisans. - sounds odd to me

Some more with "very":

  • Silkie crosses are great birds, and very neat looking. (EWT)
  • I'm actually very excited for this yoga class. (GUM)

In both of these, "very" attaches to the VERB as advmod, but this seems like a classic scalar adjective modification....

@amir-zeldes
Contributor

I think you can find many real examples for "well made by" and similar.

I agree it's strange to tag neat looking as a VERB but that's just a consequence of UPOS not having a participle tag. I think that's what they are - something neat looking is something which looks neat.

@nschneid
Contributor Author

If "very" is not a test for ADJ, how should the line be drawn? GUM has plenty of ADJs that look like they could be considered participles and (in the case of past participles) might license a by-phrase.

@amir-zeldes
Contributor

amir-zeldes commented Oct 29, 2023

The PTB guidelines for the distinction are pretty murky, but I think the intensification test is probably the weakest one, because almost any participle used attributively can be intensified - so I allow pretty much any other test to override it.

In the examples you point to (which may also include actual errors of course), some of the most compelling countertests apply - negation with un- is a total deal breaker for a participle, since the lemma wouldn't exist (uncredited -> *uncredit).

In other cases, we can use a relative clause paraphrase to show that the relevant verbal lemma does not apply, and the adjective is lexicalized. For example, commercially interested research is not research which is interested in anything/commerce interests it etc.

But availability of, or even better attestation of, the by-agent is for me a clear indication that we are still dealing with a form of the verb (if you are surprised by some results, then the results surprise you - they are still argument fillers of a verb, even if you are "very surprised").

I don't know how idiosyncratic my reading of the PTB guidelines is, but I think/hope it is informed by corpus searches in LDC corpora when in doubt (especially OntoNotes), although those are far from 100% consistent either.

@nschneid
Contributor Author

OK—worth noting that by-phrases can also be present with adjectives (I was unbothered by the high temperatures), though they are not obl:agent.

The cat is stressed out seems more adjective-like than "well-known" etc., because e.g. stressed by itself can be negated. I guess unstressed out is weird though I could imagine it in casual speech: "I'm feeling pretty un–stressed out", where "un-" scopes over the phrase.

With very excited for this yoga class, you can surely be unexcited about something, and it looks like unexcited for something is also attested on the web. Also having a preposition other than by is weird if we're saying it's passive. Would that be better as ADJ?

@manning
Contributor

manning commented Oct 29, 2023 via email

@nschneid
Contributor Author

Thanks @manning. I've updated https://universaldependencies.org/en/feat/Voice.html with new guidelines (though it's not updating presently, so see https://github.com/UniversalDependencies/docs/blob/pages-source/_en/feat/Voice.md).

I realized that we hadn't considered AUXes promoted to predicate of the clause. It is possible that this happens due to elliptical stranding and that the elliptical clause would be interpreted as passive, as in some of these results. But usually the AUX would not be a VBN ("been" is the only one that qualifies). I think to avoid confusion we should just limit Voice=Pass to VBN/VERB cases.

@amir-zeldes
Contributor

worth noting that by-phrases can also be present with adjectives (I was unbothered by the high temperatures), though they are not obl:agent.

Agreed. Basically the order of priority of PTB tests for me places everything above intensification, I think I operate like this:

  1. Negation, a.k.a. "no such verb" - if the item has negation that would preclude a verbal lemma, it must be an adjective (uncredited -> JJ). This outranks even a by-agent; exception: if the "un-" lemma exists as a verb (e.g. Santorini's example "untied" -> "untie"). Similar logic applies to the 'no VBG for no such verb' rule, e.g. "outgoing" cannot be VBG, because "outgo" is not a verb.
  2. By-agent - if it has an overt by agent, then it is VBN. Santorini originally ranked this below intensifiers, but the corpora seem to behave the opposite way so I have followed them and I think we've been pretty consistent about it. It's also in the spirit of prioritizing argument structure where possible.
  3. If a relative clause paraphrase is possible, prefer VBG/VBN, but if that changes the meaning, prefer JJ: "appetizing dish" -> *dish which appetizes -> JJ; existing safeguards -> safeguards that exist -> VBG
  4. If "get" is possible but "become" is bad, prefer VBN - Santorini's "I was/got/*became married"
  5. Prefer JJ for state if it has a different reading from the event (e.g. "I was mistaken/JJ" != "someone mistook me")
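The ranking above can be sketched as a first-match decision procedure. This is a hypothetical Python illustration, not a tool used for GUM: the `Evidence` fields are made-up names for judgments an annotator would supply, and the code only encodes the order in which the tests override each other, with intensification ("very") consulted last since it is the weakest test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    """Hypothetical annotator judgments; None means the test is inapplicable or inconclusive."""
    participle_tag: str                                   # "VBN" or "VBG" if it is a verb form
    un_prefix_blocks_verb: Optional[bool] = None          # 'uncredited': no verb 'uncredit'
    overt_by_agent: Optional[bool] = None
    relative_paraphrase_keeps_meaning: Optional[bool] = None
    get_ok_become_bad: Optional[bool] = None              # 'I got/*became married'
    stative_reading_differs: Optional[bool] = None        # 'I was mistaken' != 'someone mistook me'
    intensifiable: Optional[bool] = None                  # weakest test, consulted last


def ptb_tag(ev: Evidence) -> Optional[str]:
    # 1. "No such verb": negation precluding a verbal lemma forces JJ, even over a by-agent.
    if ev.un_prefix_blocks_verb:
        return "JJ"
    # 2. An overt by-agent means a verb form.
    if ev.overt_by_agent:
        return ev.participle_tag
    # 3. Relative clause paraphrase: meaning kept -> participle, meaning changed -> JJ.
    if ev.relative_paraphrase_keeps_meaning is not None:
        return ev.participle_tag if ev.relative_paraphrase_keeps_meaning else "JJ"
    # 4. "get" possible but "become" bad -> VBN.
    if ev.get_ok_become_bad:
        return "VBN"
    # 5. A stative reading distinct from the event reading -> JJ.
    if ev.stative_reading_differs:
        return "JJ"
    # Intensification only decides when nothing above applies.
    if ev.intensifiable:
        return "JJ"
    return None  # undecided


print(ptb_tag(Evidence("VBN", un_prefix_blocks_verb=True, overt_by_agent=True)))  # JJ
print(ptb_tag(Evidence("VBN", overt_by_agent=True, intensifiable=True)))          # VBN
print(ptb_tag(Evidence("VBG", relative_paraphrase_keeps_meaning=False)))          # JJ
```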

@amir-zeldes
Contributor

amir-zeldes commented Oct 30, 2023

I think to avoid confusion we should just limit Voice=Pass to VBN/VERB cases

OK - I lightly edited your script, note you need .*Voice.* to check if morph contains Voice, and you don't need ^$ for exact match if you are not using regex. I'm also forbidding lemma 'be' in the last step, just in case:

; No Voice for VBN functioning as aux/cop
xpos=/VBN/&func=/cop|aux(:pass)?/	none	#1:storage=no_voice;#1:morph-=Voice
; Has a dependent specific to passive construction: nsubj:pass, csubj:pass, aux:pass, or obl:agent
xpos=/VBN/&storage!=/no_voice/;func=/.*:pass|obl:agent/	#1>#2	#1:morph+=Voice=Pass
; Has a plain aux but no aux:pass, indicating the VBN is there because of the perfect construction
xpos=/VBN/&storage!=/no_voice/&morph!=/.*Voice.*/;func=/aux/	#1>#2	#1:storage=perfect
; "Got/VBN" assumed perfect (even without aux: "Got it." short for "I've got it.", "I gotta have it." short for "I've got to have it", etc.)
xpos=/VBN/&storage!=/no_voice/&morph!=/.*Voice.*/&lemma=/get/	none	#1:storage=perfect
; "Have" aux assumed to scope over coordination
xpos=/VBN/&storage!=/no_voice|perfect/&morph!=/.*Voice.*/&func=/conj/;xpos=/VBN/;func=/.*:pass|obl:agent/	#2>#1;#2>#3	#1:storage=par_passive
xpos=/VBN/&storage!=/no_voice|perfect|par_passive/&morph!=/.*Voice.*/&func=/conj/;xpos=/VBN/;func=/aux/	#2>#1;#2>#3	#1:storage=perfect
; All other VBNs assumed passive
xpos=/VBN/&lemma!=/be/&storage!=/no_voice|perfect/&morph!=/.*Voice.*/	none	#1:morph+=Voice=Pass
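For readers who don't use DepEdit, here is a rough Python transcription of the same rule cascade. The `Tok` class is hypothetical, `storage` mirrors the scratch attribute in the rules above, and applying each rule to every token before the next rule runs is an assumption about DepEdit's evaluation order.

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical minimal token; `storage` mirrors the scratch attribute in the DepEdit rules."""
    id: int
    lemma: str
    xpos: str
    head: int
    deprel: str
    feats: dict = field(default_factory=dict)
    storage: str = ""


def is_passive_dep(deprel):
    # nsubj:pass, csubj:pass, aux:pass, ... or obl:agent
    return deprel.endswith(":pass") or deprel == "obl:agent"


def assign_voice(toks):
    by_id = {t.id: t for t in toks}
    kids = {}
    for t in toks:
        kids.setdefault(t.head, []).append(t)
    vbns = [t for t in toks if t.xpos == "VBN"]

    def has_kid(t, test):
        return any(test(k.deprel) for k in kids.get(t.id, []))

    # 1. A VBN functioning as aux/cop (e.g. "been") gets no Voice.
    for t in vbns:
        if t.deprel in ("cop", "aux", "aux:pass"):
            t.storage = "no_voice"
            t.feats.pop("Voice", None)
    # 2. A dependent specific to the passive construction makes the VBN passive.
    for t in vbns:
        if t.storage != "no_voice" and has_kid(t, is_passive_dep):
            t.feats["Voice"] = "Pass"
    # 3. A plain aux without aux:pass signals the perfect construction.
    for t in vbns:
        if t.storage != "no_voice" and "Voice" not in t.feats and has_kid(t, lambda d: d == "aux"):
            t.storage = "perfect"
    # 4. "got" is assumed perfect even without an aux ("Got it.").
    for t in vbns:
        if t.storage != "no_voice" and "Voice" not in t.feats and t.lemma == "get":
            t.storage = "perfect"
    # 5./6. A conjoined VBN shares passive or perfect status with its VBN head.
    for label, test in (("par_passive", is_passive_dep), ("perfect", lambda d: d == "aux")):
        for t in vbns:
            head = by_id.get(t.head)
            if (t.deprel == "conj" and "Voice" not in t.feats
                    and t.storage not in ("no_voice", "perfect", "par_passive")
                    and head is not None and head.xpos == "VBN" and has_kid(head, test)):
                t.storage = label
    # 7. All other VBNs, except forms of "be", are assumed passive.
    for t in vbns:
        if t.lemma != "be" and t.storage not in ("no_voice", "perfect") and "Voice" not in t.feats:
            t.feats["Voice"] = "Pass"
    return toks


# "The chosen escort has been processed"
s1 = [Tok(1, "the", "DT", 3, "det"), Tok(2, "choose", "VBN", 3, "amod"),
      Tok(3, "escort", "NN", 6, "nsubj:pass"), Tok(4, "have", "VBZ", 6, "aux"),
      Tok(5, "be", "VBN", 6, "aux:pass"), Tok(6, "process", "VBN", 0, "root")]
# "I have read and reviewed it"
s2 = [Tok(1, "I", "PRP", 3, "nsubj"), Tok(2, "have", "VBP", 3, "aux"),
      Tok(3, "read", "VBN", 0, "root"), Tok(4, "and", "CC", 5, "cc"),
      Tok(5, "review", "VBN", 3, "conj"), Tok(6, "it", "PRP", 3, "obj")]
for s in (s1, s2):
    assign_voice(s)
    print([(t.lemma, t.feats.get("Voice"), t.storage) for t in s if t.xpos == "VBN"])
# [('choose', 'Pass', ''), ('be', None, 'no_voice'), ('process', 'Pass', '')]
# [('read', None, 'perfect'), ('review', None, 'perfect')]
```

Note that, like the script, the last rule also reaches attributive VBNs such as "chosen".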

@nschneid
Contributor Author

Hmm...I was confused about how regex matching works because in a previous Depedit script I searched for something like lemma=/begin|continue|end/ and it matched "friend".

@nschneid
Contributor Author

Actually now all the morph!=/.*Voice.*/ parts may be redundant because other than the first rule it is only ever adding Voice=Pass. In an earlier version of the script some of the rules added Voice=Act, which is why it was necessary to check for an existing Voice feature.

@amir-zeldes
Contributor

Hmm...I was confused about how regex matching works because in a previous Depedit script I searched for something like lemma=/begin|continue|end/ and it matched "friend".

Yes, it's a little counterintuitive right now - basically if re.escape of the criterion is identical to itself, then depedit uses exact match (string identity) instead of regex, to save time. This means that for a regular string, omitting ^$ is faster and equivalent. But if you have pipes, you are forcing the system to use regex, and it will use re.search, so it becomes necessary again to use lemma=/^(begin|continue|end)$/. In fact, if you don't have ^$, I think it auto-appends them, but this can create perverse situations. That is probably what happened to you, assuming 'end' was in the middle of your pattern:

lemma=/begin|end|continue/

Became:

lemma=/^begin|end|continue$/

leading to the error due to operator priority and lack of brackets (which would change the meaning, since depedit uses capturing groups). I should probably make it wrap all regex-y values in non-capturing brackets...
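A small Python illustration of the precedence problem: `^` binds only to the first branch of an alternation and `$` only to the last, so the middle branch `end` can match anywhere inside "friend". The `criterion_matches` function is a hypothetical matcher that mimics the behaviour described above (identity for plain strings, anchored search otherwise); it is not DepEdit's actual code.

```python
import re


def naive_anchor(pattern):
    # Anchors added without grouping: ^ binds to the first branch, $ to the last.
    return "^" + pattern + "$"


def grouped_anchor(pattern):
    # Anchors applied to the whole alternation via a non-capturing group.
    return "^(?:" + pattern + ")$"


def criterion_matches(criterion, value):
    """Hypothetical matcher (not DepEdit's code): plain strings are compared for
    identity, anything regex-like is searched with grouped anchors."""
    if re.escape(criterion) == criterion:
        return value == criterion
    return re.search(grouped_anchor(criterion), value) is not None


pattern = "begin|end|continue"
print(bool(re.search(naive_anchor(pattern), "friend")))    # True: the 'end' branch is unanchored
print(bool(re.search(grouped_anchor(pattern), "friend")))  # False
print(criterion_matches(pattern, "end"))                   # True
print(criterion_matches("aux", "aux:pass"))                # False: exact string identity
```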

Are you sure we don't need the morph!=/.*Voice.*/? Then I'd take it out

@nschneid
Contributor Author

nschneid commented Oct 30, 2023

Ah I think I get it.

I should probably make it wrap all regex-y values in non-capturing brackets...

Yes that should work, or use re.match() and check that the end index of the match is the length of the string being searched. (Never mind I realize that index-checking strategy could fail for disjunctions of different lengths where one is a prefix of another, or non-greedy operators etc.)
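Both failure modes of the end-index check are easy to reproduce in Python. `re.fullmatch` (available since Python 3.4) avoids them because it backtracks until the whole string is consumed; it was not raised in the thread, so treat it as a suggestion rather than what depedit does.

```python
import re


def matches_to_end(pattern, s):
    # End-index strategy: take re.match's leftmost-first match and require it to reach the end of s.
    m = re.match(pattern, s)
    return m is not None and m.end() == len(s)


# 'a' is a prefix of 'ab' and is tried first, so re.match stops after one character.
print(matches_to_end("a|ab", "ab"))            # False, although "ab" is one of the alternatives
print(re.fullmatch("a|ab", "ab") is not None)  # True: fullmatch backtracks into the 'ab' branch
# Non-greedy quantifiers stop as early as they can.
print(matches_to_end("a+?", "aaa"))            # False
print(re.fullmatch("a+?", "aaa") is not None)  # True
```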

Are you sure we don't need the morph!=/.*Voice.*/? Then I'd take it out

If Voice=Pass is already present, subsequent rules—which either add Voice=Pass or store a temporary variable—will have no effect. So there is no need for the check.

amir-zeldes added a commit to amir-zeldes/gum that referenced this issue Oct 31, 2023
@amir-zeldes
Contributor

OK, done!

@nschneid
Contributor Author

nschneid commented Nov 1, 2023

@amir-zeldes reopening for:

@amir-zeldes
Contributor

Looks like I didn't get to this on time... It will end up in the next release though!

@amir-zeldes
Copy link
Contributor

GUM cases fixed upstream and validation integrated into build bot. Will commit after freeze.

@nschneid
Copy link
Contributor Author

@amir-zeldes are these GUM hits corrected upstream? https://universal.grew.fr/?custom=653dccf9ed623

@amir-zeldes
Contributor

Yes, with the exception of compounds, due to revised PTB tokenization. So the second match, for example, is fixed ("is wasted"), but the first is not ("self-inflicted"), since morphologically, "inflicted" is still passive/VBN (and in fact, "self" provides the agent), but there is no lemma "self-inflict", so syntactically we retain "is" as cop, despite tagging VBN inside the HYPH compound. In old PTB tokenization this would have just been JJ, but due to HYPH splitting I think it's reasonable to allow this exception of VBN + cop.
