Add obl:agent for English passives? And rationale for Voice=Pass in English #290
Comments
Sure, I'd be on board. The query may be too low recall though, as not all passive participles have aux:pass. |
True, searching for And Basically there should be rules to validate that all the elements of the passive construction (the verb, subject if present, passive aux if present, by-phrase if present) are annotated properly. |
Not sure this is possible with perfect accuracy, since there are some pretty weird elliptical and coordinate structures; but checking manually makes sense if we can afford it. The cases you identified are now fixed in GUM via amir-zeldes/gum@851c6d5 ! |
* Implements suggestion in UniversalDependencies/UD_English-EWT#290
The head of an |
That makes sense to me. Currently I think for some reason passive voice in GUM is only in finite conditions, i.e. with an auxiliary. Do we want passive voice on all ostensibly passive participles? If so we could just apply it indiscriminately to xpos=VBN. |
I am thinking passive = subject (if there is one) will be I would leave out the VBNs with a perfect aux, and Not sure about pre-head And there are some misc cases in EWT and GUM, many of them errors Not sure about non- |
The more I think about this the more unsure I am - I mean, these are morphological annotations, right? I agree it's off-putting to have active perfects be tagged as passive voice anywhere, but we are not really tagging periphrastic constructions here (that's what deprel is for). I think basically anything tagged VBN is morphologically passive, and the passive participles in English are used for various things. But like you I also don't like tagging present perfects as Pass. I'd be curious for more input on this and some cross-linguistic information about how others are handling this, any thoughts @dan-zeman / @ftyers / others? |
I am not sure what specifically those "any thoughts" should be on, as the title of the issue is If you are rather asking about |
@amir-zeldes In French, @dan-zeman Even if the same form of the past participle is used for active and passive in English or French, I wouldn't say that the passive is periphrastic, because in many cases we have a passive construction without a copula, for instance in participial clauses: the question asked by Mary. |
Hi Dan - yes, we are asking about the morphological feature now. But I don't think I agree with the statement above - the Slavic passive participles, like the English ones, can be used in periphrastic predicative passive constructions, in postponed
In UD Polish, all three passive participles are tagged
I think the same basic arguments that apply to English would also work for French, since the morphosyntax is fairly similar, and should probably be made consistent across Romance and Germanic languages if possible. In French it seems to be inconsistent, for example FQB, Sequoia and FTB all use Voice=Pass on participles, but the other corpora don't. |
Yes, you can use a passive participle attributively, but it is still a passive participle, and that's what the feature Essentially, you have three classes of participles in Slavic:
So, |
I think this is worth discussing in the broader group in case there is an overarching principle we can add to the morphology guidelines. E.g., for the position @dan-zeman is articulating: "Morphological features explain the form of the word, not necessarily its full function. Most of the features are for locating the form of the word in a slot of a morphological paradigm, and are canonical labels for the slot—they will not always reflect the morphosyntactic function in a broader construction. Thus, for example, OTOH I wonder if this is too strict with the downsides that a) if a form has two major and clearly distinct grammatical uses, it may be difficult to choose the canonical one, and b) it will hurt crosslinguistic parallelism. From one perspective it would be nice to explicitly distinguish English passive uses (which are not guaranteed to have a subject or |
Yes, so English has basically the same two first types, but not the third, and only the second is passive (actually some languages even still have the fourth aorist s-participle, like Polish wstawszy "having stood up", but that isn't passive either).
Agreed, so this would motivate doing the same for English as well, except that in English the passive one also shows up in a specific non-passive construction, on top of showing up in exactly the same passive constructions. This makes sense because the perfect tenses are a later addition, and other than that the English -ed/-en participles are etymologically 1:1 cognates of the Slavic -t-/-n- passive participles (znan-y is the exact cognate of known, etc.)
@nschneid I agree this complicates things. The way I see it our options are:
Personally I would be fine with 3. Option 1 seems too extreme to me, and I think it would be nice to be able to do things like count passives in each language in UD this way. You can't rely on deprels for this due to subjectless cases. Option 2 seems unintuitive to me because there is nothing really passive about the active perfect tenses. So that leaves 3, unless I'm missing other ideas? |
Good point. It was actually discussed in the core group in September 2016, primarily for POS tags, but also for features, and at least for POS tags there was a decision about what should be added to the overview of morphology, but for some reason the clarifying text did not make it there. I just added it now, together with a similar clarification for features.
Yes, but it was my understanding that they were being distinguished by |
Thanks for resurrecting the 2016 text in UniversalDependencies/docs@0ac5aa4 & UniversalDependencies/docs@0d64110! I still think it's worth discussing in the core group and potentially issuing a formal clarification, because different individuals/treebanks may have different thresholds for considering a use to be sufficiently different to warrant a separate label. |
Agreed, thank you both. So much has changed since 2016 and the correct thing to do seems non-obvious to me so I would welcome more discussion. Pres and Past come from names that have been used for these participles before, but they are also not a great fit (a present passive uses the "Past" participle, and a past progressive uses the "Pres" participle). We can decide to use one (or both) features all of the time, never use them, or use them contextually. Currently it seems the Tense feature is the only one used across the board in English, and the Voice one is only used if there is a subject/auxiliary, but I suspect this happened without too much reflection. |
Per the core group decision, I'll update the guidelines to say that it's OK to use Once |
Since subtypes are optional and can be used freely, we cannot forbid the use of aux:pass. We can at most recommend people not to use it once they have added Voice=Pass. However, since we don’t know whether there are languages that cannot mark passives on participles, and since many treebanks will not add this marking in the near future, it seems wise to me to allow the redundancy for the time being.
My two cents …
Best,
Joakim
|
No. |
Of course! English is one of them :-) Also German, Spanish, probably the other Romance languages... |
I meant where there is not a verb form strongly associated with the passive. (In English there are two main uses of past participles—the passive and the perfect. To my knowledge an auxiliary is always required with the perfect, but that is not true of the passive.) Anyway, I'm not suggesting to forbid |
Since it has been strongly recommended for 8 years, and it will not disappear from the 60+ languages where it is currently used, I would keep it in English as well. |
A second question is whether we should restrict
|
I also agree we should keep I'm not passionate about putting Voice=Pass on adjectives, but I'm convinceable, esp. if it turns out this is common/standard in other languages. |
OK. Neither EWT nor GUM currently have it on any amod VERBs. So for now let's just make sure it's used consistently in other contexts. |
Coming back to passives. Gonna see if I can incorporate Here is a query that seems to work for This excludes by-obliques headed by an ADJ derived from a verb. These are unambiguously adjectives thanks to a prefix or suffix: "payable by me", "unrestrained by anything in the Constitution", "unaffected by sweeping cuts", (GUM) "unobstructed by the Earth's atmosphere". The by-phrase marks the same semantic role as the
Here are cases currently excluded as Some have a passive or Some of the |
@amir-zeldes curious for your take on the following which currently have cop + VERB:
How would you deal with these in GUM, considering that the predicates are compounds with adjective-like properties? |
I think I would view them as passive after all, it seems conspicuous to me that you can have a by-phrase ("worst affected by the storm") |
Some more with "very":
In both of these, "very" attaches to the VERB as |
I think you can find many real examples for "well made by" and similar. I agree it's strange to tag neat looking as a VERB but that's just a consequence of UPOS not having a participle tag. I think that's what they are - something neat looking is something which looks neat. |
If "very" is not a test for ADJ, how should the line be drawn? GUM has plenty of ADJs that look like they could be considered participles and (in the case of past participles) might license a by-phrase. |
The PTB guidelines for the distinction are pretty murky, but I think the intensification test is probably the weakest one, because almost any participle used attributively can be intensified - so I allow pretty much any other test to override it. In the examples you point to (which may also include actual errors of course), some of the most compelling countertests apply - negation with un- is a total deal breaker for a participle, since the lemma wouldn't exist (uncredited -> *uncredit). In other cases, we can use a relative clause paraphrase to show that the relevant verbal lemma does not apply, and the adjective is lexicalized. For example, commercially interested research is not research which is interested in anything/commerce interests it etc. But availability of, or even better attestation of, the by agent is for me a clear indication that we are still dealing with a form of the verb (if you are surprised by some results, then the results surprise you - they are still argument fillers of a verb, even if you are "very surprised"). I don't know how idiosyncratic my reading of the PTB guidelines is, but I think/hope it is informed by corpus searches in LDC corpora when in doubt (especially OntoNotes), although those are far from 100% consistent either. |
OK - worth noting that by-phrases can also be present with adjectives (I was unbothered by the high temperatures), though they are not obl:agent.
The cat is stressed out seems more adjective-like than "well-known" etc., because e.g. stressed by itself can be negated. I guess unstressed out is weird though I could imagine it in casual speech: "I'm feeling pretty un–stressed out", where "un-" scopes over the phrase.
With very excited for this yoga class, you can surely be unexcited about something, and it looks like unexcited for something is also attested on the web. Also having a preposition other than by is weird if we're saying it's passive. Would that be better as ADJ? |
[A rare reply from me!!!]
Thanks for doing this, Nathan.
I agree with your decisions:
- xcomp cases like "became known" --> known is Voice=Pass
- But, yes, I think you should exclude passive aux "been"
- I'm fine with "is supposed to" being marked passive (though it is being grammaticalized)
- I don't think we should have expl:pass, even though the position of the expletive can be promoted from obj to subj by passivization.
But where I disagree:
- "It has/Voice=Act been/Voice=Act destroyed/Voice=Pass.": I guess I come from "the Anglocentric tradition" (maybe also Francophone, given Sylvain's comments). It seems insane to me to mark things like this. I would apply Active and Passive to the main verb only. I think auxiliaries should not be given Voice!
Participles:
- The PTB manual tests for JJ vs VBN may be somewhat murky, but I think they're about the best we have. So if the word seems best analyzed as a participial adjective, then I'd do that with ADJ and no Passive, but if not, then I'd make them passive.
Chris.
|
Thanks @manning. I've updated https://universaldependencies.org/en/feat/Voice.html with new guidelines (though it's not updating presently, so see https://github.com/UniversalDependencies/docs/blob/pages-source/_en/feat/Voice.md). I realized that we hadn't considered AUXes promoted to predicate of the clause. It is possible that this happens due to elliptical stranding and that the elliptical clause would be interpreted as passive, as in some of these results. But usually the AUX would not be a VBN ("been" is the only one that qualifies). I think to avoid confusion we should just limit |
Agreed. Basically the order of priority of PTB tests for me places everything above intensification, I think I operate like this:
|
OK - I lightly edited your script, note you need
; No Voice for VBN functioning as aux/cop
xpos=/VBN/&func=/cop|aux(:pass)?/ none #1:storage=no_voice;#1:morph-=Voice
; Has a dependent specific to passive construction: nsubj:pass, csubj:pass, aux:pass, or obl:agent
xpos=/VBN/&storage!=/no_voice/;func=/.*:pass|obl:agent/ #1>#2 #1:morph+=Voice=Pass
; Has a plain aux but no aux:pass, indicating the VBN is there because of the perfect construction
xpos=/VBN/&storage!=/no_voice/&morph!=/.*Voice.*/;func=/aux/ #1>#2 #1:storage=perfect
; "Got/VBN" assumed perfect (even without aux: "Got it." short for "I've got it.", "I gotta have it." short for "I've got to have it", etc.)
xpos=/VBN/&storage!=/no_voice/&morph!=/.*Voice.*/&lemma=/get/ none #1:storage=perfect
; "Have" aux assumed to scope over coordination
xpos=/VBN/&storage!=/no_voice|perfect/&morph!=/.*Voice.*/&func=/conj/;xpos=/VBN/;func=/.*:pass|obl:agent/ #2>#1;#2>#3 #1:storage=par_passive
xpos=/VBN/&storage!=/no_voice|perfect|par_passive/&morph!=/.*Voice.*/&func=/conj/;xpos=/VBN/;func=/aux/ #2>#1;#2>#3 #1:storage=perfect
; All other VBNs assumed passive
xpos=/VBN/&lemma!=/be/&storage!=/no_voice|perfect/&morph!=/.*Voice.*/ none #1:morph+=Voice=Pass |
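As a reading aid, the rule cascade above can be paraphrased as an ordinary function. This is only a sketch in Python, not DepEdit itself: the `Tok` record, its field names, and `voice_for_vbn` are hypothetical, and the regex conditions of the script are simplified to exact label checks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tok:
    """Hypothetical token record for illustration; not a DepEdit structure."""
    xpos: str
    lemma: str
    func: str                                  # deprel linking the token to its head
    child_funcs: List[str] = field(default_factory=list)
    parent: Optional["Tok"] = None


def has_passive_dep(tok: Tok) -> bool:
    # A dependent specific to the passive: nsubj:pass, csubj:pass, aux:pass, obl:agent
    return any(f.endswith(":pass") or f == "obl:agent" for f in tok.child_funcs)


def voice_for_vbn(tok: Tok) -> Optional[str]:
    """Return "Pass" or None, checking conditions in the same order as the rules."""
    if tok.xpos != "VBN":
        return None
    if tok.func in {"cop", "aux", "aux:pass"}:
        return None                            # VBN functioning as aux/cop: no Voice
    if has_passive_dep(tok):
        return "Pass"                          # has a passive-specific dependent
    if "aux" in tok.child_funcs:
        return None                            # plain aux only: perfect construction
    if tok.lemma == "get":
        return None                            # "got" assumed perfect
    head = tok.parent
    if tok.func == "conj" and head is not None and head.xpos == "VBN":
        if has_passive_dep(head):
            return "Pass"                      # passive marking scopes over coordination
        if "aux" in head.child_funcs:
            return None                        # "have" scopes over coordination
    if tok.lemma == "be":
        return None                            # the final rule skips lemma "be"
    return "Pass"                              # all other VBNs assumed passive
```

Under these assumptions, "destroyed" in "It has been destroyed" comes out as passive because of its aux:pass dependent, while "eaten" in "has eaten" gets no Voice because it only has a plain aux.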
Hmm...I was confused about how regex matching works because in a previous Depedit script I searched for something like |
Actually now all the |
Yes, it's a little counterintuitive right now - basically if re.escape of the criterion is identical to itself, then depedit uses exact match (string identity) instead of regex, to save time. This means that for a regular string, omitting ^$ is faster and equivalent. But if you have pipes, now you are forcing the system to use regex, and now it will use re.search, so it becomes necessary again to use
Became:
leading to the error due to operator priority and lack of brackets (which would change the meaning, since depedit uses capturing groups). I should probably make it wrap all regex-y values in non-capturing brackets... Are you sure we don't need the morph!=/.*Voice.*/? Then I'd take it out |
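The matching behaviour described in this comment can be reproduced with Python's `re` module alone. The sketch below is an assumption-laden illustration, not DepEdit's actual code: `criterion_matches` is a hypothetical helper that follows the description (exact string match when `re.escape` leaves the criterion unchanged, `re.search` otherwise), and the sample values are made up.

```python
import re


def criterion_matches(criterion: str, value: str) -> bool:
    """Hypothetical stand-in for the matching behaviour described above."""
    if re.escape(criterion) == criterion:
        # No regex metacharacters: fall back to plain string identity
        return criterion == value
    # Otherwise the criterion is treated as a regex and searched anywhere in the value
    return re.search(criterion, value) is not None


# Plain string: behaves as if it were anchored
exact = criterion_matches("perfect", "imperfect")

# A pipe forces regex mode, and re.search finds "perfect" inside "imperfect"
unanchored = criterion_matches("no_voice|perfect", "imperfect")

# Anchors without a group: "|" binds loosest, so this reads (^no_voice)|(perfect$)
leaky = criterion_matches("^no_voice|perfect$", "no_voice_extra")

# A non-capturing group makes the anchors apply to the whole alternation
strict = criterion_matches("^(?:no_voice|perfect)$", "no_voice_extra")
```

Wrapping the alternation in `(?:...)` rather than `(...)` keeps the anchors correct without adding a capturing group, which is the concern raised about brackets changing the meaning.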
Ah I think I get it.
Yes that should work,
If |
OK, done! |
@amir-zeldes reopening for: |
Looks like I didn't get to this on time... It will end up in the next release though! |
GUM cases fixed upstream and validation integrated into build bot. Will commit after freeze. |
@amir-zeldes are these GUM hits corrected upstream? https://universal.grew.fr/?custom=653dccf9ed623 |
Yes, with the exception of compounds, due to revised PTB tokenization. So the second match, for example, is fixed ("is wasted"), but the first is not ("self-inflicted"), since morphologically, "inflicted" is still passive/VBN (and in fact, "self" provides the agent), but there is no lemma "self-inflict", so syntactically we retain "is" as |
https://universaldependencies.org/u/dep/obl-agent.html
I assume English has the :pass subtype on subjects/auxes, but not :agent on by-phrases, for historical reasons. We should probably add it for crosslinguistic comparability.
Most, but not all, of the by phrases recovered by this query are passive agents:
Issue #289 should be addressed first to correct errors in the :pass annotations.