Ellipsis in UD #1044

Open
ClaudiaCorbe opened this issue Jul 14, 2024 · 2 comments
Labels
dependencies ellipsis ellipsis and elliptical structures enhancement universal
Milestone

Comments

@ClaudiaCorbe
Contributor

Ellipsis in UD:

Hi,
I'm working on the Italian_Old treebank, which consists (so far) of part of the Divine Comedy, an Old Italian poetry text.
During the process of annotation, I faced several problems with the annotation of ellipses.

As you already know, in UD there are two possibilities for annotating elliptical structures:

  1. orphan deprel
  2. promotion

However, basic UD annotation (that is, leaving aside Enhanced Dependencies, which so far are available for far fewer treebanks) makes it difficult to retrieve and analyze ellipses. On the one hand, the orphan relation signals the presence of an ellipsis but obscures the actual dependency relations in the sentence (see example 1 below). On the other hand, promotion does not explicitly signal the ellipsis, so the information that the phenomenon is present is lost (see example 2).
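
As a rough illustration of the retrieval asymmetry described above, here is a minimal Python sketch that scans CoNLL-U text for a given DEPREL. The sample rows are hand-written for illustration and are an assumption, not copied from the Italian_Old treebank; the helper name `find_deprel` is likewise hypothetical. An orphan is found by a plain label match, whereas a promoted dependent carries an ordinary label (e.g. nsubj) and cannot be singled out this way.

```python
# Hypothetical, hand-written CoNLL-U rows for "Ed elli a me".
# The analysis shown is an assumption for illustration only.
SAMPLE = "\n".join([
    "# text = Ed elli a me",
    "1\tEd\t_\tCCONJ\t_\t_\t2\tcc\t_\t_",
    "2\telli\t_\tPRON\t_\t_\t0\troot\t_\t_",
    "3\ta\t_\tADP\t_\t_\t4\tcase\t_\t_",
    "4\tme\t_\tPRON\t_\t_\t2\torphan\t_\t_",
])


def find_deprel(conllu, target):
    """Return (id, form) for every word whose DEPREL column equals target."""
    hits = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        if cols[7] == target:
            hits.append((cols[0], cols[1]))
    return hits


print(find_deprel(SAMPLE, "orphan"))  # → [('4', 'me')]
```

The match tells us that an ellipsis is present, but not which relation "me" would bear to the missing predicate.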

Example 1:
Ed elli a me (Inferno, III v. 76)
Gloss = And he to me

Example 2:
e la lingua (...) si fende, e la forcuta ne l'altro si richiude (Inferno, XXV, vv. 133-135)
Gloss = And the tongue (...) REFL cleave.3sing, and the forked.femsing in the other REFL close.3sing

[Image: Screenshot 2024-07-14 at 14.26.20]

I suggest the possibility of:

  • introducing a dependency relation subtype, :ellipsis, for cases of promotion, so that they can be retrieved easily;
  • modifying the orphan relation so that the node keeps its original dependency relation and takes the same :ellipsis subtype (X:ellipsis).
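
To make the retrieval benefit of the two points above concrete, here is a small Python sketch. It assumes the proposed (not currently valid) X:ellipsis labels; `split_deprel` and `is_elliptical` are hypothetical helper names. Both promoted dependents and former orphans would be found with one check, and the base relation stays readable.

```python
def split_deprel(deprel):
    """Split 'obl:ellipsis' into ('obl', 'ellipsis'); subtype is None if absent."""
    base, _, subtype = deprel.partition(":")
    return base, (subtype or None)


def is_elliptical(deprel):
    """True if the label carries the proposed :ellipsis subtype."""
    return split_deprel(deprel)[1] == "ellipsis"


# Illustrative labels only: the first two follow the proposal in this issue.
labels = ["nsubj:ellipsis", "obl:ellipsis", "nsubj", "orphan"]
print([d for d in labels if is_elliptical(d)])
# → ['nsubj:ellipsis', 'obl:ellipsis']
print(split_deprel("obl:ellipsis")[0])
# → obl
```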

I will provide the same example given before with the suggested modification:

[Image: Screenshot 2024-07-14 at 14.52.02]

[Image: Screenshot 2024-07-14 at 14.58.28]

For the first example of ellipsis, it has also been suggested to me that a me (to me) be selected as the head, resulting in the following structure:

[Image: Screenshot 2024-07-14 at 14.57.11]

To deal with cases where we already have a subtype (e.g., nsubj:pass), we could adopt the @ symbol, as used in SUD, resulting in nsubj:pass@ellipsis.
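
A label of this shape could be decomposed as in the sketch below. This is only an assumption about how a tool might read the proposed notation; `parse_label` is a hypothetical name, and labels with @ are not part of current UD.

```python
def parse_label(label):
    """Split a proposed label such as 'nsubj:pass@ellipsis' into its parts."""
    relation, _, marker = label.partition("@")    # text after '@' is the marker
    base, _, subtype = relation.partition(":")    # text after ':' is the subtype
    return {"base": base, "subtype": subtype or None, "marker": marker or None}


print(parse_label("nsubj:pass@ellipsis"))
# → {'base': 'nsubj', 'subtype': 'pass', 'marker': 'ellipsis'}
print(parse_label("obl"))
# → {'base': 'obl', 'subtype': None, 'marker': None}
```

Keeping the marker after @ separate from the subtype after : means that existing subtypes such as nsubj:pass are left untouched.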

@ClaudiaCorbe ClaudiaCorbe added dependencies universal ellipsis ellipsis and elliptical structures labels Jul 14, 2024
@nschneid
Contributor

Thanks for bringing this up—I agree the current treatment of ellipsis is not fully satisfying!

Speaking just to what we do in English:

We are reluctant to introduce many new subtypes as we feel that ~50 deprels is what our annotators will be able to handle.

In EWT, I have started adding Promoted=Yes to MISC where I notice non-orphan cases of ellipsis. This will help us understand why an ADJ is attaching as nsubj, for example (and reassure us that it's not an error).

Regarding the orphan cases, in English we have enhanced graphs, so the underspecification of orphan is not an issue. If you wanted to hint at the inferred deprel without introducing an enhanced graph or adding a bunch of subtypes, you might experiment with a new MISC attribute for that, e.g. EllipsisDeprel=obl. This could be a stepping stone toward adding the enhanced graph in the future.
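
For illustration, both MISC attributes mentioned here (Promoted=Yes and the suggested EllipsisDeprel=obl) could be read with a few lines of Python. This is a sketch under assumptions: `parse_misc` and `ellipsis_info` are hypothetical helper names, and EllipsisDeprel is only a proposed attribute.

```python
def parse_misc(misc):
    """Parse a CoNLL-U MISC field ('A=1|B=2', or '_' when empty) into a dict."""
    if misc == "_":
        return {}
    return dict(item.split("=", 1) for item in misc.split("|") if "=" in item)


def ellipsis_info(deprel, misc):
    """Summarise the ellipsis hints carried by one token's DEPREL and MISC."""
    attrs = parse_misc(misc)
    return {
        "promoted": attrs.get("Promoted") == "Yes",
        # The inferred relation is only meaningful on orphan dependents.
        "inferred_deprel": attrs.get("EllipsisDeprel") if deprel == "orphan" else None,
    }


print(ellipsis_info("nsubj", "Promoted=Yes|SpaceAfter=No"))
# → {'promoted': True, 'inferred_deprel': None}
print(ellipsis_info("orphan", "EllipsisDeprel=obl"))
# → {'promoted': False, 'inferred_deprel': 'obl'}
```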

@Stormur
Contributor

Stormur commented Jul 22, 2024

I would like to note that subtypes are not new relations, though, especially when they simply refer back to main types.

Here we are talking more about the status of a relation, namely whether or not it appears in an elliptical construction. I am totally in favour of introducing "relation statuses", which remind me of feature layers, and whose exact notation (@, [], ...) we could discuss. In fact, I am convinced this is strongly needed. It might also be possible to consider regular "subtype extensions" for other underdefined relations, e.g. dislocated.

This is different from enhanced annotation, because it does not involve reconstructing the non-elliptical version of the construction (if that is possible at all!), which might be beyond the goals of many treebanks. It just signals challenging cases, which is already extremely beneficial for queries and data extraction. Also, not everybody is willing to query enhanced graphs.

> In EWT, I have started adding Promoted=Yes to MISC where I notice non-orphan cases of ellipsis. This will help us understand why an ADJ is attaching as nsubj, for example (and reassure us that it's not an error).

One could argue that, if this happens, it is always an ellipsis. This is one of the main points of the OP.

@dan-zeman dan-zeman added this to the v2.15 milestone Aug 29, 2024
@dan-zeman dan-zeman modified the milestones: v2.15, later Aug 29, 2024
4 participants