SCONJ should correspond to xpos=IN #457

Closed
2 tasks done
nschneid opened this issue Oct 27, 2023 · 15 comments

@nschneid
Contributor

nschneid commented Oct 27, 2023

(This is relevant to adding Voice=Pass #290 as the validator doesn't allow SCONJ to have a Voice feature.)

@nschneid
Contributor Author

I guess the goeswith one is actually correct. Fixed the others in EWT.
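For anyone wanting to reproduce this kind of check, here is a minimal sketch. It is not the script used for EWT or GUM. It assumes standard 10-column, tab-separated CoNLL-U, and the function name `sconj_xpos_mismatches` is hypothetical. It lists every token with UPOS SCONJ whose XPOS is not IN, so that legitimate exceptions (such as the goeswith case) can be reviewed by hand.

```python
def sconj_xpos_mismatches(conllu_text):
    """Yield (sent_id, token_id, form, xpos) for SCONJ tokens whose XPOS is not IN.

    Assumes the standard CoNLL-U column order:
    ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
    """
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        tok_id, form, upos, xpos = cols[0], cols[1], cols[3], cols[4]
        if upos == "SCONJ" and xpos != "IN":
            yield sent_id, tok_id, form, xpos
```

The output is only a list of candidates; whether a given hit is an error still needs a human decision.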

@amir-zeldes
Contributor

OK, fixed GUM except for the VBGs, they're intentional; but that shouldn't matter if we're now not doing Voice=Act.

@nschneid
Contributor Author

I thought we decided the VBGs should be tagged VERB (but the deprel of mark is correct).

@amir-zeldes
Contributor

Oh I see, I guess that makes sense. I'll do that for GUM & co. too

@nschneid
Contributor Author

@amir-zeldes There's still a slight divergence between GUM and EWT here: EWT uses ADJ for "such" in the fixed expression "such as". (The expression as a whole functions as mark, so ExtPos=SCONJ would be appropriate.)

@amir-zeldes
Contributor

I see, OK, I can still change this. I guess the simplest solution for absolute parity would be to use the same upos script on both datasets, but I think there are some manual upos edits in EWT, right?

@nschneid
Contributor Author

nschneid commented Nov 1, 2023

I don't have a UPOS generation script for EWT—I have been editing whatever was in the .conllu over the years. If you wanted to run the GUM one on EWT that would make for an interesting comparison.

@amir-zeldes
Contributor

Sure, it's mostly just depedit, though it's possible there's some other tinkering going on in the buildbot, I'd have to look. It seems like a hassle to maintain both upos and xpos, and since I take it xpos is pretty high quality, I like to correct just that, and project to upos using the tree.
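To illustrate what projecting from xpos to upos using the tree can look like, here is a rough sketch. It is not the actual GUM rule set: the function name `project_upos` and the handful of rules are assumptions for illustration, covering only the tags discussed in this thread. IN becomes SCONJ when the token attaches as mark and ADP otherwise, and verbal tags stay VERB even under mark, matching the VBG decision above.

```python
def project_upos(xpos, deprel):
    """Guess a UPOS tag from a PTB-style XPOS tag and the token's deprel.

    Returns None for tags this sketch does not cover.
    """
    if xpos == "IN":
        # Subordinators attach as mark; ordinary prepositions usually as case.
        return "SCONJ" if deprel == "mark" else "ADP"
    if xpos.startswith("VB"):
        # Auxiliaries and copulas are AUX; everything else verbal is VERB,
        # including a VBG that attaches as mark.
        return "AUX" if deprel.split(":")[0] in ("aux", "cop") else "VERB"
    return None
```

For example, `project_upos("VBG", "mark")` gives `"VERB"`, so a participle functioning as a marker would no longer surface as SCONJ.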

BTW GENTLE also has "such that", I guess we want that to be ADJ SCONJ? Currently they're annotated as mark sisters, since there is no fixed "such that" on the fixed list.
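A quick way to find such cases, sketched under assumptions: tokens are represented as plain dicts with hypothetical keys `id`, `form`, `head` and `deprel` (one list per sentence), and the helper name `find_mark_sisters` is made up for this example. It reports adjacent pairs like "such that" where both words attach as mark to the same head rather than being joined by fixed.

```python
def find_mark_sisters(tokens, first="such", second="that"):
    """Return (id, id) pairs of adjacent tokens that are both mark on the same head."""
    hits = []
    for a, b in zip(tokens, tokens[1:]):
        if (a["form"].lower() == first
                and b["form"].lower() == second
                and a["deprel"] == "mark"
                and b["deprel"] == "mark"
                and a["head"] == b["head"]):
            hits.append((a["id"], b["id"]))
    return hits
```

A pair already annotated with fixed would not be reported, since the second token's deprel and head differ.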

@nschneid
Contributor Author

nschneid commented Nov 1, 2023

Hmm, EWT just has 'are such that' which is not quite the same. When it acts as mark, I would probably include "such that" on the fixed list if "such as" is there. I don't think it makes sense to say that "such" can independently be mark.

@nschneid
Contributor Author

nschneid commented Nov 1, 2023

"so that" is also on the fixed list BTW. Cf. #400

@amir-zeldes
Contributor

Yeah, factually that all makes sense to me, I'm just very cautious about deviations from the list. So you want to canonize "such that", which only appears in GENTLE, as fixed? In fairness, it does appear 7 times in two documents there (mathematical proofs).

@nschneid
Contributor Author

nschneid commented Nov 1, 2023

Yeah. There are even a few fixed expressions in EWT that never got added to the list for some reason. At some point we should discuss those too.

@nschneid
Contributor Author

nschneid commented Nov 1, 2023

@amir-zeldes
Contributor

OK, I changed GENTLE trees to match that, and added it to the fixed list in the UD guidelines and GUM wiki. I feel a bit bad for parsers testing on GENTLE, which is meant to be an OOD test set, since we just introduced a fixed expression that is totally absent from GUM/EWT, meaning it would be unreasonable to expect it to be predicted correctly... But such is life I suppose!

@amir-zeldes
Contributor

Thanks for adding the commit hashes - I think this is good to close!
