SCONJ should correspond to xpos=IN #457
Comments
I guess the |
OK, fixed GUM except for the VBGs, they're intentional; but that shouldn't matter if we're now not doing Voice=Act. |
I thought we decided the VBGs should be tagged VERB (but the deprel of |
Oh I see, I guess that makes sense. I'll do that for GUM & co. too |
@amir-zeldes There's still a slight divergence between GUM and EWT here: EWT uses ADJ for "such" in the fixed expression "such as". (The expression as a whole functions as |
I see, OK, I can still change this. I guess the simplest solution for absolute parity would be to use the same upos script on both datasets, but I think there are some manual upos edits in EWT, right? |
I don't have a UPOS generation script for EWT—I have been editing whatever was in the .conllu over the years. If you wanted to run the GUM one on EWT that would make for an interesting comparison. |
Sure, it's mostly just depedit, though it's possible there's some other tinkering going on in the buildbot, I'd have to look. It seems like a hassle to maintain both upos and xpos, and since I take it xpos is pretty high quality, I like to correct just that, and project to upos using the tree. BTW GENTLE also has "such that", I guess we want that to be ADJ SCONJ? Currently they're annotated as mark sisters, since there is no fixed "such that" on the fixed list. |
Hmm, EWT just has 'are such that' which is not quite the same. When it acts as |
"so that" is also on the fixed list BTW. Cf. #400 |
Yeah, factually that all makes sense to me, I'm just very cautious about deviations from the list. So you want to canonize "such that" which only appears in GENTLE as fixed? In fairness, it does appear 7 times in two documents there (mathematical proofs) |
Yeah. There are even a few fixed expressions in EWT that never got added to the list for some reason. At some point we should discuss those too. |
OK, I changed GENTLE trees to match that, and added it to the fixed list in the UD guidelines and GUM wiki. I feel a bit bad for parsers testing on GENTLE, which is meant to be an OOD test set, since we just introduced a fixed expression that is totally absent from GUM/EWT, meaning it would be unreasonable to expect it to be predicted correctly... But such is life I suppose! |
Thanks for adding the commit hashes - I think this is good to close! |
(This is relevant to adding Voice=Pass #290, as the validator doesn't allow SCONJ to have a Voice feature.)
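For anyone wanting to audit a treebank for the two conditions discussed here, a minimal sketch follows. It scans CoNLL-U text for tokens tagged SCONJ whose xpos is not IN, or that carry a Voice feature. The function name `find_sconj_mismatches` and the sample sentence are hypothetical and made up for illustration; this is not the validator's own logic, nor the depedit rules used for GUM.

```python
def find_sconj_mismatches(conllu_text):
    """Return (line_no, form, xpos, reasons) for SCONJ tokens that look suspicious.

    A token is reported if its UPOS is SCONJ and either its XPOS is not IN
    or its FEATS column contains a Voice feature.
    """
    problems = []
    for line_no, line in enumerate(conllu_text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, form, upos, xpos, feats = cols[0], cols[1], cols[3], cols[4], cols[5]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword token ranges and empty nodes
        if upos != "SCONJ":
            continue
        reasons = []
        if xpos != "IN":
            reasons.append("xpos is not IN")
        feat_names = set() if feats == "_" else {f.split("=")[0] for f in feats.split("|")}
        if "Voice" in feat_names:
            reasons.append("has Voice feature")
        if reasons:
            problems.append((line_no, form, xpos, reasons))
    return problems


# Hypothetical three-token sample; not taken from EWT, GUM, or GENTLE.
SAMPLE = "\n".join([
    "# text = Given that works",
    "\t".join(["1", "Given", "given", "SCONJ", "VBN",
               "Tense=Past|VerbForm=Part|Voice=Pass", "3", "mark", "_", "_"]),
    "\t".join(["2", "that", "that", "SCONJ", "IN", "_", "3", "mark", "_", "_"]),
    "\t".join(["3", "works", "work", "VERB", "VBZ",
               "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
               "0", "root", "_", "_"]),
])

if __name__ == "__main__":
    for line_no, form, xpos, reasons in find_sconj_mismatches(SAMPLE):
        print(f"line {line_no}: {form} (xpos={xpos}): {', '.join(reasons)}")
```

Running the same function over the EWT and GUM .conllu files would give a quick list of remaining divergences to compare by hand.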