review mappings in CILI from PWN30 to PWN31 #17
Open
arademaker wants to merge 4 commits into master from issue-16
This creates two synsets in PWN31 with the same ID
Indeed, the PWN30 synset was split into two:
WN30 00040058-s {'supine%5:00:00:passive:01', 'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"; "No other colony showed such supine, selfish helplessness in allowing her own border citizens to be mercilessly harried"- Theodore Roosevelt
WN31 00040189-s {'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"
WN31 00040305-s {'supine%5:00:00:passive:01'}
passive as a result of indolence or indifference; "No other colony showed such supine, selfish helplessness in allowing her own border citizens to be mercilessly harried"- Theodore Roosevelt
If we consider the definition only, we can say that WN30 00040058-s maps to WN31 00040189-s. But one of its senses and one of its examples are now in another synset. There are other cases like this one, so let us discuss this case first, ok? @fcbond @jmccrae
WN30 00040058-s has only one similarTo relation, with 00039592-a. This relation was projected to both WN31 00040305-s and WN31 00040189-s, which are both similarTo WN31 00039705-a. Moreover, both WN31 synsets also have an antonym relation to 00038863-a. This means they cannot be differentiated by their relations in WN31, so the split is suspicious: they are indistinguishable (by their relations) in both WN31 and WN30. Yes, the glosses and examples differ, but the relations are the real WordNet criteria for defining and distinguishing a synset.
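The relation argument above can be checked mechanically. The following is only a minimal sketch: the relation pairs are copied by hand from the comment rather than read from a WordNet database, and the function name `indistinguishable_by_relations` is made up for illustration.

```python
# Relation data transcribed from the comment above (assumption: this is the
# complete set of relations for both WN31 synsets; it is not loaded from WordNet).
relations = {
    "00040189-s": {("similarTo", "00039705-a"), ("antonym", "00038863-a")},
    "00040305-s": {("similarTo", "00039705-a"), ("antonym", "00038863-a")},
}


def indistinguishable_by_relations(a, b, rels):
    """Return True when both synsets carry exactly the same (relation, target) pairs."""
    return rels[a] == rels[b]


print(indistinguishable_by_relations("00040189-s", "00040305-s", relations))
```

If the two sets are equal, only the glosses, examples and member senses tell the synsets apart, which is the situation described for this split.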
BTW, the ILI need not be 1-1 with PWN31, right? I am assuming that one ILI can map to more than one synset in the same wordnet. So if we consider i202 to be a concept that covers both 00040189-s and 00040305-s in PWN31, that is fine, right?
It would seem that 00040058-s and 00040189-s are the same and should both be mapped to i202. However, 00040305-s is a novel sense and will need to be assigned a new ILI.
The gist at https://gist.githubusercontent.com/ekaf/8cd78cce7005abd923c7ed2af47238e2 pretty-prints the wordnet splits dictionary from NLTK, with information about how many senses are carried over into each part of the split. With WN 3.1 it outputs the file out-wnsplits.txt, listing the 33 splits since WN 3.0. The first line is:
00040058-s -> 00040305-s (1 sensekey/s) + 00040189-s (2 sensekey/s)
This shows that 00040305-s contains one sense from the source synset, while 00040189-s contains two.
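For readers who want to see how such a line can be derived, here is a rough sketch that does not use NLTK or the gist's code. It counts shared sense keys between the source synset and each candidate target, using the sense keys quoted earlier in this thread. The function name `describe_split` and the ordering (ascending by number of shared sense keys) are assumptions chosen to reproduce the quoted line.

```python
# Sense keys transcribed from the WN30/WN31 entries quoted earlier in this thread.
source_id = "00040058-s"
source_keys = {
    "supine%5:00:00:passive:01",
    "unresisting%5:00:00:passive:01",
    "resistless%5:00:00:passive:01",
}
wn31_targets = {
    "00040189-s": {"unresisting%5:00:00:passive:01", "resistless%5:00:00:passive:01"},
    "00040305-s": {"supine%5:00:00:passive:01"},
}


def describe_split(src_id, src_keys, targets):
    """Format one split line: how many source sense keys land in each target."""
    parts = []
    for target_id, target_keys in targets.items():
        shared = len(src_keys & target_keys)
        if shared:
            parts.append((shared, target_id))
    parts.sort()  # assumed ordering: fewest shared sense keys first
    body = " + ".join(f"{tid} ({n} sensekey/s)" for n, tid in parts)
    return f"{src_id} -> {body}"


print(describe_split(source_id, source_keys, wn31_targets))
```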
While the previous ILI mappings contained no splits, this PR introduces the following 5:
i202 00040189-s,00040305-s
i40396 00951435-n,00951878-n
i63228 07059027-n,07059160-n
i72354 06836640-n,06836790-n
i90722 10230249-n,10230422-n
So it seems that until now, mappers have made an effort to select only the single most adequate target for each source. I think there is a good reason to avoid creating splits: with two targets, each involved sense gets both a true positive and a false positive.
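A small sketch of how such splits could be detected, and of the true/false positive argument. It assumes the mapping is available as whitespace-separated lines of the form shown above (an ILI followed by comma-separated synset IDs); the helper names `parse_mappings`, `find_splits` and `score_sense` are hypothetical and not part of the CILI tooling.

```python
# The five one-to-many mappings listed above, in the same line format.
mapping_text = """\
i202 00040189-s,00040305-s
i40396 00951435-n,00951878-n
i63228 07059027-n,07059160-n
i72354 06836640-n,06836790-n
i90722 10230249-n,10230422-n
"""


def parse_mappings(text):
    """Parse 'ILI target1,target2' lines into {ili: [targets]}."""
    mappings = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        ili, targets = line.split()
        mappings[ili] = targets.split(",")
    return mappings


def find_splits(mappings):
    """Keep only ILIs that map to more than one target synset."""
    return {ili: t for ili, t in mappings.items() if len(t) > 1}


def score_sense(targets, correct_target):
    """Count (true positives, false positives) for one sense whose real target is known."""
    tp = sum(1 for t in targets if t == correct_target)
    return tp, len(targets) - tp


splits = find_splits(parse_mappings(mapping_text))
print(len(splits))
# The sense 'supine' now lives in 00040305-s, so mapping i202 to both
# synsets gives that sense one correct and one incorrect target.
print(score_sense(splits["i202"], "00040305-s"))
```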