review mappings in CILI from PWN30 to PWN31 #17

Open
wants to merge 4 commits into
base: master
64 changes: 45 additions & 19 deletions ili-map-pwn31.tab
@@ -113,6 +113,7 @@ i112 00023695-s
i113 00023894-s
i114 00024180-s
i115 00024282-s
i116 00024458-a
i117 00024701-s
i118 00493366-s
i119 00025079-s
@@ -198,6 +199,7 @@ i199 00039507-s
i200 00039705-a
i201 00040060-s
i202 00040189-s
Member

This gives two PWN31 synsets the same ILI ID (i202).

Member Author

@arademaker arademaker Nov 26, 2022

Indeed, the PWN30 synset was split into two:

% rg "i202\t" ili-map-pwn3*
ili-map-pwn31.tab
201:i202	00040189-s
202:i202	00040305-s

ili-map-pwn30.tab
202:i202	00040058-s

WN30 00040058-s {'supine%5:00:00:passive:01', 'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"; "No other colony showed such supine, selfish helplessness in allowing her own border citizens to be mercilessly harried"- Theodore Roosevelt

WN31 00040189-s {'unresisting%5:00:00:passive:01', 'resistless%5:00:00:passive:01'}
offering no resistance; "resistless hostages"

WN31 00040305-s {'supine%5:00:00:passive:01'}
passive as a result of indolence or indifference; "No other colony showed such supine, selfish helplessness in allowing her own border citizens to be mercilessly harried"- Theodore Roosevelt

Considering the definition alone, WN30 00040058-s maps to WN31 00040189-s. But one of its senses and one of its examples have moved to another synset. There are other similar cases, so let us discuss this one first, ok? @fcbond @jmccrae

WN30 00040058-s has only one similarTo relation, with 00039592-a. This relation was projected to both WN31 00040305-s and WN31 00040189-s, which are each similarTo WN31 00039705-a. Moreover, both WN31 synsets also have an antonym relation to 00038863-a. This means they cannot be differentiated by their relations in WN31, so the split is suspicious: by their relations, they are indistinguishable in both WN31 and WN30. Yes, the glosses and examples differ, but relations are the real WordNet criteria for defining and distinguishing a synset.
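
The relation check described above can be sketched in a few lines. This is only an illustration: the `relations` dictionary is a hand-written, hypothetical layout holding the two relations mentioned in this comment, not data read from WordNet, and `distinguishable` is an assumed helper name.

```python
# Relations of the two WN31 synsets, as stated in the comment above.
# Hypothetical hand-built structure, not an NLTK or wn object.
relations = {
    "00040189-s": {("similarTo", "00039705-a"), ("antonym", "00038863-a")},
    "00040305-s": {("similarTo", "00039705-a"), ("antonym", "00038863-a")},
}


def distinguishable(a, b, rels):
    """Return True if synsets a and b differ in at least one relation."""
    return rels[a] != rels[b]


same_relations = not distinguishable("00040189-s", "00040305-s", relations)
print(same_relations)  # True: relations alone cannot separate the two synsets
```

Under these assumptions the two synsets compare equal, which is the sense in which the split is called suspicious; glosses and examples are deliberately left out of the comparison.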

Member Author

BTW, the ILI need not be 1-1 with PWN31, right? I am assuming that one ILI can map to more than one synset in the same wordnet. So if we consider i202 to be a concept that covers both 00040189-s and 00040305-s in PWN31, that is fine, right?

Member

It would seem that 00040058-s and 00040189-s are the same and should both be mapped to i202. However, 00040305-s is a novel sense and will need to be assigned a new ILI.

@ekaf ekaf Jan 31, 2023

The gist at https://gist.githubusercontent.com/ekaf/8cd78cce7005abd923c7ed2af47238e2 pretty-prints the wordnet splits dictionary from NLTK, with information about how many senses are carried over into each part of the split. With WN 3.1 it outputs the file out-wnsplits.txt, listing the 33 splits since WN 3.0. The first line is:

00040058-s -> 00040305-s (1 sensekey/s) + 00040189-s (2 sensekey/s)

This shows that 00040305-s contains one sense from the source synset, while 00040189-s contains two.
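
These carry-over counts can be reproduced by hand from the sense keys quoted earlier in this thread. The following is only a sketch, not the gist's code: `carried_over` is a hypothetical helper, and the sets are copied from the WN30/WN31 listings above rather than read from NLTK.

```python
# Sense keys copied from the WN30 and WN31 listings quoted earlier in this thread.
source = {
    "supine%5:00:00:passive:01",
    "unresisting%5:00:00:passive:01",
    "resistless%5:00:00:passive:01",
}
targets = {
    "00040189-s": {"unresisting%5:00:00:passive:01", "resistless%5:00:00:passive:01"},
    "00040305-s": {"supine%5:00:00:passive:01"},
}


def carried_over(source_keys, target_synsets):
    """Count how many of the source sense keys appear in each target synset."""
    return {sid: len(source_keys & keys) for sid, keys in target_synsets.items()}


counts = carried_over(source, targets)
best = max(counts, key=counts.get)  # target that keeps the most sense keys
print(counts)  # {'00040189-s': 2, '00040305-s': 1}
print(best)    # 00040189-s
```

Picking the target with the largest overlap is one possible tie-breaking rule for choosing a single mapping; it is an assumption here, not something the gist or the ILI maintainers prescribe.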

While the previous ILI mappings contained no splits, this PR introduces the following 5:

i202 00040189-s,00040305-s
i40396 00951435-n,00951878-n
i63228 07059027-n,07059160-n
i72354 06836640-n,06836790-n
i90722 10230249-n,10230422-n

So it seems that, until now, mappers have made an effort to select only the single most adequate target for each source. I think there is a good reason to avoid creating splits: having two targets yields both a true and a false positive for each sense involved.
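
A minimal sketch of how such one-to-many entries could be listed, assuming the mapping file holds one whitespace-separated `ILI synset` pair per line (as the `rg` output earlier in this thread suggests). `find_splits` is a hypothetical helper, not part of any CILI tooling; the sample lines are taken from this diff.

```python
from collections import defaultdict


def find_splits(lines):
    """Return {ili: [synsets]} for every ILI that maps to more than one synset."""
    targets = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ili, synset = parts
        targets[ili].append(synset)
    return {ili: syns for ili, syns in targets.items() if len(syns) > 1}


# Sample rows as they appear in ili-map-pwn31.tab after this PR.
sample = [
    "i201\t00040060-s",
    "i202\t00040189-s",
    "i202\t00040305-s",
    "i203\t00040548-a",
]

splits = find_splits(sample)
for ili, syns in splits.items():
    print(ili, ",".join(syns))  # i202 00040189-s,00040305-s
```

To run it over the real file, the list could be replaced with the lines of `ili-map-pwn31.tab` opened in text mode. Swapping the roles of the two columns would list the opposite case, where several ILIs point at one synset.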

i202 00040305-s
i203 00040548-a
i204 00040757-s
i205 00040908-a
@@ -211,7 +213,7 @@ i212 00042063-a
i213 00042258-a
i214 00042449-a
i215 00042677-a
i216 00035037-s
i216 00042912-s
i217 00043057-s
i218 00043202-s
i219 00043345-a
@@ -3755,7 +3757,7 @@ i3760 00678636-s
i3761 00678741-s
i3762 00678855-s
i3763 00678969-a
i3764 00679361-s
i3764 00679196-s
i3765 00679361-s
i3766 00679539-s
i3767 00679725-a
@@ -4224,7 +4226,7 @@ i4230 00769908-s
i4231 00770017-a
i4232 00770517-s
i4233 00770693-s
i4234 00766556-s
i4234 00770909-s
i4235 00771186-s
i4236 00771658-s
i4237 00771957-s
@@ -4294,7 +4296,7 @@ i4300 00783570-s
i4301 00783911-s
i4302 00784134-s
i4303 00784271-s
i4304 00805750-s
i4304 00784503-s
i4305 00784620-s
i4306 00784727-a
i4307 00785098-s
@@ -4409,7 +4411,7 @@ i4415 00805262-s
i4416 00805445-s
i4417 00805518-s
i4418 00805591-s
i4419 00805871-s
i4419 00805750-s
i4420 00805871-s
i4421 00805968-s
i4422 00806085-s
@@ -4662,6 +4664,7 @@ i4668 00853459-s
i4669 00853589-s
i4670 00853840-s
i4671 00853958-s
i4672 02401445-a
i4673 00854054-s
i4674 00854162-s
i4675 00854282-a
@@ -5081,7 +5084,7 @@ i5088 00931766-a
i5089 00932022-s
i5090 00932115-s
i5091 00932405-a
i5092 00041424-s
i5092 00932684-s
i5093 00932808-a
i5094 00933056-s
i5095 00933157-s
@@ -5151,6 +5154,7 @@ i5158 00945209-s
i5159 00945432-s
i5160 00945649-s
i5161 00945962-a
i5162 02504948-s
i5163 00946057-a
i5164 00946299-s
i5165 00946410-s
@@ -6174,11 +6178,13 @@ i6183 01131934-s
i6184 01132084-s
i6185 01132237-s
i6186 01132339-s
i6187 02344882-s
i6188 01132550-s
i6189 01132700-s
i6190 01132864-s
i6191 01133050-s
i6192 01133212-s
i6193 00065808-a
i6194 01133323-s
i6195 01133477-a
i6196 01133761-s
@@ -10018,6 +10024,7 @@ i10031 01832293-s
i10032 01832546-s
i10033 01832697-s
i10034 01832879-s
i10035 01832979-s
i10036 01833150-s
i10037 01833253-a
i10038 01833484-s
@@ -10598,7 +10605,7 @@ i10612 01943615-s
i10613 01943804-s
i10614 01944007-s
i10615 01944376-s
i10616 01939402-a
i10616 01944611-a
i10617 01944939-s
i10618 01945125-s
i10619 01945276-a
@@ -11434,6 +11441,7 @@ i11448 02096119-s
i11449 02096522-s
i11450 02096659-s
i11451 02096869-s
i11452 01140630-v
i11453 02096956-s
i11454 02097082-s
i11455 02097374-s
@@ -12780,7 +12788,7 @@ i12796 02318870-s
i12797 02318973-s
i12798 02319122-s
i12799 02319224-a
i12800 02319740-a
i12800 02319740-s
i12801 02319930-s
i12802 02320034-s
i12803 02320130-s
@@ -13501,6 +13509,7 @@ i13517 02449665-s
i13518 02449821-s
i13519 02449895-a
i13520 02450085-s
i13521 02450200-s
i13522 02450336-s
i13523 02450419-a
i13524 02450577-s
@@ -13948,6 +13957,7 @@ i13965 02528427-s
i13966 02528527-s
i13967 02528658-s
i13968 02528909-s
i13969 02528983-s
i13970 02529085-a
i13971 02529227-a
i13972 02529348-s
@@ -14220,6 +14230,7 @@ i14238 02576669-s
i14239 02576745-a
i14240 02577011-s
i14241 02577165-s
i14242 01342529-s
i14243 02577356-a
i14244 02577673-s
i14245 02577837-s
@@ -14419,6 +14430,7 @@ i14438 02609578-a
i14439 02609711-a
i14440 02609866-a
i14441 02610006-a
i14442 01040830-a
i14443 02610106-a
i14444 02610254-a
i14445 02610356-a
@@ -18553,6 +18565,7 @@ i18577 00073249-r
i18578 00073433-r
i18579 00073946-r
i18580 00074057-r
i18581 00003317-r
i18582 00074163-r
i18583 00074361-r
i18584 00074467-r
@@ -34621,6 +34634,7 @@ i34654 02600068-v
i34655 02600258-v
i34656 02600446-v
i34657 02600625-v
i34658 01185870-v
i34659 02600830-v
i34660 02600976-v
i34661 02601231-v
@@ -34650,7 +34664,7 @@ i34684 02605001-v
i34685 02605322-v
i34686 02605525-v
i34687 02605633-v
i34688 00680696-v
i34688 02605751-v
i34689 02605875-v
i34690 02606079-v
i34691 02606252-v
@@ -40358,6 +40372,8 @@ i40392 00950684-n
i40393 00950858-n
i40394 00950950-n
i40395 00951332-n
i40396 00951435-n
i40396 00951878-n
i40397 00952059-n
i40398 00952181-n
i40399 00952328-n
@@ -40581,6 +40597,7 @@ i40616 00997941-n
i40617 00998142-n
i40618 00998266-n
i40619 00998599-n
i40620 00998599-n
i40621 00998759-n
i40622 00998911-n
i40623 00999979-n
@@ -49992,6 +50009,7 @@ i50029 02716223-n
i50030 02716355-n
i50031 02716453-n
i50033 02716628-n
i50034 02716785-n
i50035 02717050-n
i50036 02717226-n
i50037 02717446-n
@@ -51830,6 +51848,7 @@ i51870 03024610-n
i51871 03024804-n
i51872 03024911-n
i51873 03025043-n
i51874 03025214-n
i51875 03025379-n
i51876 03025541-n
i51877 03025724-n
@@ -55749,7 +55768,7 @@ i55790 03690633-n
i55791 03690812-n
i55792 03690966-n
i55793 03691146-n
i55794 03872233-n
i55794 03691288-n
i55795 03691456-n
i55796 03691689-n
i55797 03691796-n
@@ -55777,7 +55796,7 @@ i55818 03694673-n
i55819 03694769-n
i55820 03694896-n
i55821 03695026-n
i55822 03872586-n
i55822 03695166-n
i55823 03695331-n
i55824 03695494-n
i55825 03695605-n
@@ -56275,7 +56294,7 @@ i56316 03782816-n
i56317 03783101-n
i56318 03783287-n
i56319 03783494-n
i56320 03872233-n
i56320 03783668-n
i56321 03783835-n
i56322 03783992-n
i56323 03784133-n
@@ -56585,7 +56604,7 @@ i56626 03835103-n
i56627 03835397-n
i56628 03835494-n
i56629 03835651-n
i56630 03872233-n
i56630 03835818-n
i56631 03835988-n
i56632 03836122-n
i56633 03836375-n
@@ -58977,6 +58996,7 @@ i59018 04238334-n
i59019 04238506-n
i59020 04238637-n
i59021 04238755-n
i59022 04238967-n
i59023 04239143-n
i59024 04239262-n
i59025 04239421-n
@@ -63182,6 +63202,8 @@ i63224 04997257-n
i63225 04997456-n
i63226 04997743-n
i63227 04997910-n
i63228 07059027-n
i63228 07059160-n
i63229 04997999-n
i63230 04998259-n
i63231 04998347-n
@@ -72306,6 +72328,7 @@ i72351 06836139-n
i72352 06836320-n
i72353 06836441-n
i72354 06836640-n
i72354 06836790-n
i72355 06836975-n
i72356 06837091-n
i72357 06837277-n
@@ -90093,6 +90116,7 @@ i90163 10141785-n
i90164 10141957-n
i90165 10142098-n
i90166 10142188-n
i90167 10202544-n
i90168 10142302-n
i90169 10142395-n
i90170 10142563-n
@@ -90648,6 +90672,7 @@ i90719 10229489-n
i90720 10229738-n
i90721 10230113-n
i90722 10230249-n
i90722 10230422-n
i90723 10230581-n
i90724 10230706-n
i90725 10230873-n
@@ -92634,6 +92659,7 @@ i92706 10570230-n
i92707 10570508-n
i92708 10570822-n
i92709 10571133-n
i92710 10571133-n
i92711 10571326-n
i92712 10571447-n
i92713 10571631-n
@@ -114595,14 +114621,14 @@ i114669 14800682-n
i114670 14800845-n
i114671 14800963-n
i114672 14801083-n
i114673 14802098-n
i114674 14802098-n
i114673 14801263-n
i114674 14801347-n
i114675 14801436-n
i114676 14802098-n
i114677 14802098-n
i114676 14801600-n
i114677 14801682-n
i114678 14801765-n
i114679 14802098-n
i114680 14802098-n
i114679 14801927-n
i114680 14802015-n
i114681 14802098-n
i114682 14802178-n
i114683 14802595-n