Is there any mapping between different English wordnet? #176

rudaoshi · 2022-10-18T13:15:45Z

There have been many English wordnets, and I wonder whether there is any mapping between the synset ids of these wordnets, for example oewn/ewn <-> omw.

If there is, please tell me how to get the mapping.

Thank you ~

fcbond · 2022-10-18T13:27:09Z

Hi, the different versions of the Princeton WordNet use sensekeys to link senses. They are meant to be stable between versions, although there have been occasional surprises (sometimes different capitalization has caused issues). OEWN and OMW use the ILI keys to link synsets:

Francis Bond, Piek Vossen, John McCrae, and Christiane Fellbaum (2016) CILI: the Collaborative Interlingual Index. In Proceedings of the 8th Global WordNet Conference (GWC2016), Bucharest. pp 50–57 https://aclanthology.org/2016.gwc-1.9/


-- Francis Bond <https://fcbond.github.io/>

goodmami · 2022-10-20T04:31:25Z

@rudaoshi, to add to what @fcbond said, in Wn you can use the ili member of a synset to see equivalent synsets across versions or even across lexicons for another language:

>>> import wn
>>> oewn = wn.Wordnet('oewn')
>>> wn30 = wn.Wordnet('omw-en')
>>> oewn.synsets('penumbra')[0].ili
ILI('i110430')
>>> wn30.synsets('penumbra')[0].ili
ILI('i110430')
>>> wn30.synsets(ili='i110430')[0].lemmas()
['penumbra']
>>> wnja = wn.Wordnet('omw-ja')
>>> wnja.synsets(ili='i110430')[0].lemmas()
['半影']
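If an explicit synset-to-synset table is wanted, the ILI lookup above can be turned into a join. The following is only a minimal sketch: `map_synsets_via_ili` is a hypothetical helper (not part of Wn), it works on plain `(synset_id, ili_id)` pairs, and the ids in the example are made up for illustration.

```python
from collections import defaultdict


def map_synsets_via_ili(source_pairs, target_pairs):
    """Join two lexicons on their ILI ids (hypothetical helper, not a Wn API).

    Each argument is an iterable of (synset_id, ili_id) pairs, where ili_id
    may be None for synsets without an ILI. Returns (mapping, unmapped):
    mapping maps each source synset id to the list of target synset ids that
    share its ILI, and unmapped lists source synset ids with no counterpart.
    """
    # Index the target lexicon by ILI id.
    by_ili = defaultdict(list)
    for synset_id, ili_id in target_pairs:
        if ili_id is not None:
            by_ili[ili_id].append(synset_id)

    mapping, unmapped = {}, []
    for synset_id, ili_id in source_pairs:
        targets = by_ili.get(ili_id) if ili_id is not None else None
        if targets:
            mapping[synset_id] = list(targets)
        else:
            unmapped.append(synset_id)
    return mapping, unmapped


# Made-up ids, for illustration only.
omw_pairs = [("omw-en-0001-n", "i1"), ("omw-en-0002-n", "i2"), ("omw-en-0003-n", None)]
oewn_pairs = [("oewn-0001-n", "i1"), ("oewn-0009-n", "i9")]

mapping, unmapped = map_synsets_via_ili(omw_pairs, oewn_pairs)
print(mapping)   # {'omw-en-0001-n': ['oewn-0001-n']}
print(unmapped)  # ['omw-en-0002-n', 'omw-en-0003-n']
```

With Wn, the pairs could probably be collected with something like `[(ss.id, ss.ili.id if ss.ili else None) for ss in wn.Wordnet('oewn').synsets()]`. This assumes the behaviour shown in this thread, where `ss.ili` is an `ILI` object with an `id` attribute; other Wn versions may differ.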

For the omw-en lexicons (which are directly converted from the Princeton WordNet with very few changes), the sensekeys are available as the identifier metadata of senses, but these are not available for other lexicons:

>>> wn30.senses('penumbra')[0].metadata()
{'identifier': 'penumbra%1:26:00::'}
>>> oewn.senses('penumbra')[0].metadata()
{}
>>> wnja.senses('半影')[0].metadata()
{}
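For reference, a sensekey such as `penumbra%1:26:00::` follows the Princeton WordNet layout `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`. The sketch below splits a key into those fields; `SenseKey` and `parse_sense_key` are hypothetical names introduced here, not part of Wn.

```python
from typing import NamedTuple

# Princeton WordNet ss_type codes: noun, verb, adjective, adverb,
# adjective satellite.
SS_TYPES = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}


class SenseKey(NamedTuple):
    lemma: str
    pos: str
    lex_filenum: int
    lex_id: int
    head_word: str  # only filled for adjective satellites
    head_id: str    # only filled for adjective satellites


def parse_sense_key(key: str) -> SenseKey:
    """Split a sensekey into its fields (hypothetical helper)."""
    lemma, sep, lex_sense = key.rpartition("%")
    if not sep:
        raise ValueError(f"not a sensekey: {key!r}")
    parts = lex_sense.split(":")
    if len(parts) != 5:
        raise ValueError(f"unexpected lex_sense in {key!r}")
    ss_type, lex_filenum, lex_id, head_word, head_id = parts
    return SenseKey(
        lemma, SS_TYPES[int(ss_type)], int(lex_filenum), int(lex_id),
        head_word, head_id,
    )


print(parse_sense_key("penumbra%1:26:00::"))
```

Because capitalization differences have occasionally broken sensekey matching between versions, comparing lowercased keys (`key.lower()`) may be a reasonable precaution when joining on them; that normalization is an assumption here, not something the wordnets guarantee.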

ekaf · 2022-10-23T07:41:33Z

Thanks @goodmami and @fcbond. I did not understand this correctly before, but I think I am starting to get a more accurate picture of the implicit "mapping" in Wn. It seems that Wn does no mapping by itself; it loads resources that were previously mapped to ILI.
This mapping was done by external projects: OMW mapped the multilingual wordnets using the ili-map-pwn30.tab file from CILI-1.0, while OEWN used the corresponding pwn31 mapping.
Joining these two mappings gives an intersection of 117583 identifiers, but only 117441 of the omw-en ILIs are actually found in OEWN 2021:

import wn

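def ili_loss(wnstring1, wnstring2):
    """Report how many ILIs of the first lexicon are missing from the second."""
    # WN 1: load the lexicon and collect its ILIs
    wn1 = wn.Wordnet(wnstring1)
    v1 = wn1.lexicons()
    i1 = wn1.ilis()
    n1 = len(i1)
    print(f"{v1}: {n1} synsets")
    # WN 2: same for the second lexicon
    wn2 = wn.Wordnet(wnstring2)
    v2 = wn2.lexicons()
    i2 = wn2.ilis()
    n2 = len(i2)
    print(f"{v2}: {n2} synsets")
    # Intersection: ILIs present in both lexicons
    ii = set(i1).intersection(i2)
    ni = len(ii)
    print(f"Intersection: {ni} synsets")
    # Loss: ILIs of the first lexicon that have no match in the second
    # (the counts are ILIs, printed as "synsets")
    loss = n1 - ni
    pct = 100 * loss/n1
    print(f"Loss: {loss} synsets ({round(pct,2)})%")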

ili_loss('omw-en', 'oewn')

[<Lexicon omw-en:1.4 [en]>]: 117659 synsets
[<Lexicon oewn:2021 [en]>]: 120039 synsets
Intersection: 117441 synsets
Loss: 218 synsets (0.19)%

ili_loss('omw-ja', 'oewn')

[<Lexicon omw-ja:1.4 [ja]>]: 57184 synsets
[<Lexicon oewn:2021 [en]>]: 120039 synsets
Intersection: 57076 synsets
Loss: 108 synsets (0.19)%

ili_loss('omw-arb', 'oewn')

[<Lexicon omw-arb:1.4 [arb]>]: 9916 synsets
[<Lexicon oewn:2021 [en]>]: 120039 synsets
Intersection: 9887 synsets
Loss: 29 synsets (0.29)%

I suppose that a part (though not all) of this difference can be attributed to #179.
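To check which ILIs account for the loss, rather than only how many, the same comparison can return the missing ids themselves. This is a small sketch on plain ILI id strings; `ili_coverage` is a hypothetical helper and the ids in the example are made up.

```python
def ili_coverage(source_ilis, target_ilis):
    """Return (missing, loss_pct) for two collections of ILI id strings.

    missing is the sorted list of ids in source_ilis that do not occur in
    target_ilis; loss_pct is their share of the distinct source ids, in
    percent, rounded to two decimals.
    """
    source = set(source_ilis)
    target = set(target_ilis)
    missing = sorted(source - target)
    loss_pct = 100 * len(missing) / len(source) if source else 0.0
    return missing, round(loss_pct, 2)


# Made-up ids, for illustration only.
missing, loss_pct = ili_coverage(["i1", "i2", "i3", "i4"], ["i1", "i3", "i9"])
print(missing)   # ['i2', 'i4']
print(loss_pct)  # 50.0
```

With Wn as used in this thread, the inputs could presumably be built as `[ili.id for ili in wn.Wordnet('omw-en').ilis()]`, assuming `ILI` objects expose an `id` attribute in the installed version.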

goodmami · 2023-03-12T20:06:11Z

It seems like the original question has been answered.

goodmami added the "question" label (Further information is requested) Oct 20, 2022

goodmami closed this as completed Mar 12, 2023
