Create a new release with some improvements (1.5) #31

fcbond · 2023-02-23T13:52:29Z

remove synsets with deprecated ILIs (move translations from superseded concepts)
- take from merges in oewn
- identify British (and other variants), e.g. moke British informal for donkey; look also at Domain–Region: united_kingdom (and other countries)
add new omw formatted wordnets?
merge with TUFS data https://github.com/fcbond/tufs @ArthurBond
add new MCR data from @ekaf (Add Spanish definitions and examples #25)

fcbond · 2023-02-23T17:50:17Z

We could go two ways with synsets like moke ("British informal for donkey"):

- link it with ir_synonym and make sure both sides have the same translations
- merge, and mark the senses with the dialect and register tags

With the second option, moke is in the donkey synset but marked with Domain-Region united_kingdom and exemplifies informal.
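
A minimal sketch of the merge option, assuming a toy in-memory model. The `Sense` and `Synset` classes, the `merge_variant` helper, the `ex-…` identifiers, and the sample translations are all hypothetical illustrations, not the OMW or OEWN data model:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Sense:
    lemma: str
    region: Optional[str] = None    # Domain-Region target, e.g. "united_kingdom"
    register: Optional[str] = None  # exemplifies target, e.g. "informal"


@dataclass
class Synset:
    ident: str  # hypothetical identifier, not a real ILI or offset
    senses: List[Sense] = field(default_factory=list)
    translations: Dict[str, Set[str]] = field(default_factory=dict)


def merge_variant(target: Synset, variant: Synset, region: str, register: str) -> Synset:
    """Fold `variant` into `target`, tagging every sense that moves over."""
    for sense in variant.senses:
        target.senses.append(Sense(sense.lemma, region, register))
    # Translations of the variant now belong to the merged concept.
    for lang, lemmas in variant.translations.items():
        target.translations.setdefault(lang, set()).update(lemmas)
    return target


donkey = Synset("ex-donkey-n", [Sense("donkey")], {"ja": {"ロバ"}})
moke = Synset("ex-moke-n", [Sense("moke")], {"ja": {"ロバ"}, "fr": {"âne"}})
merged = merge_variant(donkey, moke, region="united_kingdom", register="informal")

for sense in merged.senses:
    print(sense.lemma, sense.region, sense.register)
```

The dialect and register information ends up on the sense rather than the synset, so the translations only have to be maintained in one place.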

ekaf · 2023-02-25T12:46:46Z

> take from merges in oewn

@fcbond, this sounds ambiguous, and may not be optimal: merges are relative to a target English Wordnet version, so you would, for example, pick either OEWN 2021 or 2022, and then deal with different merges in later OEWN versions?
It might be better not to handle the merges in OMW-data: NLTK now handles OMW merges seamlessly with any OEWN version, and @goodmami might eventually consider a similar approach in Wn for solving the related issue goodmami/wn#179.
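
To illustrate why the target version matters, here is a rough sketch that leaves the stored translations untouched and applies a version-specific merge map at load time. The identifiers, both merge maps, and the helpers `resolve` and `remap_translations` are hypothetical. This is neither NLTK's implementation nor the snippet from goodmami/wn#179:

```python
from typing import Dict, Set


def resolve(ident: str, merges: Dict[str, str]) -> str:
    """Follow superseded -> surviving links until a live concept is reached."""
    seen = set()
    while ident in merges and ident not in seen:  # `seen` guards against cycles
        seen.add(ident)
        ident = merges[ident]
    return ident


def remap_translations(
    translations: Dict[str, Set[str]], merges: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Move translations from superseded concepts onto the surviving ones."""
    remapped: Dict[str, Set[str]] = {}
    for ident, lemmas in translations.items():
        remapped.setdefault(resolve(ident, merges), set()).update(lemmas)
    return remapped


# Translations keyed by made-up concept identifiers.
translations = {"i100": {"asno"}, "i200": {"burro"}, "i300": {"borrico"}}

# Two made-up merge maps standing in for two different target versions.
merges_older = {"i200": "i100"}
merges_newer = {"i200": "i100", "i300": "i200"}

for name, merges in [("older", merges_older), ("newer", merges_newer)]:
    print(name, remap_translations(translations, merges))
```

Because the merge map is an input rather than something baked into the data, the same stored translations can be projected onto whichever target version is loaded.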

arademaker · 2023-02-26T00:01:02Z

> merge, and mark the senses with the dialect and register tags
> so moke is in donkey but marked with Domain-Region united_kingdom and exemplifies informal

I prefer this option

goodmami · 2023-04-02T16:17:03Z

Also consider fixing #32 for this release.

> @goodmami might eventually consider a similar approach in Wn for solving the related issue goodmami/wn#179

The issue is no longer fresh in my mind, but I don't think I was planning on making any significant changes to Wn. More likely I would suggest some documentation about how to deal with such merges, such as using the code snippet I wrote in that issue. But I should first check how it was handled in NLTK.

goodmami · 2024-10-01T05:18:50Z

If a 1.5 version is still on the agenda, let's consider adding pre-3.0 versions of the Princeton WordNet data (see goodmami/wn#199).

fcbond · 2024-10-28T20:22:48Z

I am thinking I will probably not try to do too much here: identifying variants should really be done in the language project (so in OEWN for English).

These are the minimum I would like to see for this:

- Get more out of the MCR (done, thanks @ekaf)
- Remove various duplicates
- Add confidence to the OMW built XML (needed for OMW 2.0)
- Add earlier PWNs
- Move to wn 0.9.5 (@goodmami)
- Extend tsv2lmf.py to deal with variants, counts and pronunciation (needed for TUFS); almost done
- Show the release summary
- Maybe add the other French Wordnet?

Most of these are close to done; I need to push them out for review, ...
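
As one possible reading of the "add confidence" item, the sketch below stamps a default score onto every `Sense` element that lacks one. It assumes the WN-LMF `confidenceScore` metadata attribute is the intended carrier. The `add_confidence` helper and the sample fragment are hypothetical, and this is not how the OMW build scripts do it:

```python
import xml.etree.ElementTree as ET

# A trimmed, made-up fragment in the shape of a WN-LMF file.
SAMPLE = """\
<LexicalResource>
  <Lexicon id="ex" label="Example" language="en">
    <LexicalEntry id="ex-moke-n">
      <Lemma writtenForm="moke" partOfSpeech="n"/>
      <Sense id="ex-moke-n-1" synset="ex-donkey-n"/>
    </LexicalEntry>
    <LexicalEntry id="ex-donkey-n-entry">
      <Lemma writtenForm="donkey" partOfSpeech="n"/>
      <Sense id="ex-donkey-n-1" synset="ex-donkey-n" confidenceScore="0.8"/>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def add_confidence(xml_text: str, default: str = "1.0") -> str:
    """Give every Sense without a confidenceScore the default value."""
    root = ET.fromstring(xml_text)
    for sense in root.iter("Sense"):
        if "confidenceScore" not in sense.attrib:
            sense.set("confidenceScore", default)  # existing scores are kept
    return ET.tostring(root, encoding="unicode")


result = add_confidence(SAMPLE)
print(result)
```

Note that `ElementTree` does not preserve a DOCTYPE declaration, so a real file would need it written back out separately.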

goodmami added this to the Release 1.5 milestone Oct 18, 2024
