Create a new release with some improvements (1.5) #31

Open
fcbond opened this issue Feb 23, 2023 · 6 comments
fcbond (Contributor, Author) commented Feb 23, 2023

We could go two ways with synsets like moke "British informal for donkey"

  1. link it with ir_synonym and make sure both sides have the same translations
  2. merge, and mark the senses with the dialect and register tags
  • so moke is in donkey but marked with Domain-Region united_kingdom and exemplifies informal
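
Option 2 could look roughly like the sketch below. It is plain Python and not tied to any wordnet library: the `Sense` and `Synset` classes, the `merge_synsets` helper, the synset IDs, and the definitions are all hypothetical placeholders, and the two tag fields simply mirror the Domain-Region and exemplifies relations named above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    lemma: str
    # Hypothetical tag slots, named after the relations mentioned in the thread.
    domain_region: Optional[str] = None
    exemplifies: Optional[str] = None


@dataclass
class Synset:
    ssid: str          # placeholder ID, not a real wordnet identifier
    definition: str    # placeholder text
    senses: List[Sense] = field(default_factory=list)


def merge_synsets(target, variant, domain_region, exemplifies):
    """Copy the variant's senses into target, tagging each copied sense."""
    existing = {s.lemma for s in target.senses}
    for sense in variant.senses:
        if sense.lemma in existing:
            continue  # the lemma is already a member of the target synset
        target.senses.append(Sense(sense.lemma, domain_region, exemplifies))
        existing.add(sense.lemma)
    return target


donkey = Synset("ss-donkey", "placeholder definition", [Sense("donkey")])
moke = Synset("ss-moke", "British informal for donkey", [Sense("moke")])
merged = merge_synsets(donkey, moke, "united_kingdom", "informal")
```

After the merge, the dialect and register information lives on the `moke` sense rather than on a separate synset, so translations only need to be attached once, to the merged synset.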

ekaf (Contributor) commented Feb 25, 2023

> take from merges in oewn

@fcbond, this sounds ambiguous and may not be optimal: merges are relative to a target English Wordnet version, so you would, for example, pick either OEWN 2021 or 2022 and then have to deal with different merges in later OEWN versions?
It might be better not to handle the merges in OMW-data:
NLTK now handles OMW merges seamlessly with any OEWN version, and @goodmami might eventually consider a similar approach in Wn for solving the related issue goodmami/wn#179.
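
The general idea of resolving merges at load time rather than in the data can be sketched as follows. This is only an illustration under assumptions, not a description of how NLTK or Wn actually implement it: `remap_lemmas`, the synset IDs, the version labels, and the lemma strings are all made-up placeholders.

```python
def remap_lemmas(omw_lemmas, merge_map):
    """Re-key lemmas by the synset IDs of one target wordnet version.

    omw_lemmas: dict mapping synset ID -> list of lemmas (data left unmerged)
    merge_map:  dict mapping a merged-away ID -> the ID that survives
                in the chosen target version
    """
    remapped = {}
    for ssid, lemmas in omw_lemmas.items():
        target = merge_map.get(ssid, ssid)  # unmerged IDs map to themselves
        bucket = remapped.setdefault(target, [])
        for lemma in lemmas:
            if lemma not in bucket:  # avoid duplicates created by the merge
                bucket.append(lemma)
    return remapped


# Placeholder data: the stored lemmas stay keyed by the original IDs.
omw_lemmas = {"ss-donkey": ["lemma_a"], "ss-moke": ["lemma_b"]}

# One hypothetical merge map per target version.
merges_by_version = {
    "version-A": {},                          # no merge in this version
    "version-B": {"ss-moke": "ss-donkey"},    # moke merged into donkey
}

view_a = remap_lemmas(omw_lemmas, merges_by_version["version-A"])
view_b = remap_lemmas(omw_lemmas, merges_by_version["version-B"])
```

Because the stored data is never rewritten, the same file can serve any target version; only the merge map changes.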

@arademaker

> merge, and mark the senses with the dialect and register tags
> so moke is in donkey but marked with Domain-Region united_kingdom and exemplifies informal

I prefer this option

goodmami (Collaborator) commented Apr 2, 2023

Also consider fixing #32 for this release.

> @goodmami might eventually consider a similar approach in Wn for solving the related issue goodmami/wn#179

The issue is no longer fresh in my mind, but I don't think I was planning on making any significant changes to Wn. More likely I would suggest some documentation about how to deal with such merges, such as using the code snippet I wrote in that issue. But I should first check how it was handled in NLTK.

goodmami (Collaborator) commented Oct 1, 2024

If a 1.5 version is still on the agenda, let's consider adding pre-3.0 versions of the Princeton WordNet data (see goodmami/wn#199).

goodmami added this to the Release 1.5 milestone Oct 18, 2024

fcbond (Contributor, Author) commented Oct 28, 2024

I am thinking I will probably not try to do too much here: identifying variants should really be done in the language project (so in OEWN for English).

This is the minimum I would like to see in this release:

  • Get more out of the MCR (done, thanks @ekaf)
  • Remove various duplicates
  • Add confidence to the OMW-built XML (needed for OMW 2.0)
  • Add earlier PWNs
  • Move to wn 0.9.5 (@goodmami)
  • Extend tsv2lmf.py to deal with variants, counts and pronunciation (needed for TUFS); almost done
  • Show the release summary
  • Maybe add the other French Wordnet?

Most of these are close to done; I need to push them out for review, ...
