[ENH] UDPIPE - Use ISO language codes #1030

Merged on Feb 22, 2024 (2 commits)

PrimozGodec (Collaborator) commented Dec 20, 2023

Issue

This PR is part of #963, which I am splitting into smaller pieces for easier review.

The primary motivation is to make Preprocess use the language stored in the Corpus.

Description of changes

This PR prepares the UDPIPE normalizer to accept and return languages as ISO codes, which is necessary to enable using the language from the Corpus (languages are stored in the Corpus in ISO format).

After changing UDPIPE to work with ISO language codes, I also had to adapt the Preprocess widget to store its settings as ISO codes and to call the Lemmagen filter with an ISO language code.
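
A minimal sketch of what such a settings migration could look like, for readers unfamiliar with the pattern. The mapping, the `language` key, and the function name are hypothetical and are not taken from this PR's diff; the real widget may store its settings differently.

```python
# Hypothetical mapping from legacy display names to ISO 639 codes.
# Only a few entries are shown; the real list would cover every model language.
LEGACY_NAME_TO_ISO = {
    "English": "en",
    "Slovenian": "sl",
    "Ancient greek": "grc",
}

# Assumed fallback when a stored value cannot be recognised.
DEFAULT_ISO = "en"


def migrate_language_setting(settings: dict, key: str = "language") -> dict:
    """Return a copy of `settings` with a legacy language name replaced by an ISO code."""
    migrated = dict(settings)
    if key not in migrated:
        # Nothing stored under this key, so there is nothing to migrate.
        return migrated
    value = migrated[key]
    if value in LEGACY_NAME_TO_ISO:
        # Old workflow: the value is a display name such as "Ancient greek".
        migrated[key] = LEGACY_NAME_TO_ISO[value]
    elif value not in LEGACY_NAME_TO_ISO.values():
        # Neither a known legacy name nor a known ISO code: use the fallback.
        migrated[key] = DEFAULT_ISO
    return migrated
```

Values that are already ISO codes pass through unchanged, so running the migration twice has no further effect.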

This PR also slightly changes the names of UDPIPE languages in the dropdown: the names of language variations (different models for the same language) are now written in parentheses, and every word in a multi-word language name is capitalized (to match the ISO standard).
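
The naming rule can be sketched as follows. This is only an illustration: it assumes a model identifier of the form `language_words-variation` (for example `ancient_greek-proiel`), and `display_name` is a hypothetical helper, not a function from this PR.

```python
def display_name(model_id: str) -> str:
    """Build a dropdown label from an identifier like 'ancient_greek-proiel'."""
    # Split off the optional variation after the first hyphen.
    language, _, variation = model_id.partition("-")
    # Capitalize every word of a multi-word language name.
    name = " ".join(word.capitalize() for word in language.split("_"))
    # Put the variation, if any, in parentheses after the language name.
    return f"{name} ({variation})" if variation else name
```

Under these assumptions, `display_name("ancient_greek-proiel")` gives `Ancient Greek (proiel)`, and an identifier without a variation, such as `english`, gives `English`.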

Udpipe will be implemented in separate PRs.

Includes
  • Code changes
  • Tests
  • Documentation

@PrimozGodec PrimozGodec force-pushed the lang-udpipe branch 3 times, most recently from c83c8a6 to 058e521 Compare December 21, 2023 13:17
codecov-commenter commented Dec 21, 2023

Codecov Report

Merging #1030 (68966f4) into master (f519388) will increase coverage by 0.30%.
Report is 8 commits behind head on master.
The diff coverage is 95.10%.

❗ Current head 68966f4 differs from pull request most recent head 3a6f9b7. Consider uploading reports for the commit 3a6f9b7 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1030      +/-   ##
==========================================
+ Coverage   82.16%   82.46%   +0.30%     
==========================================
  Files          92       92              
  Lines       12283    12340      +57     
  Branches     1670     1683      +13     
==========================================
+ Hits        10092    10176      +84     
+ Misses       1882     1854      -28     
- Partials      309      310       +1     

@PrimozGodec PrimozGodec force-pushed the lang-udpipe branch 3 times, most recently from 6697c13 to 4c6b9b0 Compare December 22, 2023 08:50
@PrimozGodec PrimozGodec marked this pull request as ready for review December 22, 2023 08:51
PrimozGodec (Collaborator, Author) commented

@VesnaT, as with #1025, you can create a workflow at tag 1.15.0 (git checkout 1.15.0) to test the migration, and then open it with this change. It should work.

@PrimozGodec PrimozGodec changed the title from "Lang udpipe" to "[ENH] Snowball - Use ISO language codes" Jan 10, 2024
@PrimozGodec PrimozGodec changed the title from "[ENH] Snowball - Use ISO language codes" to "[ENH] UDPIPE - Use ISO language codes" Jan 12, 2024
@PrimozGodec PrimozGodec reopened this Jan 12, 2024
@PrimozGodec PrimozGodec assigned PrimozGodec and VesnaT and unassigned PrimozGodec Jan 12, 2024
PrimozGodec (Collaborator, Author) commented

Failing tests are fixed in #1031

PrimozGodec (Collaborator, Author) commented

/rebase

VesnaT (Contributor) commented Jan 12, 2024

When I open an old workflow, I get:

---------------------------- ValueError Exception -----------------------------
Traceback (most recent call last):
  File "/Users/vesna/orange3/Orange/widgets/utils/concurrent.py", line 591, in _on_task_done
    super()._on_task_done(future)
  File "/Users/vesna/orange3/Orange/widgets/utils/concurrent.py", line 549, in _on_task_done
    self.on_done(future.result())
  File "/Users/vesna/orange3-text/orangecontrib/text/widgets/owcorpus.py", line 257, in on_done
    self.openContext(self.corpus)
  File "/Users/vesna/orange-widget-base/orangewidget/widget.py", line 1351, in openContext
    self.settingsHandler.open_context(self, *a)
  File "/Users/vesna/orange3-text/orangecontrib/text/widgets/owcorpus.py", line 49, in open_context
    ContextHandler.open_context(
  File "/Users/vesna/orange-widget-base/orangewidget/settings.py", line 833, in open_context
    self.settings_to_widget(widget, *args)
  File "/Users/vesna/orange-widget-base/orangewidget/settings.py", line 945, in settings_to_widget
    _apply_setting(setting, instance, value)
  File "/Users/vesna/orange-widget-base/orangewidget/settings.py", line 204, in _apply_setting
    setattr(instance, setting.name, value)
  File "/Users/vesna/orange-widget-base/orangewidget/gui.py", line 194, in __setattr__
    callback(value)
  File "/Users/vesna/orange-widget-base/orangewidget/gui.py", line 2315, in __call__
    self.action(*args)
  File "/Users/vesna/orange-widget-base/orangewidget/gui.py", line 2405, in action
    raise ValueError("Combo box does not contain item " + repr(value))
ValueError: Combo box does not contain item 'Ancient greek'

PrimozGodec (Collaborator, Author) commented

The issue should be solved in #1034.

@PrimozGodec PrimozGodec marked this pull request as draft January 26, 2024 13:59
PrimozGodec (Collaborator, Author) commented

/rebase

@PrimozGodec PrimozGodec marked this pull request as ready for review February 5, 2024 10:48
PrimozGodec (Collaborator, Author) commented

@VesnaT, I fixed the migrations, so I think this is ready for review.

@VesnaT VesnaT merged commit e36fdea into biolab:master Feb 22, 2024
12 checks passed