[ENH] Lemmagen - Use ISO language codes #1025

PrimozGodec · 2023-12-01T15:49:07Z

Issue

This PR is part of #963, which I am splitting into smaller pieces for easier review.
The main motivation is to make Preprocess work with the language stored in the Corpus.

Description of changes

This PR prepares the Lemmagen normalizer to accept and return languages as ISO codes, which is necessary to enable using the language from Corpus (languages are stored in Corpus in ISO format).

After I changed Lemmagen to work with ISO language codes, I also had to adapt the Preprocess widget to store its settings as ISO codes and to call the Lemmagen filter with an ISO language code.

Udpipe and Snowball will be implemented in separate PRs.
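
The change described above can be pictured with a minimal sketch. It is not the code from this PR: the mapping table, the class `IsoLemmatizerSketch`, and the helper `settings_value_to_iso` are hypothetical names chosen for illustration, and the language list is a small assumed subset.

```python
# Hypothetical name-to-ISO table; the real project keeps its own language list.
LANG_TO_ISO = {
    "English": "en",
    "Slovenian": "sl",
    "German": "de",
}
ISO_TO_LANG = {iso: name for name, iso in LANG_TO_ISO.items()}


class IsoLemmatizerSketch:
    """Stand-in for a normalizer that accepts and exposes ISO codes only."""

    supported_languages = set(ISO_TO_LANG)

    def __init__(self, language: str = "en"):
        # Reject anything that is not a known ISO code, including full names.
        if language not in self.supported_languages:
            raise ValueError(f"Unsupported language code: {language!r}")
        self.language = language


def settings_value_to_iso(value: str) -> str:
    """Convert a stored settings value (language name or ISO code) to an ISO code."""
    if value in ISO_TO_LANG:
        return value  # already an ISO code
    return LANG_TO_ISO[value]  # raises KeyError for unknown names


# A widget would store and pass the ISO code, never the display name.
lemmatizer = IsoLemmatizerSketch(settings_value_to_iso("Slovenian"))
```

Keeping the display name only in the GUI layer and the ISO code everywhere else means the normalizer can later take its language directly from a Corpus without any conversion.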

Includes

Code changes
Tests
Documentation

PrimozGodec · 2023-12-01T16:12:37Z

/rebase

VesnaT · 2023-12-11T10:59:40Z

I get the following error message when I open a saved workflow.

I'm attaching the workflow (I reset settings before creating it):
untitled2.ows.zip

PrimozGodec · 2023-12-12T14:25:54Z

@VesnaT, the problem here is that I didn't increase the settings version. Since I increased it in #1024, and it has not been released yet, I think it may not be necessary. What do you think? Increasing the settings version would complicate the implementation of the migrations.

If you are okay with not changing the settings version, you can create the workflow with tag 1.15.0 (git checkout 1.15.0) and open it with this change. It should work.
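
As a rough illustration of why a workflow saved with 1.15.0 should open while one saved from an unreleased build may not, here is a minimal sketch of a version-gated migration. Everything in it is an assumption made for the example: the version number, the flat `lemmagen_language` settings key, and the mapping table are hypothetical and do not reflect the widget's actual settings layout.

```python
from typing import Optional

# Hypothetical subset of language names and their ISO codes.
NAME_TO_ISO = {"English": "en", "Slovenian": "sl"}

# Hypothetical settings version that already contains the ISO migration.
CURRENT_VERSION = 4


def migrate_settings(settings: dict, version: Optional[int]) -> dict:
    """Convert a stored language name to an ISO code for older settings only."""
    if version is None or version < CURRENT_VERSION:
        name = settings.get("lemmagen_language")
        if name in NAME_TO_ISO:
            settings["lemmagen_language"] = NAME_TO_ISO[name]
    # Settings already marked with CURRENT_VERSION are assumed to be migrated
    # and are left untouched, even if they still hold a language name.
    return settings


# Saved with an older release: the migration runs and yields an ISO code.
released = migrate_settings({"lemmagen_language": "Slovenian"}, CURRENT_VERSION - 1)

# Saved from an unreleased build that already carries the new version number:
# the migration is skipped and the language name is passed through unchanged.
unreleased = migrate_settings({"lemmagen_language": "Slovenian"}, CURRENT_VERSION)
```

In this sketch the second case is the one that would fail later, because code expecting an ISO code receives a language name. Bumping the version again would fix it at the cost of a second migration step.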

PrimozGodec added 2 commits December 1, 2023 16:12

Lemmagen - Use ISO language codes instead of names

535286b

Preprocess - Use ISO language codes for Lemmagen

6ee6269

biolab-helper force-pushed the language-normalizers branch from f2fb09f to 6ee6269 Compare December 1, 2023 16:12

VesnaT self-assigned this Dec 8, 2023

VesnaT merged commit 0495fd5 into biolab:master Dec 12, 2023
10 of 12 checks passed

PrimozGodec deleted the language-normalizers branch December 12, 2023 14:37

PrimozGodec mentioned this pull request Dec 13, 2023

[ENH] Snowball - Use ISO language codes #1029

Merged

3 tasks

PrimozGodec mentioned this pull request Dec 22, 2023

[ENH] UDPIPE - Use ISO language codes #1030

Merged

3 tasks
