fix: better support for Japanese additives types (e.g. amino-acids) #9073

Merged
stephanegigandet merged 7 commits into main from ja-additives on Sep 28, 2023

Conversation

Contributor

@stephanegigandet stephanegigandet commented Sep 26, 2023

from @Naruyoko:

"
https://openfoodfacts.slack.com/archives/C06A7LENM/p1695487413576439

Explanation of 「調味料」(flavoring?) https://www.hokeniryo.metro.tokyo.lg.jp/shokuhin/shokuten/chomiryo.html
Flavors, as additives, consist of 4 categories: アミノ酸 (amino acids, e.g. sodium L-aspartate), 核酸 (nucleic acids, e.g. disodium inosinate), 有機酸 (organic acids, e.g. calcium citrate), and 無機塩 (inorganic salts, e.g. potassium chloride). They are labeled in the form 「調味料({category name})」, or 「調味料({dominant category name}等)」 if more than two categories are included.
How can this be parsed? There are several occurrences of アミノ酸等 that are currently unrecognized."
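
One possible way to recognize the label form described above is a small pattern match. This is only a sketch in Python (the project code is Perl), and it is not how this PR solves the problem: the function name `parse_seasoning`, the `SeasoningLabel` type, and the English glosses are all hypothetical and chosen for illustration.

```python
import re
from typing import NamedTuple, Optional

# The four category names quoted above, mapped to English glosses.
CATEGORIES = {
    "アミノ酸": "amino acids",
    "核酸": "nucleic acids",
    "有機酸": "organic acids",
    "無機塩": "inorganic salts",
}


class SeasoningLabel(NamedTuple):
    category: str     # English gloss of the (dominant) category
    has_others: bool  # True when the label ends with 等 ("etc.")


# Accept both ASCII () and full-width （） parentheses (an assumption
# about what appears on real labels).
_PATTERN = re.compile(
    r"調味料[(（](" + "|".join(map(re.escape, CATEGORIES)) + r")(等?)[)）]"
)


def parse_seasoning(text: str) -> Optional[SeasoningLabel]:
    """Return the first 調味料(...) label found in text, or None."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return SeasoningLabel(CATEGORIES[match.group(1)], match.group(2) == "等")
```

For example, `parse_seasoning("グリシン、調味料(アミノ酸等)")` yields the amino-acids category with `has_others` set, while a text without any 調味料 label yields `None`.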

  • added new "additives types" like amino-acids in the additives.txt taxonomy
  • changed the way we handle stopwords in Japanese: do not require a word boundary (e.g. a space) as there are usually none
  • added tests
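
The stopword change in the second bullet can be illustrated with a minimal sketch. It is written in Python rather than the project's Perl, and the stopword lists, the `NO_SPACE_LANGUAGES` set, and the `strip_stopwords` function are hypothetical names used only for this example; the actual change lives in `lib/ProductOpener/Tags.pm`.

```python
import re

# Hypothetical stopword lists, for illustration only.
STOPWORDS = {
    "en": ["and others", "etc"],
    "ja": ["等"],
}

# Languages usually written without spaces between words (assumption:
# only Japanese is listed here).
NO_SPACE_LANGUAGES = {"ja"}


def strip_stopwords(text: str, lang: str) -> str:
    """Remove stopwords, requiring word boundaries only where they exist."""
    # Longest first, so multi-word stopwords win over their prefixes.
    words = sorted(STOPWORDS.get(lang, []), key=len, reverse=True)
    if not words:
        return text
    alternation = "|".join(re.escape(word) for word in words)
    if lang in NO_SPACE_LANGUAGES:
        # No \b: in アミノ酸等 there is no boundary between 酸 and 等.
        pattern = alternation
    else:
        pattern = r"\b(?:" + alternation + r")\b"
    stripped = re.sub(pattern, "", text)
    # Collapse any whitespace left behind by the removal.
    return re.sub(r"\s+", " ", stripped).strip()
```

With this sketch, `strip_stopwords("アミノ酸等", "ja")` gives `アミノ酸`, whereas a pattern such as `\b等\b` would not match at all, because both 酸 and 等 count as word characters. The trade-off is that a boundary-free match can also hit a stopword character inside a longer word, so the stopword list for such languages needs to be chosen with care.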

@github-actions github-actions bot added the Tags label Sep 26, 2023
@@ -221,7 +221,7 @@ ga: Frithocsaídeoir
 hr: antioksidans, antioksidansi, antioksidant, antioksidat, antioksindans
 hu: Antioxidáns, antioxidánsok
 it: Antiossidante, antiossidanti, agente antiossidante, agenti antiossidanti
-ja: 酸化防止剤
+ja: 酸化防止剤,酸化防 止剤
Contributor

Placing a space in a word does not make sense. They should be corrected instead.

@@ -13814,7 +13815,7 @@ hr:modificirani škrob, modificiran škrob
 hu:Módosított keményítő
 is:umbreytt sterkja
 it:amido modificato, Amidi modificati
-ja:加工デンプン
+ja:加工デンプン,加工 デンプン
Contributor

Same as 酸化防止剤, placing a space does not make sense.

Contributor

This synonym with a space should be removed as well.

"ja-additives",
{
lc => "ja",
ingredients_text => "増粘剤(加工デンプン、キサンタン)、酢酸Na、トレハロース、加工 デンプン、グリシン、調味料(アミノ酸等)、酸化防 止剤(V.C, V.E)、着色料(野菜色素)",
Contributor

This is dubious because Japanese text should not contain spaces.

Contributor Author

OK, I found these spaces in some products. I guess they were introduced by the OCR where there was a line feed. I'll remove the spaces.
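
If such OCR-introduced spaces had to be cleaned from the input text rather than tolerated as taxonomy synonyms, one option would be to drop whitespace that sits between two Japanese characters. The following is a sketch in Python under stated assumptions, not part of this PR: the `remove_ocr_spaces` name is hypothetical, and the character ranges cover only hiragana, katakana, and the main CJK ideograph block.

```python
import re

# Hiragana, katakana and the main CJK ideograph block. This is an
# assumption that covers the examples in this thread, not every
# character that can appear in Japanese text.
_JA = r"[\u3040-\u30ff\u4e00-\u9fff]"

# One or more ASCII or ideographic spaces with a Japanese character on
# each side; the lookarounds keep the neighbouring characters in place.
_SPACE_BETWEEN_JA = re.compile(rf"(?<={_JA})[ \u3000]+(?={_JA})")


def remove_ocr_spaces(text: str) -> str:
    """Remove spaces between Japanese characters, leave others alone."""
    return _SPACE_BETWEEN_JA.sub("", text)
```

Spaces next to Latin characters or punctuation, such as the one in `V.C, V.E`, are left untouched, since only spaces with a Japanese character on both sides are removed.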

@codecov-commenter

codecov-commenter commented Sep 27, 2023

Codecov Report

Merging #9073 (f658df0) into main (2d4b2f3) will increase coverage by 0.00%.
The diff coverage is 75.00%.

@@           Coverage Diff           @@
##             main    #9073   +/-   ##
=======================================
  Coverage   47.81%   47.81%           
=======================================
  Files          64       64           
  Lines       19939    19942    +3     
  Branches     4822     4823    +1     
=======================================
+ Hits         9534     9536    +2     
- Misses       9161     9162    +1     
  Partials     1244     1244           
Files                       Coverage Δ
lib/ProductOpener/Tags.pm   41.16% <100.00%> (+0.06%) ⬆️
lib/ProductOpener/Test.pm   72.64% <0.00%> (-0.32%) ⬇️


@stephanegigandet stephanegigandet changed the title from "fix: better support for Japanese additives types (e.g. amino-acids) - WIP" to "fix: better support for Japanese additives types (e.g. amino-acids)" Sep 27, 2023
@github-actions github-actions bot added the 💥 Merge Conflicts label Sep 27, 2023
@sonarcloud

sonarcloud bot commented Sep 28, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No Coverage information
No Duplication information

@github-actions github-actions bot removed the 💥 Merge Conflicts label Sep 28, 2023
@@ -104,7 +104,7 @@
"origins_of_ingredients" : {
"aggregated_origins" : [
{
"epi_score" : 0,
Contributor

is this expected?

Contributor Author

It's an artefact of the tests. Perl has a strange habit of changing the type of a scalar according to the last operation performed on it, and for some reason we get differences from one test run to another.

Contributor

@raphael0202 raphael0202 left a comment

LGTM, just a comment about GS1 integration tests that seem irrelevant to this PR

@stephanegigandet stephanegigandet merged commit 864cf2c into main Sep 28, 2023
13 checks passed
@stephanegigandet stephanegigandet deleted the ja-additives branch September 28, 2023 14:59
@teolemon teolemon added the 🇯🇵 Japan https://jp.openfoodfacts.org/ label May 31, 2024
Labels
🧪 additives 🥗 Ingredients 🇯🇵 Japan https://jp.openfoodfacts.org/ labels Tags 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies 🧪 tests
5 participants