fix: better support for Japanese additives types (e.g. amino-acids) #9073
taxonomies/additives_classes.txt
Outdated
@@ -221,7 +221,7 @@ ga: Frithocsaídeoir
 hr: antioksidans, antioksidansi, antioksidant, antioksidat, antioksindans
 hu: Antioxidáns, antioxidánsok
 it: Antiossidante, antiossidanti, agente antiossidante, agenti antiossidanti
-ja: 酸化防止剤
+ja: 酸化防止剤,酸化防 止剤
Placing a space in a word does not make sense. They should be corrected instead.
@@ -13814,7 +13815,7 @@ hr:modificirani škrob, modificiran škrob
 hu:Módosított keményítő
 is:umbreytt sterkja
 it:amido modificato, Amidi modificati
-ja:加工デンプン
+ja:加工デンプン,加工 デンプン
Same as 酸化防止剤, placing a space does not make sense.
This synonym with a space should be removed as well.
tests/unit/ingredients.t
Outdated
"ja-additives",
{
lc => "ja",
ingredients_text => "増粘剤(加工デンプン、キサンタン)、酢酸Na、トレハロース、加工 デンプン、グリシン、調味料(アミノ酸等)、酸化防 止剤(V.C, V.E)、着色料(野菜色素)",
This is dubious because Japanese text should not contain spaces.
OK, I found it in some products. I guess the spaces were introduced by the OCR where there was a line feed. I'll remove the spaces.
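One way to handle OCR-introduced spaces without adding spaced synonyms to the taxonomy is to clean the input text before matching. The following is a minimal Python sketch of that idea, not code from this PR or from Product Opener (which is written in Perl). The function name `strip_ocr_spaces` and the character ranges are assumptions chosen for illustration, and the ranges are not exhaustive.

```python
import re

# Hiragana, katakana (including the prolonged sound mark) and common CJK
# ideographs. Assumed ranges for illustration; not an exhaustive definition.
_JA = r"\u3040-\u30ff\u4e00-\u9fff"

# Whitespace (ASCII space, tab, or ideographic space U+3000) that has a
# Japanese character immediately before and after it.
_SPACE_BETWEEN_JA = re.compile(rf"(?<=[{_JA}])[ \t\u3000]+(?=[{_JA}])")


def strip_ocr_spaces(text: str) -> str:
    """Remove whitespace that sits between two Japanese characters."""
    return _SPACE_BETWEEN_JA.sub("", text)


if __name__ == "__main__":
    print(strip_ocr_spaces("加工 デンプン、酸化防 止剤(V.C, V.E)"))
```

Spaces next to Latin characters, such as the one in `V.C, V.E`, are left untouched because only whitespace surrounded by Japanese characters is removed.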
Codecov Report
@@           Coverage Diff           @@
##             main    #9073   +/-   ##
=======================================
  Coverage   47.81%   47.81%
=======================================
  Files          64       64
  Lines       19939    19942    +3
  Branches     4822     4823    +1
=======================================
+ Hits         9534     9536    +2
- Misses       9161     9162    +1
  Partials     1244     1244
Kudos, SonarCloud Quality Gate passed!
@@ -104,7 +104,7 @@
"origins_of_ingredients" : {
"aggregated_origins" : [
{
"epi_score" : 0,
Is this expected?
It's an artefact of the tests. Perl has this strange habit of changing the type of scalars according to the last operation that was done on them, and for some reason we get differences from one test run to another.
LGTM, just a comment about GS1 integration tests that seem irrelevant to this PR
from @Naruyoko:
"
https://openfoodfacts.slack.com/archives/C06A7LENM/p1695487413576439
Explanation of 「調味料」(flavoring?) https://www.hokeniryo.metro.tokyo.lg.jp/shokuhin/shokuten/chomiryo.html
Flavors, as additives, consist of four categories: アミノ酸 (amino acids, e.g. sodium L-aspartate), 核酸 (nucleic acids, e.g. disodium inosinate), 有機酸 (organic acids, e.g. calcium citrate), and 無機塩 (inorganic salts, e.g. potassium chloride). They are labeled in the form 「調味料({category name})」, or 「調味料({dominant category name}等)」 if more than two categories are included.
How can this be parsed? There are several occurrences of アミノ酸等 that are currently unrecognized."
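As a starting point for the parsing question, the label format described in the quote can be matched with a single regular expression. This is a hedged Python sketch, not the approach taken in this PR and not Product Opener's Perl parser. The names `parse_seasoning` and `CATEGORIES` and the shape of the returned dictionary are hypothetical. It assumes the label uses either full-width or ASCII parentheses and that 等 only appears as the trailing "and others" marker.

```python
import re

# The four categories named in the quoted comment, with its English glosses.
CATEGORIES = {
    "アミノ酸": "amino acids",
    "核酸": "nucleic acids",
    "有機酸": "organic acids",
    "無機塩": "inorganic salts",
}

# Matches 調味料(X) or 調味料(X等) with ASCII or full-width parentheses
# (U+FF08 / U+FF09). Group 1 is the category name, group 2 the optional 等.
_SEASONING = re.compile(
    r"調味料[(\uFF08]\s*([^()\uFF08\uFF09等]+?)\s*(等)?\s*[)\uFF09]"
)


def parse_seasoning(label):
    """Return the category found in the first 調味料(...) group, or None."""
    match = _SEASONING.search(label)
    if match is None:
        return None
    name = match.group(1)
    return {
        "category": name,
        "category_en": CATEGORIES.get(name),  # None for unknown names
        "has_others": match.group(2) is not None,  # True when 等 is present
    }


if __name__ == "__main__":
    print(parse_seasoning("グリシン、調味料\uFF08アミノ酸等\uFF09、着色料"))
```

With this shape, アミノ酸等 would resolve to the アミノ酸 category plus a flag saying that other categories are also present, rather than being treated as an unknown ingredient.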