fix: better support for Japanese additives types (e.g. amino-acids) (#…
stephanegigandet authored Sep 28, 2023
1 parent 0abdc97 commit 864cf2c
Showing 14 changed files with 304 additions and 38 deletions.
9 changes: 8 additions & 1 deletion lib/ProductOpener/Tags.pm
@@ -879,7 +879,14 @@ sub remove_stopwords ($tagtype, $lc, $tagid) {

my $regexp = $stopwords_regexps{$tagtype . '.' . $lc};

- $tagid =~ s/(^|-)($regexp)(-($regexp))*(-|$)/-/g;
+ # In Japanese, do not require a word boundary, and do not introduce a hyphen
+ if ($lc eq 'ja') {
+     $tagid =~ s/$regexp//g;
+ }
+ # In other languages, require a word boundary, and replace stopwords with a hyphen
+ else {
+     $tagid =~ s/(^|-)($regexp)(-($regexp))*(-|$)/-/g;
+ }

$tagid =~ tr/-/-/s;
$tagid =~ s/^-//;
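
The difference between the two branches in this hunk can be tried out with a minimal standalone sketch. It is not the ProductOpener code: the function name `remove_stopwords_sketch` and the hand-written contents of `%stopwords_regexps` are assumptions for illustration (the stopwords are taken from the `stopwords:ja` and `stopwords:it` lines in taxonomies/ingredients.txt), and only the cleanup steps visible in the hunk are reproduced.

```perl
use strict;
use warnings;
use utf8;    # the source file contains Japanese string literals

# Hypothetical stopword patterns, keyed by "tagtype.lc" as in the hunk.
# In ProductOpener these are built from the taxonomy files; here they
# are written by hand for illustration only.
my %stopwords_regexps = (
    'ingredients.ja' => '等|その他',
    'ingredients.it' => 'contiene|nella',
);

sub remove_stopwords_sketch {
    my ($tagtype, $lc, $tagid) = @_;

    my $regexp = $stopwords_regexps{$tagtype . '.' . $lc};
    return $tagid unless defined $regexp;

    if ($lc eq 'ja') {
        # Japanese: no word boundary required, no hyphen introduced
        $tagid =~ s/$regexp//g;
    }
    else {
        # Other languages: stopwords must be delimited by hyphens or by
        # the start/end of the tag id, and are replaced with one hyphen
        $tagid =~ s/(^|-)($regexp)(-($regexp))*(-|$)/-/g;
    }

    # Cleanup steps shown in the hunk: squeeze repeated hyphens,
    # then strip a leading hyphen
    $tagid =~ tr/-/-/s;
    $tagid =~ s/^-//;

    return $tagid;
}

binmode(STDOUT, ':encoding(UTF-8)');
print remove_stopwords_sketch('ingredients', 'ja', 'アミノ酸等'), "\n";
print remove_stopwords_sketch('ingredients', 'it', 'latte-nella-crema'), "\n";
```

Because the Japanese branch matches anywhere in the tag id, a stopword that is also a substring of a real ingredient name would be stripped from that name as well, which is what the comment added to the `stopwords:ja` line in taxonomies/ingredients.txt warns about.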
1 change: 1 addition & 0 deletions lib/ProductOpener/Test.pm
@@ -358,6 +358,7 @@ sub compare_to_expected_results ($object_ref, $expected_results_file, $update_ex
my $pretty_json = $json->pretty->encode($object_ref);
print $result $pretty_json;
close($result);
+ ok(1, "Updated $expected_results_file");
}
else {
# Compare the result with the expected result
21 changes: 21 additions & 0 deletions taxonomies/additives.txt
@@ -24464,3 +24464,24 @@ hu:kálium-jodid
wikipedia:en:https://en.wikipedia.org/wiki/Potassium_iodide
wikidata:en:Q121874

+ # Japanese additives can be listed only with their type (e.g. amino acids), without the specific additive name
+
+ # Explanation of 「調味料」(flavoring?) https://www.hokeniryo.metro.tokyo.lg.jp/shokuhin/shokuten/chomiryo.html
+ # Flavors, as additives, consists of 4 categories:
+ # アミノ酸 (amino acids, e.g. sodium L-aspartate),
+ # 核酸 (nucleic acids, e.g. disodium inosinate),
+ # 有機酸 (organic acids, e.g. calcium citrate),
+ # 無機塩 (inorganic salts, e.g. potassium chloride).
+ # They are labeled in form of 「調味料({category name})」, or 「調味料({dominant category name}等)」 if more than two categories are included.
+
+ en:Amino acids
+ ja:アミノ酸, アミノ酸等
+
+ en:Nucleic acids
+ ja:核酸
+
+ en:Organic acids
+ ja:有機酸
+
+ en:Inorganic salts
+ ja:無機塩
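
The labeling convention described in the taxonomy comment, 「調味料(category)」 or 「調味料(category等)」, can be illustrated with a small parsing sketch. This is not how ProductOpener parses ingredient lists: the function `parse_seasoning_label`, the `%category_en` table, and the returned hash keys are hypothetical names introduced for this example, and accepting both ASCII and full-width parentheses is an assumption.

```perl
use strict;
use warnings;
use utf8;    # the source file contains Japanese string literals

# The four categories named in the taxonomy comment, with the English
# names used by the new taxonomy entries (lowercased here).
my %category_en = (
    'アミノ酸' => 'amino acids',
    '核酸'     => 'nucleic acids',
    '有機酸'   => 'organic acids',
    '無機塩'   => 'inorganic salts',
);

# Hypothetical helper: split a label such as 「調味料(アミノ酸等)」 into
# its category and a flag telling whether 等 ("etc.") was present.
sub parse_seasoning_label {
    my ($label) = @_;

    # Assumption: both ASCII and full-width parentheses may occur
    return unless $label =~ /^調味料[((](.+?)(等)?[))]$/;
    my ($category, $others) = ($1, defined $2 ? 1 : 0);

    return unless exists $category_en{$category};
    return {category => $category_en{$category}, others => $others};
}

my $parsed = parse_seasoning_label('調味料(アミノ酸等)');
print "$parsed->{category}, others: $parsed->{others}\n";
```

In the commit itself no such parsing is needed for this case, since 「アミノ酸等」 is listed directly as a synonym of 「アミノ酸」 in the taxonomy entry.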
9 changes: 6 additions & 3 deletions taxonomies/ingredients.txt
@@ -87,6 +87,7 @@ stopwords:hu:tartalmaz, változó arányban, min, zsírtartalom, összetevő, ö
stopwords:id:mengandung
stopwords:is:úr
stopwords:it:contiene, nella
+ # Japanese stopwords are matched without word boundaries, do not put as stopwords characters or words that could be part of an actual ingredient entry
stopwords:ja:等, その他
stopwords:lt:iš, su, su pridėtiniu, mažiausiai, įskaitant
stopwords:lv:no
@@ -13796,7 +13797,7 @@ vegetarian:en:yes
# usage:fr:fécules (dont blé)

<en:starch
- en:modified starch, modified starches, modified food starch, modified food starches, food starch modified
+ en:modified starch, modified starches, modified food starch, modified food starches, food starch modified, processed starch
ar:نشا معدل
bg:Модифицирано нишесте, модифицирани нишестета, модифицирана скорбяла
ca:Midó modificat, Midons modificats, Fècula modificada, fècules modificades
@@ -13814,7 +13815,7 @@ hr:modificirani škrob, modificiran škrob
hu:Módosított keményítő
is:umbreytt sterkja
it:amido modificato, Amidi modificati
- ja:加工デンプン
+ ja:加工デンプン,加工 デンプン
lt:modifikuotas krakmolas
lv:Modificēta ciete
mt:Lamtu mmodifikat
@@ -88019,4 +88020,6 @@ ja:クリーミングパウダー
en:creaming agent
fr:agent de crémage

-
+ en:vegetable pigment, vegetable colour
+ fr:colorant végétal
+ ja:野菜色素
2 changes: 1 addition & 1 deletion taxonomies/labels.txt
@@ -910,7 +910,7 @@ cs:Fair trade
de:Fairer Handel
es:Comercio justo, Fairtrade-Comercio Justo
fi:Reilu kauppa
- fr:Commerce équitable, équitable, issu du commerce équitable, issus du commerce équitable, issue du commerce équitable, issues du commerce équitable, ingrédients issus du commerce équitable, produits issus du commerce équitable, ingrédient issu du commerce équitable, Ingrédients conformes aux standards du commerce équitable Fairtrade/Max Havelaar, ingrédients conformes aux standards du commerce équitable
+ fr:Commerce équitable, équitable, issu du commerce équitable, issus du commerce équitable, issue du commerce équitable, issues du commerce équitable, ingrédients issus du commerce équitable, produits issus du commerce équitable, ingrédient issu du commerce équitable, Ingrédients conformes aux standards du commerce équitable Fairtrade/Max Havelaar, ingrédients conformes aux standards du commerce équitable, 100% du total des ingrédients d'origine agricole sont conformes aux standards du commerce équitable
he:סחר הוגן
hu:Méltányos kereskedelem, Fair trade, becsületes kereskedelem
it:Commercio equo
24 changes: 12 additions & 12 deletions taxonomies/nutrient_levels.txt
@@ -34,7 +34,7 @@ cu:Fat in low quantity
cv:Fat in low quantity
cy:Braster in low quantity
da:Fedt i lav mængde
- de:Fett in geringen Mengen
+ de:Fett in geringer Menge
dv:Fat in low quantity
dz:Fat in low quantity
ee:Fat in low quantity
@@ -224,7 +224,7 @@ cu:Fat in moderate quantity
cv:Fat in moderate quantity
cy:Braster in moderate quantity
da:Fedt i moderat mængde
- de:Fett in moderaten Mengen
+ de:Fett in moderater Menge
dv:Fat in moderate quantity
dz:Fat in moderate quantity
ee:Fat in moderate quantity
@@ -414,7 +414,7 @@ cu:Fat in high quantity
cv:Fat in high quantity
cy:Braster in high quantity
da:Fedt i høj mængde
- de:Fett in hohe Menge
+ de:Fett in hoher Menge
dv:Fat in high quantity
dz:Fat in high quantity
ee:Fat in high quantity
@@ -604,7 +604,7 @@ cu:Saturated fat in low quantity
cv:Saturated fat in low quantity
cy:Saturated fat in low quantity
da:Mættede fedtsyrer i lav mængde
- de:Gesättigte Fettsäuren in geringen Mengen
+ de:Gesättigte Fettsäuren in geringer Menge
dv:ފެޓް in low quantity
dz:Saturated fat in low quantity
ee:Saturated fat in low quantity
@@ -794,7 +794,7 @@ cu:Saturated fat in moderate quantity
cv:Saturated fat in moderate quantity
cy:Saturated fat in moderate quantity
da:Mættede fedtsyrer i moderat mængde
- de:Gesättigte Fettsäuren in moderaten Mengen
+ de:Gesättigte Fettsäuren in moderater Menge
dv:ފެޓް in moderate quantity
dz:Saturated fat in moderate quantity
ee:Saturated fat in moderate quantity
@@ -984,7 +984,7 @@ cu:Saturated fat in high quantity
cv:Saturated fat in high quantity
cy:Saturated fat in high quantity
da:Mættede fedtsyrer i høj mængde
- de:Gesättigte Fettsäuren in hohe Menge
+ de:Gesättigte Fettsäuren in hoher Menge
dv:ފެޓް in high quantity
dz:Saturated fat in high quantity
ee:Saturated fat in high quantity
@@ -1174,7 +1174,7 @@ cu:Sugars in low quantity
cv:Сахăр in low quantity
cy:Siwgr in low quantity
da:Sukkerarter i lav mængde
- de:Zucker in geringen Mengen
+ de:Zucker in geringer Menge
dv:Sugars in low quantity
dz:Sugars in low quantity
ee:Sugars in low quantity
@@ -1364,7 +1364,7 @@ cu:Sugars in moderate quantity
cv:Сахăр in moderate quantity
cy:Siwgr in moderate quantity
da:Sukkerarter i moderat mængde
- de:Zucker in moderaten Mengen
+ de:Zucker in moderater Menge
dv:Sugars in moderate quantity
dz:Sugars in moderate quantity
ee:Sugars in moderate quantity
@@ -1554,7 +1554,7 @@ cu:Sugars in high quantity
cv:Сахăр in high quantity
cy:Siwgr in high quantity
da:Sukkerarter i høj mængde
- de:Zucker in hohe Menge
+ de:Zucker in hoher Menge
dv:Sugars in high quantity
dz:Sugars in high quantity
ee:Sugars in high quantity
@@ -1744,7 +1744,7 @@ cu:Salt in low quantity
cv:Salt in low quantity
cy:Halen in low quantity
da:Salt i lav mængde
- de:Salz in geringen Mengen
+ de:Salz in geringer Menge
dv:Salt in low quantity
dz:Salt in low quantity
ee:Salt in low quantity
@@ -1934,7 +1934,7 @@ cu:Salt in moderate quantity
cv:Salt in moderate quantity
cy:Halen in moderate quantity
da:Salt i moderat mængde
- de:Salz in moderaten Mengen
+ de:Salz in moderater Menge
dv:Salt in moderate quantity
dz:Salt in moderate quantity
ee:Salt in moderate quantity
@@ -2124,7 +2124,7 @@ cu:Salt in high quantity
cv:Salt in high quantity
cy:Halen in high quantity
da:Salt i høj mængde
- de:Salz in hohe Menge
+ de:Salz in hoher Menge
dv:Salt in high quantity
dz:Salt in high quantity
ee:Salt in high quantity
@@ -104,7 +104,7 @@
"origins_of_ingredients" : {
"aggregated_origins" : [
{
- "epi_score" : 0,
+ "epi_score" : "0",
"origin" : "en:unknown",
"percent" : 100,
"transportation_score" : null
@@ -104,7 +104,7 @@
"origins_of_ingredients" : {
"aggregated_origins" : [
{
- "epi_score" : 0,
+ "epi_score" : "0",
"origin" : "en:unknown",
"percent" : 100,
"transportation_score" : null
@@ -104,7 +104,7 @@
"origins_of_ingredients" : {
"aggregated_origins" : [
{
- "epi_score" : 0,
+ "epi_score" : "0",
"origin" : "en:unknown",
"percent" : 100,
"transportation_score" : null
@@ -104,7 +104,7 @@
"origins_of_ingredients" : {
"aggregated_origins" : [
{
- "epi_score" : 0,
+ "epi_score" : "0",
"origin" : "en:unknown",
"percent" : 100,
"transportation_score" : null
@@ -428,7 +428,7 @@
"origins_of_ingredients" : {
"aggregated_origins" : [
{
- "epi_score" : 0,
+ "epi_score" : "0",
"origin" : "en:unknown",
"percent" : 100,
"transportation_score" : null