taxonomy: added some stopwords for ingredients in Croatian #7925

Merged
stephanegigandet merged 1 commit into main from hr_add_stopwords_in_ingredients_pm
Jan 2, 2023

benbenben2
Collaborator

I am not sure which label applies.

Added some stopwords to ingredients.pm, based on the following observations:

see lib/ProductOpener/ingredients.pm
    ignore_regexps (-> "information in bold", etc.)
        "u tragovima" (https://hr.openfoodfacts.org/product/3850104295072/kajen%C5%A1ki-papar-mljeveni-vegeta)
  
    may_contain_regexps
        "proizvod može sadržavati" (https://hr.openfoodfacts.org/product/4337185925511/sultaninen-kaufland)
  
    phrases_before_ingredients_list:
        "HR" (https://hr.openfoodfacts.org/product/3859892109134/per%C5%A1in-usitnjeni-%C5%A1afram)
        "HR BiH " (https://hr.openfoodfacts.org/product/3858881086296/guatemala-100-arabica-franck, https://hr.openfoodfacts.org/product/3850104294426/crni-papar-vegeta)
        "HR/BIH " (https://hr.openfoodfacts.org/product/3859892109080/papar-crnu-miljeveni-%C5%A1afram)
        "Sastojci/Sestavine: " (https://hr.openfoodfacts.org/product/5907707051029/vanilija-aroma-dr-oetker)
    phrases_before_ingredients_list_uppercase
    phrases_after_ingredients_list
        "Bez konzervans" (https://hr.openfoodfacts.org/product/4056489332176/vegan-spread-smoked-tofu-vemondo)
        "Čuvati na sobnoj temperaturi." (https://hr.openfoodfacts.org/product/3859890733676/krafna-s-lino-ladom-lino-lada)
        "Čuvati na temp." (https://hr.openfoodfacts.org/product/3859889622677/maslac-sa-cvijeyom-soli-veronika)
        "Najbolje upotrijebiti do" (https://hr.openfoodfacts.org/product/3856015313249/suncokretovo-ulje-zvijezda, https://hr.openfoodfacts.org/product/4337185462399/tomato-paste-double-concentrated-kaufland, https://hr.openfoodfacts.org/product/3858882210010/suncokretovo-ulje-zvijezda)
        "Nakon otvaranja" (https://hr.openfoodfacts.org/product/3859889622394/gr%C4%8Dki-tip-jogurta-veronika)
        "Pakirano u kontroliranoj atmosferi." (https://hr.openfoodfacts.org/product/3858886934578/chips-x-cut-tommy)
        "Pakirano u zaštitnoj atmosferi." (https://hr.openfoodfacts.org/product/4018077773419/crunchips-x-cut-salted)
        "Pasterizirano" (https://hr.openfoodfacts.org/product/3859893027956/%C5%A1ljiva-d%C5%BEem-regina-adriatica, https://hr.openfoodfacts.org/product/3859893027857/malina-extra-d%C5%BEem-regina-adriatica)
        "Proizvođač" (https://hr.openfoodfacts.org/product/5907707051029/vanilija-aroma-dr-oetker, https://hr.openfoodfacts.org/product/3850104223624/glatko-p%C5%A1eni%C4%8Dno-bra%C5%A1no-tip-550-bijelo-podravka)
        "Prosječna hranjiva vrijednost u 100g" (https://hr.openfoodfacts.org/product/3859889622547/ribani-sir-veronika)
        "Prosječne hranjive vrijednosti na 100 g" (https://hr.openfoodfacts.org/product/3850354002055/kiselo-vrhnje-20-mm-dukat)
        "Upotrijebiti do datuma" (https://hr.openfoodfacts.org/product/3850108051919/camembert-sa-%C5%A1arenim-paprom-vindija)
        "Upozorenje" (https://hr.openfoodfacts.org/product/4337185925511/sultaninen-kaufland)
        "Uputa" (https://hr.openfoodfacts.org/product/3858881682290/cetina)
        "Vakuumirana" (https://hr.openfoodfacts.org/product/3850291049311/colombia-100-arabica-franck, https://hr.openfoodfacts.org/product/3858881086296/guatemala-100-arabica-franck)
        "Vrijeme kuhanja: 10-12 minuta." (https://hr.openfoodfacts.org/product/5201013014519/kritharaki-delphi)
        "Zbog mutan i moguće je pojavljivanje taloga." (https://hr.openfoodfacts.org/product/3850131005118/hidra-iso-citrus-zagreba%C4%8Dka-pivovara)
        "Zbog prisutnosti voćnih vlakana" (https://hr.openfoodfacts.org/product/3850131005651/hidra-up-orange-zagreba%C4%8Dka-pivovara)
        "Zemlja porijekla" (https://hr.openfoodfacts.org/product/3859892109134/per%C5%A1in-usitnjeni-%C5%A1afram, https://hr.openfoodfacts.org/product/3859892109080/papar-crnu-miljeveni-%C5%A1afram)
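
Judging from the list names, the "before" phrases mark text that precedes the ingredients list and the "after" phrases mark where it ends. The sketch below illustrates that idea in Python; it is not the Perl code in ingredients.pm. The function name `trim_ingredients_text`, the two constants, and the matching rules are assumptions made for this example, and the phrase lists are small subsets of the ones above.

```python
import re

# Hypothetical subsets of the Croatian phrases listed in this PR.
PHRASES_BEFORE = ["Sastojci/Sestavine:", "HR/BIH", "HR BiH", "HR"]
PHRASES_AFTER = [
    "Najbolje upotrijebiti do",
    "Čuvati na sobnoj temperaturi.",
    "Zemlja porijekla",
    "Proizvođač",
]


def trim_ingredients_text(text: str) -> str:
    """Strip a leading marker phrase and cut the text at the first trailing phrase."""
    # Try longer phrases first so "HR BiH" wins over the shorter "HR".
    for phrase in sorted(PHRASES_BEFORE, key=len, reverse=True):
        # (?!\w) keeps "HR" from matching the start of a word such as "Hrana".
        match = re.match(re.escape(phrase) + r"(?!\w)", text, flags=re.IGNORECASE)
        if match:
            text = text[match.end():].lstrip(" :")
            break

    # Cut at the earliest occurrence of any "after" phrase.
    cut = len(text)
    for phrase in PHRASES_AFTER:
        match = re.search(re.escape(phrase), text, flags=re.IGNORECASE)
        if match:
            cut = min(cut, match.start())

    return text[:cut].strip(" .,;:-")


print(trim_ingredients_text("HR/BIH crni papar mljeveni. Zemlja porijekla: Vijetnam."))
```

Matching the longest "before" phrase first matters here because "HR" is a prefix of both "HR BiH" and "HR/BIH".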

@benbenben2 benbenben2 requested a review from a team as a code owner January 1, 2023 14:19
@benbenben2 benbenben2 self-assigned this Jan 1, 2023
@benbenben2 benbenben2 changed the title added some stopwords for ingredients in Croatian taxonomy: added some stopwords for ingredients in Croatian Jan 1, 2023
@github-actions github-actions bot added the 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis label Jan 1, 2023
@sonarcloud

sonarcloud bot commented Jan 1, 2023

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No Coverage information
No Duplication information

@benbenben2
Collaborator Author

I noticed that for additives, when the text contains "E500 i E503", for example, the list of ingredients becomes "E500i", "E503" in the details of the ingredients analysis (https://hr.openfoodfacts.org/product/3850102522866/moto-kakao-mlijeko-kra%C5%A1).

This happens even though the variable "my %and =" in the ingredients.pm file is populated with "i" for Croatian, and "i" is in the stopwords of the taxonomy file ingredients.txt.

Any idea how to tackle this?
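
One way a symptom like this can arise is a pattern that accepts whitespace between an E-number and a roman-numeral sub-class suffix, as in E500(i) or E500(ii). The Python sketch below reproduces the behaviour with such a pattern and contrasts it with a stricter one. Both patterns and the helper `extract_additives` are hypothetical illustrations, not the Product Opener code, and the actual cause of the behaviour reported above is not established here.

```python
import re

# Lenient pattern (assumption): optional whitespace before the roman-numeral suffix.
LENIENT = re.compile(r"\bE(\d{3,4})\s*(i{1,3}|iv|v)?(?!\w)", re.IGNORECASE)

# Stricter pattern (assumption): the suffix must follow the number directly,
# optionally wrapped in parentheses, e.g. "E500i" or "E500(i)".
STRICT = re.compile(r"\bE(\d{3,4})(?:\(?(i{1,3}|iv|v)\)?)?(?!\w)", re.IGNORECASE)


def extract_additives(pattern: re.Pattern, text: str) -> list:
    """Return normalized additive codes such as 'E500' or 'E500i'."""
    return [
        "E" + m.group(1) + (m.group(2) or "").lower()
        for m in pattern.finditer(text)
    ]


text = "E500 i E503"  # Croatian "i" means "and"
print(extract_additives(LENIENT, text))  # the conjunction is absorbed as a suffix
print(extract_additives(STRICT, text))   # the conjunction is left alone
```

With the lenient pattern the conjunction "i" is read as the sub-class suffix of E500. With the strict pattern "E500 i E503" yields two plain codes, while "E500(i)" and "E500ii" are still recognized.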

@stephanegigandet
Contributor

> I noticed that for additives, when we find "E500 i E503", for example, then, the list of ingredients becomes "E500i", "E503" in the Details of the analysis of the ingredients (https://hr.openfoodfacts.org/product/3850102522866/moto-kakao-mlijeko-kra%C5%A1).

I filed a bug for it: #7927

Contributor

@stephanegigandet stephanegigandet left a comment

Thanks!

@stephanegigandet stephanegigandet merged commit 5311817 into main Jan 2, 2023
@stephanegigandet stephanegigandet deleted the hr_add_stopwords_in_ingredients_pm branch January 2, 2023 11:30
@teolemon teolemon added the 🇭🇷 Croatia https://hr.openfoodfacts.org/ label Feb 11, 2023
Labels
🇭🇷 Croatia https://hr.openfoodfacts.org/ 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis 🥗 Ingredients