fix: fix_avoid_eiweiss_false_positive_for_allergens #9317

Merged

benbenben2
Collaborator

What

The false positive is caused by the underscores: replacing "_Weizen_eiweiß" with "Weizen eiweiß" or "Weizeneiweiß" avoids detecting the eggs allergen.

For other allergens, the following regex seems to work:

			$text
				=~ s/(^| - |_|\(|\[|\)|\]|,|$the|$and|$of|;|\.|$)((\s*)\w.+?)(?=(\s*)(^| - |_|\(|\[|\)|\]|,|$and|;|\.|\b($traces_regexp)\b|$))/replace_allergen_between_separators($language,$product_ref,$1, $2, "",$`)/iesg;

However, for _Weizen_eiweiß it yields 2 allergens. The suggested solution is therefore to add, for German only, a case where the underscore after the allergen is not at the end of the word (that is, to remove the \b of the previous line). Only the \b after the allergen is removed, not the one before it. Both restrictions are there to minimize the probability of false positives.
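
To illustrate why the trailing \b matters, here is a minimal sketch in Python rather than the project's Perl. The two patterns and the helper `marked_allergens` are hypothetical simplifications, not the code changed in this PR. They rely only on the fact that `_` counts as a word character, so there is no word boundary between the closing underscore and a following letter.

```python
import re

# Hypothetical, simplified patterns (assumptions for illustration only).
# Strict: the closing underscore must be followed by a word boundary.
STRICT = re.compile(r"\b_([^,;_()\[\]]+?)_\b")
# Relaxed: no \b after the closing underscore (the German-only case).
RELAXED = re.compile(r"\b_([^,;_()\[\]]+?)_")


def marked_allergens(text, german=False):
    """Return the substrings wrapped in underscores, e.g. _Weizen_."""
    pattern = RELAXED if german else STRICT
    return pattern.findall(text)


if __name__ == "__main__":
    # "_" followed by "," is a word boundary, so the strict pattern matches.
    print(marked_allergens("_Weizen_, Salz"))
    # "_" followed by "e" is not a word boundary, so the strict pattern fails.
    print(marked_allergens("_Weizen_eiweiß"))
    # Without the trailing \b, the marked allergen is found.
    print(marked_allergens("_Weizen_eiweiß", german=True))
```

When the strict pattern fails, the text falls through to the separator-based regex above, where `_` acts as a separator and "eiweiß" is examined on its own.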

Related issue(s) and discussion

@benbenben2 benbenben2 self-assigned this Nov 14, 2023
@benbenben2 benbenben2 requested a review from a team as a code owner November 14, 2023 16:12

sonarcloud bot commented Nov 14, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
No Duplication information

Contributor

@stephanegigandet stephanegigandet left a comment

Thank you!

@stephanegigandet stephanegigandet merged commit 1aacb01 into main Nov 15, 2023
14 checks passed
@stephanegigandet stephanegigandet deleted the fix_avoid_eiweiss_false_positive_for_allergens branch November 15, 2023 08:39
alexgarel pushed a commit that referenced this pull request Nov 21, 2023
fix_avoid_eiweiss_false_positive_for_allergens
@teolemon teolemon added 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis and removed ingredients analysis labels Nov 24, 2023
Labels
Allergens 🥗🔍 Ingredients analysis https://wiki.openfoodfacts.org/Ingredients_Extraction_and_Analysis ingredients 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies 🧪 tests
Development

Successfully merging this pull request may close these issues.

Separate 'Eiweiß' from 'Ei', because 'Eiweiß' does not contain any 'Ei'
3 participants