Quality facet - Nutrition - Flag products where 3 or more nutritional values are identical #3554

Closed
1 task
Tracked by #10273
teolemon opened this issue Jun 4, 2020 · 3 comments · Fixed by #8109
Labels
  • 🧽 Data quality - Measure - Quality facets: One of the facets available in Open Food Facts is /quality & allows us to spot products w/ bad data
  • 🧽 Data quality - Nutrition
  • 🧽 Data quality: https://wiki.openfoodfacts.org/Quality
  • ✨ Feature: Features or enhancements to Open Food Facts server

Comments

@teolemon
Member

teolemon commented Jun 4, 2020

What

  • Flag products where 3 or more nutritional values are identical. This is a clear sign that someone entered bogus data.

https://dk-en.openfoodfacts.org/cgi/product.pl?type=edit&code=3760265420094

Part of

@teolemon teolemon added ✨ Feature Features or enhancements to Open Food Facts server 🧽 Data quality https://wiki.openfoodfacts.org/Quality 🧽 Data quality - Measure - Quality facets One of the facets available in Open Food Facts is /quality & allows us to spot products w/ bad data labels Jun 4, 2020
@teolemon teolemon changed the title Flag products where 3 or more nutritional values are identical Quality facet - Nutrition - Flag products where 3 or more nutritional values are identical Oct 11, 2021
@benbenben2
Collaborator

Sharing some thoughts about this open issue.

3 identical values is probably not enough: it is quite easy to find valid products with 3 identical values in the nutrition table.

4 also seems not enough.

Even 5 identical values can be correct:

when almost all values are identical:

Maybe the alert could be set for 6 identical values: fat_100g, saturated-fat_100g, carbohydrates_100g, sugars_100g, proteins_100g, salt_100g.
This leaves out fiber, which is often missing, and sodium, which can be either identical to salt or automatically converted from the salt value.
That rule would flag ~25 000 products.
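
For illustration, the six-value rule described above could be sketched roughly as follows. This is only a sketch in Python, not the Open Food Facts server code: the field names come from the comment, while the `nutriments` dict layout and the function name `all_six_identical` are assumptions made for the example.

```python
# Field names are taken from the comment above; everything else is hypothetical.
SIX_FIELDS = (
    "fat_100g",
    "saturated-fat_100g",
    "carbohydrates_100g",
    "sugars_100g",
    "proteins_100g",
    "salt_100g",
)


def all_six_identical(nutriments: dict) -> bool:
    """Return True when all six fields are present and share one value."""
    values = []
    for field in SIX_FIELDS:
        raw = nutriments.get(field)
        if raw is None or raw == "":
            # A missing field means we cannot claim "6 identical values".
            return False
        try:
            values.append(float(raw))
        except (TypeError, ValueError):
            # Non-numeric input is not compared.
            return False
    return len(set(values)) == 1
```

Fiber and sodium are deliberately absent from `SIX_FIELDS`, so their presence or absence does not affect the result.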

@alexgarel
Member

@benbenben2 what about ignoring some typical values in this comparison (like 0, 0.5, 0.1, 1, 0.01)? These values are typically entered to mean "small enough".
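
A minimal sketch of that idea, assuming the nutrition values are already available as a list of numbers. The set of ignored values is the one listed in the comment; the function name `max_repeats_ignoring_placeholders` is hypothetical.

```python
from collections import Counter

# Values suggested above as typically meaning "small enough".
PLACEHOLDER_VALUES = {0.0, 0.01, 0.1, 0.5, 1.0}


def max_repeats_ignoring_placeholders(values) -> int:
    """Return the highest repeat count among values that are not placeholders."""
    counts = Counter(
        float(v) for v in values if float(v) not in PLACEHOLDER_VALUES
    )
    # default=0 covers the case where every value was a placeholder.
    return max(counts.values(), default=0)
```

A product would then be flagged when the returned count reaches whatever threshold is chosen (3, 6, ...), so repeated 0 or 0.5 entries alone never trigger the alert.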

@benbenben2
Collaborator

Yes! Good point.
That could make 2 different quality facets:

  • 6 identical values (most of the time intentionally done, together with a non-food picture)
  • 3 identical values where the value is above 1 (to avoid 0.5 or 1 entered for "less than 0.5" or "less than 1"). Some examples.
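
The two facets above might be computed together along these lines. This is a sketch under stated assumptions: the tag names are invented for the example, and restricting the second check to the same six fields is a guess, since the comment does not say which fields it would cover.

```python
from collections import Counter

# Same six fields as proposed earlier in the thread.
CORE_FIELDS = (
    "fat_100g",
    "saturated-fat_100g",
    "carbohydrates_100g",
    "sugars_100g",
    "proteins_100g",
    "salt_100g",
)


def quality_tags(nutriments: dict) -> list:
    """Return hypothetical quality tags for one product's nutriments."""
    values = [
        float(nutriments[f])
        for f in CORE_FIELDS
        if nutriments.get(f) not in (None, "")
    ]
    tags = []
    # Facet 1: all six fields are present and identical.
    if len(values) == len(CORE_FIELDS) and len(set(values)) == 1:
        tags.append("nutrition-6-identical-values")  # hypothetical tag name
    # Facet 2: some value above 1 appears at least 3 times.
    counts = Counter(v for v in values if v > 1)
    if any(n >= 3 for n in counts.values()):
        tags.append("nutrition-3-identical-values-above-1")  # hypothetical tag name
    return tags
```

With this split, a product whose six values are all 0.5 gets only the first tag, while a product with three values of 5 and otherwise distinct values gets only the second.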

Projects
Archived in project
3 participants