feat: Categories taxonomy improvements for Wikidata and IGPs #6196

Merged: 58 commits, Jan 5, 2022

@teolemon (Member) commented Dec 14, 2021

What

  • A lot of fixes to the Categories taxonomies (typos…)
  • A lot of Wikidata additions, so that we can show links to Wikipedia articles and other magic things
  • Some improvements on the IGP front, with a normalization from countries to origins to enable automatic addition of origins (Tag rules: Tag A + Tab B = Tag C #3426)

PS: I'm really sorry that the changes are too big for the web editor :-/

Part of

@teolemon teolemon marked this pull request as draft December 15, 2021 13:11
@teolemon (Member, Author) commented Dec 15, 2021

Errors in the categories taxonomy definition:

  • ERROR - it:marrone-del-mugello has an undefined parent
  • ERROR - it:marrone-della-valle-di-susa has an undefined parent
  • ERROR - it:marrone-di-caprese-michelangelo has an undefined parent
  • ERROR - it:marrone-di-castel-del-rio has an undefined parent
  • ERROR - it:marrone-di-combai has an undefined parent
  • ERROR - it:marrone-di-roccadaspide has an undefined parent
  • ERROR - it:marrone-di-san-zeno has an undefined parent
  • ERROR - it:marrone-di-serino has an undefined parent
  • ERROR - it:peperone-di-pontecorvo has an undefined parent
  • ERROR - it:peperone-di-senise has an undefined parent
  • ERROR - it:clementine-di-calabria has an undefined parent
  • ERROR - it:pera-dell-emilia-romagna has an undefined parent
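
For readers who want to reproduce this kind of check locally, here is a minimal Python sketch of an "undefined parent" check. It is not the project's actual build code. It assumes a simplified reading of the taxonomy format (entries separated by blank lines, parents written as `<xx:Name`, names and synonyms as `xx:Name, Synonym`), and `canonical_id`, `parse_entries` and `undefined_parents` are hypothetical helper names with a simplified normalization.

```python
import re

# Assumption: a language line looks like "xx:Name, Synonym"; property lines
# such as "wikidata:en:..." or "origins:en: ..." have longer prefixes.
LANG_LINE = re.compile(r"^([a-z]{2}):\s*(.+)$")


def canonical_id(lang, name):
    """Simplified normalization (hypothetical): lowercase, hyphen-separated."""
    slug = re.sub(r"[\W_]+", "-", name.strip().lower()).strip("-")
    return f"{lang}:{slug}"


def parse_entries(text):
    """Yield (entry_id, names, parents) for each blank-line-separated entry."""
    for block in re.split(r"\n\s*\n", text.strip()):
        entry_id, names, parents = None, set(), []
        for raw in block.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            is_parent = line.startswith("<")
            match = LANG_LINE.match(line[1:].strip() if is_parent else line)
            if not match:
                continue  # property line, or something this sketch does not model
            lang, value = match.groups()
            if is_parent:
                parents.append(canonical_id(lang, value))
                continue
            ids = [canonical_id(lang, part) for part in value.split(",") if part.strip()]
            names.update(ids)
            if entry_id is None and ids:
                entry_id = ids[0]  # first name of the first language line
        if entry_id:
            yield entry_id, names, parents


def undefined_parents(text):
    """Return (entry_id, parent_id) pairs whose parent matches no known name."""
    entries = list(parse_entries(text))
    known = set().union(*(names for _, names, _ in entries))
    return [(eid, p) for eid, _, parents in entries for p in parents if p not in known]
```

Calling `undefined_parents` on the text of a taxonomy file returns one pair per dangling `<xx:Name` reference, which corresponds to one ERROR line in the list above.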

@teolemon (Member, Author) commented Dec 18, 2021

Errors in the categories taxonomy definition:

  • ERROR - it:peperone-di-pontecorvo has an undefined parent
  • ERROR - it:peperone-di-senise has an undefined parent
  • ERROR - it:clementine-di-calabria has an undefined parent
  • ERROR - it:pera-dell-emilia-romagna has an undefined parent

@teolemon teolemon marked this pull request as ready for review December 26, 2021 12:21
@teolemon teolemon requested a review from a team as a code owner December 26, 2021 12:21
@teolemon teolemon changed the title More IGPs Categories taxonomy improvements for Wikidata and IGPs Dec 26, 2021
@teolemon teolemon added categories 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies WikiData We link taxonomies to Wikidata (& back) - https://wiki.openfoodfacts.org/Structured_Data/Wikidata labels Dec 26, 2021
@teolemon teolemon changed the title Categories taxonomy improvements for Wikidata and IGPs feat: Categories taxonomy improvements for Wikidata and IGPs Dec 28, 2021
@stephanegigandet (Contributor) commented

<en:Colas
en:Cola with sugar and artificial sweetener, Cola with sugar and artificial sweeteners
@@ -4632,6 +4964,7 @@ fr:Sodas au cola sans caféine
it:Bibite alla cola senza caffeina, Cole senza caffeina
nl:Caffeinevrije colas
ro:Cola fără cafeină
+#wikidata:en:

The 2988 empty wikidata entries make the PR a pain to review. If you really want to add them, I suggest first removing them, filing a PR, getting it merged, and then adding back the empty entries, so that we don't have them in the middle of the other changes.

The same strategy could be used for things where one field is renamed to another. Otherwise it's very hard to tell the manual changes (the ones that would be useful to review) apart from the rest.
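
As an illustration of the first step suggested here, the following Python sketch strips empty placeholder lines such as `#wikidata:en:` from a taxonomy text so that they can go into a separate PR. The function name `strip_empty_property_lines` and the exact line pattern are assumptions based on the diff excerpt above, not an existing project script.

```python
import re


def strip_empty_property_lines(text, prop="wikidata"):
    """Remove lines like '#wikidata:en:' or 'wikidata:en:' that carry no value.

    Returns the cleaned text and the number of removed lines.
    Assumption: a placeholder is '<prop>:<2-letter lang>:' with nothing after it,
    optionally commented out with a leading '#'.
    """
    pattern = re.compile(rf"^#?\s*{re.escape(prop)}:[a-z]{{2}}:\s*$")
    kept, removed = [], 0
    for line in text.splitlines():
        if pattern.match(line):
            removed += 1
        else:
            kept.append(line)
    return "\n".join(kept) + "\n", removed
```

Lines that carry a value after the language prefix are kept, so the remaining diff shows only the manual changes.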

sonarcloud bot commented Jan 4, 2022

Kudos, SonarCloud Quality Gate passed!

  • 0 Bugs (rating A)
  • 0 Vulnerabilities (rating A)
  • 0 Security Hotspots (rating A)
  • 0 Code Smells (rating A)

No coverage information
No duplication information

@stephanegigandet (Contributor) left a comment

Looks good, thank you!

Just one thing:

<en:Wines from Greece
el:Ritsona, Ριτσώνα
-country:en:Greece
+origins:en: en:Greece

--> we have a mechanism to check properties of parent entries, so you can just add origins:en: en:Greece once on the parent entry en:Wines from Greece, without having to add it to all children.
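
To make the idea of inherited properties concrete, here is a small Python sketch of a lookup that walks up the parent entries until it finds the property. It is only an illustration under assumed data structures (a dict mapping entry ids to `parents` and `props`); it does not reproduce the project's actual mechanism, and `inherited_property` is a hypothetical name.

```python
from collections import deque


def inherited_property(taxonomy, entry_id, prop):
    """Return the value of `prop` on the entry or on its nearest ancestor.

    `taxonomy` is an assumed structure:
        {entry_id: {"parents": [parent_id, ...], "props": {name: value}}}
    Breadth-first search, so closer ancestors win; returns None if not found.
    """
    queue, seen = deque([entry_id]), set()
    while queue:
        current = queue.popleft()
        if current in seen or current not in taxonomy:
            continue  # guard against cycles and dangling parents
        seen.add(current)
        node = taxonomy[current]
        if prop in node.get("props", {}):
            return node["props"][prop]
        queue.extend(node.get("parents", []))
    return None


# Illustrative data modelled on the diff excerpt above.
taxonomy = {
    "en:wines": {"parents": [], "props": {}},
    "en:wines-from-greece": {
        "parents": ["en:wines"],
        "props": {"origins:en": "en:Greece"},
    },
    "el:ritsona": {"parents": ["en:wines-from-greece"], "props": {}},
}
print(inherited_property(taxonomy, "el:ritsona", "origins:en"))
```

With a lookup of this kind, the property only needs to be set once on en:Wines from Greece, and children such as el:Ritsona resolve it through their parent.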

@teolemon (Member, Author) commented Jan 5, 2022

Ironically, this is the opposite of your comment in Slack. We'll have to take a case-by-case approach to see where we can safely do so.

09:35 Stéphane Gigandet: @Pierre (teolemon) for your PR, please split it into 2 PRs, one for categories and the other for the rest, so that we can review and merge them independently.
Regarding categories:
<en:Beers
 en:Beers from Germany, German beers
@@ -1556,26 +1585,29 @@ hu:Német sörök, Sörök németországból
 it:Birre tedesche, Birre della Germania
 nl:Duitse bieren
 ro:Beri germane, Beri Germania, Bere germană, Bere Germania, Bere germana
-country:en:Germany
+origins:en: en:Germany

The "origins" field is for the origin of ingredients. I'm not sure we can automatically assume a German beer is made from German ingredients.

@teolemon (Member, Author) commented Jan 5, 2022

Thanks a lot for the review @stephanegigandet 🎉
This PR is going to unlock a trove of knowledge for Open Food Facts

@teolemon teolemon merged commit b854c27 into main Jan 5, 2022
@teolemon teolemon deleted the fixes-categories branch January 5, 2022 10:17
@alexgarel (Member) commented

Kudos for this PR @teolemon

Labels
categories 🧬 Taxonomies https://wiki.openfoodfacts.org/Global_taxonomies WikiData We link taxonomies to Wikidata (& back) - https://wiki.openfoodfacts.org/Structured_Data/Wikidata
3 participants