Generate and import missing image embeddings #1177

Open
Tracked by #379
raphael0202 opened this issue Aug 24, 2023 · 1 comment

Comments

@raphael0202
Collaborator

The new product categorizer was deployed in March 2023 and has categorized newly uploaded products since then. However, we still don't have predictions for the rest of the database.
It uses the 10 most recent images of the product, with their image embeddings as input (see https://openfoodfacts.github.io/robotoff/explanations/category-prediction/, section "ML prediction", for more information about the model).
To predict categories on the full dataset, we first need to generate and import embeddings for all images that are missing them, so that category detection can be launched.
The model used to generate the embeddings is stored here: https://github.com/openfoodfacts/robotoff-models/releases/tag/clip-vit-base-patch32. See the Robotoff codebase for the preprocessing code.

Here is a list of all the missing image paths: source_images.txt.gz
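For anyone picking this up, here is a minimal sketch of how the list could be read. It assumes `source_images.txt.gz` is gzip-compressed UTF-8 text with one image path per line, which is not confirmed in this issue; `iter_image_paths` is a hypothetical helper name.

```python
import gzip
from typing import Iterator


def iter_image_paths(list_file: str) -> Iterator[str]:
    """Yield image paths from a gzip-compressed text file.

    Assumption: one path per line, UTF-8 encoded. Blank lines are skipped.
    """
    with gzip.open(list_file, "rt", encoding="utf-8") as f:
        for line in f:
            path = line.strip()
            if path:
                yield path
```

Calling `iter_image_paths("source_images.txt.gz")` streams the paths lazily, so the whole list never has to be held in memory at once.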

Here is a tutorial on how to download images on Open Food Facts: https://openfoodfacts.github.io/openfoodfacts-server/api/how-to-download-images/

@alexgarel
Member

As asked by Christelle, here are some embeddings from production.

I generated them using this code.

The corresponding images are at:

  • 301/762/042/2003/<image_id>.jpg for 3017620422003_embeddings.json
  • 327/408/000/5003/<image_id>.jpg for 3274080005003_embeddings.json
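The two paths above suggest how a 13-digit barcode maps to its image directory: three groups of three digits, then the remaining four. A minimal sketch of that split follows. `barcode_to_image_dir` is a hypothetical helper name, and other barcode lengths are rejected because the examples here do not cover them (see the image download tutorial linked in the issue for the full rules).

```python
def barcode_to_image_dir(barcode: str) -> str:
    """Map a 13-digit barcode to its image directory.

    Only the 13-digit case shown in the examples is handled;
    anything else raises ValueError rather than guessing.
    """
    if len(barcode) != 13 or not barcode.isdigit():
        raise ValueError(f"expected a 13-digit barcode, got {barcode!r}")
    # Three groups of three digits, then the remaining four digits.
    parts = [barcode[0:3], barcode[3:6], barcode[6:9], barcode[9:]]
    return "/".join(parts)


def image_path(barcode: str, image_id: str) -> str:
    """Build the relative path of one image, e.g. '<dir>/<image_id>.jpg'."""
    return f"{barcode_to_image_dir(barcode)}/{image_id}.jpg"
```

With this, `barcode_to_image_dir("3017620422003")` gives `301/762/042/2003`, matching the first bullet.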
