Detecting UPC as product picture when >50% of an image is the UPC to reduce low quality main images #1056

Closed
brierjon opened this issue Jan 31, 2023 · 9 comments
Labels
✨ enhancement New feature or request

Comments

@brierjon

Problem

I often come across product listings where the UPC is the main image of the product, with more than 50% of the main photo being just UPC barcode lines. An example: https://world.openfoodfacts.org/images/products/003/800/023/1513/2.100.jpg (UPC removed from affected product page)

Proposed solution

  1. Add preventative detection at upload time: when an image exceeds a barcode-coverage threshold, notify the user that it should be added to the "other interesting images" instead (if these are desired at all). This could possibly leverage OpenCV or another approach in the app and browser.
  2. Sift through the existing "Product picture" fields and process the images to identify probable candidates for removal. If the detection score is high enough, consider removing the image automatically. If it is not, add the image to Hunger Games for human review.
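
A minimal sketch of the threshold logic proposed above, assuming some detector (OpenCV-based or otherwise) has already returned a bounding box for the barcode. The `Box` type, the function names, and both threshold values are hypothetical and only illustrate the idea; they are not taken from any existing code.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned barcode bounding box in pixels (hypothetical detector output)."""

    x_min: int
    y_min: int
    x_max: int
    y_max: int


def barcode_area_fraction(box: Box, image_width: int, image_height: int) -> float:
    """Return the share of the image covered by the barcode box, in [0, 1]."""
    # Clip the box to the image so a detector overshoot cannot exceed 100%.
    width = max(0, min(box.x_max, image_width) - max(box.x_min, 0))
    height = max(0, min(box.y_max, image_height) - max(box.y_min, 0))
    return (width * height) / (image_width * image_height)


def triage(fraction: float, auto_threshold: float = 0.8, review_threshold: float = 0.5) -> str:
    """Map a coverage fraction to an action; both thresholds are illustrative."""
    if fraction >= auto_threshold:
        return "auto_remove"
    if fraction >= review_threshold:
        return "human_review"
    return "accept"
```

For example, a barcode covering 60% of a 1000x1000 image would land in the human review bucket with these illustrative thresholds, while one covering 90% would be handled automatically.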

Additional context

In Hunger Games, the behavior would be the reverse of the norm: instead of adding a tag, it removes a photo.

@brierjon brierjon added the ✨ enhancement New feature or request label Jan 31, 2023
@raphael0202
Collaborator

Reed Andreas has been developing a module that looks very promising for detecting UPCs. However, we still have to decide what to do with such detections (assuming they are accurate). I would be in favor of keeping the image but unselecting it everywhere (front, nutrition, packaging, ingredients). What's your opinion on this, @teolemon @alexgarel?

@teolemon
Member

teolemon commented Apr 5, 2023

That's my opinion as well. I've spotted some contributors explicitly selecting barcode photos as a fallback when no other photo is available.

@brierjon
Author

brierjon commented Apr 5, 2023

I agree. The intent was to remove the UPC as a featured image, not to suggest it be deleted. The UPC image does have value, but not as the main photo or other fields with specific focus.

@ReedAndreas
Contributor

Does anyone have good suggestions for how to test this flow? I cloned robotoff and openfoodfacts-server (I'm dealing with a few dependency issues, but I should be able to get them up and running). In addition, I am following the guide on how to add a new predictor. Is there a way I can simulate a new image upload or test individual methods? Any help is greatly appreciated! Also, if you want to take a look at my additions, the branch related to this issue is UPC-Image-Predictor.

@raphael0202
Collaborator

raphael0202 commented Apr 10, 2023

@ReedAndreas to test the full pipeline, the easiest way is to send a webhook request to the Robotoff API: POST /api/v1/images/import. This endpoint is not yet documented in the API documentation, but the code is easy to read.
You can try sending an image with and without a UPC and check whether predictions/insights have been created.
You can also add unit tests in tests/unit. We added support for Git LFS, so feel free to use it to add medium-sized images to the tests folder to test your algorithm.

edit: I've added the POST /api/v1/images/import documentation here: https://openfoodfacts.github.io/robotoff/references/api/#tag/Images/paths/~1images~1import/post
And completed the "add predictor" documentation with a test section: https://openfoodfacts.github.io/robotoff/how-to-guides/add-predictor/
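
A minimal sketch of building such a webhook request with the Python standard library. The form field names (`barcode`, `image_url`, `ocr_url`), the form encoding, and the local base URL are assumptions made for illustration; the API reference linked above is the authority on what the endpoint actually expects.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_image_import_request(
    base_url: str, barcode: str, image_url: str, ocr_url: str
) -> Request:
    """Build (but do not send) a POST request to /api/v1/images/import."""
    # Assumed field names and form encoding; check the linked API reference.
    payload = urlencode(
        {"barcode": barcode, "image_url": image_url, "ocr_url": ocr_url}
    ).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/api/v1/images/import",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values; point these at a real product image to test.
request = build_image_import_request(
    base_url="http://localhost:5500",  # assumed local Robotoff address
    barcode="0038000231513",
    image_url="https://example.org/images/2.jpg",
    ocr_url="https://example.org/images/2.json",
)
```

Passing `request` to `urllib.request.urlopen` would send it. After that, the predictions and insights for the image can be inspected to confirm whether the UPC detector fired.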

@raphael0202
Collaborator

@ReedAndreas I forgot that local testing has always been cumbersome, as Robotoff performs many checks against MongoDB during image import and during insight/prediction generation and import (such as: does the product exist, does the product have the image linked to the insight, ...).
I've added a DISABLE_PRODUCT_CHECK envvar (which is 1 by default locally) to be able to disable this, and added a CLI command to call the webhook endpoint (see #1082).
Once this PR is merged you can rebase your branch on top of master, and hopefully have a much better local testing experience.

@ReedAndreas
Contributor

Hey, thanks so much for the help and the added testing ability! This really helped me make some good progress. Today I was able to test the whole flow, and from the logger output I saw that the predictor is working as expected on the different product images I tested it on!!

Now I just need to take that result and create the proper prediction to return. Any thoughts on how I should structure the prediction? The predictor itself generates two data points. One is whether or not the image is a "UPC_Image". The other is the predicted class: "UPC_Image", "Small_UPC" (there is a UPC, but this may still be a good main image), or "No_UPC". The last one was mainly for my own purposes during testing, since we really only care about UPC versus everything else, but it might still be helpful to store the predicted class. I think I have a decent idea of what could work based on the other predictors, but I was not sure if you had any tips or suggestions.

@raphael0202
Collaborator

To avoid overloading the prediction table with millions of additional datapoints (we have 7M images in production), I would suggest only creating a prediction if we're quite sure the image is a UPC.
So only for the UPC_Image class.
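
A minimal sketch of that rule, assuming the predictor returns one of the three class names mentioned earlier in the thread. The dictionary layout, the `image_upc` type name, and the `build_upc_prediction` helper are hypothetical; this is not Robotoff's actual prediction model.

```python
from typing import Optional

# Class names as described by the predictor's author in this thread.
KNOWN_CLASSES = ("UPC_Image", "Small_UPC", "No_UPC")


def build_upc_prediction(predicted_class: str, source_image: str) -> Optional[dict]:
    """Return a prediction payload only for UPC_Image, otherwise None."""
    if predicted_class not in KNOWN_CLASSES:
        raise ValueError(f"unexpected class: {predicted_class!r}")
    # Skip Small_UPC and No_UPC so the prediction table is not flooded
    # with one row per image.
    if predicted_class != "UPC_Image":
        return None
    return {
        "type": "image_upc",  # hypothetical prediction type name
        "source_image": source_image,
        "data": {"class": predicted_class},
    }
```

With this shape, the caller can simply drop `None` results before saving, so only images that are confidently UPC-dominated produce a row.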

@raphael0202
Collaborator

Closing this issue, as #1098 has been merged.
