Detecting UPC as product picture when >50% of an image is the UPC to reduce low quality main images #1056

brierjon · 2023-01-31T20:29:57Z

Problem

I often come across product listings where the UPC is the main image of the product, with more than 50% of the main photo being just UPC barcode lines. An example: https://world.openfoodfacts.org/images/products/003/800/023/1513/2.100.jpg (UPC removed from affected product page)

Proposed solution

Create preventative detection that checks images against a threshold at upload time and notifies the user that the image should instead be added to the "other interesting images" (if these are desired at all). This could possibly leverage OpenCV or another approach in the app and browser.
Sift through existing "Product picture" fields and process the images to identify probable candidates for removal. If the detection score is high enough, consider removing the image automatically. If it is not, add the image to hunger for human review.
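
The two-threshold triage described above could be sketched roughly as follows. This is only an illustration: the 0.5 review threshold comes from the ">50%" in the title, while the 0.8 automatic threshold, the `Action` enum, and the `triage` function are hypothetical names and values, not anything that exists in Robotoff.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep"                  # leave the image selected as it is
    HUMAN_REVIEW = "human_review"  # queue the image for a human decision
    AUTO_REMOVE = "auto_remove"    # confident enough to act automatically


def triage(upc_area_fraction: float,
           review_threshold: float = 0.5,
           auto_threshold: float = 0.8) -> Action:
    """Map the fraction of the image covered by the UPC to an action.

    Both thresholds are illustrative assumptions and would need tuning
    against real product images.
    """
    if not 0.0 <= upc_area_fraction <= 1.0:
        raise ValueError("upc_area_fraction must be between 0 and 1")
    if upc_area_fraction >= auto_threshold:
        return Action.AUTO_REMOVE
    if upc_area_fraction > review_threshold:
        return Action.HUMAN_REVIEW
    return Action.KEEP
```

With these assumed values, an image that is 60% barcode would go to human review, while one that is 90% barcode would be handled automatically.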

Additional context

In hunger, the behavior would be the reverse of the norm: instead of adding a tag, it removes a photo.

Mockups

Part of

🎯 What can I work on ? #374

raphael0202 · 2023-04-05T06:22:31Z

ReedAndreas has been developing a module that looks very promising for detecting UPCs. However, we still have to decide what to do with such detections (in case they are accurate). I would be in favor of keeping the image, but unselecting it everywhere (front, nutriment, packaging, ingredient). What's your opinion on this @teolemon @alexgarel?

teolemon · 2023-04-05T06:59:28Z

That's my opinion as well. I've spotted some contributors explicitly selecting barcode photos as a fallback when no other photo is available.

brierjon · 2023-04-05T13:24:31Z

I agree. The intent was to remove the UPC as a featured image, not to suggest it be deleted. The UPC image does have value, but not as the main photo or other fields with specific focus.

ReedAndreas · 2023-04-08T19:08:33Z

Does anyone have good suggestions for how to test this flow? I cloned robotoff and the openfoodfacts-server (I'm dealing with a few dependency issues, but I should be able to get it up and running). In addition, I am following the guide on how to add a new predictor. Is there a way I can simulate a new image upload or test individual methods? Any help is greatly appreciated! Also, if you want to take a look at my additions, the branch related to the issue is UPC-Image-Predictor.

raphael0202 · 2023-04-10T04:13:08Z

@ReedAndreas to test the full pipeline, the easiest way is to send a webhook request to the Robotoff API: POST /api/v1/images/import. ~~This endpoint is not yet documented in the API documentation, but the code is easy to read.~~
You can try to send an image with/without UPC and check that predictions/insights have indeed been created or not.
You can add unit tests as well in tests/unit. We added support for git LFS, so feel free to use it to add medium-sized images to the tests folder to test your algorithm.

edit: I've added the POST /api/v1/images/import documentation here: https://openfoodfacts.github.io/robotoff/references/api/#tag/Images/paths/~1images~1import/post
And completed the "add predictor" documentation with a test section: https://openfoodfacts.github.io/robotoff/how-to-guides/add-predictor/
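
A minimal sketch of building that webhook request with the Python standard library. The path POST /api/v1/images/import comes from the comment above; the base URL, the form encoding, and the field names (`barcode`, `image_url`) are assumptions for illustration only, so check the linked API documentation for the actual parameters. `build_import_request` is a hypothetical helper, not part of Robotoff.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_import_request(base_url: str, params: dict) -> Request:
    """Build (but do not send) a POST request to the image import webhook.

    `params` holds the form fields; which fields the endpoint expects is
    not stated in this thread, so the caller supplies them.
    """
    body = urlencode(params).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/api/v1/images/import",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Hypothetical local instance and assumed field names.
    req = build_import_request(
        "http://localhost:5500",
        {
            "barcode": "0038000231513",
            "image_url": "https://world.openfoodfacts.org/images/products/003/800/023/1513/2.100.jpg",
        },
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

After sending one request for an image with a UPC and one without, the predictions/insights can be compared to confirm the detector only fires on the former.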

raphael0202 · 2023-04-10T08:00:55Z

@ReedAndreas I forgot that local testing has always been cumbersome, as Robotoff performs many checks against MongoDB during image import and during insight/prediction generation and import (such as: does the product exist, does the product have the image linked to the insight, ...).
I've added a DISABLE_PRODUCT_CHECK envvar (which is 1 by default locally) to be able to disable this, and added a CLI command to call the webhook endpoint (see #1082).
Once this PR is merged you can rebase your branch on top of master, and hopefully have a much better local testing experience.

ReedAndreas · 2023-04-11T01:52:04Z

Hey, thanks so much for the help and the added testing ability! This really helped me make some good progress. Today I was able to test the whole flow, and using the logger I saw that the predictor works as expected on the different product images I tested it on!!

Now I just need to take that result and create the proper prediction to return. Any thoughts on how I should structure it? The predictor generates two data points: whether or not the image is a "UPC_Image", and the predicted class, which is one of "UPC_Image", "Small_UPC" (there is a UPC, but this may still be a good main image), or "No_UPC". The last one was mainly for my own purposes during testing, since we really only care about UPC versus everything else, but it still might be helpful to store the predicted class. I think I have a decent idea of what could work based on the other predictors, but I was not sure if you had any tips or suggestions.

raphael0202 · 2023-04-11T03:18:11Z

To avoid overloading the prediction table with millions of additional datapoints (we have 7M images in production), I would suggest only creating a prediction if we're quite sure the image is a UPC.
So only for UPC_Image class.
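
As a rough sketch of that rule: the class names below are the ones mentioned earlier in the thread, while the `make_prediction` helper and the keys of the returned dict are hypothetical and do not reflect Robotoff's actual prediction model.

```python
from typing import Optional

UPC_IMAGE = "UPC_Image"  # the only class worth storing


def make_prediction(predicted_class: str, source_image: str) -> Optional[dict]:
    """Return prediction data only for the UPC_Image class.

    "Small_UPC" and "No_UPC" results yield None, so nothing is written
    to the prediction table for them.
    """
    if predicted_class != UPC_IMAGE:
        return None
    # Hypothetical payload layout, for illustration only.
    return {"predicted_class": predicted_class, "source_image": source_image}
```

Callers would then skip the insert whenever the helper returns None, which keeps the table limited to confident UPC detections.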

raphael0202 · 2023-05-18T08:33:10Z

Closing this issue, as #1098 has been merged.

brierjon added the ✨ enhancement label Jan 31, 2023

ReedAndreas mentioned this issue Apr 18, 2023

feat: Create UPC_Image detector #1098

Merged

raphael0202 closed this as completed May 18, 2023
