This issue was moved to a discussion.

Detect near identical images #1120

Closed
raphael0202 opened this issue May 23, 2023 · 0 comments

@raphael0202
Collaborator

Problem

The Open Food Facts database appears to contain a growing number of near-duplicate images: images whose binary content differs but which are visually almost identical.

Example:

We want to detect these near-identical images so that they can be removed. Doing so would save disk space and make contributors' work easier.

Proposed solution

Use fingerprinting techniques to assign a single hash to each image, so that near-identical images end up with identical or very close hashes. See this blog post for more information about image fingerprinting. Explore the recall/precision trade-off for each hashing technique.
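To make the idea concrete, here is a minimal sketch of one fingerprinting technique, average hashing, written in plain Python. It is an illustration only, not the `undouble` implementation: the function names (`downsample`, `average_hash`, `hamming_distance`) are hypothetical, and the input is assumed to be an already-decoded grayscale image given as rows of pixel values.

```python
from typing import List, Sequence

# Assumption: the image is already decoded to grayscale, as rows of 0-255 values.
Gray = Sequence[Sequence[float]]


def downsample(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Shrink a grayscale image to size x size by averaging pixel blocks."""
    height, width = len(pixels), len(pixels[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    small = []
    for i in range(size):
        r0, r1 = i * height // size, (i + 1) * height // size
        row = []
        for j in range(size):
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: one bit per cell, set if the cell is above the mean."""
    cells = [value for row in downsample(pixels, size) for value in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two images would then be flagged as near-identical when the Hamming distance between their hashes is at or below a chosen threshold. That threshold is what drives the recall/precision trade-off: a higher value catches more duplicates but also more false matches.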

The documentation of the undouble library is available here.

You can download all images for a selected subset of products from the Open Food Facts Images dataset and run near-duplicate detection on them. The results should then be analyzed manually to assess which technique is the most robust for our use case and to measure its precision, recall, and accuracy.
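As a rough sketch of how that evaluation could be scripted, the snippet below compares the pairs flagged at a given Hamming-distance threshold against manually labelled duplicate pairs. Everything in it is hypothetical: the helper names, the file names, and the 8-bit hash values are made up for illustration and do not come from the dataset or from any library.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Pair = FrozenSet[str]


def predicted_pairs(hashes: Dict[str, int], threshold: int) -> Set[Pair]:
    """Return all image pairs whose hashes differ by at most `threshold` bits."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(hashes), 2)
        if bin(hashes[a] ^ hashes[b]).count("1") <= threshold
    }


def precision_recall(predicted: Set[Pair], labelled: Set[Pair]) -> Tuple[float, float]:
    """Compare predicted duplicate pairs with manually labelled ones."""
    true_positives = len(predicted & labelled)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(labelled) if labelled else 1.0
    return precision, recall


# Hypothetical toy data: 8-bit hashes and two manually confirmed duplicate pairs.
hashes = {
    "1.jpg": 0b11110000,
    "2.jpg": 0b11110001,
    "3.jpg": 0b11100011,
    "4.jpg": 0b00001111,
}
labelled = {frozenset(("1.jpg", "2.jpg")), frozenset(("1.jpg", "3.jpg"))}

for threshold in (1, 3):
    precision, recall = precision_recall(predicted_pairs(hashes, threshold), labelled)
    print(f"threshold={threshold} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, the strict threshold finds only one of the two labelled pairs, while the looser one finds both at the cost of a false match. This is the trade-off the manual analysis would need to quantify for each hashing technique.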

Related issues

openfoodfacts/openfoodfacts-server#8445

@github-actions bot added the ⭐ top issue and ⭐ top feature labels Aug 18, 2023
@openfoodfacts locked and limited conversation to collaborators Aug 29, 2023
@raphael0202 converted this issue into discussion #1201 Aug 29, 2023
