This simple program evaluates two directories of images and finds images that are similar. It aims to be really fast and simple to use.
Both directories are searched recursively for any compatible image formats (.jpg, .png, .heic).
There are three phases to using this tool: finding duplicate images, confirming the detected duplicates, and deleting the confirmed duplicates.
The first argument is the directory of images to match against. Think of this directory as the "originals", which you do not want deleted. The second argument is the directory of images that may be duplicates and from which you may want to delete files.
dedugo find-duplicates ./reference/image/directory ./evaluation/image/directory
Things will happen. Silicon will get hot. Fans will spin.
The check-results subcommand lets you visually confirm whether detected duplicates are actually duplicate images. Because no algorithm is perfect, false positives are likely, so you mark each detected pair as a duplicate or not.
dedugo check-results
Once duplicate images are confirmed, they can be deleted in one fell swoop by running:
dedugo delete-duplicates
- Allow user to visually confirm if paired images are indeed duplicates or are actually just very similar
- Convert this to use Cobra
- A GUI would be neat
- Add delete command to remove confirmed duplicates
- Make image loading faster
- Allow user to specify output filename for find-duplicates and input filename for check-results
- I probably need to incorporate the idea of similar image clusters rather than just image pairs.
- Write tests...
- GUI should show if an image has already been confirmed and allow user to unmark it.
- Prevent system sleep while running find-duplicates.
A special thanks to Vitali Fedulov for writing the Go package upon which this tool is built.
And also, thanks to jdeng for writing the heif decoder and adrium for maintaining a fork of it that runs on my Linux install.