Open-source testing framework for image correlation, distance and analysis. Strongly related to: Douglas-Quaid.
Much of the information collected or processed by CIRCL relates to images (broadly: photos, screenshots of websites, or screenshots of sandboxes). As these datasets grow, analysts need to classify, search and correlate across all of the images. The aim is to build a generic library and services that can establish correlations between pictures. Reaching this goal requires experiments, and conducting them is the purpose of this repository.
- Review of existing algorithms, techniques and libraries for calculating distances between images: State of the Art (Markdown and PDF versions)
See requirements.txt
(...)
(...)
(...)
In /lib_testing, run "python3 ./launcher.py". Parameters are hardcoded in launcher.py, such as:
- Path to pictures folder
- Output folder to store results
- Requested outputs (result graphs, statistics, LaTeX export, threshold evaluation, similarity matrix, ...)
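Two of the requested outputs, the similarity matrix and threshold evaluation, can be illustrated with a small sketch. It assumes each picture has already been reduced to a comparable value (for example an integer image hash) and that a ground-truth label per picture is available; the function names are hypothetical and do not reflect launcher.py's actual code. Pairs at or below the threshold are treated as predicted matches, and pairs sharing a label as true matches.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def distance_matrix(items: Sequence, dist: Callable) -> List[List[float]]:
    """Hypothetical helper: symmetric matrix of pairwise distances, 0 on the diagonal."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = float(dist(items[i], items[j]))
        matrix[i][j] = matrix[j][i] = d
    return matrix


def evaluate_threshold(matrix: List[List[float]], labels: Sequence[str],
                       threshold: float) -> Dict[str, float]:
    """Hypothetical helper: precision/recall of 'distance <= threshold' as a match rule.

    Assumption: two pictures are a true match when they share the same label.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels)), 2):
        predicted = matrix[i][j] <= threshold
        actual = labels[i] == labels[j]
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Toy 4-bit "hashes"; Hamming distance = number of differing bits.
    hashes = [0b0000, 0b0001, 0b1111, 0b1110]
    labels = ["a", "a", "b", "b"]
    m = distance_matrix(hashes, lambda a, b: bin(a ^ b).count("1"))
    for t in (1, 3):
        print(t, evaluate_threshold(m, labels, t))
```

Sweeping the threshold and plotting precision against recall is one way to pick a cut-off per algorithm; this sketch only computes the numbers for a single threshold.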
This currently works on most configurations and explores the following matching algorithms:
- ImageHash Algorithms (A-hash, P-hash, D-hash, W-hash ... )
- TLSH (make sure pictures are in BMP, or at least an uncompressed format. A conversion function is available in /utility/manual.py)
- ORB (and its parameters space)
- ORB Bag-Of-Words / Bag-Of-Features (and its parameters space, including the size of the "Bag"/Dictionary)
- ORB RANSAC (with/without homography matrix filtering)
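To show what the ImageHash family computes, here is a minimal difference-hash (D-hash) sketch in plain Python. It assumes the picture has already been converted to grayscale and resized to 8 rows by 9 columns (in practice a library such as Pillow would do this step); each bit records whether a pixel is brighter than its left neighbour, and two pictures are compared by the Hamming distance between their hashes. This is an illustration of the technique, not the ImageHash library's implementation, which exposes it as `imagehash.dhash`.

```python
from typing import Sequence


def dhash(gray: Sequence[Sequence[int]]) -> int:
    """Difference hash of a grayscale grid already resized to H rows x (W + 1) columns.

    Bit order: row by row, left to right; a bit is 1 when right pixel > left pixel.
    An 8 x 9 grid therefore yields a 64-bit hash.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if right > left else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes (lower means more similar)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    ramp = [[c * 10 for c in range(9)] for _ in range(8)]      # left-to-right gradient
    brighter = [[p + 5 for p in row] for row in ramp]          # uniform brightness change
    mirrored = [list(reversed(row)) for row in ramp]           # horizontally flipped
    print(hamming(dhash(ramp), dhash(brighter)))  # a global brightness shift keeps the hash
    print(hamming(dhash(ramp), dhash(mirrored)))  # a flip inverts every bit
```

Because it only compares neighbouring pixels, D-hash is robust to global brightness or contrast changes but not to mirroring or cropping, which is one reason to benchmark it against ORB-based matching.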
You can also manually generate modified datasets from your original dataset:
- Text detector and hider (deep learning, Tesseract, ...)
- Edge detector (deep learning, Canny, ...)
- PNG/BMP versions of pictures (compressed/uncompressed)
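The "hider" step can be pictured as masking the regions a text detector returns. The sketch below assumes boxes arrive as hypothetical `(x, y, width, height)` tuples (for example from Tesseract or a deep-learning detector) and that the image is a grayscale list of rows; it fills each box with a constant value on a copy, clipping boxes that overflow the image border. It is not the repository's implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x, y, width, height) in pixels, origin at top-left.
Box = Tuple[int, int, int, int]


def hide_text(image: Sequence[Sequence[int]], boxes: Sequence[Box],
              fill: int = 0) -> List[List[int]]:
    """Return a copy of a grayscale image with every detected text box filled."""
    out = [list(row) for row in image]  # copy so the original dataset stays untouched
    height = len(out)
    width = len(out[0]) if height else 0
    for x, y, w, h in boxes:
        # Clip each box to the image bounds before filling.
        for r in range(max(0, y), min(height, y + h)):
            for c in range(max(0, x), min(width, x + w)):
                out[r][c] = fill
    return out


if __name__ == "__main__":
    img = [[255] * 6 for _ in range(4)]
    masked = hide_text(img, [(1, 1, 2, 2), (5, 3, 4, 4)])
    for row in masked:
        print(row)
```

Masking text before hashing or feature extraction lets you check whether an algorithm matches screenshots on their layout rather than on the words they happen to display.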
(...)
(...)
For the algorithm testing library: see the installation instructions.
- Original project structure source
- Clean library implementation of algorithms
- Followed practice for logging
- Text detector model source
- Built-for-the-occasion manual image classifier
- Bibliography
PRs are welcome. New issues are welcome if you have ideas or use cases that could improve the project.