GitHub - AnnaTz/dff-concept-discovery: Implementation of Deep Feature Factorization to unravel and visualize high-level concepts learned by CNNs, based on methodologies from ECCV 2018’s 'Deep Feature Factorization for Concept Discovery'.

This project builds upon the methodologies discussed in "Deep Feature Factorization for Concept Discovery" by Edo Collins et al., ECCV 2018. The paper introduces an approach to understanding what deep learning models learn by extracting and visualizing high-level concepts from image data. The core idea revolves around using Non-negative Matrix Factorization (NMF) on the features extracted by a Convolutional Neural Network (CNN) classifier (specifically VGG-19). This effectively decomposes a set of images into semantic components (or 'concepts') that provide insights into the features the CNN has learned to focus on.

Key Features

Interpretable Deep Learning: Implements Deep Feature Factorization (DFF), a method to make deep learning models more transparent by breaking down what the models learn from the data.
Training and Inference Pipelines: Beyond replicating the DFF method, this project introduces custom training and inference pipelines for extracting and visualizing concepts.
Training Dataset Optimization: Fine-grains the training dataset to enhance the reliability and quality of concept discovery based on a novel metric of discovery confidence.
Number of Concepts Estimation: Uses this metric to automatically estimate an optimal number of concepts that the model can discover in the training dataset.
Concept Visualization: Utilizes advanced image processing techniques to visualize the heatmaps, extract significant regions, and contour them on images, making the learned features tangible and understandable.

Project Structure

demo.ipynb: An IPython notebook that showcases the use of the training and inference pipelines along with visualization of their results.
concept_discovery/: A directory containing the Python modules for concept discovery.
- data.py: Handles data loading, preprocessing, and fine-graining.
- dff.py: Contains the logic for Deep Feature Factorization, including feature extraction, heatmap preparation, and concept confidence estimation.
- nmf.py: Implements Non-negative Matrix Factorization.
- pipelines.py: Defines the training and inference pipelines for concept discovery.
- utils.py: Provides utility functions for image manipulation, directory management, and visualization.

Getting Started

Source Paper: It is highly recommended to read the paper this project is based upon.
Installation: Ensure Python 3.8+ is installed. Then, use pip install -r requirements.txt to install the necessary dependencies.
Run the Demo: To discover concepts of an ImageNet class, add a directory of images named after the class to the imagenet_data directory. Then launch demo.ipynb with Jupyter Notebook to run the training and inference pipelines and visualize the discovered concepts.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
concept_discovery		concept_discovery
imagenet_data		imagenet_data
results/training_results/flamingo		results/training_results/flamingo
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
demo.ipynb		demo.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Key Features

Project Structure

Getting Started

About

Releases

Packages

Languages

AnnaTz/dff-concept-discovery

Folders and files

Latest commit

History

Repository files navigation

Key Features

Project Structure

Getting Started

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages