Welcome to the repository for the Robot or Brain project. The purpose of the project was to create a computer vision model that recognizes the framing behind images depicting AI. More about the project can be read here.
Clone this repo:
git clone [email protected]:robot-or-brain/robot_or_brain.git
Install requirements:
pip install -r requirements.txt
This project's code can be used to train models in a number of different ways. In all cases, the images used for training and validation need to be organized in the expected file structure; see organize-images. Once that is done, either a ResNet-50 model can be fine-tuned directly on the images, or features from either the ResNet-50 or CLIP models can be precomputed once in order to train a small neural network on top of those features. Precomputing features once is much more energy- and time-efficient and also yielded the best results in our project. The main reason to train directly on images is that it offers the possibility of applying simple augmentations (scale, shift, flip, etc.) during training. These augmentations never improved results during our project, but in theory they are a technique against overfitting. Finally, we can create a zero-shot classifier with the CLIP model, without using any training data.
For easy loading with Keras and other frameworks, we want images organized in folders as follows.
/root
    /images_by_class
        /train
            /class_robot
                file1.jpg
                file2.jpg
            /class_brain
                file1.jpg
                file2.jpg
        /validation
            ..
        /test
            ..
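As a sketch of this layout, the helper below builds the expected directory skeleton and lists the class folders a loader such as Keras would discover. The helper names and class names (`class_robot`, `class_brain`) are illustrative, not part of the repository's code.

```python
from pathlib import Path

def build_layout(root):
    """Create the Keras-ready directory skeleton described above (illustrative helper)."""
    root = Path(root)
    for split in ("train", "validation", "test"):
        for cls in ("class_robot", "class_brain"):
            (root / "images_by_class" / split / cls).mkdir(parents=True, exist_ok=True)
    return root

def list_classes(root, split="train"):
    """Return the class folder names found for one split, as a loader would see them."""
    split_dir = Path(root) / "images_by_class" / split
    return sorted(p.name for p in split_dir.iterdir() if p.is_dir())
```

With this structure in place, class labels are inferred directly from the folder names.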
However, our dataset was organized as follows:
/root
    metadata.csv
    /database_a
        file1.jpg
        file2.jpg
    /database_b
        file1.jpg
        file2.jpg
The metadata CSV file contained the following columns:
"id","imageid","database_name","aiframe","status","coder","coded_time"
Use organize_image_folders.py to reorganize such a dataset into the Keras-ready form described above.
python organize_image_folders.py example_data/metadata.csv
This will both restructure the data and split it into train, validation and test sets.
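The splitting step can be sketched with scikit-learn's stratified `train_test_split`; the 70/15/15 ratio and the two-stage approach are assumptions for illustration, not necessarily what organize_image_folders.py does.

```python
from sklearn.model_selection import train_test_split

def split_rows(rows, labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Split rows into train/validation/test, keeping class balance (stratified).
    The 70/15/15 ratio here is an assumed default, not taken from the repo."""
    train_rows, rest_rows, _, rest_y = train_test_split(
        rows, labels, test_size=val_frac + test_frac,
        stratify=labels, random_state=seed)
    val_rows, test_rows, _, _ = train_test_split(
        rest_rows, rest_y, test_size=test_frac / (val_frac + test_frac),
        stratify=rest_y, random_state=seed)
    return train_rows, val_rows, test_rows
```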
A model can be trained on precomputed features (recommended), directly from the images using a feature extractor model on the fly (inefficient), or using the zero-shot method (poor performance).
Features can be computed and stored to disk so that a model can be trained on them afterwards. Images need to be in the directory structure described in Organize images.
Either a CLIP or a ResNet50 model can be used to compute features. Both can be computed one after another, but not in the same run; each feature type is written to its own column in the output datafile. The two feature types cannot be processed in a single command because CLIP uses PyTorch and ResNet50 uses TensorFlow, and importing and using both frameworks together is problematic.
Note that precomputing features is computationally expensive, on the order of 2-3 seconds per image. This procedure was never optimized given the scope and time budget of the project, as ideally it only has to be run once per image. The process could be sped up somewhat by using a GPU; to fully leverage a GPU, however, the script should be adapted to load and process batches of images in parallel.
python save_clip_and_resnet_features.py --data_base_path my_data_folder/ --model clip
Images will be read from the directory structure under the given folder. Features will be stored in pickled dataframes in the files train.pk, validation.pk and test.pk in the given folder.
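Loading these pickles back for training can be sketched as below; the exact column layout of the stored dataframes is not documented here, so the example only assumes the three file names.

```python
import pandas as pd

def load_feature_splits(data_base_path):
    """Load the pickled feature dataframes (train.pk, validation.pk, test.pk)
    written by save_clip_and_resnet_features.py. Column names inside the
    dataframes are not assumed here."""
    return {
        name: pd.read_pickle(f"{data_base_path}/{name}.pk")
        for name in ("train", "validation", "test")
    }
```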
A random forest classifier can be trained on precomputed CLIP features. To do so run:
python train_random_forest_on_precomputed_features.py --data_base_path=my_data_folder/ --number_of_trees=50
or append > output.txt if the output should be stored in a text file instead of printed to screen.
This will train a classifier, score it using a validation set, print the scores to screen (or to a text file) and save the model in the data folder.
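The core of this step can be sketched with scikit-learn; the random stand-in arrays below take the place of real precomputed CLIP embeddings (512-dimensional for the common CLIP ViT variants), and only the `n_estimators=50` setting mirrors the command above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Random stand-ins for precomputed CLIP features and labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 512))
y_train = rng.integers(0, 2, size=100)
X_val = rng.normal(size=(20, 512))
y_val = rng.integers(0, 2, size=20)

# Train and score, as the script does with real features.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)
print(classification_report(y_val, clf.predict(X_val)))
```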
A neural network classifier can be trained on the precomputed ResNet50 features (or CLIP features, or both types combined). To do this, run:
python train_on_precomputed_features.py --data_base_path=data_folder/ --batch_size=32 --feature_type=clip --learning_rate=0.0001 --dropout_rate=0 --epochs=500 --lr_decay=0
This will train a model and save it to disk. Note the hyperparameters learning_rate, dropout_rate, epochs and lr_decay; their explanations are outside the scope of this text but can be found in deep learning tutorials.
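To illustrate the idea of a small classifier on top of precomputed features, here is a stand-in sketch using scikit-learn's MLPClassifier with random data; the repository's script uses its own network and hyperparameters, and only learning_rate and epochs (as learning_rate_init and max_iter) are echoed here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Random stand-ins for precomputed features and labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 512))
y = rng.integers(0, 2, size=200)

# A small network on top of the features; hyperparameters mirror the
# command above only loosely, as an illustration.
mlp = MLPClassifier(hidden_layer_sizes=(64,), learning_rate_init=0.0001,
                    max_iter=500, random_state=1)
mlp.fit(X, y)
```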
The model will be written into a folder containing several files. The model can be validated using the notebook performance_fine_tune_clip.ipynb.
Using CLIP from OpenAI, we can do zero-shot classification, meaning that we don't use any training data. This classifier has by far the worst classification performance, but it is remarkable that it works at all given the absence of any training on our dataset.
See the notebook performance_clip_zero_shot.ipynb.
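The core idea can be illustrated without loading CLIP itself: an image embedding is compared against text-prompt embeddings by cosine similarity, and the closest prompt's label wins. The embeddings below are stand-ins; in practice they come from CLIP's image and text encoders.

```python
import numpy as np

def zero_shot_classify(image_emb, prompt_embs, labels):
    """Return the label whose text-prompt embedding is most cosine-similar
    to the image embedding -- the core of CLIP zero-shot classification.
    Real embeddings come from the CLIP encoders; these are stand-ins."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = normalize(prompt_embs) @ normalize(image_emb)
    return labels[int(np.argmax(sims))]
```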
A model can be trained directly on the images using a ResNet50 model as a basis. This gives the option of fine-tuning the ResNet50 model or using on-the-fly augmentation of the images. Neither fine-tuning nor image augmentation was performed within this project, so running this has no benefits over the other options. For this option, using a GPU is highly recommended.
python train_model_on_features.py --batch_size=16 --data_base_path=data_folder/ --dropout_rate=0.95 --epochs=50 --learning_rate=0.0003 --lr_decay=0.0001 --use_augmentation=False
Several model files will be saved into a folder. The model can be validated using the notebook performance_fine_tuned_resnet.ipynb.
The best performing model is stored in ./clip_features_model_kf5cnvvi. It is a small neural network trained on precomputed CLIP features. To run it on a new image, first precompute features from the image using CLIP; then the model can be run on those features as done in performance_fine_tune_clip.ipynb.