As part of the implementation series of Joseph Lim's group at USC, our motivation is to accelerate (or sometimes delay) research in the AI community by promoting open-source projects. To this end, we implement state-of-the-art research papers and publicly share them with concise reports. Please visit our group's GitHub site for other projects.
This project is implemented by Shao-Hua Sun, and the code was reviewed by Te-Lin Wu before being published.
This project is a TensorFlow implementation of Representation Learning by Learning to Count. The paper proposes a novel framework for representation learning, which learns good representations of visual content by exploiting the concept of counting visual primitives.
In particular, it exploits the fact that the number of visual primitives present in an image is invariant to transformations such as scaling and rotation. Based on this fact, the model learns meaningful representations by requiring the counting feature of an image to match the sum of the counting features of its tiles, while minimizing a contrastive loss that pushes apart the counting features of a pair of randomly selected images. During the fine-tuning phase, we train a set of linear classifiers on the learned representations to perform an image classification task on ImageNet, verifying the effectiveness of the proposed framework. An illustration of the proposed framework is as follows.
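The counting objective described above can be sketched in a few lines. This is a minimal NumPy illustration of the idea from the paper, not the loss implemented in this repository; the function name and the margin value are assumptions.

```python
import numpy as np

def counting_loss(phi_x, phi_x_tiles, phi_y, margin=10.0):
    """Hypothetical sketch of the counting objective.

    phi_x:       counting feature of the (downsampled) image x
    phi_x_tiles: counting features of the tiles of x (list of vectors)
    phi_y:       counting feature of a different, randomly chosen image y
    """
    tile_sum = np.sum(phi_x_tiles, axis=0)
    # Consistency: the counts of the tiles should add up to the count of x.
    consistency = np.sum((phi_x - tile_sum) ** 2)
    # Contrastive term: a different image should yield a different count.
    contrast = max(0.0, margin - np.sum((phi_y - tile_sum) ** 2))
    return consistency + contrast
```

The consistency term enforces the transformation invariance of counts, while the contrastive term prevents the trivial solution where every image is mapped to the same feature.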
The implemented model is trained and tested on ImageNet.
Note that this implementation only follows the main idea of the original paper and differs considerably in implementation details such as the model architecture, hyperparameters, optimizer, etc. For example, this implementation adopts the VGG-19 architecture instead of the AlexNet used in the original paper.
*This code is still being developed and is subject to change.
The ImageNet dataset is located in the Downloads section of the website. Please specify the path to the downloaded dataset by changing the variable `__IMAGENET_IMG_PATH__` in `datasets/ImageNet.py`. Also, please provide a list of file names for training in the directory `__IMAGENET_LIST_PATH__` with the file name `train_list.txt`. By default, `train_list.txt` includes all the training images in the ImageNet dataset.
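If you need to build `train_list.txt` yourself, a helper along these lines could generate it. The one-file-name-per-line format is an assumption here; check `datasets/ImageNet.py` for the exact format it expects.

```python
import os

def write_train_list(image_dir, list_path):
    """Hypothetical helper: writes one image file name per line,
    which is the format this README assumes for train_list.txt."""
    names = sorted(
        f for f in os.listdir(image_dir)
        if f.lower().endswith(('.jpg', '.jpeg', '.png'))
    )
    with open(list_path, 'w') as fp:
        fp.write('\n'.join(names))
    return names
```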
Train a model from scratch with the downloaded dataset. For example:
$ python trainer.py --prefix train_from_scratch --learning_rate 1e-4 --batch_size 8
Fine-tune linear classifiers on top of a pretrained model. For example:
$ python trainer_classifier.py --prefix fine_tune --learning_rate 1e-5 --batch_size 8 --checkpoint train_dir/train_from_scratch-ImageNet_lr_0.003-20170828-172936/model-10001
Note that you must specify a checkpoint storing the pretrained model. Linear classifiers are applied to all the features (`conv1`, `conv2`, ..., `fc1`, `fc2`, etc.) produced by the pretrained model, using the same learning rate, optimizer, etc. To fine-tune the model with only a certain feature, please specify it in `model_classifier.py`.
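Conceptually, this fine-tuning stage amounts to training a linear (softmax) probe on frozen features from each layer. The following NumPy sketch illustrates the idea under that assumption; it is not the code in `model_classifier.py`.

```python
import numpy as np

def train_linear_probe(feats, labels, num_classes, lr=0.1, steps=200):
    """Sketch of a linear softmax classifier on frozen features.
    The hyperparameters here are illustrative assumptions."""
    n, d = feats.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # cross-entropy gradient
        W -= lr * feats.T @ grad                      # features stay frozen;
        b -= lr * grad.sum(axis=0)                    # only W and b are updated
    return W, b
```

Because the backbone is frozen, the probe's accuracy directly measures how linearly separable the learned representations are, which is the standard evaluation protocol for self-supervised features.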
Evaluate a fine-tuned model by specifying its checkpoint. For example:
$ python evaler.py --checkpoint train_dir/fine_tune-ImageNet_lr_0.0001-20170915-172936/model-10001
- Create a directory for your dataset:
$ mkdir datasets/YOUR_DATASET
- Create an input helper `datasets/YOUR_DATASET.py` following the format of `datasets/ImageNet.py`.
- Specify the path to the images and the list of file names.
- Modify `trainer.py`.
- Finally, train and test your models:
$ python trainer.py --dataset YOUR_DATASET
$ python trainer_classifier.py --dataset YOUR_DATASET --checkpoint train_dir/train_from_scratch-YOUR_DATASET_lr_0.003-20170828-172936/model-10001
$ python evaler.py --dataset YOUR_DATASET --checkpoint train_dir/fine_tune-YOUR_DATASET_lr_0.0001-20170915-172936/model-10001
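An input helper for a new dataset might look like the following sketch. The class name, constructor arguments, and method names here are assumptions for illustration; mirror the actual interface in `datasets/ImageNet.py`.

```python
import os

class Dataset(object):
    """Hypothetical input helper sketch for datasets/YOUR_DATASET.py.
    The real interface in datasets/ImageNet.py may differ."""

    def __init__(self, img_path, list_path, list_file='train_list.txt'):
        self.img_path = img_path
        # Read one image file name per line from the list file.
        with open(os.path.join(list_path, list_file)) as fp:
            self.ids = [line.strip() for line in fp if line.strip()]

    def __len__(self):
        return len(self.ids)

    def image_path(self, idx):
        # Resolve the idx-th entry to a full image path.
        return os.path.join(self.img_path, self.ids[idx])
```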
- Representation Learning: A Review and New Perspectives by Bengio et al.
- Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles by Noroozi et al.
- Unsupervised Representation Learning by Sorting Sequences by Lee et al.
Shao-Hua Sun / @shaohua0116 @ Joseph Lim's research lab @ USC