Script for content-based image classification using the bag-of-visual-words approach.
The script is a Python version of phow_caltech101.m, a 'one file' example script that uses the VLFeat library to train and evaluate an image classifier on the Caltech-101 data set.
Like the original Matlab version, this Python script achieves an average accuracy of around 65% (state of the art in 2008). It uses:
- PHOW features (dense multi-scale SIFT descriptors)
- Elkan k-means for fast visual word dictionary construction
- Spatial histograms as image descriptors
- A homogeneous kernel map to transform a Chi2 support vector machine (SVM) into a linear one
- Liblinear SVM (instead of the Pegasos SVM of the Matlab script)
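The spatial histogram step can be illustrated with a minimal pure-Python sketch. It is not the code used by this script: the function name `spatial_histogram`, its parameters, and the 2x2 grid are assumptions chosen for illustration. It assumes each descriptor has already been assigned a visual word index and a keypoint position.

```python
def spatial_histogram(words, xs, ys, width, height, num_words, grid=(2, 2)):
    """Build one visual-word histogram per grid cell and concatenate them.

    words      -- visual word index (0 .. num_words - 1) of each descriptor
    xs, ys     -- keypoint position of each descriptor, in pixels
    width, height -- image size in pixels
    grid       -- (columns, rows) of the spatial subdivision (hypothetical default)
    """
    cols, rows = grid
    hist = [0.0] * (cols * rows * num_words)
    for word, x, y in zip(words, xs, ys):
        # Clamp so that points on the right or bottom border fall in the last cell.
        col = min(int(x * cols / width), cols - 1)
        row = min(int(y * rows / height), rows - 1)
        cell = row * cols + col
        hist[cell * num_words + word] += 1.0
    # L1-normalize so that images with different descriptor counts are comparable.
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Four descriptors in a 100x100 image, dictionary of 3 visual words.
descriptor = spatial_histogram(
    words=[0, 1, 1, 2],
    xs=[10, 90, 95, 20],
    ys=[10, 10, 15, 90],
    width=100,
    height=100,
    num_words=3,
)
print(len(descriptor))  # 2 * 2 cells * 3 words = 12 bins
```

The resulting vector has one block of `num_words` bins per grid cell, so two images with the same visual words in different places produce different descriptors.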
If you need 2016 state-of-the-art performance for image classification, check out Keras.
The code also works with other datasets if the images are organized as in the Caltech data set, where all images belonging to one class are in the same folder:
.
|-- path_to_folders_with_images
| |-- class1
| | |-- some_image1.jpg
| | |-- some_image2.jpg
| | |-- some_image3.jpg
| | └ ...
| |-- class2
| | └ ...
| |-- class3
...
| └-- classN
There are no constraints on the names of the files or folders. File extensions can be configured in conf.extensions.
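As a rough sketch of how such a folder layout can be read, the function below maps each class folder to its image files using only the standard library. The name `list_images_by_class` and its `extensions` parameter are hypothetical stand-ins; the actual format of conf.extensions in this script is not shown here and may differ.

```python
from pathlib import Path


def list_images_by_class(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return {class_name: [image paths]} for a one-folder-per-class layout.

    `extensions` is an assumed stand-in for conf.extensions; matching is
    case-insensitive and expects a leading dot.
    """
    root = Path(root)
    dataset = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        images = sorted(
            f for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in extensions
        )
        if images:  # skip folders that contain no matching images
            dataset[class_dir.name] = images
    return dataset


# Example usage (the path is a placeholder):
# classes = list_images_by_class("path_to_folders_with_images")
# for name, files in classes.items():
#     print(name, len(files))
```

The folder name is used as the class label, so no separate annotation file is needed.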
Requirements:
- VLFeat with a Python wrapper
- scikit-learn to replace VLFeat ML functions that don't have a Python wrapper yet.
- The Caltech-101 dataset