Project in DD2434 Machine Learning, Advanced Course, Winter 2016.
Name | GitHub |
---|---|
Federico Baldassarre | baldassarreFe |
Zacharie Brodard | zach-b |
Alfredo Fanghella | alfredojf |
Lucas Rodés | lucasrodes |
We reproduced the experiments presented in the paper Kernel PCA and De-noising in Feature Spaces by Sebastian Mika, Bernhard Schölkopf, Alex Smola, Klaus-Robert Müller, Matthias Scholz and Gunnar Rätsch. For details, you can read our report and our presentation.
In order to run the experiments, make sure you have all dependencies installed:
- matplotlib (>= 2.0.0)
- pandas (>=0.19.2)
- rpy2 (>=2.8.5)
- scikit-image (>=0.12.3)
- scipy (>=0.19.0)
- numpy (>=1.12.1)
- sklearn (>=0.0)
You can install them by typing:
pip3 install -r requirements.txt
We strongly recommend using a virtual environment in order to keep these dependencies isolated from the rest of the system. Follow the instructions here to set up your virtual environment.
In the paper, there are three major experiments:
- Toy example: 11 Gaussians
- Toy example: De-noising
- Digit denoising (USPS Dataset)
The file our_kpca.py contains our own implementation of the kPCA method, based on the approach described in the paper.
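For readers unfamiliar with the method, the following is a minimal sketch of kernel PCA de-noising with a Gaussian kernel and the fixed-point pre-image iteration described in the paper. It is not the contents of our_kpca.py: the function names (`rbf_kernel`, `fit_kpca`, `denoise`), the kernel width `c`, and the toy data are all illustrative assumptions, and for brevity the sketch skips centering the kernel matrix in feature space.

```python
import numpy as np


def rbf_kernel(A, B, c):
    """Gaussian kernel k(a, b) = exp(-||a - b||^2 / c) for all row pairs."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / c)


def fit_kpca(X, c, n_components):
    """Return expansion coefficients alpha with shape (N, n_components).

    Simplification (assumption): the kernel matrix is not centered.
    """
    K = rbf_kernel(X, X, c)
    eigvals, eigvecs = np.linalg.eigh(K)  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    # Scale so that each feature-space eigenvector has unit norm.
    return eigvecs[:, idx] / np.sqrt(eigvals[idx])


def denoise(x, X, alpha, c, n_iter=100, tol=1e-8):
    """Approximate pre-image of the projection of x (fixed-point iteration)."""
    beta = rbf_kernel(x[None, :], X, c)[0] @ alpha  # projections of Phi(x)
    gamma = alpha @ beta                            # weight of each Phi(x_i)
    z = x.copy()
    for _ in range(n_iter):
        w = gamma * rbf_kernel(z[None, :], X, c)[0]
        denom = w.sum()
        if abs(denom) < 1e-12:  # update is undefined here; stop early
            break
        z_new = w @ X / denom
        converged = np.linalg.norm(z_new - z) < tol
        z = z_new
        if converged:
            break
    return z


# Toy usage: one noisy cluster around the origin (illustrative data only).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
x_noisy = np.array([0.25, -0.2])
alpha = fit_kpca(X_train, c=0.2, n_components=4)
x_denoised = denoise(x_noisy, X_train, alpha, c=0.2)
print(x_noisy, "->", x_denoised)
```

The iteration starts from the noisy point itself, which is a natural choice for de-noising; it can still stall or converge to a poor local solution, so the early-exit guard on the denominator is a precaution rather than a guarantee.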
The code related to the first toy example (11 Gaussians) can be found in example1.py.
Run the script as
python3 example1.py
By default, this script outputs the kPCA MSE, PCA MSE and their ratio for 45 different settings of sigma.
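To make the reported quantities concrete, here is a hedged sketch of how an MSE against the true cluster centers and a linear PCA baseline could be computed. It does not reproduce example1.py: the helper names (`pca_denoise`, `mse_to_centers`), the sample sizes, and the single value of sigma are assumptions chosen for illustration.

```python
import numpy as np


def pca_denoise(X_train, X_test, n_components):
    """Project test points onto the leading principal directions and back."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    V = Vt[:n_components].T
    return (X_test - mean) @ V @ V.T + mean


def mse_to_centers(Z, centers):
    """Mean squared Euclidean distance between points and their centers."""
    return float(np.mean(np.sum((Z - centers) ** 2, axis=1)))


rng = np.random.default_rng(0)
n_sources, dim, sigma = 11, 10, 0.1  # sigma is one illustrative setting
source_centers = rng.uniform(-1.0, 1.0, size=(n_sources, dim))

# Draw noisy samples around each center (sample sizes are assumptions).
train_labels = np.repeat(np.arange(n_sources), 20)
test_labels = np.repeat(np.arange(n_sources), 10)
X_train = source_centers[train_labels] + sigma * rng.normal(size=(220, dim))
X_test = source_centers[test_labels] + sigma * rng.normal(size=(110, dim))

Z_pca = pca_denoise(X_train, X_test, n_components=8)
mse_noisy = mse_to_centers(X_test, source_centers[test_labels])
mse_pca = mse_to_centers(Z_pca, source_centers[test_labels])
print(f"noisy MSE: {mse_noisy:.4f}  PCA MSE: {mse_pca:.4f}")
```

The same `mse_to_centers` helper would apply to kPCA de-noised points, and the ratio is then a single division of the two MSE values. Which MSE goes in the numerator is left open here, since the script's convention is not stated in this README.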
The code related to the second toy example (de-noising) can be found in example2.py.
Run the script as
python3 example2.py
Once the execution has ended, a picture of the result will be displayed.
You might get some warnings; just ignore them.
⚠️ Known issue: the USPS dataset is no longer available at mldata.org. We will look into an alternative source.
The code related to the digit de-noising example (USPS dataset) can be found in example3.py.
Run the script as
python3 example3.py