DS-GA 1003 (Machine Learning) - Group project on "Extreme Classification"
- Download `train.csv` and `dev.csv` from here.
- Place the files inside the `data/raw/` directory in the root of the repository. (Important)
- From the root of the repository, run the following commands to convert the sparse data to dense ("normal") dataframes:

      cd code
      python construct_data.py
- The dataframes can then be loaded as follows:

      import pandas as pd

      NUM_FEATURES = 5000
      NUM_CLASSES = 3993

      # Assuming we are in one of the sub-directories (code, notebooks, etc.)
      # Columns are named 0..NUM_FEATURES-1 and 0..NUM_CLASSES-1 respectively.
      features = pd.read_csv("../data/expanded/train_features.csv", names=range(NUM_FEATURES))
      labels = pd.read_csv("../data/expanded/train_labels.csv", names=range(NUM_CLASSES))
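The conversion step is only described in outline above. As a rough illustration of what turning sparse rows into dense frames involves, the sketch below parses lines in an *assumed* input format, `l1,l2 f1:v1 f2:v2 ...` (comma-separated label indices followed by `index:value` feature pairs, all 0-based), which is a common layout for extreme classification data. The real layout of `train.csv` may differ, and `parse_sparse_line` and `to_dense_frames` are hypothetical helpers, not the contents of `construct_data.py`. The resulting DataFrames use integer column labels 0..N-1, the same labels that `names=range(...)` produces when reading the expanded CSVs.

```python
import numpy as np
import pandas as pd

# Dimensions from the loading snippet above.
NUM_FEATURES = 5000
NUM_CLASSES = 3993


def parse_sparse_line(line, num_features, num_classes):
    """Parse one line of the ASSUMED format "l1,l2 f1:v1 f2:v2 ..." into a
    dense feature vector and a multi-hot label vector (0-based indices assumed)."""
    features = np.zeros(num_features, dtype=np.float32)
    labels = np.zeros(num_classes, dtype=np.int8)
    tokens = line.split()
    # If the first token has no ":", it is the comma-separated label list;
    # otherwise the line has no labels and starts directly with features.
    if tokens and ":" not in tokens[0]:
        for label in tokens[0].split(","):
            labels[int(label)] = 1
        tokens = tokens[1:]
    # Remaining tokens are "index:value" feature pairs.
    for token in tokens:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return features, labels


def to_dense_frames(lines, num_features=NUM_FEATURES, num_classes=NUM_CLASSES):
    """Stack parsed rows into two DataFrames with integer column labels 0..N-1."""
    rows = [parse_sparse_line(line, num_features, num_classes) for line in lines]
    features = pd.DataFrame(np.stack([row[0] for row in rows]))
    labels = pd.DataFrame(np.stack([row[1] for row in rows]))
    return features, labels


# Tiny usage example with made-up dimensions (4 features, 3 classes).
sample = ["0,2 1:0.5 3:1.0", "1 0:2.0"]
X, Y = to_dense_frames(sample, num_features=4, num_classes=3)
print(X.shape, Y.shape)  # (2, 4) (2, 3)
# Recover each sample's label indices from its multi-hot row.
print([np.flatnonzero(row).tolist() for row in Y.to_numpy()])  # [[0, 2], [1]]
```

A multi-hot label frame makes per-class targets easy to slice (for example `Y[c]` for class `c`), at the cost of storing mostly zeros.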
Refer to the project tracker.

The project uses the following packages:
- numpy
- pandas
- scikit-learn
- matplotlib
- tqdm
- PyTorch
- PyTorch Lightning
- TensorBoard (for visualizing results)