DS-GA 1003 (Machine Learning) - Group project on "Extreme Classification"

MrinalJain17/Extreme-Classification

Extreme-Classification


Dataset

  1. Download train.csv and dev.csv from here.

  2. Place the files inside the directory data/raw/ in the root of the repository. (Important)

  3. From the root of the repository, run the following commands to convert the sparse data into dense ("normal") dataframes:

    cd code
    python construct_data.py

  4. The dataframes can then be loaded as follows:

    import pandas as pd
    
    NUM_FEATURES = 5000  # number of feature columns
    NUM_CLASSES = 3993   # number of label columns
    
    # Assuming we are in one of the sub-directories (code, notebooks, etc.).
    # Passing `names` makes pandas treat the files as header-less.
    features = pd.read_csv("../data/expanded/train_features.csv", names=range(NUM_FEATURES))
    labels = pd.read_csv("../data/expanded/train_labels.csv", names=range(NUM_CLASSES))
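
After loading, it may be worth confirming that the two dataframes have the expected widths and the same number of rows. The sketch below is only an illustration: `check_expanded_frames` is a hypothetical helper, not part of this repository, and it assumes the column counts given in step 4.

```python
import pandas as pd

NUM_FEATURES = 5000
NUM_CLASSES = 3993


def check_expanded_frames(features, labels,
                          num_features=NUM_FEATURES,
                          num_classes=NUM_CLASSES):
    """Hypothetical helper: validate the shapes of the loaded dataframes.

    Returns the number of rows if everything is consistent,
    otherwise raises ValueError describing the mismatch.
    """
    if features.shape[1] != num_features:
        raise ValueError(
            f"expected {num_features} feature columns, got {features.shape[1]}"
        )
    if labels.shape[1] != num_classes:
        raise ValueError(
            f"expected {num_classes} label columns, got {labels.shape[1]}"
        )
    if len(features) != len(labels):
        raise ValueError(
            f"row mismatch: {len(features)} feature rows, {len(labels)} label rows"
        )
    return len(features)
```

With the dataframes from step 4 in scope, `check_expanded_frames(features, labels)` would return the number of training examples or fail early with a descriptive error.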

Tracking the project

Refer to the project tracker.

Requirements

  • numpy
  • pandas
  • scikit-learn
  • matplotlib
  • tqdm
  • PyTorch
  • PyTorch-Lightning
  • Tensorboard (for visualizing results)
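
A quick way to see which of these packages are missing from the current environment is sketched below. The import names in `REQUIRED_MODULES` are the usual ones for these packages (for example `sklearn` for scikit-learn and `torch` for PyTorch), but the list and the `missing_modules` helper are assumptions for illustration, not something shipped with this repository.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed above.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "sklearn",            # scikit-learn
    "matplotlib",
    "tqdm",
    "torch",              # PyTorch
    "pytorch_lightning",  # PyTorch-Lightning
    "tensorboard",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks up each module without importing it, so it stays fast even when heavy packages such as PyTorch are installed.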
