Extreme-Classification

DS-GA 1003 (Machine Learning) - Group project on "Extreme Classification"

Dataset

Download train.csv and dev.csv from here.
Place the files inside the directory data/raw/ in the root of the repository. This location is important: the conversion step below expects to find them there.
From the root of the repository, run the following commands to convert the sparse data into dense ("normal") dataframes:
```
cd code
python construct_data.py
```
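
For intuition, the sparse-to-dense expansion can be sketched as below. This is only an illustration and not the contents of construct_data.py: the row format ("label,label index:value index:value ...") is an assumption about the raw files, and `parse_sparse_line` is a hypothetical helper name.

```python
from typing import List, Tuple


def parse_sparse_line(
    line: str, num_features: int, num_classes: int
) -> Tuple[List[float], List[int]]:
    """Expand one sparse row into a dense feature vector and a multi-hot label vector.

    ASSUMPTION: each row looks like "l1,l2 f1:v1 f2:v2 ...", with zero-based
    indices. The real files may use a different layout.
    """
    features = [0.0] * num_features
    labels = [0] * num_classes
    for token in line.split():
        if ":" in token:
            # "index:value" pair describing one non-zero feature
            index, value = token.split(":", 1)
            features[int(index)] = float(value)
        else:
            # comma-separated list of the classes assigned to this sample
            for label in token.split(","):
                labels[int(label)] = 1
    return features, labels


features, labels = parse_sparse_line("0,2 1:0.5 3:2.0", num_features=5, num_classes=4)
print(features)  # [0.0, 0.5, 0.0, 2.0, 0.0]
print(labels)    # [1, 0, 1, 0]
```

Every feature or class that is not mentioned in a row ends up as an explicit zero, which is why the expanded files are much larger than the raw ones.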

The dataframes can then be loaded as follows:

```
import pandas as pd

NUM_FEATURES = 5000
NUM_CLASSES = 3993

# Assuming we are in one of the sub-directories (code, notebooks, etc)
features = pd.read_csv("../data/expanded/train_features.csv", names=range(NUM_FEATURES))
labels = pd.read_csv("../data/expanded/train_labels.csv", names=range(NUM_CLASSES))
```
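
After loading, a quick sanity check of the shapes and of the number of labels per sample can catch a wrong path or a mismatched column count early. The sketch below uses tiny in-memory CSVs in place of the real files, so the sizes and values are made up for illustration; it also assumes the label file holds one 0/1 column per class.

```python
import io

import pandas as pd

# Toy sizes standing in for the real NUM_FEATURES and NUM_CLASSES
NUM_FEATURES = 3
NUM_CLASSES = 4

# In-memory stand-ins for train_features.csv and train_labels.csv (made-up data)
features_csv = io.StringIO("0.0,1.5,0.0\n2.0,0.0,0.0\n")
labels_csv = io.StringIO("1,0,1,0\n0,0,0,1\n")

features = pd.read_csv(features_csv, names=range(NUM_FEATURES))
labels = pd.read_csv(labels_csv, names=range(NUM_CLASSES))

# One row per sample, one column per feature or class
print(features.shape, labels.shape)

# Number of classes assigned to each sample (multi-label targets)
labels_per_sample = labels.sum(axis=1)
print(labels_per_sample.tolist())
```

With the real files, the same checks apply with the full values of NUM_FEATURES and NUM_CLASSES.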

Tracking the project

Refer to the project tracker.

Requirements

numpy
pandas
scikit-learn
matplotlib
tqdm
PyTorch
PyTorch-Lightning
Tensorboard (for visualizing results)
