This code generates the 1st place solution to the Tradeshift Text Classification competition, from our team "carl and snow":
https://www.kaggle.com/c/tradeshift-text-classification
It mainly includes two kinds of models:
- two-stage models using XGBoost and scikit-learn.
- online logistic regression.
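The online logistic regression model can be illustrated with a minimal sketch. It is not the code in this repository. It assumes hashed string features (the "hashing trick"), one binary model per label, and a per-weight AdaGrad-style learning rate. The class name `OnlineLogisticRegression` and its parameters `n_buckets` and `alpha` are hypothetical.

```python
import math
import zlib


class OnlineLogisticRegression(object):
    """Hypothetical sketch: binary logistic regression trained one example at a time.

    Each feature is a string hashed into a fixed-size weight vector, and each
    weight uses an AdaGrad-style step alpha / sqrt(1 + sum of squared gradients).
    """

    def __init__(self, n_buckets=2 ** 20, alpha=0.1):
        self.n_buckets = n_buckets
        self.alpha = alpha
        self.w = [0.0] * n_buckets   # one weight per hash bucket
        self.g2 = [0.0] * n_buckets  # accumulated squared gradients per bucket

    def _indices(self, features):
        # Bucket 0 is a bias term; crc32 keeps the hashing stable across runs.
        return [0] + [1 + (zlib.crc32(f.encode("utf-8")) & 0xFFFFFFFF) % (self.n_buckets - 1)
                      for f in features]

    def predict(self, features):
        wx = sum(self.w[i] for i in self._indices(features))
        wx = max(min(wx, 35.0), -35.0)  # clip to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-wx))

    def update(self, features, y):
        """Predict, then take one gradient step on the log loss for label y in {0, 1}."""
        p = self.predict(features)
        g = p - y  # d(log loss)/d(wx); every active feature has value 1
        for i in self._indices(features):
            self.g2[i] += g * g
            self.w[i] -= self.alpha * g / math.sqrt(1.0 + self.g2[i])
        return p
```

For a multi-label task like this one, one such model would be trained per label while streaming over the training rows. Memory stays fixed at `n_buckets` weights no matter how many distinct feature values appear, which is the usual reason for choosing hashing in online learners.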
Dependencies:
- Python 2.7
- PyPy 2.4.0
- scikit-learn 0.15.2
- NumPy 1.7.1
- SciPy 0.11.0
- XGBoost 0.3
To generate a solution:
- Set up all the dependencies.
- Change the data directory in run.py.
- Change the XGBoost wrapper path in ./src/xgb_classifier.py.
- Run python run.py.
The best single solution: xgb-part1-d18-e0.09-min6-tree120-xgb_base.csv, private leaderboard score 0.0044595.
The best ensemble solution: best-solution.csv, private leaderboard score 0.0043324 (1st place).