Datathons

Discover

Checklist

  1. Learn more about the problem. Search for similar Kaggle competitions. Look up the task on Papers with Code.
  2. Do a basic data exploration. Try to understand the problem and gather a sense of what can be important.
  3. Get a baseline model working.
  4. Design an evaluation method that is as close as possible to the final evaluation. Plot local evaluation metrics against the public ones (correlation) to validate how well your validation strategy works.
  5. Try different approaches for preprocessing (encodings, Deep Feature Synthesis, lags, aggregations, imputers, ...). If you're working as a group, split preprocessing and feature generation across files.
  6. Plot learning curves (sklearn or external tools) to detect overfitting.
  7. Plot the real and predicted target distributions to see how well your model understands the underlying distribution. Apply any postprocessing that might fix small things.
  8. Tune hyper-parameters once you've settled on a specific approach (hyperopt, optuna).
  9. Plot and visualize the predictions (histograms, random predictions, ...) to make sure they're behaving as expected. Explain the predictions with SHAP.
  10. Think about what postprocessing heuristics can be done to improve or correct predictions.
  11. Stack classifiers (example).
  12. Try AutoML models. For tabular data: TPOT, AutoSklearn, AutoGluon, Google AI Platform, PyCaret, Fast.ai, Alex. For time series: AtsPy, DeepAR.
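
As a rough illustration of step 4, the sketch below measures how well local validation scores track public leaderboard scores across a handful of submissions. The function name `validation_agreement` and all score values are hypothetical, made up for this example; a correlation close to 1 would suggest the local validation strategy can be trusted for model selection.

```python
import numpy as np


def validation_agreement(local_scores, public_scores):
    """Pearson correlation between local validation and public leaderboard scores."""
    local = np.asarray(local_scores, dtype=float)
    public = np.asarray(public_scores, dtype=float)
    if local.shape != public.shape or local.size < 3:
        raise ValueError("Need at least three paired scores of equal length.")
    return float(np.corrcoef(local, public)[0, 1])


# Hypothetical scores for five submissions (illustrative values only).
local_scores = [0.712, 0.725, 0.731, 0.748, 0.753]
public_scores = [0.705, 0.719, 0.722, 0.741, 0.750]

agreement = validation_agreement(local_scores, public_scores)
print(f"Local vs. public correlation: {agreement:.3f}")
```

With only a few submissions the estimate is noisy, so it is probably best read as a sanity check rather than a precise measurement.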

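Step 11 (stacking classifiers) can be sketched with scikit-learn's `StackingClassifier`. The synthetic dataset, the choice of base models, and the fold counts below are assumptions picked to keep the example small, not a recommended setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real datathon dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Base models produce out-of-fold predictions; the final estimator learns from them.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)

# Evaluate the whole stack with an outer cross-validation loop.
scores = cross_val_score(stack, X, y, cv=3, scoring="accuracy")
print(f"Stacked accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Stacking tends to help most when the base models make different kinds of errors, so mixing model families (trees, neighbors, linear models) is a reasonable starting point.
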
Preprocessing Resources

Exploratory Data Analysis Resources

Scikit Learn Compatible Transformers

Other Compatible Tools

Time Series Resources

Datathon Platforms