
Collection of stats, modeling, and data science tools in Python and R.


pmaji/data-science-toolkit


Introduction

Welcome! The purpose of this repository is to serve as a stockpile of statistical methods, modeling techniques, and data science tools. The content ranges from educational vignettes on specific topics, to tailored functions and modeling pipelines built to enhance and optimize analyses, to notes and code from various data science conferences, to general data science utilities. This will remain a work in progress, and I welcome all contributions and constructive criticism. If you have a suggestion or request, please use the "Issues" tab and I will endeavor to respond promptly!

Note: GitHub often has trouble rendering larger .ipynb files. If you are unable to view one of the Jupyter notebooks linked below, I recommend copying and pasting the notebook's URL into Jupyter's nbviewer, which will give you a viewable link like this one here for my "Visualization with Plotly" notebook. To make sure nbviewer shows the most up-to-date version of a notebook, add ?flush_cache=true to the end of the generated URL, as described here; otherwise, your link may be slightly out of date.
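Because the cache-busting step is only a query-string change, it can be automated. Below is a minimal Python sketch using only the standard library's urllib.parse; the helper name add_flush_cache and the example notebook URL are hypothetical and not part of this repository.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def add_flush_cache(url: str) -> str:
    """Return `url` with flush_cache=true set in its query string.

    Hypothetical helper: existing query parameters are kept (repeated keys
    collapse to their last value), and any existing flush_cache value is
    overwritten. A #fragment, if present, stays at the end.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["flush_cache"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical nbviewer link for illustration only.
print(add_flush_cache("https://nbviewer.org/github/user/repo/blob/master/example.ipynb"))
```

Parsing and rebuilding the URL, rather than appending the string directly, avoids producing a malformed link when the URL already has a query string or a fragment.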

Table of Contents

  1. Playground and Basics
    1. Rough Notes from ISLR Exercises -- R
    2. Rough Notes from Python Data Scientist Track -- Python
  2. Exploratory Data Analysis (EDA) and Visualization
    1. Practical Data Visualization with Python (Full Course) -- Python
    2. EDA and Basic Viz. -- R
    3. Visualizing Geographic Data -- Python
    4. Radar Charts -- Python
  3. Hypothesis Testing
    1. Kolmogorov-Smirnov Test (KS Test) -- R
    2. Useful Hypothesis Testing Functions -- R
  4. Classification
    1. Logistic Regression (Ridge and Lasso Methods Included) -- R
    2. Useful Classification Functions -- R
    3. Basic Tree Models -- R
    4. KNN -- R
  5. Regression
    1. Linear Regression -- Python
  6. Reinforcement Learning
  7. Text Mining and Natural Language Processing (NLP)
    1. Basic Text Mining and NLP -- R
  8. Time Series
    1. Time Series Forecasting with Facebook's Prophet Package -- Python
  9. Notes and Material from Data Science Conferences
    1. PyData 2018 DC Conference (Notes and Tutorial Code) -- Python
    2. Max Kuhn / RStudio Supervised Learning 2019 DC Conference -- R
    3. PyCon 2019 Conference (Notes and Session Code) -- Python
  10. Utilities
    1. HTML File Appender (Using Beautiful Soup) -- Python

Contribution Info

All are welcome and encouraged to contribute to this repository. My only requests are that you include a detailed description of your contribution, that your code be thoroughly commented, and that you test your contribution locally, with the most recent version of the master branch merged in, before submitting the PR.