gapatino/Doing-frequentist-statistics-with-Scipy

Doing frequentist statistics with Scipy

Repository for the PyData DC 2016 tutorial

This project reviews the functions available in the scipy.stats module for running common frequentist statistical tests, including how to format the data and how to interpret the results. The tests are run on the iris data set. Some common data-handling commands in Pandas, along with plotting in Matplotlib and Seaborn, are also mentioned. The tutorial covers the following statistical tests:

  • Normality testing
  • Homogeneity of variance testing
  • Comparing two samples of a continuous measure: t-tests, Cohen's d, Wilcoxon rank-sum test, Mann-Whitney U test, Wilcoxon signed-rank test
  • Comparing multiple groups: ANOVA, Kruskal-Wallis H
  • Contingency tables: Chi square, Fisher's exact test
  • Correlation: Pearson's correlation coefficient r, Spearman rank-order correlation coefficient rho, Point-biserial correlation coefficient, Kendall's Tau
  • Linear regression (requires the Statsmodels library)
  • Logistic regression (requires the Statsmodels library)
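
To give a feel for how several of the tests listed above are called, here is a minimal sketch using scipy.stats. It is not taken from the tutorial notebook: the data are synthetic stand-ins generated with NumPy (not the iris measurements), the variable names are illustrative, the contingency-table counts are made up, and Cohen's d is computed by hand with the pooled standard deviation for equal group sizes.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (assumption: NOT the iris data), three groups of 50.
rng = np.random.default_rng(0)
a = rng.normal(loc=5.0, scale=0.35, size=50)
b = rng.normal(loc=5.9, scale=0.50, size=50)
c = rng.normal(loc=6.6, scale=0.60, size=50)

# Normality testing (Shapiro-Wilk) on one group.
w_stat, p_norm = stats.shapiro(a)

# Homogeneity of variance across groups (Levene's test).
lev_stat, p_var = stats.levene(a, b, c)

# Two independent samples: Welch's t-test and the Mann-Whitney U test.
t_stat, p_t = stats.ttest_ind(a, b, equal_var=False)
u_stat, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")

# Cohen's d, computed by hand; this pooled SD formula assumes equal group sizes.
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (a.mean() - b.mean()) / pooled_sd

# Multiple groups: one-way ANOVA and Kruskal-Wallis H.
f_stat, p_anova = stats.f_oneway(a, b, c)
h_stat, p_kw = stats.kruskal(a, b, c)

# Contingency table (made-up 2x2 counts): chi-square and Fisher's exact test.
table = np.array([[12, 5], [4, 14]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
odds_ratio, p_fisher = stats.fisher_exact(table)

# Correlation between two synthetic continuous variables.
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)
r, p_r = stats.pearsonr(x, y)
rho, p_rho = stats.spearmanr(x, y)
tau, p_tau = stats.kendalltau(x, y)

print(f"Welch t = {t_stat:.2f}, p = {p_t:.3g}, Cohen's d = {cohens_d:.2f}")
print(f"ANOVA F = {f_stat:.2f}, p = {p_anova:.3g}")
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```

Each call returns a test statistic and a p-value (chi2_contingency also returns the degrees of freedom and the expected counts). The normality and variance checks are typically used to decide between the parametric tests (t-test, ANOVA) and their rank-based counterparts (Mann-Whitney U, Kruskal-Wallis H).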

Files in the repository:

  • The iris data set in CSV format. This is the same data set that ships with the scikit-learn library
  • The Python 3 Jupyter Notebook with the code

The video of the presentation is available at:

https://www.youtube.com/watch?v=UNp9Bavok0o
