These are the materials for a roughly one-day course that introduces some of the key methods and concepts in machine learning, aimed at a scientific audience.
The presentation can be viewed at http://ljdursi.github.io/ML-for-scientists .
The intent is that attendees with some experience in scientific data analysis (curve fitting, etc.) and some familiarity with Python or R would, after working through this material:
- Have some basic familiarity with key terms,
- Have used a few standard fundamental methods, and have a grounding in the underlying theory,
- Understand some basic concepts with broad applicability.
It covers, in Python (sklearn, but also some other packages), most or all of the following methods:
- Regression
    - OLS
    - LOESS
    - Lasso
- Classification
    - Logistic Regression
    - kNN
    - Naive Bayes
- Density estimation
    - Kernel Methods
- Clustering
    - k-means
    - hierarchical clustering
... but more importantly, it covers these concepts:
- Bias-Variance Tradeoff
- Resampling methods
    - Bootstrapping
    - Cross-Validation
    - Permutation tests
- Model Selection
- Variable Selection
- Multiple Hypothesis Testing
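
To give a flavour of the resampling methods listed above, here is a minimal sketch of bootstrapping applied to an OLS fit. It is not taken from the course materials: the function names `ols_slope` and `bootstrap_slope_interval`, the synthetic data, and the choice of a percentile interval are all assumptions made for illustration. It uses only the Python standard library so that it runs without any installation, whereas the course itself works with sklearn and other packages.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of y on x (closed form for a single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def bootstrap_slope_interval(xs, ys, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the OLS slope (illustrative helper)."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        # Resample (x, y) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in idx}) < 2:
            continue  # degenerate resample: slope is undefined
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    slopes.sort()
    m = len(slopes)
    lower = slopes[int((alpha / 2) * m)]
    upper = slopes[int((1 - alpha / 2) * m) - 1]
    return lower, upper


# Synthetic data (an assumption for this sketch): y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0, 0.5) for x in xs]

slope = ols_slope(xs, ys)
lower, upper = bootstrap_slope_interval(xs, ys)
print(f"slope = {slope:.3f}, 95% bootstrap interval = ({lower:.3f}, {upper:.3f})")
```

The same pattern (refit the model on many resampled datasets, then look at the spread of the fitted quantity) applies to any estimator, which is why resampling sits among the broadly applicable concepts rather than with a specific method.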