This repository contains code for the NGI internal Machine Learning workshop presented in spring / early summer 2022. We upload code to this repo after each session.
Before you start to code, make sure you have installed:
- An IDE: VS Code, Spyder, Atom, PyCharm, etc.
- A package and environment manager. The coding sessions show instructions using conda, but feel free to use another system such as pipenv. Conda can be downloaded either with a GUI, as part of Anaconda (https://www.anaconda.com/products/distribution), or as Miniconda (https://docs.conda.io/en/latest/miniconda.html), which comes without the GUI and without the many extra packages in Anaconda that you probably don't need :-)
Build the directory structure:
├── Data
│   ├── raw        <- Raw data from third party sources.
│   └── processed  <- Processed data ready for modelling.
├── Figures        <- Saved figures from processing and results.
└── src            <- Script files with functionality.
Alternatively, run the bash script in the repo to set up the structure:
bash make_data_structure.sh
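If bash is not available (for example in a plain Windows terminal), the same layout can be created from Python. This is a minimal sketch using only the standard library. It is not the content of make_data_structure.sh, and the function name create_structure is just an illustrative choice.

```python
from pathlib import Path

# Directories from the layout above, relative to the project root.
DIRECTORIES = ["Data/raw", "Data/processed", "Figures", "src"]


def create_structure(root="."):
    """Create the workshop directories under root and return their paths."""
    created = []
    for name in DIRECTORIES:
        path = Path(root) / name
        # parents=True also creates Data; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_structure() from the project root creates the four directories. Running it again leaves existing directories and their contents untouched.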
Download and save the dataset into the Data/raw directory.
First, we would like to mention three excellent papers that describe good practice in scientific computing.
- https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510&ref=https://githubhelp.com
- https://doi.org/10.1016/j.patter.2021.100206
- https://journals.plos.org/plosbiology/article/info:doi/10.1371/journal.pbio.1001745
Git is a version control system. Follow these steps to get the code onto your computer. You only need to do this once.
- Install git.
- In a Linux or Windows terminal, navigate to the directory where you want to store your coding projects.
- Clone the repo with:
  git clone <url copied from repo>
Use one environment for each coding project. These 5 sessions use the same datasets and are essentially parts of the same project, so it is fine to use the same environment for all of them.
- Create an environment called ml_sessions_2022 from environment.yaml using conda. If you get pip errors, install the pip libraries manually, e.g. pip install pandas.
  conda env create --file environment.yaml
- Activate the new environment with:
  conda activate ml_sessions_2022
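After activating the environment, it can be useful to confirm that Python actually sees the libraries you expect. The sketch below uses only the standard library. The helper name missing_packages is hypothetical, and pandas is used as the example only because it is mentioned above, so replace the list with the packages in environment.yaml. CONDA_DEFAULT_ENV is the variable conda sets to the name of the active environment.

```python
import importlib.util
import os


def missing_packages(names):
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    print("active environment:", os.environ.get("CONDA_DEFAULT_ENV", "<none>"))
    # Example list only; use the packages listed in environment.yaml.
    missing = missing_packages(["pandas"])
    print("missing packages:", missing or "none")
```

If a package shows up as missing, install it into the active environment before the session starts.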
We recommend that you register your own GitHub account and create one repo for your code from the workshop sessions. After every session, push your code to your personal GitHub repo. That is a good exercise in version control!
We recommend the following steps before each workshop session:
- Get the latest code in the repo. You will probably be asked for a git token for authentication:
  git pull
- Update the necessary libraries and activate the environment by calling:
  conda env update --file environment.yaml
  conda activate ml_sessions_2022