This workshop is being developed by the Ifakara Health Institute (IHI) and the Swiss Tropical & Public Health Institute (Swiss TPH), with the support of Leading House Africa (LHA), which promotes and fosters bilateral collaboration with partner institutions in Africa.
Welcome!
🗓️ September 26-28, 2022
🕘 09:00 - 17:00
🌇 Dar es Salaam, Tanzania (Protea Hotel by Marriott Dar es Salaam Courtyard)
Data science and artificial intelligence have the potential to generate fundamentally new insights on global health policies in Africa, but the full realization of this potential depends on the availability of a critical mass of highly trained health data scientists on the continent. The goal of this project is to jointly develop and implement a public health data science curriculum to enable researchers from the Ifakara Health Institute (IHI) in Tanzania to strengthen their expertise in the area.
Two complementary aspects of moving into data science are:
- the mindset: how scientists think about data and collaborate around it, and
- the skillset, which is composed of an ecosystem of (mostly open-source) tools and practices.
Upon completing the workshop, participants will have gained:
- exposure to the data science approach, tools and collaborative practices
- hands-on experience interfacing between Stata and R, a grounding in the basics of working with data in R/RStudio, and an understanding of how to incrementally incorporate R into existing data analysis workflows in Stata. The idea is not to move everything you do in Stata to R at once, but to enable you to continue learning at your own pace after the workshop.
This workshop is relevant for individuals who answer yes to the following questions:
- Do you want to develop data science projects in public health?
- Do you want to learn more about how open and reproducible science approaches can be used in your daily practice?
- Are you a user of Stata (or another data analysis tool) who would like to expand your data analysis skillset with R?
- Do you want to bridge analyses between data analysis tools (Stata, R or Python) and collaborate more easily with researchers who use a different one of these tools?
Time (group) | Day 1 (stream 1) | Day 1 (stream 2) | Day 2 (stream 1) | Day 2 (stream 2) |
---|---|---|---|---|
8:30-9:00 | Welcome | Welcome | Welcome | Welcome |
9:00-10:30 | Data science introduction | Data science introduction | Sharing | Sharing |
Break | ||||
11:00-11:30 | Introduction to Git | Introduction to Git | ||
11:30-13:00 | Public Health question (need) | Public Health question (data strategy) | TBD | Machine learning analysis |
Lunch break | ||||
14:00-15:30 | Analysis Plan | R, RStudio and RMarkdown | Sharing | Sharing |
Break | ||||
16:00-17:30 | N/A | ODK Central API, ruODK | Wrap-up | Wrap-up |
This project aims to support researchers in progressing along the following development axes:
- Data science mindset
  - Use of reproducible research practices in public health
  - Data provenance
  - Use of distinct data sources for the development of public health indicators
  - Research data vs. real world evidence data
  - Ethical data science
  - Data papers
- Data science skillset
  - Programming tools
    - Move from Stata to R (prerequisite: Stata)
    - R programming
      - dplyr
    - Python programming
      - pandas
      - scikit-learn (prerequisite: independent Python user)
  - Coding with best practices (R/RStudio/tidyverse)
  - Versioning using GitHub (all)
  - Using targets (prerequisite: independent R user)
  - Reporting and publishing: Dynamic report generation
    - Using Stata (prerequisite: Stata)
    - Using R/Rmarkdown (prerequisite: R basics)
    - Notebooks (Python Jupyter Notebook, Rmarkdown as a notebook - prerequisite: Python/R basics)
  - Reproducible data
    - Use APIs (prerequisite: IT programming basics)
    - Open access data (all)
  - Statistical methods for reproducible research (advanced)
What is not covered
- Reproducible workflows (targets)
- Reproducible environments (Binder, Docker, renv, etc)
- Samwel Lwambura
- Hajirani Msuya
- Ibrahim Mtebene
- Charles Festo
- Fenella Beynon
- Hélène Langet
- Silvia Cicconi
- Gillian Levine
- Fabian Schär
Coming soon: guidance on installing the (free) software used in the workshop.
Coming soon: a succinct description of the data used in the workshop.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
You are free to:
- Share — copy and redistribute the material in any medium or format
- Adapt — remix, transform, and build upon the material for any purpose, even commercially.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.