Merge pull request #62 from uwhackweek/tech-module
Update technology training
scottyhq authored Jun 18, 2024
2 parents 178171d + 6691b6a commit d64faf5
Showing 19 changed files with 266 additions and 98 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -18,8 +18,7 @@ jb build docs

## Contact

* [Anthony Arendt](mailto:[email protected])
* [Scott Henderson](mailto:[email protected])
* [email eScience](mailto:[email protected])

## License

13 changes: 8 additions & 5 deletions docs/_toc.yml
@@ -8,7 +8,7 @@ parts:

- caption: Organizer Training
chapters:
- file: training/tutorials/index
- file: training/tutorials/index
sections:
- file: training/tutorials/tutorial-design
- file: training/tutorials/tutorial-notebooks
@@ -19,17 +19,20 @@ parts:
- file: training/projects/project-during
- file: training/projects/project-after
- file: training/projects/project-github
- file: training/technology/index
- file: training/technology/index
sections:
- file: training/technology/data-management
- file: training/technology/recognition
- file: training/technology/jupyter-book
- file: training/technology/data-management
- file: training/technology/help-me

- file: training/strategy/index
- file: training/strategy/index

- file: training/culture/index
- file: training/culture/index
sections:
- file: training/culture/psychological-safety.md
- caption: Reference
chapters:
- file: reference/bibliography
- file: reference/glossary
- file: reference/acknowledgements
1 change: 1 addition & 0 deletions docs/images/SupportDecisionTree.svg
Binary file added docs/images/commit-changes.png
Binary file added docs/images/copy-team-template.png
Binary file added docs/images/create-pull-request.png
Binary file added docs/images/create-team-entry.png
Binary file added docs/images/github-actions-checks.png
Binary file added docs/images/github-actions-details.png
Binary file added docs/images/meet-the-team.png
Binary file added docs/images/suggest-edit-side-by-side.png
8 changes: 4 additions & 4 deletions docs/overview/training-modules.md
@@ -2,10 +2,10 @@

The modules on this website are designed to prepare organizers of the University of Washington eScience Institute's Hackweek program. We hope to share professional development tools that improve your capacity to teach, share ideas, lead project work, and advocate for open and reproducible science. We hope the skills acquired here also apply to other related training you participate in, including in a formal higher education setting.

Our first two modules focus on improving our capacity to train hackweek participants to learn new tools for their research. We'll begin by exploring how to choose appropriate *learning content*, which refers to the topics that we choose to present to learners and the depth and breadth of coverage of that content. We will also consider *pedagogy*, which refers to how we teach material, whether in a lecture, discussion- or project-based format, as well as the methods we employ to interact with learners. In our [Tutorial Design](tutorials/index.md) and [Project Design](projects/index.md) modules we will teach you about designing hackweek learning content, and our recommended pedagogical approaches for hackweek particiapants, within the existing framework of our tutorial and project sessions.
Our first two modules focus on improving our capacity to train hackweek participants to learn new tools for their research. We'll begin by exploring how to choose appropriate *learning content*, which refers to the topics that we choose to present to learners and the depth and breadth of coverage of that content. We will also consider *pedagogy*, which refers to how we teach material, whether in a lecture, discussion- or project-based format, as well as the methods we employ to interact with learners. In our [Tutorial Design](../training/tutorials/index.md) and [Project Design](../training/projects/index.md) modules, we will teach you about designing hackweek learning content, and our recommended pedagogical approaches for hackweek participants, within the existing framework of our tutorial and project sessions.

Next we will focus on [Technology](technology/index.md) where we will teach you about the tools we use to create centralized, web-browser accessible learning materials. You will learn to use automated workflows in GitHub to generate consistent and quality controlled Jupyter Notebooks, and we will share resources for hosting sample datasets.
Next we will focus on [Technology](../training/technology/index.md), where we will teach you about the tools we use to create centralized, browser-accessible learning materials. You will learn to use automated workflows in GitHub to generate consistent, quality-controlled Jupyter Notebooks, and we will share resources for hosting sample datasets.

In our [Strategy and Planning](strategy/index.md) module, we will teach you about the tools we use to manage our time and set clear expectations for the various tasks of the organizing team. By structuring the way we work together we can better honor the limited time each of us has to contribute, and ensure that we still meet our broad goals for the event.
In our [Strategy and Planning](../training/strategy/index.md) module, we will teach you about the tools we use to manage our time and set clear expectations for the various tasks of the organizing team. By structuring the way we work together we can better honor the limited time each of us has to contribute, and ensure that we still meet our broad goals for the event.

Finally, in our module on [Learning Culture](culture/index.md), we will provide opportunities for all of us to practice ways to support a vibrant and healthy learning environment. We will practice skills in building empathy, listening and navigating complex group dynamics.
Finally, in our module on [Learning Culture](../training/culture/index.md), we will provide opportunities for all of us to practice ways to support a vibrant and healthy learning environment. We will practice skills in building empathy, listening and navigating complex group dynamics.
69 changes: 69 additions & 0 deletions docs/reference/glossary.md
@@ -0,0 +1,69 @@
# Glossaries

## Tools and Technology (general)

```{glossary}
[Conda](https://docs.conda.io)
Package, dependency and environment management for any language—Python, R,
Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more.
[Mamba](https://mamba.readthedocs.io)
An alternative package manager to conda that is fast, robust, and cross-platform.
[Conda-forge](https://conda-forge.org)
The main open-access repository for hosting packages that are installed via conda or mamba.
[Docker](https://www.docker.com)
Docker provides the ability to package and run an application in a loosely
isolated environment called a container. It is widely used for creating
reproducible software environments to run code on different computers.
[Git](https://git-scm.com)
A popular version control system that is used in many open source software
projects to manage their software code base.
[GitHub](https://github.com)
A platform for hosting Git repositories that allows developers to create, store, manage, and share their code.
[GitHub Actions](https://github.com/features/actions)
A continuous integration and continuous delivery (CI/CD) feature of GitHub that allows you to automate computational workflows for a repository.
[GitHub Pages](https://pages.github.com)
A GitHub feature that allows you to host a website connected to a repository or organization.
[Hackweek](https://uwhackweek.github.io/hackweeks-as-a-service)
Participant-driven events that strive to create welcoming spaces to learn new
things, build community and gain hands-on experience with collaboration and
team science.
[Project Jupyter](https://jupyter.org)
Project Jupyter (name derived from "JUlia PYThon and R") exists to develop
open-source software, open-standards, and services for interactive computing
across dozens of programming languages.
[Jupyter Book](https://jupyterbook.org/intro.html)
Jupyter Book is an open source project for building beautiful,
publication-quality books and documents from computational material.
[JupyterHub](https://jupyterhub.readthedocs.io)
JupyterHub allows you to deploy an application that provides remote data science environments (typically JupyterLab) to multiple users. It can be deployed with a cloud service provider or on your own hardware.
[JupyterLab](https://jupyterlab.readthedocs.io)
JupyterLab is the next-generation web-based user interface for Project Jupyter
intended to replace the Jupyter Notebook interface.
[Jupyter Notebook](https://jupyter-notebook.readthedocs.io)
An open-source web application that allows you to create and share documents that
contain live code, equations, visualizations and narrative text.
[MyST](https://mystmd.org/guide/quickstart-myst-markdown)
Markedly Structured Text (MyST) is a rich and extensible flavor of Markdown
meant for technical documentation and publishing. It is used by Jupyter Book and MyST tools.
[Slack](https://slack.com)
A communication platform that we use to share information. We use separate channels
for each project and also rely on the video features. If possible we recommend
[downloading the Slack app](https://slack.com/downloads). If your agency does not allow
you to use the app, you can still interface with Slack in a web browser.
```
73 changes: 55 additions & 18 deletions docs/training/technology/data-management.md
@@ -1,43 +1,80 @@
# Data management

A challenging aspect of tutorial development is how to manage dataset dependencies. Ideally, the data you use will be *publicly accessible* and *permanent* for reproducibility.
A challenging aspect of tutorial development is how to manage dataset dependencies. Ideally, the data you use will be *publicly accessible* and *permanent* for reproducibility. In this section we present practical guidelines for effective data sharing for Hackweek Tutorials and Projects.

```{important}
Remember, a hackweek tutorial is *learning-oriented* and should guide participants through a step-wise process with a meaningful outcome.
Remember, a hackweek tutorial is *learning-oriented* and should guide participants through a step-wise process with a meaningful outcome. If you typically work with large datasets, consider designing your tutorial to work with a small subset (~10 MB) that still enables your learning objectives to be met.
```

If you typically work with large datasets, consider designing your tutorial to work with a small subset (~10 MB) to achieve your learning objectives. Below are some general guidelines based on past hackweeks:
## Computational resource considerations
For tutorial notebooks to be executable on widely available public computing infrastructure, *we recommend targeting modest computational requirements: a 2-core CPU, 8 GB of RAM, and 10 GB of disk space (at the time of writing)*.
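To see how a given machine compares with these targets before a session, a quick standard-library check like the sketch below may help. It is only an approximation: the `os.sysconf` memory query is unavailable on Windows (the function then reports `None`), and inside a container such as a JupyterHub user server, both `os.cpu_count()` and the memory figure typically describe the host machine rather than the container's limits. The function names are illustrative.

```python
import os
import shutil


def total_ram_gb():
    """Total physical memory in GB, or None where POSIX sysconf is unavailable."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        num_pages = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf (Windows); ValueError: name unsupported.
        return None
    return page_size * num_pages / 1e9


def check_resources(min_cpus=2, min_ram_gb=8, min_disk_gb=10, path="."):
    """Compare this machine with the recommended minimum tutorial resources.

    Returns {resource: (available, target, meets_target)}; `available` is
    None when the value could not be determined.
    """
    measured = {
        "cpus": (os.cpu_count(), min_cpus),
        "ram_gb": (total_ram_gb(), min_ram_gb),
        "free_disk_gb": (shutil.disk_usage(path).free / 1e9, min_disk_gb),
    }
    return {
        name: (have, need, have is not None and have >= need)
        for name, (have, need) in measured.items()
    }


if __name__ == "__main__":
    for name, (have, need, ok) in check_resources().items():
        print(f"{name}: available={have} target={need} ok={ok}")
```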

## Guidelines for Tutorials

## Make your tutorial data publicly accessible!
Try to use the smallest amount of data possible for your tutorial. If your tutorial starts by downloading data from a remote location, keep in mind that the download may take longer than usual when hundreds of participants access the same datasets simultaneously. Below we provide recommendations for common data volumes.
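Before deciding where a dataset should live, it can help to check file sizes against these categories. The sketch below assumes the rough thresholds on this page (about 10 MB and 100 MB) and GitHub's documented block on individual files larger than 100 MiB; the tier labels and function names are illustrative.

```python
from pathlib import Path

MiB = 1024 ** 2
SMALL_LIMIT = 10 * MiB         # rough guideline on this page for committing data with the code
GITHUB_FILE_LIMIT = 100 * MiB  # GitHub blocks individual files larger than 100 MiB


def hosting_tier(size_bytes):
    """Map a file size onto the data-volume categories used on this page."""
    if size_bytes < SMALL_LIMIT:
        return "small"     # commit alongside the tutorial code
    if size_bytes < GITHUB_FILE_LIMIT:
        return "moderate"  # separate data repository or release asset
    return "large"         # release asset (< 2 GiB), streaming, or Zenodo


def survey(data_dir):
    """Return {relative path: tier} for every file under data_dir."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): hosting_tier(p.stat().st_size)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Running `survey()` on a tutorial's data folder before committing gives a quick overview of which files may need to live somewhere other than the tutorial repository.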

### My data is small (<10MB)
### <10MB
If your tutorial just needs a small image, or tabular data like a `.csv` file, go ahead and add it to the repository along with your tutorial code.

### My data is moderate (10 - 100 MB)
You can create a separatel repository on GitHub to publicly host your tutorial dataset. [Per GitHub repository limits](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github), *individual files are required to be < 100 MB*.
Here is an [example for images](https://github.com/snowex-hackweek/tutorial-data), and here is an [example for n-dimensional arrays](https://github.com/scottyhq/zarrdata).
### 10 - 100 MB
You can create a separate repository on GitHub to publicly host your tutorial dataset. [Per GitHub repository limits](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github), *individual files are required to be < 100 MB*.
* Here is an [example for images](https://github.com/snowex-hackweek/tutorial-data)
* Here is an [example for n-dimensional arrays](https://github.com/scottyhq/zarrdata).
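Tutorial code can then read files from such a data repository over HTTPS. A minimal sketch, assuming a public repository (the owner, repository, and file names in the example are hypothetical): GitHub serves raw file contents from `raw.githubusercontent.com`, and pinning `ref` to a tag or commit rather than a branch keeps the tutorial reproducible if the data repository changes later.

```python
from urllib.parse import quote


def raw_github_url(owner, repo, path, ref="main"):
    """Build the raw-file URL for `path` in a public GitHub repository.

    `ref` may be a branch, tag, or commit SHA. quote() percent-encodes
    characters such as spaces but leaves the "/" separators intact.
    """
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{quote(path)}"


url = raw_github_url("my-hackweek", "tutorial-data", "tables/station 1.csv", ref="v1.0")
print(url)
```

The resulting URL can be passed to readers that accept URLs, for example `pandas.read_csv(url)`.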

```{note}
If using a subset be sure to capture data provenance, for example by including a script that you used to access the original full-sized dataset from the data provider.
```
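One lightweight way to keep that provenance is to generate the subset with a script that also writes a small metadata file next to it. The sketch below assumes a CSV source with a header row and uses "first N rows" as the subsetting rule purely for illustration (gridded data would more typically be subset in space or time); the function name and sidecar layout are hypothetical.

```python
import csv
import json
from datetime import datetime, timezone
from pathlib import Path


def make_subset(source_csv, subset_csv, max_rows, source_url):
    """Copy the header plus the first `max_rows` data rows of source_csv into
    subset_csv, and record where the full dataset came from in a JSON sidecar."""
    kept = 0
    with open(source_csv, newline="") as src, open(subset_csv, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # assumes the first row is a header
        for row in reader:
            if kept >= max_rows:
                break
            writer.writerow(row)
            kept += 1

    provenance = {
        "source_url": source_url,
        "source_file": Path(source_csv).name,
        "subset_rule": f"first {max_rows} data rows",
        "rows_kept": kept,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    subset_path = Path(subset_csv)
    sidecar = subset_path.with_name(subset_path.stem + ".provenance.json")
    sidecar.write_text(json.dumps(provenance, indent=2))
    return sidecar
```

Committing both the subset and the script that produced it lets others regenerate or extend the subset from the original source.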

### My data is cumbersome (>100 MB)
Increasingly there are ways to load remote data in a streaming fashion, which allows you to avoid download and storage! At a basic level, software can read URLs instead of local file paths, such as at the beginning of [this tutorial](https://snowex-hackweek.github.io/website/tutorials/sar/sentinel1.html#dive-right-in).
#### GitHub Release artifacts

Generally, it is not advisable to store binary files in GitHub repositories. Even if you make a small change to a binary file, Git saves an entire new copy in the revision history, so the size of the repository quickly becomes unwieldy.

[GitHub Releases](https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases) are a feature of GitHub repositories that archive a snapshot of files in your repository *in addition to other auxiliary files*. According to official GitHub documentation:

> You can create a release to package software, along with release notes and links to binary files, for other people to use.

At the time of writing, *each file included in a release must be under 2 GiB*. Attaching tutorial data files to a GitHub release of the tutorial code can therefore work well to keep code and associated data together.

* Here is an [example of attaching a 100 MB GeoTIFF to a release](https://github.com/scottyhq/share-a-raster/releases/tag/v0.0.1)

```{note}
The GitHub command-line interface (CLI) provides a convenient way to download release data: [`gh release download`](https://cli.github.com/manual/gh_release_download).
```
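In a notebook where the `gh` CLI is not installed, a release asset can also be fetched over plain HTTPS, because GitHub serves assets of public releases from a predictable URL. A minimal sketch follows; the asset name `example.tif` is a placeholder (check the release page for the real file name), and the helper names are hypothetical.

```python
import urllib.request
from pathlib import Path


def release_asset_url(owner, repo, tag, asset_name):
    """Return the download URL GitHub uses for a file attached to a release."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset_name}"


def download_release_asset(owner, repo, tag, asset_name, dest_dir="."):
    """Download one release asset into dest_dir unless it is already there."""
    dest = Path(dest_dir) / asset_name
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        partial = dest.with_name(dest.name + ".part")
        urllib.request.urlretrieve(release_asset_url(owner, repo, tag, asset_name), partial)
        partial.replace(dest)  # the final name only appears once the download finished
    return dest
```

With the CLI, the rough equivalent is `gh release download v0.0.1 --repo scottyhq/share-a-raster --pattern "*.tif"`.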

### >100 MB

#### Stream from URLs
Increasingly, there are ways to load remote data in a streaming fashion, which lets you avoid downloading and storing whole files up front! Essentially, this means using software that can read URLs instead of local file paths, as at the beginning of [this tutorial](https://snowex-hackweek.github.io/website/tutorials/sar/sentinel1.html#dive-right-in).

```{note}
Software that can read URLs still ultimately must download data! It will either be stored only in RAM, or as a temporary file on your hard drive, so be aware that you are still constrained by your local computing resources.
```
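A standard-library sketch of what "reading a URL" means in practice: the response is downloaded into memory and parsed there, so nothing is written to disk, but the whole file still has to fit in RAM. Libraries such as pandas or xarray do the equivalent (often more efficiently) when given a URL. The function name is hypothetical, and the example uses an inline `data:` URL only so that it runs without network access.

```python
import csv
import io
import urllib.request


def read_csv_from_url(url, timeout=30):
    """Fetch a CSV file into memory and parse it into a list of dicts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()  # the entire payload now lives in RAM
    return list(csv.DictReader(io.StringIO(body.decode("utf-8"))))


# A data: URL embeds the file contents in the URL itself (%0A is a newline).
rows = read_csv_from_url("data:text/csv,name,value%0Aa,1%0Ab,2")
print(rows)
```

In a real tutorial the argument would be an `https://` URL from the data provider.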

```{warning}
If your tutorial streams data directly from a data provider, check that scheduled server downtime for maintenance isn't planned during your presentation! Also, be aware that URLs can be changed at any time by the data provider.
Check that scheduled server downtime for maintenance isn't planned during your presentation! Also, be aware that URLs can be changed at any time by the data provider.
```
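A quick pre-flight check, run on the day of the presentation, can catch a moved file or a maintenance window early. The sketch below sends an HTTP `HEAD` request, which asks only for headers, so no data is downloaded. It assumes the server supports `HEAD`; some servers answer it with HTTP 405, in which case a small `GET` is a reasonable fallback. The function name is hypothetical.

```python
import urllib.request


def url_is_reachable(url, timeout=10):
    """Return True if `url` answers a HEAD request without an error."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError/HTTPError (4xx, 5xx) and timeouts are OSError subclasses;
        # ValueError signals a malformed or unsupported URL.
        return False
```

Looping this over every URL a tutorial depends on gives a one-line status report before the session starts.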
* Here is an [example using the Python earthaccess library](https://earthaccess.readthedocs.io/en/latest/tutorials/file-access/)

## Data permanence
If you want long-term hosting of a tutorial dataset that receives a citable Digital Object Identifier (DOI), you can use [Zenodo.org](https://about.zenodo.org).
* Libraries like Xarray can [read data directly from cloud storage](https://docs.xarray.dev/en/stable/user-guide/io.html#cloud-storage-buckets)

1. [Link your GitHub repository with data Zenodo](https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content) *subject to GitHub repository size limits
#### Use Zenodo.org
Another approach is to upload your data to Zenodo, which at the time of writing has a standard 50 GB limit per record (see [this overview](https://library.cfa.harvard.edu/data-archiving-and-sharing)).

2. [Use Zenodo directly](https://library.cfa.harvard.edu/data-archiving-and-sharing) *50 GB standard limit
```{note}
[Pooch](https://github.com/fatiando/pooch) is a handy Python library for fetching data files from Zenodo (and elsewhere) and caching them locally.
```
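Pooch's `pooch.retrieve(url, known_hash)` downloads a file once, checks its hash, and reuses the cached copy afterwards. The standard-library sketch below is not Pooch; it only illustrates that same download-once, verify-hash pattern, which is worth following whichever tool you use. The URL would be the file link copied from a Zenodo record page, and the function names are hypothetical.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_cached(url, cache_dir, filename, known_sha256):
    """Download `url` once into cache_dir/filename and verify its SHA-256.

    Later calls reuse the cached copy as long as its hash still matches.
    """
    target = Path(cache_dir) / filename
    if target.exists() and sha256_of(target) == known_sha256:
        return target  # cache hit: no network access needed
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    actual = sha256_of(partial)
    if actual != known_sha256:
        partial.unlink()
        raise ValueError(f"hash mismatch for {url}: expected {known_sha256}, got {actual}")
    partial.replace(target)
    return target
```

Recording the expected hash in the tutorial also documents exactly which version of the data the notebook was written against.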

[Here](https://snowex-hackweek.github.io/website/tutorials/thermal-ir/thermal-ir-tutorial.html#) is an example tutorial that retrieves a dataset from a [Zenodo 'record'](https://zenodo.org/record/5504396)
## Data permanence considerations
Be aware that GitHub repositories can be deleted at any time by their owners. For guaranteed long-term (10+ years) hosting of a tutorial dataset that receives a citable Digital Object Identifier (DOI), you can use [Zenodo.org](https://about.zenodo.org). You can easily [link a GitHub repository with Zenodo](https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content).

* [Here](https://snowex-hackweek.github.io/website/tutorials/thermal-ir/thermal-ir-tutorial.html#) is an example tutorial that retrieves a dataset from a [Zenodo 'record'](https://zenodo.org/record/5504396)

## Computational resource considerations
In order for tutorial notebooks to be executable on different machines, *we recommend targeting limited computational requirements such as 2-core CPU, 8 GB of RAM memory, 10 GB of disk space (at the time of writing)*
## Guidelines for Projects

### JupyterHub Data Sharing

During a hackweek, teams often want to share data with each other for collaborative analysis. In contrast to tutorial datasets, which are usually hand-picked, project data is dynamic and changes over time. By using a JupyterHub during a hackweek, participants can take advantage of networked storage drives and preconfigured cloud object storage.

```{note}
JupyterHubs are not all configured the same way, but we encourage you to review [this guide from 2i2c](https://docs.2i2c.org/user/topics/data/), which explains storage options on JupyterHubs.
```
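On a hub with a shared network drive, a simple naming convention goes a long way toward avoiding accidental overwrites. The sketch below writes into a per-team folder and prefixes each file with the `JUPYTERHUB_USER` environment variable, which JupyterHub sets inside each user's server. Where the shared drive is mounted varies from hub to hub, so `shared_root`, the team name, and the function name are all hypothetical.

```python
import os
from pathlib import Path


def save_for_team(data, team, filename, shared_root):
    """Write bytes into shared_root/<team>/, prefixing the filename with the
    current hub user so teammates cannot silently overwrite each other."""
    user = os.environ.get("JUPYTERHUB_USER", "local-user")  # fallback outside a hub
    team_dir = Path(shared_root) / team
    team_dir.mkdir(parents=True, exist_ok=True)
    target = team_dir / f"{user}-{filename}"
    target.write_bytes(data)
    return target
```

Cloud object storage follows different rules (no shared POSIX folders), so check the 2i2c guide for how your hub exposes buckets.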
5 changes: 5 additions & 0 deletions docs/training/technology/help-me.md
@@ -0,0 +1,5 @@
# Where to go for help during a hackweek

With all the moving pieces, it can be hard to know where to turn for help. Check out this decision tree to help you figure out the best sources of information depending on your issue.

![Decision tree for finding help during a hackweek](../../images/SupportDecisionTree.svg)
18 changes: 13 additions & 5 deletions docs/training/technology/index.md
@@ -1,11 +1,19 @@
# Technology

*Paragraph providing brief overview of this module*
Hackweeks are highly technical in nature. We use multiple websites and software tools to facilitate full participation by all organizing team members and participants. We strive to use technologies that are open source and that facilitate open science, that is, enabling reproducibility and wide participation.

The technological landscape evolves rapidly! We created this [glossary page](../../reference/glossary.md) to help you keep track of tools that we regularly refer to.

In this section, you will learn how to make changes to Hackweek websites via pull requests on GitHub. You will also learn how we use automated GitHub Actions workflows to generate consistent, quality-controlled Jupyter Notebooks that are converted into a public website. Finally, we will discuss best practices for data management when designing tutorials and working collaboratively on projects during and after a hackweek.
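As an illustration of the kind of quality check an automated workflow can run, the sketch below inspects notebook files using only the nbformat JSON structure. It assumes a convention in which notebooks are committed without outputs and re-executed when the website is built; that convention, the specific checks, and the function names are assumptions for illustration, not a description of the hackweek workflows themselves.

```python
import json
from pathlib import Path


def notebook_problems(path):
    """Return human-readable problems found in one .ipynb file."""
    try:
        nb = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as err:  # ValueError covers bad JSON and bad UTF-8
        return [f"{path}: cannot read notebook ({err})"]

    problems = []
    if "kernelspec" not in nb.get("metadata", {}):
        problems.append(f"{path}: no kernelspec in notebook metadata")
    for index, cell in enumerate(nb.get("cells", [])):
        # Assumed convention: outputs are regenerated when the website is built.
        if cell.get("cell_type") == "code" and cell.get("outputs"):
            problems.append(f"{path}: code cell {index} has stored outputs")
    return problems


def main(paths):
    """Print every problem; return 1 if any were found (fails a CI step), else 0."""
    problems = [p for path in paths for p in notebook_problems(path)]
    print("\n".join(problems) if problems else "all notebooks passed")
    return 1 if problems else 0
```

A workflow step could call `main()` with the tutorial notebook paths and exit with its return value, so a pull request fails its checks until the notebooks are cleaned up.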

## Learning Objectives

After completing this module, hackweek supporters will:
After completing this module, hackweek organizers and participants will:

* Be familiar with the suite of technology used during a UW Hackweek (GitHub, JupyterHub, Jupyter Book)
* Understand recommended tools for tutorial creation and project work
* Know where to go for technology support before and during the hackweek

* have a comprehensive understand of the suite of technology tools used to support hackweek learners
* understand how to use our technology tools within their specific supporting role (e.g. tutorial creation, project work)
* know where to go for technology support before and during the hackweek
Specific walk-throughs are provided for:
* How to effectively share data and code during and after a hackweek
* Adding yourself to the event website as a member of the organizing team