Merge pull request #62 from uwhackweek/tech-module
Update technology training
Showing 19 changed files with 266 additions and 98 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,8 +18,7 @@ jb build docs | |
|
||
## Contact | ||
|
||
* [Anthony Arendt](mailto:[email protected]) | ||
* [Scott Henderson](mailto:[email protected]) | ||
* [email eScience](mailto:[email protected]) | ||
|
||
## License | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# Glossaries | ||
|
||
## Tools and Technology (general) | ||
|
||
```{glossary} | ||
[Conda](https://docs.conda.io) | ||
Package, dependency and environment management for any language—Python, R, | ||
Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more. | |
[Mamba](https://mamba.readthedocs.io) | ||
An alternative package manager to conda that is fast, robust, and cross-platform. | |
[Conda-forge](https://conda-forge.org) | ||
The main open-access repository for hosting packages that are installed via conda or mamba. | |
[Docker](https://www.docker.com) | ||
Docker provides the ability to package and run an application in a loosely | ||
isolated environment called a container. It is widely used for creating | ||
reproducible software environments to run code on different computers. | ||
[Git](https://git-scm.com) | ||
A popular version control system that is used in many open source software | ||
projects to manage their software code base. | ||
[GitHub](https://github.com) | ||
A service platform that allows developers to create, store, manage and share their code using the `git` command. | ||
[GitHub Actions](https://github.com/features/actions) | ||
Continuous integration and continuous delivery (CI/CD) GitHub feature that allows you to automate computational workflows for a GitHub repository. | ||
[GitHub Pages](https://pages.github.com) | ||
GitHub feature that allows you to host a website connected to a repository or organization. | |
[Hackweek](https://uwhackweek.github.io/hackweeks-as-a-service) | ||
Participant-driven events that strive to create welcoming spaces to learn new | ||
things, build community and gain hands-on experience with collaboration and | ||
team science. | ||
[Project Jupyter](https://jupyter.org) | ||
Project Jupyter (name derived from "JUlia PYThon and R") exists to develop | ||
open-source software, open-standards, and services for interactive computing | ||
across dozens of programming languages. | ||
[Jupyter Book](https://jupyterbook.org/intro.html) | ||
Jupyter Book is an open source project for building beautiful, | ||
publication-quality books and documents from computational material. | ||
[JupyterHub](https://jupyterhub.readthedocs.io) | ||
JupyterHub allows you to deploy an application that provides remote data science environments (typically JupyterLab) to multiple users. It can be deployed with a cloud service provider, or on your own hardware. | |
[JupyterLab](https://jupyterlab.readthedocs.io) | ||
JupyterLab is the next-generation web-based user interface for Project Jupyter, | |
intended to replace the Jupyter Notebook interface. | |
[Jupyter Notebook](https://jupyterbook.org) | ||
An open-source web application that allows you to create and share documents that | |
contain live code, equations, visualizations and narrative text. | ||
[MyST](https://mystmd.org/guide/quickstart-myst-markdown) | ||
Markedly Structured Text (MyST) is a rich and extensible flavor of Markdown | ||
meant for technical documentation and publishing. It is used by Jupyter Book and MyST tools. | |
[Slack](https://slack.com) | ||
A communication platform that we use to share information. We use separate channels | ||
for each project and also rely on the video features. If possible we recommend | ||
[downloading the Slack app](https://slack.com/downloads). If your agency does not allow | ||
you to use the app, you can still interface with Slack in a web browser. | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,43 +1,80 @@ | ||
# Data management | ||
|
||
A challenging aspect of tutorial development is how to manage dataset dependencies. Ideally, the data you use will be *publicly accessible* and *permanent* for reproducibility. | ||
A challenging aspect of tutorial development is how to manage dataset dependencies. Ideally, the data you use will be *publicly accessible* and *permanent* for reproducibility. In this section we present practical guidelines for effective data sharing for Hackweek Tutorials and Projects. | ||
|
||
```{important} | ||
Remember, a hackweek tutorial is *learning-oriented* and should guide participants through a step-wise process with a meaningful outcome. | ||
Remember, a hackweek tutorial is *learning-oriented* and should guide participants through a step-wise process with a meaningful outcome. If you typically work with large datasets, consider designing your tutorial to work with a small subset (~10 MB) that still enables your learning objectives to be met. | ||
``` | ||
|
||
If you typically work with large datasets, consider designing your tutorial to work with a small subset (~10 MB) to achieve your learning objectives. Below are some general guidelines based on past hackweeks: | ||
## Computational resource considerations | ||
In order for tutorial notebooks to be executable on widely available public computing infrastructure, *we recommend targeting limited computational requirements such as a 2-core CPU, 8 GB of RAM, and 10 GB of disk space (at the time of writing)*. | |
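If you develop a tutorial on a workstation that is much larger than these targets, it is easy to overlook a notebook that will not fit on a participant's machine. The following is a minimal sketch, using only the Python standard library, that reports the resources of the machine you are working on so you can compare them with the targets above. The function names are hypothetical, and the RAM lookup assumes a Unix-like system (it returns `None` elsewhere).

```python
import os
import shutil

# Targets quoted in the guideline above (at the time of writing).
TARGET_CPUS = 2
TARGET_RAM_GB = 8
TARGET_DISK_GB = 10


def total_ram_gb():
    """Return total physical RAM in GB, or None if it cannot be determined."""
    try:
        # os.sysconf only exists on Unix-like systems.
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return None
    return pages * page_size / 1e9


def resource_report(path="."):
    """Summarize CPU count, total RAM, and free disk space at `path`."""
    return {
        "cpus": os.cpu_count(),
        "ram_gb": total_ram_gb(),
        "free_disk_gb": shutil.disk_usage(path).free / 1e9,
    }


if __name__ == "__main__":
    report = resource_report()
    print("This machine:", report)
    print("Tutorial target:", TARGET_CPUS, "CPUs,",
          TARGET_RAM_GB, "GB RAM,", TARGET_DISK_GB, "GB disk")
```

A report well above the targets is a hint to re-run the notebook on a smaller environment before the event.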
|
||
## Guidelines for Tutorials | ||
|
||
## Make your tutorial data publicly accessible! | ||
Try to use the smallest amount of data possible for your tutorial. If your tutorial starts with downloading data from a remote location, keep in mind that it may take longer than usual if hundreds of participants are accessing the same datasets simultaneously. Below we provide recommendations for common data volumes. | ||
|
||
### My data is small (<10MB) | ||
### <10MB | ||
If your tutorial just needs a small image, or tabular data like a `.csv` file, go ahead and add it to the repository along with your tutorial code. | ||
|
||
### My data is moderate (10 - 100 MB) | ||
You can create a separatel repository on GitHub to publicly host your tutorial dataset. [Per GitHub repository limits](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github), *individual files are required to be < 100 MB*. | ||
Here is an [example for images](https://github.com/snowex-hackweek/tutorial-data), and here is an [example for n-dimensional arrays](https://github.com/scottyhq/zarrdata). | ||
### 10 - 100 MB | ||
You can create a separate repository on GitHub to publicly host your tutorial dataset. [Per GitHub repository limits](https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github), *individual files are required to be < 100 MB*. | ||
* Here is an [example for images](https://github.com/snowex-hackweek/tutorial-data) | ||
* Here is an [example for n-dimensional arrays](https://github.com/scottyhq/zarrdata). | ||
|
||
```{note} | ||
If using a subset be sure to capture data provenance, for example by including a script that you used to access the original full-sized dataset from the data provider. | ||
``` | ||
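One way to capture provenance is to keep the subsetting script in the repository and have it write a small record next to the subset. The sketch below is one possible approach, not a required convention: it copies the first rows of a CSV file and writes a JSON sidecar with the source URL and a checksum of the full-sized file. The function name, the sidecar naming, and the record fields are all assumptions made for illustration.

```python
import csv
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_subset_with_provenance(source_csv, subset_csv, n_rows, source_url):
    """Copy the header and up to n_rows data rows, then record provenance."""
    source_csv, subset_csv = Path(source_csv), Path(subset_csv)
    rows_written = 0
    with source_csv.open(newline="") as src, \
            subset_csv.open("w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader))  # header row
        for row in reader:
            if rows_written >= n_rows:
                break
            writer.writerow(row)
            rows_written += 1

    record = {
        "source_url": source_url,  # where the full dataset was obtained
        "source_sha256": hashlib.sha256(source_csv.read_bytes()).hexdigest(),
        "subset_file": subset_csv.name,
        "subset_rows": rows_written,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    # Hypothetical naming convention: <subset name>.provenance.json
    sidecar = subset_csv.parent / (subset_csv.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

Committing both the script and the sidecar lets a reader trace the subset back to the data provider.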
|
||
### My data is cumbersome (>100 MB) | ||
Increasingly there are ways to load remote data in a streaming fashion, which allows you to avoid download and storage! At a basic level, software can read URLs instead of local file paths, such as at the beginning of [this tutorial](https://snowex-hackweek.github.io/website/tutorials/sar/sentinel1.html#dive-right-in). | ||
#### GitHub Release artifacts | ||
|
||
Generally it is not advisable to store binary files in GitHub repositories. Even if you make small changes to a file, an entire new copy is saved in the revision history and the size of the repository will quickly get unwieldy. | |
|
||
[GitHub Releases](https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases) are a feature of GitHub repositories that archive a snapshot of files in your repository *in addition to other auxiliary files*. According to official GitHub documentation: | ||
|
||
> You can create a release to package software, along with release notes and links to binary files, for other people to use. | ||
At the time of writing, *each file included in a release must be under 2 GiB*. So storing tutorial data as files attached to a GitHub release of tutorial code can work well to keep code and associated data together. | ||
|
||
* Here is an [example of attaching a large 100 MB Geotiff to a release artifact](https://github.com/scottyhq/share-a-raster/releases/tag/v0.0.1) | ||
|
||
```{note} | ||
The GitHub Command Line Interface (CLI) provides a convenient method for downloading release data: https://cli.github.com/manual/gh_release_download | |
``` | ||
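If a tutorial notebook needs to fetch release data itself, the CLI can be called from Python. The sketch below assumes the GitHub CLI (`gh`) is installed and, depending on your setup, authenticated. The helper names are hypothetical; the repository and tag in the usage note are the example release linked above, and the `*.tif` pattern is an assumed file pattern rather than a known asset name.

```python
import shutil
import subprocess


def gh_release_download_cmd(repo, tag, pattern=None, dest="."):
    """Build the argument list for `gh release download`."""
    cmd = ["gh", "release", "download", tag, "--repo", repo, "--dir", dest]
    if pattern is not None:
        # Only download assets whose names match this glob pattern.
        cmd += ["--pattern", pattern]
    return cmd


def download_release_assets(repo, tag, pattern=None, dest="."):
    """Run the download; raises if the GitHub CLI is missing or fails."""
    if shutil.which("gh") is None:
        raise RuntimeError("GitHub CLI (gh) not found on PATH")
    subprocess.run(gh_release_download_cmd(repo, tag, pattern, dest),
                   check=True)
```

For example, `download_release_assets("scottyhq/share-a-raster", "v0.0.1", pattern="*.tif", dest="data")` would run `gh release download v0.0.1 --repo scottyhq/share-a-raster --dir data --pattern *.tif`.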
|
||
### >100 MB | ||
|
||
#### Stream from URLs | ||
Increasingly there are ways to load remote data in a streaming fashion, which allows you to avoid download and storage altogether! Essentially, this means using software that can read URLs instead of local file paths, such as at the beginning of [this tutorial](https://snowex-hackweek.github.io/website/tutorials/sar/sentinel1.html#dive-right-in). | |
|
||
```{note} | ||
Software that can read URLs still ultimately must download data! It will either be stored only in RAM, or as a temporary file on your hard drive, so be aware that you are still constrained by your local computing resources. | ||
``` | ||
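To make the point in the note concrete, here is a minimal sketch that reads the bytes behind a URL into RAM using only the Python standard library, and stops once a size limit is exceeded. The function name and the default 100 MB limit are assumptions chosen for illustration; domain libraries typically do something more sophisticated, such as reading only the byte ranges they need.

```python
import io
import urllib.request


def read_url_into_memory(url, max_bytes=100 * 1000 * 1000,
                         chunk_size=1024 * 1024):
    """Read `url` into an in-memory buffer, refusing more than max_bytes."""
    buffer = io.BytesIO()
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break  # end of the remote file
            buffer.write(chunk)
            if buffer.tell() > max_bytes:
                raise ValueError(f"{url} is larger than {max_bytes} bytes")
    buffer.seek(0)  # rewind so callers can read from the start
    return buffer
```

The returned buffer behaves like an open binary file, so it can be passed to many readers that accept file-like objects. Nothing is written to disk, but the whole file still occupies memory.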
|
||
```{warning} | ||
If your tutorial streams data directly from a data provider, check that scheduled server downtime for maintenance isn't planned during your presentation! Also, be aware that URLs can be changed at any time by the data provider. | ||
Check that scheduled server downtime for maintenance isn't planned during your presentation! Also, be aware that URLs can be changed at any time by the data provider. | ||
``` | ||
* Here is an [example using the Python earthaccess library](https://earthaccess.readthedocs.io/en/latest/tutorials/file-access/) | ||
|
||
## Data permanence | ||
If you want long-term hosting of a tutorial dataset that receives a citable Digital Object Identifier (DOI), you can use [Zenodo.org](https://about.zenodo.org). | ||
* Libraries like Xarray can [read data directly from cloud storage](https://docs.xarray.dev/en/stable/user-guide/io.html#cloud-storage-buckets) | ||
|
||
1. [Link your GitHub repository with data Zenodo](https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content) *subject to GitHub repository size limits | ||
#### Use Zenodo.org | ||
Another approach is to upload your data on Zenodo, which at the time of writing has a standard 50 GB limit (https://library.cfa.harvard.edu/data-archiving-and-sharing). | ||
|
||
2. [Use Zenodo directly](https://library.cfa.harvard.edu/data-archiving-and-sharing) *50 GB standard limit | ||
```{note} | ||
https://github.com/fatiando/pooch is a nice Python utility to fetch data from Zenodo | ||
``` | ||
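Pooch checks each download against a known hash so that a changed or corrupted file is caught early. If you prefer not to add a dependency, the same idea can be sketched with the standard library. The function names below are hypothetical, and this is an illustration of the idea rather than a description of how Pooch is implemented.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file without loading it whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified(url, dest, expected_sha256):
    """Download `url` to `dest` unless an intact copy is already there."""
    dest = Path(dest)
    if dest.exists() and sha256_of(dest) == expected_sha256:
        return dest  # cached copy is intact, skip the download
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    actual = sha256_of(dest)
    if actual != expected_sha256:
        dest.unlink()  # do not leave a corrupted file behind
        raise ValueError(f"checksum mismatch for {url}: got {actual}")
    return dest
```

Here `url` would be the download link of a file in your Zenodo record, and `expected_sha256` a checksum you computed when you uploaded the data.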
|
||
[Here](https://snowex-hackweek.github.io/website/tutorials/thermal-ir/thermal-ir-tutorial.html#) is an example tutorial that retrieves a dataset from a [Zenodo 'record'](https://zenodo.org/record/5504396) | ||
## Data permanence considerations | ||
Be aware that GitHub repositories can be deleted at any time by repository owners. For guaranteed long-term (10+ years) hosting of a tutorial dataset that receives a citable Digital Object Identifier (DOI), you can use [Zenodo.org](https://about.zenodo.org). You can easily [link a GitHub repository with Zenodo](https://docs.github.com/en/repositories/archiving-a-github-repository/referencing-and-citing-content). | |
|
||
* [Here](https://snowex-hackweek.github.io/website/tutorials/thermal-ir/thermal-ir-tutorial.html#) is an example tutorial that retrieves a dataset from a [Zenodo 'record'](https://zenodo.org/record/5504396) | ||
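The size-based recommendations for tutorials can be summarized in a small helper, which may be handy in a script that checks a data folder before an event. This is only a sketch: the function name is hypothetical, and it assumes decimal megabytes (1 MB = 1,000,000 bytes), which may differ slightly from how a given service counts its limits.

```python
MB = 1000 ** 2  # assumption: decimal megabytes


def recommend_hosting(size_bytes):
    """Map a dataset size to the hosting options suggested in this guide."""
    if size_bytes < 10 * MB:
        return ["commit to the tutorial repository"]
    if size_bytes < 100 * MB:
        return [
            "separate GitHub repository",
            "GitHub Release artifact",
        ]
    return ["stream from a URL", "upload to Zenodo"]
```

For example, `recommend_hosting(5 * MB)` suggests committing the file alongside the tutorial code, while `recommend_hosting(500 * MB)` points to streaming or Zenodo.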
|
||
## Computational resource considerations | ||
In order for tutorial notebooks to be executable on different machines, *we recommend targeting limited computational requirements such as 2-core CPU, 8 GB of RAM memory, 10 GB of disk space (at the time of writing)* | ||
## Guidelines for Projects | ||
|
||
### JupyterHub Data Sharing | |
|
||
During a hackweek, teams often want to share data with each other for collaborative analysis. In contrast to tutorial datasets, which are usually hand-picked, project data is dynamic and changes over time. By using a JupyterHub during a hackweek, participants can take advantage of networked storage drives and pre-configured Cloud Object Storage. | |
|
||
```{note} | ||
JupyterHubs do not always have the same configuration, but we encourage you to review this guide from 2i2c which explains options for JupyterHub storage (https://docs.2i2c.org/user/topics/data/) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
# Where to go for help during a hackweek | ||
|
||
With all the moving pieces, it can be hard to know where to turn for help. Check out this decision tree to help you figure out the best sources of information depending on your issue. | ||
|
||
![ques_dec_tree](../../images/SupportDecisionTree.svg) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,19 @@ | ||
# Technology | ||
|
||
*Paragraph providing brief overview of this module* | ||
Hackweeks are highly technical in nature. We use multiple websites and software tools to facilitate full participation by all organizing team members and participants. We strive to use technologies that are open-source and facilitate open science, that is, technologies that enable reproducibility and wide participation. | |
|
||
The technological landscape evolves rapidly! We created this [glossary page](../../reference/glossary.md) to help you keep track of tools that we regularly refer to. | ||
|
||
In this section, you will learn how to make changes to Hackweek websites via pull requests on GitHub. You will also learn how we use automated GitHub Actions workflows to generate consistent and quality-controlled Jupyter Notebooks that are converted to a public website. Finally, we will discuss best practices for data management when designing tutorials and when working collaboratively on projects during and after a hackweek. | |
|
||
## Learning Objectives | ||
|
||
After completing this module, hackweek supporters will: | ||
After completing this module, hackweek organizers and participants will: | ||
|
||
* Be familiar with the suite of technology used during a UW Hackweek (GitHub, JupyterHub, Jupyter Book) | |
* Understand recommended tools for tutorial creation and project work | ||
* Know where to go for technology support before and during the hackweek | ||
|
||
* have a comprehensive understand of the suite of technology tools used to support hackweek learners | ||
* understand how to use our technology tools within their specific supporting role (e.g. tutorial creation, project work) | ||
* know where to go for technology support before and during the hackweek | ||
Specific walk-throughs are provided for: | ||
* How to effectively share data and code during and after a hackweek | ||
* Adding yourself to the event website as a member of the organizing team |