Simplifying documentation and shared responsibility documentation #174

Merged 4 commits on Nov 30, 2022
5 changes: 4 additions & 1 deletion .gitignore
@@ -133,4 +133,7 @@ _build/

# Docs data
environments.txt
build_assets
build_assets
images/shared_responsibility_diagram.png
images/collaborative_learning_hub.png
images/scalable_research_hub.png
21 changes: 21 additions & 0 deletions README.md
@@ -5,3 +5,24 @@ This repository serves as the user-facing documentation and communication space
Most of the infrastructure that we discuss in the documentation is deployed [in the `infrastructure/` repository](https://github.com/2i2c-org/infrastructure).

See [the service documentation](https://docs.2i2c.org) for more information.

## How to preview this documentation

To preview this documentation, use the `Nox` tool.
First install it:

```
pip install nox
```

To build the documentation and place the HTML files in `_build/html`:

```
nox -s docs
```

To build the documentation with a server that **watches for changes and auto-builds the documentation with a preview**, run the following:

```
nox -s docs -- live
```
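
The two `nox` invocations above differ only in what follows `--`, which Nox forwards to the session as positional arguments. The repository's `noxfile.py` is not part of this diff, so the following is only a sketch of how such a session might choose its command. It assumes the live preview is served by `sphinx-autobuild`; the function name `docs_command` and the source directory `.` are hypothetical.

```python
def docs_command(posargs, source=".", output="_build/html"):
    """Return the command a hypothetical `docs` session might run.

    `posargs` stands in for the arguments passed after `--` on the
    command line (available in Nox as `session.posargs`).
    """
    if "live" in posargs:
        # Assumption: live preview is served by sphinx-autobuild, which
        # rebuilds the docs and reloads the browser when sources change.
        return ["sphinx-autobuild", source, output]
    # Default: a one-off HTML build into `_build/html`.
    return ["sphinx-build", "-b", "html", source, output]


print(docs_command([]))
print(docs_command(["live"]))
```

Inside a real Nox session, the returned list could be passed to `session.run(*command)` after installing the documentation dependencies.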
5 changes: 3 additions & 2 deletions about/distributions/education.md
@@ -1,5 +1,5 @@
(hub-types:education)=
# Collaborative learning hub
# Education and teaching

The 2i2c Educational Hubs provide learning environments and infrastructure that are meant for teaching data science.
These hubs are inspired by 2i2c's experience with the [DataHubs at UC Berkeley](https://docs.datahub.berkeley.edu/en/latest/) and the [Syzygy service](https://syzygy.ca/) for Canada.
@@ -11,7 +11,8 @@ This hub deployment is designed for distributed learning for students with a var

Below is a diagram that showcases some of the major components of this hub:

```{figure} https://drive.google.com/uc?export=download&id=1Mr51-s3D_KHPsAuTXbczaQ7mlPZUs9gm
% automatically downloaded in conf.py
```{figure} /images/collaborative_learning_hub.png

A high level overview of major components in a collaborative learning hub.
```
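
The comment `automatically downloaded in conf.py` implies that the Sphinx configuration fetches the diagrams into `images/` before the build. The actual `conf.py` change is not shown on this page, so the following is a minimal sketch under that assumption. The filenames come from the `.gitignore` entries in this change and the URLs from the figure directives that were replaced; the function name `download_diagrams` and its `fetch` parameter are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Filenames match the new .gitignore entries; URLs are the Google Drive
# links that the figure directives pointed at before this change.
DIAGRAMS = {
    "collaborative_learning_hub.png": "https://drive.google.com/uc?export=download&id=1Mr51-s3D_KHPsAuTXbczaQ7mlPZUs9gm",
    "scalable_research_hub.png": "https://drive.google.com/uc?export=download&id=1gWAIQVKcB-uxuJsBHqlDlRTq88oki1zn",
}


def download_diagrams(diagrams, images_dir="images", fetch=urlretrieve):
    """Download each diagram into `images_dir` unless it already exists.

    `fetch` is injectable so the function can be exercised without network
    access; it is called as fetch(url, destination_path).
    Returns the list of paths downloaded during this call.
    """
    target = Path(images_dir)
    target.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for filename, url in diagrams.items():
        path = target / filename
        if path.exists():
            continue  # keep the copy cached by a previous build
        fetch(url, str(path))
        downloaded.append(path)
    return downloaded
```

Calling `download_diagrams(DIAGRAMS)` near the top of `conf.py` would make `/images/collaborative_learning_hub.png` available to the `figure` directive, and skipping existing files avoids re-downloading on every build.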
86 changes: 52 additions & 34 deletions about/distributions/index.md
@@ -3,39 +3,18 @@
2i2c builds and operates **distributions of JupyterHubs** that are tailored for particular use-cases.
These services share many of the same infrastructure components, but have customizations and optimizations that are more domain- or community-specific.

:::{note}
Our services are in an "alpha" state - we are still learning a lot about the best way that these hubs can serve communities in research and education.
The infrastructure and service may change over the coming months!
See [our strategy page](../strategy/index.md) for an overview of what we're hoping to do and where we're headed next.
:::
```{figure} https://drive.google.com/uc?export=download&id=1vL8ekAtUQ4TEik4-oWIn36VAOITdlmpR
:width: 80%

For more information about specific hub distributions, see the links below.
Otherwise, read onward for high-level information about all of our Managed JupyterHubs.

## What technology makes up each hub?

🚀 core infrastructure
: Underneath each 2i2c JupyterHub is a [JupyterHub](https://jupyter.org/hub). These provide interactive computing sessions for each of your users, and connect to the other infrastructure in the cloud. We use [`auth0`](https://auth0.com/) and [CILogon](https://www.cilogon.org/) for authenticating users, which can connect to a number of other authentication protocols (such as OAuth2).

💻 interfaces
: Each 2i2c JupyterHub has two main interactive interfaces: Jupyter interfaces (Notebook and Lab), and RStudio. Each of them is accessible from your session via `/tree`, `/lab`, and `/rstudio` endpoints in your URL.

🌄 environment
: Your 2i2c JupyterHub has an environment that has been created for your particular use-case. It exists as a Docker image that your JupyterHub loads when a user starts a new session. These images can either be built with the tool [repo2docker](https://repo2docker.readthedocs.io/), or pulled directly from a Docker registry. The environment also comes pre-loaded with some tools that are helpful for working with JupyterHub, such as [nbgitpuller](https://jupyterhub.github.io/nbgitpuller). See [](environment/custom) for more information.

🤖 hardware
: 2i2c JupyterHubs can run on most major cloud providers - the primary thing that is needed is a working Kubernetes deployment. By default, 2i2c runs its hubs on Google Cloud, but if communities wish to use a different provider, this can be accomplished as well. This also means that the hardware underlying the Kubernetes deployment is configurable.

📦 data
: The data that is used by your 2i2c JupyterHub is provided by you! 2i2c JupyterHubs can connect with a variety of public data sources. We recommend using standard data structures or specifications via libraries like [Intake](https://intake.readthedocs.io/en/latest/). Note that 2i2c does not host this data itself, but can build connections between 2i2c JupyterHubs and these data sources.
A high-level technical overview of an Interactive Computing Service collaboratively run by 2i2c and a community of practice. Each hub is a JupyterHub Distribution with a collection of community-led open source projects that are customized for a particular use-case.
```

## Features of each hub

Here is a brief overview of the major features that are present in each.

```{csv-table}
:header-rows: 1
:widths: 20, 70, 5, 5
:widths: auto
:file: ../../build_assets/feature-matrix.csv
```

@@ -58,27 +37,66 @@ Here is a brief overview of the major features that are present in each.
}
</style>

(note-on-urls)=
## Where are hubs accessed?

By default, all 2i2c JupyterHubs get their own URL with the following form:
## JupyterHub in the cloud

At the core of a community service is one or more [JupyterHubs](https://jupyter.org/hub) that provide an access point for interactive computing and cloud infrastructure for your community members.

You may access your community JupyterHub at a URL with the following form (though you may choose a custom URL if you wish):

```
<hub-name>.<community-name>.2i2c.cloud
```
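
As an illustration of the URL pattern above, here is a small sketch that assembles the default hostname from its two components. The function name `hub_url`, the validation rule, and the example names are hypothetical and are not part of any 2i2c tooling.

```python
def hub_url(hub_name, community_name, domain="2i2c.cloud"):
    """Build the default hostname for a hub from its two name components."""
    for label in (hub_name, community_name):
        # Hypothetical sanity check: each component must be one DNS label.
        if not label or "." in label:
            raise ValueError(f"invalid name component: {label!r}")
    return f"{hub_name}.{community_name}.{domain}"


# Hypothetical hub and community names.
print(hub_url("staging", "example"))  # prints "staging.example.2i2c.cloud"
```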

Each 2i2c JupyterHub has a **hub name** (denoted by `<hub-name>`) and a **community name** (denoted by `<community-name>`). Communities are collections of hubs around a particular community or collaboration. Each community infrastructure may be run by different teams. For more information, see [](../service/team.md).
JupyterHub provides interactive computing sessions for each of your users, and connects to the other infrastructure in the cloud.
Our JupyterHubs can run on Google Cloud, Amazon AWS, or Microsoft Azure.

## Authentication

We use [`auth0`](https://auth0.com/) and [CILogon](https://www.cilogon.org/) for authenticating users, which can connect to a number of other authentication protocols (such as OAuth2).

## User interfaces

It is also possible to provide your own URL that points to a 2i2c JupyterHub.
Each 2i2c JupyterHub has two main interactive interfaces: Jupyter interfaces (Notebook and Lab), and RStudio. Each of them is accessible from your session via `/tree`, `/lab`, and `/rstudio` endpoints in your URL.
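
To show how the three endpoints relate to a running session, here is a minimal sketch. It assumes the session lives under JupyterHub's usual `/user/<username>/` path; the function name `interface_url` and the example hostname and username are hypothetical.

```python
# Endpoint for each interface, as listed in the paragraph above.
INTERFACE_PATHS = {
    "notebook": "tree",
    "lab": "lab",
    "rstudio": "rstudio",
}


def interface_url(session_url, interface):
    """Return the URL of one interface within a user's running session."""
    base = session_url.rstrip("/")
    return f"{base}/{INTERFACE_PATHS[interface]}"


# Hypothetical session URL for a user named "alice".
session = "https://staging.example.2i2c.cloud/user/alice/"
for name in INTERFACE_PATHS:
    print(name, interface_url(session, name))
```

Switching interfaces is therefore a matter of changing the final path segment of the session URL, without restarting the session.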

## Data outside of the hub
## Custom user environments

If you wish to access data that exists outside of your 2i2c Hub, it is your responsibility to put this data in the cloud and manage the infrastructure around it. 2i2c does not control this data, it merely provides access to it via your hub infrastructure.
Your 2i2c JupyterHub has an environment that has been created for your particular use-case. It exists as a Docker image that your JupyterHub loads when a user starts a new session. These images can either be built with the tool [repo2docker](https://repo2docker.readthedocs.io/), or pulled directly from a Docker registry. The environment also comes pre-loaded with some tools that are helpful for working with JupyterHub, such as [nbgitpuller](https://jupyterhub.github.io/nbgitpuller). See [](environment/custom) for more information.

## Where are hubs configured and deployed?
## Transparent infrastructure and operations

All of the configuration and deployment scripts for the 2i2c JupyterHub can be found at [the `infrastructure/` repository](https://github.com/2i2c-org/infrastructure). This repository contains both the deployment code as well as documentation that explains how it works. It should be treated as "for advanced users only", and is provided for transparency and as a guide for the community to follow if they wish to manage their own infrastructure similar to 2i2c JupyterHub.

To learn about how the `infrastructure/` repository works, we recommend checking out the [`infrastructure` documentation](infra:index).

See the next sections for more information about each hub distribution.

## Secure out of the box

The cloud infrastructure that we manage follows best practices for deploying cloud applications in a secure manner.
The [Zero to JupyterHub Helm Chart](https://zero-to-jupyterhub.readthedocs.io/en/latest/) is the community standard in deploying JupyterHub in the cloud, and is what 2i2c uses in all of its cloud hubs.
This project follows the principle of "secure by default", and has a number of configuration and design decisions that properly isolate user environments from one another, and prevent them from being able to access resources or data that is forbidden to them.

As members of the JupyterHub team, we are constantly looking for ways to improve [the security of Zero to JupyterHub](https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html), and use our experience running these hubs to further improve JupyterHub's security.

### Data privacy

2i2c will not collect user data for any purpose beyond what is required in order to run a JupyterHub.
Depending on the choices of your community the hub might contain identifiable information (e.g., e-mail addresses used as usernames for authentication), but this will remain within your hub's configuration and is not shared publicly.

Our {role}`Site Reliability Engineer`s will have access to all of the information that is inside a hub (which they require in order to debug problems and assist with upgrades). However, we will not retain any of this data or move it *outside* of the hub, and will not retain it once the hub is shut down (except in order to transfer data to you at your request).

## Monitored for abuse and unexpected costs

We deploy [Grafana Dashboards](https://grafana.com/grafana/dashboards/) along with a [Prometheus Server](https://prometheus.io/) to continuously monitor the usage across all of our hubs.
This provides visual dashboards that allow us to identify abnormal behavior on a hub (such as a single user using unusual amounts of RAM, using a lot of CPU, or making unusual networking requests).

### Cryptocurrency mining

Cryptocurrency mining abuse occurs when users take advantage of cloud CPU in order to make money by mining cryptocurrency.
It is a common problem with cloud-based services and platforms.

There are many different cryptocurrencies out there, but the most common by far for abuse is [the Monero cryptocurrency](https://www.getmonero.org/) due to its anonymous nature.

We deploy an open-source tool called [`cryptnono`](https://github.com/yuvipanda/cryptnono) to each of the clusters we manage.
This tool monitors any process that runs on the 2i2c hubs, and automatically kills any that are associated with Monero.
5 changes: 3 additions & 2 deletions about/distributions/research.md
@@ -1,5 +1,5 @@
(hub-types:scalable-research)=
# Scalable computing hub
# Research and collaboration

Scalable computing hubs are designed to let researchers and data scientists leverage cloud infrastructure to facilitate collaboration and interactive computation.
They are heavily inspired by [the Pangeo Community infrastructure](https://pangeo.io).
@@ -10,7 +10,8 @@ This hub deployment is designed for researchers and teams that wish to do their

Below is a diagram that showcases some of the major components of this hub:

```{figure} https://drive.google.com/uc?export=download&id=1gWAIQVKcB-uxuJsBHqlDlRTq88oki1zn
% automatically downloaded in conf.py
```{figure} /images/scalable_research_hub.png

A high level overview of major components in a scalable computing hub.
```
18 changes: 0 additions & 18 deletions about/infrastructure/index.md

This file was deleted.

35 changes: 0 additions & 35 deletions about/infrastructure/security.md

This file was deleted.
