Commit 2d4e23d: Merge branch 'main' into google-cloud-python

marcelovilla authored Nov 13, 2024 (2 parents 4e644be + 859ee45)

Showing 6 changed files with 304 additions and 4 deletions.
97 changes: 97 additions & 0 deletions docs/docs/explanations/advanced-provider-configuration.md
```yaml
amazon_web_services:
  permissions_boundary: arn:aws:iam::01234567890:policy/<permissions-boundary-policy-name>
```

### EKS KMS ARN (Optional)

You can use AWS Key Management Service (KMS) to enhance security by encrypting Kubernetes secrets in
Amazon Elastic Kubernetes Service (EKS). This approach adds an extra layer of protection for sensitive
information, like passwords, credentials, and TLS keys, by applying user-managed encryption keys to Kubernetes
secrets, supporting a [defense-in-depth strategy](https://aws.amazon.com/blogs/containers/using-eks-encryption-provider-support-for-defense-in-depth/).

Nebari supports setting an existing KMS key while deploying Nebari to implement encryption of secrets
created in Nebari's EKS cluster. The KMS key must be a **Symmetric** key set to **encrypt and decrypt** data.

:::warning
Enabling EKS cluster secrets encryption by setting `amazon_web_services.eks_kms_arn` is an
_irreversible_ action: re-deploying Nebari to remove a previously set `eks_kms_arn` will fail.
If you instead try to change the KMS key by re-deploying with a _different_ key ARN, the
re-deploy succeeds, but the cluster configuration silently keeps the original key. If a
re-deploy fails because it attempted to remove the key, you can restore the deployment by
re-deploying with `eks_kms_arn` set back to the original KMS key ARN.
:::

:::danger
If the KMS key used for envelope encryption of secrets is ever deleted, then there is no way to recover
the EKS cluster.
:::

:::note
After enabling cluster encryption on your cluster, you must encrypt all existing secrets with the
new key by running the following command:
`kubectl get secrets --all-namespaces -o json | kubectl annotate --overwrite -f - kms-encryption-timestamp="time value"`
Consult [Encrypt K8s secrets with AWS KMS on existing clusters](https://docs.aws.amazon.com/eks/latest/userguide/enable-kms.html) for more information.
:::

Here is an example of how to set the KMS key ARN in `nebari-config.yaml`:

```yaml
amazon_web_services:
  # the ARN of the AWS Key Management Service key
  eks_kms_arn: "arn:aws:kms:us-west-2:01234567890:key/<aws-kms-key-id>"
```
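Before deploying, it can help to sanity-check the value you are about to set. A minimal sketch (the ARN below is a placeholder; the commented `aws kms describe-key` call additionally reports the key spec and usage when AWS credentials are available):

```shell
# Hypothetical pre-deploy check: confirm the value has the standard KMS key ARN
# shape before writing it into nebari-config.yaml.
KMS_ARN="arn:aws:kms:us-west-2:01234567890:key/1234abcd-12ab-34cd-56ef-1234567890ab"
if printf '%s' "$KMS_ARN" | grep -Eq '^arn:aws:kms:[a-z0-9-]+:[0-9]+:key/[0-9a-f-]+$'; then
  echo "ARN format OK"
else
  echo "ARN format invalid" >&2
fi
# With credentials configured, verify the key is symmetric encrypt/decrypt:
# aws kms describe-key --key-id "$KMS_ARN" \
#   --query '[KeyMetadata.KeySpec,KeyMetadata.KeyUsage]' --output text
```

This catches copy-paste mistakes (for example, pasting a key alias or key ID instead of the full ARN) before a deploy attempt.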

### Launch Templates (Optional)

Nebari supports configuring launch templates for your node groups, enabling you to customize settings like the AMI ID and pre-bootstrap commands. This is particularly useful if you need to use a custom AMI or perform specific actions before the node joins the cluster.

:::warning
If you add a `launch_template` to an existing node group that was previously created without one, AWS will treat this as a change requiring the replacement of the entire node group. This action will trigger a reallocation of resources, effectively destroying the current node group and recreating it. This behavior is due to how AWS handles self-managed node groups versus those using launch templates with custom settings.
:::

:::tip
To avoid unexpected downtime or data loss, consider creating a new node group with the launch template settings and migrating your workloads accordingly. This approach allows you to implement the new configuration without disrupting your existing resources.
:::

#### Configuring a Launch Template

To configure a launch template for a node group in your `nebari-config.yaml`, add the `launch_template` section under the desired node group:

```yaml
amazon_web_services:
  region: us-west-2
  kubernetes_version: "1.18"
  node_groups:
    custom-node-group:
      instance: "m5.large"
      min_nodes: 1
      max_nodes: 5
      gpu: false # Set to true if using GPU instances
      launch_template:
        # Replace with your custom AMI ID
        ami_id: ami-0abcdef1234567890
        # Command to run before the node joins the cluster
        pre_bootstrap_command: |
          #!/bin/bash
          # This script is executed before the node is bootstrapped
          # You can use this script to install additional packages or configure the node
          # For example, to install the `htop` package, you can run:
          # sudo apt-get update
          # sudo apt-get install -y htop
```

**Parameters:**

- `ami_id` (Optional): The ID of the custom AMI to use for the nodes in this group; this assumes the AMI provided is an EKS-optimized AMI derivative. If specified, the `ami_type` is automatically set to `CUSTOM`.
- `pre_bootstrap_command` (Optional): A command or script to execute on the node before
  it joins the Kubernetes cluster. This can be used for custom setup or configuration
  tasks. The value must be a single string of valid shell syntax. It is injected into
  the `user_data` field of the launch template. For more information, see
  [User Data](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html).

> If you're using a `launch_template` with a custom `ami_id`, there is a known issue where updating `scaling.desired_size` via the Nebari configuration (Terraform) does not take effect. To scale up, you must either recreate the node group or adjust the scaling settings directly in the AWS Console UI (recommended). We are aware of this inconsistency and plan to address it in a future update.

:::note
If an `ami_id` is not provided, AWS will use the default Amazon Linux 2 AMI for the
specified instance type. You can find the latest optimized AMI IDs for Amazon EKS in your
cluster region by inspecting its respective SSM parameters. For more information, see
[Retrieve recommended Amazon Linux AMI IDs](https://docs.aws.amazon.com/eks/latest/userguide/retrieve-ami-id.html).
:::
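As a sketch of that lookup, the SSM parameter name is built from the Kubernetes version; the actual `aws ssm get-parameter` call is commented out because it requires AWS credentials:

```shell
# Build the public SSM parameter name for the recommended EKS-optimized
# Amazon Linux 2 AMI for a given Kubernetes version.
K8S_VERSION="1.29"
PARAM="/aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2/recommended/image_id"
echo "$PARAM"
# With credentials configured, the lookup itself would be:
# aws ssm get-parameter --name "$PARAM" --region us-west-2 \
#   --query 'Parameter.Value' --output text
```

The returned AMI ID is region-specific, so run the lookup in the same region as your cluster before setting `ami_id`.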

</TabItem>

<TabItem value="azure" label="Azure">
4 changes: 3 additions & 1 deletion docs/docs/references/RELEASE.md

---

## Release 2024.9.1 - September 27, 2024 (Broken Release)

> WARNING: This release was later found to have unresolved issues described further in [issue 2798](https://github.com/nebari-dev/nebari/issues/2798). We have marked this release as broken on conda-forge and yanked it on PyPI. One of the bugs prevents any upgrade from 2024.9.1 to 2024.11.1. Users should skip this release entirely and upgrade directly from 2024.7.1 to 2024.11.1.
> WARNING: This release changes how group directories are mounted in JupyterLab pods: only groups with specific permissions will have their directories mounted. If you rely on custom group mounts, we strongly recommend running `nebari upgrade` before updating. This will prompt you to confirm how Nebari should handle your groups: either keep them mounted or allow unmounting. **No data will be lost**, and you can reverse this anytime.
138 changes: 138 additions & 0 deletions docs/docs/references/container-sources.md
## Deploying and Running Nebari from a Private Container Repository

Nebari deploys and runs FOSS components as containers running in Kubernetes.
By default, Nebari sources each container from the container's respective public repository, typically `docker.io` or `quay.io`.
This introduces supply-chain concerns for security-focused customers.

One solution to these supply-chain concerns is to deploy Nebari from private locally-mirrored containers:

- Create a controlled private container repository (e.g. ECR)
- Mirror all containers used by Nebari into this private container repository
- Use the `pre_bootstrap_command` mechanism in `nebari-config.yaml` to specify the mirrored container repo

Deploying Nebari in this fashion eliminates significant supply-chain surface area, but requires identifying all containers used by Nebari.

The following configurations demonstrate how to specify a private repo denoted by the string `[PRIVATE_REPO]`.

**Note:** Authorization tokens are used in the examples below. It is important for administrators to understand the expiration policy of these tokens, because the Nebari k8s cluster may in some cases need to **use these tokens to pull container images at any time during run-time operation**.

### Set ECR as default container registry mirror

```yaml
amazon_web_services:
  node_groups:
    general:
      instance: m5.2xlarge
      launch_template:
        pre_bootstrap_command: |
          #!/bin/bash
          # Verify that IP forwarding is enabled for worker nodes, as is required for containerd
          if [[ $(sysctl net.ipv4.ip_forward | grep "net.ipv4.ip_forward = 1") ]]; then echo "net.ipv4.ip_forward is on"; else sysctl -w net.ipv4.ip_forward=1; fi
          # Set ECR as default container registry mirror
          mkdir -p /etc/containerd/certs.d/_default
          ECR_TOKEN="$(aws ecr get-login-password --region us-east-1)"
          BASIC_AUTH="$(echo -n "AWS:$ECR_TOKEN" | base64 -w 0)"
          cat <<-EOT > /etc/containerd/certs.d/_default/hosts.toml
          [host."https://[PRIVATE_REPO].dkr.ecr.us-east-1.amazonaws.com"]
            capabilities = ["pull", "resolve"]
          [host."https://[PRIVATE_REPO].dkr.ecr.us-east-1.amazonaws.com".header]
            authorization = "Basic $BASIC_AUTH"
          EOT
```
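The `BASIC_AUTH` step above is easy to get subtly wrong, since a stray newline breaks the header value. A local sanity check of just that step, substituting a dummy token for the real `aws ecr get-login-password` output:

```shell
# Local check (no AWS needed): build the Basic auth value with a dummy token
# and round-trip it through base64 to confirm it decodes to "AWS:<token>".
ECR_TOKEN="dummy-token"
BASIC_AUTH="$(printf '%s' "AWS:$ECR_TOKEN" | base64 | tr -d '\n')"
printf '%s' "$BASIC_AUTH" | base64 -d
# → AWS:dummy-token
```

If the decoded value contains anything other than `AWS:` followed by the token, containerd will fail to authenticate against the mirror.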

### Set GitLab CR as default container registry mirror

```yaml
# Set GitLab CR as default container registry mirror in hosts.toml;
# must have override_path set if project/group names don't match upstream container
amazon_web_services:
  node_groups:
    general:
      instance: m5.2xlarge
      launch_template:
        pre_bootstrap_command: |
          #!/bin/bash
          # Verify that IP forwarding is enabled for worker nodes, as is required for containerd
          if [[ $(sysctl net.ipv4.ip_forward | grep "net.ipv4.ip_forward = 1") ]]; then echo "net.ipv4.ip_forward is on"; else sysctl -w net.ipv4.ip_forward=1; fi
          # Set default container registry mirror in hosts.toml; must have override_path set if project/group names don't match upstream container
          CONTAINER_REGISTRY_URL="[PRIVATE_REPO]"
          CONTAINER_REGISTRY_USERNAME="[username]"
          CONTAINER_REGISTRY_TOKEN="[token]"
          CONTAINER_REGISTRY_GROUP=as-nebari
          CONTAINER_REGISTRY_PROJECT=nebari-test
          mkdir -p /etc/containerd/certs.d/_default
          cat <<-EOT > /etc/containerd/certs.d/_default/hosts.toml
          [host."https://$CONTAINER_REGISTRY_URL/v2/$CONTAINER_REGISTRY_GROUP/$CONTAINER_REGISTRY_PROJECT"]
            override_path = true
            capabilities = ["pull", "resolve"]
          EOT
          # Set containerd registry config auth in config.d .toml import dir
          mkdir -p /etc/containerd/config.d
          cat <<EOT | sudo tee /etc/containerd/config.d/config-import.toml
          version = 2
          [plugins."io.containerd.grpc.v1.cri".registry]
            config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"
          [plugins."io.containerd.grpc.v1.cri".registry.auths]
          [plugins."io.containerd.grpc.v1.cri".registry.configs]
          [plugins."io.containerd.grpc.v1.cri".registry.configs."$CONTAINER_REGISTRY_URL".auth]
            username = "$CONTAINER_REGISTRY_USERNAME"
            password = "$CONTAINER_REGISTRY_TOKEN"
          EOT
```

### Set GitLab CR as default container registry mirror, with custom Client SSL/TLS Certs

```yaml
# must have override_path set if project/group names don't match upstream container
# Also add/set GitLab Client SSL/TLS Certificate for Containerd
amazon_web_services:
  node_groups:
    general:
      instance: m5.2xlarge
      launch_template:
        pre_bootstrap_command: |
          #!/bin/bash
          # Verify that IP forwarding is enabled for worker nodes, as is required for containerd
          if [[ $(sysctl net.ipv4.ip_forward | grep "net.ipv4.ip_forward = 1") ]]; then echo "net.ipv4.ip_forward is on"; else sysctl -w net.ipv4.ip_forward=1; fi
          # Set default container registry mirror in hosts.toml; must have override_path set if project/group names don't match upstream container
          CONTAINER_REGISTRY_URL="[PRIVATE_REPO]"
          CONTAINER_REGISTRY_USERNAME="[username]"
          CONTAINER_REGISTRY_TOKEN="[token]"
          CONTAINER_REGISTRY_GROUP=as-nebari
          CONTAINER_REGISTRY_PROJECT=nebari-test
          mkdir -p /etc/containerd/certs.d/_default
          cat <<-EOT > /etc/containerd/certs.d/_default/hosts.toml
          [host."https://$CONTAINER_REGISTRY_URL/v2/$CONTAINER_REGISTRY_GROUP/$CONTAINER_REGISTRY_PROJECT"]
            override_path = true
            capabilities = ["pull", "resolve"]
            client = ["/etc/containerd/certs.d/$CONTAINER_REGISTRY_URL/client.pem"]
          EOT
          # Set containerd registry config auth in config.d .toml import dir
          mkdir -p /etc/containerd/config.d
          cat <<EOT | sudo tee /etc/containerd/config.d/config-import.toml
          version = 2
          [plugins."io.containerd.grpc.v1.cri".registry]
            config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"
          [plugins."io.containerd.grpc.v1.cri".registry.auths]
          [plugins."io.containerd.grpc.v1.cri".registry.configs]
          [plugins."io.containerd.grpc.v1.cri".registry.configs."$CONTAINER_REGISTRY_URL".auth]
            username = "$CONTAINER_REGISTRY_USERNAME"
            password = "$CONTAINER_REGISTRY_TOKEN"
          EOT
          # Add client key/cert to containerd
          mkdir -p /etc/containerd/certs.d/$CONTAINER_REGISTRY_URL
          cat <<-EOT >> /etc/containerd/certs.d/$CONTAINER_REGISTRY_URL/client.pem
          -----BEGIN CERTIFICATE-----
          XzxzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxzxzxxzxzZx
          ZxyzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxzxzxxzxzXz
          -----END CERTIFICATE-----
          -----BEGIN PRIVATE KEY-----
          XzxzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxzxzxxzxzZx
          ZxyzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxzxzxxzxzXz
          -----END PRIVATE KEY-----
          EOT
```
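To check the `client.pem` layout containerd expects here (certificate block followed by private key block) without touching real credentials, you can generate a throwaway self-signed pair. This sketch assumes `openssl` is available and uses placeholder names throughout:

```shell
# Create a throwaway self-signed key and certificate, concatenate them in the
# cert-then-key order used by the client.pem example above, and count the PEM blocks.
tmpdir="$(mktemp -d)"
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=registry-client" \
  -keyout "$tmpdir/key.pem" -out "$tmpdir/cert.pem" 2>/dev/null
cat "$tmpdir/cert.pem" "$tmpdir/key.pem" > "$tmpdir/client.pem"
grep -c -- "-----BEGIN" "$tmpdir/client.pem"
# → 2
```

A file with exactly two `BEGIN` blocks, certificate first, matches what the `client = [...]` entry in `hosts.toml` points at.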
62 changes: 62 additions & 0 deletions docs/docs/references/enhanced-security.md
## Nebari Security Considerations

The security of _AWS Nebari_ deployments can be enhanced through the following deployment configuration options in `nebari-config.yaml`:

- **Explicit definition of container sources**
This option allows for the use of locally mirrored, security-hardened, or otherwise customized container images in place of the containers used by default.
See: [container-sources](container-sources.md)

- **Installation of custom SSL certificate(s) into EKS hosts**
Install private certificates used by (e.g.) in-line content inspection engines which re-encrypt traffic.

```yaml
# Add client certificate to CA trust on node
amazon_web_services:
  node_groups:
    general:
      instance: m5.2xlarge
      launch_template:
        pre_bootstrap_command: |
          #!/bin/bash
          cat <<-EOT >> /etc/pki/ca-trust/source/anchors/client.pem
          -----BEGIN CERTIFICATE-----
          XzxzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxzxzxxzxzZx
          ZxyzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxxzxzxzxzxzxzxzxzxzxxzxzXz
          -----END CERTIFICATE-----
          EOT
          sudo update-ca-trust extract
```

- **Private EKS endpoint configuration**
Mirrors the corresponding AWS console option, which routes all EKS traffic within the VPC.

```yaml
amazon_web_services:
  eks_endpoint_access: private # valid values: [public, private, public_and_private]
```

- **Deploy into existing subnets**
Instructs Nebari to be deployed into existing subnets, rather than creating its own new subnets.
An advantage of deploying to existing subnets is the ability to use private subnets. Note that the **ingress load-balancer-annotation** must be set appropriately based on the type (private or public) of subnet.

```yaml
existing_subnet_ids:
  - subnet-0123456789abcdef
  - subnet-abcdef0123456789
existing_security_group_id: sg-0123456789abcdef
ingress:
  terraform_overrides:
    load-balancer-annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      # Ensure the subnet IDs are also set below
      service.beta.kubernetes.io/aws-load-balancer-subnets: "subnet-0123456789abcdef,subnet-abcdef0123456789"
```

- **Use existing SSL certificate**
Instructs Nebari to use the SSL certificate specified by `[k8s-custom-secret-name]`

```yaml
certificate:
  type: existing
  secret_name: [k8s-custom-secret-name]
```
4 changes: 3 additions & 1 deletion docs/docs/references/index.mdx
/>
</div>

Technical descriptions of how Nebari works.

- [Enhanced Security](enhanced-security.md) - Nebari security configuration guide
- [Local Container Repo](container-sources.md) - Deploying Nebari from a Local Container Repo
<DocCardList items={useCurrentSidebarCategory().items}/>
3 changes: 1 addition & 2 deletions docs/nebari-slurm/configuration.md
_Note_: All slurm related configuration needs to be passed down as a string.

### Services
Additional services can be added to the `jupyterhub_services`
variable. Currently this is only `<service-name>: <service-apikey>`. You must keep the `dask_gateway` section.

```yaml
jupyterhub_services:
```
