From 42cbd8b0b057079e5f15770fa8e24e905c383918 Mon Sep 17 00:00:00 2001
From: Jeremy Lewi
Date: Wed, 8 Apr 2020 15:55:44 -0700
Subject: [PATCH] Private GKE: Document image mirroring (#1886)

* Private GKE: Document image mirroring

* Add instructions for mirroring docker images to private repositories

* Fix kubeflow/kubeflow#3210

* Delete instructions under private GKE and just link to the doc issue #1705

* The instructions are outdated. Since managed certificates are used, there
  should be no reason to update iap-ingress.yaml anymore.

* Fix #1811

* Most of the other instructions under the private GKE section are also
  obsolete.

* Fix indentation.

* Fix indentation.

* Fix indentation.

* Fix alert.

* More formatting fixes.

* Add comment about Tekton.
---
 content/docs/gke/private-clusters.md | 254 ++++++++++-----------------
 1 file changed, 93 insertions(+), 161 deletions(-)

diff --git a/content/docs/gke/private-clusters.md b/content/docs/gke/private-clusters.md
index ee3a37027f..0db6d40413 100644
--- a/content/docs/gke/private-clusters.md
+++ b/content/docs/gke/private-clusters.md
@@ -4,12 +4,12 @@ description = "How to secure Kubeflow clusters using VPC service controls and pr
 weight = 70
 +++
 
-{{% alert title="Alpha version" color="warning" %}}
+{{% alert title="Alpha" color="warning" %}}
 This feature is currently in **alpha** release status with limited support. The
 Kubeflow team is interested in any feedback you may have, in particular with
 regards to usability of the feature. Note the following issues already reported:
 
-* [Documentation on how to use Kubeflow with shared VPC](https://github.com/kubeflow/kubeflow/issues/3082)
+* [Documentation on how to use Kubeflow with private GKE and VPC service controls](https://github.com/kubeflow/website/issues/1705)
 * [Replicating Docker images to private Container Registry](https://github.com/kubeflow/kubeflow/issues/3210)
 * [Installing Istio for Kubeflow on private GKE](https://github.com/kubeflow/kubeflow/issues/3650)
 * [Profile-controller crashes on GKE private cluster](https://github.com/kubeflow/kubeflow/issues/4661)
@@ -158,207 +158,139 @@ export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT} --format='value(proj
 --add-access-levels=kubeflow \
 --policy=${POLICYID}
 ```
-1. Set up container registry for GKE private clusters (for more info see [instructions](https://cloud.google.com/vpc-service-controls/docs/set-up-gke)):
-
-    1. Create a managed private zone
-
-       ```
-       export ZONE_NAME=kubeflow
-       export NETWORK=<network>
-       gcloud beta dns managed-zones create ${ZONE_NAME} \
-       --visibility=private \
-       --networks=https://www.googleapis.com/compute/v1/projects/${PROJECT}/global/networks/${NETWORK} \
-       --description="Kubeflow DNS" \
-       --dns-name=gcr.io \
-       --project=${PROJECT}
-       ```
-
-    1. Start a transaction
-
-       ```
-       gcloud dns record-sets transaction start \
-       --zone=${ZONE_NAME} \
-       --project=${PROJECT}
-       ```
-
-    1. Add a CNAME record for \*.gcr.io
-
-       ```
-       gcloud dns record-sets transaction add \
-       --name=*.gcr.io. \
-       --type=CNAME gcr.io. \
-       --zone=${ZONE_NAME} \
-       --ttl=300 \
-       --project=${PROJECT}
-       ```
-
-    1. Add an A record for the restricted VIP
-       ```
-       gcloud dns record-sets transaction add \
-       --name=gcr.io. \
-       --type=A 199.36.153.4 199.36.153.5 199.36.153.6 199.36.153.7 \
-       --zone=${ZONE_NAME} \
-       --ttl=300 \
-       --project=${PROJECT}
-       ```
+## Set up container registry for GKE private clusters
 
-    1. Commit the transaction
+Follow the steps below to configure your GCR registry to be accessible from your secured clusters.
+For more info see the [instructions](https://cloud.google.com/vpc-service-controls/docs/set-up-gke).
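+
+The first step below needs the name of the VPC network that your private
+cluster uses. If you are unsure of the name, you can list the networks in
+your project first (a standard gcloud command, shown here for convenience):
+
+```
+# Lists the VPC networks in the project. Use the value in the NAME
+# column for <network> in the step that creates the managed zone.
+gcloud compute networks list --project=${PROJECT}
+```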
 
-      ```
-      gcloud dns record-sets transaction execute \
-      --zone=${ZONE_NAME} \
-      --project=${PROJECT}
-      ```
-
-## Deploy Kubeflow with Private GKE
+1. Create a managed private zone
 
-1. Set user credentials. You only need to run this command once:
-   ```
-   gcloud auth application-default login
+   ```
+   export ZONE_NAME=kubeflow
+   export NETWORK=<network>
+   gcloud beta dns managed-zones create ${ZONE_NAME} \
+   --visibility=private \
+   --networks=https://www.googleapis.com/compute/v1/projects/${PROJECT}/global/networks/${NETWORK} \
+   --description="Kubeflow DNS" \
+   --dns-name=gcr.io \
+   --project=${PROJECT}
    ```
 
-1. Copy non-GCR hosted images to your GCR registry:
-
-    1. Clone the Kubeflow source
-
-       ```
-       git clone https://github.com/kubeflow/kubeflow.git git_kubeflow
-       ```
-    1. Use [Google Cloud Builder(GCB)](https://cloud.google.com/cloud-build/docs/) to replicate the images
+1. Start a transaction
 
-       ```
-       cd git_kubeflow/scripts/gke
-       PROJECT=<project> make copy-gcb
-       ```
+   ```
+   gcloud dns record-sets transaction start \
+   --zone=${ZONE_NAME} \
+   --project=${PROJECT}
+   ```
 
-    * This is needed because your GKE nodes won't be able to pull images from non GCR
-      registries because they don't have public internet addresses
-
-    * gcloud may return an error even though the job is
-      submited successfully and will run successfully
-      see [kubeflow/kubeflow#3105](https://github.com/kubeflow/kubeflow/issues/3105)
-
-    * You can use the Cloud console to monitor your GCB job.
-
-1. Follow the guide to [deploying Kubeflow on GCP](/docs/gke/deploy/deploy-cli/).
-   When you reach the
-   [setup and deploy step](/docs/gke/deploy/deploy-cli/#set-up-and-deploy),
-   **skip the `kfctl apply` command** and run the **`kfctl build`** command
-   instead, as described in that step. Now you can edit the configuration files
-   before deploying Kubeflow. Retain the environment variables that you set
-   during the setup, including `${KF_NAME}`, `${KF_DIR}`, and `${CONFIG_FILE}`.
+1. Add a CNAME record for \*.gcr.io
 
-1. Enable private clusters by editing `${KF_DIR}/gcp_config/cluster-kubeflow.yaml` and updating the following two parameters:
 
    ```
-   privatecluster: true
-   gkeApiVersion: v1beta1
+   gcloud dns record-sets transaction add \
+   --name=*.gcr.io. \
+   --type=CNAME gcr.io. \
+   --zone=${ZONE_NAME} \
+   --ttl=300 \
+   --project=${PROJECT}
+   ```
+
+1. Add an A record for the restricted VIP
+
+   ```
+   gcloud dns record-sets transaction add \
+   --name=gcr.io. \
+   --type=A 199.36.153.4 199.36.153.5 199.36.153.6 199.36.153.7 \
+   --zone=${ZONE_NAME} \
+   --ttl=300 \
+   --project=${PROJECT}
    ```
 
-1. Remove components which are not useful in private clusters:
-   Open `${KF_DIR}/kfctl_gcp_iap.v1.0.0.yaml` and remove kustomizeConfig `cert-manager`, `cert-manager-crds`, and `cert-manager-kube-system-resources`.
-1. Create the deployment:
+1. Commit the transaction
 
   ```
-   cd ${KF_DIR}
-   kfctl apply -V -f ${CONFIG_FILE}
-   ```
+   gcloud dns record-sets transaction execute \
+   --zone=${ZONE_NAME} \
+   --project=${PROJECT}
+   ```
 
-   * If you get an error **legacy networks not supported**, follow the
-     [troubleshooting guide](/docs/gke/troubleshooting-gke/#legacy-networks-are-not-supported) to create a new network.
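+
+As an optional sanity check (not part of the original steps), you can list the
+record sets in the zone and confirm that the `*.gcr.io` CNAME and the
+restricted-VIP A record are in place:
+
+```
+# Should show an A record for gcr.io pointing at 199.36.153.4-7
+# and a CNAME record mapping *.gcr.io to gcr.io.
+gcloud dns record-sets list --zone=${ZONE_NAME} --project=${PROJECT}
+```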
+## Mirror Kubeflow Application Images
 
-   * You will need to manually create the network as a work around for [kubeflow/kubeflow#3071](https://github.com/kubeflow/kubeflow/issues/3071)
+Nodes in a private GKE cluster can only pull images from gcr.io, so we need to
+mirror all Kubeflow application images hosted outside gcr.io into your own
+registry. We will use the `kfctl` tool to accomplish this.
 
-     ```
-     cd ${KF_DIR}/gcp_config
-     gcloud --project=${PROJECT} deployment-manager deployments create ${KF_NAME}-network --config=network.yaml
-     ```
-   * Then edit `${KF_DIR}/gcp_config/cluster.jinja` to add a field **network** in your cluster
-
-     ```
-     cluster:
-       name: {{ CLUSTER_NAME }}
-       network: <network>
-     ```
-
-   * To get the name of the new network run
-
-     ```
-     gcloud --project=${PROJECT} compute networks list
-     ```
+1. Set your user credentials. You only need to run this command once:
 
-   * The name will contain the value ${KF_NAME}
+   ```
+   gcloud auth application-default login
+   ```
 
-1. Update iap-ingress component parameters:
+1. Inside your `${KFAPP}` directory, create a local configuration file
+   `mirror.yaml` based on this [template](https://github.com/kubeflow/manifests/blob/master/experimental/mirror-images/gcp_template.yaml)
 
-   ```
-   cd ${KF_DIR}/kustomize
-   gvim iap-ingress.yaml
-   ```
+   1. Change the destination to your project's GCR registry.
 
-   * Find and set the `privateGKECluster` parameter to true:
-
-     ```
-     privateGKECluster: "true"
-     ```
+1. Generate pipeline files to mirror the images by running
 
-   * Then apply your changes:
+   ```
+   cd ${KFAPP}
+   ./kfctl alpha mirror build mirror.yaml -V -o pipeline.yaml --gcb
+   ```
 
-     ```
-     kubectl apply -f iap-ingress.yaml
-     ```
+   * If you want to use Tekton rather than Google Cloud Build (GCB), drop `--gcb` to emit a Tekton pipeline
+   * The instructions below assume you are using GCB
 
-1. Obtain an HTTPS certificate for your ${FQDN} and create a Kubernetes secret with it.
+1. Edit the `cloudbuild.yaml` file
 
-   * You can create a self signed cert using [kube-rsa](https://github.com/kelseyhightower/kube-rsa)
+   1. In the `images` section, add
 
-     ```
-     go get github.com/kelseyhightower/kube-rsa
-     kube-rsa ${FQDN}
-     ```
+      ```
+      - <registry>/<project>/docker.io/istio/proxy_init:1.1.6
+      ```
+
+      * Replace `<registry>/<project>` with your registry path
 
-   * The fully qualified domain is the host field specified for your ingress;
-     you can get it by running
-
-     ```
-     cd ${KF_DIR}/kustomize
-     grep hostname: iap-ingress.yaml
-     ```
-   * Then create your Kubernetes secret
+   1. Under the `steps` section, add
 
-     ```
-     kubectl create secret tls --namespace=kubeflow envoy-ingress-tls --cert=ca.pem --key=ca-key.pem
-     ```
+      ```
+      - args:
+        - build
+        - -t
+        - <registry>/<project>/docker.io/istio/proxy_init:1.1.6
+        - --build-arg=INPUT_IMAGE=docker.io/istio/proxy_init:1.1.6
+        - .
+        name: gcr.io/cloud-builders/docker
+        waitFor:
+        - '-'
+      ```
 
-   * An alternative option is to upgrade to GKE 1.12 or later and use
-     [managed certificates](https://cloud.google.com/kubernetes-engine/docs/how-to/managed-certs#migrating_to_google-managed_certificates_from_self-managed_certificates)
+   1. Remove the mirroring of the cos-nvidia-installer:fixed image. It does not
+      need to be replicated because this image is privately available through
+      the GKE internal repo.
 
-   * See [kubeflow/kubeflow#3079](https://github.com/kubeflow/kubeflow/issues/3079)
+      1. Remove the image from the `images` section
+      1. Remove it from the `steps` section
 
+1. Create a Cloud Build job to do the mirroring
 
+   ```
+   gcloud builds submit --async gs://kubeflow-examples/image-replicate/replicate-context.tar.gz --project <project> --config cloudbuild.yaml
+   ```
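+
+   Because the job is submitted with `--async`, the command returns
+   immediately while the mirroring continues in the background. As an
+   optional check (standard Cloud Build commands, not part of the original
+   instructions), you can monitor its progress from the CLI:
+
+   ```
+   # List recent builds in the project; the STATUS column shows progress.
+   gcloud builds list --project=<project> --limit=5
+
+   # View the logs of a specific build using its ID from the list above.
+   gcloud builds log <build-id> --project=<project>
+   ```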
-1. Update the various kustomize manifests to use `gcr.io` images instead of Docker Hub images.
+1. Update your manifests to use the mirrored images
 
-1. Apply all the Kubernetes resources:
+   ```
+   kfctl alpha mirror overwrite -i pipeline.yaml
+   ```
 
-   ```
-   cd ${KF_DIR}
-   kfctl apply -V -f ${CONFIG_FILE}
-   ```
-1. Wait for Kubeflow to become accessible and then access it at this URL:
+1. Edit the file `kustomize/istio-install/base/istio-noauth.yaml`:
 
-   ```
-   https://${FQDN}/
-   ```
-   * ${FQDN} is the host associated with your ingress
+   1. Replace `docker.io/istio/proxy_init:1.1.6` with `gcr.io/<project>/docker.io/istio/proxy_init:1.1.6`
+   1. Replace `docker.io/istio/proxyv2:1.1.6` with `gcr.io/<project>/docker.io/istio/proxyv2:1.1.6`
 
-   * You can get it by running `kubectl get ingress`
+## Deploy Kubeflow with Private GKE
 
-   * Follow the [instructions](/docs/gke/deploy/monitor-iap-setup/) to monitor the
-     deployment
-
-   * It can take 10-20 minutes for the endpoint to become fully available
+{{% alert title="Coming Soon" color="warning" %}}
+You can follow this issue: [Documentation on how to use Kubeflow with private GKE and VPC service controls](https://github.com/kubeflow/website/issues/1705)
+{{% /alert %}}
 
 ## Next steps