Private GKE: Document image mirroring (#1886)
* Private GKE: Document image mirroring

* Add instructions for mirroring docker images to private repositories
  * Fix kubeflow/kubeflow#3210

* Delete instructions under private GKE and just link to the doc issue #1705

  * The instructions are outdated. Since managed certificates are used there
    should be no reason to need to update iap-ingress.yaml anymore.
    * Fix #1811

  * Most of the other instructions under the private GKE section are also
    very obsolete.

* Fix indentation.

* Fix indentation.

* Fix indentation.

* Fix alert.

* More formatting fixes.

* Add comment about Tekton.
jlewi authored Apr 8, 2020
1 parent ae886e4 commit 42cbd8b
Showing 1 changed file with 93 additions and 161 deletions.
254 changes: 93 additions & 161 deletions content/docs/gke/private-clusters.md
@@ -4,12 +4,12 @@ description = "How to secure Kubeflow clusters using VPC service controls and pr
weight = 70
+++

{{% alert title="Alpha version" color="warning" %}}
{{% alert title="Alpha" color="warning" %}}
This feature is currently in **alpha** release status with limited support. The
Kubeflow team is interested in any feedback you may have, in particular with
regards to usability of the feature. Note the following issues already reported:

* [Documentation on how to use Kubeflow with shared VPC](https://github.com/kubeflow/kubeflow/issues/3082)
* [Documentation on how to use Kubeflow with private GKE and VPC service controls](https://github.com/kubeflow/website/issues/1705)
* [Replicating Docker images to private Container Registry](https://github.com/kubeflow/kubeflow/issues/3210)
* [Installing Istio for Kubeflow on private GKE](https://github.com/kubeflow/kubeflow/issues/3650)
* [Profile-controller crashes on GKE private cluster](https://github.com/kubeflow/kubeflow/issues/4661)
@@ -158,207 +158,139 @@ export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT} --format='value(proj
--add-access-levels=kubeflow \
--policy=${POLICYID}
```
1. Set up container registry for GKE private clusters (for more info see [instructions](https://cloud.google.com/vpc-service-controls/docs/set-up-gke)):

1. Create a managed private zone

```
export ZONE_NAME=kubeflow
export NETWORK=<Network you are using for your cluster>
gcloud beta dns managed-zones create ${ZONE_NAME} \
--visibility=private \
--networks=https://www.googleapis.com/compute/v1/projects/${PROJECT}/global/networks/${NETWORK} \
--description="Kubeflow DNS" \
--dns-name=gcr.io \
--project=${PROJECT}
```

1. Start a transaction

```
gcloud dns record-sets transaction start \
--zone=${ZONE_NAME} \
--project=${PROJECT}
```

1. Add a CNAME record for \*.gcr.io

```
gcloud dns record-sets transaction add \
--name=*.gcr.io. \
--type=CNAME gcr.io. \
--zone=${ZONE_NAME} \
--ttl=300 \
--project=${PROJECT}
```

1. Add an A record for the restricted VIP

```
gcloud dns record-sets transaction add \
--name=gcr.io. \
--type=A 199.36.153.4 199.36.153.5 199.36.153.6 199.36.153.7 \
--zone=${ZONE_NAME} \
--ttl=300 \
--project=${PROJECT}
```
## Set up container registry for GKE private clusters:

1. Commit the transaction
Follow the steps below to make your GCR registry accessible from your secured clusters.
For more info see [instructions](https://cloud.google.com/vpc-service-controls/docs/set-up-gke).

```
gcloud dns record-sets transaction execute \
--zone=${ZONE_NAME} \
--project=${PROJECT}
```

## Deploy Kubeflow with Private GKE
1. Create a managed private zone

1. Set user credentials. You only need to run this command once:

```
gcloud auth application-default login
export ZONE_NAME=kubeflow
export NETWORK=<Network you are using for your cluster>
gcloud beta dns managed-zones create ${ZONE_NAME} \
--visibility=private \
--networks=https://www.googleapis.com/compute/v1/projects/${PROJECT}/global/networks/${NETWORK} \
--description="Kubeflow DNS" \
--dns-name=gcr.io \
--project=${PROJECT}
```
1. Copy non-GCR hosted images to your GCR registry:

1. Clone the Kubeflow source

```
git clone https://github.com/kubeflow/kubeflow.git git_kubeflow
```
1. Use [Google Cloud Builder(GCB)](https://cloud.google.com/cloud-build/docs/) to replicate the images

```
cd git_kubeflow/scripts/gke
PROJECT=<PROJECT> make copy-gcb
```
1. Start a transaction

* This is needed because your GKE nodes won't be able to pull images from non GCR
registries because they don't have public internet addresses


* gcloud may return an error even though the job is
submitted successfully and will run successfully;
see [kubeflow/kubeflow#3105](https://github.com/kubeflow/kubeflow/issues/3105)

* You can use the Cloud console to monitor your GCB job.

1. Follow the guide to [deploying Kubeflow on GCP](/docs/gke/deploy/deploy-cli/).
When you reach the
[setup and deploy step](/docs/gke/deploy/deploy-cli/#set-up-and-deploy),
**skip the `kfctl apply` command** and run the **`kfctl build`** command
instead, as described in that step. Now you can edit the configuration files
before deploying Kubeflow. Retain the environment variables that you set
during the setup, including `${KF_NAME}`, `${KF_DIR}`, and `${CONFIG_FILE}`.
```
gcloud dns record-sets transaction start \
--zone=${ZONE_NAME} \
--project=${PROJECT}
```

1. Enable private clusters by editing `${KF_DIR}/gcp_config/cluster-kubeflow.yaml` and updating the following two parameters:
1. Add a CNAME record for \*.gcr.io

```
privatecluster: true
gkeApiVersion: v1beta1
gcloud dns record-sets transaction add \
--name=*.gcr.io. \
--type=CNAME gcr.io. \
--zone=${ZONE_NAME} \
--ttl=300 \
--project=${PROJECT}
```

1. Add an A record for the restricted VIP

```
gcloud dns record-sets transaction add \
--name=gcr.io. \
--type=A 199.36.153.4 199.36.153.5 199.36.153.6 199.36.153.7 \
--zone=${ZONE_NAME} \
--ttl=300 \
--project=${PROJECT}
```
1. Remove components which are not useful in private clusters:

Open `${KF_DIR}/kfctl_gcp_iap.v1.0.0.yaml` and remove kustomizeConfig `cert-manager`, `cert-manager-crds`, and `cert-manager-kube-system-resources`.
1. Create the deployment:
1. Commit the transaction

```
cd ${KF_DIR}
kfctl apply -V -f ${CONFIG_FILE}
```
gcloud dns record-sets transaction execute \
--zone=${ZONE_NAME} \
--project=${PROJECT}
```
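
Once the transaction is committed, lookups for `gcr.io` from inside the network should return only the restricted VIP addresses used in the A record above. The following is a minimal sketch of a check for that; the helper names `is_restricted_vip` and `check_addresses` are hypothetical, and how you obtain the resolved addresses (for example, a DNS lookup from a VM in the same network) is left to you.

```shell
# Hypothetical helpers, not part of gcloud or kfctl.

# Succeeds when the IPv4 address is one of the restricted VIP addresses
# used in the A record above (199.36.153.4 through 199.36.153.7).
is_restricted_vip() {
  case "$1" in
    199.36.153.4|199.36.153.5|199.36.153.6|199.36.153.7) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds only when at least one address is given and every address
# is a restricted VIP. Reports the first unexpected address on stderr.
check_addresses() {
  [ "$#" -gt 0 ] || { echo "no addresses given" >&2; return 1; }
  local ip
  for ip in "$@"; do
    if ! is_restricted_vip "${ip}"; then
      echo "not a restricted VIP: ${ip}" >&2
      return 1
    fi
  done
}
```

For example, `check_addresses 199.36.153.4 199.36.153.5` succeeds, while passing any address outside that range fails and names the offending address.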

* If you get an error **legacy networks not supported**, follow the
[troubleshooting guide]( /docs/gke/troubleshooting-gke/#legacy-networks-are-not-supported) to create a new network.
## Mirror Kubeflow Application Images

* You will need to manually create the network as a work around for [kubeflow/kubeflow#3071](https://github.com/kubeflow/kubeflow/issues/3071)
Because a private GKE cluster can pull images only from gcr.io, every image that Kubeflow applications use from outside gcr.io must be mirrored there. We will use the `kfctl` tool to accomplish this.
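
As a rough illustration of the naming scheme, the sketch below decides whether an image needs mirroring and prints the mirrored reference. The helper names `needs_mirror` and `mirror_ref` are hypothetical. The layout `<registry domain>/<project_id>/<original image>` follows the `proxy_init` example in this guide, and treating regional `*.gcr.io` hosts as already reachable is an assumption based on the `*.gcr.io` CNAME record created earlier.

```shell
# Hypothetical helpers, not part of kfctl.

# Succeeds when the image is hosted outside gcr.io and therefore has to be
# mirrored before a private GKE node can pull it.
# Assumption: the image reference is fully qualified (includes its registry).
needs_mirror() {
  case "$1" in
    gcr.io/*|*.gcr.io/*) return 1 ;;  # already reachable from the cluster
    *) return 0 ;;
  esac
}

# Prints the mirrored reference: <registry domain>/<project_id>/<original image>.
mirror_ref() {
  local registry="$1" project="$2" image="$3"
  printf '%s/%s/%s\n' "${registry}" "${project}" "${image}"
}
```

For example, `mirror_ref gcr.io my-project docker.io/istio/proxy_init:1.1.6` prints `gcr.io/my-project/docker.io/istio/proxy_init:1.1.6`, where `my-project` is a placeholder project ID.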

```
cd ${KF_DIR}/gcp_config
gcloud --project=${PROJECT} deployment-manager deployments create ${KF_NAME}-network --config=network.yaml
```

* Then edit `${KF_DIR}/gcp_config/cluster.jinja` to add a field **network** in your cluster

```
cluster:
name: {{ CLUSTER_NAME }}
network: <name of the new network>
```
* To get the name of the new network run

```
gcloud --project=${PROJECT} compute networks list
```
1. Set your user credentials. You only need to run this command once:

```
gcloud auth application-default login
```

* The name will contain the value ${KF_NAME}
1. Inside your `${KFAPP}` directory create a local configuration file `mirror.yaml` based on this [template](https://github.com/kubeflow/manifests/blob/master/experimental/mirror-images/gcp_template.yaml)

1. Update iap-ingress component parameters:
1. Change the destination to your project's GCR registry.

1. Generate pipeline files to mirror images by running

```
cd ${KF_DIR}/kustomize
gvim iap-ingress.yaml
cd ${KFAPP}
./kfctl alpha mirror build mirror.yaml -V -o pipeline.yaml --gcb
```

* Find and set the `privateGKECluster` parameter to true:

```
privateGKECluster: "true"
```
* If you want to use Tekton rather than Google Cloud Build (GCB), drop `--gcb` to emit a Tekton pipeline
* The instructions below assume you are using GCB

* Then apply your changes:
1. Edit the `cloudbuild.yaml` file

```
kubectl apply -f iap-ingress.yaml
```
1. In the `images` section add

1. Obtain an HTTPS certificate for your ${FQDN} and create a Kubernetes secret with it.
```
- <registry domain>/<project_id>/docker.io/istio/proxy_init:1.1.6
```
* Replace `<registry domain>/<project_id>` with your registry

* You can create a self signed cert using [kube-rsa](https://github.com/kelseyhightower/kube-rsa)
1. Under the `steps` section add

```
go get github.com/kelseyhightower/kube-rsa
kube-rsa ${FQDN}
- args:
- build
- -t
- <registry domain>/<project_id>/docker.io/istio/proxy_init:1.1.6
- --build-arg=INPUT_IMAGE=docker.io/istio/proxy_init:1.1.6
- .
name: gcr.io/cloud-builders/docker
waitFor:
- '-'
```
* The fully qualified domain is the host field specified for your ingress;
you can get it by running

```
cd ${KF_DIR}/kustomize
grep hostname: iap-ingress.yaml
```

* Then create your Kubernetes secret
1. Remove the mirroring of the `cos-nvidia-installer:fixed` image. It does not need to be replicated because it is already available privately through the GKE internal repository.

```
kubectl create secret tls --namespace=kubeflow envoy-ingress-tls --cert=ca.pem --key=ca-key.pem
```
1. Remove the image from the `images` section
1. Remove it from the `steps` section

* An alternative option is to upgrade to GKE 1.12 or later and use
[managed certificates](https://cloud.google.com/kubernetes-engine/docs/how-to/managed-certs#migrating_to_google-managed_certificates_from_self-managed_certificates)
1. Create a Cloud Build job to do the mirroring

* See [kubeflow/kubeflow#3079](https://github.com/kubeflow/kubeflow/issues/3079)
```
gcloud builds submit --async gs://kubeflow-examples/image-replicate/replicate-context.tar.gz --project <project_id> --config cloudbuild.yaml
```

1. Update the various kustomize manifests to use `gcr.io` images instead of Docker Hub images.
1. Update your manifests to use the mirrored images

1. Apply all the Kubernetes resources:
```
kfctl alpha mirror overwrite -i pipeline.yaml
```

```
cd ${KF_DIR}
kfctl apply -V -f ${CONFIG_FILE}
```
1. Wait for Kubeflow to become accessible and then access it at this URL:
1. Edit the file `kustomize/istio-install/base/istio-noauth.yaml`:

```
https://${FQDN}/
```
* ${FQDN} is the host associated with your ingress
1. Replace `docker.io/istio/proxy_init:1.1.6` with `gcr.io/<project_id>/docker.io/istio/proxy_init:1.1.6`
1. Replace `docker.io/istio/proxyv2:1.1.6` with `gcr.io/<project_id>/docker.io/istio/proxyv2:1.1.6`
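
The manual replacements in the Istio manifest could also be scripted. The following is a minimal sketch, assuming that all Istio images in the file are referenced with a `docker.io/istio/` prefix and that the mirrored copies live under `gcr.io/<project_id>/`. The function name `rewrite_images` is hypothetical, and the project ID you pass in is a placeholder for your own.

```shell
# Hypothetical helper, not part of kfctl.
# Rewrites docker.io/istio/ image references in a manifest so that they point
# at the mirrored copies under gcr.io/<project_id>/.
rewrite_images() {
  local project="$1" file="$2"
  # -i.bak edits the file in place and keeps a backup next to it.
  # The leading group matches start of line or any character other than "/",
  # so references that were already rewritten are not prefixed a second time.
  sed -i.bak -E \
    "s#(^|[^/])docker\.io/istio/#\1gcr.io/${project}/docker.io/istio/#g" \
    "${file}"
}
```

For example, `rewrite_images my-project kustomize/istio-install/base/istio-noauth.yaml` turns `docker.io/istio/proxyv2:1.1.6` into `gcr.io/my-project/docker.io/istio/proxyv2:1.1.6` and leaves a `.bak` copy of the original file. Review the result before applying it, since other images in the file are not touched.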

* You can get it by running `kubectl get ingress`
## Deploy Kubeflow with Private GKE

* Follow the [instructions](/docs/gke/deploy/monitor-iap-setup/) to monitor the
deployment

* It can take 10-20 minutes for the endpoint to become fully available
{{% alert title="Coming Soon" color="warning" %}}
You can follow the issue: [Documentation on how to use Kubeflow with private GKE and VPC service controls](https://github.com/kubeflow/website/issues/1705)
{{% /alert %}}

## Next steps
