Add instructions and a tool for people who want to try out a new version of the ingress controller before it is released.
rramkumar1 committed Mar 14, 2018
1 parent df54af3 commit 5723efc
Showing 8 changed files with 554 additions and 21 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -14,6 +14,8 @@ Please read the [beta limitations](BETA_LIMITATIONS.md) doc before using this
- It relies on a beta Kubernetes resource.
- The loadbalancer controller pod is not aware of your GCE quota.

**If you are running a cluster on GKE and are interested in trying out alpha releases of the GLBC before they are officially released, please visit the deploy/glbc/ directory.**

## Overview

__A reminder on GCE L7__: Google Compute Engine does not have a single resource that represents a L7 loadbalancer. When a user request comes in, it is first handled by the global forwarding rule, which sends the traffic to an HTTP proxy service that sends the traffic to a URL map that parses the URL to see which backend service will handle the request. Each backend service is assigned a set of virtual machine instances grouped into instance groups.
118 changes: 118 additions & 0 deletions deploy/glbc/README.md
@@ -0,0 +1,118 @@
# Overview

Welcome! You are reading this because you want to run an alpha version of the
GCP Ingress Controller (GLBC) before it is officially released. The purpose of this is to
allow users to find bugs and report them, while also getting early access to improvements and
new features. You will notice that the following things are sitting in this directory:

1. script.sh
2. gce.conf
3. yaml/

We will explain what each of these things means in a bit. However, you will only be interacting
with one file (script.sh).

**Disclaimer: Running this script could potentially be disruptive to traffic. It
is not advisable to run this on a production cluster. Furthermore, you should
refrain from contacting GKE support if there are issues. Run at your own risk.**

# Important Prerequisites

There are two prerequisite steps that need to be taken before running the script.

Run the command below:

`gcloud config list`

You will need to ensure a couple of things. First, ensure that the project listed
under [core]/project is the one that contains your cluster. Second, ensure that
the currently logged-in account listed under [core]/account is the owner of the project.
In other words, this account should have full project ownership permissions and be listed as
having the "Owner" role on the IAM page of your GCP project. Why is this needed?
The script invokes a kubectl command which creates a new k8s RBAC role
(see below for an explanation why), and in order to do this, the current user must be
the project owner. The "Owner" role also gives the account permission to do
basically anything, so all commands the script runs should theoretically work. If
not, the script will do its best to fail gracefully and let you know what might
have gone wrong.
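The checks above can be sketched as a small helper. `extract_config_value` is a hypothetical name (not part of script.sh), and the sample text below stands in for what `gcloud config list` actually prints:

```shell
# Hypothetical helper: pull a single value out of `gcloud config list`
# style output (read on stdin). For illustration only.
extract_config_value() {
  # Lines look like: "project = my-gcp-project"
  grep "^$1 = " | sed "s/^$1 = //"
}

# Simulated `gcloud config list` output for the sketch:
SAMPLE_OUTPUT='account = owner@example.com
project = my-gcp-project'

PROJECT=$(printf '%s\n' "$SAMPLE_OUTPUT" | extract_config_value project)
ACCOUNT=$(printf '%s\n' "$SAMPLE_OUTPUT" | extract_config_value account)
echo "project=${PROJECT} account=${ACCOUNT}"
```

In real use you would pipe `gcloud config list` itself into the helper and confirm that the printed project contains your cluster and the printed account is the project owner.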

The second step is to make sure you populate the gce.conf file. The instructions
for populating the file are in the file itself. You just have to fill it in.
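As a sketch, a populated gce.conf might look like the heredoc below (all values are made-up placeholders), and the script later cross-checks the node-instance-prefix line against the cluster name you pass in:

```shell
# Write a sample populated gce.conf (placeholder values, not real ones):
cat > /tmp/gce.conf.example <<'EOF'
[global]
token-url = nil
project-id = my-gcp-project
network = default
subnetwork = default
node-instance-prefix = gke-my-cluster
node-tags = gke-my-cluster-node
EOF

# The script verifies that node-instance-prefix matches "gke-<cluster name>":
CLUSTER_NAME="my-cluster"
NODE_INSTANCE_PREFIX=$(grep node-instance-prefix /tmp/gce.conf.example | awk '{print $3}')
[[ "$NODE_INSTANCE_PREFIX" == "gke-${CLUSTER_NAME}" ]] && echo "gce.conf matches cluster name"
```

If the prefix and cluster name disagree, the script exits before touching anything.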

# Important Details

Most likely, you want to know what this script is doing to your cluster in order
to run the new controller, and why it is doing it. If you do not care, then you
can go ahead and skip this section.

Here is a brief summary of each major thing we do and why:

1. Turn off GLBC and turn on new GLBC in the cluster
* To be brief, the maintenance cost of running a new controller on the master
is actually pretty high. This is why we chose to move the controller
to the cluster.
2. Create a new k8s RBAC role
* On the master, the GLBC has unauthenticated access to the k8s API server.
Once we move the GLBC to the cluster, that path is gone. Therefore, we need to
configure a new RBAC role that allows GLBC the same access.
3. Create new GCP service account + key
* On the master, the GLBC is authorized to use the GCP Compute API through a
token pulled from a private GKE endpoint. Moving to the cluster will result in
us not being able to utilize this. Therefore, we need to create a new GCP
service account and a corresponding key which will grant access.
4. Start new GLBC (and default backend) in the cluster
* As stated before, we need to run the GLBC in the cluster. We also need to
start up a new default backend because the mechanism we use to turn off the
master GLBC removes both the GLBC and the default backend.
* Because we have to recreate the default backend, there will be a small
segment of time when requests to the default backend will time out.
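The NodePort preservation in step 4 works roughly like this sketch. The YAML here is a trimmed stand-in for what `kubectl get svc default-http-backend -n kube-system -o yaml` would return on a real cluster:

```shell
# Simulated `kubectl get svc ... -o yaml` output (trimmed for the sketch):
SVC_YAML='apiVersion: v1
kind: Service
spec:
  ports:
  - name: http
    nodePort: 30921
    port: 80'

# Same extraction the script performs before the old service is deleted:
NODE_PORT=$(printf '%s\n' "$SVC_YAML" | grep "nodePort:" | cut -f2- -d:)
echo "preserved NodePort:${NODE_PORT}"
```

The script captures this value before disabling the addon, then reuses it when recreating the service, so existing GCP backend services keep pointing at a valid port.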

The script is commented heavily, so it should be pretty easy to follow along
with what we described above.

## Dependencies

As promised, here is an explanation of each script dependency.

1. gce.conf
* This file normally sits on the GKE master and provides important config for
the GCP Compute API client within the GLBC. The GLBC is configured to know
where to look for this file. In this case, we simply mount the file as a
volume and tell GLBC to look for it there.
2. yaml/default-http-backend.yaml
* This file contains the specifications for both the default-http-backend
deployment and service. This is no different than what you are used to
seeing in your cluster. In this case, we need to recreate the default
backend since turning off the GLBC on the master removes it.
3. yaml/rbac.yaml
* This file contains specification for an RBAC role which gives the GLBC
access to the resources it needs from the k8s API server.
4. yaml/glbc.yaml
* This file contains the specification for the GLBC deployment. Notice that in
this case, we need a deployment because we want to preserve the controller
in case of node restarts.

Take a look at the script to understand where each file is used.
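For example, the way the script re-injects the preserved NodePort into yaml/default-http-backend.yaml can be sketched like this, using a minimal stand-in for the service's ports section (the sed invocation is the same one the script uses; it assumes GNU sed):

```shell
# Minimal stand-in for the ports section of yaml/default-http-backend.yaml:
cat > /tmp/default-http-backend.example.yaml <<'EOF'
  ports:
  - name: http
    port: 80
    targetPort: 8080
EOF

# Append a nodePort line after "name: http", as the script's sed command
# does, so the recreated service keeps the old port:
NODE_PORT=30921
sed -i "/name: http/a \ \ \ \ nodePort: ${NODE_PORT}" /tmp/default-http-backend.example.yaml
grep "nodePort: 30921" /tmp/default-http-backend.example.yaml
```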

# Running the Script

Run the command below to see the usage:

`./script.sh --help`

After that, it should be self-explanatory!
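For reference, a typical invocation would be `./script.sh -n my-cluster -z us-central1-b` (cluster name and zone are made up here). The flag parsing it goes through mirrors the case statement in script.sh, replayed below against those sample arguments; the cleanup branch is simplified to a flag for this sketch:

```shell
# Replay the script's flag parsing against sample arguments:
set -- -n my-cluster -z us-central1-b
while [[ $# -gt 0 ]]; do
  case "$1" in
    -c|--cleanup)      CLEANUP=true; shift;;        # real script calls cleanup here
    -n|--cluster-name) CLUSTER_NAME=$2; shift; shift;;
    -z|--zone)         ZONE=$2; shift; shift;;
    *)                 shift;;
  esac
done
echo "cluster=${CLUSTER_NAME} zone=${ZONE}"
```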

# Common Issues

One common issue is that the script outputs an error indicating that something
went wrong with permissions. The quick fix for this is to make sure that, for
the entire duration of the script's execution, the logged-in account you see in
gcloud is the project owner. We say "for the entire duration" because the role
for the account could potentially change mid-execution (e.g. a fat finger in the
GCP UI).

If you have issues with the controller after the script execution and you do not
know what is causing them, invoke the script in its cleanup mode. This is a quick
and simple way of going back to how everything was before.


7 changes: 7 additions & 0 deletions deploy/glbc/gce.conf
@@ -0,0 +1,7 @@
[global]
token-url = nil
project-id = YOUR CLUSTER'S PROJECT
network = YOUR CLUSTER'S NETWORK
subnetwork = YOUR CLUSTER'S SUBNETWORK
node-instance-prefix = gke-YOUR CLUSTER'S NAME
node-tags = NETWORK TAGS FOR YOUR CLUSTER'S INSTANCE GROUP
238 changes: 238 additions & 0 deletions deploy/glbc/script.sh
@@ -0,0 +1,238 @@
#!/bin/bash

# Copyright 2018 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

function usage() {
echo -e "Usage: ./script.sh -n myCluster -z myZone [-c]\n"
echo " -c, --cleanup Cleanup resources created by a previous run of the script"
echo " -n, --cluster-name Name of the cluster (Required)"
echo " -z, --zone Zone the cluster is in (Required)"
echo -e " --help Display this help and exit"
exit
}

function arg_check {
# Check that the necessary arguments were provided and that they are correct.
if [[ -z "$ZONE" || -z "$CLUSTER_NAME" ]];
then
usage
fi
# Get gcloud credentials for the cluster so kubectl works automatically.
# Any error/typo in the required command line args will be caught here.
gcloud container clusters get-credentials ${CLUSTER_NAME} --zone=${ZONE}
[[ $? -eq 0 ]] || error_exit "Error-bot: Command line arguments were incorrect. See above error for more info."
}

function error_exit {
echo -e "${RED}$1${NC}" >&2
exit 1
}

function cleanup() {
arg_check
# Get the project id associated with the cluster.
PROJECT_ID=`gcloud config list --format 'value(core.project)' 2>/dev/null`
# Cleanup k8s and GCP resources in same order they are created.
# Note: The GCP service account key needs to be manually cleaned up.
# Note: We don't delete the default-http-backend we created so that when the
# GLBC is restored on the GKE master, the addon manager does not try to create a
# new one.
kubectl delete clusterrolebinding one-binding-to-rule-them-all
kubectl delete -f yaml/rbac.yaml
kubectl delete configmap gce-config -n kube-system
gcloud iam service-accounts delete glbc-service-account@${PROJECT_ID}.iam.gserviceaccount.com
gcloud projects remove-iam-policy-binding ${PROJECT_ID} \
--member serviceAccount:glbc-service-account@${PROJECT_ID}.iam.gserviceaccount.com \
--role roles/compute.admin
kubectl delete secret glbc-gcp-key -n kube-system
kubectl delete -f yaml/glbc.yaml
# Ask if user wants to reenable GLBC on the GKE master.
while true; do
echo -e "${GREEN}Script-bot: Do you want to reenable GLBC on the GKE master?${NC}"
echo -e "${GREEN}Script-bot: Press [C | c] to continue.${NC}"
read input
case $input in
[Cc]* ) break;;
* ) echo -e "${GREEN}Script-bot: Press [C | c] to continue.${NC}"
esac
done
gcloud container clusters update ${CLUSTER_NAME} --zone=${ZONE} --update-addons=HttpLoadBalancing=ENABLED
echo -e "${GREEN}Script-bot: Cleanup successful! You need to cleanup your GCP service account key manually.${NC}"
exit 0
}

RED='\033[0;31m'
GREEN='\033[0;32m'
NC='\033[0m'
CLEANUP_HELP="Invoking me with the -c option will get you back to a clean slate."
NO_CLEANUP="Nothing has to be cleaned up :)"
PERMISSION_ISSUE="If this looks like a permissions problem, see the README."

# Parsing command line arguments
while [[ $# -gt 0 ]]
do
key="$1"

case $key in
--help)
usage
shift
shift
;;
-c|--cleanup)
cleanup
shift
shift
;;
-n|--cluster-name)
CLUSTER_NAME=$2
shift
shift
;;
-z|--zone)
ZONE=$2
shift
shift
;;
*)
shift
;;
esac
done

arg_check

# Check that the gce.conf is valid for the cluster
NODE_INSTANCE_PREFIX=`cat gce.conf | grep node-instance-prefix | awk '{print $3}'`
[[ "$NODE_INSTANCE_PREFIX" == "gke-${CLUSTER_NAME}" ]] || error_exit "Error bot: --cluster-name does not match gce.conf. ${NO_CLEANUP}"

# Get the project id associated with the cluster.
PROJECT_ID=`gcloud config list --format 'value(core.project)' 2>/dev/null`
# Store the nodePort for default-http-backend
NODE_PORT=`kubectl get svc default-http-backend -n kube-system -o yaml | grep "nodePort:" | cut -f2- -d:`
# Get the GCP user associated with the current gcloud config.
GCP_USER=`gcloud config list --format 'value(core.account)' 2>/dev/null`

# Grant permission to current GCP user to create new k8s ClusterRole's.
kubectl create clusterrolebinding one-binding-to-rule-them-all --clusterrole=cluster-admin --user=${GCP_USER}
[[ $? -eq 0 ]] || error_exit "Error-bot: Issue creating a k8s ClusterRoleBinding. ${PERMISSION_ISSUE} ${NO_CLEANUP}"

# Create a new service account for glbc and give it a
# ClusterRole allowing it access to API objects it needs.
kubectl create -f yaml/rbac.yaml
[[ $? -eq 0 ]] || error_exit "Error-bot: Issue creating the RBAC spec. ${CLEANUP_HELP}"

# Inject gce.conf onto the user node as a ConfigMap.
# This config map is mounted as a volume in glbc.yaml
kubectl create configmap gce-config --from-file=gce.conf -n kube-system
[[ $? -eq 0 ]] || error_exit "Error-bot: Issue creating gce.conf ConfigMap. ${CLEANUP_HELP}"

# Create a new GCP service account.
gcloud iam service-accounts create glbc-service-account \
--display-name "Service Account for GLBC"
[[ $? -eq 0 ]] || error_exit "Error-bot: Issue creating a GCP service account. ${PERMISSION_ISSUE} ${CLEANUP_HELP}"

# Give the GCP service account the appropriate roles.
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member serviceAccount:glbc-service-account@${PROJECT_ID}.iam.gserviceaccount.com \
--role roles/compute.admin
[[ $? -eq 0 ]] || error_exit "Error-bot: Issue creating IAM role binding for service account. ${PERMISSION_ISSUE} ${CLEANUP_HELP}"

# Create key for the GCP service account.
gcloud iam service-accounts keys create \
key.json \
--iam-account glbc-service-account@${PROJECT_ID}.iam.gserviceaccount.com
[[ $? -eq 0 ]] || error_exit "Error-bot: Issue creating GCP service account key. ${PERMISSION_ISSUE} ${CLEANUP_HELP}"

# Store the key as a secret in k8s. This secret is mounted
# as a volume in glbc.yaml
kubectl create secret generic glbc-gcp-key --from-file=key.json -n kube-system
if [[ $? -eq 1 ]];
then
error_exit "Error-bot: Issue creating a k8s secret from GCP service account key. ${PERMISSION_ISSUE} ${CLEANUP_HELP}"
fi
rm key.json

# Turn off the glbc running on the GKE master. This will not only delete the
# glbc pod, but it will also delete the default-http-backend
# deployment + service.
gcloud container clusters update ${CLUSTER_NAME} --zone=${ZONE} --update-addons=HttpLoadBalancing=DISABLED
[[ $? -eq 0 ]] || error_exit "Error-bot: Issue turning off GLBC. ${PERMISSION_ISSUE} ${CLEANUP_HELP}"

# Approximate amount of time it takes the API server to start accepting all
# requests.
sleep 90
# In case the previous sleep was not enough, prompt user so that they can choose
# when to proceed.
while true; do
echo -e "${GREEN}Script-bot: Before proceeding, please ensure your API server is accepting all requests.
Failure to do so may result in the script creating a broken state.${NC}"
echo -e "${GREEN}Script-bot: Press [C | c] to continue.${NC}"
read input
case $input in
[Cc]* ) break;;
* ) echo -e "${GREEN}Script-bot: Press [C | c] to continue.${NC}"
esac
done

# Recreate the default-http-backend k8s service with the same NodePort as the
# service which was removed when turning off the glbc previously. This is to
# ensure that a brand new NodePort is not created.

# Wait till old service is removed
while true; do
kubectl get svc -n kube-system | grep default-http-backend &>/dev/null
if [[ $? -eq 1 ]];
then
break
fi
sleep 5
done
# Wait till old glbc pod is removed
while true; do
kubectl get pod -n kube-system | grep default-backend &>/dev/null
if [[ $? -eq 1 ]];
then
break
fi
sleep 5
done

# Recreate the deployment and service for the default backend.
sed -i "/name: http/a \ \ \ \ nodePort: ${NODE_PORT}" yaml/default-http-backend.yaml
kubectl create -f yaml/default-http-backend.yaml
if [[ $? -eq 1 ]];
then
# Prompt the user to finish the last steps by themselves. We don't want to
# have to cleanup and start all over again if we are this close to finishing.
error_exit "Error-bot: Issue starting default backend. ${PERMISSION_ISSUE}. We are so close to being done so just manually start the default backend with NodePort: ${NODE_PORT} and create glbc.yaml when ready"
fi

# Start up the glbc.
kubectl create -f yaml/glbc.yaml
if [[ $? -eq 1 ]];
then
# Same idea as above, although this time we only need to prompt the user to start the glbc.
error_exit "Error-bot: Issue starting GLBC. ${PERMISSION_ISSUE} We are so close to being done so just manually create glbc.yaml when ready."
fi

# Do a final verification that the NodePort stayed the same for the
# default-http-backend.
NEW_NODE_PORT=`kubectl get svc default-http-backend -n kube-system -o yaml | grep "nodePort:" | cut -f2- -d:`
[[ "$NEW_NODE_PORT" == "$NODE_PORT" ]] || error_exit "Error-bot: The NodePort for the new default-http-backend service is different than the original. Please recreate this service with NodePort: ${NODE_PORT} or traffic to this service will time out."

echo -e "${GREEN}Script-bot: I'm done!${NC}"