Fix master regex when running multiple clusters #58561

jesseshieh · 2018-01-20T02:47:35Z

What this PR does / why we need it:
I'm running two Kubernetes clusters on GCE, one for production and one for staging. The instance prefix is `kubernetes` for production and `staging-kubernetes` for staging. This caused a problem when running `kube-up.sh` for production: when it looks for all instances matching `kubernetes(-...)?`, it finds both the production and the staging instances. This probably causes several problems, but the most noticeable one for me was that `INITIAL_ETCD_CLUSTER` was incorrect, so etcd wouldn't start up correctly, so the API server didn't start up correctly, so nothing else started up. I tested this manually and it seems to work for me, but I didn't write an automated test.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

Fixes bug finding master replicas in GCE when running multiple Kubernetes clusters

resouer · 2018-01-20T04:18:16Z

/ok-to-test

resouer · 2018-01-20T04:19:15Z

cluster/gce/util.sh

@@ -1868,7 +1868,7 @@ function get-master-replicas-count() {
 # Prints regexp for full master machine name. In a cluster with replicated master,
 # VM names may either be MASTER_NAME or MASTER_NAME with a suffix for a replica.
 function get-replica-name-regexp() {
-  echo "${MASTER_NAME}(-...)?"
+  echo "^${MASTER_NAME}(-...)?"


why not use kubernetes-staging in your own env? :)

jesseshieh · 2018-01-20

Good point :) Had I known I was gonna run into this issue I would have totally done that! I'm just hoping this will help someone else down the line.

wojtek-t · 2018-01-21

I'm not sure I understand this change. What's the motivation for it?

jesseshieh · 2018-01-21

I wrote a little bit about it in the PR description above, but basically, I'm running a production cluster and a staging cluster. I have GCE instances named `kubernetes-master-001`, `kubernetes-master-002`, etc., and more GCE instances named `staging-kubernetes-master-001`, `staging-kubernetes-master-002`, etc.

When running `kube-up.sh` to create another production master, `INITIAL_ETCD_CLUSTER` contains both the production and the staging instances when it should contain only production instances. As a result, etcd fails to start up with errors like `member count unequal`, so the API server fails to start up, nothing else works, and the newly created master is broken.

This change makes sure that `INITIAL_ETCD_CLUSTER` contains only the production master instances.
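To make the effect of the one-character change concrete, here is a minimal sketch that applies the old and new patterns to a handful of instance names. It assumes the pattern is evaluated as an unanchored regex search, which is how `grep -E` behaves; the instance names and the `match_instances` helper are hypothetical and are not part of `cluster/gce/util.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical instance names for a production and a staging cluster.
MASTER_NAME="kubernetes-master"
instances=(
  "kubernetes-master"
  "kubernetes-master-abc"
  "staging-kubernetes-master"
  "staging-kubernetes-master-xyz"
)

# Pattern before this PR: no anchor, so it can match in the middle of a name.
old_regexp="${MASTER_NAME}(-...)?"
# Pattern after this PR: anchored at the start of the name.
new_regexp="^${MASTER_NAME}(-...)?"

# Hypothetical helper: prints the instance names that match the given regexp.
match_instances() {
  local regexp="$1"
  printf '%s\n' "${instances[@]}" | grep -E "${regexp}" || true
}

old_matches="$(match_instances "${old_regexp}")"
new_matches="$(match_instances "${new_regexp}")"

echo "old pattern matches:"
echo "${old_matches}"   # all four names, including the staging ones
echo "new pattern matches:"
echo "${new_matches}"   # only kubernetes-master and kubernetes-master-abc
```

With the unanchored pattern, `staging-kubernetes-master` matches because `kubernetes-master` appears inside it; the leading `^` rules that out.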

jesseshieh · 2018-01-21T06:56:01Z

/assign @jszczepkowski

jesseshieh · 2018-01-24T01:16:18Z

@wojtek-t any thoughts on this? I think it should be pretty harmless and could help some people down the line in similar situations as mine.

wojtek-t · 2018-01-24T01:35:32Z

OK - that makes sense.

/lgtm
/approve no-issue

k8s-ci-robot · 2018-01-24T01:35:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jesseshieh, wojtek-t

Associated issue requirement bypassed by: wojtek-t

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~cluster/gce/OWNERS~~ [wojtek-t]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

fejta-bot · 2018-01-24T04:56:24Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

fejta-bot · 2018-01-24T07:44:23Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

fejta-bot · 2018-01-24T10:32:23Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

fejta-bot · 2018-01-24T14:02:24Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

k8s-github-robot · 2018-01-24T14:48:30Z

/test all [submit-queue is verifying that this PR is safe to merge]

k8s-ci-robot · 2018-01-24T14:50:45Z

@jesseshieh: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-e2e-gke-gci | `f9e43f3` | link | `/test pull-kubernetes-e2e-gke-gci` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-github-robot · 2018-01-24T15:30:47Z

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

k8s-ci-robot requested review from jszczepkowski and vishh January 20, 2018 02:47

k8s-ci-robot removed the needs-ok-to-test label Jan 20, 2018

resouer reviewed Jan 20, 2018

k8s-ci-robot assigned jszczepkowski Jan 21, 2018

wojtek-t assigned wojtek-t and unassigned jszczepkowski Jan 21, 2018

k8s-ci-robot added the lgtm label Jan 24, 2018

k8s-ci-robot added the approved label Jan 24, 2018

k8s-github-robot merged commit 6e65c23 into kubernetes:master Jan 24, 2018
