Add wait-timeout flag to start command and refactor util/kubernetes #5121
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: medyagh. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Thank you for taking on the much-needed refactor.
@@ -241,6 +241,11 @@ export MINIKUBE_HOME="${TEST_HOME}/.minikube"
export MINIKUBE_WANTREPORTERRORPROMPT=False
export KUBECONFIG="${TEST_HOME}/kubeconfig"

# Build the gvisor image. This will be copied into minikube and loaded by ctr.
# Used by TestContainerd for Gvisor Test.
docker build -t gcr.io/k8s-minikube/gvisor-addon:latest -f testdata/gvisor-addon-Dockerfile out
Unrelated?
This was added by Priya to fix the gvisor test (it needs to run before the integration tests), but it was placed after the minikube cleanup, so it was not being used by the test.
This change only moved the command up in the script.
cmd/minikube/cmd/start.go
Outdated
@@ -148,6 +150,7 @@ func initMinikubeFlags() {
startCmd.Flags().String(networkPlugin, "", "The name of the network plugin.")
startCmd.Flags().Bool(enableDefaultCNI, false, "Enable the default CNI plugin (/etc/cni/net.d/k8s.conf). Used in conjunction with \"--network-plugin=cni\".")
startCmd.Flags().Bool(waitUntilHealthy, true, "Wait until Kubernetes core services are healthy before exiting.")
startCmd.Flags().Duration(waitTimeout, 3*time.Minute, "max time to wait for Kubernetes core services to be healthy.")
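For readers unfamiliar with `Duration` flags, here is a minimal, self-contained sketch of the pattern the new flag enables: bounding a per-component wait with a parsed duration. `componentHealthy` and the `main` wiring are hypothetical stand-ins, not minikube's actual code:

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// componentHealthy is a hypothetical stand-in for a real health check
// (minikube polls components such as the apiserver and kube-proxy).
func componentHealthy() bool {
	return time.Now().Second()%5 == 0
}

func main() {
	// A Duration flag parses values like "3m" or "90s" from the command line.
	waitTimeout := flag.Duration("wait-timeout", 3*time.Minute, "max time to wait per component")
	flag.Parse()

	deadline := time.Now().Add(*waitTimeout)
	for !componentHealthy() {
		if time.Now().After(deadline) {
			fmt.Println("timed out waiting for component")
			return
		}
		time.Sleep(500 * time.Millisecond)
	}
	fmt.Println("component healthy")
}
```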
This default seems quite short for certain environments: previously, it was 5 minutes per pod. 5 minutes overall, perhaps?
I should have said per Kubernetes component (this is not the total wait), and it used to be 5 minutes per component.
@@ -32,7 +32,7 @@ func TestMain(m *testing.M) {
os.Exit(m.Run())
}

-var startTimeout = flag.Int("timeout", 25, "number of minutes to wait for minikube start")
+var startTimeout = flag.Duration("timeout", 25*time.Minute, "max duration to wait for a full minikube start")
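As an aside, a `Duration` test flag like this can also put a hard bound on the whole start. A hypothetical sketch of the consuming side — the test body, command invocation, and forwarding of the flag are illustrative, not the PR's code:

```go
package integration

import (
	"context"
	"flag"
	"fmt"
	"os/exec"
	"testing"
	"time"
)

var startTimeout = flag.Duration("timeout", 25*time.Minute, "max duration to wait for a full minikube start")

// TestStartSketch is illustrative only: it bounds `minikube start` with the
// flag's duration and forwards that duration as --wait-timeout.
func TestStartSketch(t *testing.T) {
	ctx, cancel := context.WithTimeout(context.Background(), *startTimeout)
	defer cancel()

	cmd := exec.CommandContext(ctx, "minikube", "start",
		fmt.Sprintf("--wait-timeout=%s", *startTimeout))
	if out, err := cmd.CombinedOutput(); err != nil {
		t.Fatalf("minikube start failed: %v\n%s", err, out)
	}
}
```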
8x the default we give to users is crazy. I can understand 2x, but more than that I feel like we are making ourselves reliable at the expense of users.
Sorry for the confusion: --wait-timeout is per component, not for the whole start.
Before this PR, the wait was 5 minutes per component.
This PR reduces it to 3 minutes per component for end users,
so the test value is not too much more than what we give the end user (if we count 5 minutes per component).
I also intend to open another cleanup PR for the integration tests to get rid of all the retrying of start in parallel, once all the flakes (certs, corruptions, ...) are fixed.
@tstromberg I believe I have addressed all the comments.
/retest this please
This PR:

- Removes more than a few unused functions, such as:
- Moves funcs out of `pkg/utils` into `pkg/kube`.
- Moves the `ExtraOptions` type and its funcs from `pkg/util` to the `pkg/minikube/config` package, and renames the func `.ContainsString` to `.ContainsParam`.
- Chooses better timeouts based on Kubernetes consts.
- Adds logs for how long each component and k8s-app took to come up, to be used later to fine-tune our default waiting time (see the sketch after this list); these will appear in the logs like:
- Adds a new flag for the start cmd, `wait-timeout`, that specifies the max wait per component.
- Reduces the default wait per component from 5 minutes to 3 minutes for end users.
- Increases the default wait per component from 5 minutes to 13 minutes for parallel integration tests.
- Parameterizes the integration tests to accept `wait-timeout`.
- Fixes the test setup for the gvisor test, which was placed after the e2e test run; moved it up in the script.

Closes #5122 and hopefully reduces some test flakes due to timeouts.
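A rough sketch of the per-component duration logging described in the list above, under the assumption that it wraps each wait with a timer; the names and output format here are invented, not the PR's actual log lines:

```go
package main

import (
	"log"
	"time"
)

// waitFor is a hypothetical stand-in for minikube's per-component wait.
func waitFor(component string) error {
	time.Sleep(50 * time.Millisecond) // pretend the component takes a moment
	return nil
}

func main() {
	for _, c := range []string{"apiserver", "etcd", "kube-proxy"} {
		start := time.Now()
		if err := waitFor(c); err != nil {
			log.Fatalf("%s never became healthy: %v", c, err)
		}
		// Duration logs like this make it possible to fine-tune default timeouts later.
		log.Printf("%s came up in %s", c, time.Since(start))
	}
}
```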
Topics I'd like the reviewer's opinion on:

Golang parallel test logging gotcha!
I was hoping to see the duration metric logs in the tests, but I found that sometimes they don't appear at all and sometimes at most 2 sets of them show up. That makes me believe Go only outputs the non-paused tests, and the paused tests (whose VMs are still running and which our wait-for-running func is still working on) never write to the logs. I created an issue in golang to track this: golang/go#33706
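For what it's worth, the missing output is consistent with `go test` buffering a parallel test's log until that test finishes. A minimal reproduction of the suspected behavior — run it with `go test -v` and note that the `t.Log` lines only print once each test completes:

```go
package example

import (
	"testing"
	"time"
)

// Each t.Log call happens immediately, but the output of a parallel test is
// buffered and only flushed when the test finishes, so a test that never
// finishes (or whose process is killed) never shows its logs.
func TestSlow(t *testing.T) {
	t.Parallel()
	t.Log("TestSlow: logged at start, printed at the end")
	time.Sleep(2 * time.Second)
}

func TestFast(t *testing.T) {
	t.Parallel()
	t.Log("TestFast: logged at start, printed at the end")
	time.Sleep(500 * time.Millisecond)
}
```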