cmd: wait for the cluster to be initialized #1132
Conversation
@smarterclayton Is this what you are looking for with regards to having the installer verify that the cluster version is valid? |
This is the error from the installer when run with these changes.
But the cluster did eventually initialize, which means that the installer should not stop when the cluster version object has a Failing/True condition. |
Would be interesting to see how it recovered. Maybe it will turn up in CI where we'll have logs. |
Output from the installer with it changed to not stop after seeing a Failing/True condition. |
Pushed update to wait up to 30 minutes, to not stop when ClusterVersion reports Failing/True, and to better downsample the log messages. |
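For readers following along, here is a minimal sketch (not the PR's actual diff) of the condition handling that comment describes: Available=True ends the wait, Failing=True is logged as a warning but treated as non-fatal because the cluster can still recover, and progress messages are downsampled so they are emitted at most once per log interval. The helper names (findCondition, checkInitialized) and the raw "Failing" string are illustrative assumptions.

```go
// Hypothetical sketch, not the merged installer code.
package clusterwait

import (
	"time"

	configv1 "github.com/openshift/api/config/v1"
	"github.com/sirupsen/logrus"
)

// clusterVersionFailing is written as a raw string because the Go constant
// for the Failing condition has changed across openshift/api versions.
const clusterVersionFailing = configv1.ClusterStatusConditionType("Failing")

// findCondition returns the condition of the given type, or nil if absent.
func findCondition(conds []configv1.ClusterOperatorStatusCondition, t configv1.ClusterStatusConditionType) *configv1.ClusterOperatorStatusCondition {
	for i := range conds {
		if conds[i].Type == t {
			return &conds[i]
		}
	}
	return nil
}

// checkInitialized returns true once ClusterVersion reports Available=True.
// A Failing=True condition is only logged; the wait keeps going, because the
// cluster may still recover, as observed in this PR's testing. Log output is
// downsampled to at most one message per logInterval.
func checkInitialized(cv *configv1.ClusterVersion, lastLog *time.Time, logInterval time.Duration) bool {
	if c := findCondition(cv.Status.Conditions, configv1.OperatorAvailable); c != nil && c.Status == configv1.ConditionTrue {
		return true
	}
	if time.Since(*lastLog) >= logInterval {
		*lastLog = time.Now()
		if c := findCondition(cv.Status.Conditions, clusterVersionFailing); c != nil && c.Status == configv1.ConditionTrue {
			logrus.Warnf("cluster initialization is failing: %s: %s", c.Reason, c.Message)
		} else {
			logrus.Info("still waiting for the cluster to initialize...")
		}
	}
	return false
}
```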
/retest |
A couple of comments but yes.
The ClusterVersion client is not available in the version of openshift/client-go that was pinned. These changes remove the pinnings of openshift/client-go and openshift/api. Commands run:
dep ensure
dep ensure -update github.com/openshift/api
Vendor openshift/library-go for access to helper functions for evaluating the status conditions of the ClusterVersion object.
After creating the cluster, wait up to 30 minutes for the ClusterVersion object to indicate that the cluster has been initialized prior to exiting the installer.
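A rough sketch of how that wait might look, assuming a recent openshift/client-go (context-aware Get) and the library-go clusteroperator/v1helpers condition helpers vendored above. The function name waitForInitializedCluster mirrors the PR description, but the body here is an approximation rather than the merged code:

```go
// Approximate sketch of the 30-minute initialization wait.
package clusterwait

import (
	"context"
	"time"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	cov1helpers "github.com/openshift/library-go/pkg/config/clusteroperator/v1helpers"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/rest"
)

// waitForInitializedCluster polls the "version" ClusterVersion object for up
// to 30 minutes and returns once it reports Available=True.
func waitForInitializedCluster(config *rest.Config) error {
	client, err := configclient.NewForConfig(config)
	if err != nil {
		return err
	}
	return wait.PollImmediate(10*time.Second, 30*time.Minute, func() (bool, error) {
		cv, err := client.ConfigV1().ClusterVersions().Get(context.TODO(), "version", metav1.GetOptions{})
		if err != nil {
			// Transient API errors should not abort the wait; just retry.
			return false, nil
		}
		return cov1helpers.IsStatusConditionTrue(cv.Status.Conditions, configv1.OperatorAvailable), nil
	})
}
```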
/approve This looks good, will wait for some time to allow other people to comment. |
/test e2e-aws Sorry, using you as a guinea pig |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abhinavdahiya, staebler, wking. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. |
/retest Please review the full test history for this PR and help us cut down flakes. |
Notes on the StatefulSet panics here. /retest |
The console may become optional [1], so teach the installer to handle its absence gracefully.

We've waited on the console since way back in ff53523 (add logs at end of install for kubeadmin, consoleURL, 2018-12-06, openshift#806). Back then, install-complete timing was much less organized, and since e17ba3c (cmd: wait for the cluster to be initialized, 2019-01-25, openshift#1132) we've blocked on ClusterVersion going Available=True. So the current dependency chain is:

1. Console route admission blocks the console operator from going Available=True in its ClusterOperator.
2. The console ClusterOperator blocks the cluster-version operator from going Available=True in ClusterVersion.
3. ClusterVersion blocks the installer's waitForInitializedCluster.

So we no longer need to wait for the route to show up, and can fail fast if we get a clear IsNotFound. I'm keeping a bit of polling so we don't fail an install on a temporary network hiccup.

We don't want to drop the console check entirely, because when it is found, we want:

* To continue to log that access pathway on install-complete.
* To continue to append the router CA to the kubeconfig.

That latter point has been done since 4033577 (append router CA to cluster CA in kubeconfig, 2019-02-12, openshift#1242). The motivation in that commit message is not explicit, but the idea is to support folks who naively run 'oc login' with the kubeadmin kubeconfig [2] (despite that kubeconfig already having cluster-root access) when the console route's cert's CA happens to be something that the user's local trust store doesn't include by default.

[1]: openshift/enhancements#922
[2]: openshift#1541 (comment)
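As a rough illustration of the fail-fast-on-IsNotFound behavior described in that commit message, here is a hedged sketch assuming openshift/client-go's route clientset; getConsoleRouteHost and the specific poll interval and timeout are illustrative, not the installer's actual code:

```go
// Illustrative sketch of an optional-console route lookup.
package clusterwait

import (
	"context"
	"time"

	routeclient "github.com/openshift/client-go/route/clientset/versioned"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/rest"
)

// getConsoleRouteHost polls briefly for the console route so a transient
// network hiccup does not fail the install, but returns an empty host (and no
// error) as soon as the API returns a clear IsNotFound, since the console is
// optional and its absence should not block install-complete.
func getConsoleRouteHost(config *rest.Config) (string, error) {
	client, err := routeclient.NewForConfig(config)
	if err != nil {
		return "", err
	}
	var host string
	err = wait.PollImmediate(2*time.Second, 2*time.Minute, func() (bool, error) {
		route, err := client.RouteV1().Routes("openshift-console").Get(context.TODO(), "console", metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // console not installed: stop waiting, host stays empty
		}
		if err != nil {
			return false, nil // retry on transient errors
		}
		host = route.Spec.Host
		return true, nil
	})
	return host, err
}
```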
After creating the cluster, wait until the ClusterVersion object indicates that the cluster has been initialized prior to exiting the installer.