Fix Terraform error caused by unavailable metrics.k8s.io/v1beta1 #1827

Merged: 3 commits merged into main from nimjay-terraform-wait on Jun 5, 2023

@NimJay NimJay commented Jun 5, 2023

Background

│ Error running command 'kubectl wait --for=condition=ready pods --all -n default --timeout=-1s': exit status 1. Output: + kubectl wait
│ --for=condition=ready pods --all -n default --timeout=-1s
│ E0605 13:26:36.342439    2173 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the
│ request
  • kubectl wait ... pods ... is failing because the metrics.k8s.io/v1beta1 resource is not ready.
  • See more logs in this comment.
  • According to this StackOverflow post and my own testing, this error occurs on newly created clusters during the first few minutes.
  • In the first few minutes after cluster creation, running kubectl get apiservices outputs:
NAME                               SERVICE                      AVAILABLE                  AGE
...
v1beta1.metrics.k8s.io             kube-system/metrics-server   False (MissingEndpoints)   39s
...
  • After a few minutes, the output is:
NAME                               SERVICE                      AVAILABLE   AGE
...
v1beta1.metrics.k8s.io             kube-system/metrics-server   True        83m
...
  • A solution is to wait until the v1beta1.metrics.k8s.io API service is available before running kubectl wait ... pods ....

Change Summary

  • Before waiting for Pods, the Terraform now runs kubectl wait on the v1beta1.metrics.k8s.io API service until it is Available.
  • Each of the two kubectl wait commands now has a 20-minute timeout (--timeout=1200s).
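For anyone who wants to reproduce the new wait order outside Terraform, here is a minimal sketch of the same two-step sequence as a shell function. The function name wait_for_cluster_ready and its two arguments are illustrative assumptions, not part of this PR. The kubectl commands mirror the ones in the diff, except that they are chained with && so the Pod wait is skipped if the API service wait fails.

```shell
# Sketch only: wait for the metrics API service first, then for all Pods.
# wait_for_cluster_ready and its arguments are hypothetical names for illustration.
wait_for_cluster_ready() {
  local namespace="${1:-default}"
  local timeout="${2:-1200s}"   # 20 minutes, matching --timeout=1200s in this PR

  # && stops the sequence if the API service never becomes Available.
  kubectl wait --for=condition=Available apiservice/v1beta1.metrics.k8s.io --timeout="$timeout" &&
    kubectl wait --for=condition=ready pods --all -n "$namespace" --timeout="$timeout"
}
```

Usage would look like wait_for_cluster_ready default 1200s, run against a freshly created cluster.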

Testing Procedure

  • I've already tested terraform apply with this change, and it worked!
  • Also, once this is merged, we can check the DeployStack build run against the main branch to see if terraform apply works.

Additional info

  • Running kubectl get apiservice v1beta1.metrics.k8s.io -o yaml during the first few minutes after cluster creation outputs:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
...
  name: v1beta1.metrics.k8s.io
...
status:
  conditions:
  - lastTransitionTime: "2023-06-05T13:25:13Z"
    message: endpoints for service/metrics-server in "kube-system" have no addresses
      with port name ""
    reason: MissingEndpoints
    status: "False"
    type: Available
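To check the same condition without reading the full YAML, the status and reason of the Available condition can be pulled out with a jsonpath query. This is a rough sketch: the helper name apiservice_availability is an assumption made up for illustration, and the query relies on the status.conditions layout shown in the YAML above.

```shell
# Sketch only: print "<status> <reason>" for an APIService's Available condition.
# apiservice_availability is a hypothetical helper name, not part of this PR.
apiservice_availability() {
  local name="${1:-v1beta1.metrics.k8s.io}"

  # The jsonpath filter selects the condition whose type is "Available".
  kubectl get apiservice "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}{" "}{.status.conditions[?(@.type=="Available")].reason}{"\n"}'
}
```

Calling apiservice_availability with no arguments queries v1beta1.metrics.k8s.io. This can help tell a slow metrics-server startup apart from other failures before re-running terraform apply.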


@NimJay changed the title from "Improve Terraform's wait_conditions" to "Fix Terraform error caused by missing metrics.k8s.io/v1beta1" on Jun 5, 2023
@NimJay changed the title from "Fix Terraform error caused by missing metrics.k8s.io/v1beta1" to "Fix Terraform error caused by unavailable metrics.k8s.io/v1beta1" on Jun 5, 2023
command = <<-EOT
kubectl wait --for=condition=AVAILABLE apiservice/v1beta1.metrics.k8s.io --timeout=1200s
kubectl wait --for=condition=ready pods --all -n ${var.namespace} --timeout=1200s
EOT
The previously used --timeout=-1s does not mean "no timeout": kubectl wait treats a negative timeout as one week. Related docs here.

@NimJay NimJay marked this pull request as ready for review June 5, 2023 15:47
@NimJay NimJay requested a review from a team as a code owner June 5, 2023 15:47
@bourgeoisor bourgeoisor left a comment
LGTM!

@NimJay NimJay merged commit 6b64bc8 into main Jun 5, 2023
@NimJay NimJay deleted the nimjay-terraform-wait branch June 5, 2023 18:57
sitaramkm pushed a commit to sitaramkm/microservices-demo that referenced this pull request Aug 24, 2023
…gleCloudPlatform#1827)

* Improve Terraform's wait_conditions

* Update timeouts of kubectl wait
mrcrgl pushed a commit to fiberfjord/microservices-demo that referenced this pull request Sep 11, 2023
…gleCloudPlatform#1827)

* Improve Terraform's wait_conditions

* Update timeouts of kubectl wait
Development

Successfully merging this pull request may close these issues.

terraform apply errs on kubectl wait