Address 503s when the autoscaler is being rolled #12617
The activator's readiness depends on the status of its web socket connection to the autoscaler. When the connection is down, the activator reports ready=false. This can happen while the autoscaler deployment is updating.

PR knative#12614 made the activator's readiness probe fail aggressively, after a single probe failure. This didn't seem to impact Istio (maybe it retries?), but with Contour it started returning 503s because the activator began reporting ready=false immediately.

This PR does two things to mitigate the 503s:
- Bump the readiness probe's failure threshold to give the autoscaler more time to roll out and start up. This still remains lower than the drain duration.
- Update the autoscaler's rollout strategy so that we spin up a new instance before bringing down the old one. This is done using maxUnavailable=0.
Codecov Report
@@            Coverage Diff             @@
##             main   #12617      +/-   ##
==========================================
- Coverage   87.52%   87.48%   -0.05%
==========================================
  Files         195      195
  Lines        9718     9718
==========================================
- Hits         8506     8502       -4
- Misses        928      931       +3
- Partials      284      285       +1
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dprotaso, mattmoor
/cherry-pick release-1.2
@dprotaso: #12617 failed to apply on top of branch "release-1.2":
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherry-pick release-1.2
/cherry-pick release-1.1
/cherry-pick release-1.0
@dprotaso: new pull request created: #12621
@dprotaso: new pull request created: #12622
@dprotaso: #12617 failed to apply on top of branch "release-1.0":
/cherry-pick release-1.0
@dprotaso: new pull request created: #12623
Fixes #12524 flake
Release Note