Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip default probe if container freezer in use #12967

Merged
merged 1 commit into from
May 30, 2022

Conversation

psschwei
Copy link
Contributor

@psschwei psschwei commented May 25, 2022

Signed-off-by: Paul S. Schweigert [email protected]

Paused containers and readiness probes are not the best of friends.
Kubernetes has no concept of a paused container, and so the container
freezer does a bit of an end around to actually freeze the
container (using the container runtime rather than doing it
through Kubernetes).

However, because we're doing an end around, paused containers show as
not ready in Kubernetes... and not ready means that the readiness
probe we default in will eventually fail the container. In practice
what happens is a bunch of "aggressive probe errors" before queue
proxy basically becomes unresponsive.

To prevent this from happening, this PR proposes disabling the default
probe when the freezer is enabled (which is done by setting a value
for the concurrency state endpoint). There's future work in the
pipeline that may render this change moot at some point down the
line (such as switching the default probe to a startup one), but for
now this provides a quick solution to make the container freezer
usable outside of demos / toy environments.

/assign @dprotaso @nader-ziada

Release Note

Enabling the container freezer will disable the readiness probe defaulted in by Knative. 

Signed-off-by: Paul S. Schweigert <[email protected]>

Paused containers and readiness probes are not the best of friends.
Kubernetes has no concept of a paused container, and so the container
freezer does a bit of an end around to actually freeze the
container (using the container runtime rather than doing it
through Kubernetes).

However, because we're doing an end around, paused containers show as
not ready in Kubernetes... and not ready means that the readiness
probe we default in will eventually fail the container. In practice
what happens is a bunch of "aggressive probe errors" before queue
proxy basically becomes unresponsive.

To prevent this from happening, this PR proposes disabling the default
probe when the freezer is enabled (which is done by setting a value
for the concurrency state endpoint). There's future work in the
pipeline that may render this change moot at some point down the
line (such as switching the default probe to a startup one), but for
now this provides a quick solution to make the container freezer
usable outside of demos / toy environments.
@knative-prow knative-prow bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. area/API API objects and controllers area/autoscale area/networking approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 25, 2022
@psschwei
Copy link
Contributor Author

/approve cancel

@knative-prow knative-prow bot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 25, 2022
@psschwei
Copy link
Contributor Author

Not entirely convinced this approach is the right way to go, but given that there's been some interest in the freezer since KnativeCon, I thought it would be good to remove one of the main barriers from folks being able to use it...

@codecov
Copy link

codecov bot commented May 25, 2022

Codecov Report

Merging #12967 (3568d2e) into main (3446ca3) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main   #12967      +/-   ##
==========================================
- Coverage   87.01%   87.00%   -0.02%     
==========================================
  Files         197      197              
  Lines       14437    14439       +2     
==========================================
  Hits        12562    12562              
- Misses       1580     1582       +2     
  Partials      295      295              
Impacted Files Coverage Δ
cmd/queue/main.go 0.64% <0.00%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3446ca3...3568d2e. Read the comment docs.

probe := func() bool { return true }
if env.ServingReadinessProbe != "" {
if env.ServingReadinessProbe != "" && env.ConcurrencyStateEndpoint == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about putting this behind a flag since you are not 100% convinced with it, even if enabled by default, but at least allows users to turn it off if they want to?

@dprotaso
Copy link
Member

ConcurrencyStateEndpoint is alpha/experimental anyway so I don't think it's necessary to introduce sub flags.

/lgtm
/approve

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 30, 2022
@knative-prow
Copy link

knative-prow bot commented May 30, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2022
@knative-prow knative-prow bot merged commit d7027b5 into knative:main May 30, 2022
@psschwei psschwei deleted the started-check branch August 10, 2022 15:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/autoscale area/networking lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants