Provide configuration option for revisions becoming Unschedulable #14862

Open
SaschaSchwarze0 opened this issue Feb 2, 2024 · 6 comments · May be fixed by #15397
Labels: area/autoscale, kind/feature

Comments

SaschaSchwarze0 (Contributor) commented Feb 2, 2024

/area autoscale

Describe the feature

In "Bubble up pod schedule errors to revision status", the Revision reconciler was changed to propagate pod scheduling issues up to the Revision. This happens whenever a revision scales from 0, but not when it scales from, for example, 1, because of this condition.

I can generally understand the reasoning behind this for a Knative installation where users have full control and the cluster size is fixed.

We are running Knative as a managed service with cluster autoscaling. In practice, there is no way to end up with a Pod that cannot be scheduled: even if no capacity is available for a moment, every Pod is eventually scheduled. In our environment, revisions (temporarily) going into that status are confusing our users.

I would like to request a configuration option that turns off that code path: when the option is enabled and the condition's reason is Unschedulable, the error should not be propagated to the Revision.

If you agree that such an option makes sense, I am willing to open a PR for the change. I would just need guidance on how to name the configuration option (for example, pod-is-always-schedulable?) and whether it belongs in config-features or you would prefer an environment variable.
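To make the request concrete, here is a minimal Go sketch of the gating logic as described in this issue. It is not Knative's reconciler code: Features, PodIsAlwaysSchedulable and shouldPropagateScheduleError are hypothetical names, pod-is-always-schedulable is only the name proposed above, and the scalingFromZero parameter stands in for the existing condition that limits propagation to scale-from-zero.

```go
package main

import "fmt"

// reasonUnschedulable matches the reason string the Kubernetes scheduler sets
// on a Pod's PodScheduled condition when it cannot place the Pod.
const reasonUnschedulable = "Unschedulable"

// Features is a hypothetical stand-in for a config-features entry.
// PodIsAlwaysSchedulable models the proposed pod-is-always-schedulable option;
// it does not exist in Knative today.
type Features struct {
	PodIsAlwaysSchedulable bool
}

// shouldPropagateScheduleError (hypothetical) decides whether a Pod scheduling
// error is surfaced on the Revision status.
func shouldPropagateScheduleError(f Features, scalingFromZero bool, reason string) bool {
	if !scalingFromZero {
		// Behavior described in the issue: no propagation when scaling from, e.g., 1.
		return false
	}
	if f.PodIsAlwaysSchedulable && reason == reasonUnschedulable {
		// Proposed behavior: with cluster autoscaling, the Pod is expected to be
		// scheduled eventually, so the transient error is not surfaced.
		return false
	}
	return true
}

func main() {
	enabled := Features{PodIsAlwaysSchedulable: true}
	fmt.Println(shouldPropagateScheduleError(Features{}, true, reasonUnschedulable)) // true
	fmt.Println(shouldPropagateScheduleError(enabled, true, reasonUnschedulable))    // false
	fmt.Println(shouldPropagateScheduleError(enabled, false, reasonUnschedulable))   // false
}
```

Keeping the option off by default preserves today's behavior for installations with a fixed cluster size.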

SaschaSchwarze0 added the kind/feature label on Feb 2, 2024

github-actions bot commented Jul 5, 2024

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label on Jul 5, 2024

SaschaSchwarze0 (Contributor, Author) commented

/remove-lifecycle stale

knative-prow bot removed the lifecycle/stale label on Jul 19, 2024

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

github-actions bot added the lifecycle/stale label on Oct 18, 2024

SaschaSchwarze0 (Contributor, Author) commented

/remove-lifecycle stale

As long as there is no feedback at all from the Knative community, one could find it a bit offensive that the issue is marked stale. :-)

knative-prow bot removed the lifecycle/stale label on Oct 18, 2024

skonto (Contributor) commented Oct 24, 2024

Hi @SaschaSchwarze0!

In our environment, revisions (temporarily) going into that status are confusing our users.

Does that mean that users see the actual revision and get confused? As a workaround, could the managed service communicate that this state is temporary? If users do see the revision resource, I understand that hiding the temporary error is friendlier, because otherwise you are leaking the fact that you are provisioning resources on demand rather than having them always available.

SaschaSchwarze0 (Contributor, Author) commented


Hi @skonto, we have an indicator for the readiness of a revision (green for ready=true, yellow for ready=unknown, red for ready=false). That is where users get worried, because anything that is not green looks like a problem.

And yes, we could probably detect when a revision is not ready only because Knative assumes its pods cannot be scheduled, and still show it as green. But that would only be something our UX puts on top. If users drill down to the revision in the Kubernetes API, they would still see it as not ready.
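The UX-side workaround discussed here could look roughly like the following Go sketch. Indicator, readyCondition and indicatorFor are hypothetical names, not part of Knative or the managed service; the sketch assumes the Revision's Ready condition carries the reason Unschedulable in the case being masked, and uses the "True"/"False"/"Unknown" status strings of Kubernetes-style conditions.

```go
package main

import "fmt"

// Indicator is a hypothetical traffic-light value shown in a UI.
type Indicator string

const (
	Green  Indicator = "green"
	Yellow Indicator = "yellow"
	Red    Indicator = "red"
)

// readyCondition is a hypothetical, minimal stand-in for a Revision's Ready
// condition (assumption: only Status and Reason matter for the indicator).
type readyCondition struct {
	Status string // "True", "False" or "Unknown"
	Reason string
}

// indicatorFor maps readiness to a color. When maskUnschedulable is set, a
// not-ready revision whose reason is Unschedulable is still shown as green.
func indicatorFor(c readyCondition, maskUnschedulable bool) Indicator {
	switch c.Status {
	case "True":
		return Green
	case "False":
		if maskUnschedulable && c.Reason == "Unschedulable" {
			return Green
		}
		return Red
	default:
		// Anything else, including "Unknown", is treated as in progress.
		return Yellow
	}
}

func main() {
	c := readyCondition{Status: "False", Reason: "Unschedulable"}
	fmt.Println(indicatorFor(c, false)) // red
	fmt.Println(indicatorFor(c, true))  // green
}
```

This only changes what the UI shows; the Revision's Ready condition returned by the Kubernetes API stays unchanged, which is the limitation pointed out above.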
