Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continue to set latest ready revision if error encountered #6592

Merged
merged 4 commits into from
Feb 27, 2020

Conversation

taragu
Copy link
Contributor

@taragu taragu commented Jan 21, 2020

/lint

Fixes #6588
Fixes #6876

Proposed Changes

Currently in the config reconciler, if the LRR does not exist, we don't continue the rest of the logic to set the LRR. This PR fixes it by logging the error and continuing.

Release Note

NONE

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jan 21, 2020
@knative-prow-robot knative-prow-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 21, 2020
Copy link
Contributor

@knative-prow-robot knative-prow-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taragu: 0 warnings.

In response to this:

/lint

Fixes #6588

Proposed Changes

Currently in the config reconciler, if the LRR does not exist, we don't continue the rest of the logic to set the LRR. This PR fixes it by logging the error and continuing.

Release Note

NONE

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label Jan 21, 2020
start := lrr.Generation
var generations []string
for i := start; i <= int64(config.Generation); i++ {
generations = append(generations, strconv.FormatInt(i, 10))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
generations = append(generations, strconv.FormatInt(i, 10))
generations = append(generations, strconv.Itoa(i))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually i here is int64 so FormatInt(..) might be better because then we don't have to cast int64 to int

@taragu
Copy link
Contributor Author

taragu commented Jan 22, 2020

/retest

@taragu
Copy link
Contributor Author

taragu commented Jan 23, 2020

@vagababov would you take a look at this again when you get a chance?

Copy link
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave to Dan to review in depth.
/assign @dgerd

lister := c.revisionLister.Revisions(config.Namespace)
configSelector := labels.SelectorFromSet(map[string]string{
serving.ConfigurationLabelKey: config.Name,
})
if config.Status.LatestReadyRevisionName != "" {
lrr, err := lister.Get(config.Status.LatestReadyRevisionName)
// Record the error and continue because we still want to set the LRR to the correct revision
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sentences end in periods.

Suggested change
// Record the error and continue because we still want to set the LRR to the correct revision
// Record the error and continue because we still want to set the LRR to the correct revision.

generations,
)
if err != nil {
return nil, err
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also have an opportunity to exit out of this function here without returning a list of eligible Revisions.

IIUC this whole section is just limiting the search space down to Revisions between the previous Ready and the latest Created. Why not fallback to the broader search in this case as well?

@taragu
Copy link
Contributor Author

taragu commented Feb 20, 2020

Verified that this also fixes #6876

for i := start; i <= int64(config.Generation); i++ {
generations = append(generations, strconv.FormatInt(i, 10))
}
logger.Errorf("Error getting latest ready revision %q: %v", config.Status.LatestReadyRevisionName, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a detailed comment here?

Something like:

Suggested change
logger.Errorf("Error getting latest ready revision %q: %v", config.Status.LatestReadyRevisionName, err)
// If the user deletes the LatestReadyRevision then this may return an error due to the
// dangling reference. Proceed to calculate the next-latest ready revision so that the
// caller can synthesize a new Revision at the current generation to replace the one deleted.
logger.Errorf("Error getting latest ready revision %q: %v", config.Status.LatestReadyRevisionName, err)

Tara Gu added 4 commits February 21, 2020 09:20
Currently in the config reconciler, if the LRR does not exist, we don't continue the rest of the logic to set the LRR. This PR fixes it by logging the error and continuing.
@knative-metrics-robot
Copy link

The following is the coverage report on the affected files.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/configuration/configuration.go 91.6% 93.5% 1.9

@taragu
Copy link
Contributor Author

taragu commented Feb 21, 2020

/retest

@dprotaso
Copy link
Member

dprotaso commented Feb 21, 2020

Looking through the lineage of the changes around getting/setting the latest ready revision I'm curious if things would be simpler if we were to relax the need for the LatestReadyRevision to always roll forward. This seems to be a conclusion of the discussions here

As an example, here's a timeline of a configuration's revisions:

time revision event
1 Revision-01 created
2 Revision-02 created
3 Revision-03 created
4 Revision-01 ready
5 Revision-02 ready
6 Revision-02 not ready

@mattmoor @dgerd Should the LastReadyRevision rollback to Revision-01?

@@ -468,6 +468,35 @@ func TestReconcile(t *testing.T) {
Eventf(corev1.EventTypeNormal, "LatestReadyUpdate", "LatestReadyRevisionName updated to %q", "revnotready-00002"),
},
Key: "foo/revnotready",
}, {
Name: "current LRR doesn't exist, LCR is ready",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no acronyms please

@dprotaso
Copy link
Member

Also based on the comment being added it seems like we're fine with rolling back the latest ready revision when the revision is deleted

Proceed to calculate the next-latest ready revision so that the caller can synthesize a new Revision at the current generation to replace the one deleted

@dprotaso
Copy link
Member

A suggestion from @isdal is to prevent revision deletions under certain conditions (ie. they're the latest ready) but I think you'll still encounter this issue if your informer cache is stale and not up to date.

@isdal
Copy link
Contributor

isdal commented Feb 26, 2020

@whaught should take credit :-) He has some background around pros/cons here which I think apply. Preventing deletion is the least surprising option to customers in my mind, however it has a few downsides too.

@whaught
Copy link
Contributor

whaught commented Feb 26, 2020

The least surprising thing might go a little further. In the ValidatingWebHook, on delete you might look for a parented Route and throw an error if the Revision is still referenced (and actively serving or being migrated to).

This prevents users from causing an unintended interruption if they try to delete such a revision, only to have it be mysteriously taken down and then recreated.

@dprotaso
Copy link
Member

Moving the discussion to an issue in order to unblock this PR (who's immediate purpose is to address the flakes)

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 27, 2020
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dprotaso, taragu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2020
@dprotaso
Copy link
Member

/test pull-knative-serving-autotls-tests

@knative-test-reporter-robot

The following jobs failed:

Test name Triggers Retries
pull-knative-serving-unit-tests 0/3

Failed non-flaky tests preventing automatic retry of pull-knative-serving-unit-tests:

pkg/reconciler/ingress.TestGlobalResyncOnUpdateGatewayConfigMap

@taragu
Copy link
Contributor Author

taragu commented Feb 27, 2020

/retest

@knative-prow-robot knative-prow-robot merged commit 627315b into knative:master Feb 27, 2020
@taragu taragu deleted the delete-latest-rev-6588 branch July 29, 2020 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet