Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for latestRevision routes may point to older revisions if latestCreated is not ready #5319

Merged

Conversation

taragu
Copy link
Contributor

@taragu taragu commented Aug 28, 2019

/lint

Fixes #5285

Proposed Changes

  • In configuration reconciliation, obtain a list of sorted revisions instead of just the last created revision, so that we can update the last ready revision if the last created revision is not ready yet.

Release Note

NONE

/cc @dgerd

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Aug 28, 2019
@knative-prow-robot knative-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/API API objects and controllers labels Aug 28, 2019
Copy link
Contributor

@knative-prow-robot knative-prow-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taragu: 0 warnings.

In response to this:

/lint

Fixes #5285

Proposed Changes

  • In configuration reconciliation, obtain a list of sorted revisions instead of just the last created revision, so that we can update the last ready revision if the last created revision is not ready yet.

Release Note

NONE

/cc @dgerd

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added the area/test-and-release It flags unit/e2e/conformance/perf test issues for product features label Aug 28, 2019
@@ -151,8 +153,14 @@ func (c *Reconciler) reconcile(ctx context.Context, config *v1alpha1.Configurati
} else if err != nil {
logger.Errorf("Failed to reconcile Configuration %q - failed to get Revision: %v", config.Name, err)
return err
} else if sortedRevisions == nil || len(sortedRevisions) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sortedRevisions == nil || len(sortedRevisions) == 0

I think both checks same condition behavior?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the 2nd covers both and is better since ... == nil doesn't cover the case of an empty list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@savitaashture I think they are slightly different, but we should only need len(sortedRevisions) == 0 here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this check is not needed at all, since a not found error will be returned by getSortedCreatedRevisions

@taragu taragu force-pushed the bug-latestCreatedRevision-5285 branch 2 times, most recently from b84051c to 849d7f5 Compare August 28, 2019 15:59
serving.ConfigurationLabelKey: config.Name,
}))

if err == nil && len(list) > 0 {
return list[0], nil
// Return a sorted list with Generation in descending order
sort.Slice(list[:], func(i, j int) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort.Slice does surprisingly many things, including reflection before it starts sorting, so may be avoid sorting in case of len==1.

@taragu taragu changed the title Fix for latestRevision routes may point to older revisions if latestCreated is not ready [WIP] Fix for latestRevision routes may point to older revisions if latestCreated is not ready Aug 28, 2019
@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 28, 2019
@mattmoor mattmoor self-assigned this Aug 29, 2019
@taragu taragu force-pushed the bug-latestCreatedRevision-5285 branch from 849d7f5 to 2092e2b Compare August 30, 2019 13:15
@taragu taragu changed the title [WIP] Fix for latestRevision routes may point to older revisions if latestCreated is not ready Fix for latestRevision routes may point to older revisions if latestCreated is not ready Aug 30, 2019
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 30, 2019
@taragu taragu force-pushed the bug-latestCreatedRevision-5285 branch from 2092e2b to 2177097 Compare August 30, 2019 13:45
@taragu
Copy link
Contributor Author

taragu commented Aug 30, 2019

@mattmoor @vagababov this PR is ready for another review. Would you please take a look?

pkg/reconciler/configuration/configuration.go Outdated Show resolved Hide resolved
pkg/reconciler/configuration/configuration.go Outdated Show resolved Hide resolved
pkg/reconciler/configuration/configuration.go Outdated Show resolved Hide resolved
@@ -127,7 +128,8 @@ func (c *Reconciler) reconcile(ctx context.Context, config *v1alpha1.Configurati
}

// First, fetch the revision that should exist for the current generation.
lcr, err := c.latestCreatedRevision(config)
sortedRevisions, err := c.getSortedCreatedRevisions(config)
var lcr *v1alpha1.Revision
if errors.IsNotFound(err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way things are written (if I'm reading correctly), this won't be true after the first Revision is created, so I don't think createRevision will be called after the first Revision is created.

I think we now also need some logic that sortedRevisions[0].Labels[serving.ConfigurationGenerationLabelKey] matches metadata.generation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but if we add the check for generation, it wouldn't support this use case, right?

pkg/reconciler/configuration/configuration.go Outdated Show resolved Hide resolved
pkg/reconciler/configuration/configuration.go Outdated Show resolved Hide resolved
@knative-prow-robot knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 3, 2019
@taragu taragu force-pushed the bug-latestCreatedRevision-5285 branch 2 times, most recently from 3c51bf0 to 9762403 Compare September 10, 2019 21:16
@taragu
Copy link
Contributor Author

taragu commented Sep 17, 2019

/test pull-knative-serving-unit-tests
/test pull-knative-serving-integration-tests

@taragu
Copy link
Contributor Author

taragu commented Sep 17, 2019

@mattmoor @vagababov would you please take another look at this?

@taragu
Copy link
Contributor Author

taragu commented Sep 19, 2019

/test pull-knative-serving-integration-tests

configSelector = configSelector.Add(*inReq)
}

list, err := lister.List(configSelector)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if err not nil ?
I would atleast log the error if not returning that particular err WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! I've updated the PR to add a logger here

@taragu
Copy link
Contributor Author

taragu commented Sep 26, 2019

@mattmoor @vagababov this PR is ready for another look

intI, errI := strconv.Atoi(list[i].Labels[serving.ConfigurationGenerationLabelKey])
intJ, errJ := strconv.Atoi(list[j].Labels[serving.ConfigurationGenerationLabelKey])
if errI != nil || errJ != nil {
return true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should force malformed revisions to the end of the list, so errI and errJ should return different things, I think.

func (c *Reconciler) getSortedCreatedRevisions(ctx context.Context, config *v1alpha1.Configuration) ([]*v1alpha1.Revision, error) {
logger := logging.FromContext(ctx)
if err := CheckNameAvailability(config, c.revisionLister); err != nil {
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still changed, and breaks BYO revision name since line 243 will return "Not Found".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattmoor I'm not getting the "Not found" error with the rollback e2e test #5702 (because checkNameAvailability(...) is able to find a Configuration with the template name). Could you confirm if I'm doing the rollback test correctly?

@taragu taragu force-pushed the bug-latestCreatedRevision-5285 branch from 90181df to f5ab443 Compare October 24, 2019 17:57
@taragu
Copy link
Contributor Author

taragu commented Oct 28, 2019

/retest

@taragu
Copy link
Contributor Author

taragu commented Oct 30, 2019

@mattmoor would you take another look at this when you get a chance?


if err = c.findAndSetLatestReadyRevision(config); err != nil {
return fmt.Errorf("failed to find and set latest ready revision: %w", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

184-188 look like a special case of this, so perhaps we can just eliminate that and sink this call? I like getting as much mileage through code as possible, so if we can make this a single code path 🎉

Comment on lines 189 to 196
if rc != nil && rc.Status == corev1.ConditionTrue {
old, new := config.Status.LatestReadyRevisionName, lcr.Name
config.Status.SetLatestReadyRevisionName(lcr.Name)
if old != new {
c.Recorder.Eventf(config, corev1.EventTypeNormal, "LatestReadyUpdate",
"LatestReadyRevisionName updated to %q", new)
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what I meant by my previous comment was: Can we just delete these lines, and if so will findAndSet... do the right thing? If not why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it won't work for the byo revision name rollback use case. Because findAndSet... will try to get the list of created revision starting from the generation number of the current LRR, it won't fetch the rollback revision and consequently update LRR to the correct previous revision.

Comment on lines 241 to 257
var generations []string
for i := start + 1; i <= int64(config.Generation); i++ {
generations = append(generations, strconv.FormatInt(i, 10))
}

// Add an "In" filter so that the configurations we get back from List have generation
// in range (config's latest ready generation, config's generation]
generationKey := serving.ConfigurationGenerationLabelKey
inReq, err := labels.NewRequirement(generationKey,
selection.In,
generations,
)
if err != nil {
return nil, err
}

configSelector = configSelector.Add(*inReq)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant by my other comment was: Can we move this entire selection into the if LRR != "" { ... } block?

@knative-test-reporter-robot

The following jobs failed:

Test name Triggers Retries
pull-knative-serving-integration-tests 0/3

Failed non-flaky tests preventing automatic retry of pull-knative-serving-integration-tests:

test/e2e.TestRollbackBYOName

// Return a sorted list with Generation in descending order
if len(list) > 1 {
sort.Slice(list, func(i, j int) bool {
intI, errI := strconv.Atoi(list[i].Labels[serving.ConfigurationGenerationLabelKey])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what you have said above, what is here seems incomplete.

I think that we need a check that says that if config.Spec.Template.Name == list[i].Name then return true (BYO Name always sorts first).


if err == nil && len(list) > 0 {
// Return a sorted list with Generation in descending order
if len(list) > 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This double predicate is a bit weird, I would use:

if err != nil {
  return nil, err
}
// Does this panic if list is nil? I doubt it would...
sort.Slice(list, func(){})

return nil, err
}
// Return a sorted list with Generation in descending order
if len(list) > 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this after Victor made this suggestion #5319 (comment)

sortedRevisions, err := c.getSortedCreatedRevisions(config)
if err != nil {
return err
}
for _, rev := range sortedRevisions {
if rev.Status.IsReady() {
// No need to update latest ready revision in this case
if rev.Name == config.Status.LatestReadyRevisionName {
if rev.Name == config.Status.LatestReadyRevisionName && !lcrReady {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get the lcrReady check here. Why can't we return whenever the LRRN matches?

Copy link
Contributor Author

@taragu taragu Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the failed revision recovers use case. We still need to call SetLatestReadyRevisionName(...) in this case because it marks the configuration ready to true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think threading through the added state probably makes things more confusing than its worth given that we save a tiny bit of redundancy in the SetLatestReadyRevision call when it's already true?

config.Status.SetLatestReadyRevisionName(rev.Name)
c.Recorder.Eventf(config, corev1.EventTypeNormal, "LatestReadyUpdate",
"LatestReadyRevisionName updated to %q", rev.Name)
if old != new {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the lcrReady check, this is the same test as above, so this should be the only meaningful work we'd perform when not returning. Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think part of the issue is that SetLatestReadyRevisionName(..) not only sets the LRR, but also marks configuration as ready. So there are cases where we want to call SetLatestReadyRevisionName because we need to set the config to ready, but we don't really need to set the LRR because it's already the same as the current LRR.

return nil
}

// findAndSetLatestReadyRevision finds the last ready revision and sets LatestReadyRevisionName to it.
// lcrReady indicates whether the latest created revision is ready or not.
func (c *Reconciler) findAndSetLatestReadyRevision(config *v1alpha1.Configuration, lcrReady bool) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we drop the unused param?

@knative-metrics-robot
Copy link

The following is the coverage report on the affected files.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/configuration/configuration.go 88.8% 87.0% -1.8

Copy link
Member

@mattmoor mattmoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Let's get it baking. thanks!

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Nov 8, 2019
@mattmoor
Copy link
Member

mattmoor commented Nov 8, 2019

/hold cancel

@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, taragu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Nov 8, 2019
@knative-prow-robot knative-prow-robot merged commit fea8402 into knative:master Nov 8, 2019
@taragu taragu deleted the bug-latestCreatedRevision-5285 branch November 12, 2019 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/test-and-release It flags unit/e2e/conformance/perf test issues for product features cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

latestRevision routes may point to older revisions if latestCreated is not ready
10 participants