Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RolloutAnalysis check on presence of metrics before trying to create AnalysisRuns #3441

Closed
kevinqian-db opened this issue Mar 12, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@kevinqian-db
Copy link
Contributor

kevinqian-db commented Mar 12, 2024

Summary

Today we have the following when reconciling AnalysisRuns (ARs)

func (c *rolloutContext) reconcileBackgroundAnalysisRun() (*v1alpha1.AnalysisRun, error) {
currentAr := c.currentArs.CanaryBackground
if c.rollout.Spec.Strategy.Canary.Analysis == nil {
err := c.cancelAnalysisRuns([]*v1alpha1.AnalysisRun{currentAr})
return nil, err
}

However, this check will not catch the case if the analysis field is an empty object vs a nil. If an empty object is placed here, the subsequent AR creation code will run, and eventually result in the failure spec.metrics is empty and errors out. Argo today will keep retry this failure and it will never realize that it can never create this AR.

Right now we did encounter that some entity (we are not sure which one, might even been Argo itself) setting it to empty when trying to skip health checks during an emergency fix. Nevertheless, we believe having an extra check that the configuration here actually yields any metrics should still be beneficial to avoid unintended infinite retry.

Use Cases

When user passed an empty object instead of nil.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@kevinqian-db kevinqian-db added the enhancement New feature or request label Mar 12, 2024
@zachaller
Copy link
Collaborator

zachaller commented Mar 13, 2024

Yea, we actually get hit by by this as well, would be open to a fix.

@zachaller
Copy link
Collaborator

zachaller commented Mar 13, 2024

Having a rollout object with args but no templates I think also recreates this?

      analysis:
        args:
          - name: stable-hash
            valueFrom:
              podTemplateHashValue: Stable
        startingStep: 1

@kevinqian-db
Copy link
Contributor Author

Having a rollout object with args but no templates I think also recreates this?

Yeah. My idea is to probably check that if the manifest does not provide any templates

@kevinqian-db
Copy link
Contributor Author

Create PR #3446, PTAL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants