Flaky Test: Trial status is succeeded and metrics are properly populated #2269

Closed
tenzen-y opened this issue Mar 2, 2024 · 9 comments

@tenzen-y
Member

tenzen-y commented Mar 2, 2024

/kind bug

What steps did you take and what happened:
Flaky test: the assertion "Expect that Trial status is succeeded and metrics are properly populated" (metrics are available because GetTrialObservationLog returns values) intermittently times out in TestReconcileBatchJob:

// Expect that Trial status is succeeded and metrics are properly populated
// Metrics available because GetTrialObservationLog returns values
g.Eventually(func() bool {
	if err = c.Get(ctx, trialKey, trial); err != nil {
		return false
	}
	return trial.IsSucceeded() &&
		len(trial.Status.Observation.Metrics) > 0 &&
		trial.Status.Observation.Metrics[0].Min == "0.11" &&
		trial.Status.Observation.Metrics[0].Max == "0.99" &&
		trial.Status.Observation.Metrics[0].Latest == "0.11"
}, timeout).Should(gomega.BeTrue())

--- FAIL: TestReconcileBatchJob (82.19s)
    trial_controller_test.go:274: 
        Timed out after 80.001s.
        Expected
            <bool>: false
        to be true
FAIL
	github.com/kubeflow/katib/pkg/controller.v1beta1/trial	coverage: 83.6% of statements

https://github.com/kubeflow/katib/actions/runs/8125174959/job/22207477654?pr=2267#step:4:106

What did you expect to happen:
No errors occur.

Anything else you would like to add:

Environment:

  • Katib version (check the Katib controller image version):
  • Kubernetes version: (kubectl version):
  • OS (uname -a):


@PeterWrighten

May I work on this bug?
/assign

@PeterWrighten

I think simply raising the timeout threshold from 80 to 100 should fix it.

@tenzen-y
Member Author

> I think simply raising the timeout threshold from 80 to 100 should fix it.

I don't think so. We already applied a similar approach, and this issue still remains.
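
One possible reading of this reply, offered as an assumption rather than a diagnosis of the actual bug: a longer timeout only helps when the awaited condition eventually becomes true. If a flaky run ends up in a state where the condition stays false, polling fails at any threshold. A minimal sketch follows; `poll` is a hypothetical helper, and attempt counts stand in for elapsed seconds.

```go
package main

import "fmt"

// poll calls cond up to maxAttempts times and reports whether it ever held.
// Attempts stand in for elapsed seconds (hypothetical helper).
func poll(cond func(attempt int) bool, maxAttempts int) bool {
	for i := 1; i <= maxAttempts; i++ {
		if cond(i) {
			return true
		}
	}
	return false
}

func main() {
	slow := func(attempt int) bool { return attempt >= 90 } // becomes true late
	stuck := func(attempt int) bool { return false }        // never becomes true

	fmt.Println(poll(slow, 80), poll(slow, 100))   // false true
	fmt.Println(poll(stuck, 80), poll(stuck, 100)) // false false
}
```

Only the `slow` case benefits from the higher limit; the `stuck` case fails either way, which matches the observation that the issue persisted after a similar change.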

@PeterWrighten

> > I think simply raising the timeout threshold from 80 to 100 should fix it.

> I don't think so. We already applied a similar approach, and this issue still remains.

That seems like an interesting issue. I'll investigate and try working on it.

github-actions bot commented Jun 9, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Member Author

/remove-lifecycle stale

@tenzen-y
Member Author

@PeterWrighten Are you still working on this? If not, could you unassign yourself from this issue?

@tenzen-y
Member Author

This should be fixed by #2350
/close

@tenzen-y: Closing this issue.
