-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync prometheus ext tests running in parallel #17717
sync prometheus ext tests running in parallel #17717
Conversation
/lgtm |
/test cmd |
conformance install failure flake #17605 /test extended_conformance_install |
func bringUpPrometheusFromTemplate(oc *exutil.CLI) (ns, host, bearerToken string, statsPort int) { | ||
ns = oc.KubeFramework().Namespace.Name | ||
host = "prometheus.kube-system.svc" | ||
statsPort = 443 | ||
mustCreate := false | ||
mux.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't work, other jobs are run in separate processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ... gotcha.
So seems like we need to interpret the error, and if it is "already exists", do not abort the test case.
I'll circle back and give that a go (unless some other direction is supplied).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if I can remove the lgtm label
/hold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appears I can :-)
100628c
to
16d25f8
Compare
adjustment pushed ... fyi, totally removed the mutex (seemed unnecessary given the multi-process basis) |
o.Expect(err).NotTo(o.HaveOccurred()) | ||
// still check if it exists, as the prior not found check above may have been in a race with a test | ||
// running in another process | ||
if !kapierrs.IsAlreadyExists(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're not making an api call here, you're effectively shelling out to invoke "oc create -f", so this error check is not going to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may have to literally scan the output from the command. Or just log the error w/o failing the test (so if the test ultimately fails, we can at least go back and discover the error that occurred)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go with the latter ... the existing use of IsNotFound
presumably also falls into this bucket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing call is not using the oc client:
if _, err := oc.AdminKubeClient().Apps().StatefulSets("kube-system").Get("prometheus", metav1.GetOptions{}); err != nil {
if !kapierrs.IsNotFound(err) {
it's using the KubeClient (aka the API client)
so it's fine.
You could consider doing the same thing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been exploring the various client entryways in cli.go but am not finding a verb that allows me to "process" the template
I'll look a bit more but unless you have a precise pointer, thinking of falling back to parsing output.
And thanks for the clarification around the not found path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't have to change the process logic, just the create logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but honestly this should fail so rarely i'd be fine w/ you just logging the error so if the test fails further down, we have the error on record.
the error in https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/17717/test_pull_request_origin_extended_conformance_install/3954/ |
16d25f8
to
0269f40
Compare
update pushed @bparees |
/lgtm |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest |
/test extended_conformance_install |
0269f40
to
9b9f68d
Compare
rebase pushed ... @bparees please re-review / re-post the good to merge comment at your convenience |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bparees, gabemontero The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
Opened #17811 for the cmd failure. |
conformance gce was my flake #17694 hopefully the debug data PR merged in time for it to capture data for this one. UPDATE: unfortunately the timing was not in our favor. The debug PR merge must have happened after this job started ... no intermediate |
conformance install was pre-test, env failures in origin prereqs and install origin |
/retest |
conformance install failure was flake #17556 |
/retest |
opened and taking flake #17836 |
/test extended_conformance_gce |
Automatic merge from submit-queue (batch tested with PRs 16281, 17293, 17717, 17753, 17830). |
Fixes #17693
@bparees @smarterclayton @mfojtik fyi