sync prometheus ext tests running in parallel #17717

Merged

gabemontero
Contributor

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 11, 2017
@bparees
Contributor

bparees commented Dec 11, 2017

/lgtm
thanks @gabemontero

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 11, 2017
@gabemontero
Contributor Author

/test cmd

@gabemontero
Contributor Author

conformance install failure flake #17605

/test extended_conformance_install

func bringUpPrometheusFromTemplate(oc *exutil.CLI) (ns, host, bearerToken string, statsPort int) {
	ns = oc.KubeFramework().Namespace.Name
	host = "prometheus.kube-system.svc"
	statsPort = 443
	mustCreate := false
	mux.Lock()
Contributor

This won't work: the other jobs run in separate processes, so an in-process mutex can't coordinate them.

Contributor Author

Ah ... gotcha.

So it seems we need to interpret the error and, if it is "already exists", not abort the test case.

I'll circle back and give that a go (unless some other direction is supplied).

Contributor Author

I'll see if I can remove the lgtm label

/hold

Contributor Author

appears I can :-)

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 11, 2017
@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 12, 2017
@openshift-merge-robot openshift-merge-robot removed the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2017
@gabemontero
Contributor Author

adjustment pushed ... fyi, totally removed the mutex (seemed unnecessary given the multi-process basis)

@gabemontero gabemontero removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 12, 2017
o.Expect(err).NotTo(o.HaveOccurred())
// still check if it exists, as the prior not found check above may have been in a race with a test
// running in another process
if !kapierrs.IsAlreadyExists(err) {
Contributor

You're not making an API call here; you're effectively shelling out to invoke "oc create -f", so this error check is not going to work.

Contributor

you may have to literally scan the output from the command. Or just log the error w/o failing the test (so if the test ultimately fails, we can at least go back and discover the error that occurred)
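
A rough sketch of the first option, scanning the command output. The helper name `createOutputSaysAlreadyExists` and the sample messages are assumptions for illustration; the exact wording that "oc create -f" prints is not shown in this thread, so the marker strings would need checking against real output.

```go
package main

import (
	"fmt"
	"strings"
)

// createOutputSaysAlreadyExists reports whether the combined output of a
// CLI create command looks like an "already exists" failure.
// The marker strings are assumptions, not a confirmed oc output format.
func createOutputSaysAlreadyExists(output string) bool {
	lower := strings.ToLower(output)
	return strings.Contains(lower, "alreadyexists") ||
		strings.Contains(lower, "already exists")
}

func main() {
	// Hypothetical sample output; real oc output may differ.
	samples := []string{
		`Error from server (AlreadyExists): statefulsets "prometheus" already exists`,
		`Error from server (Forbidden): statefulsets "prometheus" is forbidden`,
	}
	for _, s := range samples {
		fmt.Printf("%v: %s\n", createOutputSaysAlreadyExists(s), s)
	}
}
```

String matching like this is brittle if the message wording changes, which is one reason the typed error check on an API client is usually preferable where it is available.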

Contributor Author

I'll go with the latter ... the existing use of IsNotFound presumably also falls into this bucket

Contributor

The existing call is not using the oc client:

if _, err := oc.AdminKubeClient().Apps().StatefulSets("kube-system").Get("prometheus", metav1.GetOptions{}); err != nil {
	if !kapierrs.IsNotFound(err) {

it's using the KubeClient (aka the API client)

so it's fine.

You could consider doing the same thing here.

Contributor Author

I've been exploring the various client entry points in cli.go but am not finding a verb that allows me to "process" the template.

I'll look a bit more, but unless you have a precise pointer, I'm thinking of falling back to parsing the output.

And thanks for the clarification around the not found path.

Contributor

you don't have to change the process logic, just the create logic.

Contributor

but honestly this should fail so rarely i'd be fine w/ you just logging the error so if the test fails further down, we have the error on record.

@gabemontero
Contributor Author

The error in https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/17717/test_pull_request_origin_extended_conformance_install/3954/ appears to be a problem starting builds because of the space / devmapper issue @coreydaley is tracking.

@gabemontero
Contributor Author

update pushed @bparees

@bparees
Contributor

bparees commented Dec 12, 2017

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 12, 2017
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@gabemontero
Contributor Author

/retest

@gabemontero
Contributor Author

/test extended_conformance_install

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 14, 2017
@openshift-merge-robot openshift-merge-robot removed the lgtm Indicates that a PR is ready to be merged. label Dec 14, 2017
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 14, 2017
@gabemontero
Contributor Author

rebase pushed ... @bparees please re-review / re-post the good to merge comment at your convenience

@bparees
Contributor

bparees commented Dec 14, 2017

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 14, 2017
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, gabemontero

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@gabemontero
Contributor Author

Opened #17811 for the cmd failure.

@gabemontero
Contributor Author

gabemontero commented Dec 14, 2017

conformance gce was my flake #17694

hopefully the debug data PR merged in time for it to capture data for this one.

UPDATE: unfortunately the timing was not in our favor. The debug PR merge must have happened after this job started ... no intermediate "query %s for tests %#v had results %s" and no dump of the prometheus pod.

@gabemontero
Contributor Author

conformance install was pre-test, env failures in origin prereqs and install origin

@gabemontero
Contributor Author

/retest

@gabemontero
Contributor Author

conformance install failure was flake #17556

@gabemontero
Contributor Author

/retest

@gabemontero
Contributor Author

opened and taking flake #17836

@gabemontero
Contributor Author

/test extended_conformance_gce

@openshift-merge-robot
Contributor

Automatic merge from submit-queue (batch tested with PRs 16281, 17293, 17717, 17753, 17830).
