flake: jenkins-plugin test imagestream SCM DSL #17487

Closed
jim-minter opened this issue Nov 27, 2017 · 9 comments
Labels: area/tests, kind/test-flake (categorizes issue or PR as related to test flakes), priority/P1

Comments

@jim-minter
Contributor

jim-minter commented Nov 27, 2017

Expected error:
    <*errors.errorString | 0xc420828030>: {
        s: "timed out while waiting of an image stream tag extended-test-jenkins-plugin-6r9td-fkh6n/localjenkins:develop",
    }
    timed out while waiting of an image stream tag extended-test-jenkins-plugin-6r9td-fkh6n/localjenkins:develop
not to have occurred
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/image_ecosystem/jenkins_plugin.go:659
@jim-minter added the area/tests, kind/test-flake, and priority/P1 labels on Nov 27, 2017
@jim-minter
Contributor Author

@bparees: looks like the deployment controller redeployed the Jenkins master pod unexpectedly mid-test.
@mfojtik would you be able to take a look?

05:18:52.691687 deployer-controller deletes jenkins-1-deploy pod (I think this is expected)

I1123 05:20:21.506190 19435 replication_controller.go:218] Replication controller jenkins-1 updated. Desired pod count change: 1->0
E1123 05:20:21.509291 19435 deployer_controller.go:221] Failing rollout for "extended-test-jenkins-plugin-6r9td-fkh6n-jenkins/jenkins-1" because its deployer pod "jenkins-1-deploy" disappeared (unexpected)

05:20:21.591491 extended-test-jenkins-plugin-6r9td-fkh6n-jenkins/jenkins-1-7dsch deleted

05:20:21.912792 extended-test-jenkins-plugin-6r9td-fkh6n-jenkins/jenkins-1-td8kx created

@bparees
Contributor

bparees commented Nov 28, 2017

@mfojtik unexpected/undesired deployment changes are something the devex tests have smoked out before, and they turned out to be significant bugs. Can you please make sure someone triages this with high urgency so we don't end up in the same situation again?

@mfojtik
Contributor

mfojtik commented Nov 28, 2017

@tnozicka ^ we need to dig into why the deployer pod disappeared. Might it be some GC regression?

@tnozicka
Contributor

@mfojtik the deployer controller deleted the pod, and for that it had to reach state Complete (v1.PodSucceeded). To trigger the error message afterwards, the pod would have to transition back to phase Pending. Normally I would say that is impossible, but it isn't long since we had #17011.

1088114:Nov 23 05:18:54 ip-172-18-11-102.ec2.internal atomic-openshift-master-api[19400]: I1123 05:18:52.691687   19400 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-6r9td-fkh6n-jenkins/pods/jenkins-1-deploy: (13.601395ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.11.102
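
To make the "impossible" transition concrete, here is a minimal sketch of the rule I have in mind: Succeeded and Failed are terminal, so a pod observed going from Succeeded back to Pending is invalid. The `PodPhase` type below only mirrors the string values of pod.status.phase so the snippet stays self-contained, and `rank` and `validTransition` are hypothetical helpers written for illustration, not code from the kubelet or the deployer controller. The Unknown phase is left out for brevity.

```go
package main

import "fmt"

// PodPhase mirrors the string values of pod.status.phase.
// It is redeclared here only so the sketch has no external imports.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// isTerminal reports whether a phase is final (Succeeded or Failed).
func isTerminal(p PodPhase) bool {
	return p == PodSucceeded || p == PodFailed
}

// rank orders phases: Pending < Running < terminal phases.
// Hypothetical helper for this sketch only.
func rank(p PodPhase) int {
	switch p {
	case PodPending:
		return 0
	case PodRunning:
		return 1
	default:
		return 2
	}
}

// validTransition is a hypothetical check: a pod may keep its phase or
// move forward, but it must never leave a terminal phase or move backward.
func validTransition(from, to PodPhase) bool {
	if from == to {
		return true
	}
	if isTerminal(from) {
		return false
	}
	return rank(to) > rank(from)
}

func main() {
	fmt.Println(validTransition(PodRunning, PodSucceeded)) // forward move
	fmt.Println(validTransition(PodSucceeded, PodPending)) // the case suspected here
}
```

Running this prints `true` for the forward move and `false` for Succeeded back to Pending, which is the transition that would be needed to explain the error above.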

@jim-minter
Contributor Author

@tnozicka I still think there are problems here. See:

https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_image_ecosystem/303/testReport/junit/(root)/Extended/_image_ecosystem__jenkins__Slow__openshift_pipeline_plugin_jenkins_plugin_test_context___jenkins_plugin_test_create_obj_delete_obj__Suite_openshift__2/

05:05:00.227779 19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-hrklq: (80.512948ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:kube-system:replication-controller] 172.18.10.111:48438]

Why did the RC suddenly delete jenkins-1-hrklq?
Why did the DC deploy jenkins-1-jzc85?
Are the repeated DELETE calls by the DC controller against jenkins-1-deploy expected?

04:58:58.403270   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (130.818547ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]
05:05:01.180682   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (138.272181ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]
05:05:01.307466   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (27.342066ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]
05:05:01.438160   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (43.811552ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]
05:05:01.553263   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (42.218211ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]
05:05:01.714058   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (104.786782ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]
05:05:01.941160   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (51.951351ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]
05:05:02.913068   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (4.275087ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]
05:05:14.010227   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (225.396022ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c] 172.18.10.111:49976]
05:05:14.041383   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (156.975435ms) 404 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]
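
For anyone who wants to tally these calls from a larger log, a rough sketch of a parser follows. It assumes the wrap.go line layout shown above (timestamp, PID, source location, verb, path, latency, status code). The regular expression, the `deleteCall` type, and the `parseDelete` and `countByStatus` functions are hypothetical names made up for this example, and the three sample lines are copied from the excerpt above.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// deleteRe matches the access-log layout seen in the excerpt above:
// timestamp, PID, source location, DELETE, path, latency, status code.
var deleteRe = regexp.MustCompile(
	`^(\d{2}:\d{2}:\d{2}\.\d+)\s+\d+ wrap\.go:\d+\] DELETE (\S+): \((\S+)\) (\d{3}) `)

// deleteCall is a hypothetical record for one parsed DELETE request.
type deleteCall struct {
	Time, Path, Latency, Status string
}

// parseDelete extracts the fields of one log line; ok is false when the
// line does not look like a DELETE access-log entry.
func parseDelete(line string) (call deleteCall, ok bool) {
	m := deleteRe.FindStringSubmatch(line)
	if m == nil {
		return deleteCall{}, false
	}
	return deleteCall{Time: m[1], Path: m[2], Latency: m[3], Status: m[4]}, true
}

// countByStatus tallies DELETE calls per HTTP status code,
// silently skipping lines that do not match.
func countByStatus(lines []string) map[string]int {
	counts := map[string]int{}
	for _, l := range lines {
		if c, ok := parseDelete(l); ok {
			counts[c.Status]++
		}
	}
	return counts
}

// sampleLines are three of the log lines quoted above.
var sampleLines = []string{
	`04:58:58.403270   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (130.818547ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]`,
	`05:05:02.913068   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (4.275087ms) 200 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]`,
	`05:05:14.041383   19467 wrap.go:42] DELETE /api/v1/namespaces/extended-test-jenkins-plugin-5zxhf-sxgv4-jenkins/pods/jenkins-1-deploy: (156.975435ms) 404 [[openshift/v1.8.1+0d5291c (linux/amd64) kubernetes/0d5291c/system:serviceaccount:openshift-infra:deployer-controller] 172.18.10.111:48438]`,
}

func main() {
	counts := countByStatus(sampleLines)
	// Sort the status codes so the report order is stable.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("HTTP %s: %d DELETE call(s)\n", k, counts[k])
	}
}
```

Grouping by status code is only one possible cut. The same `deleteCall` records could be grouped by `Path` or bucketed by `Time` to show how tightly the repeated calls cluster around 05:05:01.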

@tnozicka
Contributor

tnozicka commented Dec 5, 2017

The kubelet still seems to have issues: #17595

pod.status.phase can make invalid transitions; if the DC sees a failed deployer pod, it will delete it.

@liggitt added this to the 3.9.0 milestone on Dec 13, 2017
@tnozicka
Contributor

The logs are already gone, but this was likely caused by the kubelet pod state transition issues we track in another issue. Feel free to reopen if you see it again.
