Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase journald rate limiter when running test-end-to-end-docker.sh #13669

Merged
merged 4 commits into from
Jun 7, 2017

Conversation

soltysh
Copy link
Contributor

@soltysh soltysh commented Apr 7, 2017

Fixes #12558 by increasing rate limiter values while running test-end-to-end-docker.sh. Additionally, this re-enables the tests that were disabled due to that error.

@stevekuznetsov another try...
@mfojtik fyi

@soltysh
Copy link
Contributor Author

soltysh commented Apr 11, 2017

I've removed that offending deprecation warning using the recommended/new approach for oc exec.

@soltysh
Copy link
Contributor Author

soltysh commented Apr 13, 2017

Flake #13757

@soltysh soltysh changed the title Increase journald rate limiter when running test-end-to-end-docker.sh [DO_NOT_MERGE] Increase journald rate limiter when running test-end-to-end-docker.sh Apr 14, 2017
@soltysh
Copy link
Contributor Author

soltysh commented Apr 14, 2017

@ncdc I'm baffled by this error. The case is when I increase RateLimitInterval and RateLimitBurst in journald.conf and get to oc rsh dc/docker-registry cat config.yml the end-to-end test will hang waiting for the connection. Looking closer at the instance I've noticed that generally docker will hang after some time. Usually, only restart will help. Any ideas/thoughts what might be causing this? Pointers where should I look for it? I can't reproduce this manually when running oc cluster up and trying oc rsh.

@ncdc
Copy link
Contributor

ncdc commented Apr 14, 2017 via email

@soltysh
Copy link
Contributor Author

soltysh commented Apr 14, 2017

Sure, sure, no rush :)

@ncdc
Copy link
Contributor

ncdc commented Apr 17, 2017

I wonder if we're starting to run into kubernetes/kubernetes#43922

@soltysh
Copy link
Contributor Author

soltysh commented Apr 18, 2017

I'll try to spin up a new env with that fix in, I'm hitting this quite consistently with this PR.

@ncdc
Copy link
Contributor

ncdc commented Apr 18, 2017 via email

@soltysh
Copy link
Contributor Author

soltysh commented Apr 19, 2017

Nope, that PR didn't help with the problem 😢 My last resort is the current rebase, I guess.

@ncdc
Copy link
Contributor

ncdc commented Apr 19, 2017 via email

@soltysh
Copy link
Contributor Author

soltysh commented Apr 19, 2017

Nope, the rebase does not help here, either. Sigh...

@smarterclayton do you have any ideas why oc rsh dc/docker-registry cat config.yml in end-to-end test will hang after increasing RateLimitInterval and RateLimitBurst in journald.conf? I've tried manually changing those values and running oc rsh and it worked perfectly, but for some reason it will fail when part of end-to-end.

@ncdc
Copy link
Contributor

ncdc commented Apr 19, 2017 via email

@smarterclayton
Copy link
Contributor

smarterclayton commented Apr 19, 2017 via email

@soltysh
Copy link
Contributor Author

soltysh commented Apr 20, 2017

The symptoms are as follows, the end-to-end will hang (not reacting to any sort of input), openshift seems to be working correctly. After some time docker daemon will hang (not reacting to any docker commands), and only restarting the daemon helps with it. I've collected some stack dumps and I'll be going through them today, with hope to find something useful.

@ncdc
Copy link
Contributor

ncdc commented Apr 26, 2017

If you can reliably reproduce, make sure you talk to @jwhonce @nalind

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2017
@openshift-bot openshift-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 19, 2017
@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 1, 2017
@smarterclayton
Copy link
Contributor

Getting stable tests is p0.

@soltysh soltysh force-pushed the issue12558 branch 2 times, most recently from b816dd0 to b180270 Compare June 6, 2017 15:35
@soltysh soltysh changed the title [DO_NOT_MERGE] Increase journald rate limiter when running test-end-to-end-docker.sh Increase journald rate limiter when running test-end-to-end-docker.sh Jun 7, 2017
@soltysh
Copy link
Contributor Author

soltysh commented Jun 7, 2017

It looks like the problem previously was that docker does not react well to restarting journald, so I've added restarting docker after restarting journald, as well. Seems to be working, let's give it a try.
/me crosses fingers
[test]

@soltysh
Copy link
Contributor Author

soltysh commented Jun 7, 2017

Fixed package typo in UPSTEAM cherry-pick commit. Waiting for green tests.

@soltysh
Copy link
Contributor Author

soltysh commented Jun 7, 2017

Flake #14434.
[test]

${USE_SUDO:+sudo} systemctl restart systemd-journald.service
# Docker has "some" problems when journald is restarted, so we need to
# restart docker, as well.
${USE_SUDO:+sudo} systemctl restart docker.service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fairly invasive. But I don't have a better place for it.

Also, please ensure that you're actually on a system with systemctl, i.e. macs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed, I've added os::util::ensure::system_binary_exists 'systemctl' checks around these calls.

@smarterclayton
Copy link
Contributor

Removing LGTM, this breaks macs.

@soltysh
Copy link
Contributor Author

soltysh commented Jun 7, 2017

Flke #10773.

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to b34ba02

@smarterclayton
Copy link
Contributor

LGTM

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2003/) (Base Commit: 6473593)

@smarterclayton
Copy link
Contributor

Going to the head of the line

@smarterclayton
Copy link
Contributor

Does not conflict with the PR being merged right now. Moving to the head of the line to fix flakes.

@smarterclayton smarterclayton merged commit 3c221d5 into openshift:master Jun 7, 2017
@stevekuznetsov
Copy link
Contributor

Nice

@soltysh
Copy link
Contributor Author

soltysh commented Jun 8, 2017

img

@soltysh soltysh deleted the issue12558 branch June 8, 2017 05:10
@mfojtik
Copy link
Contributor

mfojtik commented Jun 8, 2017

\o/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants