[CI] DockerTests test080ConfigurePasswordThroughEnvironmentVariableFile failure #53662

Closed
dliappis opened this issue Mar 17, 2020 · 3 comments
Labels
:Delivery/Packaging (RPM and deb packaging, tar and zip archives, shell and batch scripts), Team:Delivery (Meta label for Delivery team), >test-failure (Triaged test failures from CI)

Comments

@dliappis
Contributor

Observed on 7.6 only once so far in:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.6+multijob+packaging-tests-unix-gcp/os=fedora-29-packaging/186/console / https://gradle-enterprise.elastic.co/s/ucotb6hmvhqws

This seems to be a side effect of #51316; the corresponding PR #53437 changed the check for whether Elasticsearch has really started to a more rigorous one: https://github.com/elastic/elasticsearch/pull/53437/files#diff-1dd1960812a4e4d22b8b3cc1d1f07ae0R185

Looking at the build logs I see that ps -ww ax has been invoked several times (as it should be based on this loop) and the status when the check failed shows:

1> [2020-03-17T09:19:40,759][INFO ][o.e.p.u.Shell            ] [TEST-DockerTests.test080ConfigurePasswordThroughEnvironmentVariableFile-seed#[1A6636C0632BB2E]-workerest_thread_info] Ran: [bash, -c, docker logs 65a279eaf1384b4259d69ab0c58e0ceb89f4a9bb98bb4bb2237f68603c3ce21d] exitCode = [0] stdout = [Created elasticsearch keystore in /usr/share/elasticsearch/config] stderr = [Setting ELASTIC_PASSWORD from ELASTIC_PASSWORD_FILE at /run/secrets/password.txt]
11:19:41   2> java.lang.AssertionError: Elasticsearch container did not start successfully.
11:19:41 
11:19:41     ps output:
11:19:41         PID TTY      STAT   TIME COMMAND
11:19:41           1 ?        Rs     0:00 /bin/bash /usr/share/elasticsearch/bin/elasticsearch
11:19:41         222 pts/0    Rs+    0:00 ps -ww ax
11:19:41 
11:19:41     Stdout:
11:19:41     Created elasticsearch keystore in /usr/share/elasticsearch/config
11:19:41 
11:19:41     Stderr:
11:19:41     Setting ELASTIC_PASSWORD from ELASTIC_PASSWORD_FILE at /run/secrets/password.txt

Since stderr shows that the script had almost reached the bottom of bin/elasticsearch-env-from-file:

echo "Setting $VAR_NAME from $VAR_NAME_FILE at ${!VAR_NAME_FILE}" >&2
export "$VAR_NAME"="$(cat ${!VAR_NAME_FILE})"
unset VAR_NAME
# Unset the suffixed environment variable
unset "$VAR_NAME_FILE"
fi
done

I think we simply exhausted the retries (STARTUP_ATTEMPTS_MAX); had there been more retries, the check would have succeeded.
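
For readers unfamiliar with the `_FILE` convention in the script fragment quoted above, here is a minimal, self-contained sketch of the pattern. It is not the actual content of bin/elasticsearch-env-from-file: the function name `load_env_from_file` and its structure are hypothetical, and it only illustrates how bash indirect expansion (`${!NAME}`) resolves the path stored in the suffixed variable.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not the real script): populate VAR from VAR_FILE.
load_env_from_file() {
  local var_name_file="$1"                 # e.g. ELASTIC_PASSWORD_FILE
  local var_name="${var_name_file%_FILE}"  # e.g. ELASTIC_PASSWORD

  # Nothing to do if the suffixed variable is unset or empty.
  if [[ -z "${!var_name_file:-}" ]]; then
    return 0
  fi

  # ${!name} is bash indirect expansion: the value of the variable named by $name.
  local path="${!var_name_file}"
  echo "Setting $var_name from $var_name_file at $path" >&2
  export "$var_name"="$(cat "$path")"

  # Unset the suffixed environment variable once its value has been loaded.
  unset "$var_name_file"
}
```

Calling `load_env_from_file ELASTIC_PASSWORD_FILE` with that variable pointing at a readable file would export `ELASTIC_PASSWORD` and print a message of the same shape as the stderr line in the failing build.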

The timestamps support this: only about 8.5 seconds elapsed between the first and last attempts.

First ps -ww ax attempt:

1> [2020-03-17T09:19:32,110][INFO ][o.e.p.u.D.DockerShell ] [TEST-DockerTests.test080ConfigurePasswordThroughEnvironmentVariableFile-seed#[1A6636C0632BB2E]-workerest_thread_info] Ran: [bash, -c, docker exec --user elasticsearch:root --tty 65a279eaf1384b4259d69ab0c58e0ceb89f4a9bb98bb4bb2237f68603c3ce21d ps -ww ax] exitCode = [0] stdout = [PID TTY STAT TIME COMMAND

Last ps -ww ax attempt:

1> [2020-03-17T09:19:40,634][INFO ][o.e.p.u.D.DockerShell ] [TEST-DockerTests.test080ConfigurePasswordThroughEnvironmentVariableFile-seed#[1A6636C0632BB2E]-workerest_thread_info] Ran: [bash, -c, docker exec --user elasticsearch:root --tty 65a279eaf1384b4259d69ab0c58e0ceb89f4a9bb98bb4bb2237f68603c3ce21d ps -ww ax] exitCode = [0] stdout = [PID TTY STAT TIME COMMAND

I see that the retries have been bumped to 10 in master, whereas 7.6 still has 5 retries, so I think we should do the same for 7.6.
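
To make the effect of the retry limit concrete, here is a minimal sketch of a bounded polling loop in shell. It is not the Java code from the packaging tests; `wait_for`, its arguments, and the fixed delay between attempts are assumptions for illustration only. With a fixed delay, raising the limit from 5 to 10 roughly doubles the window in which a slow container can finish starting.

```bash
# Hypothetical sketch: run a check command up to MAX_ATTEMPTS times.
# Usage: wait_for MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0              # check passed
    fi
    attempt=$((attempt + 1))
    # Sleep only if another attempt remains.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1                  # retries exhausted
}
```

A check that first passes on its seventh invocation fails under `wait_for 5 ...` but succeeds under `wait_for 10 ...`, which matches the reasoning above that the container would likely have been detected given more attempts.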

@dliappis added the :Delivery/Packaging and >test-failure labels Mar 17, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Packaging)

@dliappis
Contributor Author

cc @williamrandolph since you worked on #53437. I believe backporting the increased retries from master to 7.6 should help.

@williamrandolph self-assigned this Mar 17, 2020
@williamrandolph
Contributor

@dliappis Thanks for investigating. I'll open a backport PR.
