Systemd startup timeouts after upgrade to 7.4 #49593

Closed
PhaedrusTheGreek opened this issue Nov 26, 2019 · 6 comments · Fixed by #49784
Assignees
Labels
:Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts Team:Delivery Meta label for Delivery team

Comments

@PhaedrusTheGreek
Contributor

In a few separate environments, it has been reported that after an upgrade from Elasticsearch 7.3 to 7.4, systemd kills the elasticsearch process before it finishes starting up.

The end of the log looks like this:

[2019-11-25T07:16:46,095][DEBUG][o.e.a.ActionModule ] [node1] Using REST wrapper from plugin org.elasticsearch.xpack.security.Security
[2019-11-25T07:17:22,306][INFO ][o.e.x.m.p.NativeController] [node1] Native controller process has stopped - no new native processes can be started

Trace inspection of the log reveals that the ES node is busy upgrading.

Systemd requires the process to signal successful startup with sd_notify(READY) before TimeoutStartSec elapses, and in ES that signal is not sent until metadata upgrades are complete. It seems that we need to update SystemdPlugin to send EXTEND_TIMEOUT_USEC whenever a startup delay is expected, such as during an upgrade.
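For readers unfamiliar with the notify protocol, here is a minimal sketch of what sending READY and EXTEND_TIMEOUT_USEC involves: the service writes a short datagram to the Unix socket named in the NOTIFY_SOCKET environment variable. This is an illustration in Python, not the SystemdPlugin code (which is Java); the helper names `extend_startup_timeout` and `notify_ready` are hypothetical.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Send one notification datagram to systemd.

    Returns False when NOTIFY_SOCKET is unset, i.e. the process was not
    started by systemd with a notification socket.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode("utf-8"), path)
    return True


def extend_startup_timeout(seconds: int) -> bool:
    """Hypothetical helper: ask systemd for more startup time (value in microseconds)."""
    return sd_notify(f"EXTEND_TIMEOUT_USEC={seconds * 1_000_000}")


def notify_ready() -> bool:
    """Hypothetical helper: tell systemd that startup has finished."""
    return sd_notify("READY=1")
```

A service that expects a slow start would call `extend_startup_timeout(...)` while it is still working and `notify_ready()` once it can serve requests.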

@jasontedor jasontedor self-assigned this Nov 26, 2019
@PhaedrusTheGreek
Contributor Author

Also curious here is why this hasn't been observed prior to 7.4. Has something changed in this area?

@ywelsch ywelsch added the :Delivery/Packaging RPM and deb packaging, tar and zip archives, shell and batch scripts label Dec 2, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Packaging)

@jasontedor
Member

jasontedor commented Dec 2, 2019

> Also curious here is why this hasn't been observed prior to 7.4. Has something changed in this area?

Yes, the sd_notify functionality was not used until 7.4.0: #44673

@ppf2
Member

ppf2 commented Sep 15, 2021

(Please ignore the previous notification; I was reading 7.4 as 7.14 😄.) This was added back in 7.4, though some users may not hit it until later, on nodes that are slow to start :)

@jasontedor
Member

Since 7.5.1/7.6 (#49784), nodes that are slow to start should not present an issue.
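As a rough illustration of how a slow start can be tolerated, the sketch below keeps asking for more time at a fixed interval while startup is running and sends READY=1 only once it finishes. It is a sketch under stated assumptions, not the code from #49784: the function name, the `notify` callback, and the 15 s / 30 s defaults are all hypothetical choices made for the example.

```python
import threading
from typing import Callable


def start_with_extensions(
    startup: Callable[[], None],
    notify: Callable[[str], None],
    interval_s: float = 15.0,   # assumed: how often to ask for more time
    extension_s: float = 30.0,  # assumed: how much extra time each request asks for
) -> None:
    """Run `startup`, extending the startup timeout until it completes."""
    done = threading.Event()

    def extend_loop() -> None:
        # Event.wait returns False on timeout, meaning startup is still running.
        while not done.wait(interval_s):
            notify(f"EXTEND_TIMEOUT_USEC={int(extension_s * 1_000_000)}")

    extender = threading.Thread(target=extend_loop, daemon=True)
    extender.start()
    try:
        startup()
    finally:
        # Stop extending whether startup succeeded or raised.
        done.set()
        extender.join()
    # Reached only when startup returned normally.
    notify("READY=1")
```

Keeping each extension longer than the interval between requests means the next request arrives before the previous extension runs out, so the deadline keeps moving only while the process is alive and still making requests.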

@ppf2
Member

ppf2 commented Sep 15, 2021

Thanks Jason! It looks like we have a user who ran into this on 7.14. I will collect the details and file an issue accordingly for investigation.
