
systemd service provider does not recognize services that are starting up as 'running' #9426

Open
bugfood opened this issue Jul 24, 2024 · 6 comments
Labels: accepted (Valid issue that we intend to work on when we have the bandwidth), bug (Something isn't working)

Comments

@bugfood
Contributor

bugfood commented Jul 24, 2024

Describe the Bug

With systemd, a service that is currently running an ExecStartPre command will have a unit state of activating. Puppet does not recognize this as any different from when a service is stopped (with a state of inactive).

If puppet is configured to stop a service via ensure => 'stopped', a service in an activating state will remain untouched, potentially starting up at any time afterward, even though it was supposed to be stopped.

Expected Behavior

For ensure => 'stopped', puppet should ensure that the service is fully stopped, not in an intermediate state.

Steps to Reproduce

# This uses apache2 as an example. Other services _should_ work the same way for testing;
# modify commands as needed.
sudo mkdir -p /etc/systemd/system/apache2.service.d
# A long TimeoutStartSec and a long sleep allow us ample time to test.
printf '[Service]\nTimeoutStartSec=999999\nExecStartPre=/usr/bin/sleep 999999\n' | sudo tee /etc/systemd/system/apache2.service.d/tmp.conf
sudo systemctl daemon-reload
sudo systemctl restart apache2
# The above command will hang; move to another terminal (or background the command) and continue.

systemctl is-active apache2 ; echo $?
# Note the output of "activating" and exit status of 3.

echo "service { 'apache2': ensure => 'stopped' }" > service.pp
puppet apply service.pp --noop
# Note that puppet does not try to stop the service.

sudo systemctl stop apache2
# Note that systemd will cancel the startup.

# Clean up:
sudo rm /etc/systemd/system/apache2.service.d/tmp.conf
sudo rmdir /etc/systemd/system/apache2.service.d
sudo systemctl daemon-reload
sudo systemctl restart apache2

Environment

Any recent-ish puppet version should work, up to current git.
Any system with systemd should work, unless older versions of systemd behave differently.

Example 1:

  • Puppet 7.14.0 (packaged by puppetlabs)
  • AlmaLinux 8.7

Example 2:

  • Puppet 8.4.0 (packaged by Debian)
  • Debian Sid

Example 3:

  • Puppet git 82ad86e (run directly from git dir)
  • Debian Sid

Additional Context

I think the cause of this is the following.

The service provider calls statuscmd:

if @resource[:status] or statuscmd

The systemd service provider defines statuscmd:

[command(:systemctl), "is-active", '--', @resource[:name]]

The statuscmd runs, e.g., systemctl is-active -- apache2. When a service is in the activating state, the exit status is non-zero (3, as noted above), and the service provider considers any non-zero exit status to mean stopped.
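To make the problem concrete, here is a minimal Ruby sketch of the exit-status rule described above. It assumes that `systemctl is-active` exits 3 for activating (as observed in the steps to reproduce) and, as an additional assumption not stated in this issue, also 3 for inactive. `status_from_exit` is a hypothetical name, not Puppet's code.

```ruby
# Exit statuses of `systemctl is-active` per state. "activating" => 3 comes
# from this issue; "inactive" => 3 is an assumption (systemd's "not running").
IS_ACTIVE_EXIT = { 'active' => 0, 'activating' => 3, 'inactive' => 3 }.freeze

# Hypothetical helper mirroring the rule described above:
# exit status 0 means running, anything else means stopped.
def status_from_exit(exitstatus)
  exitstatus.zero? ? :running : :stopped
end

IS_ACTIVE_EXIT.each do |state, code|
  puts format('%-10s exit=%d -> %s', state, code, status_from_exit(code))
end
```

Both activating and inactive collapse to `:stopped`, so the exit status alone cannot tell them apart.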

The list of possible states appears to be defined here:
https://github.com/systemd/systemd/blob/11d5e2b5fbf9f6bfa5763fd45b56829ad4f0777f/src/basic/unit-def.c#L107

This could probably be fixed by defining a more comprehensive status function for the systemd provider.
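One possible shape for such a status function, sketched in Ruby under assumptions: it classifies the state word that `systemctl is-active` prints on stdout instead of relying on the exit status. The state names are our reading of the systemd table linked above (`refreshing` and `maintenance` may not exist in older systemd releases), and `classify_active_state` is a hypothetical name, not part of Puppet.

```ruby
# Hypothetical sketch, not Puppet's implementation. State names are assumed
# from systemd's unit-def.c; verify them against the systemd version in use.
# "refreshing" is assumed to count as running, like "reloading".
RUNNING_STATES   = %w[active reloading refreshing].freeze
STOPPED_STATES   = %w[inactive failed].freeze
# Intermediate states that the exit status of `systemctl is-active` hides.
TRANSIENT_STATES = %w[activating deactivating maintenance].freeze

# Map the word printed by `systemctl is-active -- NAME` to a coarse status.
def classify_active_state(output)
  state = output.strip
  if RUNNING_STATES.include?(state)
    :running
  elsif STOPPED_STATES.include?(state)
    :stopped
  elsif TRANSIENT_STATES.include?(state)
    :transient
  else
    :unknown # e.g. a state introduced by a newer systemd
  end
end

p classify_active_state("activating\n")
p classify_active_state("inactive\n")
```

The important design point is that a provider built this way would read stdout even when the exit status is non-zero, since that is exactly the case where activating and inactive differ.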

There could be some tricky ramifications, though. For example:

  • A state of activating should probably be considered the same as "stopped" when a service is ensure => 'running'. If puppet attempts to start a service that is activating, this will be idempotent and return once the service has finished starting up (or fails).
  • When a service resource receives a refresh, a state of activating should be considered the same as "running". This is necessary in order for puppet to restart (or stop/start) a service, e.g. in order to fix whatever condition caused the service to get stuck in activating state.
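The two rules above can be expressed as a small decision function. This is a hypothetical Ruby sketch: `planned_action`, its arguments, and the returned symbols are made up for illustration, and the state groupings are assumptions. It encodes only the behaviour proposed in this issue, not what Puppet currently does.

```ruby
# Hypothetical sketch of the proposed behaviour; not Puppet's API.
# state:        word printed by `systemctl is-active` (e.g. "activating")
# ensure_value: :running or :stopped
# refresh:      true when the resource received a refresh event
def planned_action(state, ensure_value, refresh: false)
  running  = %w[active reloading].include?(state)
  stopped  = %w[inactive failed].include?(state)
  starting = state == 'activating'

  case ensure_value
  when :stopped
    # Anything not fully stopped (including "activating") must be stopped,
    # otherwise a pending ExecStartPre could bring the service up later.
    stopped ? nil : :stop
  when :running
    if refresh && (running || starting)
      # For a refresh, "activating" counts as running, so a stuck
      # startup gets restarted.
      :restart
    elsif running
      nil
    else
      # For ensure => running, "activating" counts as stopped: starting it
      # waits for the in-progress startup to finish (or fail).
      :start
    end
  else
    raise ArgumentError, "unsupported ensure value: #{ensure_value.inspect}"
  end
end
```

Note that `activating` deliberately gives different answers depending on the request: it is "not yet running" for ensure => 'running', but "running enough to restart" for a refresh.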

Thanks,
Corey

@bugfood added the bug label Jul 24, 2024
@AriaXLi
Contributor

AriaXLi commented Aug 6, 2024

Hi Corey, thanks for opening this issue. For whoever picks this up, here is a similar (possibly duplicate) Jira ticket for this issue: https://perforce.atlassian.net/browse/PUP-4993

@bugfood
Contributor Author

bugfood commented Aug 7, 2024

Thanks. I don't know if it matters, but I don't seem to have access to that jira issue.

I have a login to atlassian and I can authenticate ok, but I get an error:

<my-email-address> doesn't have access to Jira on perforce.atlassian.net.

-Corey

@bugfood
Contributor Author

bugfood commented Aug 7, 2024

I had missed a command in the steps to reproduce; fixed now.

-Corey

@joshcooper
Contributor

@bugfood The link above is for internal folks. It's visible publicly read-only as https://puppet.atlassian.net/browse/PUP-4993

@joshcooper
Contributor

I'm hoping this could be fixed by adding a third value to the ensure parameter for services and then modifying the various providers to report on the transient states. Part of the challenge is making sure all providers report on these other states, taking into account that systemd, Windows, etc. all have their own service states.

@joshcooper added the accepted label Aug 27, 2024
@bugfood
Contributor Author

bugfood commented Aug 27, 2024

> @bugfood The link above is for internal folks. It's visible publicly read-only as https://puppet.atlassian.net/browse/PUP-4993

Aha, thanks.

I agree with your comment in that ticket:

> It's important that puppet wait for the service to reach the 'activated' state before moving onto the next resource so we can't assume 'activating' is the same as 'running'.

Another consideration is that if puppet considers an 'activating' service to be already 'running', then the startup of that service won't trigger any refreshes within puppet (e.g. for an exec that subscribes to the service). Whether this is good or bad may be debatable; my preference would be for the refreshes to indeed trigger, since I want puppet to do its best to take over the configuration of the client system from whatever state the client system was in previously.

For systemd, at least, I covered that in my proposal:

> A state of activating should probably be considered the same as "stopped" when a service is ensure => 'running'. If puppet attempts to start a service that is activating, this will be idempotent and return once the service has finished starting up (or fails).

This would not exactly fix PUP-4993 as stated, but I think puppet would still do the best thing possible under the circumstances.

I don't know how/if that would apply to other service providers; I was only considering the systemd provider to be in-scope for this.

> I'm hoping this could be fixed by adding a third value to the ensure parameter [..]

Do you mean this ensure?
https://www.puppet.com/docs/puppet/8/types/service.html#service-attribute-ensure

I'm not following what you mean--I think ensure is for the target end state, and it's up to the provider to manage the intermediate states. I understand it's tricky, though.

Thanks,
Corey
