Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYSTEMD: replace 'sssd_check_socket_activated_responders' #7607

Closed

Conversation

alexey-tikhonov
Copy link
Member

@alexey-tikhonov alexey-tikhonov commented Sep 18, 2024

with a schell script.

All sockets already have

After=sssd.service
BindsTo=sssd.service

-- this ensures SSSD was started and running before socket
activation.

New 'ExecStartPre' condition checks if a responder with the
same name is running and, if so, if it runs in the same
mnt namespace. The latter is to ignore processes run in
a container on the same host.

Resolves: #4333
Resolves: #5013

@alexey-tikhonov alexey-tikhonov added the no-backport This should go to target branch only. label Sep 18, 2024
@alexey-tikhonov alexey-tikhonov force-pushed the check-socket-activated branch 2 times, most recently from 024c9b8 to 679d308 Compare September 19, 2024 12:43
@alexey-tikhonov
Copy link
Member Author

# cat /etc/sssd/sssd.conf | grep service
services = pam, nss

# ps aux | grep sssd
sssd        3405  0.0  0.4 303712 19192 ?        Ss   14:19   0:00 /usr/sbin/sssd -i --logger=files
sssd        3411  0.0  0.5 305512 21056 ?        S    14:19   0:00 /usr/libexec/sssd/sssd_be --domain files --logger=files
sssd        3413  0.0  0.5 304452 21120 ?        S    14:19   0:00 /usr/libexec/sssd/sssd_pam --logger=files
sssd        3414  0.1  1.3 337976 55292 ?        S    14:19   0:00 /usr/libexec/sssd/sssd_nss --logger=files


# systemctl enable --now sssd-nss.socket 
Job failed. See "journalctl -xe" for details.

# systemctl status sssd-nss.socket 
× sssd-nss.socket - SSSD NSS Service responder socket
    ...
    Process: 5184 ExecStartPre=/bin/sh -c ! pidof -q sssd_nss (code=exited, status=1/FAILURE)


# systemctl enable --now sssd-sudo.socket

# systemctl status sssd-sudo.socket 
● sssd-sudo.socket - SSSD Sudo Service responder socket
     Loaded: loaded (/usr/lib/systemd/system/sssd-sudo.socket; enabled; preset: disabled)
     Active: active (listening) since Thu 2024-09-19 14:26:00 CEST; 2min 41s ago

@alexey-tikhonov
Copy link
Member Author

@sumit-bose noted, that this approach might fail in case multiple SSSD instances are running on the same host (for example, in containers).

@alexey-tikhonov
Copy link
Member Author

@sumit-bose noted, that this approach might fail in case multiple SSSD instances are running on the same host (for example, in containers).

The solution could be to compare /proc/self/ns/mnt vs /proc/$pidof(sssd_...)/ns/mnt -- if they differ then found process is in different mount namespace (most probably in a container) and hence should be ignored.

I'm yet to come up with not too-overly-ugly command line to convey this.

@alexey-tikhonov
Copy link
Member Author

I'm yet to come up with not too-overly-ugly command line to convey this.

It's not too trivial...

! (pidof -q sssd_nss && 
     pidof sssd_nss | xargs -n 1 ps -o mntns -p | grep -q `lsns -n -t mnt -o NS -p \\$\\$`)

First part stays the same - pidof -q sssd_nss - it merely checks if a process with this name is running.

Second part is to check if any of found PIDs have the same mount namespace id as a current:

  • pidof sssd_nss | xargs -n 1 ps -o mntns -p prints mnt ns IDs of every found pid:
$ pidof bash | xargs -n 1 ps -o mntns -p
     MNTNS
4026531841
     MNTNS
4026531841
     MNTNS
4026531841
     MNTNS
4026531841
  • lsns -n -t mnt -o NS -p \\$\\$ get mnt ns ID of current process:
# lsns -n -t mnt -o NS -p $$
4026531841

Simplification suggestions are welcome...

@alexey-tikhonov
Copy link
Member Author

# ps aux | grep sssd
sssd        3405  0.0  0.4 303712 19192 ?        Ss   Sep19   0:00 /usr/sbin/sssd -i --logger=files
sssd        3411  0.0  0.5 305720 21268 ?        S    Sep19   0:01 /usr/libexec/sssd/sssd_be --domain files --logger=files
sssd        3413  0.0  0.5 304452 21120 ?        S    Sep19   0:00 /usr/libexec/sssd/sssd_pam --logger=files
sssd        3414  0.0  1.3 338108 55460 ?        S    Sep19   0:00 /usr/libexec/sssd/sssd_nss --logger=files


# systemctl start sssd-nss.socket 
Job failed. See "journalctl -xe" for details.

systemctl status sssd-nss.socket 
× sssd-nss.socket - SSSD NSS Service responder socket
     Loaded: loaded (/usr/lib/systemd/system/sssd-nss.socket; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Fri 2024-09-20 20:57:47 CEST; 9s ago
...
    Process: 16496 ExecStartPre=/bin/sh -c ! (pidof -q sssd_nss && pidof sssd_nss | xargs -n 1 ps -o mntns -p | grep -q `lsns -n -t mnt -o NS -p \$\$`) (code=exited, status=1/FAILURE)


# systemctl start sssd-sudo.socket

# systemctl status sssd-sudo.socket 
● sssd-sudo.socket - SSSD Sudo Service responder socket
     Loaded: loaded (/usr/lib/systemd/system/sssd-sudo.socket; enabled; preset: disabled)
     Active: active (listening) since Fri 2024-09-20 20:58:19 CEST; 5s ago
...
    Process: 16531 ExecStartPre=/bin/sh -c ! (pidof -q sssd_sudo && pidof sssd_sudo | xargs -n 1 ps -o mntns -p | grep -q `lsns -n -t mnt -o NS -p \$\$`) (code=exited, status=0/SUCCESS)

@alexey-tikhonov
Copy link
Member Author

And to take the look under the hood, one can add strace to ExecStartPre and see what happens:

# grep execve strace.log.155*
strace.log.15575:execve("/bin/sh", ["/bin/sh", "-c", "! (pidof -q sssd_nss && pidof sssd_nss | xargs -n 1 ps -o mntns -p | grep -q `lsns -n -t mnt -o NS -p \\$\\$`)"], 0x7ffe75fc22b8 /* 9 vars */) = 0
strace.log.15577:execve("/usr/sbin/pidof", ["pidof", "-q", "sssd_nss"], 0x562bae16b620 /* 12 vars */) = 0
strace.log.15578:execve("/usr/sbin/pidof", ["pidof", "sssd_nss"], 0x562bae16b620 /* 12 vars */) = 0
strace.log.15579:execve("/usr/bin/xargs", ["xargs", "-n", "1", "ps", "-o", "mntns", "-p"], 0x562bae16b620 /* 12 vars */) = 0
strace.log.15580:execve("/usr/bin/grep", ["grep", "-q", "4026531841"], 0x562bae16b620 /* 12 vars */) = 0
strace.log.15581:execve("/usr/bin/lsns", ["lsns", "-n", "-t", "mnt", "-o", "NS", "-p", "15575"], 0x562bae16b620 /* 12 vars */) = 0
strace.log.15582:execve("/usr/local/sbin/ps", ["ps", "-o", "mntns", "-p", "3414"], 0x7ffc70384ca8 /* 12 vars */) = -1 ENOENT (No such file or directory)
strace.log.15582:execve("/usr/local/bin/ps", ["ps", "-o", "mntns", "-p", "3414"], 0x7ffc70384ca8 /* 12 vars */) = -1 ENOENT (No such file or directory)
strace.log.15582:execve("/usr/sbin/ps", ["ps", "-o", "mntns", "-p", "3414"], 0x7ffc70384ca8 /* 12 vars */) = -1 ENOENT (No such file or directory)
strace.log.15582:execve("/usr/bin/ps", ["ps", "-o", "mntns", "-p", "3414"], 0x7ffc70384ca8 /* 12 vars */) = 0

Tricky part is escaping of \\$\\$ in lsns -n -t mnt -o NS -p \\$\\$

This is because of systemd preprocessing of a string even within " "
Honestly, I do not understand how does it happen that it is escaped in a very first execve() but correctly expanded when lsns is executed.
I got this working via trial and error...

@alexey-tikhonov
Copy link
Member Author

Well, -n 1 can be removed from xargs.

with a schell script.

All sockets already have
```
After=sssd.service
BindsTo=sssd.service
```
 - this ensures SSSD was started and running before socket
activation.

New 'ExecStartPre' condition checks if a responder with the
same name is running and, if so, if it runs in the same
mnt namespace. The latter is to ignore processes run in
a container on the same host.

Resolves: SSSD#4333
Resolves: SSSD#5013
@alexey-tikhonov
Copy link
Member Author

Well, -n 1 can be removed from xargs.

Done.

@pbrezina
Copy link
Member

pbrezina commented Sep 24, 2024

Maybe it would be possible to look at process cgroup? This would be way easier and more readable. But I don't know if this is documented anywhere.

# sssd on host
[pbrezina ~/workspace/sssd (citest)]$ sudo cat /proc/15229/cgroup
0::/system.slice/sssd.service

# sssd in container
[pbrezina ~/workspace/sssd (citest)]$ sudo cat /proc/13079/cgroup
0::/machine.slice/libpod-6aac6f46cac4e8166248f63e107040c09617ed1ea2b88b21a07dec907471307b.scope/container/system.slice/sssd.service

Another solution, probably good for us in long term would be to stop managing our processes on systemd systems and just run systemd start sssd-nss.service from the monitor if systemd is used to start sssd.service and nss is in services. Or finally fully switch to systemd, but that's out of scope so far, I guess.

@pbrezina
Copy link
Member

Your code works nicely, but it just fails silently if the condition is not met. The original code produced a nice error message. Why do you want to remove it anyway?

@alexey-tikhonov
Copy link
Member Author

Your code works nicely, but it just fails silently if the condition is not met. The original code produced a nice error message.

I can add a comment before this ExecStartPre, this + systemctl status should be telling enough, imo.

Why do you want to remove it anyway?

There are issues (like #5013)
To fix those issues tool needs to check confdb.ldb instead of sssd.conf (like ad85fc3 but dropping reading sssd.conf completely and using new helpers to read existing confdb without creating it)
But why maintaining a dedicated C code tool if it can be replaced with a shell script?

@alexey-tikhonov
Copy link
Member Author

Maybe it would be possible to look at process cgroup?

Should be possible.
But we still need to iterate over a list returned by pidof.
So this would replace ps -o mntns -p and grep -q 'lsns -n -t mnt -o NS -p \\$\\$') with something like test 'cat /proc/$/cgroups' = 'cat /proc/self/cgroups'

Is this what you meant?

@pbrezina
Copy link
Member

Your code works nicely, but it just fails silently if the condition is not met. The original code produced a nice error message.

I can add a comment before this ExecStartPre, this + systemctl status should be telling enough, imo.

Not really.

[root /home/pbrezina]# systemctl start sssd-nss.socket
Job failed. See "journalctl -xe" for details.

But journal does not have any information on why it failed:

Sep 26 11:11:07 pbrezina.my systemd[1]: Starting sssd-nss.socket - SSSD NSS Service responder socket...
░░ Subject: A start job for unit sssd-nss.socket has begun execution
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ A start job for unit sssd-nss.socket has begun execution.
░░ 
░░ The job identifier is 5024.
Sep 26 11:11:07 pbrezina.my systemd[1]: sssd-nss.socket: Control process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ An ExecStartPre= process belonging to unit sssd-nss.socket has exited.
░░ 
░░ The process' exit code is 'exited' and its exit status is 1.
Sep 26 11:11:07 pbrezina.my systemd[1]: sssd-nss.socket: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ The unit sssd-nss.socket has entered the 'failed' state with result 'exit-code'.
Sep 26 11:11:07 pbrezina.my systemd[1]: Failed to listen on sssd-nss.socket - SSSD NSS Service responder socket.
░░ Subject: A start job for unit sssd-nss.socket has failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ A start job for unit sssd-nss.socket has finished with a failure.
░░ 
░░ The job identifier is 5024 and the job result is failed.

So you really have to call systemctl status to get any information, but who's going to understand what the command does?

[root /home/pbrezina]# systemctl status sssd-nss.socket
× sssd-nss.socket - SSSD NSS Service responder socket
     Loaded: loaded (/usr/lib/systemd/system/sssd-nss.socket; disabled; preset: disabled)
     Active: failed (Result: exit-code) since Thu 2024-09-26 11:11:07 CEST; 8s ago
   Triggers: ● sssd-nss.service
       Docs: man:sssd.conf(5)
     Listen: /var/lib/sss/pipes/nss (Stream)
    Process: 17482 ExecStartPre=/bin/sh -c ! (pidof -q sssd_nss && pidof sssd_nss | xargs ps -o mntns -p | grep -q `lsns -n -t mnt -o NS -p \$\$`) (code=exited, status=1/FAILURE)
        CPU: 84ms

Sep 26 11:11:07 pbrezina.my systemd[1]: Starting sssd-nss.socket - SSSD NSS Service responder socket...
Sep 26 11:11:07 pbrezina.my systemd[1]: sssd-nss.socket: Control process exited, code=exited, status=1/FAILURE
Sep 26 11:11:07 pbrezina.my systemd[1]: sssd-nss.socket: Failed with result 'exit-code'.
Sep 26 11:11:07 pbrezina.my systemd[1]: Failed to listen on sssd-nss.socket - SSSD NSS Service responder socket.

Any comment won't be visible unless one calls systemctl cat sssd-nss.socket, but I doubt that anyone will call that.

Maybe we can put this in a script file and print message to syslog?

@pbrezina
Copy link
Member

Maybe it would be possible to look at process cgroup?

Should be possible. But we still need to iterate over a list returned by pidof. So this would replace ps -o mntns -p and grep -q 'lsns -n -t mnt -o NS -p \\$\\$') with something like test 'cat /proc/$/cgroups' = 'cat /proc/self/cgroups'

Is this what you meant?

I meant more something like grep machine.slice /proc/$/cgroup

Either way, you need to add findutils (xargs), procps-ng (ps, pidof) grep (grep) and util-linux (lsns) deps to spec file.

It might be also safer to use abs path instead of just the binary name.

@alexey-tikhonov
Copy link
Member Author

I meant more something like grep machine.slice /proc/$/cgroup

Maybe 'system.slice'? And is it going to be the same if everything already happens in the container?

@alexey-tikhonov
Copy link
Member Author

alexey-tikhonov commented Sep 27, 2024

Taking into account the need to:

  • add a bunch of deps
  • package and ship a new shell script (instead of putting it into service file)

I'm no more sure that is so good idea. Maybe fixing existing C-program would be better.

@pbrezina, @sumit-bose, @aplopez,
what would you say?

@sumit-bose
Copy link
Contributor

Taking into account the need to:

* add a bunch of deps

* package and ship a new shell script (instead of putting it into service file)

I'm no more sure that is so good idea. Maybe fixing existing C-program would be better.

@pbrezina, @sumit-bose, @aplopez, what would you say?

Hi,

I agree that improving the existing program would be better.

bye,
Sumit

@aplopez
Copy link
Contributor

aplopez commented Oct 1, 2024

Taking into account the need to:

* add a bunch of deps

* package and ship a new shell script (instead of putting it into service file)

I'm no more sure that is so good idea. Maybe fixing existing C-program would be better.

@pbrezina, @sumit-bose, @aplopez, what would you say?

I agree in that improving the C program would be better. However, in my ignorance of the subject, I don't see how this could be achieved. If the program is launched by systemd and it detects that there is another instance running, what will it do? If it stops, systemd will consider it failed and relaunch it a number of times, and it will stop every time.

Do you have in mind a different solution?

@alexey-tikhonov
Copy link
Member Author

If the program is launched by systemd and it detects that there is another instance running, what will it do? If it stops, systemd will consider it failed and relaunch it a number of times, and it will stop every time.

That's not an issue. If ExecStartPre exits with error code then main unit (ExecStart) isn't started.

@alexey-tikhonov
Copy link
Member Author

Ok, it's a pity, but I'll re-do this.

@aplopez
Copy link
Contributor

aplopez commented Oct 1, 2024

If the program is launched by systemd and it detects that there is another instance running, what will it do? If it stops, systemd will consider it failed and relaunch it a number of times, and it will stop every time.

That's not an issue. If ExecStartPre exits with error code then main unit (ExecStart) isn't started.

So the C-program you want to fix is sssd_check_socket_activated_responders? I understood you wanted to fix the daemons to also check for other instances and get rid if sssd_check_socket_activated_responders.

I agree with this proposition.

@alexey-tikhonov
Copy link
Member Author

To fix those issues tool needs to check confdb.ldb instead of sssd.conf

Well, this doesn't work either: monitor's add_implicit_services() doesn't update config.ldb...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no-backport This should go to target branch only.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Duplicate sssd_pac process when using IPA backend Avoid running two instances of the same service
4 participants