Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NOTIFYCMD does not work #1760

Closed
man55 opened this issue Jan 3, 2023 · 13 comments
Closed

NOTIFYCMD does not work #1760

man55 opened this issue Jan 3, 2023 · 13 comments
Labels
documentation impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) upsmon

Comments

@man55
Copy link

man55 commented Jan 3, 2023

A new clean install of version 2.8.0 and notifications do not work and nothing is written to the syslog.
I have an old installation of version 2.7.4 (installed with apt-get) with absolutely identical configs and everything works as expected.

syslog:

Jan  3 16:15:10 ups usbhid-ups[1202]: libusb_get_interrupt: could not claim interface 0: Device or resource busy
Jan  3 16:15:11 ups kernel: [  174.935387] usb 1-1.1.3: USB disconnect, device number 4
Jan  3 16:15:12 ups usbhid-ups[1202]: libusb_get_interrupt: error submitting URB: No such device
Jan  3 16:15:14 ups nut-server[1207]: Data for UPS [internet] is stale - check driver
Jan  3 16:15:14 ups upsd[1207]: Data for UPS [internet] is stale - check driver
Jan  3 16:17:01 ups CRON[1624]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan  3 16:19:23 ups kernel: [  427.132591] usb 1-1.1.3: new low-speed USB device number 6 using dwc_otg
Jan  3 16:19:23 ups kernel: [  427.313819] usb 1-1.1.3: New USB device found, idVendor=051d, idProduct=0002, bcdDevice= 0.06
Jan  3 16:19:23 ups kernel: [  427.313860] usb 1-1.1.3: New USB device strings: Mfr=3, Product=1, SerialNumber=2
Jan  3 16:19:23 ups kernel: [  427.313880] usb 1-1.1.3: Product: Smart-UPS 2200 FW:654.19.I USB FW:7.4
Jan  3 16:19:23 ups kernel: [  427.313898] usb 1-1.1.3: Manufacturer: American Power Conversion
Jan  3 16:19:23 ups kernel: [  427.313914] usb 1-1.1.3: SerialNumber: AS1250244567
Jan  3 16:19:23 ups kernel: [  427.539981] hid-generic 0003:051D:0002.0002: hiddev96,hidraw0: USB HID v1.10 Device [American Power Conversion Smart-UPS 2200 FW:654.19.I USB FW:7.4] on usb-3f980000.usb-1.1.3/input0
Jan  3 16:19:23 ups mtp-probe: checking bus 1, device 6: "/sys/devices/platform/soc/3f980000.usb/usb1/1-1/1-1.1/1-1.1.3"
Jan  3 16:19:23 ups mtp-probe: bus: 1, device: 6 was not an MTP device
Jan  3 16:19:23 ups mtp-probe: checking bus 1, device 6: "/sys/devices/platform/soc/3f980000.usb/usb1/1-1/1-1.1/1-1.1.3"
Jan  3 16:19:23 ups mtp-probe: bus: 1, device: 6 was not an MTP device
Jan  3 16:19:26 ups nut-server[1207]: UPS [internet] data is no longer stale
Jan  3 16:19:26 ups upsd[1207]: UPS [internet] data is no longer stale

I tried both through a custom script and through upssched, checked script privileges (nut:nut), executable, but nothing.
#NOTIFYCMD /etc/nut/notifycmd.sh
NOTIFYCMD /usr/sbin/upssched

@jimklimov
Copy link
Member

Can not confirm (works for me) :\

  • Here's the script I've just put together for the test:
# ls -la /usr/local/ups/bin/notifyme
-rwxr-xr-x 1 root root 127 Jan  3 17:03 /usr/local/ups/bin/notifyme

# cat /usr/local/ups/bin/notifyme
#!/bin/sh

printf '%s\t[%s]\t%s:\t%s\t(%s tokens)\n' "`date -u`" "`id`" "$NOTIFYTYPE" "$*" "$#" >> "/tmp/notifyme-`id -u`.log"
  • Applied to upsmon.conf (added the NOTIFYCMD that was missing before, and the +EXEC in NOTIFYFLAGs):
# cat /etc/nut/upsmon.conf
MONITOR eco650 1 upsmon "myPass" primary
MINSUPPLIES 1

SHUTDOWNCMD "/sbin/shutdown -h +0"

POLLFREQ 5
POLLFREQALERT 5
HOSTSYNC 15
DEADTIME 15

POWERDOWNFLAG /etc/killpower

NOTIFYMSG ONLINE      "UPS %s on line power"
NOTIFYMSG ONBATT      "UPS %s on battery"
NOTIFYMSG LOWBATT     "UPS %s battery is low"
NOTIFYMSG FSD         "UPS %s: forced shutdown in progress"
NOTIFYMSG COMMOK      "Communications with UPS %s established"
NOTIFYMSG COMMBAD     "Communications with UPS %s lost"
NOTIFYMSG SHUTDOWN    "Auto logout and shutdown proceeding"
NOTIFYMSG REPLBATT    "UPS %s battery needs to be replaced"
NOTIFYMSG NOCOMM      "UPS %s is unavailable"
NOTIFYMSG NOPARENT    "upsmon parent process died - shutdown impossible"

NOTIFYFLAG ONLINE     SYSLOG+WALL+EXEC
NOTIFYFLAG ONBATT     SYSLOG+WALL+EXEC
NOTIFYFLAG LOWBATT    SYSLOG+WALL+EXEC
NOTIFYFLAG FSD        SYSLOG+WALL+EXEC
NOTIFYFLAG COMMOK     SYSLOG+WALL+EXEC
NOTIFYFLAG COMMBAD    SYSLOG+WALL+EXEC
NOTIFYFLAG SHUTDOWN   SYSLOG+WALL+EXEC
NOTIFYFLAG REPLBATT   SYSLOG+WALL+EXEC
NOTIFYFLAG NOCOMM     SYSLOG+WALL+EXEC
NOTIFYFLAG NOPARENT   SYSLOG+WALL+EXEC

NOTIFYCMD "/usr/local/ups/bin/notifyme"

RBWARNTIME 43200
NOCOMMWARNTIME 300
FINALDELAY 5
  • Restarted (might reload probably) the upsmon process:
# systemctl restart nut-monitor
  • Restarted a NUT driver to trigger comms events (journalctl output for NUT service units rolling to console):
# systemctl restart nut-driver@eco650
Jan 03 17:04:01 pve systemd[1]: Stopping Network UPS Tools - device driver for NUT device 'eco650'...
Jan 03 17:04:02 pve usbhid-ups[620558]: Signal 15: exiting
Jan 03 17:04:02 pve nut-driver@eco650[624441]: fopen /var/state/ups/usbhid-ups-eco650.pid: No such file or directory
Jan 03 17:04:02 pve nut-driver@eco650[624441]: Network UPS Tools - UPS driver controller 2.8.0-Windows-243-gfe18dd77d
Jan 03 17:04:02 pve systemd[1]: [email protected]: Succeeded.
Jan 03 17:04:02 pve systemd[1]: Stopped Network UPS Tools - device driver for NUT device 'eco650'.
Jan 03 17:04:02 pve systemd[1]: Starting Network UPS Tools - device driver for NUT device 'eco650'...

Jan 03 17:04:06 pve nut-driver@eco650[624544]: Using subdriver: MGE HID 1.46
Jan 03 17:04:06 pve nut-driver@eco650[624544]: Network UPS Tools - Generic HID driver 0.49 (2.8.0-Windows-243-gfe18dd77d)
Jan 03 17:04:06 pve nut-driver@eco650[624544]: USB communication driver (libusb 1.0) 0.43

Jan 03 17:04:08 pve nut-driver@eco650[624543]: Network UPS Tools - UPS driver controller 2.8.0-Windows-243-gfe18dd77d
Jan 03 17:04:08 pve nut-server[2055866]: Connected to UPS [eco650]: usbhid-ups-eco650
Jan 03 17:04:08 pve upsd[2055866]: Connected to UPS [eco650]: usbhid-ups-eco650
Jan 03 17:04:08 pve usbhid-ups[624651]: Startup successful
Jan 03 17:04:08 pve systemd[1]: Started Network UPS Tools - device driver for NUT device 'eco650'.
Jan 03 17:04:09 pve usbhid-ups[624651]: sock_connect: enabling asynchronous mode (auto)
  • Log file appeared, and is owned by nut:nut runtime account:
# ls -la /tmp/notify*
-rw-r--r-- 1 nut nut 258 Jan  3 17:04 /tmp/notifyme-77.log

# cat /tmp/notifyme-77.log
Tue Jan  3 16:04:03 UTC 2023    [uid=77(nut) gid=77(nut) groups=77(nut)]        COMMBAD:        Communications with UPS eco650 lost     (1 tokens)
Tue Jan  3 16:04:13 UTC 2023    [uid=77(nut) gid=77(nut) groups=77(nut)]        COMMOK: Communications with UPS eco650 established      (1 tokens)

@jimklimov
Copy link
Member

jimklimov commented Jan 3, 2023

Updated previous test with DEBUG_MIN 6 in upsmon.conf, and see it reporting the notifications when driver changes state.

In combined log stream on console it is mixed like this (note each log is buffered by systemd or OS or whoever, so lines from different streams do come out of order a bit):

# systemctl restart nut-driver@eco650
Jan 03 17:42:39 pve nut-monitor[708211]:   30.003522        [D2] pollups: eco650
Jan 03 17:42:39 pve nut-monitor[708211]:   30.003543        [D3] get_var: eco650 / status

### HERE driver is lost:
Jan 03 17:42:39 pve nut-monitor[708211]:   30.003621        Poll UPS [eco650] failed - Driver not connected
Jan 03 17:42:39 pve nut-monitor[708211]:   30.003627        [D2] do_notify: ntype 0x0005 (COMMBAD)

Jan 03 17:42:39 pve systemd[1]: Stopping Network UPS Tools - device driver for NUT device 'eco650'...

Jan 03 17:42:39 pve nut-monitor[708211]:   30.003644        Communications with UPS eco650 lost

Jan 03 17:42:39 pve usbhid-ups[708176]: Signal 15: exiting

Jan 03 17:42:39 pve nut-monitor[708211]:   30.003750        [D3] Current power value: 1
Jan 03 17:42:39 pve nut-monitor[708211]:   30.003762        [D3] Minimum power value: 1
Jan 03 17:42:39 pve nut-monitor[708748]: Network UPS Tools upsmon 2.8.0-Windows-243-gfe18dd77d

Jan 03 17:42:40 pve nut-driver@eco650[708747]: fopen /var/state/ups/usbhid-ups-eco650.pid: No such file or directory
Jan 03 17:42:40 pve nut-driver@eco650[708747]: Network UPS Tools - UPS driver controller 2.8.0-Windows-243-gfe18dd77d
Jan 03 17:42:40 pve systemd[1]: [email protected]: Succeeded.
Jan 03 17:42:40 pve systemd[1]: Stopped Network UPS Tools - device driver for NUT device 'eco650'.
Jan 03 17:42:40 pve systemd[1]: Starting Network UPS Tools - device driver for NUT device 'eco650'...

Jan 03 17:42:44 pve nut-driver@eco650[708784]: Using subdriver: MGE HID 1.46
Jan 03 17:42:44 pve nut-driver@eco650[708784]: Network UPS Tools - Generic HID driver 0.49 (2.8.0-Windows-243-gfe18dd77d)
Jan 03 17:42:44 pve nut-driver@eco650[708784]: USB communication driver (libusb 1.0) 0.43

Jan 03 17:42:44 pve nut-monitor[708211]:   35.003836        [D2] pollups: eco650
Jan 03 17:42:44 pve nut-monitor[708211]:   35.003853        [D3] get_var: eco650 / status

### Here driver is still AWOL, same as before so no new notifications:
Jan 03 17:42:44 pve nut-monitor[708211]:   35.003931        Poll UPS [eco650] failed - Driver not connected
Jan 03 17:42:44 pve nut-monitor[708211]:   35.003937        [D3] Current power value: 1
Jan 03 17:42:44 pve nut-monitor[708211]:   35.003940        [D3] Minimum power value: 1

Jan 03 17:42:46 pve usbhid-ups[708906]: Startup successful
Jan 03 17:42:46 pve nut-driver@eco650[708783]: Network UPS Tools - UPS driver controller 2.8.0-Windows-243-gfe18dd77d
Jan 03 17:42:46 pve systemd[1]: Started Network UPS Tools - device driver for NUT device 'eco650'.
Jan 03 17:42:46 pve nut-server[2055866]: Connected to UPS [eco650]: usbhid-ups-eco650
Jan 03 17:42:46 pve upsd[2055866]: Connected to UPS [eco650]: usbhid-ups-eco650
Jan 03 17:42:47 pve usbhid-ups[708906]: sock_connect: enabling asynchronous mode (auto)

Jan 03 17:42:49 pve nut-monitor[708211]:   40.004025        [D2] pollups: eco650
Jan 03 17:42:49 pve nut-monitor[708211]:   40.004042        [D3] get_var: eco650 / status
Jan 03 17:42:49 pve nut-monitor[708211]:   40.004126        [D2] parse_status: [OL]

### HERE driver is seen again:
Jan 03 17:42:49 pve nut-monitor[708211]:   40.004133        [D2] do_notify: ntype 0x0004 (COMMOK)
Jan 03 17:42:49 pve nut-monitor[708211]:   40.004139        Communications with UPS eco650 established

Jan 03 17:42:49 pve nut-monitor[708211]:   40.004267        [D3] parsing: [OL]
Jan 03 17:42:49 pve nut-monitor[708211]:   40.004279        [D4] ups_on_line: eco650 (no change)
Jan 03 17:42:49 pve nut-monitor[708211]:   40.004284        [D3] Current power value: 1
Jan 03 17:42:49 pve nut-monitor[708211]:   40.004289        [D3] Minimum power value: 1
Jan 03 17:42:49 pve nut-monitor[708999]: Network UPS Tools upsmon 2.8.0-Windows-243-gfe18dd77d

Jan 03 17:42:54 pve nut-monitor[708211]:   45.004361        [D2] pollups: eco650
Jan 03 17:42:54 pve nut-monitor[708211]:   45.004380        [D3] get_var: eco650 / status
Jan 03 17:42:54 pve nut-monitor[708211]:   45.004474        [D2] parse_status: [OL]
Jan 03 17:42:54 pve nut-monitor[708211]:   45.004480        [D3] parsing: [OL]
Jan 03 17:42:54 pve nut-monitor[708211]:   45.004483        [D4] ups_on_line: eco650 (no change)
Jan 03 17:42:54 pve nut-monitor[708211]:   45.004487        [D3] Current power value: 1
Jan 03 17:42:54 pve nut-monitor[708211]:   45.004490        [D3] Minimum power value: 1
...

@man55
Copy link
Author

man55 commented Jan 3, 2023

my upsmon.conf contains:

NOTIFYCMD /etc/nut/notifycmd.sh

NOTIFYFLAG ONLINE     SYSLOG+EXEC
NOTIFYFLAG ONBATT     SYSLOG+EXEC
NOTIFYFLAG LOWBATT    SYSLOG+EXEC
NOTIFYFLAG FSD        SYSLOG+EXEC
NOTIFYFLAG COMMOK     SYSLOG+EXEC
NOTIFYFLAG COMMBAD    SYSLOG+EXEC
NOTIFYFLAG SHUTDOWN   SYSLOG+EXEC

yes, I did reboot the system many times. This is the fifth time I've reinstalled the flash drive for raspberry((

I have two SD flash drives for one raspberry.
They contain exactly the same configurations. absolutely identical.

The only difference: on one is version 2.7.4 installed via apt-get install nut nut-snmp.
And the second one is compiled from github according to this instruction.

Maybe something is wrong with the instructions, maybe some problem with the rights or users, or folders, I don’t really understand this.
But on 2.7.4 notifications work as expected.
And on 2.8.0. NOTIFYCMD just doesn't work at all. There is nothing even in the syslog, that is, the function is not even called or cannot work.

There is not a single clear instruction on how to install it correctly on the entire Internet. Everybody do it through an apt-get install.
But on version 2.7.4. the driver does not work correctly in my case with 4 upses in different places: when one device disappears from the online, the driver starts to fall and then other devices start to fall off, and then appear again. This causes a bunch of false positives notifications. As I see it in 2.8.0. the driver is run separately for each UPS using nut-driver-enumerator.sh, that's why I'm trying to install 2.8.0
I'm even ready to pay money to make it just work, but I don't know where. I've been hitting the wall for two weeks now...

@jimklimov
Copy link
Member

By the way, in a recent discussion at #1754 a very detailed set of notes was posted about building and setting up current NUT with Raspbian. Maybe that cross-reference would inspire you about what you did differently?

Alas, I can not reproduce the situation exactly since the Raspberry I had some time ago kicked the bucket :(

I'm really trying hard to not look snobbish - build systems, CI automation and recipes are my sysadmin bread for decades, so I might consider "apparent" something that you in fact stumble upon and it is hard to guess...

Did you bump the debug verbosity to check what upsmon says in its journal - does it at least recognize that it should do_notify when events happen (e.g. driver restart)? Does it report any failures trying to actually notify?

@man55
Copy link
Author

man55 commented Jan 3, 2023

Thanks for the link, I'm already reading. I will reinstall the SD flash drive again and then increase the debug level.

I would like to know exactly under which user it is correct to compile and everything else?
I always do everything as a root user (in my experience this usually avoids various difficulties in the future).
maybe it should be different here?

@jimklimov
Copy link
Member

jimklimov commented Jan 3, 2023

As long as you specify the --with-user and --with-group for run-time (or specify/override RUN_AS_USER etc. in config files, and that account exists), it should not matter who you build as. Generally it is deemed safer to configure/build/check as an unprivileged user, and only elevate to sudo make install in the end if tests passed.

On the other hand, when running the daemon programs (upsd, upsmon, drivers...) they usually expect to be started by root and drop privileges into the unprivileged run-time account on their own (possibly after accessing otherwise protected filesystem data etc.). A non-root user may not change to another user; however starting them as the nut user right away should probably work (except for upsmon who needs to leave a root owned process with rights to shut down the system).

However (sudo) make install is not (currently) a replacement for complete package-like installation, e.g. it does not create run-time directories (for pipe and PID files), register services in a distro-dependent fashion, change ownerships, possibly might not even set access rights to files the way daemons like them (e.g. to keep plaintext passwords as secure as feasible). In fact, it only installs "samples" of config files but the end-user is expected to copy and tailor them (and chown/chmod/enable along the way).

@jimklimov
Copy link
Member

Trying to reproduce with a Debian 11 Bullseye based system, it seems that by default root shell has mesg n in force, so does not receive messages from the wall -- a mesg y enables that:

# mesg y
# systemctl restart nut-driver@eco650

Broadcast message from nut@pve (somewhere) (Tue Jan  3 19:41:23 2023):

Communications with UPS eco650 lost


Broadcast message from nut@pve (somewhere) (Tue Jan  3 19:41:33 2023):

Communications with UPS eco650 established

Regarding the notification script (and maybe wall notifications), one possible issue to check for is still about FS access rights:

  • may your nut:nut read (or at least "exec") the directory and its parents?
  • may your nut:nut both read and "exec" the script file?
  • can the system shell be weird? possibly, does nut:nut have a reasonable shell or a nologin? In code I see that system() is used, so may be calling processes via user's (or system?) shell... at least it is something easy to experimentally test for.

Regarding syslog, on my system it is maintained by rsyslog daemon, and I see some upsmon messages in /var/log/syslog which by wording seem to duplicate the journal messages (coming from systemd unit)...

# tail -F /var/log/messages &
# su - nut
su: warning: cannot change directory to /home/nut: No such file or directory
$ logger -p NOTICE 'Test'
Jan  3 19:56:00 pve root: Test

At least, this is also checkable (why root though?..) to make sure it works at least outside of NUT...

@jimklimov
Copy link
Member

Looking at https://man7.org/linux/man-pages/man3/popen.3.html and https://www.gnu.org/software/libc/manual/html_node/Running-a-Command.html the user's shell may be irrelevant with GNU libc which calls /bin/sh. Still, worth checking to rule out.

@man55
Copy link
Author

man55 commented Jan 3, 2023

So, thank you very much for your help, I finally beat the problems.
And I found one problem that needs to be fixed.

The main thing is to create the user and group correctly at the start, thanks #1754

groupadd --system nut
useradd --system --gid=nut --home-dir=/var/lib/nut --shell=/sbin/nologin nut
usermod -a -G nut nut

apt-get update
apt-get install -y build-essential autoconf gettext libusb-dev libtool libgd-dev libsnmp-dev -y

wget https://github.com/networkupstools/nut/releases/download/v2.8.0/nut-2.8.0.tar.gz
tar xzf nut-2.8.0.tar.gz
mv nut-2.8.0 nut
cd nut

./configure --prefix=/ --includedir=/usr/include --mandir=/usr/share/man \
--infodir=/usr/share/info --sysconfdir=/etc/nut --localstatedir=/var \
--libexecdir=/usr/lib/nut --srcdir=. --enable-maintainer-mode \
--disable-silent-rules --libdir=/usr/lib/`gcc -print-multiarch` \
--with-cgi --with-usb --with-snmp --enable-static --with-statepath=/var/run/nut \
--with-altpidpath=/var/run/nut --with-drvpath=/lib/nut \
--with-cgipath=/usr/lib/cgi-bin/nut --with-htmlpath=/usr/share/nut/www \
--with-pidpath=/var/run/nut --datadir=/usr/share/nut \
--with-pkgconfig-dir=/usr/lib/`gcc -print-multiarch`/pkgconfig \
--with-user=nut --with-group=nut --with-udev-dir=/lib/udev \
--with-systemdsystemunitdir=/lib/systemd/system

make
make install

After then:
mv /usr/lib/tmpfiles.d/nut-common.tmpfiles /usr/lib/tmpfiles.d/nut-common.conf
But there was a surprise in nut-common.conf, the path is incorrectly specified.
was like this:

d /var/run/nut/nut 0770 nut nut - -
X /var/run/nut/nut

but it needs to be:

d /var/run/nut 0770 nut nut - -
X /var/run/nut

which corresponds to our configure ... --with-pidpath=/var/run/nut

Now the folder is created correctly and all services start as expected.

The only thing is I installed UPS Web Server and it shows the old version 2.7.4. Wondering why?

apt install apache2 nut-cgi
a2enmod cgi
systemctl restart apache2

Screenshot 2023-01-03 213026

The drivers ver is OK 2.8.0.
image

@jimklimov
Copy link
Member

jimklimov commented Jan 3, 2023

Sounds great! And thanks for the report :)

Regarding tmpfiles - it was fixed (different aspects at different times) after 2.8.0 release. So you might want to try builds of the latest and greatest bugfixes (and new bugs, might be) from Git checkouts. There you might need to run ./autogen.sh to create the configure script and some other files, so possibly some more dependencies to install. https://github.com/networkupstools/nut/tree/master/docs/config-prereqs.txt details the setups of different platforms for CI farm, with package lists relevant for those OSes.

It is arguably better formatted but (due to chunking) clumsier to read on the web site: https://networkupstools.org/docs/user-manual.chunked/index.html => https://networkupstools.org/docs/user-manual.chunked/apks02.html#_debian_10_11

As for the website version report - you've just installed nut-cgi from packaging ;)

@jimklimov
Copy link
Member

jimklimov commented Jan 3, 2023 via email

@jimklimov
Copy link
Member

So, back to original topic of this specific issue: after the methodology changes you posted above, did NOTIFYCMD begin working for you? :) If yes, we can close this issue at least.

@man55
Copy link
Author

man55 commented Jan 4, 2023

Yes, NOTIFYCMD working perfectly after correct user&group creation before compilation.

@man55 man55 closed this as completed Jan 4, 2023
@jimklimov jimklimov added the impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) label Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) upsmon
Projects
None yet
Development

No branches or pull requests

2 participants