Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mge-shut 0.86 can't deal with serial lines #2022

Closed
shining-fnml opened this issue Aug 24, 2023 · 18 comments · Fixed by #2023
Closed

mge-shut 0.86 can't deal with serial lines #2022

shining-fnml opened this issue Aug 24, 2023 · 18 comments · Fixed by #2023
Labels
impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) MGE MGE branded devices and (USB) chipsets, now part of Eaton; mostly bcmxcp drivers serial port

Comments

@shining-fnml
Copy link

Hello,
since I upgraded from 2.7.4 to 2.8.0, nut stopped working with Eaton 5P. Actually I moved from Debian Buster to Bookwork (amd64).

This is my (unmodified) definition in ups.conf:

[5p1550]
        driver = mge-shut
        port = /dev/ttyS5
        desc = "Server"

Running upsdrvctl shows mge-shut is trying to operate on the serial line the same way as it use to do for an Usb device. At least this is my guess.

root# upsdrvctl -D -D -d start
Network UPS Tools - UPS driver controller 2.8.0
   0.000000     [D1] Starting UPS: 5p1550
   0.000011     [D2] 3 remaining attempts
   0.000014     [D2] exec:  /lib/nut/mge-shut -DD -a 5p1550
Network UPS Tools - Generic HID driver 0.47 (2.8.0)
SHUT communication driver 0.86
Warning: This is an experimental driver.
Some features may not function correctly.

   0.000000     [D1] debug level is '2'
   0.000720     [D2] Initializing an USB-connected UPS with library (null) (NUT subdriver name='SHUT communication driver' ver='0.86')
   0.000723     [D1] upsdrv_initups (SHUT)...
   0.000725     [D2] libshut_open: using port /dev/ttyS5
   0.000933     [D2] entering shut_synchronise()
   0.029173     [D2] Communication with UPS established
   0.029183     [D2] entering shut_get_descriptor(n 01, 18)
   0.097102     [D2] shut_wait_ack(): ACK received
   0.381948     [D2] shut_wait_ack(): ACK received
   0.561697     [D2] shut_wait_ack(): ACK received
   0.813969     [D2] shut_wait_ack(): ACK received
   0.983708     [D2] - VendorID: 0463
   0.983717     [D2] - ProductID: ffff
   0.983718     [D2] - Manufacturer: EATON
   0.983719     [D2] - Product: Eaton 5P
   0.983720     [D2] - Serial Number: [hidden]
   0.983722     [D2] - Bus: serial
   0.983723     [D2] - Device release number: 0100
   0.983724     [D2] Device matches
   0.983726     [D2] entering shut_get_descriptor(n 21, 9)
   1.083967     [D2] shut_wait_ack(): ACK received
   1.159121     [D2] HID descriptor retrieved (Reportlen = 20046)
   1.159129     [D2] HID descriptor too long 20046 (max 6144)
   1.159131     No matching HID UPS found
   1.169966     Driver failed to start (exit status=1)
   6.170205     [D2] 2 remaining attempts
   6.170221     [D2] exec:  /lib/nut/mge-shut -DD -a 5p1550

As a workaround I simply copied the old 0.85 binary driver from Buster to Bookworm. And the results were great!

root# upsdrvctl -D -D -d start      
Network UPS Tools - UPS driver controller 2.8.0
   0.000000     [D1] Starting UPS: nutdev1
   0.001539     [D2] 3 remaining attempts
   0.001546     [D2] exec:  /lib/nut/mge-shut -DD -a nutdev1
Network UPS Tools - Generic HID driver 0.41 (2.7.4)
SHUT communication driver 0.85
Warning: This is an experimental driver.
Some features may not function correctly.

   0.000000     debug level is '2'
   0.000624     upsdrv_initups...
   0.000629     libshut_open: using port /dev/ttyS5
   0.000846     entering shut_synchronise()
   0.040110     Communication with UPS established
   0.040122     entering shut_get_descriptor(n 01, 18)
   0.108130     shut_wait_ack(): ACK received
   0.392995     shut_wait_ack(): ACK received
   0.572874     shut_wait_ack(): ACK received
   0.824750     shut_wait_ack(): ACK received
   0.995245     - VendorID: 0463
   0.995253     - ProductID: ffff
   0.995254     - Manufacturer: EATON
   0.995256     - Product: Eaton 5P
   0.995257     - Serial Number: [hidden]
   0.995258     - Bus: serial
   0.995278     - Device release number: 0100
   0.995280     Device matches
   0.995281     entering shut_get_descriptor(n 21, 9)
   1.094996     shut_wait_ack(): ACK received
   1.170046     HID descriptor retrieved (Reportlen = 2126)
   1.170054     entering shut_get_descriptor(n 22, 2126)
   1.242111     shut_wait_ack(): ACK received

I hope this can be useful for anyone meantime you fix the bug.
Thank you very much for your efforts!

@jimklimov
Copy link
Member

jimklimov commented Aug 24, 2023

Just in case, can you build the current sources and check from that build workspace (no need to break your packaged installation)? Wondering if the problem was solved by something that landed during the past year since 2.8.0...

https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests

@jimklimov jimklimov added MGE MGE branded devices and (USB) chipsets, now part of Eaton; mostly bcmxcp drivers serial port impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) labels Aug 24, 2023
@jimklimov
Copy link
Member

jimklimov commented Aug 24, 2023

A bit surprised to see

...Initializing an USB-connected UPS with library (null)...

the null should have been an identifier of the libusb-0.1/1.x version built against...

But this was mostly tested with usbhid-ups and nutdrv_qx, so maybe something got overlooked in the less-frequently trodded path.

For example:

:; /usr/local/ups/*bin/usbhid-ups -DDDDDD -h
Network UPS Tools - Generic HID driver 0.50 (2.8.0-2287-gb0c97f71c)
USB communication driver (libusb 1.0) 0.45
...

...or rather when actually starting a driver:

   0.000828     [D2] Initializing an USB-connected UPS with library libusb-1.0.24 (API: 0x1000108) (NUT subdriver name='USB communication driver (libusb 1.0)' ver='0.45')
   0.000830     [D1] upsdrv_initups (non-SHUT)...

@shining-fnml
Copy link
Author

I built from commit 3d9630e and sadly there are no good news to report:

/usr/local/src/nut$ sudo ./upsdrvctl -F -D -D -d start > /tmp/with-usb.txt 2>&1
/usr/local/src/nut$ cat /tmp/with-usb.txt
  0.000000     [D1] upsdrvctl commanding all drivers (1 found): (null)
   0.000025     [D1] Starting UPS: 5p1550
   0.000052     [D2] 3 remaining attempts
   0.000056     [D2] exec:  /usr/local/ups/bin/mge-shut -DD -F -a 5p1550
   0.000060     [D1] Starting the only driver with explicitly requested foregrounding mode, not forking
   0.000000     Debug level is 2, dump data count is off, but backgrounding mode requested as off
   0.000010     [D1] Network UPS Tools version 2.8.0-2306-g3d9630e0f (release/snapshot of 2.8.0.1) built with gcc (Debian 12.2.0-14) 12.2.0 and configured with flags: --with-user=nut --with-group=nut
   0.000014     [D1] debug level is '2'
   0.000606     [D1] Succeeded to become_user(nut): now UID=116 GID=124
   0.000623     [D2] Initializing an USB-connected UPS with library (null) (NUT subdriver name='SHUT communication driver' ver='0.86')
   0.000628     [D1] upsdrv_initups (SHUT)...
   0.000632     [D2] libshut_open: using port /dev/ttyS5
   0.000844     [D2] entering shut_synchronise()
   0.039776     [D2] Communication with UPS established
   0.039891     [D2] entering shut_get_descriptor(n 01, 18)
   0.122637     [D2] shut_wait_ack(): ACK received
   0.374285     [D2] shut_wait_ack(): ACK received
   0.554276     [D2] shut_wait_ack(): ACK received
   0.806314     [D2] shut_wait_ack(): ACK received
   0.976156     [D2] - VendorID: 0463
   0.976186     [D2] - ProductID: ffff
   0.976190     [D2] - Manufacturer: EATON
   0.976194     [D2] - Product: Eaton 5P
   0.976198     [D2] - Serial Number: [hidden]
   0.976215     [D2] - Bus: serial
   0.976220     [D2] - Device release number: 0100
   0.976245     [D2] Device matches
   0.976250     [D2] entering shut_get_descriptor(n 21, 9)
   1.076380     [D2] shut_wait_ack(): ACK received
   1.151487     [D2] HID descriptor retrieved (Reportlen = 20046)
   1.151530     [D2] HID descriptor too long 20046 (max 6144)
   1.151548     No matching HID UPS found
Network UPS Tools - Generic HID driver 0.50 (2.8.0-2306-g3d9630e0f)
SHUT communication driver 0.86
Warning: This is an experimental driver.
Some features may not function correctly.

Not happy with this I tried to compile the program again, this time disabling usb support.

/usr/local/src/nut$ sudo ./upsdrvctl -F -D -D -d start > /tmp/without-usb.txt 2>&1
/usr/local/src/nut$ cat /tmp/without-usb.txt
   0.000000     [D1] upsdrvctl commanding all drivers (1 found): (null)
   0.000025     [D1] Starting UPS: 5p1550
   0.000050     [D2] 3 remaining attempts
   0.000054     [D2] exec:  /usr/local/ups/bin/mge-shut -DD -F -a 5p1550
   0.000078     [D1] Starting the only driver with explicitly requested foregrounding mode, not forking
   0.000000     Debug level is 2, dump data count is off, but backgrounding mode requested as off
   0.000010     [D1] Network UPS Tools version 2.8.0-2306-g3d9630e0f (release/snapshot of 2.8.0.1) built with gcc (Debian 12.2.0-14) 12.2.0 and configured with flags: --with-user=nut --with-group=nut --without-usb
   0.000013     [D1] debug level is '2'
   0.000603     [D1] Succeeded to become_user(nut): now UID=116 GID=124
   0.000619     [D2] Initializing an USB-connected UPS with library (null) (NUT subdriver name='SHUT communication driver' ver='0.86')
   0.000625     [D1] upsdrv_initups (SHUT)...
   0.000629     [D2] libshut_open: using port /dev/ttyS5
   0.000836     [D2] entering shut_synchronise()
   0.039760     [D2] Communication with UPS established
   0.039788     [D2] entering shut_get_descriptor(n 01, 18)
   0.122703     [D2] shut_wait_ack(): ACK received
   0.375185     [D2] shut_wait_ack(): ACK received
   0.554448     [D2] shut_wait_ack(): ACK received
   0.806238     [D2] shut_wait_ack(): ACK received
   0.976128     [D2] - VendorID: 0463
   0.976159     [D2] - ProductID: ffff
   0.976165     [D2] - Manufacturer: EATON
   0.976170     [D2] - Product: Eaton 5P
   0.976175     [D2] - Serial Number: [hidden]
   0.976180     [D2] - Bus: serial
   0.976185     [D2] - Device release number: 0100
   0.976191     [D2] Device matches
   0.976196     [D2] entering shut_get_descriptor(n 21, 9)
   1.076158     [D2] shut_wait_ack(): ACK received
   1.151045     [D2] HID descriptor retrieved (Reportlen = 20046)
   1.151076     [D2] HID descriptor too long 20046 (max 6144)
   1.151082     No matching HID UPS found
Network UPS Tools - Generic HID driver 0.50 (2.8.0-2306-g3d9630e0f)
SHUT communication driver 0.86
Warning: This is an experimental driver.
Some features may not function correctly.

Surprisingly upsdrvctl still tries to manage /dev/ttyS5 as an Usb device.

@jimklimov
Copy link
Member

Huh, got the problem now. Looked at it and missed, seeing USB-ish identifiers in the dump from 2.7.4 driver with serial code-path and remembering how those devices are sort of dual-protocol internally.

@jimklimov
Copy link
Member

So that could be some fallout from #300 (IIRC) adding support for libusb1.x and reshuffling lots of code around it.

@shining-fnml
Copy link
Author

If you need more tests just drop a line here :-)

@jimklimov
Copy link
Member

Preliminary guess is that something went awry with past refactoring that could impact #if(n)def SHUT_MODE and actually defining that mode in some build codepaths (e.g. in drivers/Makefile.am it went from -DSHUT_MODE to -DSHUT_MODE=1). I'll try to make a branch and PR after revising, maybe rewording, those clauses...

Another idea could be that this is some fallout from stricter types used (e.g. uint8_t or size_t instead of relaxed platform-dependent int)...

Otherwise, Bus: serial seen in all your cases looks promising (as not USB mode at least there) :)

Finally, noting that "SHUT" stands for "Serial HID UPS Transfer" - hence the codebase shared with native USB access. Same HID model over USB or Serial wiring and signalling.

@jimklimov
Copy link
Member

Ok, so at least trying to report the usb driver version in non-usb mode was a rather cosmetic bug (message was posted outside the ifdef picking SHUT vs. USB builds), the subsequent upsdrv_initups (SHUT) confirms the code-path taken there.

@jimklimov
Copy link
Member

Another note: the report lengths returned are:

  • 2126 = 0x084E
  • 20046 = 0x4E4E

so wondering if some bytes get mixed up, e.g. if align_request() and BYTESWAP() for example kick in (and assuming they are are buggy?) or not...

@jimklimov
Copy link
Member

In fact there is a change here (IIRC due to compiler complaints about casting - so was simplified), gotta check if ultimately benign or not, in drivers/libshut.c::libshut_open():

  • 2.7.4:
	/* USB_LE16_TO_CPU(desc->wDescriptorLength); */
	desc->wDescriptorLength = buf[7] | (buf[8] << 8);
	upsdebugx(2, "HID descriptor retrieved (Reportlen = %u)", desc->wDescriptorLength);
  • 2.8.0:
	/* USB_LE16_TO_CPU(desc->wDescriptorLength); */
	desc->wDescriptorLength = (uint16_t)(buf[7]);
	desc->wDescriptorLength |= (((uint16_t)buf[8]) << 8);
	upsdebugx(2, "HID descriptor retrieved (Reportlen = %u)", desc->wDescriptorLength);

Equivalents in libusb1.c go closer to original:

rdlen1 = ((uint8_t)buf[7]) | (((uint8_t)buf[8]) << 8);
...
rdlen2 = ((uint8_t)p[7]) | (((uint8_t)p[8]) << 8);

@jimklimov
Copy link
Member

Assuming this is the problem and trying to explain it to myself, I can only guess that the original expectation was that we take a usb_ctrl_char (type of buf elements, which for libshut.h are unsigned char's. Then casting extends the char to uint16_t adding a zero high-order byte (where applicable for the CPU), and shifts the 8'th byte and or's its bits into the high-order byte of the cast 7'th byte. After bit-shifting, the new bits should be zeroes (which get ORed with the contents of 7'th byte).

What I guess does happen is that in fact it casts right in memory, so sees the neighboring byte, and shifts 8'th 08 over 7th 4E or something of the sort (at least, bit-wise 0x08 is a subset of 0x4E) :\ Possibly declaring that buffer as aligned has some such impact?

Also wondering if it would be safer to multiply and add in such cases where we want to turn wire bytes into CPU integers, and not go bitwise and shifting, to be surely CPU agnostic :\ <== CC @aquette @clepple WDYT?

jimklimov added a commit to jimklimov/nut that referenced this issue Aug 25, 2023
jimklimov added a commit to jimklimov/nut that referenced this issue Aug 25, 2023
…USB-connected UPS with library..." when not in SHUT_MODE [networkupstools#2022]

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to jimklimov/nut that referenced this issue Aug 25, 2023
jimklimov added a commit to jimklimov/nut that referenced this issue Aug 25, 2023
…differently (closer to what libusbX.c do and NUT v2.7.4 did); bump driver version [networkupstools#2022]

Signed-off-by: Jim Klimov <[email protected]>
@jimklimov
Copy link
Member

Gave it an exploratory shot, can you please check out https://github.com/jimklimov/nut/tree/issue-2022 and build similarly to what you tested recently? In that same workspace you can:

:; git remote add jimklimov https://github.com/jimklimov
:; git checkout issue-2022
:; make

I believe it should pick up your older configuration and rebuild it... If not - retry according to official instructions :D

@clepple
Copy link
Member

clepple commented Aug 25, 2023

Also wondering if it would be safer to multiply and add in such cases where we want to turn wire bytes into CPU integers, and not go bitwise and shifting, to be surely CPU agnostic :\ <== CC @aquette @clepple WDYT?

If the types and casting aren't correct, it won't matter whether you multiply or shift.

I think the actual problem has to do with not packing this struct:

struct my_hid_descriptor {
        uint8_t  bLength;
        uint8_t  bDescriptorType;
        uint16_t bcdHID;
        uint8_t  bCountryCode;
        uint8_t  bNumDescriptors;
        uint8_t  bReportDescriptorType;
        uint16_t wDescriptorLength;
};

I suspect the compiler is inserting a padding byte before wDescriptorLength. If the individual length bytes are read in one statement as in 2.7.4, it doesn't matter whether there is padding. With the 2.8.0 code, if the struct was packed, the two-step approach would work, but it seems like the first store to wDescriptorLength is clobbering the original buf[8] with the contents of buf[7]. (Edit: I don't think the two-step approach would work, actually.)

I think the real solution is not to alias the struct to the buffer, though I haven't looked too far to see what else that would break.

@shining-fnml
Copy link
Author

You fixed it!!!
I don't want to bother you with very verbose logs but in case you need them I will send them.
Here is a remind for myself if tomorrow I need to retype the commands you suggested to me.

$ git remote add jimklimov https://github.com/jimklimov/nut
$ git fetch jimklimov
$ git checkout remotes/jimklimov/issue-2022
$ make clean
$ ./configure --with-user=nut --with-group=nut
$ make

Thank you very much for the quickest fix ever seen by me on github.com!
I hope you will soon merge it to upstream so I can notify Debian package mantainer

Bye

@jimklimov
Copy link
Member

CC @bigon for Debian (AFAIK)

alexwbaule pushed a commit to alexwbaule/nut that referenced this issue Oct 4, 2023
alexwbaule pushed a commit to alexwbaule/nut that referenced this issue Oct 4, 2023
…USB-connected UPS with library..." when not in SHUT_MODE [networkupstools#2022]

Signed-off-by: Jim Klimov <[email protected]>
Signed-off-by: Alex W Baulé <[email protected]>
alexwbaule pushed a commit to alexwbaule/nut that referenced this issue Oct 4, 2023
alexwbaule pushed a commit to alexwbaule/nut that referenced this issue Oct 4, 2023
…differently (closer to what libusbX.c do and NUT v2.7.4 did); bump driver version [networkupstools#2022]

Signed-off-by: Jim Klimov <[email protected]>
Signed-off-by: Alex W Baulé <[email protected]>
@mdonz
Copy link

mdonz commented Jan 11, 2024

FWIW,
I am new here and found this discussion only after much searching ...
Thanks for all your work on the nut project!
We were hit with nut stopping to work for serial monitored Eaton UPS models after upgrading to Debian 12 in December 2023 on several servers. Actually it was an upgrade for Proxmox VE server virtualisation (PVE) from version 7.x (Debian 11) to version 8.x (which included the nut upgrade to version 2.8). Nut on all other servers with either different UPS models or using USB monitoring cables continued to work after the upgrade but not the serial connected Eaton UPSes (= mge-shut driver).
For now we have copied over the previous driver file to get nut working again and we are looking forward to an updated mge-shut driver.

@clepple
Copy link
Member

clepple commented Jan 11, 2024

@mdonz is it possible to test 2.8.1 from Debian testing/unstable? https://packages.debian.org/search?keywords=nut

That might be useful for requesting a backport in Debian.

@mdonz
Copy link

mdonz commented Jan 11, 2024

Thanks for your reply. I have done and hope that I did it the right way. Did not want to enable the testing repository and downloaded the required packages and install manually, version 2.8.1-1:
nut, nut-client, nut-server, nut-cgi, libnutscan2 (and libmodbus which was a requirement)
All installed (kept all config files unchanged) and restated the nut services and this is working:

From systemctl status:
Jan 12 07:17:25 tkt250sv nut-driver@eaton-server-room[1298795]: SHUT communication driver 0.88
Jan 12 07:17:25 tkt250sv nut-driver@eaton-server-room[1298795]: Warning: This is an experimental driver.
Jan 12 07:17:25 tkt250sv nut-driver@eaton-server-room[1298795]: Some features may not function correctly.
Jan 12 07:17:34 tkt250sv mge-shut[1298889]: Startup successful
Jan 12 07:17:34 tkt250sv mge-shut[1298889]: upsnotify: failed to notify about state 2: no notification tech defined, will not spam more about it
Jan 12 07:17:34 tkt250sv systemd[1]: Started [email protected] - Network UPS Tools - device driver for eaton/server/room.
Jan 12 07:17:34 tkt250sv nut-driver@eaton-server-room[1298768]: Network UPS Tools - UPS driver controller 2.8.1
Jan 12 07:17:35 tkt250sv mge-shut[1298889]: sock_connect: enabling asynchronous mode (auto)
Jan 12 07:18:31 tkt250sv mge-shut[1298889]: sock_connect: enabling asynchronous mode (auto)
Jan 12 07:18:32 tkt250sv mge-shut[1298889]: WARNING: send_to_all: write 34 bytes to socket 6 failed (ret=-1), disconnecting: Broken pipe

Also:
root@tkt250sv:~# ls -la /lib/nut/mge*
-rwxr-xr-x 1 root root 241192 Dec 11 01:31 /lib/nut/mge-shut
-rwxr-xr-x 1 root root 204072 Jan 25 2023 /lib/nut/mge-shut.backup
-rwxr-xr-x 1 root root 139528 Dec 11 01:31 /lib/nut/mge-utalk

Should I be worried at this stage about the (new to me) mge-shut originating messages/warnings?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impacts-release-2.8.0 Issues reported against NUT release 2.8.0 (maybe vanilla or with minor packaging tweaks) MGE MGE branded devices and (USB) chipsets, now part of Eaton; mostly bcmxcp drivers serial port
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants