-
-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V255 stable batch #357
V255 stable batch #357
Conversation
Avoid passing a NULL message to sd_bus_message_is_signal(), to not trip over an assertion: [ 132.869436] H testsuite-82.sh[614]: + systemctl --no-block --check-inhibitors=yes soft-reboot [ 132.967386] H systemd[1]: Created slice system-systemd\x2dcoredump.slice. [ 133.018292] H systemd[1]: Starting inhibit.service... [ 133.122610] H systemd[1]: Started [email protected]. [ 133.163643] H systemd[1]: Started inhibit.service. [ 133.206836] H testsuite-82.sh[614]: + exec sleep infinity [ 133.236762] H systemd-logind[611]: The system will reboot now! [ 135.891607] H systemd-coredump[667]: [🡕] Process 663 (busctl) of user 0 dumped core. Stack trace of thread 663: #0 0x00007f2ec45e6acf raise (libc.so.6 + 0x4eacf) #1 0x00007f2ec45b9ea5 abort (libc.so.6 + 0x21ea5) #2 0x00007f2ec4b5c9a6 log_assert_failed (libsystemd-shared-255.so + 0x1ff9a6) #3 0x00007f2ec4b5dca5 log_assert_failed_return (libsystemd-shared-255.so + 0x200ca5) #4 0x00007f2ec4bb3df6 sd_bus_message_is_signal (libsystemd-shared-255.so + 0x256df6) #5 0x000000000040e478 monitor (busctl + 0xe478) #6 0x000000000040e82f verb_monitor (busctl + 0xe82f) #7 0x00007f2ec4b202cb dispatch_verb (libsystemd-shared-255.so + 0x1c32cb) #8 0x00000000004074fa busctl_main (busctl + 0x74fa) #9 0x0000000000407525 run (busctl + 0x7525) #10 0x000000000040ff67 main (busctl + 0xff67) #11 0x00007f2ec45d2d85 __libc_start_main (libc.so.6 + 0x3ad85) #12 0x00000000004044be _start (busctl + 0x44be) ELF object binary architecture: AMD x86-64 [ 136.141152] H dbus-daemon[634]: [system] Monitoring connection :1.2 closed. [ 136.152233] H systemd[1]: busctl.service: Main process exited, code=dumped, status=6/ABRT [ 136.153996] H systemd[1]: busctl.service: Failed with result 'core-dump'. The asertion in question: Assertion 'm' failed at src/libsystemd/sd-bus/bus-message.c:1015, function sd_bus_message_is_signal(). Aborting. We can get a NULL message here through sd_bus_process() -> bus_process_internal() -> process_running(), so let's handle this case appropriately. (cherry picked from commit b4a21d5)
Hwdb call for hidraw subsystem is missing and AV controller devices defined in hwdb.d/70-av-production.hwdb never get the proper permissions for /dev/hidraw*. This patch implements hwdb execution also for hidraw devices. (cherry picked from commit 43ee987)
Since in that case the event loop is already finished and we'd hit an assertion: [ 1295.993300] testsuite-75.sh[50]: + systemctl stop systemd-resolved.service [ 1296.005152] systemd-resolved[298]: Assertion 'e->state != SD_EVENT_FINISHED' failed at src/libsystemd/sd-event/sd-event.c:1252, function sd_event_add_io(). Aborting. Thread 1 (Thread 0x7f17d25e2940 (LWP 298)): #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x00007f17d16ac8a3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78 #2 0x00007f17d165c668 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #3 0x00007f17d16444b8 in __GI_abort () at abort.c:79 #4 0x00007f17d2402d2d in log_assert_failed (text=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at ../build/src/basic/log.c:968 #5 0x00007f17d240401c in log_assert_failed_return (text=text@entry=0x7f17d2533f13 "e->state != SD_EVENT_FINISHED", file=file@entry=0x7f17d25195d9 "src/libsystemd/sd-event/sd-event.c", line=line@entry=1252, func=func@entry=0x7f17d2567260 <__func__.140> "sd_event_add_io") at ../build/src/basic/log.c:987 #6 0x00007f17d24d011a in sd_event_add_io (e=0x55e5cb497270, ret=0x55e5cb4a5120, fd=fd@entry=26, events=events@entry=1, callback=callback@entry=0x55e5caff5466 <on_io_event>, userdata=0x55e5cb4a5110) at ../build/src/libsystemd/sd-event/sd-event.c:1252 #7 0x000055e5caff571c in manager_add_socket_to_graveyard (m=0x55e5cb43cf00, fd=26) at ../build/src/resolve/resolved-socket-graveyard.c:117 #8 0x000055e5cafd4253 in dns_transaction_close_connection (t=t@entry=0x55e5cb57c7d0, use_graveyard=use_graveyard@entry=true) at ../build/src/resolve/resolved-dns-transaction.c:78 #9 0x000055e5cafd8444 in dns_transaction_complete (t=t@entry=0x55e5cb57c7d0, state=state@entry=DNS_TRANSACTION_ABORTED) at ../build/src/resolve/resolved-dns-transaction.c:427 #10 0x000055e5cafc4969 in dns_scope_abort_transactions (s=s@entry=0x55e5cb4b1a70) at ../build/src/resolve/resolved-dns-scope.c:91 #11 0x000055e5cafc6aee in dns_scope_free (s=0x55e5cb4b1a70) at ../build/src/resolve/resolved-dns-scope.c:106 #12 0x000055e5cafe72d1 in link_free (l=0x55e5cb4a5160) at ../build/src/resolve/resolved-link.c:94 #13 0x000055e5cafedefc in manager_free (m=0x55e5cb43cf00) at ../build/src/resolve/resolved-manager.c:697 #14 0x000055e5caff99b6 in manager_freep (p=p@entry=0x7ffd71fab8f8) at ../build/src/resolve/resolved-manager.h:198 #15 0x000055e5caff9d66 in run (argc=argc@entry=1, argv=argv@entry=0x7ffd71faba78) at ../build/src/resolve/resolved.c:25 #16 0x000055e5caff9fe3 in main (argc=1, argv=0x7ffd71faba78) at ../build/src/resolve/resolved.c:99 Resolves: #30618 (cherry picked from commit ac1b7b9)
(cherry picked from commit 6f7936c)
Partially reverts 122f6f1 Fixes systemd/systemd#29938 (comment) (cherry picked from commit e14348c)
…path Before this commit, this field could spuriously contain the path of the swapfile. (cherry picked from commit 66b9956)
write_resume_config() logs error on its own. (cherry picked from commit fe33920)
(cherry picked from commit 79272d3)
…ve services The verb works only on running service units, so complete on that as the first parameter, and a local file as the second. The other parameters are inside the service namespace so we can't autocomplete from the outside, return early. (cherry picked from commit c24c63e)
Adds some new dns record types. Also, some types were inserted into the middle of the enum — this corrects an error where the enum constants for some of the record types previously held an incorrect value. (cherry picked from commit 818bb6f)
The similar check already exists in schedule_post_change(). The function is currently called at two places. - journal_file_open() in sd-journal: In this case, if the timer is not set up, then journal_file_post_change() will be called at the end of journal_file_append_entry(). So, the necessary task will be done sequentially when an journal entry is stored to the opened journal file. That is desired when the function is called at outside of the event loop. - server_open_journal() in journald: This is not called after we exit the event loop. So, we can safely do nothing in the function if the event loop is being finished or already finished. Fixes #30644. (cherry picked from commit 5b201ff)
(cherry picked from commit 1276e63)
udevadm lock did not propagate the return code from the child process because all positive values were treated as success. v2: Now 'udevadm test-builtin' ignores all positive return values from the builtin commands. Otherwise, as the hwdb builtin returns an positive value when a matching entry found, 'udevadm test-builtin hwdb' will fail. v3: Initialize partition table before calling 'sfdisk --delete'. Co-authored-by: Yu Watanabe <[email protected]> (cherry picked from commit ba340e2)
(cherry picked from commit 13a30c6)
(cherry picked from commit 22a8f00)
Fixes #30669. (cherry picked from commit a53082f)
Otherwise, IPv6 enable/disable setting may be changed after resolved is started. (cherry picked from commit 6e6b59e)
Currently, link_queue_request_safe(), which is a wrapper of request_new(), is called with a free function at - link_request_stacked_netdev() at netdev/netdev.c, - link_request_address() at networkd-address.c, - link_request_nexthop() at networkd-nexthop.c, - link_request_neighbor() at networkd-networkd.c. For the netdev case, the reference counter of the passed object is increased only when the function returns 1. So, on failure (with -ENOMEM) previously we unexpectedly dropped the reference of the NetDev object. Similarly, for Address and friends, the ownership of the object is moved to the Request object only when the function returns 1. And on failure, previously the object was freed twice. Also, netdev_queue_request(), which is another wrapper of request_new() potentially leaks memory when the same NetDev object is queued twice. Fortunately, that should not happen as the function is called only once per object. This fixes the above issue, and now the ownership or the reference counter of the object is changed only when it is succeeded with 1. (cherry picked from commit 6ba1474)
Closes #30699. (cherry picked from commit e0feaed)
…, imply a check for TPM2 We simply don't carry any userspace support for TPM1.2 in our tree, and we shouldn't given it's too weak by today's standards. Hence, if we check if we are booted in UKI measured boot mode, don't just check if we are booted in EFI, but also check that we have a TPM2 chip (as opposed to none or only a TPM1.2 chip). This is an alternative to #30652 but more comprehensive (and simpler), since it covers all invocations of efi_measured_uki(). Fixes: #30650 Replaces: #30652 (cherry picked from commit 03d808c)
…ction When we receive a goodby packet about a host, and we have a cache entry about the host, we do not immediately remove the cache entry, but update it with TTL 1. See RFC 6762 section 10.1 and 3755027. If we receive a request soon after the goodby packet, previously the entry was included in the known answers section of the reply. But such information should not be appended. Follow-up for 3755027. (cherry picked from commit 04d4086)
…TPM2_RC_VALUES If a TPM doesn't do ECC it could either return zero curves when asked for it, or it could simply fail with TPM2_RC_VALUES because it doesn't recognize the capability at all. Handle both cases the same way. Fixes: #30679 (cherry picked from commit ae17fcb)
When KeepCarrier is set, networkd doesn't close tun/tap file descriptor preserving the active interface state, but doesn't disable its queue which makes kernel to think that it's still active and send packets to it. This patch disables the created queue right after tun/tap interface creation. Here is the steps to reproduce the bug: Having: systemd/network/10-tun-test.netdev: [NetDev] Name=tun-test Kind=tun [Tun] MultiQueue=yes KeepCarrier=yes systemd/network/10-tun-test.network: [Match] Name=tun-test [Network] DHCP=no IPv6AcceptRA=false LLMNR=false MulticastDNS=false Address=172.31.0.1/24 app.c: #include <fcntl.h> #include <stdio.h> #include <string.h> #include <unistd.h> #include <linux/if.h> #include <sys/ioctl.h> #include <linux/if_tun.h> int main() { int fd; struct ifreq ifr; memset(&ifr, 0, sizeof ifr); strcpy(ifr.ifr_name, "tun-test"); ifr.ifr_flags = IFF_TUN | IFF_NO_PI | IFF_MULTI_QUEUE; if((fd = open("/dev/net/tun", O_RDWR)) < 0) { perror("Open error"); return 1; } if(ioctl(fd, TUNSETIFF, &ifr)) { perror("Configure error"); return 1; } puts("Ready."); char buf[1500]; while(1) { int size = read(fd, buf, sizeof buf); if(size < 0) { perror("Read error"); return 1; } printf("Read %d bytes.\n", size); } return 0; } Run: * gcc -o app app.c && ./app * ping -I tun-test 172.31.0.2 Before the patch the app shows no pings, but after it works properly. (cherry picked from commit 0e1ab22)
Resolves: #30703 (cherry picked from commit 519f007)
(Hopefully) a temporary workaround for #30573 where starting a user session when PID 1 is rate limited stalls even after it leaves the rate limited state: [ 11.658201] H systemd[1]: Sent message type=signal sender=n/a destination=n/a path=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=UnitRemoved cookie=4208 reply_cookie=0 signature=so error-name=n/a error-mes> [ 11.658233] H systemd[1]: Event source 0x559babdd8bb0 (mount-monitor-dispatch) left rate limit state. [ 101.562697] H busctl[784]: Failed to get credentials: Transport endpoint is not connected [ 101.563480] H systemd[1]: systemd-journald.service: Got notification message from PID 300 (WATCHDOG=1) [ 101.563725] H testsuite-74.sh[784]: BusAddress=unixexec:path=systemd-run,argv1=-M.host,argv2=-PGq,argv3=--wait,argv4=-pUser%3dtestuser,argv5=-pPAMName%3dlogin,argv6=systemd-stdio-bridge,argv7=-punix:path%3d%24%7bXDG_RUNTIME_DIR%7d/bus [ 101.564136] H systemd[1]: Successfully forked off '(sd-expire)' as PID 787. [ 101.564754] H systemd[1]: Successfully forked off '(sd-expire)' as PID 788. [ 101.564831] H testsuite-74.sh[381]: + echo 'Subtest /usr/lib/systemd/tests/testdata/units/testsuite-74.busctl.sh failed' The issue appeared after ee07fff which does a bunch of mounts/umounts that get PID 1 into a rate limited state, and is frequent enough to be annoying, so let's temporarily bump the rate limit to alleviate that. (cherry picked from commit c707e34)
The current check checks for n_sigbus_queue being greater than or equal to SIGBUS_QUEUE_MAX, when it should be just greater than as n_sigbus_queue being SIGBUS_QUEUE_MAX indicates that the queue is full, but not overflowed. (cherry picked from commit b4a9d19)
(cherry picked from commit 72bbd74)
Aaaand for the C8S networkd fail 1ce2ffa should help. |
NFT sets must be installed before starting networkd, otherwise some sets may be installed too late. Closes #30427 (cherry picked from commit 1ce2ffa)
That looks familiar, didn't you fix it in main @mrc0mmand ? |
Yup, that should be 68785c7. |
Otherwise we might occasionally hit the start rate limit, as we restart the service a bunch of times: [ 3702.280886] testsuite-75.sh[1135]: + tee /tmp/tmp.wUL8bkJwrt [ 3702.283684] testsuite-75.sh[1135]: {} [ 3702.284254] testsuite-75.sh[46]: + restart_resolved [ 3702.284302] testsuite-75.sh[46]: + systemctl stop systemd-resolved.service [ 3702.310678] testsuite-75.sh[1140]: + systemctl is-failed systemd-resolved.service [ 3702.316766] testsuite-75.sh[1141]: inactive [ 3702.316998] testsuite-75.sh[46]: + systemctl start systemd-resolved.service [ 3702.322315] systemd[1]: systemd-resolved.service: Start request repeated too quickly. [ 3702.322343] systemd[1]: systemd-resolved.service: Failed with result 'start-limit-hit'. [ 3702.322609] systemd[1]: Failed to start systemd-resolved.service - Network Name Resolution. [ 3702.323619] systemctl[1142]: Job for systemd-resolved.service failed. [ 3702.323839] systemctl[1142]: See "systemctl status systemd-resolved.service" and "journalctl -xeu systemd-resolved.service" for details. [ 3702.325035] systemd[1]: testsuite-75.service: Failed with result 'exit-code'. [ 3702.325391] systemd[1]: Failed to start testsuite-75.service - Tests for systemd-resolved. Follow-up for b1384db and 6ef512c. (cherry picked from commit 68785c7)
Any idea about this one?
|
No idea so far, but I can reproduce it locally with both this branch and v255-stable, so we're missing something from main. Hopefully I'll be able to bisect my way to the answer. |
So, the the winner is 5592608 from systemd/systemd#30532 (and the whole PR would need to be backported to make it work properly). For it to backport cleanly* one would also need to backport the two commits from systemd/systemd#30614. I'll leave the decision up to you if it's something that should be backported. * there's one docs conflict in b16c607, but that's easily fixable manually |
Thanks - that seems risky, can we fix the test instead? Or just disable the subtest? It must have worked at some point, did we backport something by mistake? |
I suspect this is some kernel change, since it fails with fresh v255 on Arch as well. Maybe @yuwata will know more here. |
The udev PRs are not necessary to be backported, as it changes (rather than fixes) minor behavior on timeout, and hence the test was also updated in the PR. There's no kernel change related to the PRs. |
We would use alloca to extend the format string with "\n". We do this automatically in order to not forget appending the newline everywhere. We can simplify the whole thing by using a macro to append the newline instead, which means that we don't need to copy the string. Because we concatenate the string argument with another literal string, we know it must a literal string. Thus it's not a problem that it is "evaluated" two times. Quoting Hristo Venev: > Since commit f5e757f, mhd_respond() adds a > newline to its argument before passing it on to mhd_respond_internal(). This > is done via an alloca()-allocated buffer. However, MHD_RESPMEM_PERSISTENT is > given as a flag to MHD_create_response_from_buffer(), leading to a > use-after-free later when the response is sent. Replacing > MHD_RESPMEM_PERSISTENT with MHD_RESPMEM_MUST_COPY appears to fix the issue. MHD_RESPMEM_MUST_COPY would work, but we also use mhd_respond() for mhd_oom(), and we don't want to allocate in an oom scenario in order to maximize the possibility that an answer will be delivered. Using the macro magic makes this nicer and we get rid of the code doing alloca. Fixes an issue reported by Hristo Venev. Fixes systemd/systemd#9858. (cherry picked from commit 320ff93)
As for the TEST-17-UDEV fail, I'd just ignore it for this batch. I'd like to look into it more, which might take some time, and as said before it's not caused by this patchset, since it fails with fresh v255 as well. |
Tests are ran by consumers though, we do full runs now in Debian for example, so they shouldn't fail, can't we skip the subtest in this branch? |
The test fails on newer kernels, but it appears to be a test issue, not an issue in our code. Let's temporarily disable the test until a root cause is found. See: #357 (comment)
Yeah, sure, pushed a commit that skips the test for now. |
If the test failures were caused by the patches in this PR, then it would be reasonable to fix them. But I don't think we should backport two fairly big pull requests to fix failures not caused by the patches here. Also, I don't think we should suppress tests which are actually failing for legitimate reasons. This is something to do downstream. |
Franta pushed the commit while I was typing this. Let's run the CI with the commit. The decision whether to include it in the tag can be made later. |
Interestingly enough, the test fails even on fresh v254 on latest Arch, which is sad, since I wanted to blame sd-executor. |
udev before #30532 may kill the worker process together with a slow program, and when running with sanitizers the resulting coredump might be too big to fit into journal (or the space currently available for journal): [ 30.086194] systemd-journald[330]: Failed to write entry to /var/log/journal/e87de9ccbacf4b88924ff6d9ecaaa82d/system.journal (50 items, 68326399 bytes) despite vacuuming, ignoring: Argument list too long This then makes the test fail, as it checks for the presence of the coredump. Since we don't really care about the coredump in this specific case (as it is an expected one), let's just temporarily override the testsuite-wide Storage=journal and store the coredumps externally. This is a v255-stable-only patch, since after #30532 the test no longer checks for coredumps.
c41e090
to
0697be0
Compare
I found out what's going on, so I dropped the original commit that skipped the test and replaced it with a new one that makes the test happy (together with some explanation of what's going on). The commit is needed only in v255-stable (and potentially older stable releases), as after systemd/systemd#30532 the test no longer checks for coredumps, so the commit is no longer necessary. |
The current Testing Farm fail is unrelated (it's a qemu segfault, not our fault, seen this one a couple of times already in other jobs). |
Thanks - I think this is looking good now, anything else left to do? |
I'm not sure in what state the Ubuntu CI currently is, but if it's still fubar I'd go ahead and merge this, before the infra changes its mind. |
It is, I was told yesterday that a fix for the status reporting is under review |
No description provided.