Linux 6.11 compatibility #16400

Closed
wants to merge 8 commits into from

robn
Member

@robn robn commented Jul 31, 2024

Motivation and Context

One must imagine Sisyphus happy.

Closes #16396.

Description

See individual commits.

Note that most of this is applying block queue API changes to zvol. However, since the code is getting very complex, I've done a bit of cleanup in a way that will change when and how some queue settings are applied. That has the potential to break things that already work on older kernels. I doubt any fallout will be significant, but do take care on review and testing.

How Has This Been Tested?

Compiled and passed basic sanity check (create, fio, export, import, scrub, unload) on kernels:

  • 4.9.337
  • 4.14.336
  • 5.10.217
  • 6.1.94
  • 6.4.16
  • 6.6.31
  • 6.8.10
  • 6.9.1
  • 6.10.0-rc1
  • 6.11.0-rc1

Full ZTS run on 6.11.0-rc1 currently in progress; should be finished by the morning. I'll update this post when it's done.

Update: test suite passed (within normal flakiness bars of my setup).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Contributor

@tonyhutter tonyhutter left a comment

Thanks @robn, I was able to build this against 6.11rc1 without issue.

One must imagine Sisyphus happy.

😆 😆 😆

@satmandu
Contributor

satmandu commented Jul 31, 2024

When I build this against the zfs-2.2.5 patchset I see memcpy warnings in the dmesg on 6.11.0-rc1:

sudo dmesg | grep memcpy
[   29.145783] memcpy: detected field-spanning write (size 28) of single field "xattrstart" at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:817 (size 0)
[   29.784994] memcpy: detected field-spanning write (size 15) of single field "(char *)(lr + 1)" at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:514 (size 0)
[   29.879852] memcpy: detected field-spanning write (size 82) of single field "(char *)(lr + 1) + snamesize" at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:515 (size 0)
[   30.276174] memcpy: detected field-spanning write (size 82) of single field "lr + 1" at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:425 (size 0)
[   40.155294] memcpy: detected field-spanning write (size 8) of single field "lr + 1" at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:461 (size 0)
[   49.231905] memcpy: detected field-spanning write (size 3) of single field "(char *)(lr + 1)" at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:593 (size 0)
[   49.234481] memcpy: detected field-spanning write (size 3) of single field "(char *)(lr + 1) + snamesize" at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:594 (size 0)

Example:

[   16.313176] ------------[ cut here ]------------
[   16.314654] memcpy: detected field-spanning write (size 28) of single field "xattrstart" at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:817 (size 0)
[   16.317672] WARNING: CPU: 1 PID: 748 at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:817 zfs_log_setsaxattr+0x115/0x120 [zfs]
[   16.319319] systemd[1]: Started systemd-journald.service - Journal Service.
[   16.321002] Modules linked in: coretemp(+) msr parport_pc ppdev lp parport drm efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 zfs(POE) spl(OE) z3fold lz4 lz4_compress hid_apple hid_generic uas usbhid usb_storage hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 nvme sha256_ssse3 sha1_ssse3 thunderbolt xhci_pci tg3 nvme_core xhci_pci_renesas video wmi aesni_intel crypto_simd cryptd
[   16.322580] CPU: 1 UID: 0 PID: 748 Comm: systemd-random- Tainted: P           OE      6.11.0-rc1 #1
[   16.330526] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   16.330527] Hardware name: Apple Inc. MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS 432.60.3.0.0 10/27/2021
[   16.330529] RIP: 0010:zfs_log_setsaxattr+0x115/0x120 [zfs]
[   16.335476] Code: eb bf 80 3d 8c e1 0f 00 00 75 87 31 c9 48 c7 c2 e0 b6 cb c0 4c 89 c6 48 c7 c7 08 b4 cb c0 c6 05 70 e1 0f 00 01 e8 bb 6a d7 c3 <0f> 0b 4c 8b 45 c8 e9 5d ff ff ff 90 90 90 90 90 90 90 90 90 90 90
[   16.335479] RSP: 0018:ffffbdc280607910 EFLAGS: 00010286
[   16.335492] RAX: 0000000000000000 RBX: ffff96e4920ea600 RCX: 0000000000000000
[   16.335494] RDX: ffff96e7ef0af200 RSI: ffff96e7ef0a18c0 RDI: ffff96e7ef0a18c0
[   16.335495] RBP: ffffbdc280607948 R08: 0000000000000000 R09: 0000000000000003
[   16.335496] R10: ffffbdc280607768 R11: ffffffff8674f108 R12: ffff96e4a113a340
[   16.335497] R13: ffff96e4a0f39000 R14: ffff96e4831870c0 R15: ffff96e48319e0e0
[   16.335499] FS:  00007ff23c19b440(0000) GS:ffff96e7ef080000(0000) knlGS:0000000000000000
[   16.335501] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.335502] CR2: 00007ff23c11f400 CR3: 000000011f240003 CR4: 00000000001706f0
[   16.335504] Call Trace:
[   16.335505]  <TASK>
[   16.335507]  ? show_regs+0x64/0x70
[   16.335514]  ? __warn+0x88/0x130
[   16.335537]  ? zfs_log_setsaxattr+0x115/0x120 [zfs]
[   16.336548]  ? report_bug+0x171/0x1a0
[   16.336576]  ? irq_work_queue+0x2f/0x60
[   16.336587]  ? handle_bug+0x44/0x90
[   16.336595]  ? exc_invalid_op+0x18/0x70
[   16.336599]  ? asm_exc_invalid_op+0x1b/0x20
[   16.336621]  ? zfs_log_setsaxattr+0x115/0x120 [zfs]
[   16.370582]  zfs_sa_set_xattr+0x358/0x3d0 [zfs]
[   16.372019]  zpl_xattr_set_sa.isra.0+0x104/0x1e0 [zfs]
[   16.373394]  ? zpl_xattr_get_sa+0xb5/0x160 [zfs]
[   16.374743]  zpl_xattr_set+0x328/0x370 [zfs]
[   16.375343]  zpl_xattr_user_set+0x11e/0x150 [zfs]
[   16.375990]  __vfs_removexattr+0x81/0xb0
[   16.376008]  __vfs_removexattr_locked+0xd1/0x190
[   16.376010]  vfs_removexattr+0x59/0x100
[   16.381791]  __do_sys_fremovexattr+0x122/0x1b0
[   16.381805]  __x64_sys_fremovexattr+0x15/0x20
[   16.381807]  x64_sys_call+0x1c35/0x2060
[   16.381810]  do_syscall_64+0x69/0x110
[   16.381813]  ? __handle_mm_fault+0x82f/0x1100
[   16.381817]  ? __count_memcg_events+0x5c/0xf0
[   16.381820]  ? count_memcg_events.constprop.0+0x1e/0x40
[   16.388974]  ? handle_mm_fault+0xaf/0x2e0
[   16.388987]  ? do_user_addr_fault+0x238/0x6a0
[   16.388990]  ? irqentry_exit_to_user_mode+0x2f/0x170
[   16.388993]  ? irqentry_exit+0x3b/0x50
[   16.388995]  ? exc_page_fault+0x90/0x190
[   16.388998]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   16.389000] RIP: 0033:0x7ff23bb1e83b
[   16.389002] Code: 73 01 c3 48 8b 0d dd 45 0e 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 c7 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ad 45 0e 00 f7 d8 64 89 01 48
[   16.389004] RSP: 002b:00007ffe6cdfe6b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000c7
[   16.389006] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007ff23bb1e83b
[   16.389007] RDX: 000000000000001a RSI: 000055a76109708b RDI: 0000000000000005
[   16.389008] RBP: 00007ffe6cdfe7f0 R08: 000000007901a95b R09: 000000004619ea52
[   16.389009] R10: 00007ffe6cdfe520 R11: 0000000000000246 R12: 0000000000000004
[   16.389009] R13: 0000000000000001 R14: 000055a76109708b R15: 0000000000000001
[   16.389011]  </TASK>
[   16.401969] ---[ end trace 0000000000000000 ]---
[   16.424051] systemd-journald[721]: Received client request to flush runtime journal.
[   16.426711] mc: Linux media interface: v0.10
[   16.435055] systemd-journald[721]: /var/log/journal/e2e9be5528c34d9cbd594fba09a7389c/system.journal: Journal file uses a different sequence number ID, rotating.
[   16.436970] systemd-journald[721]: Rotating system journal.
[   16.440575] ------------[ cut here ]------------
[   16.441548] memcpy: detected field-spanning write (size 15) of single field "(char *)(lr + 1)" at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:514 (size 0)
[   16.443570] WARNING: CPU: 2 PID: 721 at /var/lib/dkms/zfs/2.2.5~rc12/build/module/zfs/zfs_log.c:514 do_zfs_log_rename+0x120/0x150 [zfs]
[   16.445833] Modules linked in: videobuf2_common mc coretemp msr parport_pc ppdev lp parport drm efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 zfs(POE) spl(OE) z3fold lz4 lz4_compress hid_apple hid_generic uas usbhid usb_storage hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 nvme sha256_ssse3 sha1_ssse3 thunderbolt xhci_pci tg3 nvme_core xhci_pci_renesas video wmi aesni_intel crypto_simd cryptd
[   16.450372] CPU: 2 UID: 0 PID: 721 Comm: systemd-journal Tainted: P        W  OE      6.11.0-rc1 #1
[   16.451579] Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   16.452795] Hardware name: Apple Inc. MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS 432.60.3.0.0 10/27/2021
[   16.454034] RIP: 0010:do_zfs_log_rename+0x120/0x150 [zfs]
[   16.454746] Code: 01 00 e9 56 ff ff ff 4c 8b 7d b8 31 c9 48 c7 c2 b0 b3 cb c0 48 c7 c7 08 b4 cb c0 c6 05 bc f9 0f 00 01 4c 89 fe e8 00 83 d7 c3 <0f> 0b 4c 89 fa eb 8b 48 8b 75 b0 31 c9 48 c7 c2 58 b4 cb c0 48 c7
[   16.454765] RSP: 0018:ffffbdc28077b758 EFLAGS: 00010282
[   16.454792] RAX: 0000000000000000 RBX: 000000000000003f RCX: 0000000000000027
[   16.454797] RDX: ffff96e7ef1218c8 RSI: 0000000000000001 RDI: ffff96e7ef1218c0
[   16.454798] RBP: ffffbdc28077b7a8 R08: 0000000000000000 R09: 0000000000000003
[   16.454799] R10: ffffbdc28077b5b0 R11: ffffffff8674f108 R12: ffff96e483f1f938
[   16.454801] R13: ffff96e48525d800 R14: ffff96e48525d878 R15: 000000000000000f
[   16.454802] FS:  00007f85fcc07440(0000) GS:ffff96e7ef100000(0000) knlGS:0000000000000000
[   16.454803] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.454804] CR2: 00007f85fa7ffd78 CR3: 000000011f24c001 CR4: 00000000001706f0
[   16.454807] Call Trace:
[   16.454814]  <TASK>
[   16.454820]  ? show_regs+0x64/0x70
[   16.454839]  ? __warn+0x88/0x130
[   16.454854]  ? do_zfs_log_rename+0x120/0x150 [zfs]
[   16.456657] videodev: Linux video capture interface: v2.00
[   16.459137]  ? report_bug+0x171/0x1a0
[   16.479671]  ? handle_bug+0x44/0x90
[   16.479675]  ? exc_invalid_op+0x18/0x70
[   16.479676]  ? asm_exc_invalid_op+0x1b/0x20
[   16.479680]  ? do_zfs_log_rename+0x120/0x150 [zfs]
[   16.484558]  ? do_zfs_log_rename+0x120/0x150 [zfs]
[   16.485916]  zfs_log_rename+0x18/0x20 [zfs]
[   16.486599]  zfs_rename+0x1155/0x16a0 [zfs]
[   16.489405]  ? spl_kmem_free+0x2d/0x40 [spl]
[   16.489428]  zpl_rename2+0x9f/0x190 [zfs]
[   16.491682]  vfs_rename+0x780/0xba0
[   16.491687]  ? apparmor_path_rename+0x80/0x370
[   16.491690]  do_renameat2+0x5db/0x660
[   16.491697]  __x64_sys_rename+0x40/0x50
[   16.491699]  x64_sys_call+0x2034/0x2060
[   16.491701]  do_syscall_64+0x69/0x110
[   16.491705]  ? __x64_sys_rt_sigprocmask+0x7e/0xe0
[   16.491920] facetimehd 0000:04:00.0: Found FaceTime HD camera with device id: 1570
[   16.492770] facetimehd 0000:04:00.0: Setting 64bit DMA mask
[   16.500820]  ? syscall_exit_to_user_mode+0x53/0x1b0
[   16.500823]  ? do_syscall_64+0x75/0x110
[   16.500826]  ? __set_task_blocked+0x29/0x80
[   16.500828]  ? sigprocmask+0x81/0xd0
[   16.500830]  ? __x64_sys_rt_sigprocmask+0x7e/0xe0
[   16.500832]  ? syscall_exit_to_user_mode+0x53/0x1b0
[   16.500834]  ? do_syscall_64+0x75/0x110
[   16.500836]  ? __do_sys_clone+0x66/0x90
[   16.507335]  ? syscall_exit_to_user_mode+0x53/0x1b0
[   16.508048]  ? do_syscall_64+0x75/0x110
[   16.508756]  ? __handle_mm_fault+0xc9d/0x1100
[   16.509437]  ? do_syscall_64+0x75/0x110
[   16.510132]  ? __rcu_report_exp_rnp+0x2d/0xc0
[   16.510810]  ? rcu_report_exp_cpu_mult+0x66/0xd0
[   16.511459]  ? rcu_exp_handler+0x8c/0xf0
[   16.512103]  ? __flush_smp_call_function_queue+0x101/0x410
[   16.512764]  ? irqentry_exit_to_user_mode+0x2f/0x170
[   16.513425]  ? irqentry_exit+0x3b/0x50
[   16.514068]  ? sysvec_call_function_single+0x4a/0xa0
[   16.514645] facetimehd 0000:04:00.0: S2 PCIe link init succeeded
[   16.514724]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   16.515480] facetimehd 0000:04:00.0: Refclk: 25MHz (0xa)
[   16.516035] RIP: 0033:0x7f85fd4661cb
[   16.517362] Code: c0 48 8b 5d f8 c9 c3 0f 1f 84 00 00 00 00 00 b8 ff ff ff ff eb eb 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 52 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 19 cc 19 00 f7 d8
[   16.518745] RSP: 002b:00007ffd26d0fea8 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
[   16.519458] RAX: ffffffffffffffda RBX: 00007f85fba00000 RCX: 00007f85fd4661cb
[   16.520174] RDX: 0000000000000000 RSI: 0000564f41a51b70 RDI: 0000564f41a53610
[   16.520926] RBP: 00007ffd26d0fef0 R08: 0000000000000078 R09: 0000000000000003
[   16.521670] R10: 0000000000000000 R11: 0000000000000202 R12: 0000564f41a5d5e0
[   16.522427] R13: 0000000000000000 R14: 0000564f41a53610 R15: 00007ffd26d0ff20
[   16.523187]  </TASK>
[   16.523915] ---[ end trace 0000000000000000 ]---
[   16.524661] ------------[ cut here ]------------

The relevant zfs_log.c it is complaining about is this: https://github.com/openzfs/zfs/blob/544b6c1593874eacd73390970cde6c45a5817cfa/module/zfs/zfs_log.c

@robn
Member Author

robn commented Jul 31, 2024

Yeah, I saw these too in the ZTS run. As far as I can tell they're part of the kernel FORTIFY stuff; we're memcpy()'ing into a region that the compiler can't prove is big enough. Though 6.10 had this check as well, and I don't recall it making noise, so maybe there's something deeper (like, does the kernel now enable FORTIFY for rc builds?).

I suggest they're probably worth looking into, but separately - they're static analysis warnings, and need to be assessed and fixed or silenced as appropriate. They're not part of this patch set as such.

@robn robn force-pushed the linux-6.11-compat branch 2 times, most recently from d863574 to b753d5a Compare August 2, 2024 08:00
@robn
Member Author

robn commented Aug 2, 2024

I understand what the warnings are about now.

When built with CONFIG_FORTIFY_SOURCE, 6.11 is checking that the destination for memcpy has enough room. The various lr_XX_t structs usually have extra allocation and data after them, but this isn't visible to the compiler, so it reports the target memory regions as too small.

I have a couple of prototype patches to address this. Mostly, it's adding flex arrays to structs that have an extra allocation after them, but some rework is required for lr_create_t and lr_rename_t, as Clang doesn't support flex arrays anywhere but at end of struct. There's also some code change required, because so many places do raw pointer math to get past the end of struct, and that doesn't always work out exactly the same with the necessary type changes.

Fortunately there's no actual bugs (assuming our existing math is right, which I assume it is), it's just log noise. Still needs fixing to avoid bug reports (and also a little extra rigor doesn't hurt), but no real issue here.

Hopefully I can post a patch that isn't too horrible in the next few days.
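
As a rough illustration of the flex-array approach described above, here is a minimal userspace sketch. The names `rec_old_t`, `rec_new_t` and `r_data` are hypothetical stand-ins, not the real `lr_XX_t` definitions, and the actual patches may look quite different. It shows the same payload being written once via raw pointer math past the end of the struct (the pattern the fortified `memcpy` complains about) and once through a C99 flexible array member, which gives the compiler a named destination field without changing the layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record header; the payload lives after it, undeclared. */
typedef struct rec_old {
	uint64_t	r_id;
	uint64_t	r_len;
} rec_old_t;

/* Same header, but the trailing payload is declared as a flex array. */
typedef struct rec_new {
	uint64_t	r_id;
	uint64_t	r_len;
	char		r_data[];	/* C99 flexible array member */
} rec_new_t;

/* Old style: step past the struct with pointer math, then copy. */
static rec_old_t *
rec_old_create(uint64_t id, const char *name)
{
	size_t len = strlen(name) + 1;
	rec_old_t *r = malloc(sizeof (*r) + len);

	if (r == NULL)
		return (NULL);
	r->r_id = id;
	r->r_len = len;
	/* The destination "r + 1" has no declared field behind it. */
	memcpy((char *)(r + 1), name, len);
	return (r);
}

/* New style: copy into the named flex array member instead. */
static rec_new_t *
rec_new_create(uint64_t id, const char *name)
{
	size_t len = strlen(name) + 1;
	rec_new_t *r = malloc(sizeof (*r) + len);

	if (r == NULL)
		return (NULL);
	r->r_id = id;
	r->r_len = len;
	memcpy(r->r_data, name, len);
	return (r);
}
```

In this sketch both structs have the same size and the payload lands at the same offset, so the two forms stay layout-compatible; only the way the destination is named changes.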

@greg-hydrogen

greg-hydrogen commented Aug 3, 2024

I am getting an error when trying to compile with this patch on Fedora 39:
make[7]: *** No rule to make target '/tmp/zfs-build-greg-EkU79TuP/BUILD/zfs-kmod-2.2.99/_kmod_build_6.11.0-rc1/module/os/linux/spl/spl-atomic.o', needed by '/tmp/zfs-build-greg-EkU79TuP/BUILD/zfs-kmod-2.2.99/_kmod_build_6.11.0-rc1/module/spl.o'. Stop

this was triggered by running on the latest main branch
make pkg-kmod

@prometheanfire prometheanfire mentioned this pull request Aug 4, 2024
13 tasks
@ascendbeing

ascendbeing commented Aug 5, 2024

I just tried this patch set. I used 6.11rc2, never tried rc1.
When trying to power off my PC, it's unresponsive. It's off the network but it's not letting me do magic sysrq s u b to safely shutdown. And it's not shutting down further if I press the button etc.
So maybe take a quick look over if 6.11rc2 changed anything of note? Linus made some particular comments in the release announce that indicate something like that could have been done (removing "noise").
I have been using HEAD with 6.10. It powered off fine there with gentoo and xanmod patch sets. Going to have to not use this one until it's matured a bit.

@satmandu
Contributor

satmandu commented Aug 5, 2024

I just tried this patch set. I used 6.11rc2, never tried rc1. When trying to power off my PC, it's unresponsive.

I have had success using this patch with both 6.11.0-rc1 and 6.11.0-rc2 (using my own locally built kernels on x86_64 machines) using Ubuntu 24.04, with my own customized release which is the 2.2.5-staging PR with this PR on top of it: https://launchpad.net/~satadru-umich/+archive/ubuntu/zfs-experimental/

I am not having problems with a shutdown. Reboots are working just fine for me.

@ascendbeing

ascendbeing commented Aug 5, 2024

I guess I'll have to try what you did except I'll base it on zfs-2.2.5-hutter. I didn't think to do this because the 6.10 patch set just worked.
EDIT: still not shutting off properly. This time I tried zfs-2.2.5-hutter w/ this patch on top of it. Kernel is 6.11-pf.

@robn
Member Author

robn commented Aug 6, 2024

@ascendbeing any interesting console output when this happens? What if you export the pool then unload the zfs module before reboot?

I built against -rc2 this morning. My very first bench check run suspended the pool during scrub, but half a dozen runs since have been fine. I don't really trust my bench check for this, it is .. quirky, and this isn't unheard of behaviour for it. But, coincidences and all that.

I'm yet to self-review this PR, but I'll do it soon, and review the rc1->rc2 changes at the same time.

@ascendbeing

ascendbeing commented Aug 10, 2024

@ascendbeing any interesting console output when this happens? What if you export the pool then unload the zfs module before reboot?

I have 3 pools. My rootfs being one of them. I am using openrc not systemd so if there's any mechanism short of installing systemd to grab the output (I use metalog I think? with stock config basically), I'm willing to try it.

What specifically is happening is I issue a poweroff command and then instead of fully shutting down, monitor turns off and won't turn back on, and PC won't turn off, won't respond to magic sysrq (I have it 0x1 enabled), only way to turn it off is hard power off. Doesn't respond to additional button presses, no network connectivity.

Edit also yes I've tried waiting a really long time (30+ min. possibly up to several hours)

@robn
Member Author

robn commented Aug 11, 2024

@ascendbeing I mean, your system logs the kernel messages somewhere presumably?

Not much we can do to debug this without a console. From your description that's getting extremely late in the shutdown process, to the extent that the keyboard handler can't invoke sysrq operations? A serial console might be the go.

I'm not saying it's not OpenZFS at fault, but it would be extremely low-level, like, it hasn't released some resource somewhere. I can't off the top of my head think what that would be.

@robn
Member Author

robn commented Aug 11, 2024

@wmmur you've already opened #16433 about this. This PR is about 6.11 compatibility on the master branch, not about 6.10 or 2.2.5. If you've got a failing build with this PR against a 6.11-rc kernel, please provide more information. Otherwise I suggest it's probably off-topic.

@robn robn closed this Aug 11, 2024
@robn robn reopened this Aug 11, 2024
@robn
Member Author

robn commented Aug 11, 2024

@wmmur I understand, however this PR is building fine for me on 6.11-rc2 and that file on the 6.11 branch has not changed since May. So clearly there's more going on, and it's probably the same thing that is causing your 6.10 build to fail too. So, unless you have new information specific to 6.11, let's just keep it over there for the moment, otherwise we'll end up with bits of conversation in both places, which gets confusing.

@ascendbeing

ascendbeing commented Aug 11, 2024

I can't off the top of my head think what that would be.

I recently added "reboot=efi,warm mce=bootlog" to my boot arguments. Shuts down fine on 6.10. I also have power profiles daemon broken(crashes on startup somehow) if that matters.

I have a raspberry 3b if that can be used to do stuff like acquire really late messages etc. or I could connect another PC via 2.5GbE Ethernet if that works. not to pcie or tb. 2.5GbE on crashy machine to 2.5GbE on other rig

If I have to I can snapshot the rootfs and install systemd

@robn
Member Author

robn commented Aug 11, 2024

@ascendbeing sorry, you've got a lot more going on there than I know anything about, so I'm not really able to help much more. Divide and conquer is the game from here; try to find out if OpenZFS is the cause or something else.

(With the root on OpenZFS, if it were me I'd be trying to set up a live USB with this kernel and try to reproduce the situation that way, where there's a reasonable chance I can unload zfs.ko safely while the system is still up. But any of your ideas might work if you know how to do them).

@satmandu
Contributor

Using this patch on top of 2.2.5, things are looking fine for me with kernel 6.11.0-rc3 on both Ubuntu 24.04 and Ubuntu 24.10/dev.

@ascendbeing

ascendbeing commented Aug 12, 2024

@robn The crash on shutdown doesn't occur on 6.10. It happens on HEAD and zfs-2.2.5-hutter w/ original version patch when used with 6.11. The original version of this patch doesn't affect shutting down with 6.10. Probably only with rootfs on 6.11 only. I don't have a spare drive to try to install the same kernel config kernel with non rootfs ATM, but I will probably try the new version eventually. against 2.2.5 release probably. 1 chunk doesn't apply to the hutter 2.2.5 fork anymore.

It gets hairier again in Linux 6.11, so I want some actual theory of
operation laid out for next time.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
In 6.11 struct queue_limits gains a 'features' field, where, among other
things, flush and write-cache are enabled. Detect it and use it.

Along the way, the blk_queue_set_write_cache() compat wrapper gets a
little cleanup. Since both flags are always set together, it's now a
single bool. Also the very very ancient version that sets q->flush_flags
directly couldn't actually turn it off, so I've fixed that. Not that we
use it, but still.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
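
The single-bool wrapper described in the commit message above might be sketched roughly like this. It is a userspace mock, not the code from the commit: `compat_set_write_cache()` is a hypothetical name, the struct is a one-field stand-in for the kernel's `struct queue_limits`, and the flag names `BLK_FEAT_WRITE_CACHE` and `BLK_FEAT_FUA` with their bit values are assumed placeholders for whatever `<linux/blkdev.h>` defines.

```c
#include <stdbool.h>

/*
 * Stand-ins so the sketch compiles outside the kernel. The real types and
 * flag values come from <linux/blkdev.h>; these values are placeholders.
 */
typedef unsigned int blk_features_t;
#define	BLK_FEAT_WRITE_CACHE	(1u << 0)
#define	BLK_FEAT_FUA		(1u << 1)

struct queue_limits {
	blk_features_t	features;
};

/*
 * Hypothetical compat wrapper: one bool controls both flags, since they
 * are always set together.
 */
static inline void
compat_set_write_cache(struct queue_limits *lim, bool on)
{
	if (on)
		lim->features |= (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
	else
		lim->features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
}
```

The explicit clear branch is the point of interest: a wrapper that only ever ORs flags in has no way to turn the setting off again.
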
It's no longer available directly on the request queue, but it's easy to
get from the attached disk.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
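
Assuming "it" in the commit message above is the `backing_dev_info` named in the linked issue title, the lookup through the attached disk could be sketched as below. The structs are minimal userspace stand-ins for the kernel types, the `disk` and `bdi` member names are assumptions about the kernel layout, and `compat_queue_bdi()` is a hypothetical helper name rather than anything taken from the patch.

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel structures involved. */
struct backing_dev_info {
	unsigned long	ra_pages;
};

struct gendisk {
	struct backing_dev_info	*bdi;
};

struct request_queue {
	struct gendisk	*disk;	/* no direct backing_dev_info member */
};

/*
 * Hypothetical accessor: reach the backing_dev_info via the queue's disk,
 * returning NULL if there is no queue or no disk attached.
 */
static inline struct backing_dev_info *
compat_queue_bdi(struct request_queue *q)
{
	if (q == NULL || q->disk == NULL)
		return (NULL);
	return (q->disk->bdi);
}
```

Routing every caller through one accessor like this keeps the kernel-version difference in a single place.
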
Detect it, and use a macro to make sure we always match the prototype.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Apply them with the rest of the settings.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
These fields are very old, so no detection necessary; we just move them
into the limit setup functions.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Since the change to folios it has just been a wrapper anyway. Linux has
removed their wrapper, so we add one.

Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
@robn
Member Author

robn commented Aug 13, 2024

@ascendbeing yep, if you can find out anything at all I can look into it, but right now it's very much looking like something strange in your particular setup.

If it's useful, here's this patch series atop the released 2.2.5 tag: https://github.com/robn/zfs/commits/linux-6.11-compat-2.2.5/. I've compiled it but not run it, ymmv.

@robn
Member Author

robn commented Aug 13, 2024

@wmmur I already said that your build issue is unrelated to this PR. Please stop.

Contributor

@behlendorf behlendorf left a comment

Looks good. Thanks for pushing this boulder a little bit farther!

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Aug 13, 2024
behlendorf pushed a commit that referenced this pull request Aug 14, 2024
In 6.11 struct queue_limits gains a 'features' field, where, among other
things, flush and write-cache are enabled. Detect it and use it.

Along the way, the blk_queue_set_write_cache() compat wrapper gets a
little cleanup. Since both flags are always set together, it's now a
single bool. Also the very very ancient version that sets q->flush_flags
directly couldn't actually turn it off, so I've fixed that. Not that we
use it, but still.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
behlendorf pushed a commit that referenced this pull request Aug 14, 2024
It's no longer available directly on the request queue, but it's easy to
get from the attached disk.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
behlendorf pushed a commit that referenced this pull request Aug 14, 2024
Detect it, and use a macro to make sure we always match the prototype.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
behlendorf pushed a commit that referenced this pull request Aug 14, 2024
Apply them with the rest of the settings.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
behlendorf pushed a commit that referenced this pull request Aug 14, 2024
These fields are very old, so no detection necessary; we just move them
into the limit setup functions.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
behlendorf pushed a commit that referenced this pull request Aug 14, 2024
Since the change to folios it has just been a wrapper anyway. Linux has
removed their wrapper, so we add one.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
behlendorf pushed a commit that referenced this pull request Aug 14, 2024
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes #16400
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
It gets hairier again in Linux 6.11, so I want some actual theory of
operation laid out for next time.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16400
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
In 6.11 struct queue_limits gains a 'features' field, where, among other
things, flush and write-cache are enabled. Detect it and use it.

Along the way, the blk_queue_set_write_cache() compat wrapper gets a
little cleanup. Since both flags are always set together, it's now a
single bool. Also the very very ancient version that sets q->flush_flags
directly couldn't actually turn it off, so I've fixed that. Not that we
use it, but still.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16400
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
It's no longer available directly on the request queue, but it's easy to
get from the attached disk.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16400
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Detect it, and use a macro to make sure we always match the prototype.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16400
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Apply them with the rest of the settings.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16400
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
These fields are very old, so no detection necessary; we just move them
into the limit setup functions.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16400
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Since the change to folios it has just been a wrapper anyway. Linux has
removed their wrapper, so we add one.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16400
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Sponsored-by: https://despairlabs.com/sponsor/
Closes openzfs#16400
@ascendbeing

ascendbeing commented Sep 17, 2024

@robn just gonna leave you and all a thank you because I can report (over last few days I've confirmed, w/ final release) that 6.11 reboot hang is gone. My configuration has drifted a good deal, but when I reported about this here, I had tried changing my .config kinda radically, and this time it just works, without any flailing.

@robn
Member Author

robn commented Sep 17, 2024

@ascendbeing well I'm glad to hear it!

Labels
Status: Accepted Ready to integrate (reviewed, tested)
Development

Successfully merging this pull request may close these issues.

Kernel 6.11.0-rc1 ‘struct request_queue’ has no member named ‘backing_dev_info’
6 participants