kernel fails to boot (MBR) when built with gcc 10+ or upgraded to 6.7+ #83
I also reproduced the problem within Nix: gcc8 (8.5.0) works:
With gcc10 (10.4.0), the kernel fails to boot, while with gcc9 (9.5.0) it boots correctly. So the problem appears to be triggered by gcc 10+.
I read that one of the main changes in gcc 10 is enabling stack protection by default. Indeed, building the kernel on […] So I suspect that our bootloader does not set up the stack correctly. I don't yet know what the connection to Linux 6.7+ is, though.
I was wondering why I couldn’t get SeaBIOS debug output to show up in qemu. It turns out that when I use the qemu 7.2.0 I’m running on router7 instead of Arch’s qemu 8.1.2, I do get SeaBIOS debug output on stdout 🤦 Maybe this is a bug in newer versions, or the configuration changed.

I attached the working and broken SeaBIOS debug output: qemu-boot.broken.txt, qemu-boot.working.txt

The diff is:

% diff -u /tmp/qemu-boot.working.txt /tmp/qemu-boot.broken.txt
--- /tmp/qemu-boot.working.txt 2024-01-13 08:14:14.705313715 +0100
+++ /tmp/qemu-boot.broken.txt 2024-01-13 08:14:22.355448379 +0100
@@ -1,4 +1,4 @@
-/tmp/qemu/bin/qemu-system-x86_64 -boot order=c,reboot-timeout=5000 -drive file=/tmp/gokr-boot1986672184,format=raw -net nic,macaddr=b8:27:eb:12:34:56 -usb -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios
+/tmp/qemu/bin/qemu-system-x86_64 -boot order=c,reboot-timeout=5000 -drive file=/tmp/gokr-boot1338272335,format=raw -net nic,macaddr=b8:27:eb:12:34:56 -usb -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios -s -S
qemu-system-x86_64: warning: hub 0 is not connected to host network
VNC server running on ::1:5900
SeaBIOS (version rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org)
@@ -106,5 +106,115 @@
NULL
Booting from Hard Disk...
Booting from 0000:7c00
-VBE mode info request: 100
+In resume (status=0)
+In 32bit resume
+Attempting a hard reboot
[…]
Speaking of working with older software versions, here’s how to start a Docker container with Debian stretch, which contains qemu 2.8, a version in which single-stepping through the MBR works out of the box (bug report regarding more recent versions: https://gitlab.com/qemu-project/qemu/-/issues/141):
Then, on the host:
We can verify the kernel command line is loaded from
To understand the program flow, I set up breakpoints at each function of the bootloader:
The list of functions that are run with the working kernel:
The list of functions that are run with the broken kernel:
So the problem seems to be with loading the kernel from disk.
Broken read_protected_mode_kernel:
Theory: extended mode is limited to 15 MB, and with the stack protector enabled, our kernel now exceeds 15 MB of data to copy. I had previously tried padding the kernel to figure out whether the size plays a role, but I did so at the wrong level: I padded the vmlinuz file in the FAT file system, whereas the relevant size is determined by the kernel header data structure, which contains the number of bytes the bootloader will copy. Here are some resources I found helpful:
I wrote a blog post about this failure: https://michael.stapelberg.ch/posts/2024-02-11-minimal-linux-bootloader-debugging-story/

I’ll close this issue in favor of a tracking bug in the gokrazy repository about longer-term MBR bootloader changes: gokrazy/gokrazy#248
Update: I published a blog post about this issue: https://michael.stapelberg.ch/posts/2024-02-11-minimal-linux-bootloader-debugging-story/
rtr7/kernel#434 fails to boot in qemu and on the PC Engines apu2c4. Notably, the kernel doesn’t even seem to start: no “Decompressing linux” message is printed, and SeaBIOS just tries to boot over and over again.
There seem to be multiple triggering conditions.
Even our current kernel version (6.6.10) fails to boot when built with Debian bullseye instead of Debian buster:
Looking at the versions:
I also tried Debian buster (gcc-8), but with binutils 2.35.2-2 from bullseye, and that still works.
I then tried Debian buster, but with gcc 10 and binutils 2.35.2-2, and the resulting kernel no longer boots.
I suspect the problem is with the minimal MBR bootloader we use (https://github.com/gokrazy/internal/blob/main/mbr/bootloader.asm), because when I tell qemu to boot the Linux kernel directly (without going through SeaBIOS), it boots correctly.
I verified that the printed vmlinuz and cmdline.txt LBAs point to the correct location. I also verified that a working kernel, padded to the size of the non-working kernel, still works correctly, so it seems like the size of the file is not an issue.