Fix the bootloader hanging during uefi exit_boot_services #457

Wasabi375 · 2024-09-17T05:23:49Z

Fixes #400

Switch from using the "-bios" flag to using "-pflash" instead.
Using "-bios" is not recomended see:
https://lists.katacontainers.io/pipermail/kata-dev/2021-January/001650.html

This also allows us to easily update the ovmf-prebuild binaries.

This fixes #400. I'm not sure whether the issue was an outdated ovmf
version or the bios-pflash change. I can't tell what version is
published with the ovmf-prebuild crate, so I can't check.

Ovmf binaries are no longer provided by the ovmf-prebuild crate.
Instead we now download them from the ovmf-prebuild github releases of
the same crate. Unlike the crate, those are actually updated.

Tests will now fail with a timeout error after 2 minutes. That should bee more than enough to run any test.

This also switches to the new API of the uefi crate that was introduced in version 0.31. The old api was deprecated in 0.32 and will be removed in 0.33. See https://github.com/rust-osdev/uefi-rs/blob/ac19526656953c32e8e0ef7bc703f7700b151ae4/docs/funcs_migration.md and rust-osdev/uefi-rs#893

Switch from using the "-bios" flag to using "-pflash" instead. Using "-bios" is not recomended see: https://lists.katacontainers.io/pipermail/kata-dev/2021-January/001650.html This also allows us to easily update the ovmf-prebuild binaries. This fixes rust-osdev#400. I'm not sure whether the issue was an outdated ovmf version or the bios-pflash change. I can't tell what version is published with the ovmf-prebuild crate, so I can't check. Ovmf binaries are no longer provided by the ovmf-prebuild crate. Instead we now download them from the ovmf-prebuild github releases of the same crate. Unlike the crate, those are actually updated. Fixes rust-osdev#400

Wasabi375 · 2024-09-17T05:27:12Z

The test/runner/src/ovmf.rs file is based on https://github.com/rust-osdev/uefi-rs/blob/ac19526656953c32e8e0ef7bc703f7700b151ae4/xtask/src/qemu.rs#L65

I am 99% certain that this fixes the issue in #400. However I want to implement the same fix in my own kernel to make sure.

I also still need to update https://github.com/rust-osdev/bootloader/blob/main/docs/create-disk-image.md

Wasabi375 · 2024-09-17T06:51:15Z

Ok, I can confirm that using pflash works on my machine. I don't really understand why the CI tests fails. Maybe someone else can try this fix on their machine and give me feedback. The CI log is not really helpful.

Wasabi375 · 2024-09-17T06:54:05Z

I wonder if we should provide most of the code in test/runner/src/ovmf.rs as a library. Maybe as an updated version of ovmf-prebuild.
Otherwise this is a lot of code that needs to be copied into any project using the bootloader.

bjorn3 · 2024-09-17T07:50:54Z

tests/runner/src/lib.rs

@@ -105,6 +108,8 @@ where
        "-display",
        "none",
        "--no-reboot",
+        "-smp",
+        "8",


Blog os at least doesn't support more than 1 core.

Not sure why that matters here. This is just for the tests in this repo and has no impact on any kernel build with this bootloader. Also correct me if I am wrong but Blog Os should work fine on a multicore system. It will use just 1, but the other cores don't do anything unless specifically started by the kernel.

bjorn3 · 2024-09-17T07:54:17Z

tests/runner/src/lib.rs

+            .kill()
+            .expect("Qemu could not be killed after timeout");
+        panic!("Test timed out after 2 minutes");
+    };


Rust's default test runner never times out. At worst it shows a warning after 60s. A timeout means that if you suspend the testing for longer than 120s using ctrl-z, you get an unnnecessary test failure. Also depending on how complex the tests of your OS are you may genuinely need more than 120s on low-end hardware. On local systems you can always kill the test manually and on CI there should be a timeout from the CI provider anyway.

I don't mind removing this. I don't think the crtl-z case is common, but maybe that just comes down to preference. I never use that myself.
I mostly added this so I could easily run the tests locally in the background and get notified about failures.
This bug is random in nature so I ran the tests locally in a loop so I needed some kind of failure state.

As for CI the timeout is 30minutes and that seems excessive. Especially when like here the CPU is maxed out by the hanging test.

Also depending on how complex the tests of your OS are you may genuinely need more than 120s on low-end hardware.

That might actually explain the failing CI tests. So maybe I should remove this.

bjorn3 · 2024-09-17T07:59:30Z

tests/runner/Cargo.toml

+sha2 = "0.10.8"
+tar = "0.4.41"
+lzma-rs = "0.3.0"
+tempfile = "3.12.0"


Maybe this downloading logic should be added to the ovmf-prebuilt repo instead. There is already a stub crate for this in that repo anyway.

Related: rust-osdev/ovmf-prebuilt#38, nicholasbishop/ovmf-prebuilt@73d3492.

I agree. Maybe that should be part of a separate PR though.

FYI this is available now: https://crates.io/crates/ovmf-prebuilt/0.2.1

bjorn3 · 2024-09-17T08:01:49Z

tests/runner/src/omvf.rs

+
+        // Make a copy of the OVMF vars file so that it can be used
+        // read+write without modifying the original. Under AArch64, some
+        // versions of OVMF won't boot if the vars file isn't writeable.


Maybe reading the file and writing it to a new file would work to avoid an explicit permission change?

I'm not sure what you mean. I am creating a copy of the original file. Or are you referring to this?

// Necessary, as for example on NixOS, the files are read-only inside // the Nix store. #[cfg(target_os = "linux")] fs_err::set_permissions(&ovmf_vars, Permissions::from_mode(0o666))?;

I copied that from the uefi crates tests. I did not really look into this too much. I assume they had some issues that meant they needed to add this.

If you use fs::copy you make a copy of the file including pernissions and xattrs. It seems that here a copy of just the file contents ignoring file permissions would be a better option. That can be done by reading the file and writing it again.

I'm not convinced that would be any cleaner though. A simple call to set_permissions is cleaner in my opinion.
Also this way we get some possible advantages from COW file systems, etc.

At the very least make it #[cfg(unix)] then.

tests/runner/Cargo.toml

bjorn3 · 2024-09-17T08:03:27Z

tests/runner/src/omvf.rs

+use {std::fs::Permissions, std::os::unix::fs::PermissionsExt};
+
+/// Name of the ovmf-prebuilt release tag.
+const OVMF_PREBUILT_TAG: &str = "edk2-stable202311-r2";


This is a pretty outdated version, right?

Yes. It's about a year old. I chose this version as it is the one the uefi crate uses in their tests.
We can probably update this. It just wasn't a priority.

I just tested the latest tarball version(edk2-stable202405-r1) released to https://github.com/rust-osdev/ovmf-prebuilt.

It breaks the uefi pxe tests. I don't really know enough about pxe to debug this, but it looks like it does not find the bootloader and tries to start a uefi terminal.

Wasabi375 · 2024-09-18T04:55:28Z

Do you know how I can get the actual logs from qemu in the CI ? Right now the tests time after 30 minutes but I can't reproduce this locally.

bjorn3 · 2024-09-18T10:48:05Z

Cargo.lock

+
+[[package]]
+name = "windows-targets"
+version = "0.52.6"


This is now pulling in an extra copy of windows-targets. Maybe update the async-process dependency of bootloader to 2.3.0, which uses windows-targets 0.52.6 too?

Wasabi375 added 3 commits September 16, 2024 22:30

Run test with 8 processors

b8e2b9b

Tests will now fail with a timeout error after 2 minutes. That should bee more than enough to run any test.

Wasabi375 marked this pull request as ready for review September 17, 2024 06:51

bjorn3 reviewed Sep 17, 2024

View reviewed changes

tests/runner/Cargo.toml Outdated Show resolved Hide resolved

bjorn3 reviewed Sep 17, 2024

View reviewed changes

Wasabi375 added 2 commits September 18, 2024 01:14

Remove forced timeout in tests

034ddb6

Remove error handling dependencies from tests

1ef3241

bjorn3 reviewed Sep 18, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix the bootloader hanging during uefi exit_boot_services #457

Fix the bootloader hanging during uefi exit_boot_services #457

Wasabi375 commented Sep 17, 2024

Wasabi375 commented Sep 17, 2024

Wasabi375 commented Sep 17, 2024

Wasabi375 commented Sep 17, 2024

bjorn3 Sep 17, 2024

Wasabi375 Sep 17, 2024

bjorn3 Sep 17, 2024

Wasabi375 Sep 17, 2024

bjorn3 Sep 17, 2024

Freax13 Sep 17, 2024

Wasabi375 Sep 17, 2024

nicholasbishop Nov 7, 2024

bjorn3 Sep 17, 2024

Wasabi375 Sep 17, 2024

bjorn3 Sep 17, 2024 •

edited

Loading

Wasabi375 Sep 18, 2024

bjorn3 Sep 18, 2024

bjorn3 Sep 17, 2024

Wasabi375 Sep 17, 2024

Wasabi375 Sep 18, 2024

Wasabi375 commented Sep 18, 2024

bjorn3 Sep 18, 2024

Fix the bootloader hanging during uefi exit_boot_services #457

Are you sure you want to change the base?

Fix the bootloader hanging during uefi exit_boot_services #457

Conversation

Wasabi375 commented Sep 17, 2024

Wasabi375 commented Sep 17, 2024

Wasabi375 commented Sep 17, 2024

Wasabi375 commented Sep 17, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bjorn3 Sep 17, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Wasabi375 commented Sep 18, 2024

Choose a reason for hiding this comment

bjorn3 Sep 17, 2024 •

edited

Loading