Skip to content

pipcet/pearl

Repository files navigation

Pearl

GNU/Linux on Apple M1-based hardware, including, in particular, a MacBook Pro called “pearl”. No relationship to Perl, though we do use that.

Project Status

This project presumably never achieved usability sufficient for people other than myself; while I continue to use it daily, I’m continuing it mostly as a matter of personal education (yes, the dreaded “learning experience” thing) rather than in any realistic hope others will be convinced to use it.

Still, this repository contains everything you need to build and run Debian GNU/Linux on an original M1 machine.

Currently, the following models should work:

  • MacBook Pro 13 inch (2020)
  • MacBook Air 13 inch (2020)
  • Mini (2020) if running macOS 11.x firmware
  • Mini (2020) running macOS Monterey firmware

The following are all not expected to work:

  • 2021 MacBooks (14 inch, 16 inch, M1 Max/Pro)

The hardware I actually have is a MacBook Pro 13 inch (2020) running macOS 12.1 firmware, and a mini currently running macOS 12.4 firmware.

Obligatory Warning

Please consider that this is experimental software and I’m unable to give warranties of any kind, as clearly stated in the respective licenses. Since this is experimental software running on undocumented hardware, there is some extra probability that it may destroy data or, hypothetically so far, render your device unusable.

What?

We distribute a (rather large) Perl script, which can be installed on an M1-based macOS machine in Recovery Mode and which starts a usable Linux kernel, in addition to being able to do other things. The distribution includes a minimal Debian root file system and a copy of the Debian installer.

The kernel uses the work done by Corellium, but based on and including drivers upstreamed by the Asahi Linux project. Hopefully, it will also include further drivers and features absent from the other two projects.

Differences

It would be much nicer to have one project rather than three of them, but there appears to have been a falling out between contributors to the other two projects. The last I’ve heard is that the Asahi Linux project’s main contributor (or benevolent dictator) has decided not to include any code from (or credit for) Corellium in the drivers to be submitted to upstream Linux. In addition, it has been said that even simply looking at those drivers and using them as a source of documentation is frowned upon.

In addition, the FDT device trees used by the Asahi Linux project are going to be very different from the one used by Corellium.

The Asahi Linux project has decided to go with a long chain of complicated and somewhat independent boot loaders as part of their boot process: it has been suggested that this sequence might look like this:

  • iBoot
  • m1n1
  • U-Boot
  • GRUB
  • Linux

By contrast, the Pearl boot process looks like this:

  • iBoot
  • Linux
  • Linux
  • Linux

That means we only need to write device drivers once, and we can use all devices that have a Linux driver.

Testing

Testers wanted! Please contact me at [email protected] if you have M1 hardware and want to try Pearl.

How?

This repository contains somewhat minimalistic code to turn a Linux kernel image (such as arch/arm64/boot/Image) into a Mach-O object file which can be loaded by iBoot. There is some boilerplate code because iBoot does not allow Mach-O kernels to make alignment requests, but apart from that, we jump straight into the kernel.

That initial kernel does not rely on the existence of a complete device tree; instead, it sets up a minimal device tree (but it does so in the kernel, with all library functions available and no need for duplicating them in the boot loader) which contains the ADT range, the kernel launches userspace, userspace code transforms the ADT into an FDT, and userspace finally kexecs a second kernel which makes use of the FDT to talk to hardware.

That second kernel can then perform as much or as little I/O as the user desires, using standard Linux device drivers, to load a third kernel.

For debugging and development, the second kernel can also load something different, such as m1n1 or U-Boot or GRUB (another option would be MacOS, but that don’t work yet), but that’s not what the ordinary boot process would look like.

As a special case, the second kernel can load an updated version of itself; this is important because the first version is signed by kmutil and is difficult to replace, while the second version need not be and can be provided over the net straight from a development machine.

What is installed using kmutil is a Mach-O containing:

  • minimal boilerplate code to realign the image
  • a Linux kernel image for the first kernel, containing:
    • an initrd; containing:
      • a Linux kernel image for the second kernel
      • tools to manipulate the device trees
      • kexec, to execute the second kernel
      • [optionally] a default Linux kernel image for the third kernel
      • [optionally] code for a boot menu
      • [optionally] lvm2, cryptsetup, etc.
      • [optionally] m1n1, as a Mach-O file
      • [optionally] U-Boot, as a Linux image file
      • [optionally][some day] the MacOS kernel, as a Mach-O file

Interaction

The first kernel does not have device drivers and cannot interact with the user in any meaningful fashion; it cannot even reboot the machine.

The second kernel does have device drivers, so it can be controlled using the keyboard; it also presents a USB gadget on one of the USB Type C ports which can be used to control it or upload a third-stage kernel. Unfortunately, I do not have hardware with fixed Ethernet connectivity, but it would certainly make sense to boot over the network. Booting over WiFi should also be supported at some point, though it requires additional blobs.

initramfs

There is a single compressed initramfs included in the Mach-O image. The second kernel is launched with a second initramfs which is constructed by the first kernel, and mostly identical to the first initramfs. The third kernel is also used with an initramfs, which is provided along with that kernel.

Blobs

Unfortunately, the WiFi module requires firmware which currently needs to be copied from the MacOS installation. For legal reasons, we can’t provide it.

Similarly, the MacOS kernel itself cannot be redistributed.

Furthermore, WiFi requires knowing the ESSID and passphrase for a network, and asking for that on every boot is annoying.

So I’m working on a facility to include an arbitrary “blobs” tarball containing those files, to be created by the user as part of the build process.

SMP

The protocol used to start secondary CPU cores on M1 machines cannot be restarted in the obvious fashion: once the CPU is running, it can never go back to behaving as it did initially.

It’s possible to virtualize this, or otherwise expose a mechanism for a started CPU to go back to a spin table whence it can jump into another kernel, but that requires code duplication.

Instead, Pearl simply leaves the starting of the secondary CPUs to the final kernel, leaving them in the same state as iBoot. That means less performance for the first stages, but that’s not expected to become a problem.

Why Not?

The Pearl images do not use m1n1 or U-Boot as part of the normal boot process. There are many reasons for this, but the main reason is that the historic reasons for the existence of boot loaders do not apply to the M1 platform.

Objections

“Hardware initialization belongs in the boot loader”

No. It doesn’t. Linux should accept hardware in whatever reasonable state it’s in, and the boot loader should be free to leave hardware in any reasonable state.

“Converting the Apple Device Tree to the Linux FDT should happen in the boot loader”

No. It’s a non-trivial task, requiring code which pulls in many dependencies (to do it properly, at least). It’s best left to userspace, and it’s entirely possible to launch the initial userspace without an FDT.

“Without the FDT, we don’t even know where the frame buffer is”

The minimal device tree set up by the kernel itself is based on the boot args structure which does specify the frame buffer parameters.

“Your Mach-O files are too large”

It’s true that they are larger than they would otherwise be, but I don’t think that’s relevant at this point. A complete image will be somewhere in the 30-40 MB range, still much less than MacOS. Images which contain Debian root file systems are obviously larger.

“Your code can’t use printf”

There is no code that would need to. All we do is realignment (and that’s an unfortunate iBoot limitation), then we’re in the kernel image and use whatever printing functions are enabled there.

“Your code can’t show a logo”

I consider that a good thing.

“We need to be able to use the same kernel image with different boot loader binaries”

I don’t understand this point at all. Changing a kernel should be trivial, it’s changing the boot loader that is cumbersome and hard to do, requiring physical interaction with the MacOS Recovery Mode. We should minimize having to do that.

“We shouldn’t support dual-booting because iBoot does”

I see absolutely no reason to cooperate with the Apple boot process more than necessary. The right approach here is to install a single Mach-O “kernel” image, then never touch it again. Long-pressing the power button is annoying and unnecessary; it should not be required for ordinary day-to-day use of MacOS and Linux.

“We need m1n1 to experiment with hardware”

We do, which is why you can kexec m1n1.

See Also

Works, but appears to have been abandoned for now.

Still in the early stages.

This works well enough to load OpenBSD from disk, apparently!

Contact

Feel free to contact me at [email protected], on GitHub, or in any of the other usual ways. So far, there has been too little communication rather than too much of it.

Technical things

Device Tree representations

Both Linux FDT device trees and Apple ADT device trees represent hierarchies of nodes containing other nodes and leaf properties; each property has a name and contains a sized array of untyped data.

I’ve found it convenient to represent them in a simple text format containing lines like this:

top.middle.lower.property = <0x12345678>

It’s much easier to manipulate such lines using standard GNU/Linux utilities.

Binary parsing

We need to parse, without including too many dependencies, various binary data:

  • Mach-O images
  • the bootargs structure
  • ADT
  • FDT
  • Linux image files

I’ve decided on using a simple ad-hoc perl solution for that, rather than including Python in the initramfs. It is, however, a little nicer than the built-in unpack function: ADTs are represented as:

sub adtnode() {
    struct [
	count(props => u32),
	count(nodes => u32),
	props => repeat(\&adtprop),
	nodes => repeat(\&adtnode),
	]
}

sub adtprop() {
    struct [
	name => string 32,
	size => size(data => u32),
	data => data,
	align(4)
	]
}

Compression

I’m trying to compress everything once only, but currently the initial kernel image is actually uncompressed; the initramfs is compressed, though.

USB Gadget

The second stage presents a USB gadget exposing an ACM interface (which is piped to a shell) and a mass storage interface (which can be used to write an image to be unpacked and executed). There’s also an Ethernet device, but that’s not used yet.

ACM “protocol”

Piped directly to a shell.

Commfile protocol

Quite simple, but sophisticated enough to prevent writing to random USB devices, and also to ensure writes are not reordered to the point where we try to boot a partially-loaded kernel.

The commfile is currently limited to 1 GB.

kexec

We’re using kexec-tools without any additional modifications, but the kernel includes patches by @mzyngier to properly exclude reserved memory ranges as possible locations for kernel images.

CI/CD

We’re reusing some CI/CD scripts I’ve written for other occasions, so artifacts (one including Debian, one not including it) are produced automatically by pushes to the main branch, and releases containing those files are produced automatically by pushes to the release branch. That means that the precise files included in the automatic releases haven’t been tested, and often the similar artifact versions haven’t been tested, either.

ADT Tunables

There are a number of ADT properties representing, in one of several ad-hoc formats, “tunables”, which describe bits to be set and cleared in MMIO space. The Corellium pre-loader code translates those to a common format to be applied by Linux drivers, increasing the number of ad-hoc tunable representations by one.

We do the same thing, but we use userspace code rather than pre-loader code.

One particular issue is the existence of fuse maps which represent values to be copied from one register to another. What the Corellium code does is to read the source registers at pre-loader time; our current code reads them from userspace using /dev/mem. Ideally, we’d read them only when they’re actually applied, but that would necessitate yet another ad-hoc format to encode them.

Boot matrix

Is a direct boot supported from <row> to <column>?

iBootstage2m1n1U-BootGRUBLinux
iBootnoyesyesnonoyes
stage2noyesyesyesnoyes
m1n1noyesyesyesnoyes
U-Bootnoyesno?yesyes
GRUBnoyes*nononoyes*
Linuxnononononono

(* - requires EFI_STUB build, untested)

Problems

macho size

Yes, I’m aware that the macho files are currently significantly larger than the final estimate given above. I need to find time to investigate this; for the time being, people might have to simply accept the larger files.

USB ports

The main problem, right now, is that USB ports cannot be switched freely between host, gadget, and power modes. Right now, it is fixed that the first port is always in power mode, the second port is in gadget mode during the second stage and in host mode otherwise.

This will need to be changed to support Macs with USB keyboards, though those tend to have enough USB ports that this shouldn’t be a problem.

Backlight brightness level

MacOS appears to have a bug which makes it reset the backlight level to “very dim” when it is booted. With MacOS 12, it is no longer possible to use nvram to set the right level in Recovery Mode. There is an experimental driver in the tree to set backlight brightness in the usual way.

That boot chime

It’s possible to disable the somewhat annoying boot chime using nvram or the MacOS configuration utilities.

Page size

We’re currently using 16 KB pages, while most distribution kernels use 4 KB pages. The CPU’s MMU supports using 4 KB pages, but it appears the IOMMU does not, and the Linux code assumes identical page sizes for both.

x8r8g8b8

When initialized by iBoot, the framebuffer is in x2r10g10b10 mode, which works fine for the boot loaders and Linux but isn’t supported by X.org. Putting the framebuffer into x8r8g8b8 mode works, but it means colors will be off either before or after the switch. We currently have code to switch the framebuffer to x8r8g8b8 mode early in the boot process.

Chromium

Chromium currently does not appear to work.

Contact

Feel free to contact me at [email protected], on GitHub, or in any of the other usual ways. So far, there has been too little communication rather than too much of it.