[RFC] arm64 SDK #319

Open
1 of 2 tasks
vbatts opened this issue Jan 15, 2021 · 25 comments

@vbatts
Member

vbatts commented Jan 15, 2021

Current situation
The Flatcar SDK is for an amd64 host. All arm64 images and binaries are cross-compiled.

Impact
Powerful arm64 machines are now readily available, but it is currently not possible to use them for builds.

Ideal future situation
A flatcar arm64 image can be built from an arm64 host.

TODO

  • Provide SDK for ARM64
  • Integrate ARM64 SDK into release pipeline and release SDK tarballs (and possibly SDK containers)
@vbatts vbatts added the kind/feature A feature request label Jan 15, 2021
@pothos
Member

pothos commented Jan 18, 2021

A first step would be to align the USE flags and related package setup so that they are either identical on both architectures or independent of the architecture.
The SDK bootstrap needs a seed tarball with a compiler. I guess the arm64 developer container can be used for that.

@jepio
Member

jepio commented Jul 16, 2021

So I managed to get it working with the arm64 developer container as a base. However, I would not recommend trying that a second time: the developer container has the wrong profile and CHOST, making the process trickier than necessary to begin with. After that it's all about keywords and fixing x86-specific assumptions.

The built SDK can be fetched from https://jepio.blob.core.windows.net/flatcar-arm64/stage4-arm64-2920.0.0+2021-07-13-1007.tar.bz2 https://jepio.azureedge.net/flatcar-arm64/2942.0.0/flatcar-sdk-arm64-2942.0.0+2021-07-27-0724.tar.bz2. The binary packages, intermediate stages and my hacked up developer container can be found in the same bucket (https://jepio.blob.core.windows.net/flatcar-arm64?restype=container&comp=list).

I believe most of the changes necessary to get the bootstrap working (starting from a reasonable seed) can be merged. I'll submit the PRs in the coming weeks.

@vbatts
Member Author

vbatts commented Jul 23, 2021

@jepio that's awesome! that was the path I began, but abandoned when I was trying from my pinebook. Looking forward to it being an option

@jepio
Member

jepio commented Jul 23, 2021

Here are some more instructions on how to use this, @dongsupark:

git clone https://github.com/kinvolk/mantle
pushd mantle
./build cork
# move to somewhere on your PATH
popd

# pull my key
gpg --keyserver hkps://keys.openpgp.org --recv-keys 3717D1B5C719A9BD

mkdir -p flatcar-sdk/.cache/sdks/
pushd flatcar-sdk
latest=$(curl https://jepio.azureedge.net/flatcar-arm64/latest.txt | awk '{ print $1 }')
wget -O .cache/sdks/flatcar-sdk-arm64-2942.0.0.tar.bz2 "https://jepio.azureedge.net/flatcar-arm64/${latest}"
wget -O .cache/sdks/flatcar-sdk-arm64-2942.0.0.tar.bz2.sig "https://jepio.azureedge.net/flatcar-arm64/${latest}.sig"
# export the key pulled above, selected by its key ID
gpg --armor --export 3717D1B5C719A9BD >key.gpg
cork create --sdk-version 2942.0.0 --verify-key key.gpg
cork enter
# the remaining commands run inside the SDK chroot
git -C ~/trunk/src/scripts/ checkout jepio/arm64-sdk-support
git -C ~/trunk/src/third_party/coreos-overlay/ checkout jepio/arm64-sdk-support
./update_chroot
./bootstrap_sdk --seed_tarball /mnt/host/source/.cache/sdks/flatcar-sdk-arm64-2942.0.0.tar.bz2

@pothos
Member

pothos commented Jul 23, 2021

Currently the SDK sets up QEMU process emulation for arm64. Now you would need to set up the reverse direction, because during compilation of some packages the build system has to run some amd64 binaries.

The key action is setting QEMU_LD_PREFIX. The rest can actually be removed: the normal Debian/Fedora packaging of qemu-user loads the qemu binary into RAM on boot, so we don't need to set it up from the SDK with dynamic loading, which does not cover all cases.
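
A minimal sketch of that idea, assuming the host distribution's qemu-user packages have already registered the foreign binary format with binfmt_misc. The helper names and the /build/<board> sysroot layout are assumptions for illustration, not taken from the SDK scripts:

```shell
# Hypothetical helper: map a board name to the sysroot in which qemu-user
# should look for the foreign dynamic loader and libraries.
# The /build/<board> layout is an assumption.
sysroot_for_board() {
    local board="$1"
    echo "/build/${board}"
}

# Hypothetical helper: run a command with QEMU_LD_PREFIX pointing at that
# sysroot. Actually invoking qemu for foreign binaries is left to the
# host's binfmt_misc registration.
run_foreign() {
    local board="$1"
    shift
    QEMU_LD_PREFIX="$(sysroot_for_board "${board}")" "$@"
}
```

On an arm64 host this might be used as `run_foreign amd64-usr /build/amd64-usr/usr/bin/some-tool`, where the tool path is purely illustrative.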

@jepio
Member

jepio commented Jul 27, 2021

I agree.

It doesn't look like it's going to be as "easy" as cross-compiling arm64 from amd64 though, because some programs that are called from the SDK for image building (syslinux, x86-specific parts of grub) can't be built natively for arm64.

My first objective is to get arm64 -> arm64 working, and get us into a state where we can support the SDK infrastructure on our servers. Without that, it's a pain to use.

@dongsupark
Member

dongsupark commented Aug 11, 2021

In general it looks good.
Following @jepio's instructions, I was able to create an arm64 SDK on an arm64 host (actually an arm64 VM on a Mac M1).
From that tarball, I created another flatcar-sdk environment.
Inside that, I was able to successfully build an arm64 qemu image, with fantastic speed.

However some hacks or tweaks are needed.
I am just listing them all; some are probably just mistakes in my own testing.

  • mantle/cork needed a code change to skip verifying the gpg signature. Not sure why.
  • mantle/cork needed a code change to skip downloading SDK tarballs, because arm64 tarballs are not on flatcar-linux.net yet.
  • build_image failed with ERROR : Required WHITE_LIST items ld-linux-x86-64.so.2 not found!!!. That's because WHITELIST lists only the x86_64 linker. I am not sure how we should support multi-arch there.
  • After creating a qemu image, running flatcar_production_qemu_uefi.sh fails, because the script simply assumes the host is x86_64, e.g. accel=kvm.
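
For the WHITELIST point, one possible direction is to derive the dynamic loader name from the target architecture instead of hard-coding the x86_64 one. The function below is a hypothetical sketch, not code from the scripts repository; the two file names are the standard glibc loader names for these architectures:

```shell
# Hypothetical helper: print the glibc dynamic loader file name for a
# target architecture, or fail for architectures it does not know.
dynamic_linker_for_arch() {
    case "$1" in
        amd64|x86_64)  echo "ld-linux-x86-64.so.2" ;;
        arm64|aarch64) echo "ld-linux-aarch64.so.1" ;;
        *)
            echo "unsupported arch: $1" >&2
            return 1
            ;;
    esac
}
```

A whitelist could then include `$(dynamic_linker_for_arch "${arch}")` in place of the fixed entry, where `arch` is whatever variable the calling script uses for the target architecture.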

@pothos
Member

pothos commented Aug 11, 2021

For the last point: the script already checks for an arm64 host but it assumes that KVM is available, maybe you can enable nested virtualization for the host VM?

@jepio
Member

jepio commented Aug 11, 2021

Nested virt is not available on M1 right now.

@dongsupark are you using qemu for virtualization? You could try to adapt the script to get it working when run from the host. Probably some of the helper tools are not available, and instead of accel=kvm you need accel=hvf.
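
A rough sketch of how a launcher script could pick the accelerator instead of assuming KVM. The function name and its arguments are hypothetical and this is not what flatcar_production_qemu_uefi.sh currently does; it falls back to TCG (software emulation) when no hardware acceleration is detected:

```shell
# Hypothetical helper: choose a QEMU accelerator.
# $1: host OS as printed by `uname -s` (defaults to the current host)
# $2: path of the KVM device node (defaults to /dev/kvm)
pick_accel() {
    local os="${1:-$(uname -s)}"
    local kvm_dev="${2:-/dev/kvm}"
    if [ "${os}" = "Linux" ] && [ -w "${kvm_dev}" ]; then
        echo "kvm"
    elif [ "${os}" = "Darwin" ]; then
        # Assumes Hypervisor.framework is usable on this macOS host.
        echo "hvf"
    else
        echo "tcg"
    fi
}
```

It could be used as `qemu-system-aarch64 -machine virt,accel="$(pick_accel)" ...`. Inside a Linux guest without nested virtualization, /dev/kvm is absent, so this would select tcg.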

@dongsupark
Member

> are you using qemu for virtualization? You could try to adapt the script to get it working when ran from the host. Probably some of the helper tools are not available, and instead of accel=kvm you need accel=hvf.

I am using UTM, basically a wrapper around qemu.
UTM is already passing accel=hvf as expected.
Still inside the guest Linux VM, the Flatcar script fails.
Anyway, don't worry, that's not super critical. I can find other options for testing qemu images. ;-)

@dongsupark
Member

As for the multi-arch issue in generate_au_zip, I ended up writing a PR for it: flatcar/scripts#141

@jepio
Member

jepio commented Apr 12, 2022

I think this is what was needed to create a "seed" from a development container https://gist.github.com/jepio/7ee539b768f7a33953d137d0ff7c6abe.

@chewi
Contributor

chewi commented May 22, 2024

I've had a fresh go at achieving this.

Flatcar has been using Catalyst 3, but Gentoo have masked this now in favour of 4.0-rc1. One benefit of the new version is that it can leverage qemu-user to build for other architectures. Updating Flatcar's scripts for Catalyst 4 was quite tricky, as a lot has changed, but we were going to have to do it sooner or later.

I then kicked off a build using a vanilla Gentoo arm64 stage3 as a seed. It turned out that the seed needed git installed because of cros-workon.eclass, so I added that, although a fallback for when git is not installed yet might be a good idea.

I made it past stage1 before hitting a bug in Catalyst. I've now fixed this and am facing some USE conflicts involving curl, openssl, and rust. I'm not sure why, as it even happens for a native amd64 build using a Flatcar seed, but seemingly only with Catalyst 4. I'm still investigating.

@ader1990 also expressed interest in a riscv build, which I would like to see. It should be possible to use the same approach once a riscv Portage profile has been created, although a lot of the scripts have code like "if amd64, do thing natively, else if arm64, do thing via QEMU". These will need improving for the SDK to actually be usable.
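
One way to make such checks architecture-neutral is to compare the host and target architectures rather than naming amd64 and arm64 explicitly. This is a hypothetical sketch and the function names are not from the Flatcar scripts:

```shell
# Hypothetical helper: normalise `uname -m` style names to Portage-style
# architecture names; unknown names are passed through unchanged.
normalize_arch() {
    case "$1" in
        x86_64|amd64)  echo "amd64" ;;
        aarch64|arm64) echo "arm64" ;;
        riscv64|riscv) echo "riscv" ;;
        *)             echo "$1" ;;
    esac
}

# Hypothetical helper: succeeds when binaries for the target ($2) can run
# on the host ($1) without emulation.
is_native() {
    [ "$(normalize_arch "$1")" = "$(normalize_arch "$2")" ]
}
```

A script could then branch on `if is_native "$(uname -m)" "${target_arch}"; then ... natively; else ... via QEMU; fi`, which keeps working when a third architecture is added.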

@chewi
Contributor

chewi commented May 22, 2024

Figured it out. It's not a new problem. I'd been doing bootstrap_sdk stage2 because I'd already built stage1, but that causes it to start from the latest SDK, not the stage1 you built earlier. Simply doing bootstrap_sdk would have skipped rebuilding stage1 anyway unless I'd added --rebuild.

@chewi
Contributor

chewi commented May 22, 2024

Damn, I've been caught out by the vanilla Gentoo seed and our package tree not quite aligning. Specifically, Perl cannot find libcrypt.so.2, part of libxcrypt. Gentoo migrated to libxcrypt 2½ years ago, so we're quite behind there.

@dongsupark
Member

@chewi Looks like this is a good opportunity to resurrect the open PR flatcar/scripts#1732.

@ader1990

> I think this is what was needed to create a "seed" from a development container https://gist.github.com/jepio/7ee539b768f7a33953d137d0ff7c6abe.

I tried to run the workflow accordingly and I got this error when running this command:

./run_sdk_container -x "ci-cleanup.sh" -C flatcar-sdk-import:${VERSION} sudo -E ./bootstrap_sdk --seed_tarball flatcar-sdk-arm64-${VERSION}.tar.bz2

>>> Failed to emerge app-misc/ca-certificates-3.82 for /tmp/stage1root/, Log file:
>>>  '/mnt/host/source/src/build/catalyst/log/app-misc:ca-certificates-3.82:20240528-112105.log'
>>> Installing (122 of 127) virtual/tmpfiles-0-r1::portage-stable to /tmp/stage1root/
--2024-05-28 11:21:06--  http://mirror.release.flatcar-linux.net/portage-stable/distfiles/layout.conf
Resolving mirror.release.flatcar-linux.net... 147.75.87.17
Connecting to mirror.release.flatcar-linux.net|147.75.87.17|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:06 ERROR 404: Not Found.

!!! Couldn't download '.layout.conf.mirror.release.flatcar-linux.net'. Aborting.
--2024-05-28 11:21:06--  http://mirror.release.flatcar-linux.net/portage-stable/distfiles/nss-3.82.tar.gz
Resolving mirror.release.flatcar-linux.net... 147.75.87.17
Connecting to mirror.release.flatcar-linux.net|147.75.87.17|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:07 ERROR 404: Not Found.

--2024-05-28 11:21:07--  http://mirror.release.flatcar-linux.net/coreos/distfiles/layout.conf
Resolving mirror.release.flatcar-linux.net... 147.75.87.17
Connecting to mirror.release.flatcar-linux.net|147.75.87.17|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:07 ERROR 404: Not Found.

!!! Couldn't download '.layout.conf.mirror.release.flatcar-linux.net'. Aborting.
--2024-05-28 11:21:07--  http://mirror.release.flatcar-linux.net/coreos/distfiles/nss-3.82.tar.gz
Resolving mirror.release.flatcar-linux.net... 147.75.87.17
Connecting to mirror.release.flatcar-linux.net|147.75.87.17|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:08 ERROR 404: Not Found.

--2024-05-28 11:21:08--  http://distfiles.gentoo.org/distfiles/d7/nss-3.82.tar.gz
Resolving distfiles.gentoo.org... 195.181.175.40, 212.102.56.179, 156.146.33.138, ...
Connecting to distfiles.gentoo.org|195.181.175.40|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-05-28 11:21:08 ERROR 404: Not Found.

--2024-05-28 11:21:08--  ftp://ftp.mozilla.org/pub/mozilla.org/security/nss/releases/NSS_3_82_RTM/src/nss-3.82.tar.gz
           => '/var/gentoo/distfiles/nss-3.82.tar.gz.__download__'
Resolving ftp.mozilla.org... 34.117.35.28
Connecting to ftp.mozilla.org|34.117.35.28|:21... failed: Connection timed out.
Retrying.

--2024-05-28 11:22:10--  ftp://ftp.mozilla.org/pub/mozilla.org/security/nss/releases/NSS_3_82_RTM/src/nss-3.82.tar.gz
  (try: 2) => '/var/gentoo/distfiles/nss-3.82.tar.gz.__download__'

I tried updating the branch to use an nss source URI without the ftp://, but the build still did not succeed. It seems the branch code is not used; instead, the already-built image is used: https://alpha.release.flatcar-linux.net/arm64-usr/3510.0.0/flatcar_developer_container.bin.bz2.

@jepio
Member

jepio commented May 29, 2024

The old sources are likely gone from the mirrors. Let me see if this is something that can be revived.

@chewi
Contributor

chewi commented May 29, 2024

I made it past stage3, but I figured building Rust and such was going to take forever under QEMU. I had another idea to run this on arm64 hardware, by emulating the SDK (i.e. Catalyst) and doing the actual building natively. It's already racing ahead of where the other build had got to. Shouldn't be long now.

@jepio
Member

jepio commented May 29, 2024

@chewi: if we're switching to Catalyst 4, then it would be a good idea to switch the amd64 SDK first, validate that everything is still correct, and then go for arm64. I can also get you access to a shiny Azure Cobalt instance for building.

@jepio
Member

jepio commented May 29, 2024

I've pushed two sdk container images:
ghcr.io/jepio/flatcar-sdk-arm64/flatcar-sdk-arm64:3941.0.0-2024-05-29-1223
ghcr.io/jepio/flatcar-sdk-arm64/flatcar-sdk-tarball:3941.0.0-2024-05-29-1223 <- catalyst output

I hit an issue cross-building a native toolchain for amd64: CET support in the toolchain (enabled by the amd64 hardened profile) has a build dependency on binutils[cet], and cet is only unmasked for amd64 profiles. I would ignore cross-compiling from amd64->arm64; there is no use case for it and it's not something anyone has thought of doing in the OSS world.

@chewi
Contributor

chewi commented May 29, 2024

Yeah, switching to 4 first might be best before any official arm64 SDK release, but the result of this should be good enough to kick off an entirely native build once we've done that.

I believe CET refers to some possibly amd64-specific CPU feature. From a Gentoo perspective, I'd like cross-compiling with CET to work, so I'll look into avoiding the mask when using crossdev.

@chewi
Contributor

chewi commented May 29, 2024

Bah, it failed quite late on with sys-block/thin-provisioning-tools. Seems like some broken CoreOS Rust/cargo cross-compiling logic.

@chewi
Contributor

chewi commented Jul 5, 2024

I got it working. Closing in favour of flatcar/scripts#2093.

@chewi chewi closed this as completed Jul 5, 2024
@jepio
Member

jepio commented Jul 8, 2024

This is the tracking issue; we keep it open until the PR lands.

@jepio jepio reopened this Jul 8, 2024