System76 Thelio Astra Workstation #53

Open
geerlingguy opened this issue Oct 22, 2024 · 9 comments

Comments

@geerlingguy
Owner

geerlingguy commented Oct 22, 2024

Basic information

  • Board URL (official): https://system76.com/desktops/thelio-astra
  • Board purchased from: (Provided by System76 for review)
  • Board purchase date: October 22, 2024
  • Board specs (as tested): Ampere Altra Max M128-30, 512 GB ECC DDR4-3200, Nvidia A402
  • Board price (as tested): $3,299 (base), TBD (as tested)

Linux/system information

# output of `screenfetch`
                          ./+o+-       jgeerling@thelio-astra
                  yyyyy- -yyyyyy+      OS: Ubuntu 24.04 noble
               ://+//////-yyyyyyo      Kernel: aarch64 Linux 6.8.0-47-generic
           .++ .:/++++++/-.+sss/`      Uptime: 3m
         .:++o:  /++++++++/:--:/-      Packages: 1872
        o:+o+:++.`..```.-/oo+++++/     Shell: bash 5.2.21
       .:+o:+o/.          `+sssoo+/    Disk: 15G / 101G (16%)
  .++/+:+oo+o:`             /sssooo.   CPU: ARM Neoverse-N1 @ 128x 3GHz
 /+++//+:`oo+o               /::--:.   GPU: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
NVIDIA Corporation Device 25b2 (rev a1)
 \+/+o+++`o++o               ++////.   RAM: 13063MiB / 514300MiB
  .++.o+++oo+:`             /dddhhh.  
       .+.o+oo:.          `oddhhhh+   
        \+.++o+o``-````.:ohdhhhhh+    
         `:o+++ `ohhhhhhhhyo++os:     
           .o:`.syhhhhhhh/.oo++o`     
               /osyyyyyyo++ooo+++/    
                   ````` +oo+++o\:    
                          `oo++.  

# output of `uname -a`
jgeerling@thelio-astra:~$ uname -a
Linux thelio-astra 6.8.0-47-generic #47-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 22:03:50 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Benchmark results

CPU

Power

  • Idle power draw (at wall): 103 W (7 W system shutdown, BMC running)
  • Maximum simulated power draw (stress-ng --matrix 0): TODO W
  • During Geekbench multicore benchmark: TODO W
  • During top500 HPL benchmark: TODO W

Disk

MANUFACTURER_AND_MODEL_OF_DISK_HERE

| Benchmark                  | Result    |
| -------------------------- | --------- |
| iozone 4K random read      | TODO MB/s |
| iozone 4K random write     | TODO MB/s |
| iozone 1M random read      | TODO MB/s |
| iozone 1M random write     | TODO MB/s |
| iozone 1M sequential read  | TODO MB/s |
| iozone 1M sequential write | TODO MB/s |

wget https://raw.githubusercontent.com/geerlingguy/pi-cluster/master/benchmarks/disk-benchmark.sh
chmod +x disk-benchmark.sh
sudo MOUNT_PATH=/ TEST_SIZE=1g ./disk-benchmark.sh

Run benchmark on any attached storage device (e.g. eMMC, microSD, NVMe, SATA) and add results under an additional heading.

Also consider running the PiBenchmarks.com script.

Network

iperf3 results:

  • iperf3 -c $SERVER_IP: TODO Mbps
  • iperf3 -c $SERVER_IP --reverse: TODO Mbps
  • iperf3 -c $SERVER_IP --bidir: TODO Mbps up, TODO Mbps down

(Be sure to test all interfaces, noting any that are non-functional.)
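For filling in the Mbps figures above, here is a minimal sketch of converting iperf3's machine-readable output. It assumes iperf3's `--json` mode, which reports throughput in bits per second, and that `jq` is installed. `bps_to_mbps` is a hypothetical helper written for this note, not part of iperf3.

```shell
# bps_to_mbps BITS_PER_SECOND: hypothetical helper, prints the value in
# Mbps (10^6 bits per second) with one decimal place.
bps_to_mbps() {
    awk -v bps="$1" 'BEGIN { printf "%.1f\n", bps / 1000000 }'
}

# Example usage (assumes iperf3 and jq are installed and SERVER_IP is set):
#   bps=$(iperf3 -c "$SERVER_IP" --json | jq '.end.sum_received.bits_per_second')
#   echo "$(bps_to_mbps "$bps") Mbps"
```

The same helper can be reused for the `--reverse` run; repeat per interface as noted above.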

GPU

glmark2-es2 / glmark2-es2-wayland results:

1. Install glmark2-es2 with `sudo apt install -y glmark2-es2`
2. Run `glmark2-es2`
3. Replace this block of text with the results.

Note: This benchmark requires an active display on the device. Not all devices may be able to run glmark2-es2, so in that case, make a note and move on!

TODO: See this issue for discussion about a full suite of standardized GPU benchmarks.

Memory

tinymembench results:

# Run the two commands below, then replace this code block with the full result.
git clone https://github.com/rojaster/tinymembench.git && cd tinymembench && make
./tinymembench

sbc-bench results

Run sbc-bench and paste a link to the results here:

wget https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/sbc-bench.sh
sudo /bin/bash ./sbc-bench.sh -r

Phoronix Test Suite

Results from pi-general-benchmark.sh:

  • pts/encode-mp3: TODO sec
  • pts/x264 4K: TODO fps
  • pts/x264 1080p: TODO fps
  • pts/phpbench: TODO
  • pts/build-linux-kernel (defconfig): TODO sec
@geerlingguy
Owner Author

Phoronix has a review up with some preliminary benchmarks: System76 Thelio Astra Reviewed: High-End ARM64 Developer Desktop.

@geerlingguy
Owner Author

Note: I was originally going to have most of the benchmarking done already... but I had to be silly and try out a bunch of other GPUs. I then ran into the fun parade of Nvidia proprietary vs Ubuntu included vs Nouveau drivers, and totally borked my Ubuntu install...

Couple that with Arm64 needing specific card hardware support for video output pre-OS boot, and I got it nice and mangled. Need to reinstall Ubuntu and start fresh again with the A402 they included ;)

@joespeed

@geerlingguy
Owner Author

geerlingguy commented Oct 23, 2024

Ubuntu reinstalled. I downloaded Ubuntu 24.04.1 Server for arm64 and installed it through OpenBMC's remote KVM (the Serial-over-LAN (SOL) console would also work; it handles text/console-based installs surprisingly well). I'm now running through Ampere's guide for setting up an Nvidia-accelerated Linux desktop environment:

# 1 - Disable PCIe ASPM:
sudo nano /etc/default/grub

# Add pcie_aspm=off to kernel parameters
GRUB_CMDLINE_LINUX_DEFAULT="pcie_aspm=off"

sudo update-grub

# 2 - Install Ubuntu's desktop environment
sudo apt install -y ubuntu-desktop

# 3 - Install Nvidia graphics drivers
sudo ubuntu-drivers list --gpgpu  # list available drivers
sudo ubuntu-drivers install nvidia-driver-550  # install a driver (note: this was preinstalled)

# 4 - Blacklist nouveau
sudo nano /etc/modprobe.d/blacklist-nouveau.conf

# Put this inside and save it before rebooting
blacklist nouveau
options nouveau modeset=0

# 5 - Reboot the system
sudo reboot
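After the reboot, it can help to confirm that steps 1 and 4 actually took effect. The following is a rough sketch, not part of Ampere's guide; `has_kernel_param` and `module_loaded` are hypothetical helper names, and they take their input as arguments so they can be tried on saved output as well as the live system.

```shell
# has_kernel_param CMDLINE PARAM: succeeds if PARAM appears as a whole
# word in the kernel command line text CMDLINE.
has_kernel_param() {
    for word in $1; do
        if [ "$word" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

# module_loaded LSMOD_OUTPUT MODULE: succeeds if MODULE appears in the
# first column of the given lsmod output.
module_loaded() {
    printf '%s\n' "$1" | awk -v m="$2" '$1 == m { found = 1 } END { exit !found }'
}

# Example usage on the live system:
#   has_kernel_param "$(cat /proc/cmdline)" pcie_aspm=off && echo "ASPM disabled"
#   module_loaded "$(lsmod)" nouveau || echo "nouveau not loaded"
#   module_loaded "$(lsmod)" nvidia  && echo "nvidia loaded"
```

If nouveau still shows up as loaded, regenerating the initramfs with `sudo update-initramfs -u` and rebooting again is a common next step on Ubuntu, though it was not needed in the run described here.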

Also noting timings here, since it can be a bit disconcerting how long boot processes take compared to something like a Raspberry Pi, Mac, or typical consumer PC—this is server-grade hardware, with server-grade boot times:

  • 00:00: Press power button
  • 01:00: Motherboard BIOS splash screen (ASRock Rack logo and boot commands displayed on screen)
  • 01:15: EFI stub messages appear, Linux boot begins (VGA output just shows stub messages this entire period)
  • 04:00: Ubuntu desktop environment appears (normal boot conditions)
  • Under abnormal boot conditions, like when nouveau is active and it is erroring out:
    • 05:00: OpenBMC seems to go away at this point, but VGA output remains
    • 06:00: VGA output goes blank, system reachable over SSH
    • 06:30: Mouse cursor appears over blank screen on VGA and graphics card DisplayPort outputs
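To see where the time goes, the stamps above can be turned into per-phase durations. This is a small arithmetic sketch; `mmss_to_s` and `phase_s` are hypothetical helpers, and the stamps are the hand-timed values from the list, so treat the results as approximate.

```shell
# mmss_to_s MM:SS: convert a timeline stamp to seconds.
mmss_to_s() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# phase_s START END: seconds elapsed between two MM:SS stamps.
phase_s() {
    echo $(( $(mmss_to_s "$2") - $(mmss_to_s "$1") ))
}

# Normal boot, using the stamps from the list above:
#   phase_s 00:00 01:00   # power button to BIOS splash: prints 60
#   phase_s 01:15 04:00   # EFI stub to desktop: prints 165
```

On a booted system, `systemd-analyze` gives a finer breakdown of the kernel and userspace portion, though it cannot see the time spent before the firmware hands off.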

It seems like the desktop rendering doesn't work out of the box with Ubuntu's default install... interestingly, I had the exact same issue on my old 2013 MacBook Air after attempting an Ubuntu 24.04 install (it worked on 22.04). So maybe if the drivers aren't perfect OOTB, it does this non-rendered desktop environment thing? Is there a regression in the nouveau drivers?

@geerlingguy
Owner Author

It seems like it could be a nouveau issue, after looking in dmesg logs:

[   21.727151] nouveau 0004:01:00.0: Adding to iommu group 28
[   22.774703] i2c_designware APMC0D0F:00: controller timed out
... [message repeats] ...
[   54.294665] watchdog: BUG: soft lockup - CPU#122 stuck for 26s! [jbd2/dm-0-8:1631]
[   54.302232] Modules linked in: nouveau(+) snd_hda_intel(+) snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event drm_gpuvm snd_rawmidi drm_exec gpu_sched nls_iso8859_1 drm_ttm_helper snd_seq acpi_ipmi ttm ipmi_ssif(+) snd_seq_device irdma drm_display_helper snd_timer i40e cec snd rc_core soundcore ib_uverbs ast ib_core ipmi_devintf ipmi_msghandler arm_dmc620_pmu xgene_hwmon arm_cmn acpiphp_ampere_altra cppc_cpufreq arm_dsu_pmu acpi_tad joydev input_leds apple_mfi_fastcharge dm_multipath efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 cdc_ether usbnet hid_apple hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid uas usb_storage crct10dif_ce polyval_ce polyval_generic ghash_ce sm4 sha2_ce sha256_arm64 ice ixgbe nvme sha1_ce igb nvme_core xhci_pci xfrm_algo i2c_algo_bit mdio xhci_pci_renesas nvme_auth gnss aes_neon_bs
[   54.302392]  aes_neon_blk aes_ce_blk aes_ce_cipher
[   54.302402] CPU: 122 PID: 1631 Comm: jbd2/dm-0-8 Not tainted 6.8.0-47-generic #47-Ubuntu
[   54.302408] Hardware name: System76 Thelio Astra/Thelio Astra, BIOS 3.02 08/20/2024
[   54.302412] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   54.302418] pc : queued_spin_lock_slowpath+0x90/0x4f0
[   54.302430] lr : _raw_spin_lock+0x84/0xb8
[   54.302438] sp : ffff80008e98b8b0
[   54.302440] x29: ffff80008e98b8b0 x28: 0000000000000000 x27: ffff80008e98bcd8
[   54.302449] x26: ffff07ff9d751740 x25: ffffc7ec57f40000 x24: ffffc7ec57f40000
[   54.302457] x23: ffffc7ec5d78cc84 x22: 0000000000000000 x21: ffff07ff9d751748
[   54.302464] x20: 0000000000000000 x19: ffff07ff9d751748 x18: ffff80008e871080
[   54.302472] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   54.302479] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[   54.302486] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc7ec5e59047c
[   54.302493] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[   54.302499] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   54.302506] x2 : 0000000000000001 x1 : 0000000000000001 x0 : 0000000000000001
[   54.302513] Call trace:
[   54.302516]  queued_spin_lock_slowpath+0x90/0x4f0
[   54.302524]  _raw_spin_lock+0x84/0xb8
[   54.302530]  nvme_queue_rqs+0xe4/0x2f8 [nvme]
[   54.302552]  blk_mq_flush_plug_list.part.0+0x16c/0x1c0
[   54.302561]  blk_add_rq_to_plug+0x1ac/0x2a0
[   54.302567]  blk_mq_submit_bio+0x530/0x6e0
[   54.302573]  __submit_bio+0x100/0x220
[   54.302580]  __submit_bio_noacct+0x68/0x1f0
[   54.302586]  submit_bio_noacct_nocheck+0x1e8/0x230
[   54.302591]  submit_bio_noacct+0x134/0x658
[   54.302597]  submit_bio+0xc0/0x190
[   54.302602]  submit_bh_wbc+0x150/0x220
[   54.302609]  submit_bh+0x20/0x50
[   54.302615]  jbd2_journal_commit_transaction+0x4d0/0x1708
[   54.302622]  kjournald2+0xc8/0x298
[   54.302627]  kthread+0xf8/0x110
[   54.302632]  ret_from_fork+0x10/0x20
[   54.326691] i2c_designware APMC0D0F:00: controller timed out
... [message repeats] ...
... [lots more of the spinlock messages too] ...
[  379.890828] nouveau 0004:01:00.0: gr: DATA_ERROR 0000009c [] ch 1 [00ffb40000 gnome-shell[4279]] subc 0 class c797 mthd 0d78 data 00000004
[  379.893910] ast 0003:02:00.0: swiotlb buffer is full (sz: 8388608 bytes), total 32768 (slots), used 0 (slots)
[  379.903289] nouveau 0004:01:00.0: gr: DATA_ERROR 0000009c [] ch 1 [00ffb40000 gnome-shell[4279]] subc 0 class c797 mthd 0d78 data 00000004
[  379.915735] nouveau 0004:01:00.0: gr: DATA_ERROR 0000009c [] ch 1 [00ffb40000 gnome-shell[4279]] subc 0 class c797 mthd 17e0 data 00000018
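For a log this noisy, a quick tally of the three message types seen in the excerpt makes it easier to compare boots. A minimal sketch follows; `summarize_dmesg` is a hypothetical helper, and its patterns are copied from the excerpt above, so they may need adjusting on other kernels.

```shell
# summarize_dmesg: read kernel log text on stdin and count soft lockups,
# nouveau DATA_ERROR lines, and i2c_designware timeouts.
summarize_dmesg() {
    awk '
        /watchdog: BUG: soft lockup/            { lockups++ }
        /nouveau .*DATA_ERROR/                  { gpu++ }
        /i2c_designware .*controller timed out/ { i2c++ }
        END {
            printf "soft_lockups=%d nouveau_data_errors=%d i2c_timeouts=%d\n", lockups, gpu, i2c
        }
    '
}

# Example usage on the live system:
#   sudo dmesg | summarize_dmesg
```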

@geerlingguy
Owner Author

geerlingguy commented Oct 23, 2024

After blacklisting the nouveau driver and rebooting (the ubuntu 550 nvidia driver was preinstalled), I am now getting display output over VGA (and in the BMC KVM).

However, it seems to not be using any GPU acceleration... glmark2 shows LLVM rendering.
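One way to confirm the "LLVM rendering" observation without running the full benchmark is to look at the OpenGL renderer string. A rough sketch, assuming `glxinfo` from Ubuntu's mesa-utils package and a running display session; `is_software_renderer` is a hypothetical helper that only checks for the names of Mesa's CPU rasterizers.

```shell
# is_software_renderer RENDERER: succeeds if the renderer string names one
# of Mesa's CPU rasterizers (llvmpipe or softpipe), i.e. no GPU acceleration.
is_software_renderer() {
    case "$1" in
        *llvmpipe*|*softpipe*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage (needs a running display session):
#   renderer=$(glxinfo -B | grep 'OpenGL renderer string')
#   is_software_renderer "$renderer" && echo "software rendering: $renderer"
```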

I unplugged VGA output to my monitor, and plugged in DisplayPort to port 1 on the A400. Now once it hits Checkpoint 92, I see the BIOS screen on the VGA output / BMC KVM, and the DisplayPort screen goes from 'no signal' to blank... but then it stalls out. Giving it another few minutes to see if something's just delaying boot.

Only bug I've found somewhat related is this one, but it's about Linux boot not seeing a CPU sometimes... in my case, it seems like the machine stalls at Checkpoint 92.

[Edit: And after waiting another 3 minutes or so, it looks like the whole system rebooted—it's going through DRAM checks and all the Checkpoints again now... stuck again at Checkpoint 92.]

[Edit 2: And if I unplug the display from the DP connector on the Nvidia A402, and reboot with only VGA plugged in, it reliably gets past Checkpoint AD into Linux system boot, and completes startup.]

@geerlingguy
Owner Author

Going to pause my testing on the workstation for the time being—it looks like there are two main issues I'm hitting:

  1. The fans remain at idle and don't spin up when needed—at least the CPU coolers—and this leads to lockups under high load over long periods. See Benchmark 128-core System76 Thelio Astra top500-benchmark#44 (comment)
  2. I can't get external display output to work through the Nvidia card with either Nvidia's driver install or the Ubuntu driver install. The system doesn't get past 'Checkpoint 92'.
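For the first issue, logging temperatures alongside a load test would show whether the lockups line up with heat. Here is a minimal sketch that reads the kernel's hwmon sysfs files, which report temperatures in millidegrees Celsius. Whether this board exposes its CPU sensors there is an assumption (the `xgene_hwmon` module in the dmesg output suggests it may), and `max_temp_c` is a hypothetical helper.

```shell
# max_temp_c [HWMON_ROOT]: print the highest temperature, in whole degrees
# Celsius, found in any tempN_input file under HWMON_ROOT
# (default: /sys/class/hwmon). Prints nothing if no sensors are found.
max_temp_c() {
    root="${1:-/sys/class/hwmon}"
    cat "$root"/hwmon*/temp*_input 2>/dev/null |
        awk 'BEGIN { max = -1 } $1 > max { max = $1 } END { if (max >= 0) print int(max / 1000) }'
}

# Example usage while a benchmark runs in another terminal:
#   while sleep 5; do echo "$(date +%T) $(max_temp_c) C"; done
```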

@geerlingguy
Owner Author

Just a note, with this configuration (a lot of sticks of ECC RAM), it idles around 100W, and at system shutdown, uses about 7W of power to run the BMC/IPMI:

Screenshot 2024-10-23 at 11 52 03 AM
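To put those wall readings in context, here is the arithmetic for energy used per day at a constant draw. It is only a sketch; `kwh_per_day` is a hypothetical helper, and real idle draw will fluctuate.

```shell
# kwh_per_day WATTS: energy used in 24 hours at a constant draw, in kWh.
kwh_per_day() {
    awk -v w="$1" 'BEGIN { printf "%.3f\n", w * 24 / 1000 }'
}

# Using the wall measurements reported in this issue:
#   kwh_per_day 103   # idle: prints 2.472
#   kwh_per_day 7     # shut down, BMC still running: prints 0.168
```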

@bexcran

bexcran commented Oct 23, 2024

Some more detailed information about the boot process:

  • 00:00: Press power button
  • 00:10: SCP completes booting. Powers on the host ARMv8-A CPU.
  • 00:25: TF-A finishes DRAM initialization and training.
  • 00:35: TF-A finishes copying UEFI from SPI-NOR into DRAM.
  • 01:00: Motherboard BIOS splash screen (ASRock Rack logo and boot commands displayed on screen)
  • ...

Also, if you ssh to BMC ports 2200, 2201, 2202 (using the same login as the BMC) you can see the SOL consoles for the host, SCP (PMPro and SMPro) and the secure TF-A console.
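As a sketch of what that looks like in practice: the helper below prints one `ssh` command per console port. The user and host are placeholders for the BMC login and address, and the mapping of port to console is not spelled out above, so it is left for the reader to check. `bmc_console_cmds` is a hypothetical name.

```shell
# bmc_console_cmds USER BMC_HOST: print one ssh command for each of the
# three console ports mentioned above (2200, 2201, 2202).
bmc_console_cmds() {
    for port in 2200 2201 2202; do
        printf 'ssh -p %s %s@%s\n' "$port" "$1" "$2"
    done
}

# Example usage (placeholder login and address):
#   bmc_console_cmds myuser bmc.example.lan
```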
