Kernel panic on running rsend_012_pos on sparc64 #12039

Open
rincebrain opened this issue May 13, 2021 · 3 comments
Labels
- Status: Stale (No recent activity for issue)
- Type: Architecture (Indicates an issue is specific to a single processor architecture)
- Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments

@rincebrain
Contributor

rincebrain commented May 13, 2021

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Debian |
| Distribution Version | sid |
| Linux Kernel | 5.10.0-6-sparc64 |
| Architecture | sparc64 |
| ZFS Version | 2babd20 |

Describe the problem you're observing

While running ZTS for #12022, I found that on vanilla git master (and on my patched branch, for that matter), running the whole series of rsend tests crashes the kernel 100% of the time once it reaches rsend_012_pos. (Unhelpfully, the panic fails to print a stacktrace; the full console output is reproduced below.)

Sometimes the machine is wedged badly enough that the watchdog timer doesn't trigger and sending break twice doesn't get back to the PROM, so it has to be physically power cycled.

(Potentially relevant: this is a Netra T1, so it's possible other Linux/SPARC64 hardware doesn't suffer from this. I don't yet know what's breaking.)

Describe how to reproduce the problem

`scripts/zfs-tests.sh -r rsend`

Include any warning/errors/backtraces from the system logs

crash output to console:

[ 1435.191913] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1435.294939] CPU: 0 PID: 722 Comm: spl_system_task Tainted: P           OE     5.10.0-6-sparc64 #1 Debian 5.10.28-1
[ 1435.431126] Call Trace:
[ 1435.463267] Press Stop-A (L1-A) from sun keyboard or send break
[ 1435.463267] twice on console to return to the boot prom
[ 1435.609777] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

RED State Exception

TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
   TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
   TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
   TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
   TPC=0000.0000.0040.70d0 TnPC=0000.0000.0040.70d4 TSTATE=0000.0000.8004.1406
TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
   TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606


Watchdog Reset
Externally Initiated Reset
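
For anyone comparing dumps between runs, the RED State trap stack above can be turned into plain integers with a few lines of Python. This is only a sketch: the `parse_dump` helper and the `TRAP_NAMES` table are hypothetical names introduced here, and the two trap-type names are my reading of the SPARC V9 / UltraSPARC manuals rather than anything confirmed in this issue, so check them against the manual before drawing conclusions.

```python
import re

# Trap-state dump copied verbatim from the console output above.
DUMP = """\
TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
   TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
   TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
   TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
   TPC=0000.0000.0040.70d0 TnPC=0000.0000.0040.70d4 TSTATE=0000.0000.8004.1406
TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
   TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606
"""

# ASSUMPTION: names taken from the SPARC V9 / UltraSPARC trap type tables
# as I remember them; they are not stated anywhere in this issue.
TRAP_NAMES = {
    0x010: "illegal_instruction",
    0x068: "fast_data_access_MMU_miss",
}

# Matches NAME=xxxx.xxxx.xxxx.xxxx, the dotted 64-bit hex format the PROM prints.
FIELD = re.compile(r"(\w+)=((?:[0-9a-f]{4}\.){3}[0-9a-f]{4})", re.IGNORECASE)


def parse_dump(text):
    """Return one dict per trap level, with register values as ints."""
    levels = []
    for name, value in FIELD.findall(text):
        number = int(value.replace(".", ""), 16)
        if name == "TL":
            levels.append({})  # each TL= field starts a new trap level
        if levels:
            levels[-1][name] = number
    return levels


for level in parse_dump(DUMP):
    trap = TRAP_NAMES.get(level["TT"], "unknown")
    print(f"TL={level['TL']} TT={level['TT']:#05x} ({trap}) TPC={level['TPC']:#x}")
```

Read this way, trap levels 3 through 5 all report the same TT and the same TPC, which looks like the same trap being retaken at one address until the trap stack ran out, though that interpretation is a guess.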

/proc/cpuinfo

$ cat /proc/cpuinfo
cpu             : TI UltraSparc IIi (Sabre)
fpu             : UltraSparc IIi integrated FPU
pmu             : ultra12
prom            : OBP 3.10.25 2000/01/17 21:26
type            : sun4u
ncpus probed    : 1
ncpus active    : 1
D$ parity tl1   : 0
I$ parity tl1   : 0
Cpu0ClkTck      : 000000001a3a4034
cpucaps         : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type        : Spitfire
MMU PGSZs       : 8K,64K,512K,4MB
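
As a quick sanity check on the hardware, the `Cpu0ClkTck` value can be converted to a clock rate. This assumes the field is the CPU tick rate in Hz printed as hex, which is my understanding of sparc64 `/proc/cpuinfo` and not something stated in this issue; `clk_tck_to_mhz` is a hypothetical helper name.

```python
def clk_tck_to_mhz(hex_value: str) -> float:
    """Convert a hex tick rate (assumed to be in Hz) to MHz."""
    return int(hex_value, 16) / 1_000_000


# Value copied from the Cpu0ClkTck line above.
print(f"{clk_tck_to_mhz('000000001a3a4034'):.1f} MHz")
```

Under that assumption the value works out to roughly 440 MHz.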
rincebrain added the Status: Triage Needed and Type: Defect labels May 13, 2021
@rincebrain
Contributor Author

Oh boy, 4.15.0-2-sparc64 actually gave me a stacktrace...

[ 1004.096214] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1004.096214]
[ 1004.218708] CPU: 0 PID: 23350 Comm: spl_system_task Tainted: P           O     4.15.0-2-sparc64                                        #1 Debian 4.15.11-1
[ 1004.356022] Call Trace:
[ 1004.388180]  [00000000004668f0] panic+0xd0/0x280
[ 1004.448890]  [00000000009f8ccc] switch_to_pc+0x4f8/0x50c
[ 1004.518762]  [00000000009f8e9c] _cond_resched+0x3c/0x60
[ 1004.587500]  [00000000009fa06c] mutex_lock+0xc/0x40
[ 1004.652797]  [00000000108c968c] zio_wait_for_children+0xc/0xc0 [zfs]
[ 1004.736825]  [00000000108ca304] zio_vdev_io_done+0x24/0x200 [zfs]
[ 1004.817421]  [00000000108cb9b0] zio_execute+0x90/0x100 [zfs]
[ 1004.892274]  [000000001088a160] vdev_mirror_io_start+0x100/0x280 [zfs]
[ 1004.978602]  [00000000108cd008] zio_vdev_io_start+0x2c8/0x320 [zfs]
[ 1005.061473]  [00000000108cf674] zio_nowait+0xb4/0x140 [zfs]
[ 1005.135133]  [00000000107d54b8] arc_read+0xb58/0x1140 [zfs]
[ 1005.208761]  [00000000107e2c04] dbuf_issue_final_prefetch+0x84/0x100 [zfs]
[ 1005.299546]  [00000000107e87d8] dbuf_prefetch_indirect_done+0x1d8/0x200 [zfs]
[ 1005.393751]  [00000000107d5cf8] arc_read_done+0x258/0x440 [zfs]
[ 1005.472020]  [00000000108d16d0] zio_done+0x470/0xe40 [zfs]
[ 1005.544597]  [00000000108cb9b0] zio_execute+0x90/0x100 [zfs]
[ 1005.619062] Press Stop-A (L1-A) from sun keyboard or send break
[ 1005.619062] twice on console to return to the boot prom
[ 1005.765570] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1005.765570]
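
To make the shape of that call trace easier to see, here is a small sketch that splits the frames by module and flags any symbol that shows up more than once. The `parse_frames` helper and the regular expression are hypothetical, written only against the log format shown above.

```python
import re
from collections import Counter

# Call trace frames copied verbatim from the 4.15 panic above.
TRACE = """\
[ 1004.388180]  [00000000004668f0] panic+0xd0/0x280
[ 1004.448890]  [00000000009f8ccc] switch_to_pc+0x4f8/0x50c
[ 1004.518762]  [00000000009f8e9c] _cond_resched+0x3c/0x60
[ 1004.587500]  [00000000009fa06c] mutex_lock+0xc/0x40
[ 1004.652797]  [00000000108c968c] zio_wait_for_children+0xc/0xc0 [zfs]
[ 1004.736825]  [00000000108ca304] zio_vdev_io_done+0x24/0x200 [zfs]
[ 1004.817421]  [00000000108cb9b0] zio_execute+0x90/0x100 [zfs]
[ 1004.892274]  [000000001088a160] vdev_mirror_io_start+0x100/0x280 [zfs]
[ 1004.978602]  [00000000108cd008] zio_vdev_io_start+0x2c8/0x320 [zfs]
[ 1005.061473]  [00000000108cf674] zio_nowait+0xb4/0x140 [zfs]
[ 1005.135133]  [00000000107d54b8] arc_read+0xb58/0x1140 [zfs]
[ 1005.208761]  [00000000107e2c04] dbuf_issue_final_prefetch+0x84/0x100 [zfs]
[ 1005.299546]  [00000000107e87d8] dbuf_prefetch_indirect_done+0x1d8/0x200 [zfs]
[ 1005.393751]  [00000000107d5cf8] arc_read_done+0x258/0x440 [zfs]
[ 1005.472020]  [00000000108d16d0] zio_done+0x470/0xe40 [zfs]
[ 1005.544597]  [00000000108cb9b0] zio_execute+0x90/0x100 [zfs]
"""

# [ timestamp ]  [address] symbol+0xoff/0xsize [module]   (module is optional)
FRAME = re.compile(
    r"\[\s*[\d.]+\]\s+\[([0-9a-f]{16})\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)


def parse_frames(text):
    """Return (symbol, module) pairs, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME.search(line)
        if match:
            # Frames without a [module] suffix belong to the core kernel.
            frames.append((match.group(2), match.group(3) or "kernel"))
    return frames


frames = parse_frames(TRACE)
zfs_symbols = [symbol for symbol, module in frames if module == "zfs"]
repeated = [symbol for symbol, count in Counter(zfs_symbols).items() if count > 1]

print(f"{len(frames)} frames, {len(zfs_symbols)} in zfs, repeated: {repeated}")
```

`zio_execute` appears twice on the same stack: the read completion path (`zio_done` to `arc_read_done` to the dbuf prefetch callback) issues a new `arc_read`, which runs another zio pipeline inline. That kind of nesting would be consistent with the task running out of kernel stack, but the trace alone doesn't prove it.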

@stale

stale bot commented Jun 12, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label Jun 12, 2022
behlendorf added the Type: Architecture label and removed the Status: Stale and Status: Triage Needed labels Jun 14, 2022
@stale

stale bot commented Jun 18, 2023

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label Jun 18, 2023
2 participants