Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TA from older version crashes stack unwind layer in OP-TEE (WAS Stack unwind causes data abort on Rcar M3 board) #1872

Closed
lorc opened this issue Oct 10, 2017 · 11 comments

Comments

@lorc
Copy link
Contributor

lorc commented Oct 10, 2017

During testing of #1631 on my RCAR M3 board I have encountered next fault:

  • regression_1007 Test Panic
    ta_entry_panic: enterERROR: [0x0] TEE-CORE:
    ERROR: [0x0] TEE-CORE: TA panicked with code 0xbeef
    ERROR: [0x0] TEE-CORE: regs = 0x44179f30
    ERROR: [0x0] TEE-CORE: regs->x1 = 0x15
    ERROR: [0x0] TEE-CORE:
    ERROR: [0x0] TEE-CORE: Core data-abort at address 0x15 (alignment fault)
    ERROR: [0x0] TEE-CORE: esr 0x96000021 ttbr0 0x1000044173020 ttbr1 0x00000000 cidr 0x0
    ERROR: [0x0] TEE-CORE: cpu Update Notice.md #1 cpsr 0x60000004
    ERROR: [0x0] TEE-CORE: x0 0000000044179da8 x1 0000000000000000
    ERROR: [0x0] TEE-CORE: x2 0000000000000110 x3 0000000000000110
    ERROR: [0x0] TEE-CORE: x4 0000000044179b33 x5 00000000fffffffc
    ERROR: [0x0] TEE-CORE: x6 0000000000000004 x7 0000000000000020
    ERROR: [0x0] TEE-CORE: x8 0000000000000002 x9 0000000000000078
    ERROR: [0x0] TEE-CORE: x10 0000000044179d78 x11 0000000044179d98
    ERROR: [0x0] TEE-CORE: x12 00000000df4b59f7 x13 0000000080002780
    ERROR: [0x0] TEE-CORE: x14 000000004413d6c0 x15 00000000df4b59f7
    ERROR: [0x0] TEE-CORE: x16 000000004410e228 x17 0000000000000000
    ERROR: [0x0] TEE-CORE: x18 0000000000000000 x19 0000000044179da8
    ERROR: [0x0] TEE-CORE: x20 0000000000000015 x21 0000000044145000
    ERROR: [0x0] TEE-CORE: x22 00000000ffff3024 x23 0000000080002780
    ERROR: [0x0] TEE-CORE: x24 0000000000000006 x25 000000004417a190
    ERROR: [0x0] TEE-CORE: x26 0000000000000002 x27 0000000000000006
    ERROR: [0x0] TEE-CORE: x28 0000000044165e00 x29 0000000044179d50
    ERROR: [0x0] TEE-CORE: x30 000000004410e3a8 elr 000000004410e3a8
    ERROR: [0x0] TEE-CORE: sp_el0 0000000044179d50
    ERROR: [0x0] TEE-CORE: Call stack:
    ERROR: [0x0] TEE-CORE: 0x000000004410e3a8
    ERROR: [0x0] TEE-CORE: 0x000000004410e4ec
    ERROR: [0x0] TEE-CORE: 0x000000004410e204
    ERROR: [0x0] TEE-CORE: 0x00000000441017ec
    ERROR: [0x0] TEE-CORE: Panic '[abort] alignement fault! (trap CPU)' at core/arch/arm/kernel/abort.c:623 <get_fault_type>
    ERROR: [0x0] TEE-CORE: Call stack:
    ERROR: [0x0] TEE-CORE: 0x000000004410956c

regs = 0x44179f30 and regs->x1 = 0x15 are my debug messages using this patch:

diff --git a/core/arch/arm/tee/arch_svc.c b/core/arch/arm/tee/arch_svc.c
index 32e17a8..bc0b5e8 100644
--- a/core/arch/arm/tee/arch_svc.c
+++ b/core/arch/arm/tee/arch_svc.c
@@ -357,6 +357,7 @@ static void print_panic_stack(struct thread_svc_regs *regs)
        if (tee_ta_get_current_session(&s) != TEE_SUCCESS)
                panic();
        utc = to_user_ta_ctx(s->ctx);
+       TAMSG_RAW("regs->x1 = %p", (void*)regs->x1);
        if (utc->is_32bit)
                get_panic_regs_a32_ta((uint32_t *)regs->x1, &ai);
        else
@@ -376,6 +377,7 @@ uint32_t tee_svc_sys_return_helper(uint32_t ret, bool panic,
        if (panic) {
                TAMSG_RAW("");
                TAMSG_RAW("TA panicked with code 0x%" PRIx32, panic_code);
+               TAMSG_RAW("regs = %p", (void*)regs);
                print_panic_stack(regs);
        }

Looks like tee_svc_sys_return_helper receives invalid pointer in `regs->x1'

@etienne-lms
Copy link
Contributor

hi @lorc. same issue on my b2260. @jforissier, could be armv7 specific, but qemu does not seem to complain.

@jforissier
Copy link
Contributor

@lorc can you reproduce the issue without dyn shm? (i.e., current optee_os master).
xtest 1007 runs fine here on HiKey, 32 or 64-bit TA, with or without dyn shm :(

@lorc
Copy link
Contributor Author

lorc commented Oct 11, 2017

@jforissier, I'll try. Problem is that I need to throw out hypervisor from my setup first, because it prevents Linux from accessing static shm region. But I'll try to prepare such setup today.

@etienne-lms, my board is armv8-based, so looks like it is cross-platform bug.

@etienne-lms
Copy link
Contributor

@lorc, crosscheck your build. I find the issue in mine: non-alignment of core and libutee: i was rebuilding the core but not the TA devkit, hence TAs did not fill the structure expected by the unwind layer. Full rebuilt of the core/ta devkit/test TAs => no more core panic in stack unwind trace.

@lorc
Copy link
Contributor Author

lorc commented Oct 11, 2017

@etienne-lms, yep, I think you are right. I definetelly used some old TAs.

I also can confirm this on qemu_v8. This is the good find, thank you.

So, looks like it is a bug: I can crash OP-TEE using older TA.

@jforissier
Copy link
Contributor

@lorc could you make sure you're not using older TAs / libutee.a? Because, with the introduction of the "TA stack dump on panic" feature, the user/kernel interface for syscall_panic has effectively changed.

@lorc lorc changed the title Stack unwind causes data abort at Rcar M3 board. TA from older version crashes stack unwind layer in OP-TEE (WAS Stack unwind causes data abort on Rcar M3 board) Oct 11, 2017
@lorc
Copy link
Contributor Author

lorc commented Oct 11, 2017

@jforissier, yes, this is exactly the case. But I dunno... Is this expected behavior?

@jforissier
Copy link
Contributor

Well, that's a good question...

Until now, we have not put strict requirements on maintaining compatibility at the syscall interface (contrary to what the Linux kernel does). We more or less assumed that TAs would be recompiled before they are deployed on a new version of OP-TEE. In other words, the interface between TA and TEE is the GP API. But that may not be a reasonable assumption for some use cases?

If we were to enforce strict compatibility, we would need to have some test suite to check that, and it would also make our life a bit harder as optee_os developers. It's not only the syscall interface, but also the TA format for instance.

On a related note: when people ask "how do I add a syscall to OP-TEE", we usually answer "you don't, you write a pseudo-TA instead" ;-) to preserve the stability of the syscall interface and avoid potential conflicts.

@jforissier
Copy link
Contributor

@lorc

So, looks like it is a bug: I can crash OP-TEE using older TA.

You may only crash a debug version of OP-TEE. If CFG_UNWIND=n (which is the case by default when CFG_TEE_CORE_DEBUG=n), the unwind code is disabled and you shouldn't be able to crash it.

@lorc
Copy link
Contributor Author

lorc commented Oct 11, 2017

Yes, this is an interesting topic.

I think that this problem can be mitigated by changing signature key when introducing breaking changes to syscall interface. This at least would ensure that only compatible TAs will be loaded. But that would work only for generic OP-TEE. Product vendors use own keys and they shall changes keys manually.

Probably we can go further and derive signature key from syscall interface version.

@lorc
Copy link
Contributor Author

lorc commented Oct 11, 2017

You may only crash a debug version of OP-TEE. If CFG_UNWIND=n (which is the case by default when CFG_TEE_CORE_DEBUG=n), the unwind code is disabled and you shouldn't be able to crash it.

Oh, right. So this is okay, then. I'll close this issue then. We can discuss syscall interface versioning problem in different issue, if you wish.

@lorc lorc closed this as completed Oct 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants