TA unwinding on abort (print call stack) #1552

Merged
merged 5 commits into from
May 24, 2017

jforissier
Contributor

Add TA call stack to the abort message. Here is an example:

root@HiKey:/ xtest 9148
Test ID: 9148
Run test suite with level=0

TEE test application started with device [(null)]
######################################################
#
# regression+gp
#
######################################################
 
* gp_9148 50-51-15
ERROR:   TEE-CORE: 
ERROR:   TEE-CORE: User TA data-abort at address 0xffffdecd (alignment fault)
ERROR:   TEE-CORE:  esr 0x92000021  ttbr0 0x100003f0810c0   ttbr1 0x00000000   cidr 0x0
ERROR:   TEE-CORE:  cpu #6          cpsr 0x60000130
ERROR:   TEE-CORE:  x0  00000000ffffdead x1  0000000000000000
ERROR:   TEE-CORE:  x2  0000000000000000 x3  0000000040201000
ERROR:   TEE-CORE:  x4  00000000ffffdead x5  0000000040000f74
ERROR:   TEE-CORE:  x6  0000000000000000 x7  0000000040000ec8
ERROR:   TEE-CORE:  x8  0000000000000000 x9  0000000000010014
ERROR:   TEE-CORE:  x10 0000000040000ec0 x11 0000000040200000
ERROR:   TEE-CORE:  x12 000000004000b000 x13 0000000040000eb0
ERROR:   TEE-CORE:  x14 0000000040001f4b x15 0000000000000000
ERROR:   TEE-CORE:  x16 000000003f016750 x17 0000000000000000
ERROR:   TEE-CORE:  x18 0000000000000000 x19 000000003f08d2a0
ERROR:   TEE-CORE:  x20 0000000000000000 x21 000000003f04fa30
ERROR:   TEE-CORE:  x22 00000000000005f0 x23 000000003f08d468
ERROR:   TEE-CORE:  x24 0000000000000000 x25 000000003f063980
ERROR:   TEE-CORE:  x26 000000003f08d450 x27 0000000000010014
ERROR:   TEE-CORE:  x28 0000000040000f88 x29 0000000000000000
ERROR:   TEE-CORE:  x30 000000003f0043c4 elr 000000004000549e
ERROR:   TEE-CORE:  sp_el0 0000000040000f80
ERROR:   TEE-CORE: Status of TA 534d4152-5443-534c-5443-525950544f31 (0x3f063ba0) (active)
ERROR:   TEE-CORE:  arch: arm  load address: 0x40001000  ctx-idr: 1
ERROR:   TEE-CORE:  stack: 0x40000000 4096
ERROR:   TEE-CORE:  region 0: va 0x40000000 pa 0x3f218000 size 0x1000
ERROR:   TEE-CORE:  region 1: va 0x40001000 pa 0x3f200000 size 0xa000
ERROR:   TEE-CORE:  region 2: va 0x4000b000 pa 0x3f20a000 size 0x3000
ERROR:   TEE-CORE:  region 3: va 0x4000e000 pa 0x3f20d000 size 0xb000
ERROR:   TEE-CORE:  region 4: va 0x40200000 pa 0x3ee01000 size 0x2000
ERROR:   TEE-CORE:  region 5: va 0 pa 0 size 0
ERROR:   TEE-CORE:  region 6: va 0 pa 0 size 0
ERROR:   TEE-CORE:  region 7: va 0 pa 0 size 0
ERROR:   TEE-CORE: Call stack:
ERROR:   TEE-CORE:  0x4000549e
ERROR:   TEE-CORE:  0x40001f4b
ERROR:   TEE-CORE:  0x4000273f
ERROR:   TEE-CORE:  0x40005da7
  gp_9148 OK
+-----------------------------------------------------
Result of testsuite regression+gp filtered by "9148":
gp_9148 OK
+-----------------------------------------------------
23 subtests of which 0 failed
1 test case of which 0 failed
722 test cases was skipped
TEE test application done!

A script is also added to help decode the stack dump:

$ cat dump.txt | optee_os/scripts/symbolize.py -d optee_test/out/ta/* -s `pwd`
* gp_9148 50-51-15
ERROR:   TEE-CORE: 
ERROR:   TEE-CORE: User TA data-abort at address 0xffffdecd (alignment fault)
ERROR:   TEE-CORE:  esr 0x92000021  ttbr0 0x100003f0810c0   ttbr1 0x00000000   cidr 0x0
ERROR:   TEE-CORE:  cpu #6          cpsr 0x60000130
ERROR:   TEE-CORE:  x0  00000000ffffdead x1  0000000000000000
ERROR:   TEE-CORE:  x2  0000000000000000 x3  0000000040201000
ERROR:   TEE-CORE:  x4  00000000ffffdead x5  0000000040000f74
ERROR:   TEE-CORE:  x6  0000000000000000 x7  0000000040000ec8
ERROR:   TEE-CORE:  x8  0000000000000000 x9  0000000000010014
ERROR:   TEE-CORE:  x10 0000000040000ec0 x11 0000000040200000
ERROR:   TEE-CORE:  x12 00000000ffffb4a8 x13 0000000040000eb0
ERROR:   TEE-CORE:  x14 0000000040001f4b x15 0000000000000000
ERROR:   TEE-CORE:  x16 000000003f0167d0 x17 0000000000000000
ERROR:   TEE-CORE:  x18 0000000000000000 x19 000000003f08d2a0
ERROR:   TEE-CORE:  x20 0000000000000000 x21 000000003f04fad0
ERROR:   TEE-CORE:  x22 00000000000005f0 x23 000000003f08d468
ERROR:   TEE-CORE:  x24 0000000000000000 x25 000000003f063980
ERROR:   TEE-CORE:  x26 000000003f08d450 x27 0000000000010014
ERROR:   TEE-CORE:  x28 0000000040000f88 x29 0000000000000000
ERROR:   TEE-CORE:  x30 000000003f0043c4 elr 000000004000549e
ERROR:   TEE-CORE:  sp_el0 0000000040000f80
ERROR:   TEE-CORE: Status of TA 534d4152-5443-534c-5443-525950544f31 (0x3f063ba0) (active)
ERROR:   TEE-CORE:  arch: arm  load address: 0x40001000  ctx-idr: 1
ERROR:   TEE-CORE:  stack: 0x40000000 4096
ERROR:   TEE-CORE:  region 0: va 0x40000000 pa 0x3f218000 size 0x1000
ERROR:   TEE-CORE:  region 1: va 0x40001000 pa 0x3f200000 size 0xa000
ERROR:   TEE-CORE:  region 2: va 0x4000b000 pa 0x3f20a000 size 0x3000
ERROR:   TEE-CORE:  region 3: va 0x4000e000 pa 0x3f20d000 size 0xb000
ERROR:   TEE-CORE:  region 4: va 0x40200000 pa 0x3ee01000 size 0x2000
ERROR:   TEE-CORE:  region 5: va 0 pa 0 size 0
ERROR:   TEE-CORE:  region 6: va 0 pa 0 size 0
ERROR:   TEE-CORE:  region 7: va 0 pa 0 size 0
ERROR:   TEE-CORE: Call stack:
ERROR:   TEE-CORE:  0x4000549e TEE_AsymmetricDecrypt at optee_os/lib/libutee/tee_api_operations.c:1614
ERROR:   TEE-CORE:  0x40001f4b CmdAsymmetricDecryptNoParam at optee_test/ta/GP_TTA_Crypto/TTA_Crypto.c:1179
ERROR:   TEE-CORE:  0x4000273f TA_InvokeCommandEntryPoint at optee_test/ta/GP_TTA_Crypto/TTA_Crypto.c:1675
ERROR:   TEE-CORE:  0x40005da7 entry_invoke_command at optee_os/lib/libutee/arch/arm/user_ta_entry.c:210
  gp_9148 OK
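
The symbolization step can be sketched roughly as follows. This is a minimal illustration, not the actual `symbolize.py`: the regular expressions are inferred from the dump above, the helper names `parse_dump` and `addr2line_command` are hypothetical, and subtracting the load address assumes the TA ELF is linked at address 0. The flags `-f`, `-p` and `-e` are standard binutils `addr2line` options; a cross toolchain would normally provide a prefixed variant of the tool.

```python
import re

# Line formats assumed from the abort dump shown above.
LOAD_ADDR_RE = re.compile(r"load address: (0x[0-9a-f]+)")
STACK_ENTRY_RE = re.compile(r"TEE-CORE:\s+(0x[0-9a-f]+)\s*$")


def parse_dump(lines):
    """Return (load_addr, call_stack) extracted from an abort dump."""
    load_addr = None
    call_stack = []
    in_call_stack = False
    for line in lines:
        match = LOAD_ADDR_RE.search(line)
        if match:
            load_addr = int(match.group(1), 16)
        elif "Call stack:" in line:
            in_call_stack = True
        elif in_call_stack:
            match = STACK_ENTRY_RE.search(line)
            if match:
                call_stack.append(int(match.group(1), 16))
            else:
                # The first line that is not a bare address ends the stack.
                in_call_stack = False
    return load_addr, call_stack


def addr2line_command(elf_path, load_addr, addresses, tool="addr2line"):
    """Build an addr2line command line for load-relative addresses."""
    # Assumption: the TA is linked at 0, so file offsets are addr - load_addr.
    offsets = [hex(addr - load_addr) for addr in addresses]
    return [tool, "-f", "-p", "-e", elf_path] + offsets
```

The resulting list could be passed to `subprocess` to obtain the `function at file:line` strings appended to each call stack entry.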

Contributor

@jenswi-linaro left a comment

What happens if a TA is paged and the unwind tables need to be paged in?

}
exidx += utc->load_addr;
memset(&state, 0, sizeof(state));
state.registers[0] = r32(ai->regs->x0);
Contributor

Direct assignment should be OK. If you'd like to make it very clear that we're truncating a uint64_t to a uint32_t, a plain cast would be enough.

Contributor Author

Well, initially I thought I would assert that the upper 32 bits are zero, hence a function, but actually I'm not sure that is a valid assumption (all we know is that the lower 32 bits are part of the AArch32 state, but I suppose the upper bits may be undefined and take any value?).
I'll just make plain assignments.

Contributor

The higher bits are for practical purposes undefined.

Contributor Author

OK thanks for the confirmation.
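
The conclusion of this thread can be illustrated with a small model, written in Python rather than C: since the upper half of a 64-bit register may hold anything while the TA runs in AArch32 state, only the lower 32 bits are kept and no assertion is made about the rest. The helper name `low32` is hypothetical; in C, a plain assignment from `uint64_t` to `uint32_t` has the same effect.

```python
MASK32 = 0xFFFFFFFF


def low32(x_reg):
    """Keep the lower 32 bits of a 64-bit register value.

    Models the implicit uint64_t -> uint32_t truncation; the upper
    bits are treated as undefined and simply discarded.
    """
    return x_reg & MASK32
```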

start = &__exidx_start;
idx_start = (vaddr_t)&__exidx_start;
idx_end = (vaddr_t)&__exidx_end;
if (exidx && exidx_sz) {
Contributor

I'd prefer it if exidx and exidx_sz were always correct; that way we could avoid this special case.

Contributor Author

Right, I'll change.

{
uint64_t fp;

fp = frame->fp;
if (!thread_addr_is_in_stack(fp))
return false;
if (stack && stack_size) {
Contributor

I'd prefer it if stack and stack_size were always correct; that way we could avoid this special case.

Contributor Author

OK
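
The suggestion above, always passing valid stack bounds so the unwinder needs no special case, can be modelled as follows. This is a simplified Python sketch and not the OP-TEE code: `memory` is a hypothetical dictionary standing in for stack contents, `unwind_frames` is an invented name, and the layout assumed is the standard AArch64 frame record (saved frame pointer at `fp`, return address at `fp + 8`).

```python
def unwind_frames(memory, fp, pc, stack, stack_size, max_frames=32):
    """Walk AArch64-style frame records and return the list of PCs.

    memory maps an address to the 64-bit value stored there. The walk
    stops as soon as the frame pointer leaves [stack, stack + stack_size).
    """
    pcs = [pc]
    while len(pcs) < max_frames:
        # Both words of the frame record must lie inside the stack.
        if not (stack <= fp and fp + 16 <= stack + stack_size):
            break
        if fp not in memory or fp + 8 not in memory:
            break
        next_fp = memory[fp]
        pcs.append(memory[fp + 8])  # saved link register
        fp = next_fp
    return pcs
```

Because the caller always supplies the bounds, the same loop serves the kernel stack and a user TA stack; only the `stack` and `stack_size` arguments differ.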

@jforissier
Contributor Author

@jenswi-linaro

What happens if a TA is paged and the unwind tables need to be paged in?

Good question ;) The answer is: bad things, I'm afraid. I need to test it, but I think what will happen is: abort_handler() -> get_fault_type() -> is_abort_in_abort_handler(ai) == true -> panic().

What we need is for the abort to be handled as if it had occurred while running TA code, although it really occurred when the TEE core tried to access a user page. Is there a simple change we can make for this to work properly? Update is_abort_in_abort_handler() maybe? More than that?

@jenswi-linaro
Contributor

Backtracing the stack requires running in thread context for the pager to be able to handle page faults.

We could use the technique used in handle_user_ta_panic(), except that we return back into another function which when done does a proper return. We'll probably need to change how abort_handler() is called also since this second function will need to take some parameters, like the saved spsr and elr etc.

}
exidx += utc->load_addr;
} else {
exidx = (vaddr_t)&__exidx_start;
Contributor

Since __exidx_start is declared as an array, there's no need to use the & as well (btw, applying & to arrays is often a bad idea).
I wonder if we shouldn't have a special .h file that contains all symbols exported by the linker script, but that's out of scope for this PR.

idx_end = (vaddr_t)&__exidx_end;
}
start = (struct unwind_idx *)exidx;
idx_start = (vaddr_t)exidx;
Contributor

Perhaps the type of exidx should be vaddr_t instead to avoid this cast and the cast below.

Contributor Author

Should I make exidx a vaddr_t everywhere then?

Contributor

Well, we don't have a clear policy on uaddr_t (user space address) versus vaddr_t (normal virtual address). When we know that an address can be used directly by OP-TEE OS, it can be represented by a vaddr_t since all prerequisites are in place. Until those checks have been done, a uaddr_t should stay a uaddr_t.

find_index() works both with pure kernel-mode pointers and with user-mode pointers that have the prerequisites in place (the correct vaspace is mapped), so this function should work with vaddr_t only.
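
For readers unfamiliar with the table that find_index() searches, here is a rough Python model of a lookup in an ARM exception index table. It is a sketch under stated assumptions, not the OP-TEE implementation: the function names and signature are hypothetical, entries are taken to be 8 bytes each (a prel31 offset to the function start followed by an unwind data word) and sorted by function address, as the ARM EHABI describes. The `exidx` argument is a plain integer address, which mirrors the point above that the lookup only needs an address that is already valid in the current mapping.

```python
import bisect
import struct


def prel31_to_addr(word, place):
    """Decode a 31-bit place-relative offset stored at address 'place'."""
    offset = word & 0x7FFFFFFF
    if offset & 0x40000000:  # sign-extend from bit 30
        offset -= 0x80000000
    return (place + offset) & 0xFFFFFFFF


def find_index(table, exidx, pc):
    """Return the index of the entry covering pc, or None.

    table is the raw little-endian content of the index table and
    exidx is the virtual address at which the table is mapped.
    """
    starts = []
    for i in range(len(table) // 8):
        word0, _unwind_data = struct.unpack_from("<II", table, i * 8)
        starts.append(prel31_to_addr(word0, exidx + i * 8))
    # Last entry whose function start is <= pc.
    pos = bisect.bisect_right(starts, pc) - 1
    return pos if pos >= 0 else None
```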

#endif /*ARM32*/

/*
* Unwind a 32-bit user or kernel stack. When unwinding a user TA, set @exidx
Contributor

Please update the comment.


bool unwind_stack(struct unwind_state *state);
/*
* Unwind a 64-bit user or kernel stack. When unwinding a user TA, set @stack
Contributor

Please update the comment.


EMSG_RAW("Call stack:");
if (abort_is_user_exception(ai)) {
struct tee_ta_session *s;
Contributor

What's in this block could go into a separate function that could be called from the other version of __print_stack_unwind_arm32() below.

if (is_32bit)
__print_stack_unwind_arm32(ai);
else
__print_stack_unwind_arm64(ai);
}

void abort_print(struct abort_info *ai __maybe_unused)
Contributor

__maybe_unused can be removed.

* POSSIBILITY OF SUCH DAMAGE.
*/

void __aeabi_unwind_cpp_pr0(void);
Contributor

With these declarations here, there's no point in keeping the same ones at the end of core/arch/arm/kernel/unwind_arm32.c

@jenswi-linaro
Contributor

How we deal with uaddr_t versus vaddr_t is a bit of a mess today. I'm OK with the current state of this PR. The uaddr_t mess can probably only be addressed properly with sparse, but that's of course not in scope here.
For all the source code commits:
Reviewed-by: Jens Wiklander <[email protected]>
For the script commit:
Acked-by: Jens Wiklander <[email protected]>

@jforissier
Contributor Author

Thanks @jenswi-linaro.

There's the issue of paged TAs, however. I don't want to introduce a situation in which the crash of a TA could lead to a TEE core panic (even if it's only in debug mode).
Should I simply make sure that unwinding of 32-bit TAs is disabled when both CFG_WITH_PAGER and CFG_PAGED_USER_TA are enabled? Then proper unwinding in the TA context can be done in a later PR.

@jenswi-linaro
Contributor

@jforissier, Sounds good, checking for CFG_PAGED_USER_TA should be enough.

@jenswi-linaro
Contributor

LGTM


def spawn_addr2line(self):
if not self._addr2line:
elf = self.get_elf(self._bin);
Contributor

C programmer error (semicolon at the end).


def spawn_addr2line(self):
if not self._addr2line:
elf = self.get_elf(self._bin);
Contributor

From this line to 101 you are using leading tabs, which makes Python unhappy (you are using leading spaces in the rest of the script).

# Flatten list in case -d is used several times *and* with multiple
# arguments
args.dirs = [item for sublist in args.dir for item in sublist]
symbolizer = Symbolizer(sys.stdout, args.dirs, args.strip_path)
Contributor

If args.dir is not set, the script will bail out. Either add an else to line 166 or add a try/except around this line.
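
One way to avoid the failure described here, sketched under assumptions: the option spelling (`-d`/`--dir`) and the use of `action="append"` with `nargs="+"` are inferred from the snippet and its comment, and the function names are hypothetical. When `-d` is never given, argparse leaves `args.dir` as `None`, so the flattening step substitutes an empty list.

```python
import argparse


def build_dir_parser():
    parser = argparse.ArgumentParser()
    # -d may be repeated and may take several values each time, so
    # argparse collects a list of lists (or None if -d is absent).
    parser.add_argument("-d", "--dir", action="append", nargs="+")
    return parser


def flatten_dirs(args):
    """Flatten the list of lists; tolerate a missing -d option."""
    return [item for sublist in (args.dir or []) for item in sublist]
```

With this, `flatten_dirs(build_dir_parser().parse_args([]))` yields an empty list instead of raising a `TypeError`.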

args.dirs = [item for sublist in args.dir for item in sublist]
symbolizer = Symbolizer(sys.stdout, args.dirs, args.strip_path)

for line in sys.stdin:
Contributor

@jbech-linaro, May 23, 2017

What are you supposed to give to stdin? I.e., what input is the script waiting for? Core dump? Stack trace?
Edit: found it above:
<paste dump here> ...

Contributor

It would probably be nice to have a print saying something like: "Waiting for user to paste the dump here ...".

Contributor Author

That would be OK for interactive use but not so meaningful if the script is called in a test environment (Travis for instance).
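
Both concerns could in principle be reconciled by printing the hint only when standard input is a terminal. The sketch below is a suggestion, not something the PR implements; the function name and message text are hypothetical. `isatty()` is a standard method of Python file objects, and writing to stderr keeps the hint out of the symbolized output.

```python
import sys


def maybe_prompt(stdin=sys.stdin, stderr=sys.stderr):
    """Print a hint only when input comes from a terminal.

    Returns True if the hint was printed, False for piped or
    redirected input (for example in a CI job).
    """
    if stdin.isatty():
        stderr.write("Paste the dump here, then press Ctrl-D ...\n")
        return True
    return False
```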

# POSSIBILITY OF SUCH DAMAGE.
#

"""Symbolizes OP-TEE abort dumps
Contributor

@jbech-linaro, May 23, 2017

I would recommend storing this help section in a global variable and then using it as the description for the argument parser, like here: https://docs.python.org/2/library/argparse.html#description

By doing so you will get the output when running the script with -h (or wrong parameters).
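
A minimal sketch of this suggestion, assuming the help text lives in a module-level string. The variable and function names are hypothetical, and the text after the first line is a placeholder, not the script's real help. `RawDescriptionHelpFormatter` is the standard argparse formatter that preserves the line breaks of the description.

```python
import argparse

# Placeholder help text; the real script would keep its own wording here.
DESCRIPTION = """Symbolizes OP-TEE abort dumps

Reads a dump on standard input and writes the symbolized
dump to standard output.
"""


def build_help_parser():
    return argparse.ArgumentParser(
        description=DESCRIPTION,
        formatter_class=argparse.RawDescriptionHelpFormatter)
```

Running the script with `-h` then prints the description along with the option list.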

@jforissier
Contributor Author

@jbech-linaro thanks for reviewing the Python script. Patch updated.

@jbech-linaro
Contributor

@jbech-linaro thanks for reviewing the Python script. Patch updated.

Thanks for the updates!
Reviewed-by: Joakim Bech <[email protected]> (python script)

There is no obvious reason for requiring the first program header in a
user TA to be of type PT_LOAD. It is usually the case, due to the way
our linker script is written (ta/arch/arm/ta.ld.S). Still, it may occur
that other segments are inserted first by the linker. For example, when
linking a 32-bit binary built with unwind tables (-funwind-tables), the
first PHDR is PT_ARM_EXIDX. Such a TA won't load unless this patch is
applied.

Signed-off-by: Jerome Forissier <[email protected]>
Reviewed-by: Jens Wiklander <[email protected]>

Update the abort handling code in the TEE core to support unwinding
the user mode stack in addition to the kernel stack. unwind_arm32.c is
modified slightly so that it can be built for AArch64. This allows a
64-bit TEE core to dump both 32- and 64-bit TAs.

Paged TAs (CFG_PAGED_USER_TA=y) cannot currently be unwound, because
the code is not ready to handle the page faults that might occur as
the unwinding tables are accessed.

CFG_CORE_UNWIND is renamed to CFG_UNWIND since it enables both the
kernel and user TA stack dumps. It is still set automatically when
CFG_TEE_CORE_DEBUG=y.

32-bit user TAs have to be compiled with `-funwind-tables`, otherwise
the call stack can't be unwound and the abort reports will not show a
call stack. The TA dev kit takes care of adding this flag automatically
when CFG_UNWIND=y.

Signed-off-by: Jerome Forissier <[email protected]>
Tested-by: Jerome Forissier <[email protected]> (HiKey)
Reviewed-by: Jens Wiklander <[email protected]>

Signed-off-by: Jerome Forissier <[email protected]>
Reviewed-by: Jens Wiklander <[email protected]>

In the TA abort message that is sent to the console when a user-mode
TA crashes, there is currently no clear indication of whether the TA
was running in 32-bit or 64-bit mode. Add it since it will be useful to
develop parsing tools.

Signed-off-by: Jerome Forissier <[email protected]>
Reviewed-by: Jens Wiklander <[email protected]>

Add a helper script to decode call stacks shown in abort messages. The
script relies on addr2line to convert virtual addresses to debug
information: 'function at file:line'.

Signed-off-by: Jerome Forissier <[email protected]>
Acked-by: Jens Wiklander <[email protected]>
Reviewed-by: Joakim Bech <[email protected]>