
xtest 1013 stalled on HiKey when log levels are 4 and optee_os is on its own UART #665

Closed
vchong opened this issue Feb 24, 2016 · 10 comments
vchong commented Feb 24, 2016

Modifications to https://github.com/OP-TEE/build
common.mk:
Set DEBUG=1
Set CFG_TEE_CORE_LOG_LEVEL=4
Add CFG_TEE_TA_LOG_LEVEL=4 to XTEST_COMMON_FLAGS

hikey.mk:
For OPTEE_OS_COMMON_FLAGS, set
CFG_TEE_TA_LOG_LEVEL=4
CFG_CONSOLE_UART=0
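
For reference, the modifications above correspond roughly to the following fragments (variable names as used in OP-TEE/build at the time; this is a sketch of the intent, not a verbatim diff):

```make
# common.mk (sketch)
DEBUG ?= 1
CFG_TEE_CORE_LOG_LEVEL ?= 4
XTEST_COMMON_FLAGS += CFG_TEE_TA_LOG_LEVEL=4

# hikey.mk (sketch): put optee_os on UART0, separate from the normal world console
OPTEE_OS_COMMON_FLAGS += CFG_TEE_TA_LOG_LEVEL=4 CFG_CONSOLE_UART=0
```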

root@HiKey:/ xtest 1013                                                                                                
Test ID: 1013                                                                                                          
Run test suite with level=0
TEE test application started with device [opteearmtz00]

* XTEST_TEE_1013 Test concurency with concurrent TA
random: nonblocking pool is initialized
INFO: task xtest:1040 blocked for more than 120 seconds.                                                               
      Tainted: G        W  O    4.4.0 #1                                                                               
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.                                              
xtest           D ffffffc000087130     0  1040   1032 0x00000008                                                       
Call trace:                                                                                                            
[<ffffffc000087130>] __switch_to+0x94/0xa8                                                                             
[<ffffffc0007e5f10>] __schedule+0x194/0x6ac                                                                            
[<ffffffc0007e6634>] schedule+0x48/0xa8                                                                                
[<ffffffc0007e68c0>] schedule_preempt_disabled+0x1c/0x2c                                                               
[<ffffffc0007e80a0>] __mutex_lock_slowpath+0xa0/0x160                                                                  
[<ffffffc0007e81b4>] mutex_lock+0x54/0x70                                                                              
[<ffffffbffc017bc4>] call_tee.isra.13+0x114/0x480 [optee_armtz]                                                        
[<ffffffbffc0181a8>] tz_invoke+0xc0/0x144 [optee_armtz]                                                                
[<ffffffbffc002004>] tee_session_invoke_be+0x80/0xd0 [optee]                                                           
[<ffffffbffc002198>] tee_session_ioctl+0x144/0x28c [optee]                                                             
[<ffffffc00020fc44>] do_vfs_ioctl+0x394/0x610                                                                          
[<ffffffc00020ff50>] SyS_ioctl+0x90/0xa4                                                                               
[<ffffffc000085cb0>] el0_svc_naked+0x24/0x28                                                                           
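
As the warning itself hints, the hung-task watchdog is a standard Linux procfs knob (not OP-TEE-specific); a sketch of how to adjust it while debugging:

```shell
# Disable the hung-task warning entirely (0 disables the check):
echo 0 > /proc/sys/kernel/hung_task_timeout_secs

# Or raise the threshold from the default 120 s so slow-but-live
# tests are not flagged:
echo 600 > /proc/sys/kernel/hung_task_timeout_secs
```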
@ghost ghost added the bug label Feb 25, 2016
@jbech-linaro
Contributor

@vchong we haven't done anything explicitly to solve this issue, but since there have been quite a few changes since it was reported, there is a good chance the behavior is different now. Would you mind re-testing to see if you can still reproduce the issue?

@vchong
Contributor Author

vchong commented Jul 12, 2016

I tested with the tip yesterday and today, and while the test now runs to completion using the above settings, it took ~4 hours (~50 minutes with log level 3) vs ~1 minute in QEMU.

CFG_TEE_CORE_LOG_LEVEL=4, CFG_TEE_TA_LOG_LEVEL=4

root@HiKey:/ time xtest 1013
Test ID: 1013
Run test suite with level=0
+-----------------------------------------------------
Result of testsuite XTEST_TEE_TEST filtered by "1013":
XTEST_TEE_1013 OK
+-----------------------------------------------------
648 subtests of which 0 failed
1 test case of which 0 failed
62 test cases was skipped
TEE test application done!
real    4h 10m 12s
user    0m 0.01s
sys     11h 57m 46s

CFG_TEE_CORE_LOG_LEVEL=3, CFG_TEE_TA_LOG_LEVEL=3

root@HiKey:/ time xtest 1013
Test ID: 1013
Run test suite with level=0
+-----------------------------------------------------
Result of testsuite XTEST_TEE_TEST filtered by "1013":
XTEST_TEE_1013 OK
+-----------------------------------------------------
648 subtests of which 0 failed
1 test case of which 0 failed
62 test cases was skipped
TEE test application done!
real    51m 21.29s
user    0m 0.00s
sys     1h 28m 55s

CFG_TEE_CORE_LOG_LEVEL=4, CFG_TEE_TA_LOG_LEVEL=4

root@Vexpress:/ time xtest 1013
Test ID: 1013
Run test suite with level=0
+-----------------------------------------------------
Result of testsuite XTEST_TEE_TEST filtered by "1013":
XTEST_TEE_1013 OK
+-----------------------------------------------------
648 subtests of which 0 failed
1 test case of which 0 failed
62 test cases was skipped
TEE test application done!
real    1m 3.01s
user    0m 0.06s
sys     1m 1.79s

Also, secure world logs are not printed for HiKey if DEBUG=1 and CFG_NW_CONSOLE_UART != CFG_SW_CONSOLE_UART regardless of log levels (3 or 4).

@vchong
Contributor Author

vchong commented Jul 12, 2016

With CFG_NW_CONSOLE_UART == CFG_SW_CONSOLE_UART, secure world logs are printed and the test completes in ~22 minutes (log level 4) and ~3 minutes (log level 3).

CFG_TEE_CORE_LOG_LEVEL=4, CFG_TEE_TA_LOG_LEVEL=4

root@HiKey:/ time xtest 1013
Test ID: 1013
Run test suite with level=0
+-----------------------------------------------------
Result of testsuite XTEST_TEE_TEST filtered by "1013":
XTEST_TEE_1013 OK
+-----------------------------------------------------
648 subtests of which 0 failed
1 test case of which 0 failed
62 test cases was skipped
TEE test application done!
real    22m 31.54s
user    0m 0.00s
sys     1h 4m 31s

CFG_TEE_CORE_LOG_LEVEL=3, CFG_TEE_TA_LOG_LEVEL=3

root@HiKey:/ time xtest 1013
Test ID: 1013
Run test suite with level=0
+-----------------------------------------------------
Result of testsuite XTEST_TEE_TEST filtered by "1013":
XTEST_TEE_1013 OK
+-----------------------------------------------------
648 subtests of which 0 failed
1 test case of which 0 failed
62 test cases was skipped
TEE test application done!
real    2m 49.13s
user    0m 0.01s
sys     4m 23.51s

@jenswi-linaro
Contributor

I suspect that the difference in time spent across the configurations is due to how much concurrency there is in the system. qemu_virt has only one core, so real concurrency is impossible, though there is some apparent concurrency from rescheduling in the normal world. HiKey, on the other hand, can have true concurrency on all 8 cores. If those cores are hammering the same shared resources, I can see that slowing things down, but this is a bit extreme; I guess there has to be something else adding to it as well.

@Lambda2023

I am trying to run optee on hikey960. The secure world optee-core boots successfully, but the normal world Linux kernel always hits a "Task dump for CPU 5:" error while booting. The Linux boot log is below; how can I resolve this problem?
[ 0.715815] DMA: preallocated 256 KiB pool for atomic allocations
[ 0.722094] Serial: AMBA PL011 UART driver
[ 0.740959] hi3660-mbox e896b000.mailbox: Mailbox enabled
[ 21.760941] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 21.766673]     5-...: (1 GPs behind) idle=121/140000000000000/0 softirq=18/19 fqs=2625
[ 21.774591]     (detected by 0, t=5252 jiffies, g=-290, c=-291, q=111)
[ 21.780930] Task dump for CPU 5:
[ 21.784189] swapper/0 R running task 0 1 0 0x00000002
[ 21.791319] Call trace:
[ 21.793802] [] __switch_to+0x94/0xa8
[ 21.798995] [<0000000000000040>] 0x40
(stuck here)

@vchong
Contributor Author

vchong commented Sep 30, 2017

@HaiyongSun We'll investigate and get back to you later. Thanks for the report!

@Lambda2023

@vchong Thanks. I read the file "stallwarn.txt", which says:

    Recent kernels will print a long form of the stall-warning message:
    INFO: rcu_preempt detected stall on CPU
    0: (63959 ticks this GP) idle=241/3fffffffffffffff/0 softirq=82/543
    (t=65000 jiffies)

But I still haven't found the reason why it "... stalls on CPUs/tasks:" right after the mailbox is enabled each time.
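
For anyone debugging a similar stall: the knobs stallwarn.txt describes are exposed as standard `rcupdate` module parameters, which can be inspected at runtime (a sketch; paths as in mainline kernels of that era):

```shell
# Read the current RCU CPU stall timeout (in seconds):
cat /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

# Suppress stall warnings while bisecting, via the kernel command line:
#   rcu_cpu_stall_suppress=1
```

Note that suppressing the warning only hides the symptom; the stall itself (here, CPU 5 stuck during boot) still needs a root cause.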

@vchong
Contributor Author

vchong commented Oct 5, 2017

@HaiyongSun The problem is 'fixed'. See #1851 for details.

@vchong
Contributor Author

vchong commented Oct 5, 2017

The original issue reported is fixed so closing.
See #1483 for details.

@vchong vchong closed this as completed Oct 5, 2017
@Lambda2023

@vchong Yes, it works! Thanks!
