Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache optimizations #3005

Merged
merged 10 commits into from
Jul 1, 2019
Merged

Cache optimizations #3005

merged 10 commits into from
Jul 1, 2019

Conversation

jenswi-linaro
Copy link
Contributor

Increase speed by more optimized cache maintenance. Also some TLB invalidations are optimized.

@jenswi-linaro
Copy link
Contributor Author

On Hikey running xtest with paging enabled in AArch32:
Before:

time xtest
real    1m 38.05s
user    0m 0.08s
sys     1m 30.69s

after:

time xtest
real    1m 29.48s
user    0m 0.12s
sys     1m 18.83s

@jforissier
Copy link
Contributor

I get a crash on HiKey960 (64-bit TAs), which seems to be triggered by commit 227c51a ("core: pager: use icache_inv_user_range()"). At least that's what "git bisect" says.

3.5.0-76-g20d95107
make -j10 CFG_ULIBS_SHARED=y CFG_WITH_PAGER=y COMPILE_S_USER=64 CFG_TEE_CORE_LOG_LEVEL=2

$ xtest 6016
[...]
o regression_6016.2 Storage id: 80000000
    threads: 4, loops: 8
E/TC:5 3
E/TC:5 3 User TA data-abort at address 0x7f48 (translation fault)
E/TC:5 3  esr 0x92000005  ttbr0 0x600003f0240a0   ttbr1 0x00000000   cidr 0x0
E/TC:5 3  cpu #5          cpsr 0x60000100
E/TC:5 3  x0  ffffffffffffffe0 x1  0000000000007f40
E/TC:5 3  x2  0000000000007f60 x3  0000000040030570
E/TC:5 3  x4  0000000040030560 x5  000000004002d3c8
E/TC:5 3  x6  0000000000005135 x7  0000000000000020
E/TC:5 3  x8  0000000040005b40 x9  0000000000005135
E/TC:5 3  x10 000000004004d980 x11 0000000000000000
E/TC:5 3  x12 0000000000000000 x13 0000000040025f80
E/TC:5 3  x14 0000000000000000 x15 0000000000000000
E/TC:5 3  x16 0000000040046098 x17 0000000040032630
E/TC:5 3  x18 0000000000000000 x19 0000000000000000
E/TC:5 3  x20 0000000000000004 x21 0000000040025f88
E/TC:5 3  x22 0000000000005135 x23 0000000040025f80
E/TC:5 3  x24 0000000000000004 x25 000000003f036720
E/TC:5 3  x26 000000003f08bf00 x27 0000000000000004
E/TC:5 3  x28 0000000040025f88 x29 0000000040025e80
E/TC:5 3  x30 00000000400327d8 elr 0000000040032834
E/TC:5 3  sp_el0 0000000040025e80
E/TC:5 3 Status of TA b689f2a7-8adf-477a-9f99-32e90c0ad0a2 (0x3f036940) (active)
E/TC:5 3  arch: aarch64  load address: 0x0000000040026000 ctx-idr: 6
E/TC:5 3  stack: 0x0000000040025000 4096
E/TC:5 3  region  0: va 0x0000000040000000 pa 0x000000003f002000 size 0x002000 flags ---R-X
E/TC:5 3  region  1: va 0x0000000040002000 pa 0x000000003f016000 size 0x001000 flags ---RW-
E/TC:5 3  region  2: va 0x0000000040004000 pa 0x00000000ba489b20 size 0x001000 flags rw-RW-
E/TC:5 3  region  3: va 0x0000000040005000 pa 0x00000000ba489b40 size 0x001000 flags rw-RW-
E/TC:5 3  region  4: va 0x0000000040025000 pa 0x0000000000000000 size 0x001000 flags rw-RW-
E/TC:5 3  region  5: va 0x0000000040026000 pa 0x0000000000000000 size 0x002000 flags r-xR-- [0]
E/TC:5 3  region  6: va 0x0000000040028000 pa 0x0000000000000000 size 0x009000 flags rw-RW- [0]
E/TC:5 3  region  7: va 0x0000000040031000 pa 0x0000000000000000 size 0x005000 flags r-xR-- [1]
E/TC:5 3  region  8: va 0x0000000040045000 pa 0x0000000000000000 size 0x002000 flags rw-RW- [1]
E/TC:5 3  region  9: va 0x0000000040047000 pa 0x0000000000000000 size 0x00f000 flags r-xR-- [2]
E/TC:5 3  region 10: va 0x0000000040065000 pa 0x0000000000000000 size 0x004000 flags rw-RW- [2]
E/TC:5 3  region 11: va 0x0000000040069000 pa 0x0000000000000000 size 0x036000 flags r-xR-- [3]
E/TC:5 3  region 12: va 0x00000000400ae000 pa 0x0000000000000000 size 0x005000 flags rw-RW- [3]
E/TC:5 3  [0] b689f2a7-8adf-477a-9f99-32e90c0ad0a2 @ 0x0000000040026000
E/TC:5 3  [1] 71855bba-6055-4293-a63f-b0963a737360 @ 0x0000000040031000
E/TC:5 3  [2] 527f1a47-b92c-4a74-95bd-72f19f4a6f74 @ 0x0000000040047000
E/TC:5 3  [3] 87bb6ae8-4b1d-49fe-9986-2b966132c309 @ 0x0000000040069000
regression_6000.c:1700: fs_create(&sess, a->file_name, sizeof(a->file_name), 0x00000002, 0, data_01, sizeof(data_01), &obj, a->storage_id) has an unexpected value: 0xffff3024 = TEE_ERROR_TARGET_DEAD, expected 0x0 = TEEC_SUCCESS
  regression_6016.2 FAILED
  regression_6016 FAILED

@jenswi-linaro
Copy link
Contributor Author

Thanks, I'll try to stress my Hikey a bit more.

@jforissier
Copy link
Contributor

FYI, sometimes you need 2 or 3 runs of xtest 6016 to trigger the crash.

@etienne-lms
Copy link
Contributor

Successfully tested on armv7 (stm32mp1, with LPAE. Found #3009 but that unrelated.
I will run some stress tests with and wothout LPAE.

@jenswi-linaro
Copy link
Contributor Author

Rebased on master and also added a fix that hopefully fixes the issue found by @jforissier.

@jenswi-linaro
Copy link
Contributor Author

@etienne-lms, what kind of i-cache does stm32mp1 have? PIPT or VIPT?

@etienne-lms
Copy link
Contributor

L1 icache is VIPT on stm32mp1.

@jenswi-linaro
Copy link
Contributor Author

Thanks. I guess the reason why you didn't see the error was that you didn't compile with CFG_ULIBS_SHARED=y CFG_WITH_PAGER=y CFG_TEE_CORE_LOG_LEVEL=2.

@jforissier
Copy link
Contributor

jforissier commented Jun 12, 2019

@jenswi-linaro yes, the issue I mentioned in #3005 (comment) cannot be reproduced now (commit 76525ef).

Update: I will run more tests and provide a T-b.

@jforissier
Copy link
Contributor

Tested-by: Jerome Forissier <[email protected]> (HiKey960)

(I have tested pretty much all the configurations I run for release testing; commit 76525ef + PR 3086 on top).

@jenswi-linaro
Copy link
Contributor Author

Thanks for the rigorous testing. The code seems to work OK, now it only need some reviewing too.

Copy link
Contributor

@jforissier jforissier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing commit "core: arm64.h: add tlbi_vale1is()".

core/arch/arm/include/arm64.h Outdated Show resolved Hide resolved
@jforissier
Copy link
Contributor

For commits:

  • "arm.h: add CTR_WORD_SIZE"
  • "core: cache_helpers_a{32,64}.S: remove section assignments"
  • "core: arm32.h: add TLBI_{MVA_SHIFT,ASID_MASK}"

Reviewed-by: Jerome Forissier <[email protected]>

@jenswi-linaro
Copy link
Contributor Author

Squashed, rebased and tags applied. Note that a bunch of commits are missing R-B or A-B tags.

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for commit "core: add icache_inv_user_range()": 3 minor comments once addressed
Reviewed-by: Etienne Carriere <[email protected]>

* void icache_inv_user_range(void *addr, size_t size);
*
* This function has to execute with the user space ASID active,
* this means executing with reduced mapping and the code need
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: s/need/needs/

* void icache_inv_user_range(void *addr, size_t size);
*
* This function has to execute with the user space ASID active,
* this means executing with reduced mapping and the code need
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/need/needs/

/* Calculate minimum icache line size, result in x2 */
mrs x3, ctr_el0
and x3, x3, #CTR_IMINLINE_MASK
mov x2, #4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/#4/#CTR_WORD_SIZE/

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For commit "core: pager: use icache_inv_user_range()":

  • tee_pager_unhide_page() calls area_tlbi_entry() which is introduced later in the series in commit "core: pager: use tlbi_mva_asid() where applicable".
  • minor comment
  • with those minor issues fixed: Reviewed-by: Etienne Carriere <[email protected]>

*
* For user executable pages it's more complicated. Those pages can
* be shared between multiple TA mappings and thus populated by
* another TA. The reference manual statates that:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: s/statates/states/

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For commit "core: add dcache_clean_range_pou()":
Reviewed-by: Etienne Carriere <[email protected]>

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for commit "core: pager: use dcache_clean_range_pou()": Acked-by: Etienne Carriere <[email protected]>

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for commit "core: add tlbi_mva_asid()":
Reviewed-by: Etienne Carriere <[email protected]>

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for commit "core: pager: use tlbi_mva_asid() where applicable":
Reviewed-by: Etienne Carriere <[email protected]>

Copy link
Contributor

@etienne-lms etienne-lms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for commit "core: arm64.h: add tlbi_vale1is()":
Reviewed-by: Etienne Carriere <[email protected]>

Adds a common define for the word size used by the CTR (cache type)
register.

Reviewed-by: Jerome Forissier <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
Adds icache_inv_user_range() which is used when invalidating currently
mapped user space memory. This is needed since a different ASID is
usually in use while in kernel mode. So using icache_inv_range() would
normally not have any effect on user mode mappings.

Reviewed-by: Etienne Carriere <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
Since the FUNC and LOCAL_FUNC assembly macros now assign a section to
each assembly function the explicitly assigned sections in
cache_helpers_a{32,64}.S are ignored. So remove the ignored section
assignments.

Reviewed-by: Jerome Forissier <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
Adds dcache_clean_range_pou() which cleans the data cache to the point
of unification. This is exactly what's needed when later invalidating
the icache due to updates in a page.

Reviewed-by: Etienne Carriere <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
Pager uses dcache_clean_range_pou() when cleaning pages before
invalidating icache for that page. Prior to this patch
dcache_clean_range() was used indirectly which cleans the range to point
of coherency instead of point of unification.

With this patch we're likely to save one data cache level by only
cleaning level 1 instead of level 1 and 2. This assumes separate data
and instructions caches level 1 and a unified data cache at level 2

Acked-by: Etienne Carriere <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
Adds TLBI macros to help formatting source register for TLB
invalidations.

Reviewed-by: Jerome Forissier <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
Adds tlbi_vale1is() which is a wrapper around inline assembly code to
execute  "tlbi vale1is". The operation is described as "TLB Invalidate
by VA, Last level, EL1, Inner Shareable" in the ARM ARM.

Reviewed-by: Etienne Carriere <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
Adds tlbi_mva_asid() to invalidate one TLB entry, typically page sized,
selected by virtual address and address identifier. The function targets
both the kernel mode and user mode address identifiers at the same time.

Reviewed-by: Etienne Carriere <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
Instead of invalidating a virtual address for all ASIDs only target the
relevant ones. For kernel mode mappings all ASIDs still needs to be
targeted though.

Reviewed-by: Etienne Carriere <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
Prior to this patch the entire icache was invalidated when icache
invalidations was needed, even if it only was for a single page. This
was needed to reach a stable state with regards to paging user TAs.

With this patch a new function, icache_inv_user_range(), is used to
invalidate pages used by user TAs and icache_inv_range() is used instead
to invalidate kernel mode pages.

Reviewed-by: Etienne Carriere <[email protected]>
Signed-off-by: Jens Wiklander <[email protected]>
@jenswi-linaro
Copy link
Contributor Author

Comments address, commits slightly reordered with some conflict resolving and tags applied.

@jforissier jforissier merged commit c4a5739 into OP-TEE:master Jul 1, 2019
@jenswi-linaro jenswi-linaro deleted the cache_optim branch July 1, 2019 07:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants