arm/convolution_3x3_pack1to8_fp16s: prefer ldr/str over ld1/st1 #5603

Merged: 1 commit, Jul 30, 2024

Conversation

quink-black
Contributor

Depending on the microarchitecture, ldr/str can be faster than ld1/st1, especially compared with the single-lane form of ld1. For example:

On Cortex-A75,

  1. the execution latency of 'ldr q0' and 'ldr h0' is 5 cycles
  2. the execution latency of 'ld1 {v0.16b}' is 6 cycles
  3. the execution latency of 'ld1 {v0.h}[0]' is 8 cycles

On Cortex-X3,

  1. the execution latency of 'ldr q0' and 'ldr h0' is 6 cycles
  2. the execution latency of 'ld1 {v0.16b}' is 6 cycles
  3. the execution latency of 'ld1 {v0.h}[0]' is 8 cycles
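
One caveat when swapping the single-lane form: 'ldr h0' writes the halfword to lane 0 and zeroes the rest of the register, while 'ld1 {v0.h}[0]' replaces lane 0 and leaves lanes 1..7 untouched. The swap is therefore only safe where the other lanes are not needed afterwards. The sketch below illustrates the difference. It is not taken from the ncnn sources: `Vec8h`, `load_ldr_h` and `load_ld1_lane0` are hypothetical names, and the non-AArch64 branch is a plain C++ model of the instruction behaviour so the snippet also builds on other hosts.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper type: eight 16-bit lanes standing in for one
// 128-bit NEON register holding fp16 bit patterns.
struct Vec8h
{
    uint16_t lane[8];
};

// 'ldr h0, [p]': lane 0 receives the halfword, lanes 1..7 become zero.
inline Vec8h load_ldr_h(const uint16_t* p)
{
    Vec8h v;
#if defined(__aarch64__) && defined(__GNUC__)
    asm volatile(
        "ldr    h0, [%1]            \n"
        "str    q0, [%0]            \n"
        :
        : "r"(v.lane), "r"(p)
        : "memory", "v0");
#else
    // Portable model of the same behaviour (assumption: non-AArch64 host).
    std::memset(v.lane, 0, sizeof(v.lane));
    v.lane[0] = *p;
#endif
    return v;
}

// 'ld1 {v0.h}[0], [p]': lane 0 is replaced, lanes 1..7 keep their
// previous contents (passed in here as 'prev').
inline Vec8h load_ld1_lane0(Vec8h prev, const uint16_t* p)
{
#if defined(__aarch64__) && defined(__GNUC__)
    asm volatile(
        "ldr    q0, [%0]            \n"
        "ld1    {v0.h}[0], [%1]     \n"
        "str    q0, [%0]            \n"
        :
        : "r"(prev.lane), "r"(p)
        : "memory", "v0");
#else
    // Portable model of the same behaviour (assumption: non-AArch64 host).
    prev.lane[0] = *p;
#endif
    return prev;
}
```

Calling both helpers with the same source halfword gives the same lane 0; the results differ only in lanes 1..7, which is the property to check before replacing a lane load with a scalar load.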

Signed-off-by: Zhao Zhili <[email protected]>
@github-actions github-actions bot added the arm label Jul 25, 2024
@codecov-commenter

codecov-commenter commented Jul 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.58%. Comparing base (997c892) to head (2ace409).
Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5603      +/-   ##
==========================================
- Coverage   94.73%   94.58%   -0.16%     
==========================================
  Files         786      786              
  Lines      248543   248542       -1     
==========================================
- Hits       235462   235084     -378     
- Misses      13081    13458     +377     


@nihui nihui merged commit 92e0b82 into Tencent:master Jul 30, 2024
29 checks passed
@nihui
Member

nihui commented Jul 30, 2024

Thanks for your contribution!
