perf(memblock): opt memset pattern #3632

happy-lx · 2024-09-23T09:35:10Z

This PR optimizes performance under the MemSet memory access pattern and fixes several Sbuffer performance bugs. The specific changes are as follows:

MemSet detection now checks, in store commit order, whether the address delta and the data hash stay constant across consecutive stores while the load queue is empty.
Add ASP (Accurate Store Prefetcher), a store prefetcher that is active only in MemSet mode. It chooses its prefetch distance according to the store size, and currently prefetches at a 1KB granularity.
- By relaxing some of the detection conditions (constant data hash and empty load queue), this prefetcher could also prefetch stores during MemCpy and improve performance there. To avoid other negative side effects, however, it is currently enabled only under MemSet.
To improve Sbuffer utilization under MemSet, the Sbuffer tries to fill each entry completely before sending it to the DCache. A fully written line also lets the DCache send AcquirePerm instead of AcquireData to the next level, because the old line contents never need to be fetched, which reduces bus bandwidth.
Fix an Sbuffer enqueue throughput problem: assert `enq.ready` whenever the incoming request can be merged into an existing entry.
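The first two changes (the new MemSet detection rule and the ASP prefetcher it gates) can be sketched together as a small Python behavioral model. This is not the RTL: the `threshold` confidence counter, the `data_hash` field, the `ASP_DISTANCE_BLOCKS` table and the assumption of ascending addresses are all hypothetical placeholders. Only the detection conditions (constant address delta, constant data hash, empty load queue, checked in commit order), the size-dependent distance, and the 1KB granularity come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

ASP_GRANULARITY = 1024  # prefetch granularity from the PR description (1KB)
# Hypothetical prefetch distances, in 1KB blocks, keyed by store size in bytes.
ASP_DISTANCE_BLOCKS = {1: 1, 2: 1, 4: 2, 8: 4}


@dataclass
class CommittedStore:
    addr: int       # byte address of the committed store
    size: int       # access size in bytes (1, 2, 4 or 8)
    data_hash: int  # hypothetical hash of the store data


class MemSetDetector:
    """Flags MemSet after `threshold` consecutive committed stores with the
    same address delta and data hash while the load queue is empty."""

    def __init__(self, threshold: int = 4):
        self.threshold = threshold
        self.prev: Optional[CommittedStore] = None
        self.prev_delta: Optional[int] = None
        self.confidence = 0

    def observe(self, st: CommittedStore, load_queue_empty: bool) -> bool:
        if self.prev is not None:
            delta = st.addr - self.prev.addr
            stable = delta == self.prev_delta and st.data_hash == self.prev.data_hash
            if stable and load_queue_empty:
                self.confidence = min(self.confidence + 1, self.threshold)
            else:
                self.confidence = 0  # any break in the pattern resets detection
            self.prev_delta = delta
        self.prev = st
        return self.confidence >= self.threshold


class AccurateStorePrefetcher:
    """Issues one prefetch per 1KB block, a size-dependent distance ahead.
    Trains only while MemSet is detected (ascending addresses assumed)."""

    def __init__(self):
        self.last_target: Optional[int] = None

    def train(self, st: CommittedStore, memset: bool) -> Optional[int]:
        if not memset:
            return None
        block = st.addr // ASP_GRANULARITY
        target = (block + ASP_DISTANCE_BLOCKS[st.size]) * ASP_GRANULARITY
        if target == self.last_target:
            return None  # this block has already been prefetched
        self.last_target = target
        return target
```

Feeding eight 8-byte stores of the same value starting at `0x1000` flags MemSet from the sixth store on, and ASP then issues a single prefetch for the 1KB block at `0x2000`. Resetting the counter on any break in the pattern (for example, a load in flight) keeps false positives low, which matters because ASP is only meant to run under MemSet.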

Previously, the Sbuffer was ready only when it had free entries. When no entry was free but the requests arriving from the store queue (SQ) could be merged into existing entries, the Sbuffer still refused them, so it could not sustain full throughput.
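The difference between the old and new `enq.ready` conditions can be illustrated with a toy Python model. It assumes one Sbuffer entry per 64-byte line and ignores eviction, in-flight entries and the other conditions a real merge must satisfy; `SbufferModel`, `enq_ready_old` and `enq_ready_new` are hypothetical names, not the design's signals.

```python
from dataclasses import dataclass
from typing import Dict

LINE_BYTES = 64  # assumed cacheline / Sbuffer entry size


@dataclass
class SbufferEntry:
    mask: int = 0  # one bit per byte already written into this line


class SbufferModel:
    """Toy Sbuffer with one entry per cacheline; ignores eviction and
    in-flight entries, which the real design must also handle."""

    def __init__(self, num_entries: int):
        self.num_entries = num_entries
        self.entries: Dict[int, SbufferEntry] = {}

    def can_merge(self, addr: int) -> bool:
        return addr // LINE_BYTES in self.entries

    def enq_ready_old(self) -> bool:
        # Before the fix: ready only when a free entry exists.
        return len(self.entries) < self.num_entries

    def enq_ready_new(self, addr: int) -> bool:
        # After the fix: also ready when the store can merge into an entry.
        return self.enq_ready_old() or self.can_merge(addr)

    def enq(self, addr: int, size: int) -> None:
        offset = addr % LINE_BYTES
        assert offset + size <= LINE_BYTES, "store must not cross a line"
        assert self.enq_ready_new(addr), "enqueue while not ready"
        entry = self.entries.setdefault(addr // LINE_BYTES, SbufferEntry())
        entry.mask |= ((1 << size) - 1) << offset
```

With two entries holding lines `0x00` and `0x40`, the old condition rejects a store to `0x08` even though it targets an existing line, while the new condition accepts it and merges it into the first entry. A store to an unrelated line such as `0x80` is still refused.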

If a MemSet is detected, each newly allocated Sbuffer entry waits 32 cycles before being written to the DCache. When a MemSet runs at full write bandwidth, at least two stores are written per cycle; even with byte stores (`sb`), that is 2 bytes per cycle, so filling a 64-byte cacheline takes 32 cycles. This improves Sbuffer utilization.
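The 32-cycle window follows from 64 bytes / 2 bytes per cycle = 32 cycles. A minimal Python sketch of the resulting write-back decision is below. It assumes that a line which fills before the window ends may leave early, and it does not model the non-MemSet policy; `can_write_back`, `acquire_kind` and `simulate_memset_fill` are hypothetical helpers. The AcquirePerm versus AcquireData choice reflects the earlier point that a fully written line needs only write permission on a miss.

```python
LINE_BYTES = 64
FULL_MASK = (1 << LINE_BYTES) - 1
# Worst case at full bandwidth: two byte stores (sb) per cycle = 2 bytes/cycle.
MEMSET_WAIT_CYCLES = LINE_BYTES // 2  # 64 / 2 = 32 cycles


def can_write_back(mask: int, age: int, memset: bool) -> bool:
    """Whether an Sbuffer entry may be sent to the DCache (hypothetical policy)."""
    if not memset:
        return True  # the non-MemSet policy is not modeled here
    # Leave early if the line is already full; otherwise wait out the window.
    return mask == FULL_MASK or age >= MEMSET_WAIT_CYCLES


def acquire_kind(mask: int) -> str:
    """On a miss, a fully written line needs only write permission."""
    return "AcquirePerm" if mask == FULL_MASK else "AcquireData"


def simulate_memset_fill(bytes_per_cycle: int):
    """Fill one line sequentially at a fixed rate until it may be written back."""
    mask, age, offset = 0, 0, 0
    while not can_write_back(mask, age, memset=True):
        for _ in range(bytes_per_cycle):
            if offset < LINE_BYTES:
                mask |= 1 << offset
                offset += 1
        age += 1
    return age, acquire_kind(mask)
```

At 2 bytes per cycle the line is exactly full when the 32-cycle window closes, so it goes out with AcquirePerm. At a slower rate the window expires on a partial line, which falls back to AcquireData; with wider stores (for example 16 bytes per cycle) the line is full after 4 cycles.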

only works in MemSet Pattern

happy-lx added 5 commits September 23, 2024 17:19

perf(sq): A more accurate MemSet detection

a7ac43f

perf(memblock): add ASP store prefetcher

f810f99

only works in MemSet Pattern

bump(cpl2): add store prefetch needT

eff67f3

happy-lx requested review from cz4e, linjuanZ, good-circle, cebarobot, Lemover and bosscharlie as code owners September 23, 2024 09:35

bosscharlie approved these changes Sep 23, 2024


fix(spb): fix uninitialized Reg

746cdae

linjuanZ added the "do not merge" label Sep 24, 2024

happy-lx added 8 commits September 26, 2024 17:59

fix(storepf): only trigger asp in Memset

0b5982c

bump(cpl2): bump to latest

e9e9668

Merge remote-tracking branch 'origin/master' into fix-sbuffer-memset

710c527

spf: refactor code

3043913

Merge remote-tracking branch 'origin/master' into fix-sbuffer-memset

9b68728

bump cpl2

356a575

fix(spf): consider vecValid when trainning spf

14cfe7b

fix(spf): fix seqKStride

01306eb
