Type of issue: other enhancement
Impact: no functional change
Development Phase: proposal
Other information
When a RoCC accelerator sends memory requests back-to-back, the two entries in the replay queue are not sufficient to handle a request every cycle.
rocket-chip/src/main/scala/rocket/SimpleHellaCacheIF.scala
Line 103 in dbcb06a
This can be fixed by increasing the depth of the replayq from 2 to 3.
If the current behavior is a bug, please provide the steps to reproduce the problem:
The problem and the proposed fix can be explored by adding an accelerator that simply loads the same address repeatedly, back-to-back. After the initial miss, the L1D should be able to service one load every cycle, since the data is already in the L1D. In the current state, the replayq causes back-pressure in one out of every three clock cycles. With the suggested fix applied, the accelerator can issue a memory request every cycle.
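The effect of the queue depth can be illustrated with a toy cycle-level model. This is only a sketch, not the rocket-chip RTL: the function `simulate`, its parameters, and in particular the assumption that each replayq entry stays occupied for 3 cycles (`hold_cycles=3`) are hypothetical, chosen because they reproduce the one-in-three stall pattern reported here.

```python
from collections import deque


def simulate(depth, hold_cycles=3, cycles=3000):
    """Return the fraction of cycles in which a new request is accepted.

    depth       -- number of entries in the (modelled) replay queue
    hold_cycles -- ASSUMPTION: cycles an entry stays occupied before it
                   is freed; not taken from the actual RTL
    cycles      -- number of cycles to simulate
    """
    release_times = deque()  # cycle at which each occupied entry frees up
    accepted = 0
    for cycle in range(cycles):
        # Free every entry whose hold time has elapsed.
        while release_times and release_times[0] <= cycle:
            release_times.popleft()
        # The accelerator tries to issue a request every cycle; it is
        # accepted only if a queue entry is available (else back-pressure).
        if len(release_times) < depth:
            release_times.append(cycle + hold_cycles)
            accepted += 1
    return accepted / cycles


if __name__ == "__main__":
    for depth in (2, 3):
        print(f"depth={depth}: accepted fraction = {simulate(depth):.3f}")
```

Under that assumption, a depth of 2 accepts two requests out of every three cycles, while a depth of 3 accepts one every cycle. The real behavior should still be confirmed in simulation with the test accelerator described above.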
What is the current behavior?
At the moment, the insufficient replayq depth causes back-pressure to the accelerator in one out of every three cycles.
What is the expected behavior?
Memory requests are handled every cycle, with no back-pressure caused by insufficient replayq depth.
Please tell us about your environment:
What is the use case for changing the behavior?
RoCC accelerators accessing the L1D might gain up to 50% in memory request throughput when sending requests back-to-back.
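The 50% figure follows from the stall pattern reported in this issue: with back-pressure in one out of every three cycles, two requests are accepted per three cycles, and removing the stall raises that to three. Writing $T_{\text{current}}$ and $T_{\text{fixed}}$ for requests accepted per cycle (notation introduced here for illustration):

$$
T_{\text{current}} = \frac{2}{3}, \qquad T_{\text{fixed}} = 1, \qquad \frac{T_{\text{fixed}}}{T_{\text{current}}} = \frac{1}{2/3} = 1.5
$$

This is an upper bound, reached only when the accelerator is limited by how fast it can issue L1D requests.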