Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HW Loop Constraints question #937

Closed
jstraus59 opened this issue Jan 17, 2024 · 5 comments
Closed

HW Loop Constraints question #937

jstraus59 opened this issue Jan 17, 2024 · 5 comments
Assignees
Labels
Component:Doc For issues in the Documentation (e.g. for README.md files) PARAM:PULP_XPULP Issue depends on the PULP_XPULP parameter Type:Enhancement For feature requests and enhancements

Comments

@jstraus59
Copy link

I have a question regarding the hardware loop constraints.

Suppose that the cv.count 0, %[N]; instruction at line 12 of the assembly code example in the documentation were to be deleted. Assuming that the lpcount0 register is initially 0, it would seem this would simply result in a single execution of the inner loop on the first execution of the outer loop, and then the expected number of inner loop executions on each subsequent outer loop iteration.

AFAICT this would not violate any of the HW Loop constraints - can you please confirm that this would not be a violation of any constraints, and should work as expected?

However, if the RTL takes special action for any instruction within the hardware loop body then it is possible there could be some unexpected behavior if, for example, an interrupt or exception occurs during the initial iteration of the inner loop, while lpcount0==0 (in normal loop execution the lpcount would never be expected to be 0.)

If this case IS a problem, then additional constraints are needed, perhaps a constraint that an instruction in a loop body may not be executed when the corresponding lpcount is 0?

One more potential problem I see along these lines is when using the non-atomic instructions to update the lpstart/lpend registers. Once one is written, it is then possible that the next instruction to be executed is within the hardware loop body, as defined by the intermediate values of the start and end. If the hardware has comparators to detect that an instruction being executed is between the start and end then whatever behavior is triggered as a result of that determination could be erroneously triggered.

Is this a potential problem?

@pascalgouedo
Copy link

Suppose that the cv.count 0, %[N]; instruction at line 12 of the assembly code example in the documentation were to be deleted. Assuming that the lpcount0 register is initially 0, it would seem this would simply result in a single execution of the inner loop on the first execution of the outer loop, and then the expected number of inner loop executions on each subsequent outer loop iteration.

Agree.

AFAICT this would not violate any of the HW Loop constraints - can you please confirm that this would not be a violation of any constraints, and should work as expected?

Agree. No constraint violation, just bad HWLoop setup.
In fact it is treated just as sequential code and not an hwloop body when loop count == 0.

However, if the RTL takes special action for any instruction within the hardware loop body then it is possible there could be some unexpected behavior if, for example, an interrupt or exception occurs during the initial iteration of the inner loop, while lpcount0==0 (in normal loop execution the lpcount would never be expected to be 0.)

RTL execution is correct even when an interrupt happens during this first and unique loop iteration when lpcount0 == 0 as it treats it as normal sequential code.

If this case IS a problem, then additional constraints are needed, perhaps a constraint that an instruction in a loop body may not be executed when the corresponding lpcount is 0?

This case does not cause any problem so no new constraint is needed here.

One more potential problem I see along these lines is when using the non-atomic instructions to update the lpstart/lpend registers. Once one is written, it is then possible that the next instruction to be executed is within the hardware loop body, as defined by the intermediate values of the start and end. If the hardware has comparators to detect that an instruction being executed is between the start and end then whatever behavior is triggered as a result of that determination could be erroneously triggered.

Is this a potential problem?

Yes this could be problem if an hwloop programming sequence like that would happen. But if loop count is programmed last and it was 0 before, those instructions in between start/stop programming are treated as normal/sequential instructions.
For sure if loop count is programmed first (and > 0) then those kind of incorrect behavior could happen.
That's why I added in the User manual this paragraph in PR #932.
I should maybe change the sentence by saying "strongly suggested" rather than "good practice".
Or "lpstartX and lpendX must be programmed before lpcountX to avoid unexpected behavior".

@pascalgouedo pascalgouedo added Type:Question For general questions PARAM:PULP_XPULP Issue depends on the PULP_XPULP parameter labels Jan 19, 2024
@jstraus59
Copy link
Author

jstraus59 commented Jan 19, 2024

@pascalgouedo Thanks for your response. I would definitely recommend the last: lpstartX and lpendX must be programmed before lpcountX to avoid unexpected behavior, but I don't see how that is different than adding an additional constrant.

FYI, after implementing the HW Loop constraint checking in the Imperas ISS, I have found that the assembly code example of HW loop code in earlier versions of the manual throw a constraint violation.

Specifically, in the initialization of the outer loop:

...
4      "cv.count  1, %[N];"
5      ".balign 4;"
6      "cv.endi   1, endO;"
7      "cv.starti 1, startO;"
...

when the cv.starti instruction at line 7 executes it is between lpstart0 (which is 0) and lpend0 (which is a few instructions after this one) and thus is considered part of HW Loop body, and thus when it executes while lpcount is non-zero this violates the constraint on only executing instructions in a HW Loop body by entering the loop from its start location.

As you have pointed out, this example has been fixed in later versions of the document.

Please advise if you do not think generating a constraint violation in this case is appropriate. Technically, the constraint specifically mentions no branch/jump to a location inside a HWLoop body, but this seems to still meet the criteria of an apparent loop body instruction executing without entering the loop body from the start location.

@jstraus59
Copy link
Author

@pascalgouedo One more point that has come up in my testing. In your previous response you stated "...But if loop count is programmed last and it was 0 before...".

But the example code in the V1.5.0 document leaves a non-zero value in lpcount0 when it ends, as the inner loop counter is set at the end of the outer loop:

...
    "    endZ:;"
    "    cv.count 0, %[N];"
    "    addi %[j], %[j], 2;"
    "endO:;"

So, if users follow this template, they cannot assume the lpcountX values are always 0 when not in a loop.

A simple constraint that lpstartX/lpendX may not be changed when lpcountX is non-zero would solve all these problems, and in fact could be enforced in hardware with an illegal instruction exception. This would also greatly simplify constraint checking, as it would only need to be done upon lpcountX transitioning from 0 to non-zero.

@pascalgouedo
Copy link

I would definitely recommend the last: lpstartX and lpendX must be programmed before lpcountX to avoid unexpected behavior, but I don't see how that is different than adding an additional constraint.

I changed the sentence and I prefer not to add another constraint as there are already too much of them in my opinion.

As you have pointed out, this example has been fixed in later versions of the document.

I changed start/stop order.

Please advise if you do not think generating a constraint violation in this case is appropriate. Technically, the constraint specifically mentions no branch/jump to a location inside a HWLoop body, but this seems to still meet the criteria of an apparent loop body instruction executing without entering the loop body from the start location.

I think that this case shouldn't generate a constraint violation and keeping no branch/jump to a location inside a HWLoop body constraint only for instructions really in the hwloop body.

But the example code in the V1.5.0 document leaves a non-zero value in lpcount0 when it ends, as the inner loop counter is set at the end of the outer loop:

You are right the example is wrong. I didn't figure it out when I changed it. I corrected it to have the correct behavior.

A simple constraint that lpstartX/lpendX may not be changed when lpcountX is non-zero would solve all these problems, and in fact could be enforced in hardware with an illegal instruction exception. This would also greatly simplify constraint checking, as it would only need to be done upon lpcountX transitioning from 0 to non-zero.

This constraint has been added. As stated in the paragraph after the constraints list none of them are enforced by HW.

@pascalgouedo pascalgouedo added Type:Enhancement For feature requests and enhancements Component:Doc For issues in the Documentation (e.g. for README.md files) and removed Type:Question For general questions labels Jan 22, 2024
pascalgouedo pushed a commit to pascalgouedo/cv32e40p that referenced this issue Jan 22, 2024
Signed-off-by: Pascal Gouedo <[email protected]>
@jstraus59
Copy link
Author

Looks good! Thanks for the quick response.

davideschiavone pushed a commit that referenced this issue Jan 30, 2024
Signed-off-by: Pascal Gouedo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:Doc For issues in the Documentation (e.g. for README.md files) PARAM:PULP_XPULP Issue depends on the PULP_XPULP parameter Type:Enhancement For feature requests and enhancements
Projects
None yet
Development

No branches or pull requests

3 participants