Fix wait_ns is unstable with flash performance #10632
Not every chip's flash runs at zero wait states. Chips with non-zero wait-state flash usually have a cache to improve performance. To avoid non-zero wait states and non-constant instruction cycle counts, locate the delay loop code in SRAM instead of in flash.
Unfortunately this is incompatible with memory protection; we try to mark the RAM non-executable if we can. Ah, you spotted that and added the lock. But that's a potentially significant extra call overhead. And who's to say the RAM is necessarily more stable than the ROM? This To actually do the job of getting exactly the number wanted, you'd be better placed to have something with solid compile-time knowledge of the CPU rate, along with any extra factors about wait states. Inspiration from #10609 might be relevant. I'd be inclined to maybe adjust it a little to leave the door open to alternative implementations - maybe we could make I'm curious - how much does this really help if there is a cache - the act of calling wait_ns, and its entry path are still in ROM, so this can only speed up the first iteration around the delay loop? If this does help (cacheless?) wouldn't you want all your code, including whatever was calling
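To make the "compile-time knowledge of the CPU rate" suggestion concrete, here is a minimal sketch of how a delay-loop iteration count could be derived from a known clock rate. It is not the code from this PR or from #10609; the function name `delay_loops_for_ns`, the 64 MHz clock, and the 3-cycles-per-iteration figure are all hypothetical values chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical values for illustration only; real targets differ. */
#define EXAMPLE_CPU_HZ          64000000u  /* assumed core clock rate */
#define EXAMPLE_CYCLES_PER_LOOP 3u         /* assumed cost of one loop iteration */

/* Return the number of delay-loop iterations needed to wait at least
 * `ns` nanoseconds. Both divisions round up, so the delay is never
 * shorter than requested. With constant arguments a compiler can
 * typically fold the whole computation at build time. */
static inline uint32_t delay_loops_for_ns(uint32_t ns,
                                          uint32_t cpu_hz,
                                          uint32_t cycles_per_loop)
{
    /* CPU cycles covering `ns`, rounded up; 64-bit to avoid overflow. */
    uint64_t cycles = ((uint64_t)ns * cpu_hz + 999999999u) / 1000000000u;

    /* Loop iterations covering those cycles, rounded up. */
    return (uint32_t)((cycles + cycles_per_loop - 1u) / cycles_per_loop);
}
```

Under these assumed numbers, `delay_loops_for_ns(1000, EXAMPLE_CPU_HZ, EXAMPLE_CYCLES_PER_LOOP)` covers 64 cycles and yields 22 iterations. Any extra factor for flash wait states would have to be folded into the cycles-per-iteration figure for the specific target.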
@ccli8, thank you for your changes.
Actually, my goal is to make Nuvoton targets like NUMAKER_PFM_M2351 pass
But that implies repeated misses, surely? Isn't it doing enough iterations that it should be mostly cache hits? Is the cache not working? (And why don't we see this in our CI? That test is definitely being run for many targets in CI - I had to fix it up for various targets previously.)
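The cache question above can be illustrated with a deliberately simple toy model. It assumes that with a working cache only the first loop iteration pays the flash wait-state penalty, while with no cache (or a disabled one) every iteration pays it. The function name `loop_cycles` and all parameters are hypothetical; real fetch behaviour depends on the core, prefetch logic, and flash controller.

```c
#include <stdint.h>

/* Toy model of cycles spent in a delay loop fetched from flash.
 * Assumption: each iteration costs `cycles_per_loop` cycles plus
 * `wait_cycles_per_loop` extra cycles whenever its instructions must
 * come from flash rather than from the cache. */
static uint64_t loop_cycles(uint32_t iterations,
                            uint32_t cycles_per_loop,
                            uint32_t wait_cycles_per_loop,
                            int cache_enabled)
{
    if (iterations == 0u) {
        return 0u;
    }

    /* With a cache, only the first pass misses; without one, all do. */
    uint32_t penalised = cache_enabled ? 1u : iterations;

    return (uint64_t)iterations * cycles_per_loop
         + (uint64_t)penalised * wait_cycles_per_loop;
}
```

With 1000 iterations at 3 cycles each and 2 wait cycles per fetch, this model gives 3002 cycles with the cache on and 5000 with it off, so in this simplified picture a large, consistent overshoot points to a cache that is not in effect rather than to occasional misses.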
This is a good hint. On further checking, the M2351's cache is actually force-disabled (for an internal reason). Its cache-on configuration is still to be determined. It seems make
Not clear. Re-test
@kjbracey-arm Do you have a plan to make
I've currently got a lot of 5.13 PRs needing to be worked on - you can feel free to make a PR doing just that.
Moved to #10683. Closing this one.
Description
Not every chip's flash runs at zero wait states. Chips with non-zero wait-state flash usually have a cache to improve performance. To avoid non-zero wait states and non-constant instruction cycle counts, this PR locates the delay loop code in SRAM instead of in flash.