Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wait_ns is unstable with flash performance #10632

Closed
wants to merge 1 commit into from

Conversation

ccli8
Copy link
Contributor

@ccli8 ccli8 commented May 22, 2019

Description

Not all chips' flash are non-zero wait state. These chips usually have cache to improve performance. To avoid non-zero wait state and non-constant instruction cycles, this PR locates delay loop code on SRAM instead of on flash.

Pull request type

[X] Fix
[ ] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Not all chips' flash are non-zero wait state. These chips usually have cache to
improve performance. To avoid non-zero wait state and non-constant instruction
cycles, locate delay loop code on SRAM instead of on flash.
@kjbracey
Copy link
Contributor

kjbracey commented May 22, 2019

Unfortunately this is incompatible with memory protection; we try to mark the RAM non-executable if we can. Ah, you spotted that and added the lock. But that's a potentially significant extra call overhead.

And who's to say the RAM is necessarily more stable than the ROM?

This wait_ns is more a "reasonable effort" guaranteed to not undershoot, rather a solid attempt to be stable and calibrated to minimise overshoot.

To actually do the job of getting exactly the number wanted, you'd be better placed to have something with solid compile-time knowledge of the CPU rate, along with any extra factors about wait states. Inspiration from #10609 might be relevant.

I'd be inclined to maybe adjust it a little to leave the door open to alternative implementations - maybe we could make wait_ns weak, so it can be overridden by someone prepared to do system tuning, and allow it to be a macro like #10609.

I'm curious - how much does this really help if there is a cache - the act of calling wait_ns, and its entry path are still in ROM, so this can only speed up the first iteration around the delay loop?

If this does help (cacheless?) wouldn't you want all your code, including whatever was calling wait_ns, in RAM to get whatever stability you're looking for?

@ciarmcom ciarmcom requested review from a team May 22, 2019 11:00
@ciarmcom
Copy link
Member

@ccli8, thank you for your changes.
@ARMmbed/mbed-os-core @ARMmbed/mbed-os-maintainers please review.

@ccli8
Copy link
Contributor Author

ccli8 commented May 22, 2019

Actually, my goal is to make Nuvoton targets like NUMAKER_PFM_M2351 pass mbed-os-tests-mbed_platform-wait_ns test. Per my test on NUMAKER_PFM_M2351, the error rate reaches 2.6x compared to tolerance 1.2x. On NUMAKER_PFM_M2351, flash is non-zero wait state and so cache is added to improve performance. But due to cache miss, instructions are not executed in expected cycles. So I locate delay loop code on SRAM which is zero wait state. If the solution cannot apply to all platforms, then as you suggested, make wait_ns weak is another solution.

@kjbracey
Copy link
Contributor

kjbracey commented May 22, 2019

But due to cache miss, instructions are not executed in expected cycles

But that implies repeated misses, surely? Isn't it doing enough iterations that it should be mostly cache hits? Is the cache not working?

(And why don't we see this in our CI? That test is definitely being run for many targets in CI - I had to fix it up for various previously).

@ccli8
Copy link
Contributor Author

ccli8 commented May 23, 2019

Is the cache not working?

This is a good hint. Further checking, actually, M2351's cache is force-disabled (for internal reason). Its cache-on configuration is to be determined. It seems make wait_ns weak to override can help in situations like this M2351 or non-zero wait state+no cache targets.

And why don't we see this in our CI

Not clear. Re-test mbed-os-tests-mbed_platform-wait_ns on master on M2351. It still fails with 2.6x error rate compared to 1.2x tolerance.

@ccli8
Copy link
Contributor Author

ccli8 commented May 28, 2019

@kjbracey-arm Do you have plan to make wait_ns weak for override?

@kjbracey
Copy link
Contributor

I've currently got a lot of 5.13 PRs needing to be worked on - you can feel free to make a PR doing just that.

@ccli8
Copy link
Contributor Author

ccli8 commented May 28, 2019

Change to #10683. Close this one.

@0xc0170 0xc0170 closed this May 31, 2019
@ccli8 ccli8 deleted the nuvoton_wait-ns_unstbl branch July 11, 2019 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants