wait_us optimization #10609
@kjbracey-arm, thank you for your changes.
@ARMmbed/team-st-mcd @ARMmbed/team-nxp please review
Hi
c23690d to 7ce3670
@jeromecoutant, this should have fixed the Our tests have shown that this does greatly improve performance on STM32L151, which we're particularly targeting, but apparently performance is still below Mbed OS 5.6.1 with this change - ~12µs overhead, when 5.6.1 was ~9µs. The generated code for Is there any reason in the STM target code that the timer would be slower now than in 5.6.1 on that device? (We're using 5.6.1 as a comparator, as that's the last time the Would running the timer faster than 1MHz improve read time?
Looks good, is there a point of doing something similar for LP ticker?
Potentially, yes. Less necessary for speed of lpticker itself, but would open the door to replacing the whole ticker_data/api core with something templatey - get as much stuff figured out at compile time as possible. Total code size of two template specialisations for lpticker and usticker might come out not too dissimilar to the generic code. (The choice to use numerator/denominator ratios here was influenced by C++11
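To make the numerator/denominator idea concrete, here is a minimal sketch of how a target could describe its tick period as a compile-time ratio, so that conversions fold to constants. The `EXAMPLE_` macro names and helper functions are hypothetical and are not taken from the PR; the 8 MHz figure is just an assumed example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target description: tick period = NUM/DEN microseconds.
 * NUM = 1, DEN = 8 describes an 8 MHz counter (assumed for illustration). */
#define EXAMPLE_TICKER_PERIOD_NUM 1u
#define EXAMPLE_TICKER_PERIOD_DEN 8u

/* Convert microseconds to ticks, rounding up so a wait is never short.
 * Because NUM and DEN are constants, the compiler can fold the arithmetic;
 * for a 1 MHz ticker (1/1) this reduces to returning 'us' unchanged. */
static inline uint32_t example_us_to_ticks(uint32_t us)
{
    uint64_t scaled = (uint64_t)us * EXAMPLE_TICKER_PERIOD_DEN;
    return (uint32_t)((scaled + EXAMPLE_TICKER_PERIOD_NUM - 1u) / EXAMPLE_TICKER_PERIOD_NUM);
}

/* Convert ticks back to whole microseconds, rounding down. */
static inline uint32_t example_ticks_to_us(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * EXAMPLE_TICKER_PERIOD_NUM) / EXAMPLE_TICKER_PERIOD_DEN);
}

/* Small self-check showing the expected conversions for the 8 MHz case. */
static void example_ratio_self_check(void)
{
    assert(example_us_to_ticks(10u) == 80u);  /* 10 us at 8 ticks per us */
    assert(example_ticks_to_us(80u) == 10u);
    assert(example_ticks_to_us(7u) == 0u);    /* less than one whole microsecond */
}
```

The same pair of constants could in principle be supplied once per ticker, which is the kind of compile-time specialisation the comment above is hinting at.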
The change looks good; however, it is confusing to have macros and functions with the same name. Wouldn't it be better to mark the default implementation of `us_ticker_read` as weak and override it as a `static inline` function in header files of platforms supporting this optimisation?
Somewhat agree, but I feel our hands are partially tied by binary compatibility. I bet there are wifi drivers containing us timing calls :/. I don't think we can afford for The scheme here is "conventional" in that it's what the standard C library officially endorses - any standard library function can be implemented as a macro, as long as there is a real function too that you can take the address of. Now, I've only just realised that there is precedent in the HAL for inlining - Personally I find that confusing as well - it's jarring to me that
That doesn't quite add up - a static function wouldn't override a weak external function, it would just hide it. There is no "default" implementation of Alternatively, if we did act like
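The "macro with a real function behind it" scheme mentioned in this reply can be sketched as follows. This is only an illustration of the C library convention, with hypothetical names (`example_ticker_read`, `example_timer_count`); it is not the code from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a memory-mapped timer count register (hypothetical). */
volatile uint32_t example_timer_count;

/* Public declaration: a real function with external linkage always exists,
 * so previously compiled binaries can still link against it. */
uint32_t example_ticker_read(void);

/* Fast path: a function-like macro that replaces ordinary calls. */
#define example_ticker_read() (example_timer_count)

/* The one real definition. Parenthesising the name stops the macro from
 * expanding, the same trick the C standard library relies on. */
uint32_t (example_ticker_read)(void)
{
    return example_timer_count;
}

static void example_macro_self_check(void)
{
    example_timer_count = 42u;

    uint32_t via_macro = example_ticker_read();   /* expands to the register read */
    uint32_t via_call = (example_ticker_read)();  /* calls the real function */
    uint32_t (*fp)(void) = example_ticker_read;   /* no '(' follows, so no expansion */

    assert(via_macro == 42u);
    assert(via_call == 42u);
    assert(fp() == 42u);
}
```

Code that takes the function's address, or that was built before the macro existed, keeps using the external symbol, which is the binary-compatibility point raised above.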
Thank you for that insightful answer @kjbracey-arm; it wasn't really clever to suggest an override of a global symbol with a static inline one 😬. I wonder if there's some compiler magic we can use to have a global function defined inline, as we want the definition to be available to all translation units.
Non regression executed with ['NUCLEO_F091RC', 'NUCLEO_F103RB', 'NUCLEO_F207ZG', 'NUCLEO_F303ZE', 'NUCLEO_F446RE', 'NUCLEO_F767ZI', 'NUCLEO_H743ZI', 'NUCLEO_L073RZ', 'NUCLEO_L152RE', 'NUCLEO_L476RG', 'NUCLEO_WB55RG']
C99 non-static inline kind of works like that, and would preserve binary compatibility. You can have an inline definition in your header file, and there is exactly one real external definition in one translation unit - all other translation units either use the inline definition or call/address that external definition. There is a minor potential source compatibility glitch that was bothering me, if people were declaring the function themselves, which is why I avoided it, but I've realised that the macro form would also fail in that same situation. I'll change to non-static inline, as it does correspond to
What was the source compatibility glitch that bothered you? Other than that I'm delighted if we use the inline function solution.
If there's an inline definition in the header, then this C file would produce an extra duplicate external definition, causing a link error:
But if it was a macro, then the above would also fail to compile as it would substitute the macro - that's why So extra user declarations break either way.
Ah, another stumbling block. K64F's Non-static inline functions can't call static functions. We have quite a lot of
Having hit that, I think I'm inclined to leave it as macros. Macros can be a bit surprising, but at least they're "C90" surprising - something pretty understandable. The
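As a rough illustration of the C99 non-static inline model discussed in this exchange, the sketch below puts the "header" part and the "one translation unit" part in a single file so it can be compiled on its own. All names are hypothetical, and it assumes a compiler using C99 (not GNU89) inline semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter with external linkage. A non-static inline function
 * may not refer to identifiers with internal linkage, so this is not static. */
uint32_t example_counter = 5u;

/* --- what the header would contain --- */
/* Inline definition: visible to every translation unit that includes it,
 * but on its own it does not emit an external symbol. */
inline uint32_t example_read(void)
{
    return example_counter;
}

/* --- what exactly one .c file would add --- */
/* Redeclaring with 'extern' makes this translation unit's definition the
 * single external definition that others can call or take the address of. */
extern inline uint32_t example_read(void);

/* The glitch: a user file that includes the header and also writes
 *     uint32_t example_read(void);
 * now has a declaration without 'inline', so that file would emit a second
 * external definition and the link would fail with a duplicate symbol. */

static void example_inline_self_check(void)
{
    uint32_t (*fp)(void) = example_read;  /* address of the external definition */
    assert(example_read() == 5u);
    assert(fp() == 5u);
}
```

The restriction on internal-linkage identifiers noted in the first comment of the sketch is the same rule that makes non-static inline functions unable to call static helpers.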
7ce3670 to 9fa77aa
I am seeing the below error when running the usticker test on K66F
Scan: us_ticker
Build successes:
Build failures:
[mbed] ERROR: "c:\python27\python.exe" returned error.
Test run: SUCCESS
Summary: 11 of 11 test jobs passed
Test run: SUCCESS
Summary: 11 of 11 test jobs passed
@kjbracey-arm Is this ready for 5.13.1?
I think it's ready, but I'd like someone to review that last commit. It wasn't exactly a localised fix to get IAR to compile. I generally feel this isn't a 5.13.1 thing overall anyway - it's an optimisation, not a fix; but if someone wants it for 5.13.1, that's fine.
It looks fine to me. It should be sufficient to use forceinline, or whatever similar attribute the toolchain provides, for those macros; there is no need for the additional static in them.
@adbridge Before we hit the merge button, I would like to see justification for including this PR in 5.13.1.
Got it, important to have this in 5.13.1, ready!
Hi. Since this PR, we get too many warnings...
[Warning] us_ticker_defines.h@17,9: 'MBED_US_TICKER_DEFINES_H' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]
Avoids build warning caused by ARMmbed#10609
Description
As the timer code became more generic, coping with initialization on demand and with us_ticker_api implementations of variable width and speed, `wait_us` has gradually gotten slower and slower.

Some platforms have reportedly seen the overhead of `wait_us()` increase from 10µs to 30µs. These changes should fully reverse that drop, and even make it better than ever.

Add fast paths for platforms that provide compile-time information about `us_ticker`. Speed and code size are improved further if:
- `us_ticker_read()` is provided as a macro
- the us_ticker is initialised at boot

The latter initialisation option is the default for STM, as this has always been the case.

PR adds support for the optimization for Freescale MCUXpresso family and all STM devices.
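As a rough sketch of what such a fast path can look like when the counter width is known at compile time, the example below busy-waits on a simulated 24-bit counter. The mask value, the fake counter and all `example_` names are assumptions for illustration only; the real implementation in the PR may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compile-time description of a 24-bit counter. */
#define EXAMPLE_TICKER_MASK 0x00FFFFFFu

/* Simulated hardware counter: advances one tick per read and wraps at
 * 24 bits. It starts close to the wrap point on purpose. */
static uint32_t example_fake_count = 0x00FFFFF0u;
static uint32_t example_read_count;

static uint32_t example_ticker_read(void)
{
    example_read_count++;
    example_fake_count = (example_fake_count + 1u) & EXAMPLE_TICKER_MASK;
    return example_fake_count;
}

/* Busy-wait for at least 'ticks' ticks ('ticks' must be below the mask).
 * Masking the difference keeps the comparison correct across counter
 * wrap-around, with no 64-bit maths and no run-time lookup of the width. */
static void example_wait_ticks(uint32_t ticks)
{
    const uint32_t start = example_ticker_read();
    while (((example_ticker_read() - start) & EXAMPLE_TICKER_MASK) < ticks) {
        /* spin */
    }
}

static void example_wait_self_check(void)
{
    example_wait_ticks(32u);                  /* crosses the 24-bit wrap */
    assert(example_read_count == 33u);        /* one start read plus 32 polls */
    assert(example_fake_count == 0x000011u);  /* 0xFFFFF1 + 32, wrapped */
}
```

If the read itself is a macro over a register, each poll can compile down to a load, a subtract, a mask and a compare, which is where the reduced per-call overhead would come from.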
Testing on K64F, a tight `{ led = !led; wait_us(10); }` loop does achieve 10µs transitions with this change. Without, it was 16µs.

Information provided by targets is only currently used to optimize `wait_us`, not any other use of us_ticker via `Ticker` et al, but the information could be used to optimize those in future.

Pull request type
Reviewers
@bulislaw
Release Notes
`wait_us` has been optimized for certain platforms - see `us_ticker_api.h` for the details of enabling optimizations on other targets. It can be optimized further if the option `target.init-us-ticker-at-boot` is enabled.