Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wait_us optimization #10609

Merged
merged 5 commits into from
Jun 28, 2019
Merged

wait_us optimization #10609

merged 5 commits into from
Jun 28, 2019

Conversation

kjbracey
Copy link
Contributor

Description

As the timer code became more generic, coping with initialization on demand, and variable width and speed us_ticker_api implementations, wait_us has gradually gotten slower and slower.

Some platforms have reportedly seen overhead of wait_us() increase from 10µs to 30µs. These changes should fully reverse that drop, and even make it better than ever.

Add fast paths for platforms that provide compile-time information about us_ticker. Speed and code size is improved further if:

  • Timer has >= 2^32 microsecond range, or better still is 32-bit 1MHz.
  • Platform implements us_ticker_read() as a macro
  • Timer is initialised at boot, rather than first use

The latter initialisation option is the default for STM, as this has always been the case.

PR adds support for the optimization for Freescale MCUXpresso family and all STM devices.

Testing on K64F, a tight { led = !led; wait_us(10); } loop does achieve 10µs transitions with this change. Without, it was 16µs.

Information provided by targets is only currently used to optimize wait_us, not any other use of us_ticker via Ticker et al, but the information could be used to optimize those in future.

Pull request type

[ ] Fix
[X] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@bulislaw

Release Notes

  • wait_us has been optimized for certain platforms - see us_ticker_api.h for the details of enabling optimizations on other targets. It can be optimized further if the option target.init-us-ticker-at-boot is enabled.

@ciarmcom
Copy link
Member

@kjbracey-arm, thank you for your changes.
@maclobdell @bulislaw @MarceloSalazar @ARMmbed/mbed-os-core @ARMmbed/mbed-os-test @ARMmbed/mbed-os-maintainers @ARMmbed/mbed-os-hal please review.

@MarceloSalazar
Copy link

@ARMmbed/team-st-mcd @ARMmbed/team-nxp please review

@MarceloSalazar MarceloSalazar requested review from a team May 17, 2019 15:41
@jeromecoutant
Copy link
Collaborator

Hi
After some tests, it seems that "Microsecond ticker init is safe to call repeatedly" test from tests-mbed_hal-common_tickers becomes FAIL for all STM32 families

@kjbracey
Copy link
Contributor Author

@jeromecoutant, this should have fixed the init() test issue.

Our tests have shown that this does greatly improve performance on STM32L151, which we're particularly targeting, but apparently performance is still below Mbed OS 5.6.1 with this change - ~12µs overhead, when 5.6.1 was ~9µs. The generated code for wait_us is optimal, so any delay is now down to the timer itself, or general system load.

Is there any reason in the STM target code that the timer would be slower now than in 5.6.1 on that device? (We're using 5.6.1 as a comparator, as that's the last time the wait_us code was small and tight with just a straight call to the target's us_ticker_read).

Would running the timer faster than 1MHz improve read time?

Copy link
Member

@bulislaw bulislaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, is there a point of doing something similar for LP ticker?

@kjbracey
Copy link
Contributor Author

Looks good, is there a point of doing something similar for LP ticker?

Potentially, yes. Less necessary for speed of lpticker itself, but would open the door to replacing the whole ticker_data/api core with something templatey - get as much stuff figured out at compile time as possible.

Total code size of two template specialisations for lpticker and usticker might come out not too dissimilar to the generic code.

(The choice to use numerator/denominator ratios here was influenced by C++11 std::ratio and std::duration - might want to use that as part of the framework).

Copy link
Member

@pan- pan- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good however it is confusing to have macros and functions with the same name. Wouldn't it be better to mark the default implementation of us_ticker_read as weak and override it as a static inline function in header files of platform supporting this optimisation ?

@kjbracey
Copy link
Contributor Author

kjbracey commented May 20, 2019

The change looks good however it is confusing to have macros and functions with the same name.

Somewhat agree, but I feel our hands are partially tied by binary compatibility. I bet there are wifi drivers containing us timing calls :/. I don't think we can afford for wait_us or us_ticker_read to not have a real definition, nor can we change the prototype, I think.

The scheme here is "conventional" in that it's what the standard C library officially endorses - any standard library function can be implemented as a macro, as long as there is a real function too that you can take the address of.

Now, I've only just realised that there is precedent in the HAL for inlining - gpio_read and gpio_write for STM. That has void gpio_write() in gpio_api.h, but STM's gpio_objects.h defines static inline void gpio_write(). (And gpio_api.h always includes device.h, so the static definition is always visible, and there is no external definition).

Personally I find that confusing as well - it's jarring to me that void gpio_write() is a valid declaration of a static function....

the default implementation of us_ticker_read as weak and override it as a static inline function in header files of platform supporting this optimisation

That doesn't quite add up - a static function wouldn't override a weak external function, it would just hide it.

There is no "default" implementation of us_ticker_read - it's the core HAL function. I guess defining the real function as "invoke the HAL macro" could be provided by the core, when there is a macro? But again, I feel this is safest for binary compatibility - keeping the HAL providing an external us_ticker_read always.

Alternatively, if we did act like gpio_write and have a static inline, we would find it hard to provide a real external us_ticker_read for backwards compatibility - we'd have to suppress the static definition in whichever file provided the external one. That's harder than suppressing the macro with the extra () around the function identifier.

@pan-
Copy link
Member

pan- commented May 20, 2019

Thank you for that insightful answer @kjbracey-arm; I it wasn't really clever to suggest an override of a global symbol with a static inline one 😬 . I wonder if there's some compiler magic we can use to have a global function defined inline as we want the definition to be available for all translation unit.

Copy link
Collaborator

@jeromecoutant jeromecoutant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non regression executed with ['NUCLEO_F091RC', 'NUCLEO_F103RB', 'NUCLEO_F207ZG', 'NUCLEO_F303ZE', 'NUCLEO_F446RE', 'NUCLEO_F767ZI', 'NUCLEO_H743ZI', 'NUCLEO_L073RZ', 'NUCLEO_L152RE', 'NUCLEO_L476RG', 'NUCLEO_WB55RG']

@kjbracey
Copy link
Contributor Author

I wonder if there's some compiler magic we can use to have a global function defined inline as we want the definition to be available for all translation unit.

C99 non-static inline kind of works like that, and would preserve binary compatibility. You can have an inline definition in your header file, and there is exactly one real external definition in one translation unit - all other translation units either use the inline definition or call/address that external definition.

There is a minor potential source compatibility glitch that was bothering me, if people were declaring the function themselves, which is why I avoided it, but I've realised that the macro form would also fail in that same situation.

I'll change to non-static inline, as it does correspond to gpio_write, and is a bit clearer, if you agree.

@pan-
Copy link
Member

pan- commented May 21, 2019

What was the source compatibility glitch that bothered you ? Other than that I'm delighted if we use the inline function solution.

@kjbracey
Copy link
Contributor Author

kjbracey commented May 21, 2019

What was the source compatibility glitch that bothered you ?

If there's an inline definition in the header, then this C file would produce an extra duplicate external definition, causing a link error:

#include "mbed.h"
extern uint32_t us_ticker_read();

(Not sure if the extern is needed to provoke that, or just lack of inline; standard isn't clear) Update: C99 rationale confirms lack of inline is sufficient.

But if it was a macro, then the above would also fail to compile as it would substitute the macro - that's why us_ticker_api.h had its declaration adjusted to uint32_t (us_ticker_read)().

So extra user declarations break either way.

@kjbracey
Copy link
Contributor Author

kjbracey commented May 21, 2019

Ah, another stumbling block. K64F's us_ticker_read would want to call its PIT_GetCurrentTimerCount, which is static inline.

Non-static inline functions can't call static functions.

We have quite a lot of static inline in various places, substituting for macros (including use for MBED_FORCEINLINE), but it's quite hard to mix the two types of inline. :(

@kjbracey
Copy link
Contributor Author

Having hit that, I think I'm inclined to leave it as macros. Macros can be a bit surprising, but at least they're "C90" surprising - something pretty understandable. The inline stuff is pretty intricate, as very few people are familiar with C99 inline semantics, as opposed to C++.

@mmahadevan108
Copy link
Contributor

I am seeing the below error when running the usticker test on K66F

Scan: us_ticker
Compile [100.0%]: main.cpp
[Error] main.cpp@39,89: '_ticker_info' was not declared in this scope
[Error] main.cpp@40,86: 'TEST_ASSERT_UINT32_EQUAL' was not declared in this scope
Memory map breakdown for built projects (values in Bytes):

name target toolchain static_ram total_flash

Build successes:

  • K66F::GCC_ARM::MBED-BUILD

Build failures:

  • K66F::GCC_ARM::MBED-OS-TESTS-MBED_HAL-US_TICKER
    Building project us_ticker (K66F, GCC_ARM)
    Scan: GCC_ARM
    Scan: us_ticker
    Compile [100.0%]: main.cpp
    [Error] main.cpp@39,89: '_ticker_info' was not declared in this scope
    [Error] main.cpp@40,86: 'TEST_ASSERT_UINT32_EQUAL' was not declared in this scope

[mbed] ERROR: "c:\python27\python.exe" returned error.
Code: 1
Path: "C:\mymbed"
Command: "c:\python27\python.exe -u C:\mymbed\mbed-os\tools\test.py -D MBED_TEST_MODE -t GCC_ARM -m K66F -c --sou
rce . --build .\BUILD\tests\K66F\GCC_ARM --test-spec .\BUILD\tests\K66F\GCC_ARM\test_spec.json --build-data .\BUILD\test
s\K66F\GCC_ARM\build_data.json -n mbed-os-tests-mbed_hal-us* --greentea"
Tip: You could retry the last command with "-v" flag for verbose output

@mbed-ci
Copy link

mbed-ci commented Jun 27, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 2
Build artifacts

@mbed-ci
Copy link

mbed-ci commented Jun 27, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 3
Build artifacts

@0xc0170
Copy link
Contributor

0xc0170 commented Jun 28, 2019

@kjbracey-arm Is this ready for 5.13.1 ?

@kjbracey
Copy link
Contributor Author

I think it's ready, but I'd like someone to review that last commit. It wasn't exactly a localised fix to get IAR to compile.

I generally feel this isn't a 5.13.1 thing overall anyway - it's an optimisation, not a fix; but if someone wants it for 5.13.1, that's fine.

@0xc0170
Copy link
Contributor

0xc0170 commented Jun 28, 2019

I think it's ready, but I'd like someone to review that last commit. It wasn't exactly a localised fix to get IAR to compile.

It looks fine to me, should be sufficient to use forceinline or similar what toolchain provide for those macros, no need to have static addition in them.

@0xc0170
Copy link
Contributor

0xc0170 commented Jun 28, 2019

@adbridge Before we hit merge button, would like to see justification for 5.13.1 for this PR.

@0xc0170
Copy link
Contributor

0xc0170 commented Jun 28, 2019

Got it, important to have this in 5.13.1, ready !

@jeromecoutant
Copy link
Collaborator

Hi

Since this PR, we got to many warnings...

[Warning] us_ticker_defines.h@17,9: 'MBED_US_TICKER_DEFINES_H' is used as a header guard here, followed by #define of a different macro [-Wheader-guard]

kjbracey added a commit to kjbracey/mbed-os that referenced this pull request Jul 1, 2019
artokin pushed a commit that referenced this pull request Jul 3, 2019
evedon pushed a commit that referenced this pull request Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants