parallel compilation (-jN) seems to produce broken code randomly #323

Closed
teetak01 opened this issue Aug 9, 2016 · 10 comments

Comments

@teetak01

teetak01 commented Aug 9, 2016

I have noticed that our test application randomly breaks in CI testing, and I have been investigating the reason. The size of the final binary varies between builds: some builds work fine, others fail with the error below:

RTX error code: 0x00000001, task ID: 0x20015210
[0255][DBG ][mClt]: M2MInterfaceImpl::~M2MInterfaceImpl() - IN
[0256][DBG ][mClt]: M2MNsdlInterface::~M2MNsdlInterface() - IN
[0257][DBG ][mClt]: M2MNsdlInterface::~M2MNsdlInterface() - OUT
[0258][DBG ][mClt]: M2MConnectionHandlerPimpl::stop_listening()
[0259][DBG ][mClt]: M2MConnectionHandlerPimpl::~M2MConnectionHandlerPimpl()
sys_mbox_post error

We then thought that this might be related to the -j flag. Compiling with -j1 reliably produces nearly identical binary sizes, but with any other value the sizes vary quite randomly.

For example, building with -j32

+-----------------------------+--------+-------+-------+
| Module                      |  .text | .data |  .bss |
+-----------------------------+--------+-------+-------+
| Fill                        |    541 |    14 |  2237 |
| Misc                        |  88062 | 11229 |  8195 |
| features/FEATURE_CLIENT     |  70573 |     3 |    57 |
| features/FEATURE_COMMON_PAL |  25298 |    89 | 10452 |
| features/frameworks         |   3823 |    52 |   784 |
| features/mbedtls            | 127373 |    51 |   119 |
| features/net                |  34692 |   102 | 51261 |
| hal/common                  |   2977 |    20 |   297 |
| hal/targets                 |  15237 |    12 |   200 |
| rtos/rtos                   |    205 |     4 |     0 |
| rtos/rtx                    |   7385 |    20 |  2686 |
| Subtotals                   | 376166 | 11596 | 76288 |
+-----------------------------+--------+-------+-------+
Allocated Heap: 65536 bytes
Allocated Stack: 32768 bytes
Total Static RAM memory (data + bss): 87884 bytes
Total RAM memory (data + bss + heap + stack): 186188 bytes
Total Flash memory (text + data + misc): 388802 bytes
Image: ./.build/K64F/GCC_ARM/mbed-client-testapp.bin

with -j12

+-----------------------------+--------+-------+-------+
| Module                      |  .text | .data |  .bss |
+-----------------------------+--------+-------+-------+
| Fill                        |    493 |    14 |  2241 |
| Misc                        |  88067 | 11229 |  8195 |
| features/FEATURE_CLIENT     |  70506 |     3 |    57 |
| features/FEATURE_COMMON_PAL |  25298 |    89 | 10452 |
| features/frameworks         |   3823 |    52 |   784 |
| features/mbedtls            | 127204 |    51 |   119 |
| features/net                |  34692 |   102 | 51261 |
| hal/common                  |   2977 |    20 |   297 |
| hal/targets                 |  15237 |    12 |   200 |
| rtos/rtos                   |    205 |     4 |     0 |
| rtos/rtx                    |   7385 |    20 |  2686 |
| Subtotals                   | 375887 | 11596 | 76292 |
+-----------------------------+--------+-------+-------+
Allocated Heap: 65540 bytes
Allocated Stack: 32768 bytes
Total Static RAM memory (data + bss): 87888 bytes
Total RAM memory (data + bss + heap + stack): 186196 bytes
Total Flash memory (text + data + misc): 388523 bytes
Image: ./.build/K64F/GCC_ARM/testapp2.bin

while with -j0

+-----------------------------+--------+-------+-------+
| Module                      |  .text | .data |  .bss |
+-----------------------------+--------+-------+-------+
| Fill                        |    469 |    14 |  2241 |
| Misc                        |  88067 | 11229 |  8195 |
| features/FEATURE_CLIENT     |  70506 |     3 |    57 |
| features/FEATURE_COMMON_PAL |  25298 |    89 | 10452 |
| features/frameworks         |   3823 |    52 |   784 |
| features/mbedtls            | 127204 |    51 |   119 |
| features/net                |  34684 |   102 | 51261 |
| hal/common                  |   2977 |    20 |   297 |
| hal/targets                 |  15237 |    12 |   200 |
| rtos/rtos                   |    205 |     4 |     0 |
| rtos/rtx                    |   7385 |    20 |  2686 |
| Subtotals                   | 375855 | 11596 | 76292 |
+-----------------------------+--------+-------+-------+
Allocated Heap: 65540 bytes
Allocated Stack: 32768 bytes
Total Static RAM memory (data + bss): 87888 bytes
Total RAM memory (data + bss + heap + stack): 186196 bytes
Total Flash memory (text + data + misc): 388491 bytes
Image: ./.build/K64F/GCC_ARM/testapp.bin

There is a difference of a few hundred bytes in the binary size between otherwise completely identical builds (except maybe for a timestamp).

Could there be an issue where the build order of the files results in broken linking with bad luck?

mbed-cli 0.9.1
arm-none-eabi-gcc (15:4.9.3+svn231177-1) 4.9.3 20150529 (prerelease)

mbed-client-testapp (784f1742d26e)
|- mbed-client-cli (d4844ef43abd)
|- mbed-os (0712b8adf6bb)
|  |- features/FEATURE_CLIENT/mbed-client (47a2eea08c0c)
|  |- features/FEATURE_CLIENT/mbed-client-classic (3f8574348306)
|  |- features/FEATURE_CLIENT/mbed-client-mbed-tls (68f93f7834e9)
|  `- features/FEATURE_COMMON_PAL/mbed-client-c (758fd1e11cb9)
@theotherjimmy
Contributor

Could there be an issue where the build order of the files results in broken linking with bad luck?

Looking at the toolchain implementation here (https://github.com/ARMmbed/mbed-os/blob/master/tools/toolchains/__init__.py), it seems that the only thing that might change between runs is the order of the object files returned from compile_sources. The order in which the object files are compiled matches the order in which they are passed to the linker. If this is changing on you, you should be able to observe it by passing -v to the tools to get the linker invocation.
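To illustrate how a parallel build could end up with a varying object order, here is a minimal sketch. It is not the mbed tools' actual code: `fake_compile`, `compile_in_completion_order` and `compile_in_stable_order` are hypothetical names, and the sleep only simulates compile time. It assumes results are collected as workers finish, which may or may not be what compile_sources does.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random
import time


def fake_compile(source):
    """Hypothetical stand-in for a compiler call: waits a bit, returns the object name."""
    time.sleep(random.uniform(0, 0.01))
    return source.rsplit(".", 1)[0] + ".o"


def compile_in_completion_order(sources, jobs):
    """Collect objects as workers finish, so the order may differ between runs."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(fake_compile, s) for s in sources]
        return [f.result() for f in as_completed(futures)]


def compile_in_stable_order(sources, jobs):
    """Same work, but the object list is sorted before it would reach the linker."""
    return sorted(compile_in_completion_order(sources, jobs))
```

With one job the completion order simply follows the input order; with several jobs only the sorted variant is guaranteed to return the same list every time.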

@theotherjimmy
Contributor

theotherjimmy commented Aug 9, 2016

Also, something to note is that the Fill is the only thing changing between your builds. woops, can't read.

@teetak01
Author

teetak01 commented Aug 10, 2016

There are differences in other components as well:

| features/FEATURE_CLIENT     |  70573 |     3 |    57 |
| features/FEATURE_CLIENT     |  70506 |     3 |    57 |

| features/net                |  34692 |   102 | 51261 |
| features/net                |  34684 |   102 | 51261 |

| features/mbedtls            | 127373 |    51 |   119 |
| features/mbedtls            | 127204 |    51 |   119 |

These are small changes, and I do not know whether they are significant, but they seem quite suspicious.
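Spotting such per-module differences by eye is error-prone, so a small script can help. The following is only a sketch: `parse_memap` and `diff_memap` are hypothetical helper names, and it assumes the size tables are available as plain text in the same `|`-delimited layout as pasted above.

```python
def parse_memap(table_text):
    """Map module name -> (.text, .data, .bss) from a '|'-delimited size table."""
    sizes = {}
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Keep only data rows: one name followed by three numeric columns.
        if len(cells) == 4 and all(c.isdigit() for c in cells[1:]):
            sizes[cells[0]] = tuple(int(c) for c in cells[1:])
    return sizes


def diff_memap(a_text, b_text):
    """Return {module: (sizes_in_a, sizes_in_b)} for every module that differs."""
    a, b = parse_memap(a_text), parse_memap(b_text)
    return {
        module: (a.get(module), b.get(module))
        for module in sorted(set(a) | set(b))
        if a.get(module) != b.get(module)
    }
```

Header and separator rows are skipped because they do not contain three numeric columns, so the whole table can be pasted in unchanged.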

Also, as a possible separate issue, the message sys_mbox_post error is printed by two identical functions in two different places in mbed-os:

https://github.com/ARMmbed/mbed-os/blob/860fdd282b0dc3631a6c46b39442d4ab5343e534/libraries/net/lwip/lwip-sys/arch/sys_arch.c#L119

https://github.com/ARMmbed/mbed-os/blob/db99e726e006ca3260769d6cd444f51eefec1708/features/net/FEATURE_IPV4/lwip-interface/lwip-sys/arch/lwip_sys_arch.c#L119

@jupe

jupe commented Aug 10, 2016

These issues might be related to this one:
ARMmbed/mbed-os-example-client#37
ARMmbed/mbed-os#2411 (comment)

@jupe

jupe commented Aug 10, 2016

I heard that the only way to avoid this issue is to pass -j1 to use a single core instead of the default, -j0.

@bridadan
Contributor

I don't think I've ever had a problem using -j0 (which is the default), but I've also never had that many cores to play with :)

@bogdanm
Contributor

bogdanm commented Aug 10, 2016

When in doubt, we can always sort the linker's object/library list alphabetically. Quite random, but also consistent.

@theotherjimmy
Contributor

theotherjimmy commented Aug 10, 2016

I actually suggested sorting the objects a month ago. You always want the libraries after the object files, though. Getting a link on that one. This is good enough: http://stackoverflow.com/questions/30397233/ld-not-finding-existing-library
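A rough sketch of that idea, sorting the objects while keeping the libraries after them, could look like this. `build_link_args` is a hypothetical helper, not a function from the mbed tools, and the `-l` prefix assumes a GCC-style linker command line.

```python
def build_link_args(objects, libraries):
    """Return linker inputs: sorted object files first, then libraries as -l flags."""
    # Sorting makes the object order independent of the order compilation finished in.
    ordered_objects = sorted(objects)
    # Library order is kept as given, since the order of static libraries can matter.
    library_flags = ["-l" + name for name in libraries]
    return ordered_objects + library_flags
```

For example, `build_link_args(["z.o", "a.o"], ["m", "c"])` gives `["a.o", "z.o", "-lm", "-lc"]` no matter how the object list was ordered on input.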

@theotherjimmy
Contributor

@bridadan I don't have issues with this, and I have all of the cores to play with.

@teetak01
Author

I think we can finally close this.
