-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Malloc heap info #2442
Malloc heap info #2442
Conversation
Don't merge this yet. I pushed it up early for discussion. |
/morph test |
return __real__malloc_r(r, size); | ||
|
||
__malloc_lock(r); | ||
uint32_t *data = (uint32_t*)__real__malloc_r(r, size + 4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps this functionaly should be behind some compiler flag because now this reserve 4 extra bytes every single malloc,, at least this should be documented because there is devices with very little amount of memory..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more docs in this code, why +4 (reading the code, size is stored there).
sizeof(size_t) should do better here
perhaps this functionaly should be behind some compiler flag because now this reserve 4 extra bytes every single malloc
config option?
cc: @kjbracey-arm @SeppoTakalo @yogpan01 |
@c1728p9: have you look up this: https://github.com/ARMmbed/nanostack-libservice/blob/master/source/nsdynmemLIB/nsdynmemLIB.c . it contains exactly same kind of functionality but little more sophisticated ;) |
Hi @jupe, thanks for the link, I didn't even think to use nanostacks allocator for reference. I'll look into adding the additional statistics to this PR, since they could be useful as well. Also, looking at the libservice, it looks like it's possible to use standard malloc rather than a custom allocator. Would it be problematic if this was turned on? This would free up a lot of ram potentially, especially for lower memory usage configurations such as 6lowpan without security. |
I think that custom allocator was developed because standard malloc was too heavy (ticks&size) for smallest microcontrollers. @SeppoTakalo and @kjbracey-arm probably know more about this background. Anyway, I think 6lwp uses standard malloc for mbed-os builds and that why we doesn't have those statistics available for now (until this PR..) . So these statictics would be very wellcome feature for us from testing point of view.. 👍 |
There's no problem redirecting ns_dyn_mem_alloc to system malloc (or something else), as long as:
|
Result: FAILUREYour command has finished executing! Here's what you wrote!
Outputmbed Build Number: 641 Test failed! |
/morph test |
Result: FAILUREYour command has finished executing! Here's what you wrote!
Outputmbed Build Number: 648 Test failed! |
If I'm reading this PR right, it tries to get the maximum dynamic memory consumption? That'll probably work, but I'd rather take the approach in ARMmbed/ualloc#22: log each call to malloc/calloc/realloc/free and send that to a PC program that can interpret the logs. This can be used to find the maximum memory used, plus a whole lot more: a memory map, the most requested allocation sizes, checks for memory leaks, even memory allocations/module if the allocation data can be associated with a stack trace. See also http://man7.org/linux/man-pages/man3/mallinfo.3.html (dlmalloc has an implementation of |
@bogdanm: I like also that approach what ualloc does but maybe with little more abstraction so that application can be decide where to dump this information and in what format. This would be also more memory friendly. Only side effect is that it require host side script which analyse that data but this is for testing purpose anyway so maybe it's accepted.. |
mallinfo is present and functional in GCC builds - I've used it. |
what about other builds ? |
Don't think there's an equivalent for the ARM C library's allocator. |
I had a look previously at this. IAR - https://www.iar.com/support/tech-notes/general/iar-dlib-library-heap-usage-statistics/ (
Good point. This PR is adding runtime stats, ualloc had just logging (we can add the logging as a separate step), or? |
@kjbracey-arm, I took a look at mallinfo, and although it is present, it doesn't give some of the necessary metrics, in particular, maximum heap usage. Accoring to the man pages for mallinfo this isn't present in some environments.
Furthermore, when I test it, the value usmblks always reads 0. @bogdanm and @jupe, the approach taken in ualloc is good for certain tests, but in more complicated system could become troublesome. It uses printf to report data, but if used from multiple threads messages could interleave. Also, printf will introduce a delay, which could interfere with the tests, especially since tests are running at 9600 baud. |
That's something we can fix though.
The serial port is not the only possible way to send this data. Also, we can override the console speed of the tests, or run the memory-specific tests at a different console speed. Again, I'm not against this PR, I just think the information it provides, while interesting, is not going to end up being very useful. When someone finds out that his program takes 100k of RAM, the next question is going to be "where is all that coming from?" |
/morph test |
Result: FAILUREYour command has finished executing! Here's what you wrote!
Outputmbed Build Number: 652 Test failed! |
free(ptr); | ||
} | ||
|
||
return new_ptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might be missing one case: according to realloc
man page, if ptr
is not NULL and size
is 0, the call to realloc
is equivalent to free(ptr)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, when ptr
is NULL, realloc
is equivalent with malloc(size)
. I'm not sure you implement this either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"according to realloc man page, if ptr is not NULL and size is 0, the call to realloc is equivalent to free(ptr)." - This is implementation defined, and not required to be null. From the man pages here:
"If size was equal to 0, either NULL or a pointer suitable to be passed to free() is returned."
"Also, when ptr is NULL, realloc is equivalent with malloc(size). I'm not sure you implement this either." - This is both implemented and tested with a test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If size was equal to 0, either NULL or a pointer suitable to be passed to free() is returned."
You're talking about the return value, I'm talking about this part: if size is equal to zero, and ptr is not NULL, then the call is equivalent to free(ptr).
This is both implemented and tested with a test case.
According to your code, when ptr
is NULL, you skip over the first if
(line 548), allocate a new buffer (line 556), and then proceed to copy data into this new buffer from the NULL pointer (line 563), then you execute free(NULL)
on line 564. It might appear that it works, but I don't think that's the intended behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean. This behavior is still implementation defined since the standard allows a non-null pointer to be returned:
http://en.cppreference.com/w/c/memory/realloc
If new_size is zero, the behavior is implementation defined (null pointer may be returned (in which case the old memory block is not freed), or some non-null pointer may be returned that may not be used to access storage).
http://www.cplusplus.com/reference/cstdlib/realloc/
For C99/C++11 If size is zero, the return value depends on the particular library implementation: it may either be a null pointer or some other location that shall not be dereferenced.
I would prefer to be on the stricter side here, so if there could be a memory leak on some implementations, that shows up in the testing.
The memcpy should have a size of zero (so nothing is copied) and free is called on a null pointer, which is defined behavior which does nothing. If you think it would make it clearer, I could do an explicit check for null here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whichever you'd like. The free(NULL)
call will look strange if the memory tracer is also enabled; also, I believe that when something is listed as "implementation defined", man realloc
really helps you to figure out which implementation to choose. But in the end one cannot argue with "implementation defined" 😄
I agree. What about solution which are kind of mix of approach from this PR, ualloc and nsdynmemlib. --> If Application set stats callback pointer for mbed-os then os start calling that function every time when calling memory allocation functions (free/malloc/realloc/..). Then it's application responsibility to do what ever it want to do with it...e.g. it could print directly to serial port, or collect just summary. ... This kind of approach would be also thread safe from mbed-os point of view and by default it doesn't take more memory either ticks... Actually it allows also to create mbed-client resource which represent device RAM usage for cloud ;).. How that sounds like ? This is just a wild idea .. :) |
Agreed. Both need to exist. #2487 adds traces after the memory operation executes, while this PR needs to modify the way the memory operation works, in order to record block size information. So they need to exist together and we need to be able to turn them on or off individually. |
I do wonder just how much use these stats are in reality? They're only going to be taken at the end of a test (i.e. end of the binary execution) and nothing more will run after that point so all we really see is how much malloc and heap space is left at the end of execution. |
@adbridge: it depend on what framework you are using.. e.g. it is possible to "poll" memory consumption while testing, but might not be so useful/reality with greentea.. |
LGTM 👍 |
#define ALLOCATION_SIZE_DEFAULT 564 | ||
#define ALLOCATION_SIZE_SMALL 124 | ||
#define ALLOCATION_SIZE_LARGE 790 | ||
#define ALLOCATIONI_SIZE_FAIL (1024 * 1024 *1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
Keep track of the current size allocated, maximum size allocated, number of allocations, failed allocations and total size allocated for both GCC and ARM. Report the maximum size allocated at the end of testing. Also, add a test to verify heap metrics are working as expected.
Remove the handle swap in cfstore_test_delete_all. This prevents a deleted handle from being used.
In the function raise_failure move the test_failure and case_failure calls out of the critical section. This allows these handlers to run without interrupts disabled and enables them to use rtos features such as a mutex. This is required for heap metrics to work.
I don't see the most usefull information here, which is the maximum size of the heap in use. |
"Largest contiguous block free" is properly the stat you want to catch fragmentation. |
Hi @SeppoTakalo, the worst case heap is recorded and can be seen in the max_size field of the mbed_stats_heap_t structure. @kjbracey-arm, unfortunately this metric is tricky to get. With @bogdanm's memory logging and some host processing it should be possible to determine this. |
Hi @c1728p9, the Basically, if you allocate one object size of 1000B, this says max_alloc is 1kB. But if you alloc 1000 objects, each of size 1, this will still tell you that max_alloc is 1kB. But in reality, between every object there is data structure used by malloc and padding of course. Therefore the max_alloc is actually misleading value. It gives too optimistic results of memory usage and does not count in the fragmentation of the heap. We need something that can be used for developers to REALLY know how big the heap has been. Not just sum of all the sizes of allocated objects, but the total cost of allocation, including internal structures, padding and fragmentation. |
Hi @SeppoTakalo, I see what you mean. This can vary by the heap implementation and order of allocation, which is not always consistent. The current mechanism provides an exact count of the bytes allocated by the application. This makes it easier to tell when a change increases or decreases heap usage. Also, these statistics are straight forward to get and match those which nanostack supports - https://github.com/ARMmbed/nanostack-libservice/blob/master/source/nsdynmemLIB/nsdynmemLIB.c. If more precise info is needed, @bogdanm's implementation can be used with the MBED_MEM_TRACING_ENABLED flag. Finally, if you know a straight forward way to get the high water mark for GCC and ARMCC feel free to leave a comment here (or open a PR 😄). This would definitely add value to the stats. I don't know of a straight forward way to get this info so it isn't in this PR. |
Release mbed-os-5.1.4 Changes: New Targets: 2504: [Disco_F769NI] adding new target [#2504] 2654: DELTA_DFBM_NQ620 platform porting [#2654] 2615: [MTM_MTCONNECT04S] Added support for MTM_MTCONNECT04S [#2615] 2548: Nucleof303ze [#2548] Fixes: 2678: Fixing NCS36510 compile on Linux #2678 2657: [MAX326xx] Removed echoing of characters and carriage return. #2657 2651: Use lp_timer to count time in the deepsleep tests #2651 2645: NUCLEO_F446ZE - Enable mbed5 release version #2645 2643: Fix thread self termination #2643 2634: Updated USBHost for library changes #2634 2633: Updated USBDevice to use Callback #2633 2630: Test names not dependent on disk location of root #2630 2624: CFSTORE Bugfix for realloc() moving KV area and cfstore_file_t data structures not being updated correctly #2624 2623: DISCO_L476VG - Add Serial Flow Control pins + add SERIAL_FC macro #2623 2617: STM32F2xx - Enable Serial Flow Control #2617 2613: Correctly providing directories to build_apis #2613 2607: Fix uvisor memory tracing #2607 2604: Tools - Fix fill section size variation #2604 2601: Adding ON Semiconductor copyright notice to source and header files. #2601 2597: [HAL] Fixed "intrinsic is deprecated" warnings #2597 2596: [HAL] Improve memory tracer #2596 2594: Fix TCPServer constructor #2594 2593: Add app config command line switch for test and make #2593 2589: [NUC472] Fix heap configuration error with armcc #2589 2588: Timing tests drift refactor #2588 2587: add PTEx pins as option for SPI on Hexiwear - for SD Card Interface #2587 2584: Set size of callback irq array to IrqCnt #2584 2583: github issue and PR templates #2583 2582: [GCC_CR] fix runtime hang for baremetal build #2582 2580: lwip - Add check for previously-bound socket #2580 2579: lwip - Fix handling of max sockets in socket_accept #2579 2578: Fix double free in NanostackInterface #2578 2576: Add smoke test that builds example programs with mbed-cli #2576 2575: tools-config! - Allow an empty or mal-formed config to be passed to the config system #2575 2562: Fix GCC lazy init race condition and add test #2562 2559: [utest]: Allow the linker to remove any part of utest if not used #2559 2545: Added define guards for SEQUENTIAL_FLASH_JOURNAL_MAX_LOGGED_BLOBS so #2545 2538: STM32F4xx - Add support of ADC internal channels (Temp, VRef, VBat) #2538 2521: [NUCLEO_F207ZG] Add MBED5 capability #2521 2514: Updated FlexCan and SAI SDK drivers #2514 2487: Runtime dynamic memory tracing #2487 2442: Malloc heap info #2442 2419: [STM32F1] Add asynchronous serial #2419 2393: [tools] Prevent trace-backs from incomplete args #2393 2245: Refactor export subsystem #2245 2130: stm32 : reduce number of device.h files #2130
Changes: New Targets: #2504: [Disco_F769NI] adding new target [ARMmbed/mbed-os#2504] #2654: DELTA_DFBM_NQ620 platform porting [ARMmbed/mbed-os#2654] #2615: [MTM_MTCONNECT04S] Added support for MTM_MTCONNECT04S [ARMmbed/mbed-os#2615] #2548: Nucleof303ze [ARMmbed/mbed-os#2548] Fixes: #2657: [MAX326xx] Removed echoing of characters and carriage return. [ARMmbed/mbed-os#2657] #2651: Use lp_timer to count time in the deepsleep tests [ARMmbed/mbed-os#2651] #2643: Fix thread self termination [ARMmbed/mbed-os#2643] #2623: DISCO_L476VG - Add Serial Flow Control pins + add SERIAL_FC macro [ARMmbed/mbed-os#2623] #2617: STM32F2xx - Enable Serial Flow Control [ARMmbed/mbed-os#2617] #2601: Adding ON Semiconductor copyright notice to source and header files. [ARMmbed/mbed-os#2601] #2597: [HAL] Fixed "intrinsic is deprecated" warnings [ARMmbed/mbed-os#2597] #2589: [NUC472] Fix heap configuration error with armcc [ARMmbed/mbed-os#2589] #2587: add PTEx pins as option for SPI on Hexiwear - for SD Card Interface [ARMmbed/mbed-os#2587] #2584: Set size of callback irq array to IrqCnt [ARMmbed/mbed-os#2584] #2582: [GCC_CR] fix runtime hang for baremetal build [ARMmbed/mbed-os#2582] #2562: Fix GCC lazy init race condition and add test [ARMmbed/mbed-os#2562] #2538: STM32F4xx - Add support of ADC internal channels (Temp, VRef, VBat) [ARMmbed/mbed-os#2538] #2514: Updated FlexCan and SAI SDK drivers [ARMmbed/mbed-os#2514] #2442: Malloc heap info [ARMmbed/mbed-os#2442] #2419: [STM32F1] Add asynchronous serial [ARMmbed/mbed-os#2419] #2130: stm32 : reduce number of device.h files [ARMmbed/mbed-os#2130] #2678: Fixing NCS36510 compile on Linux [ARMmbed/mbed-os#2678] #2607: Fix uvisor memory tracing [ARMmbed/mbed-os#2607] #2596: [HAL] Improve memory tracer [ARMmbed/mbed-os#2596] #2487: Runtime dynamic memory tracing [ARMmbed/mbed-os#2487]
Keep track of the current and maximum space used by malloc and report
it at the end of a test.