Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Malloc heap info #2442

Merged
merged 3 commits into from
Aug 23, 2016
Merged

Malloc heap info #2442

merged 3 commits into from
Aug 23, 2016

Conversation

c1728p9
Copy link
Contributor

@c1728p9 c1728p9 commented Aug 13, 2016

Keep track of the current and maximum space used by malloc and report
it at the end of a test.

@c1728p9
Copy link
Contributor Author

c1728p9 commented Aug 13, 2016

Don't merge this yet. I pushed it up early for discussion.

@c1728p9
Copy link
Contributor Author

c1728p9 commented Aug 13, 2016

/morph test

return __real__malloc_r(r, size);

__malloc_lock(r);
uint32_t *data = (uint32_t*)__real__malloc_r(r, size + 4);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps this functionaly should be behind some compiler flag because now this reserve 4 extra bytes every single malloc,, at least this should be documented because there is devices with very little amount of memory..

Copy link
Contributor

@0xc0170 0xc0170 Aug 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more docs in this code, why +4 (reading the code, size is stored there).

sizeof(size_t) should do better here

perhaps this functionaly should be behind some compiler flag because now this reserve 4 extra bytes every single malloc

config option?

@jupe
Copy link
Contributor

jupe commented Aug 14, 2016

cc: @kjbracey-arm @SeppoTakalo @yogpan01

@jupe
Copy link
Contributor

jupe commented Aug 14, 2016

@c1728p9: have you look up this: https://github.com/ARMmbed/nanostack-libservice/blob/master/source/nsdynmemLIB/nsdynmemLIB.c . it contains exactly same kind of functionality but little more sophisticated ;)

@c1728p9
Copy link
Contributor Author

c1728p9 commented Aug 14, 2016

Hi @jupe, thanks for the link, I didn't even think to use nanostacks allocator for reference. I'll look into adding the additional statistics to this PR, since they could be useful as well.

Also, looking at the libservice, it looks like it's possible to use standard malloc rather than a custom allocator. Would it be problematic if this was turned on? This would free up a lot of ram potentially, especially for lower memory usage configurations such as 6lowpan without security.

@jupe
Copy link
Contributor

jupe commented Aug 14, 2016

I think that custom allocator was developed because standard malloc was too heavy (ticks&size) for smallest microcontrollers. @SeppoTakalo and @kjbracey-arm probably know more about this background. Anyway, I think 6lwp uses standard malloc for mbed-os builds and that why we doesn't have those statistics available for now (until this PR..) . So these statictics would be very wellcome feature for us from testing point of view.. 👍

@kjbracey
Copy link
Contributor

There's no problem redirecting ns_dyn_mem_alloc to system malloc (or something else), as long as:

  1. You don't mind sharing your memory pool. In some set-ups, particularly routers, network memory usage may only be limited by the size of the memory pool, so conceivably it could exhaust your main pool. Not sure how much this is really a problem in practice.

  2. You know the allocator works from Nanostack background context. In mbed OS 5's case, that context is a high priority thread, so that shouldn't be an issue. In other implementations, that context may be interrupt, which is generally not supported by a system allocator.

@mbed-bot
Copy link

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 641

Test failed!

@c1728p9
Copy link
Contributor Author

c1728p9 commented Aug 16, 2016

/morph test

@mbed-bot
Copy link

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 648

Test failed!

@bogdanm
Copy link
Contributor

bogdanm commented Aug 16, 2016

If I'm reading this PR right, it tries to get the maximum dynamic memory consumption? That'll probably work, but I'd rather take the approach in ARMmbed/ualloc#22: log each call to malloc/calloc/realloc/free and send that to a PC program that can interpret the logs. This can be used to find the maximum memory used, plus a whole lot more: a memory map, the most requested allocation sizes, checks for memory leaks, even memory allocations/module if the allocation data can be associated with a stack trace.

See also http://man7.org/linux/man-pages/man3/mallinfo.3.html (dlmalloc has an implementation of mallinfo, but I don't know if it's compiled in GCC ARM embedded's Newlib).

@jupe
Copy link
Contributor

jupe commented Aug 16, 2016

@bogdanm: I like also that approach what ualloc does but maybe with little more abstraction so that application can be decide where to dump this information and in what format. This would be also more memory friendly. Only side effect is that it require host side script which analyse that data but this is for testing purpose anyway so maybe it's accepted..

@kjbracey
Copy link
Contributor

mallinfo is present and functional in GCC builds - I've used it.

@jupe
Copy link
Contributor

jupe commented Aug 16, 2016

mallinfo is present and functional in GCC builds - I've used it.

what about other builds ?

@kjbracey
Copy link
Contributor

Don't think there's an equivalent for the ARM C library's allocator.

@0xc0170
Copy link
Contributor

0xc0170 commented Aug 16, 2016

Don't think there's an equivalent for the ARM C library's allocator.

I had a look previously at this.

IAR - https://www.iar.com/support/tech-notes/general/iar-dlib-library-heap-usage-statistics/ (__iar_dlmallinfo())
ARMCC - http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.kui0099a/armlib_cihhgjad.htm (heapstats()). This heapstats() is not like dlmallinfo for IAR or GCC, not even sure its defined for ARMCC microlib 😭 this is quite sad, because if it was, we would have cross-toolchain mallinfo without much effort :-)

That'll probably work, but I'd rather take the approach in ARMmbed/ualloc#22

Good point. This PR is adding runtime stats, ualloc had just logging (we can add the logging as a separate step), or?

@c1728p9
Copy link
Contributor Author

c1728p9 commented Aug 16, 2016

@kjbracey-arm, I took a look at mallinfo, and although it is present, it doesn't give some of the necessary metrics, in particular, maximum heap usage. Accoring to the man pages for mallinfo this isn't present in some environments.

usmblks

The "highwater mark" for allocated space-that is, the maximum amount of space that was ever allocated. This field is maintained only in nonthreading environments.

Furthermore, when I test it, the value usmblks always reads 0.

@bogdanm and @jupe, the approach taken in ualloc is good for certain tests, but in more complicated system could become troublesome. It uses printf to report data, but if used from multiple threads messages could interleave. Also, printf will introduce a delay, which could interfere with the tests, especially since tests are running at 9600 baud.

@bogdanm
Copy link
Contributor

bogdanm commented Aug 16, 2016

It uses printf to report data, but if used from multiple threads messages could interleave.

That's something we can fix though.

Also, printf will introduce a delay, which could interfere with the tests, especially since tests are running at 9600 baud.

The serial port is not the only possible way to send this data. Also, we can override the console speed of the tests, or run the memory-specific tests at a different console speed.

Again, I'm not against this PR, I just think the information it provides, while interesting, is not going to end up being very useful. When someone finds out that his program takes 100k of RAM, the next question is going to be "where is all that coming from?"

@c1728p9
Copy link
Contributor Author

c1728p9 commented Aug 16, 2016

/morph test

@mbed-bot
Copy link

Result: FAILURE

Your command has finished executing! Here's what you wrote!

/morph test

Output

mbed Build Number: 652

Test failed!

free(ptr);
}

return new_ptr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might be missing one case: according to realloc man page, if ptr is not NULL and size is 0, the call to realloc is equivalent to free(ptr).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, when ptr is NULL, realloc is equivalent with malloc(size). I'm not sure you implement this either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"according to realloc man page, if ptr is not NULL and size is 0, the call to realloc is equivalent to free(ptr)." - This is implementation defined, and not required to be null. From the man pages here:
"If size was equal to 0, either NULL or a pointer suitable to be passed to free() is returned."

"Also, when ptr is NULL, realloc is equivalent with malloc(size). I'm not sure you implement this either." - This is both implemented and tested with a test case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If size was equal to 0, either NULL or a pointer suitable to be passed to free() is returned."

You're talking about the return value, I'm talking about this part: if size is equal to zero, and ptr is not NULL, then the call is equivalent to free(ptr).

This is both implemented and tested with a test case.

According to your code, when ptr is NULL, you skip over the first if (line 548), allocate a new buffer (line 556), and then proceed to copy data into this new buffer from the NULL pointer (line 563), then you execute free(NULL) on line 564. It might appear that it works, but I don't think that's the intended behaviour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you mean. This behavior is still implementation defined since the standard allows a non-null pointer to be returned:
http://en.cppreference.com/w/c/memory/realloc
If new_size is zero, the behavior is implementation defined (null pointer may be returned (in which case the old memory block is not freed), or some non-null pointer may be returned that may not be used to access storage).

http://www.cplusplus.com/reference/cstdlib/realloc/
For C99/C++11 If size is zero, the return value depends on the particular library implementation: it may either be a null pointer or some other location that shall not be dereferenced.

I would prefer to be on the stricter side here, so if there could be a memory leak on some implementations, that shows up in the testing.

The memcpy should have a size of zero (so nothing is copied) and free is called on a null pointer, which is defined behavior which does nothing. If you think it would make it clearer, I could do an explicit check for null here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whichever you'd like. The free(NULL) call will look strange if the memory tracer is also enabled; also, I believe that when something is listed as "implementation defined", man realloc really helps you to figure out which implementation to choose. But in the end one cannot argue with "implementation defined" 😄

@jupe
Copy link
Contributor

jupe commented Aug 17, 2016

When someone finds out that his program takes 100k of RAM, the next question is going to be "where is all that coming from?"

I agree. What about solution which are kind of mix of approach from this PR, ualloc and nsdynmemlib. --> If Application set stats callback pointer for mbed-os then os start calling that function every time when calling memory allocation functions (free/malloc/realloc/..). Then it's application responsibility to do what ever it want to do with it...e.g. it could print directly to serial port, or collect just summary. ... This kind of approach would be also thread safe from mbed-os point of view and by default it doesn't take more memory either ticks... Actually it allows also to create mbed-client resource which represent device RAM usage for cloud ;).. How that sounds like ? This is just a wild idea .. :)

@sg-
Copy link
Contributor

sg- commented Aug 17, 2016

#2475

@bogdanm
Copy link
Contributor

bogdanm commented Aug 22, 2016

@bogdanm, what are your thoughts.

Agreed. Both need to exist. #2487 adds traces after the memory operation executes, while this PR needs to modify the way the memory operation works, in order to record block size information. So they need to exist together and we need to be able to turn them on or off individually.

@adbridge
Copy link
Contributor

I do wonder just how much use these stats are in reality? They're only going to be taken at the end of a test (i.e. end of the binary execution) and nothing more will run after that point so all we really see is how much malloc and heap space is left at the end of execution.

@jupe
Copy link
Contributor

jupe commented Aug 22, 2016

@adbridge: it depend on what framework you are using.. e.g. it is possible to "poll" memory consumption while testing, but might not be so useful/reality with greentea..

@bogdanm
Copy link
Contributor

bogdanm commented Aug 22, 2016

LGTM 👍

#define ALLOCATION_SIZE_DEFAULT 564
#define ALLOCATION_SIZE_SMALL 124
#define ALLOCATION_SIZE_LARGE 790
#define ALLOCATIONI_SIZE_FAIL (1024 * 1024 *1024)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

c1728p9 and others added 3 commits August 22, 2016 18:32
Keep track of the current size allocated, maximum size allocated,
number of allocations, failed allocations and total size allocated for
both GCC and ARM. Report the maximum size allocated at the end of
testing.

Also, add a test to verify heap metrics are working as expected.
Remove the handle swap in cfstore_test_delete_all. This prevents a
deleted handle from being used.
In the function raise_failure move the test_failure and case_failure
calls out of the critical section. This allows these handlers to run
without interrupts disabled and enables them to use rtos features
such as a mutex. This is required for heap metrics to work.
@sg- sg- merged commit bc0c85b into ARMmbed:master Aug 23, 2016
@SeppoTakalo
Copy link
Contributor

I don't see the most usefull information here, which is the maximum size of the heap in use.
If you just measure allocated space, you ignore all the fragmentation and other stuff that happens inside the allocator.
You might want to keep track of the "high watermark" of the heap. That would be the actual memory usage of application.

@kjbracey
Copy link
Contributor

"Largest contiguous block free" is properly the stat you want to catch fragmentation.

@c1728p9
Copy link
Contributor Author

c1728p9 commented Aug 29, 2016

Hi @SeppoTakalo, the worst case heap is recorded and can be seen in the max_size field of the mbed_stats_heap_t structure.

@kjbracey-arm, unfortunately this metric is tricky to get. With @bogdanm's memory logging and some host processing it should be possible to determine this.

@SeppoTakalo
Copy link
Contributor

Hi @c1728p9, the max_size seems to be the maximum size of objects allocated from malloc. Which is not the same thing as how much memory malloc uses.
What I mean about the "high water mark" is the topmost pointer in the heap used.
If you say max_alloc is 1kB, it does not mean that 1kB heap size is enough. We need to have value that includes also the fragmentation of the heap and internal structures used by malloc.

Basically, if you allocate one object size of 1000B, this says max_alloc is 1kB. But if you alloc 1000 objects, each of size 1, this will still tell you that max_alloc is 1kB. But in reality, between every object there is data structure used by malloc and padding of course. Therefore the max_alloc is actually misleading value. It gives too optimistic results of memory usage and does not count in the fragmentation of the heap.

We need something that can be used for developers to REALLY know how big the heap has been. Not just sum of all the sizes of allocated objects, but the total cost of allocation, including internal structures, padding and fragmentation.
That information can then be used to tune the linker files, for example, or decide which chip to use.

@c1728p9
Copy link
Contributor Author

c1728p9 commented Aug 29, 2016

Hi @SeppoTakalo, I see what you mean. This can vary by the heap implementation and order of allocation, which is not always consistent. The current mechanism provides an exact count of the bytes allocated by the application. This makes it easier to tell when a change increases or decreases heap usage. Also, these statistics are straight forward to get and match those which nanostack supports - https://github.com/ARMmbed/nanostack-libservice/blob/master/source/nsdynmemLIB/nsdynmemLIB.c.

If more precise info is needed, @bogdanm's implementation can be used with the MBED_MEM_TRACING_ENABLED flag.

Finally, if you know a straight forward way to get the high water mark for GCC and ARMCC feel free to leave a comment here (or open a PR 😄). This would definitely add value to the stats. I don't know of a straight forward way to get this info so it isn't in this PR.

theotherjimmy added a commit that referenced this pull request Sep 19, 2016
Release mbed-os-5.1.4

Changes:

New Targets:
2504: [Disco_F769NI] adding new target [#2504]
2654: DELTA_DFBM_NQ620 platform porting [#2654]
2615: [MTM_MTCONNECT04S] Added support for MTM_MTCONNECT04S [#2615]
2548: Nucleof303ze [#2548]

Fixes:

2678: Fixing NCS36510 compile on Linux #2678
2657: [MAX326xx] Removed echoing of characters and carriage return. #2657
2651: Use lp_timer to count time in the deepsleep tests #2651
2645: NUCLEO_F446ZE - Enable mbed5 release version #2645
2643: Fix thread self termination #2643
2634: Updated USBHost for library changes #2634
2633: Updated USBDevice to use Callback #2633
2630: Test names not dependent on disk location of root #2630
2624: CFSTORE Bugfix for realloc() moving KV area and cfstore_file_t data structures not being updated correctly #2624
2623: DISCO_L476VG - Add Serial Flow Control pins + add SERIAL_FC macro #2623
2617: STM32F2xx - Enable Serial Flow Control #2617
2613: Correctly providing directories to build_apis #2613
2607: Fix uvisor memory tracing #2607
2604: Tools - Fix fill section size variation #2604
2601: Adding ON Semiconductor copyright notice to source and header files. #2601
2597: [HAL] Fixed "intrinsic is deprecated" warnings #2597
2596: [HAL] Improve memory tracer #2596
2594: Fix TCPServer constructor #2594
2593: Add app config command line switch for test and make #2593
2589: [NUC472] Fix heap configuration error with armcc #2589
2588: Timing tests drift refactor #2588
2587: add PTEx pins as option for SPI on Hexiwear - for SD Card Interface #2587
2584: Set size of callback irq array to IrqCnt #2584
2583: github issue and PR templates #2583
2582: [GCC_CR] fix runtime hang for baremetal build #2582
2580: lwip - Add check for previously-bound socket #2580
2579: lwip - Fix handling of max sockets in socket_accept #2579
2578: Fix double free in NanostackInterface #2578
2576: Add smoke test that builds example programs with mbed-cli #2576
2575: tools-config! -  Allow an empty or mal-formed config to be passed to the config system #2575
2562: Fix GCC lazy init race condition and add test #2562
2559: [utest]: Allow the linker to remove any part of utest if not used #2559
2545: Added define guards for SEQUENTIAL_FLASH_JOURNAL_MAX_LOGGED_BLOBS so  #2545
2538: STM32F4xx - Add support of ADC internal channels (Temp, VRef, VBat) #2538
2521: [NUCLEO_F207ZG] Add MBED5 capability #2521
2514: Updated FlexCan and SAI SDK drivers #2514
2487: Runtime dynamic memory tracing #2487
2442: Malloc heap info #2442
2419: [STM32F1] Add asynchronous serial #2419
2393: [tools] Prevent trace-backs from incomplete args #2393
2245: Refactor export subsystem #2245
2130: stm32 : reduce number of device.h files #2130
aisair pushed a commit to aisair/mbed that referenced this pull request Apr 30, 2024
Changes:

New Targets:
#2504: [Disco_F769NI] adding new target [ARMmbed/mbed-os#2504]
#2654: DELTA_DFBM_NQ620 platform porting [ARMmbed/mbed-os#2654]
#2615: [MTM_MTCONNECT04S] Added support for MTM_MTCONNECT04S [ARMmbed/mbed-os#2615]
#2548: Nucleof303ze [ARMmbed/mbed-os#2548]

Fixes:

#2657: [MAX326xx] Removed echoing of characters and carriage return. [ARMmbed/mbed-os#2657]
#2651: Use lp_timer to count time in the deepsleep tests [ARMmbed/mbed-os#2651]
#2643: Fix thread self termination [ARMmbed/mbed-os#2643]
#2623: DISCO_L476VG - Add Serial Flow Control pins + add SERIAL_FC macro [ARMmbed/mbed-os#2623]
#2617: STM32F2xx - Enable Serial Flow Control [ARMmbed/mbed-os#2617]
#2601: Adding ON Semiconductor copyright notice to source and header files. [ARMmbed/mbed-os#2601]
#2597: [HAL] Fixed "intrinsic is deprecated" warnings [ARMmbed/mbed-os#2597]
#2589: [NUC472] Fix heap configuration error with armcc [ARMmbed/mbed-os#2589]
#2587: add PTEx pins as option for SPI on Hexiwear - for SD Card Interface [ARMmbed/mbed-os#2587]
#2584: Set size of callback irq array to IrqCnt [ARMmbed/mbed-os#2584]
#2582: [GCC_CR] fix runtime hang for baremetal build [ARMmbed/mbed-os#2582]
#2562: Fix GCC lazy init race condition and add test [ARMmbed/mbed-os#2562]
#2538: STM32F4xx - Add support of ADC internal channels (Temp, VRef, VBat) [ARMmbed/mbed-os#2538]
#2514: Updated FlexCan and SAI SDK drivers [ARMmbed/mbed-os#2514]
#2442: Malloc heap info [ARMmbed/mbed-os#2442]
#2419: [STM32F1] Add asynchronous serial [ARMmbed/mbed-os#2419]
#2130: stm32 : reduce number of device.h files [ARMmbed/mbed-os#2130]
#2678: Fixing NCS36510 compile on Linux [ARMmbed/mbed-os#2678]
#2607: Fix uvisor memory tracing [ARMmbed/mbed-os#2607]
#2596: [HAL] Improve memory tracer [ARMmbed/mbed-os#2596]
#2487: Runtime dynamic memory tracing [ARMmbed/mbed-os#2487]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants