[uTVM][Runtime] Deprecate uTVM Standalone Runtime #5060

liangfu · 2020-03-13T03:12:32Z

Since the MISRA-C runtime has been merged in PR #3934 and discussed in RFC #3159, I think it's now time to migrate the uTVM standalone runtime (introduced in PR #3567) to it.

Rationale

The MISRA-C runtime has a smaller code size (45 KiB vs. approx. 100 KiB)
The MISRA-C runtime is more portable, since it is written entirely in C
The MISRA-C runtime is designed to be more stable, since it tries to avoid typecasts and dynamic allocations.
The uTVM standalone runtime is currently not tested in CI; see WIP PR [uTVM] Enable Testing Standalone uTVM Runtime in CI #4991

Actionable Items

Implement a memory container that returns addresses from a single stack (PR [uTVM][Runtime] Introduce Virtual Memory Allocator to CRT #5124)
Implement an arena-based memory allocator for CRT
Remove the picojson library from the 3rdparty directory; it would be replaced by src/runtime/crt/load_json.h
Supersede the uTVM standalone runtime with the MISRA-C runtime
Enable testing of the new uTVM standalone runtime in CI
Demonstrate that TVM can run independently on micro-controllers, possibly with a demo on
- STM32F746 board or
- Arty-A7 with Freedom E300 or
- Sparkfun Edge

Please leave your comment.

cc @areusch

tqchen · 2020-03-13T16:16:45Z

Cross-posting here. I think it is worth thinking about the memory allocation strategy. Specifically, we should design an API that contains a simple allocator (which is arena-like, allocates memory from a stack, and releases everything once done), and use that allocator for all memory in the program (including data structures and tensors). This will completely eliminate the use of system calls and allow the program to run on bare metal.

Example API

// Can use a system call to get the memory, or point directly to memory segments on the microcontroller
UTVMAllocator* arena = UTVMCreateArena(10000);
// Subsequent data structures are allocated from the allocator
// The free calls will recycle data into the allocator
// The simplest strategy is not to recycle at all
UTVMSetAllocator(arena);

// normal TVM API calls
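
To make the proposal above more concrete, here is a minimal sketch of a bump-style arena in C. All names (`SketchArena`, `SketchArenaInit`, `SketchSetAllocator`, `SketchAlloc`, `SketchFree`, `SketchArenaReset`) are hypothetical and are not part of TVM; the sketch also assumes the caller hands in the backing memory, for example a static buffer or a linker-defined segment, so no system call is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and names for illustration; not part of TVM. */
typedef struct {
  uint8_t* base;   /* start of the backing memory */
  size_t capacity; /* total bytes available */
  size_t offset;   /* bytes handed out so far */
} SketchArena;

/* Arena that SketchAlloc draws from; NULL until one is selected. */
static SketchArena* g_sketch_arena = NULL;

/* Wrap a caller-provided memory segment; no system call is involved. */
void SketchArenaInit(SketchArena* arena, void* memory, size_t capacity) {
  assert(arena != NULL && memory != NULL);
  arena->base = (uint8_t*)memory;
  arena->capacity = capacity;
  arena->offset = 0;
}

/* Select the arena used by subsequent SketchAlloc calls. */
void SketchSetAllocator(SketchArena* arena) { g_sketch_arena = arena; }

/* Bump allocation: align the next free address, then advance the offset.
 * `align` must be a power of two. Returns NULL when the arena is full. */
void* SketchAlloc(size_t size, size_t align) {
  SketchArena* a = g_sketch_arena;
  if (a == NULL || align == 0 || (align & (align - 1)) != 0) return NULL;
  uintptr_t base = (uintptr_t)a->base;
  uintptr_t aligned = (base + a->offset + align - 1) & ~((uintptr_t)align - 1);
  size_t start = (size_t)(aligned - base);
  if (start > a->capacity || size > a->capacity - start) return NULL;
  a->offset = start + size;
  return a->base + start;
}

/* The simplest strategy: individual frees do not recycle anything. */
void SketchFree(void* ptr) { (void)ptr; }

/* Release everything at once when the program is done with the arena. */
void SketchArenaReset(SketchArena* arena) { arena->offset = 0; }
```

In this sketch, allocation is a pointer increment and there is no per-block bookkeeping, which is what keeps both code size and runtime cost small; the price is that memory only comes back when the whole arena is reset.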

tmoreau89 · 2020-03-23T18:31:14Z

@liangfu regarding "superseding uTVM standalone runtime", will MISRA-C runtime support running on bare-metal systems?

tmoreau89 · 2020-03-23T18:32:10Z

@ajtulloch @weberlo @u99127 (this might be of interest to you)

liangfu · 2020-03-23T23:01:54Z

@liangfu regarding "superseding uTVM standalone runtime", will MISRA-C runtime support running on bare-metal systems?

Yes, at least it is intended to, but how shall we provide a proper demo of this? Any ideas?

tmoreau89 · 2020-03-23T23:07:19Z

We can test it on the STM board that @weberlo implemented a demo on: #4274

liangfu · 2020-03-23T23:58:58Z

Excellent idea. Perhaps we can also test the bare-metal demo in CI, with a simple RISCV processor like picorv32.

KireinaHoro · 2020-03-24T05:15:01Z

Cross-posting here. I think it is worth thinking about the memory allocation strategy. Specifically, we should design an API that contains a simple allocator (which is arena-like, allocates memory from a stack, and releases everything once done), and use that allocator for all memory in the program (including data structures and tensors). This will completely eliminate the use of system calls and allow the program to run on bare metal.

@tqchen Removing all external allocator use and going with an embedded arena allocator sounds a little bit fishy. Bare-metal platforms do not necessarily lack a proper allocator; newlib, for example, provides a pretty usable dlmalloc implementation. Are there any other concerns?

liangfu · 2020-03-24T05:41:40Z

In PR #5124, we have a reference allocator, which implements vmalloc, vrealloc, and vfree. When necessary, I think we can redirect the function calls to different implementations, e.g. dlmalloc in newlib, jemalloc and many others.
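
One way such redirection could look is a small table of function pointers that the `v*` entry points forward to. The sketch below is an assumption for illustration only: the `sketch_` prefixed functions, `SketchAllocatorOps`, and the malloc-like signatures are hypothetical and are not the declarations from PR #5124. The default backend is the C library heap, which would be dlmalloc when linking against newlib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical dispatch table; signatures mirror malloc/realloc/free
 * and are an assumption, not the CRT's actual declarations. */
typedef struct {
  void* (*alloc_fn)(size_t size);
  void* (*realloc_fn)(void* ptr, size_t size);
  void (*free_fn)(void* ptr);
} SketchAllocatorOps;

/* Default backend: whatever heap the C library provides. */
static SketchAllocatorOps g_ops = {malloc, realloc, free};

/* Swap in a different backend at startup. */
void SketchSetAllocatorOps(const SketchAllocatorOps* ops) {
  assert(ops != NULL && ops->alloc_fn && ops->realloc_fn && ops->free_fn);
  g_ops = *ops;
}

/* Entry points the runtime would call; they only forward. */
void* sketch_vmalloc(size_t size) { return g_ops.alloc_fn(size); }
void* sketch_vrealloc(void* ptr, size_t size) { return g_ops.realloc_fn(ptr, size); }
void sketch_vfree(void* ptr) { g_ops.free_fn(ptr); }

/* Example alternative backend: wraps the C heap and counts live blocks.
 * Zero-size realloc is not handled in this sketch. */
static size_t g_live_blocks = 0;

static void* CountingAlloc(size_t size) {
  void* p = malloc(size);
  if (p != NULL) ++g_live_blocks;
  return p;
}

static void* CountingRealloc(void* ptr, size_t size) {
  void* p = realloc(ptr, size);
  if (p != NULL && ptr == NULL) ++g_live_blocks; /* acted like malloc */
  return p;
}

static void CountingFree(void* ptr) {
  if (ptr != NULL) {
    --g_live_blocks;
    free(ptr);
  }
}

static const SketchAllocatorOps kCountingOps = {CountingAlloc, CountingRealloc,
                                                CountingFree};
```

Calling `SketchSetAllocatorOps(&kCountingOps)` once at startup would route every `sketch_vmalloc`, `sketch_vrealloc`, and `sketch_vfree` call through the counting backend without touching the call sites.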

I would agree with @KireinaHoro to use implementations in newlib for bare-metal applications.

For an arena-like allocator, I am concerned about how we would reuse large memory buffers between conv layers if we don't release allocated workspaces promptly.

tqchen · 2020-03-24T17:06:14Z

The workspace memory could have a different strategy. The way it works is that we create a separate arena for workspace memory, along with a counter.

When memory is allocated, we allocate it from the arena and increment the counter
When memory is de-allocated, we decrement the counter
When the counter goes to zero, we free all the memory.

This will work because all workspace memory is temporary. It also guarantees constant-time allocation.
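
The three rules above can be sketched in C as follows. This is an illustration under stated assumptions, not TVM code: the names `SketchWorkspace`, `SketchWorkspaceInit`, `SketchWorkspaceAlloc`, and `SketchWorkspaceFree` are hypothetical, the backing buffer is supplied by the caller, and sizes are rounded up to 8 bytes as a stand-in for a real alignment policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical workspace arena with a live-allocation counter. */
typedef struct {
  uint8_t* base;     /* start of the backing memory */
  size_t capacity;   /* total bytes available */
  size_t offset;     /* bump pointer: bytes handed out so far */
  size_t live_count; /* allocations that have not been freed yet */
} SketchWorkspace;

void SketchWorkspaceInit(SketchWorkspace* ws, void* memory, size_t capacity) {
  assert(ws != NULL && memory != NULL);
  ws->base = (uint8_t*)memory;
  ws->capacity = capacity;
  ws->offset = 0;
  ws->live_count = 0;
}

/* Rule 1: allocate from the arena and increment the counter. O(1). */
void* SketchWorkspaceAlloc(SketchWorkspace* ws, size_t size) {
  if (size > ws->capacity) return NULL;
  size_t rounded = (size + 7u) & ~(size_t)7u; /* assumed 8-byte granularity */
  if (rounded > ws->capacity - ws->offset) return NULL;
  void* p = ws->base + ws->offset;
  ws->offset += rounded;
  ws->live_count++;
  return p;
}

/* Rules 2 and 3: decrement the counter; when it reaches zero, the whole
 * arena becomes reusable by resetting the bump pointer. */
void SketchWorkspaceFree(SketchWorkspace* ws, void* ptr) {
  (void)ptr; /* individual blocks are never recycled on their own */
  assert(ws->live_count > 0);
  if (--ws->live_count == 0) ws->offset = 0;
}
```

Note that in this sketch memory is only reclaimed once every outstanding allocation has been released, so a single long-lived workspace would pin the whole arena; that is why the strategy relies on workspace memory being temporary.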

As a generalization: if most memory allocation happens in a RAII-style lifecycle, e.g. everything is de-allocated once we exit a scope, then the counter-based strategy (per scope) should work pretty well.

I am not fixated on the arena allocator, but would like to challenge us to think a bit about how much simpler we can make the allocation strategy, given what we know about the workload. Of course, we could certainly bring in sub-allocator strategies that are more complicated, or fall back to libraries when needed.

u99127 · 2020-03-24T17:55:24Z

Thanks for pointing this to me @tmoreau89 and thank you for this work @liangfu . Very interesting and good questions to ask.

From a design-level point of view for micro-controllers, I'd like to take this one step further and challenge folks to think about whether this can be achieved with static allocation rather than any form of dynamic allocation. The hypothesis is that at compile time one would know how much temporary space is needed between layers, rather than having to face a run-time failure.

Dynamic allocation on micro-controllers suffers from fragmentation issues; further, do we want to have dynamic allocation in the runtime on micro-controllers at all? Further, the model being executed will be part of a larger application: how can we allow our users to specify the amount of heap available or being consumed for executing their model? It would be better to try to provide that with diagnostics at link time or compilation time rather than at runtime. @mshawcroft might have more to add. And yes, in our opinion, for micro-controllers one of the challenges is the availability and usage of temporary storage for working set calculations between layers.
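
As a rough illustration of the static-allocation idea, the C11 sketch below sizes a single workspace buffer at compile time and rejects the build if it exceeds a user-chosen budget. Everything here is hypothetical: the macro names, the byte counts, and the assumption that each layer only needs its own scratch workspace while it runs (so the peak is the maximum rather than the sum). In practice the per-layer sizes would have to be emitted by the compiler for a given model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-layer scratch sizes; assumed to be generated per model. */
#define SKETCH_LAYER0_WORKSPACE_BYTES 4096u
#define SKETCH_LAYER1_WORKSPACE_BYTES 6144u

/* Budget chosen by the application developer for running the model. */
#define SKETCH_WORKSPACE_BUDGET_BYTES 8192u

#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Assumption: layers run one after another and scratch space is not
 * shared across layers, so the peak requirement is the largest layer. */
#define SKETCH_WORKSPACE_BYTES \
  SKETCH_MAX(SKETCH_LAYER0_WORKSPACE_BYTES, SKETCH_LAYER1_WORKSPACE_BYTES)

/* The build fails, rather than the running device, if the model does not fit. */
_Static_assert(SKETCH_WORKSPACE_BYTES <= SKETCH_WORKSPACE_BUDGET_BYTES,
               "model workspace exceeds the configured budget");

/* Statically allocated workspace: no heap and no fragmentation. */
static uint8_t g_workspace[SKETCH_WORKSPACE_BYTES];

/* Hand the shared workspace to a layer that needs `bytes` of scratch space. */
uint8_t* SketchLayerWorkspace(size_t bytes) {
  /* Cannot fire when `bytes` comes from the per-layer macros above. */
  assert(bytes <= sizeof(g_workspace));
  return g_workspace;
}
```

Because the buffer is an ordinary static array, its size also shows up in the linker map, which gives users a link-time view of how much memory the model consumes.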

2 further design questions.

In the micro-controller world, supporting every new device with its own memory map and so on will be painful. Beyond one simple reference implementation, I don't think we have an efficient route to deployment other than integrating with other platforms in the micro-controller space. How would this runtime integrate with platforms like Zephyr, mbedOS or FreeRTOS?
I'd be interested in extending CI with qemu or some such for Cortex-M as well, or indeed on the STM board that you are using, @tmoreau89.

Purely a nit but from a rationale point of view, I would say that uTVM runtime not being tested in a CI is technical debt :)

regards
Ramana

tqchen · 2020-03-25T04:24:20Z

Re: the fragmentation issue, thinking through the allocation strategies carefully and adopting an arena-style allocator (counter-based, as above) can likely resolve the issue of fragmentation. In terms of the total memory cost, we can indeed find it out at compile time for simple graph programs.

liangfu · 2020-03-25T09:53:27Z

It's very interesting to see that TFLite is using an arena-like allocator for micro-controllers. See how Adafruit demonstrates its PyBadge board with TFLite here.

tqchen · 2020-03-25T15:35:47Z

@liangfu can you try an arena-based approach, given that it is simpler? We could adopt the counter-based approach to enable early freeing of sub-arenas (when an arena's counter decreases to zero, we can free its space).

liangfu · 2020-03-26T00:42:11Z

Sure, as this is definitely the direction we should follow, I can do that. And maybe we need a separate PR for the arena allocator feature.

Robeast · 2020-05-06T11:24:09Z

Hi @liangfu is there any update on your current implementation efforts? We are really looking forward to it!!

liangfu · 2020-05-08T00:12:39Z

Hi @Robeast, thanks for your attention. I only have a draft version of the new allocator for now; I'd like to send a PR later this week.

masahi · 2022-01-09T22:53:36Z

Can we close this?

liangfu mentioned this issue Mar 13, 2020

[Runtime] Parameterize constants in MISRA-C runtime #5062

Closed

liangfu mentioned this issue Mar 22, 2020

[uTVM][Runtime] Introduce Virtual Memory Allocator to CRT #5124

Merged

areusch added the needs-triage label Oct 19, 2022

areusch added the vert:micro label and removed the needs-triage label Nov 16, 2022

tqchen closed this as completed Sep 20, 2024
