[RFC][Tracking Issue] Meta Schedule (AutoTIR) #8473

junrushao · 2021-07-14T18:20:39Z

This is a global tracking issue for landing the meta schedule. The RFC can be found here.

Steps

The steps are numbered following TensorIR (#7527).

[M4a] Performance & Coverage

Schedule Rules

Add-RFactor [MetaSchedule][M4a] Schedule Rule: Add-RFactor #9975
Auto-Inline [MetaSchedule][M4a] Schedule Rule: Auto-Inline #9943
Cross-Thread-Reduction [MetaSchedule][M4a] Schedule Rule: Cross-Thread-Reduction #9994
Multi-Level-Tiling [MetaSchedule][M4a] Schedule Rule: Multi-Level-Tiling #10043
Parallel-Vectorize-Unroll [MetaSchedule][M4a] Schedule Rule: Parallelize-Vectorize-Unroll #10033
Random-Compute-Location [MetaSchedule][M4a] Schedule Rule: Random-Compute-Location #9940

PostProcessors

Disallow-Dynamic-Loop [MetaSchedule][M4a] PostProcessor: Disallow-Dynamic-Loop #9997
Rewrite-Cooperative-Fetch [MetaSchedule][M4a] Rewrite-Cooperative-Fetch #10081
Rewrite-Parallel-Vectorize-Unroll [MetaSchedule][M4a] PostProcessor: Rewrite-Parallel-Vectorize-Unroll #10071
Rewrite-Reduction-Block [MetaSchedule][M4a] PostProcessor: Rewrite Reduction Block #10013
Rewrite-Unbound-Block [MetaSchedule][M4a] PostProcessor: Rewrite-Unbound-Block #10027
Verify-GPU-Code [MetaSchedule][M4a] PostProcessor: Verify-GPU-Code #9945

Mutators

Mutate-Compute-Location [MetaSchedule] Mutator: Mutate-Compute-Location #10028
Mutate-Parallel [MetaSchedule][M4a] Mutator: Mutate Parallel #10096
Mutate-Tile-Size [MetaSchedule][M4a] Mutator: Mutate-Tile-Size #10092
Mutate-Unroll [MetaSchedule] Mutator: Mutate-Unroll #10045

User interface

Tune-TE [MetaSchedule][M4a] User-API: Tune-TE/TIR/Relay #10079
Tune-TIR [MetaSchedule][M4a] User-API: Tune-TE/TIR/Relay #10079
Tune-Relay [MetaSchedule][M4a] User-API: Tune-TE/TIR/Relay #10079

Misc

Local Runner [MetaSchedule][M4a] Local runner #9153
Design-Space-Generator: Post-Order-Apply [MetaSchedule][M4a] Add ScheduleRule class & PostOrderApply space generator #9761
SearchStrategy: Replay-Func (random search) [MetaSchedule][M4a] Add ReplayFunc Search Strategy #9799
SearchStrategy: Evolutionary-Search [MetaSchedule][M4a] Add EvolutionarySearch Search Strategy #9836

[M4b] Relay integration

Task extraction [MetaSchedule][M4b] Task Extraction #9382
Apply-History-Best [MetaSchedule][M4b] Add ApplyHisotryBest Meta Schedule Context #10049
Builder/Runner working with Relay and Relay BYOC [MetaSchedule][M4b] Misc improvement of the Measurer #9757; [MetaSchedule][M4b] Testcases for TensorRT builder/runner #10055

M5. Operator coverage with all backends for auto tensorization

Being able to tensorize on all the backends

TIR primitive: Re-Index [TIR] Add schedule primitive ReIndex #11515
TIR primitive: Transform-Block-Layout [TIR] Add schedule primitive TransformBlockLayout #11485
MetaSchedule auto tensorization helper: TileWithTensorIntrin [TIR] Utility function to decide loop mapping for auto tensorization #11050; [TIR] Add function to tile a block according to a given tensor intrinsic #11075
MetaSchedule: enhance Multi-Level Tiling [MetaSchedule] Add MultiLevelTilingTensorCore rule for auto-tensorization on CUDA #12059; [MetaSchedule] Allow MultiLevelTilingTensorCore rule to specify multiple tensor intrin groups #12113
MetaSchedule: Rewrite-Tensorize [Metaschedule] Auto tensorization for CPU / GPU dot product #11088
Analysis: MappingProposer and AutoTensorizeComparator [TIR, analysis] Add GetAutoTensorizeMappingInfo to generate transforms for auto tensorization #11740
Intel VNNI / ARM dot variants [Metaschedule] Auto tensorization for CPU / GPU dot product #11088

M6. Memory optimization

Important for CUDA performance but not for CPU. This milestone affects speed only, not functionality.

TIR primitive: Read/Write-at
Support ewise fusion in MemHammer
Cover non-fp16, non-wmma usecases
Shared memory auto padding [MetaSchedule] Support padding for irregular shapes for CUDA tensor core #12759
Global memory coalescing
Shared ⇒ WMMA, WMMA ⇒ shared/global rewriting
Insert caching stage [TIR] Add pass ManifestSharedMemoryLocalStage #12355

M7. Unblock end-to-end experiments

Handle reshape fusion
Develop scripts to run experiment
Benchmark on the selected operator set (C1D, C2D, C3D, CAP, DIL, GMM, GRP, T2D)
Performance alignment attempt

M8. Broader Set of Intrinsics and Optimization

async pipeline [PTX] Intrinsics for async copy from global to shared (SM80) #11368
Permuted layout
LDMatrix / MMA [TIR] Support tensorization using ldmatrix + MMA #11355

[Meta Schedule][M3b] Builder. This PR is part of the meta schedule project (#8473). Follow-up commits: add typing; unreachable. Co-authored-by: Xiyou Zhou, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Siyuan Feng

This PR is part of the meta schedule project (#8473) that adds metadata of each PrimFunc's argument. This feature is necessary for dynamic shape auto-tuning. Co-authored-by: Xiyou Zhou, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Siyuan Feng

This PR is part of the meta schedule project (#8473) that adds a generic Database interface for tuning records, as well as a default implementation that uses two JSON files to mimic the database. This feature is future-compatible with dynamic shape auto-tuning. Co-authored-by: Xiyou Zhou, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Siyuan Feng
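To make the two-file idea concrete, here is a minimal Python sketch. It is not the TVM implementation: it only assumes that one JSON-lines file holds the workloads and a second one holds tuning records that refer to a workload by index. Every name in it (`JSONFileDatabase`, `commit_workload`, `commit_record`, `get_top_k`, the `run_secs` field) is hypothetical and chosen for illustration.

```python
import json
import os
import tempfile


class JSONFileDatabase:
    """Toy tuning-record store backed by two JSON-lines files (illustrative only)."""

    def __init__(self, workload_path, record_path):
        self.workload_path = workload_path
        self.record_path = record_path
        self.workloads = self._load(workload_path)  # list of workload keys
        self.records = self._load(record_path)      # list of record dicts

    @staticmethod
    def _load(path):
        # A missing file simply means an empty table.
        if not os.path.exists(path):
            return []
        with open(path) as f:
            return [json.loads(line) for line in f if line.strip()]

    @staticmethod
    def _append(path, obj):
        # Append-only writes: one JSON object per line.
        with open(path, "a") as f:
            f.write(json.dumps(obj) + "\n")

    def commit_workload(self, key):
        """Register a workload once and return its index."""
        if key in self.workloads:
            return self.workloads.index(key)
        self.workloads.append(key)
        self._append(self.workload_path, key)
        return len(self.workloads) - 1

    def commit_record(self, workload_idx, trace, run_secs):
        """Store one measured candidate for a workload."""
        record = {"workload": workload_idx, "trace": trace, "run_secs": run_secs}
        self.records.append(record)
        self._append(self.record_path, record)

    def get_top_k(self, workload_idx, k):
        """Return the k records with the smallest best-case running time."""
        matching = [r for r in self.records if r["workload"] == workload_idx]
        return sorted(matching, key=lambda r: min(r["run_secs"]))[:k]


# Usage: commit two candidates for one workload and query the best one.
tmp = tempfile.mkdtemp()
db = JSONFileDatabase(os.path.join(tmp, "workloads.json"),
                      os.path.join(tmp, "records.json"))
idx = db.commit_workload("matmul_128x128x128")
db.commit_record(idx, ["split", "reorder"], [0.5, 0.4])
db.commit_record(idx, ["split", "vectorize"], [0.2, 0.3])
best = db.get_top_k(idx, 1)[0]
```

Keeping workloads in their own file means each record only stores a small index, and because both files are append-only, reopening the same paths restores the state.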

This PR is part of the meta schedule project (#8473) that adds the asynchronous program runner interface, as well as a reference implementation of RPCRunner. LocalRunner will be implemented with PopenPool executor in a follow-up PR. Follow-up commits: address comments; fix lint. Co-authored-by: Xiyou Zhou, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Siyuan Feng, Cody Yu
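The asynchronous runner idea can be sketched in plain Python as follows. This is a hedged illustration only, built on the standard `concurrent.futures` thread pool rather than RPC or PopenPool, and it does not reproduce the TVM classes. `ToyRunner`, `ToyFuture`, and the `(run_secs, error_msg)` result shape are assumptions made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyFuture:
    """Handle for a measurement that may still be running."""

    def __init__(self, future):
        self._future = future

    def done(self):
        return self._future.done()

    def result(self):
        # Blocks until the measurement finishes; returns (run_secs, error_msg).
        return self._future.result()


class ToyRunner:
    """Submits measurement jobs to a thread pool and returns futures immediately."""

    def __init__(self, max_workers=2, repeats=3):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._repeats = repeats

    def _measure(self, func):
        try:
            run_secs = []
            for _ in range(self._repeats):
                start = time.perf_counter()
                func()
                run_secs.append(time.perf_counter() - start)
            return run_secs, None
        except Exception as err:  # report failures as data instead of raising
            return None, repr(err)

    def run(self, funcs):
        # Non-blocking: the caller decides when to wait on each future.
        return [ToyFuture(self._pool.submit(self._measure, f)) for f in funcs]

    def shutdown(self):
        self._pool.shutdown()


def failing():
    raise ValueError("bad candidate")


runner = ToyRunner()
futures = runner.run([lambda: sum(range(1000)), failing])
results = [f.result() for f in futures]
runner.shutdown()
```

Returning futures lets a search loop keep generating candidates while earlier ones are still being measured, and a failed candidate is reported as an error message instead of aborting the whole batch.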

cbalint13 · 2022-01-26T15:17:43Z

@junrushao1994,

While looking into TVM's auto-tensorization ability (to explore search over accelerator designs and custom ISAs), permit me to ask:

Was Auto Tensorization removed from this list (it was in section [M4b], if I recall)? What was/is the plan for it?
Also, regarding the design plan, will it have something in common with the principles of https://arxiv.org/abs/2101.08458?

Thank you!

junrushao · 2022-01-26T18:55:10Z

Hey @cbalint13 thanks for asking! Absolutely!

> Was Auto Tensorization removed from this list (it was in section [M4b], if I recall)? What was/is the plan for it?

The only reason is that I'm trying to organize the roadmap. Auto tensorization is a huge item and we want to have a separate tracking issue for it. As you already see, we have been upstreaming auto tensorization-related PRs, including #9871 #10066. My branch also contains auto tensorization-related working examples if you want to try them out now :-)

> Also, regarding the design plan, will it have something in common with the principles of https://arxiv.org/abs/2101.08458?

This work was done by my colleagues, so of course we are aware of it, and we have a lot in common :-) Their codebase is public here. The difference is that we are now using TensorIR, a more powerful and systematic IR/scheduling system, to support tensorization.

cbalint13 · 2022-01-26T19:27:16Z

> Hey @cbalint13 thanks for asking! Absolutely!

@junrushao1994

First, thanks a lot for your time!

I am very happy even just to witness what has been going on recently in TVM (at a mind-blowing pace).

> Was Auto Tensorization removed from this list (it was in section [M4b], if I recall)? What was/is the plan for it?
>
> The only reason is that I'm trying to organize the roadmap. Auto tensorization is a huge item and we want to have a separate tracking issue for it. As you already see, we have been upstreaming auto tensorization-related PRs, including #9871 #10066. My branch also contains auto tensorization-related working examples if you want to try them out now :-)

I see now, thanks for the clarification. I noticed the recent "blockize - tensorize" PR (quite a large piece; I am still diving into it).

> Also, regarding the design plan, will it have something in common with the principles of https://arxiv.org/abs/2101.08458?
>
> This work is done by my fellow colleagues, and of course we are aware, and we have a lot in common :-) Their codebase is public here. The difference here is that we are now using TensorIR, a more powerful and systematic IR/scheduling system to support tensorization

I was already familiar with the UNIT code base; it is good to know that such a feature will make it into the new TIR.
I am thinking about a framework (early public sketch) that emits HDL (Verilog) blocks, reusable and/or as CPU ISA extensions, in many possible forms sampled within some combinatorial search space. Auto-tensorization would be a key process for evaluation and metrics here.
It may end up sampling some very weird-looking hardware (including systolic blocks), so the auto-tensorizer might need enhancement on some of the more challenging ends (as I already saw when looking at UNIT).

Can't wait to try it; I will look into the mentioned early WIP branch.

Many thanks again !

junrushao · 2022-01-27T02:10:47Z

Thank you @cbalint13 for your kind response! We are super excited to hear about your work and more than happy to assist/collaborate on TensorIR/MetaSchedule!

This PR is a further improvement of the meta schedule project (#8473). Co-authored-by: Junru Shao, Bohan Hou, Ruihang Lai, Hongyi Jin, Wuwei Lin, Siyuan Feng

@areusch

* [ETHOSN] Remove the compiler library from the runtime link (#10334) Due to some restructuring of the Ethos(TM)-N driver library it is no longer necessary to link the compiler library (AKA Support library) into the runtime. * [Hexagon] Export `ir_lower_vtcm_pass` function in the init file (#10330) * [runtime] Add Metadata classes for AOTExecutor (#10282) * Add new Metadata classes and base implementation. * These were autogenerated in the original PR, but checking them in as plain code until we can revisit the auto-generator approach. * address masa comments * Add documentation per Manupa's comments, and move kMetadataVersion namespace. * remove get_name function, used for debugging * clang-format * [ONNX] only broadcast matmul if the shape has changed (#10321) * [ONNX] only broadcast matmul if the shape has changed * fix copy-pasta mistake * [TIR] Tir constants integration into compilation pipeline (#8509) * [TIR] Introduce tir.allocate_const to TIR This PR is adding non-scalar constant representation in TIR. This is used to express constants (i.e., parameters) in the TIR instead of bypassing the TIR as it's done until now. Change-Id: Id3afc4d7197260cb43ecde60f05ccbce3fc42430 Co-authored-by: Giuseppe Rossini <[email protected]> Change-Id: Id4a09a637c9c1fd7d49989c6c10f474a78569e18 * [TIR] Integrate tir constant nodes in compilation pipeline This PR integrates tir.allocate_const to the compilation pipeline to support --link-params. Change-Id: Ic8d0cb75d596299fcae7078b304598afbf0c5494 Co-authored-by: Giuseppe Rossini <[email protected]> Change-Id: Id98cc682bbfacfe75c4d8b260fd41658f1f196b2 * [TIR] tir.const extraction This commit tries to implement an amendment to tir.constant RFC with centralized storage of constant data within the IRModule Please note that data and irmod_storage_idx are not mutual exclisive further more the irmod_storage_idx is valid only immediatly after prim func addition to the mod or after update within the mod. 
If prim func is out of the the module scope then the index become meangless. irmod_storage_idx also is not used in calculation of hash function of the tir.constant node. Change-Id: I40742ed580468b0252ea3fec02184cba65e20871 * unit test fixed Change-Id: Ied2186554d4cbad44b2346216c8be92449e55732 * cmsis-nn codegen fix Now handled case when params of the functions came as constants Change-Id: I5874e182e34ef94e23048eaf3c61b01a56d91131 * Fixes for unittests Change-Id: I5b82ee3f80337155706b5470973f494a301b5d90 * Rebasing tests fixes Change-Id: I94ac87907081bab53c1dd1ab2db106ae057b4b19 * Linter: added method param description Change-Id: I2f8c4c8d244b74c794abaa6079c46cc593ffcbdb * Printing removal fix This patch removes forgotten print in fuse_ops Change-Id: I4bb5934f3b4cd5fde19d36a8e3319aae136bce8a * Bugfix Fixed concurrent map update bug here Change-Id: Ifec3bf5030086d9079b9e493096f17dfd82297ec * Reworked logic for not to introduce empty constant list to modue attrs Change-Id: I082c85b3b4b70c218f0d714f5613ef6e178bd020 * Added support for tir builtin::tvm_access_ptr This fixed unit tests for tests/python/integration/test_arm_mprofile_dsp.py Change-Id: I10919f301ef9ddc3fd87f0e1a8414e9a52fc7938 * Unit test fix Fixes unit tests in torch frontend Change-Id: I6c179834f93dd202605d1ce5a7f07d987b9dc469 * Addressed requested changes Addressed changes requested upstream Change-Id: I741e52b89eb285732c23b1ac7ff277e757a088c3 * Namespace usage changed to conform earlier C++ standard Change-Id: I1b29238cfe2a6bedb525f4f823a3a540f631d836 * Bugfix Change-Id: I57a44b714b307278a243817ec2864e53ad31366b * updated IRModuleNode::ExtractPrimFuncConstants Updated IRModuleNode::ExtractPrimFuncConstants as per request upstream. 
Change-Id: I35db0145fb5827efd0445ce665d0c99465274016 * Minor changes typo fixd renamed ExtractPrimFuncConstants to ExtractConstants removed getters/setters from FuseMutator and added parametrized constructor Change-Id: Ib2326805781779b88c963a8642ff683c8755956e * Moved LinkedParam/LinkedParamNode Moved LinkedParam/LinkedParamNode from tvm::tir namespace to tvm namespace Change-Id: Ie3f0303bd4f7890c6d680268c91f2051977bc7f4 * Addressed upstream comments Changed BindParams argument to Array<NDArray> Removed 'name' argument from te.const Switched to in-depth comparision of NDArrays in constant de-duplication Removed extra final comma from NDArrayToTIR Changed return type of ConstantAllocationSize to int64_t Made link_param a tvm.testing.parameter for test_fuse_take and test_fuse_gather_nd Change-Id: I4285099cc63756aa5ebe91a5bd207d4135499b41 * Removed unnecessary forward declaration +linter Change-Id: I2a6c0d1f97773aeb1ae3f458da252a22079ccdb1 * Constant extractor now is a separate pass Change-Id: Ia4adca9d3315b26fbdc006ef7c115900c081e303 * Added forgotten file + unit test fix Change-Id: Ice305f4fefd13fe95e97574e6d63ffeb664621df * Changed to IRModule pass Refactored ExtractPrimFuncConstants to IRModule pass. 
deDup -> DeDup Refactored logic of Applicator supplementary class Change-Id: I6c120d175eb6790ba90f176c4f856bde8f0c7c94 * bugfix after rebasing Change-Id: Ie3ee6ea2479476a30f486baef74f20070f117942 * -v -> -vv to have more debug information Change-Id: I12c63731663b9c9ea574b9ed5cb17311ba3cf701 Co-authored-by: Giuseppe Rossini <[email protected]> * Simple workaround for PyTorch symbol crash problem in meta schedule test (#10342) * Simple workaround for PyTorch symbol crash problem in meta schedule test * workaround for CI * add reading of nRF5340 DK product ID to determine which COM port to use (#10304) * [ARM_CPU] Conv2d int8 intrinsic for cortex-A72 (#10310) * [ARM_CPU] Conv2d int8 intrinsic for cortex-A72 Add an intrinsic that performs a dot product of 8 4-element vectors at once. Also conditionally inline fused operators into the main convolution loop depending on convolutions size. Small convolution = no inlining. Performance improves by ~20% on mobilenet on raspberry pi 4 and ~30% improvement on performance for the individual convolutions. 
* ignore incorrect lints * fixup fstring * revert changes to conv2d_NCHWc (not int8) * remove error check, apparently tests rely on it * refactor alter op layout * [CI][Hexagon] Add Hexagon Tests to pipeline (#10302) * Add hexagon tests to CI Hexagon * Fix CRT libs * cleanup and fix Jenkins * Address @areusch comments * [TIR] Misc minor updates (#10335) * [CUBLAS] Fix cublas batch matmul strategy plevel (#10351) * [CI] Re-introduce redirect follow and update hash for Boost download (#10343) Looks like we did need the redirect in (#10247), otherwise you get a blank redirect response and `tar` doesn't like that very much: ``` tar: This does not look like a tar archive gzip: stdin: unexpected end of file ``` * Add per channel quantization to QLinearConv and fix related bugs (#10354) * [CI] Fix Flaky Test `test_task_scheduler_gradient` (#10360) * [CI] Fix Flaky Test `test_task_scheduler_gradient` A change to fix the issue of flaky test mentioned in #10356 by increase the `chain_rule` factor and avoid small gradient. * Retrigger CI. * [TOPI] VNNI support for batch matmul (#10332) * add test * compute added * schedule works * reuse dense_vnni schedule * try an alternative approach to scheduling layout transform * introduce a tunable knob to decide if compute_root * check transpose condition * support s8 + s8 input * pylint * [TIR] TIR Schedule Misc Update (#10341) * tir schedule misc update * Trigger Build * [AOT] BugFix of workspace calculation (#10337) Following an investigation from #10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule. 
Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. This change also removes the confusing "run_model" which has semantics identitical to "__tvm_main__" in TIR. * [runtime] Improved log information with function signature (#10326) This PR introduces a function signature printer in the `TypedPackedFunc` part, so that the log information in `detail::unpack_call` will be more complete. This PR allows users to obatin the original function signature when the `detail::unpack_call` fails. * refactored GraphProto.from_onnx into smaller functions (#10267) * refactored GraphProto.from_onnx into smaller functions * black formatted file * removed line that does not seem to make sense. Is there a purpose that I missed? * just to trigger CI pipeline * [skip ci] Fix onnx frontend lint (#10363) This was broken in #10267, not sure how that commit passed CI (maybe some logic to figure out the PR diff in pylint is broken). Co-authored-by: driazati <[email protected]> * [COMMUNITY] csullivan -> Committer (#10364) * [BUGFIX][ARITH] Fix FloorMod Simplifier (#10336) * fix canonical simplifier * improve comments * [Lint] Fix Pylint Issues (#10358) * [TIR][Transform] relax LoopPartition restriction that the intersection of all conditions can not be none. (#10340) Co-authored-by: sqing <[email protected]> * [ETHOSN] Improved identification of driver library version (#10285) * [ETHOSN] Stricter data type conversion checks (#10271) The 21.11 update for the Ethos(TM)-N driver is slightly more strict in accepting various operator attributes. * [microNPU][4] Add the cascader Proposal generator (#9959) * [microNPU][4] Add the cascader Proposal generator The Proposal generator takes optimal Plans and combines them to find optimal 'Proposals' - sets of disjoint Plans that cover every Part in a CascaderGraph. It ultimately produces a Pareto-frontier of 'optimal' Proposals in terms of estimated cycles and memory usage. 
Change-Id: Id42099819a596496a5769bae22f08eeb75ec69b6 * Fixes Change-Id: I4f5f2a298bd3bb379c7c8d179150358923b0dd66 * [Runtime][Pipeline Executor] multiple threads management and the data forwarding notification mechanism. (#10234) * [Runtime][Pipeline Executor] multiple threads management and the data forwarding notification mechanism. In this patch we create working threads for each runtime of pipeline. the threads would be terminated once the runtime class gets destroyed. We also add a notification mechanism derived from the 'binding configuration' of the runtime to forward the data notification. * address review comments. * address review comments. * fix typo. * fix typo. * trigger build. * address review comments. * address review comments. * address review comments. * address review comments. * [Hexagon] RPC server/client for simulator (#10361) This is the C++ code for running Hexagon code on simulator via the RPC mechanism. It is intended to be integrated into the current HexagonLauncher, although the integration will require further changes to the launcher python code. The final goal is to be able to run the same file.py on either hardware or simulator without needing to edit the python file, but simply by changing the configuration of the execution platform (i.e. something like --exectute-on=simulator as a command line or in an environment variable). The exact details are still to be determined. 
* [TIR, Relay] improve bfloat16 support (#10112) * update AMP table to enable ResNet50 conversion * add runtime datatype dispatch for BFloat16 * skip asserts for uint16 for bf16 compatibility * add bf16 cast for the unary intrinsic operators * enable "bf16<-->fp32<-->any dtype" casting * support inconsistent input for bf16 BIOP legalize * add treatments for bfloat16 in if statements * add bfloat16 dtype casts in binary OP * delete unnecessary treatments for bfloat16 * add test for bfloat16 building * code style * restore the modifications in .gitignore * restore the changes to AMP lists * fix typos * fix lint errors * fix typo * [ci] Check more events before pinging reviewers (#10208) * [ci] Check more events before pinging reviewers This was missing some events before (reviews without comments, PR updated from a draft -> ready for review) so these were being ignored when finding the latest event. This PR adds them and restructures the code a bit to make it more clear what is happening for each PR. This addresses some of the issues from #9983 * fix tests Co-authored-by: driazati <[email protected]> * Lower cache_read and cache_write to Hexagon DMA via tensorize (#10365) * Lower cache_read and cache_write to Hexagon DMA via tensorize * rework test to be compatible with launcher * remove cpu device api mem_copy implementation and test * [microNPU] adding more tests with USMP (#10362) Adding a few tests to confirm memory usage with and without USMP. - Supporting the toggle to disable storage_rewrite. 
- There is a slight change to tir_to_cs_translator to add index of Load nodes associated with NpuAddressRange objects * [RELAY] [VIRTUALDEVICE] Change syntax for device planning and store parameter virtual devices in virtual_device_ field (#10352) * parent 33082e0 author electriclilies <[email protected]> 1643141097 -0800 committer Lily Orth-Smith <[email protected]> 1645560059 -0800 Store function param virtual devices in virtual_device_ field Fix test_annotation.py and change result_virtual_device to virtual_device * Change plan devices tests to use the new syntax for function parameters * Fix free var problem * Fix attribute parsing if there is virtual device; most device planning tests passgit status * fixed lambda lifting * Debugging high order functions -- right now FunctionOnDevice and Bind are mutually recursive. This needs to not be the case. * tests pass wootgit status * Remove FunctionOnDevice from device planner * Don't use MaybeFunctionOnDevice in VM compiler * Remove MaybeFunctionOnDevice from lambda lifter * Delete FunctionOnDevice and MaybeFunctionOnDevice! * Reomve GetFunctionResultVirtualDevice * Remove GetFunctionParamVirtualDevice * lint * lint * Python formatting * Remove FunctionOnDevice python test * Fix bug in binds & debug output * Fix text printer * lint * Remove function on device from fold constant tests * Mark nits * Revert behavior of bind * clean up debug * Make ExprBinder public interface and use instead of Bind * Fix lambda lift * This is broken but not sure how to fix * passes all device planning tests yay! * Add substitution helper and use in device planner * Remove unnecessary check * Respond to comments * Update comment * [VirtualMachine] new method allowing to set one input tensor by its index or name (#10293) * set_input_with_index was implemented for VM * clean code * add getInputIndexFromName. add function descriptions. 
lint fix * fix lint * transfer comparison of parameter names number and assigned devices number to VMFunction constructor * add GetVMFunctionWithName to Executable API * clean code * add SetInputWithName (set_input_with_name) to VM API * join SetInputWithIndex and SetInputWithName to SetOneInputTensor (set_one_input) to VM API, the joined methods were removed * fix lint * some fixes after review * add set_one_input method to python API of VirtualMachine * pytests for set_input and set_one_input methods of VirtualMachine were implemented and checked * CI restart * construct simple model for pytests by relay instead of onnx tools (need for correct CI) Co-authored-by: Valery Chernov <[email protected]> * [Hexagon] Replace strlen in constant initialization with sizeof (#10381) Strlen is not constexpr everywhere, so replace it with sizeof. In C++ sizeof("string") works fine, since "string" has type "const char [...]". * check to avoid crash in opt_level=0 vm build (#10347) * [DOCS] Add how to contribute TVM docs with images. (#10287) * [MetaSchedule] Update Tuning Interfaces. (#10367) This PR is further improvement of the meta schedule project (apache/tvm#8473). Co-authored-by: Junru Shao <<[email protected]>> Co-authored-by: Bohan Hou <<[email protected]>> Co-authored-by: Ruihang Lai <<[email protected]>> Co-authored-by: Hongyi Jin <<[email protected]>> Co-authored-by: Wuwei Lin <<[email protected]>> Co-authored-by: Siyuan Feng <<[email protected]>> * [Bugfix][TVMScript] Convert BufferSlice to BufferLoad when used as range/loop start and end (#10370) A quick fix of the parser issue mentioned in #10327 . Ranges and loops require `start` and `stop` to be PrimExpr, however, `BufferSlice` is not always scalar so it's not a `PrimExpr`. This PR performs the transformation. * [FIX,PROFILING] Add extra precision to numbers when serializing to json (#10392) Numbers were serialized with too little precision when serializing profiling reports to json. 
Deserialization can then sometimes round the number differently than if the full precision was available. Fixes #10382.
* Fix plint error. (#10394) plint complain error in parser.py and test_vm.py just fix it.
* meta schedule misc update (#10389)
* Fix tvmc run error message when inputs aren't found. (#10017)
* [Runtime][PipelineExecutor] Polish the name and comments of variable. (#10395) Polish comments and variable name
* Enable groups argument for conv2d_transpose on the cudnn backend (#10396)
* wip
* reset conv2d_transpose topi conv_mode to 1
* fix for 'Error: identifier “hfabs” is undefined'
* address @masahi's comments in pytorch test_forward
Co-authored-by: Masahiro Masuda
* Fixed a bug in the convert_fully_connected() function (#10371) In case we need to change the output shape, need to convert the output_shape tuple to list before the change.
* [TensorIR] Renormalize split pattern (#10401)
* [MetaSchedule] Arithmetic analysis (#10403) This PR changes the normal form of the affine detector and supports a single var predicate. It also enhances ModularSet detector to enable floor mod patterns.
* Add @slow decorator to run tests on `main` (#10057)
* Add @slow decorator to run tests on `main` This adds the infrastructure discussed in https://discuss.tvm.apache.org/t/rfc-ci-skip-slow-tests-on-prs/11910, but without affecting any tests. As we investigate reasons behind [slow tests](https://gist.github.com/driazati/e009f09ff44c6bc91c4d95a8e17fd6f1) in CI, this decorator will allow us to move these to run only on `main` and not PRs after checking with all concerned parties.
* cleanup
Co-authored-by: driazati
* [microTVM] Zephyr: refactor _find_openocd_serial_port (#10346) Refactor _find_openocd_serial_port() as a generic USB serial port finder since other runners beyond openocd use it (e.g. jlink runner). Also instead of using redundant hardcoded values in BOARD_USB_FIND_KW dict, use idVendor and idProduct from boards.json.
And don't use 'usb' module to first find the serial number of the port and then pass it to 'serial' module to obtain the port path, instead search for the port path directly via 'serial' module using the serial number (if provided) or use idVendor and idProduct values taken from boards.json.
Signed-off-by: Gustavo Romero
* [microTVM][RVM] Skip USB device attach if device is already attached (#8737)
* [microTVM][RVM] Skip USB device attach if device is already attached Currently, when the VirtualBox provider is selected, if base-box-tool.py 'test' command is used and a VM is already running with the USB device necessary to perform the tests already attached to it the command fails because it tries to blindly attach again the USB device without checking if device is already attached. The failure can be reproduced by first running a VM for testing (the tests need to fail and leave the VM running):
$ ./base-box-tool.py --provider virtualbox test --microtvm-board=stm32f746g_disco
then one tries to re-run the tests without building the whole VM again:
$ ./base-box-tool.py --provider virtualbox test --skip-build zephyr --microtvm-board=stm32f746g_disco
This commit fixes that error by checking and properly skipping the USB device attach if it's already attached to the VirtualBox VM.
Signed-off-by: Gustavo Romero
* areusch review: Use --machinereadable for the output Use 'showvminfo --machinereadable' output to parse for more robustness to updates in VBoxManage.
* Realize the function op during forward rewrite (#10410)
* [ci][1/2] Shard `frontend: GPU` job into 2 jobs (#10413) This is the longest individual CI job by about an hour, meaning everything else is usually done and waiting on this job for a while before the entire build completes. This PR breaks it up into two roughly equal jobs (based on timings in https://ci.tlcpack.ai/job/tvm/job/main/2623/testReport/, both should take about 90 minutes).
If capacity is available, this means CI jobs could potentially take 1 hour less. If not available, besides an insignificant queueing delay this PR has no effect. This is a two part PR since the Jenkinsfile changes cannot be bundled in this PR, so they will need to be in a follow up. cc @areusch
Co-authored-by: driazati
* RelayViz Graphviz renderer (#10400) Following apache/tvm#10085, this PR adds a graphviz backend. It requires python `graphviz` package and `dot` executable in the PATH, similar to `tedd.py`. This implementation is much like a porting of `visualize` function in https://tvm.apache.org/2020/07/14/bert-pytorch-tvm, except that `node_attr_dict` is replaced with a callback `get_node_attr`. `get_node_attr` can be somehow used to emphasize a set of nodes. It might be useful if we encounter problems in inferences and want to find nodes with certain types and attributes. An example is provided in https://github.com/chiwwang/tvm/blob/graphviz_renderer_example/test_viz.py Its outputs are (conv2d with NCHW layout is green-colored):
https://github.com/chiwwang/tvm/blob/graphviz_renderer_example/mod_with_subgraph.pdf
https://github.com/chiwwang/tvm/blob/graphviz_renderer_example/mod_wo_subgraph.pdf
* [Runtime][ThreadPool]Refactor affinity function and support CPU affinity list setting. (#9802)
* [Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting. Issue: 1. There are multiple affinity function using "LINUX" and "ANDROID" macro check and the multiple check make the logic maintain and change become complex. 2. Current logic of tvm [Runtime][ThreadPool] assume all of the cpu resources are available for a single backend runtime to do the data flow computation.
But such assumption may not true when user running multiple task on the system and not want tvm task exhaust all of the cpu resource, or when user going to run multiple backend runtime of tvm on the system, each backend runtime of tvm should use different cpu affinity settings to achieve best performance. Solution: 1.Refactor the affinity functions to move the "LINUX" and "ANDROID" check into one function. 2.In this solution, we introduce a new "CPU AffinityMode type" named "kSpecify", by using "kSpecify" and the function named "tvm::runtime::threading ::Configure" user can specify the cpu list for the cpu affinity of a backend runtime. This solution reused the existing per thread thread pool logic of [Runtime][Threadpool] that created a worker thread pool for current thread which can running a particular runtime. for a multiple runtime use case, user can first launch multiple threads, then call "tvm::runtime::threading ::Configure" with cpu list to create tvm data flow worker thread pool, after doing this the execution of the multiple runtime on the multiple threads will use different cpu resource list.
* fix windows build issue.
* fix build issue.
* fix build issue.
* fix windows build issue.
* fix plint issue
* polish comments.
* address review comments.
* address reivew comments.
* address review comments.
* address review comments.
Co-authored-by: hua jiang
* [CI][1/2] Update the Python version of pyxir (#10406) Currently the CMake file for pyxir is looking for things in Python3.6, so it needs to be upgraded to use 3.7 now that we have moved to use 3.7. Otherwise the build fails when the docker images are updated since the 3.6 can't find the pyxir packages which have moved to 3.7. Additionally, there seems to be a problem with the newer version of setuptools installing the pyxir libraries, so reverting these versions to the previous versions as a workaraound.
Note that this has to be done in two patches for the changes to go through the current CI, this patch downgrades the pip and setuptools versions.
* Modify debug output (#10372) 1. Modify debug output to make it more readable 3. Replace magic number with a variable `error_ct_threshold` 3. Add function to set error counter threshold externally for debug purposes
* Fix relative include path (#10402)
* [ci][2/2] Shard `frontend: GPU` job into 2 jobs (#10414)
* [TensorIR] Update VerifyGPU (#10405)
* update VerifyGPU
* address comments
* [Bugfix][Arith] Fix TryFuseIter (#10427)
* Lily -> Committer (#10417)
* Add group_conv2d_transpose_nchw to CUDA backend (#10423)
* add group_conv2d_transpose_nchw to CUDA backend
* simplify significantly, just add groups argument to conv2d_transpose_nchw
* [MISC] Add miss Type2Str and remove compile warnings (#10430)
* [MISC] Add miss Type2Str and remove compile warnings
* fix lint
* [cleanup] Log compile errors for AOT tests (#10214)
* [cleanup] Log compile errors for AOT tests See #10213
* Update tests/python/relay/aot/aot_test_utils.py
* removed the encode of msg that is already str
Co-authored-by: lhutton1
Co-authored-by: driazati
Co-authored-by: Manupa Karunaratne
Co-authored-by: lhutton1
* [skip ci][CI][Fix] Fixing lint (#10445) A linting issue was introduced in #10423, fixing this up. Change-Id: I06c518194e30dcaa755005f06b8b7280c237d386
* [CMSIS-NN] enable USMP with CMSIS-NN (#10224) This commit mainly enables the USMP with CMSIS-NN codegen. In order to do that, CMSIS-NN functions needed to contain BufferMaps. This commit adds the necessary BufferMaps as well. All the tests are modified to run with USMP while the networks tests run with and without USMP.
* Fix plint complain for some files. (#10433)
* Fix a Uninitialized Variable Warnings. (#10436) There is a 'Uninitialized Variable' Warning in building process, just fix it.
* [Frontend][TFLite] Added broadcasting to prelu alpha. (#10435)
* Update prelu test cases
* Add broadcasting to prelu alpha
* [Relay] Fix shape func for strided slice (#10418)
* fix dyn strided slice
* add tests
* remove stuff
* jostle ci
* jostle ci
* jostle
* [skip-ci][COMMUNITY] leandron to PMC (#10448)
* [Hexagon] Allow execution on target or simulator from HexagonLauncher (#10454) Setting ANDROID_SERIAL_NUMBER=simulator will execute the tests on simulator instead of a hardware device. This patch also introduces an environment variable HEXAGON_RPC_LIB_DIR to specify the location of the hexagon_api binaries. If unset, the code will look for the binaries in the same way as before this patch.
* [microNPU][5] Convert Proposals to te.Schedules (#10062)
* [microNPU][5] Convert Proposals to te.Schedules Change-Id: I6771578f1007b8fea02e2dec7d0c797a6ef6aa5e
* Fixes Change-Id: Id062ca7793656be4e870ac48ba41a34aa83276d2
* Fix test Change-Id: Ib0fd55b99459c26425e1805df19d12367244e1b0
* hot fix (#10464)
* [ci] Add workflow to cc teams (#10322) As discussed in https://discuss.tvm.apache.org/t/rfc-remove-codeowners/12095/2?u=driazati, this adds a mechanism to auto-tag people based on PR/issue titles and labels. This should improve visibility across the project and make it easy for interested people to subscribe to various topics. Details on usage will be posted in the relevant issue: #10317
Co-authored-by: driazati
* just a typo fixed (#10442)
* minor typo fixed
* to trigger CI
* to trigger CI
* fixed formatting issues
* black formatted file
* [runtime] AOTExecutor implementation and c target code-generator (#10283)
* Add memory pools to Metadata classes.
* Move ShapeToJSON to utils.
* Track returned TensorType from AOTExecutorCodegen.
* Support calling Relay functions with Tuple.
* Expand supported TIR calling conventions to work with C++ runtime.
* Rename MetadataModule to ConstLoaderModule.
* Add runtime AOT executor module.
* Add AOT code-generation.
* Add a runtime Module to mux between .text Metadata and live Metadata.
* Move launch_param to namespace
* Add test of c++ AOT.
* Fix incongruity between kTvmRuntimeCrt constant
* Expand ExecutorCodegenMetadata to include AOT runtime metadata.
* commit cpp test
* Make Metadata compile under C.
* Ignore ephemeral metadata_module export_model_library_format.
* This module does not need to be exported, since it is merely a C++ wrapper around get_c_metadata, and get_metadata is not used in C.
* address manupa, kparszsyc, masahi comments.
* further address comments
* clang and python format
* Fix broken test
* Address lingering comments from masahi, kparszyzc
* [Runtime][ThreadPool] Handle the default value of affinity mode. (#10434)
* [Runtime][ThreadPool] Handle the default value of affinity mode and a corner case of function 'SetMaxConcurrency'. 1. Handle the default value of affinity mode. 2. After calling the function 'SetMaxConcurrency' with a non-zero value, if calling the function 'SetMaxConcurrency' again with a zero value , then the second setting can not correctly set the max_concurrency value into zero. use new logic to fix this issue.
* address review comments.
* polish the warning message.
* [Relay] Fix output dtype for conv2d wgrad when the original one is void (#10459)
* [Relay] Fix output dtype for conv2d wgrad when the original one is void
* fix cpplint
* also add out dtype information to dgrad
* also use out_dtype for wgrad
* remove redundant import
* [skip ci][ci] Remove -i from lint scripts (#10469) This was changed in #8509 to run without checking the file formatting, which would lead to pylint errors like we saw on `main` in apache/tvm@0c836b7.
Co-authored-by: driazati
* Modify Jenkinsfile to prevent builds from triggering on branch indexing (#10432)
Co-authored-by: Noah
* [skip ci][ci] Skip actions on forks (#10468)
* [ci] Use available CPUs in builds (#10359)
* [ci] Use sccache in builds
* trigger ci
* update
Co-authored-by: driazati
* [ci] Fix slow test script permissions (#10457) This is failing silently, e.g.: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-10359/4/pipeline cc @areusch
Co-authored-by: driazati
* [runtime][Hexagon] AOTExecutor implementation for C Codegen (#10311)
* Hexagon AOT tests work
* fix and address comments
* [microTVM] Zephyr: add B-U585I-IOT02A board support (#10416)
* [MetaSchedule] Fix Cyclic Dependency in PyClass Family (#10368) Following the design of module_pass, we developed a mechanism, a decorator named derived_obj, to systematically allow derivation from TVM objects in pure Python and being passed into any language, without cyclic dependency. This PR introduces the new mechanism to all PyClasses in meta schedule.
* [Hotfix] Black format (#10482)
* [MetaSchedule] Keep Task / Trial / Iter / Postproc Number Consistent in Log (#10478) This PR fixes some inconsistency in log printing and make sure all numbers start from zero for tasks, trials, iters and postprocs. I think it's better for debugging if any task or trail went wrong in the future.
* [Torch] fix torch version check (#10481) old code checkout "1.10.2" greater_than "1.5.0" if false, fix it
* [microNPU] Remove unused code from testing infra (#10462) Removing some legacy code from infra.py that is not called by anything.
* [MetaSchedule] Enable AutoTVM-style template-based search space (#10461)
* [MetaSchedule] Enable AutoTVM-style template-based search space
* Fix lint
* suppress mypy
* [MetaSchedule] update misc parts (#10444)
Co-authored-by: Junru Shao
* [Arith] Handle mod/floormod in modular set analysis (#10453)
* Correctly enable architecture extensions in CMSIS-NN Zephyr Demo (#10458)
* Correctly enable architecture extensions in CMSIS-NN Zephyr Demo Without `CONFIG_FPU` being set the correct architecture extensions weren't being applied which means the buffer sizes didn't necessarily match up - this corrects it so that they align.
* Fix memory allocation in demo The stack allocator forcibly aligns memory by removing parts of it which causes there not to be enough memory and the CMSIS-NN integration uses more stack than the demo with pure TVM operators (we should look to remove some of our stack usage)
Co-authored-by: Leo-arm
Co-authored-by: Masahiro Masuda
Co-authored-by: Andrew Reusch
Co-authored-by: Matthew Brookhart
Co-authored-by: Dmitriy Smirnov
Co-authored-by: Giuseppe Rossini
Co-authored-by: Alan MacDonald
Co-authored-by: Tristan Konolige
Co-authored-by: Mehrdad Hessar
Co-authored-by: Ruihang Lai
Co-authored-by: Christopher Sidebottom
Co-authored-by: Sevin F. Varoglu
Co-authored-by: Xiyou Zhou
Co-authored-by: Hongyi Jin
Co-authored-by: Manupa Karunaratne
Co-authored-by: Yaxing Cai
Co-authored-by: SebastianBoblestETAS
Co-authored-by: David Riazati
Co-authored-by: driazati
Co-authored-by: Ziheng Jiang
Co-authored-by: Jinkun Lin
Co-authored-by: Qiang Zhang
Co-authored-by: albert qing
Co-authored-by: sqing
Co-authored-by: Matthew Barrett
Co-authored-by: Hua Jiang
Co-authored-by: Krzysztof Parzyszek
Co-authored-by: Youlei Yang
Co-authored-by: Adam Straw
Co-authored-by: Lily Orth-Smith
Co-authored-by: Valery Chernov
Co-authored-by: Valery Chernov
Co-authored-by: wrongtest
Co-authored-by: Christian Convey
Co-authored-by: Junru Shao
Co-authored-by: Bohan Hou
Co-authored-by: Ruihang Lai
Co-authored-by: Hongyi Jin
Co-authored-by: Wuwei Lin
Co-authored-by: Siyuan Feng
Co-authored-by: Zihao Ye
Co-authored-by: Hans Brouwer
Co-authored-by: Ophir Frish
Co-authored-by: Bohan Hou
Co-authored-by: Gustavo Romero
Co-authored-by: chiwwang
Co-authored-by: hua jiang
Co-authored-by: Elen Kalda
Co-authored-by: Kirill Snezhko
Co-authored-by: Ben Greiner
Co-authored-by: Siyuan Feng
Co-authored-by: Haichen Shen
Co-authored-by: Cody Yu
Co-authored-by: lhutton1
Co-authored-by: blackkker
Co-authored-by: AndrewZhaoLuo
Co-authored-by: Tianqi Chen
Co-authored-by: Sebastian Boblest
Co-authored-by: Noah Kontur
Co-authored-by: Noah
Co-authored-by: Junru Shao
Co-authored-by: yogurfrul
Co-authored-by: Wuwei Lin
Co-authored-by: Christopher Sidebottom
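
The torch version check fix (#10481) in the log above notes that "1.10.2" greater-than "1.5.0" evaluated to false. A minimal sketch of why that happens with plain string comparison, and one way to avoid it, follows. The helper names `parse_version` and `is_version_greater_than` are hypothetical and are not taken from the TVM frontend; the actual fix in that PR may look different.

```python
from itertools import takewhile


def parse_version(version):
    """Turn a version string such as '1.10.2+cu113' into a tuple of ints.

    Hypothetical helper for illustration only; not the TVM implementation.
    """
    core = version.split("+")[0]  # drop a local build suffix like '+cu113'
    numbers = []
    for piece in core.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at a non-numeric component such as 'dev2022'
        numbers.append(int(digits))
    return tuple(numbers)


def is_version_greater_than(current, minimum):
    """Compare versions component by component, numerically."""
    return parse_version(current) > parse_version(minimum)


# String comparison is lexicographic: '1' sorts before '5' at the third
# character, so "1.10.2" is treated as smaller than "1.5.0".
naive = "1.10.2" > "1.5.0"
fixed = is_version_greater_than("1.10.2", "1.5.0")
print(naive, fixed)
```

Comparing integer tuples keeps the check free of extra dependencies; a project that already depends on the `packaging` library could use its version parser instead.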

@areusch

* Merged PR 2: Merge latest commits
* Create directx_build.yml (#2)
* Create directx_build.yml
* Create track_tvm_github.yml
* Update README.md
* Merge with main branch from TVM official repo (#3)
* [ETHOSN] Remove the compiler library from the runtime link (#10334) Due to some restructuring of the Ethos(TM)-N driver library it is no longer necessary to link the compiler library (AKA Support library) into the runtime.
* [Hexagon] Export `ir_lower_vtcm_pass` function in the init file (#10330)
* [runtime] Add Metadata classes for AOTExecutor (#10282)
* Add new Metadata classes and base implementation.
* These were autogenerated in the original PR, but checking them in as plain code until we can revisit the auto-generator approach.
* address masa comments
* Add documentation per Manupa's comments, and move kMetadataVersion namespace.
* remove get_name function, used for debugging
* clang-format
* [ONNX] only broadcast matmul if the shape has changed (#10321)
* [ONNX] only broadcast matmul if the shape has changed
* fix copy-pasta mistake
* [TIR] Tir constants integration into compilation pipeline (#8509)
* [TIR] Introduce tir.allocate_const to TIR This PR is adding non-scalar constant representation in TIR. This is used to express constants (i.e., parameters) in the TIR instead of bypassing the TIR as it's done until now. Change-Id: Id3afc4d7197260cb43ecde60f05ccbce3fc42430
Co-authored-by: Giuseppe Rossini
Change-Id: Id4a09a637c9c1fd7d49989c6c10f474a78569e18
* [TIR] Integrate tir constant nodes in compilation pipeline This PR integrates tir.allocate_const to the compilation pipeline to support --link-params.
Change-Id: Ic8d0cb75d596299fcae7078b304598afbf0c5494
Co-authored-by: Giuseppe Rossini
Change-Id: Id98cc682bbfacfe75c4d8b260fd41658f1f196b2
* [TIR] tir.const extraction This commit tries to implement an amendment to tir.constant RFC with centralized storage of constant data within the IRModule Please note that data and irmod_storage_idx are not mutual exclisive further more the irmod_storage_idx is valid only immediatly after prim func addition to the mod or after update within the mod. If prim func is out of the the module scope then the index become meangless. irmod_storage_idx also is not used in calculation of hash function of the tir.constant node. Change-Id: I40742ed580468b0252ea3fec02184cba65e20871
* unit test fixed Change-Id: Ied2186554d4cbad44b2346216c8be92449e55732
* cmsis-nn codegen fix Now handled case when params of the functions came as constants Change-Id: I5874e182e34ef94e23048eaf3c61b01a56d91131
* Fixes for unittests Change-Id: I5b82ee3f80337155706b5470973f494a301b5d90
* Rebasing tests fixes Change-Id: I94ac87907081bab53c1dd1ab2db106ae057b4b19
* Linter: added method param description Change-Id: I2f8c4c8d244b74c794abaa6079c46cc593ffcbdb
* Printing removal fix This patch removes forgotten print in fuse_ops Change-Id: I4bb5934f3b4cd5fde19d36a8e3319aae136bce8a
* Bugfix Fixed concurrent map update bug here Change-Id: Ifec3bf5030086d9079b9e493096f17dfd82297ec
* Reworked logic for not to introduce empty constant list to modue attrs Change-Id: I082c85b3b4b70c218f0d714f5613ef6e178bd020
* Added support for tir builtin::tvm_access_ptr This fixed unit tests for tests/python/integration/test_arm_mprofile_dsp.py Change-Id: I10919f301ef9ddc3fd87f0e1a8414e9a52fc7938
* Unit test fix Fixes unit tests in torch frontend Change-Id: I6c179834f93dd202605d1ce5a7f07d987b9dc469
* Addressed requested changes Addressed changes requested upstream Change-Id: I741e52b89eb285732c23b1ac7ff277e757a088c3
* Namespace usage changed to conform earlier C++ standard
Change-Id: I1b29238cfe2a6bedb525f4f823a3a540f631d836
* Bugfix Change-Id: I57a44b714b307278a243817ec2864e53ad31366b
* updated IRModuleNode::ExtractPrimFuncConstants Updated IRModuleNode::ExtractPrimFuncConstants as per request upstream. Change-Id: I35db0145fb5827efd0445ce665d0c99465274016
* Minor changes typo fixd renamed ExtractPrimFuncConstants to ExtractConstants removed getters/setters from FuseMutator and added parametrized constructor Change-Id: Ib2326805781779b88c963a8642ff683c8755956e
* Moved LinkedParam/LinkedParamNode Moved LinkedParam/LinkedParamNode from tvm::tir namespace to tvm namespace Change-Id: Ie3f0303bd4f7890c6d680268c91f2051977bc7f4
* Addressed upstream comments Changed BindParams argument to Array<NDArray> Removed 'name' argument from te.const Switched to in-depth comparision of NDArrays in constant de-duplication Removed extra final comma from NDArrayToTIR Changed return type of ConstantAllocationSize to int64_t Made link_param a tvm.testing.parameter for test_fuse_take and test_fuse_gather_nd Change-Id: I4285099cc63756aa5ebe91a5bd207d4135499b41
* Removed unnecessary forward declaration +linter Change-Id: I2a6c0d1f97773aeb1ae3f458da252a22079ccdb1
* Constant extractor now is a separate pass Change-Id: Ia4adca9d3315b26fbdc006ef7c115900c081e303
* Added forgotten file + unit test fix Change-Id: Ice305f4fefd13fe95e97574e6d63ffeb664621df
* Changed to IRModule pass Refactored ExtractPrimFuncConstants to IRModule pass.
deDup -> DeDup Refactored logic of Applicator supplementary class Change-Id: I6c120d175eb6790ba90f176c4f856bde8f0c7c94
* bugfix after rebasing Change-Id: Ie3ee6ea2479476a30f486baef74f20070f117942
* -v -> -vv to have more debug information Change-Id: I12c63731663b9c9ea574b9ed5cb17311ba3cf701
Co-authored-by: Giuseppe Rossini
* Simple workaround for PyTorch symbol crash problem in meta schedule test (#10342)
* Simple workaround for PyTorch symbol crash problem in meta schedule test
* workaround for CI
* add reading of nRF5340 DK product ID to determine which COM port to use (#10304)
* [ARM_CPU] Conv2d int8 intrinsic for cortex-A72 (#10310)
* [ARM_CPU] Conv2d int8 intrinsic for cortex-A72 Add an intrinsic that performs a dot product of 8 4-element vectors at once. Also conditionally inline fused operators into the main convolution loop depending on convolutions size. Small convolution = no inlining. Performance improves by ~20% on mobilenet on raspberry pi 4 and ~30% improvement on performance for the individual convolutions.
* ignore incorrect lints
* fixup fstring
* revert changes to conv2d_NCHWc (not int8)
* remove error check, apparently tests rely on it
* refactor alter op layout
* [CI][Hexagon] Add Hexagon Tests to pipeline (#10302)
* Add hexagon tests to CI Hexagon
* Fix CRT libs
* cleanup and fix Jenkins
* Address @areusch comments
* [TIR] Misc minor updates (#10335)
* [CUBLAS] Fix cublas batch matmul strategy plevel (#10351)
* [CI] Re-introduce redirect follow and update hash for Boost download (#10343) Looks like we did need the redirect in (#10247), otherwise you get a blank redirect response and `tar` doesn't like that very much:
```
tar: This does not look like a tar archive
gzip: stdin: unexpected end of file
```
* Add per channel quantization to QLinearConv and fix related bugs (#10354)
* [CI] Fix Flaky Test `test_task_scheduler_gradient` (#10360)
* [CI] Fix Flaky Test `test_task_scheduler_gradient` A change to fix the issue of flaky test mentioned in #10356 by increase the `chain_rule` factor and avoid small gradient.
* Retrigger CI.
* [TOPI] VNNI support for batch matmul (#10332)
* add test
* compute added
* schedule works
* reuse dense_vnni schedule
* try an alternative approach to scheduling layout transform
* introduce a tunable knob to decide if compute_root
* check transpose condition
* support s8 + s8 input
* pylint
* [TIR] TIR Schedule Misc Update (#10341)
* tir schedule misc update
* Trigger Build
* [AOT] BugFix of workspace calculation (#10337) Following an investigation from #10022, it turns out, currently the workspace calculation assumes there would be a single lowered PrimFunc could be produced per primitive Relay Function. However, the exception turned out to be the CMSIS-NN codegen that produces multiple calls/PrimFuncs in the place of a single call to single relay PrimFunc. This commit adds changes to workspace calculation to be done on lowered IRModule.
Additionally, changes the test utils to not to generate any stack allocator code when USMP is used to make the tests more strict. This change also removes the confusing "run_model" which has semantics identitical to "__tvm_main__" in TIR.
* [runtime] Improved log information with function signature (#10326) This PR introduces a function signature printer in the `TypedPackedFunc` part, so that the log information in `detail::unpack_call` will be more complete. This PR allows users to obatin the original function signature when the `detail::unpack_call` fails.
* refactored GraphProto.from_onnx into smaller functions (#10267)
* refactored GraphProto.from_onnx into smaller functions
* black formatted file
* removed line that does not seem to make sense. Is there a purpose that I missed?
* just to trigger CI pipeline
* [skip ci] Fix onnx frontend lint (#10363) This was broken in #10267, not sure how that commit passed CI (maybe some logic to figure out the PR diff in pylint is broken).
Co-authored-by: driazati
* [COMMUNITY] csullivan -> Committer (#10364)
* [BUGFIX][ARITH] Fix FloorMod Simplifier (#10336)
* fix canonical simplifier
* improve comments
* [Lint] Fix Pylint Issues (#10358)
* [TIR][Transform] relax LoopPartition restriction that the intersection of all conditions can not be none. (#10340)
Co-authored-by: sqing
* [ETHOSN] Improved identification of driver library version (#10285)
* [ETHOSN] Stricter data type conversion checks (#10271) The 21.11 update for the Ethos(TM)-N driver is slightly more strict in accepting various operator attributes.
* [microNPU][4] Add the cascader Proposal generator (#9959)
* [microNPU][4] Add the cascader Proposal generator The Proposal generator takes optimal Plans and combines them to find optimal 'Proposals' - sets of disjoint Plans that cover every Part in a CascaderGraph. It ultimately produces a Pareto-frontier of 'optimal' Proposals in terms of estimated cycles and memory usage.
Change-Id: Id42099819a596496a5769bae22f08eeb75ec69b6
* Fixes Change-Id: I4f5f2a298bd3bb379c7c8d179150358923b0dd66
* [Runtime][Pipeline Executor] multiple threads management and the data forwarding notification mechanism. (#10234)
* [Runtime][Pipeline Executor] multiple threads management and the data forwarding notification mechanism. In this patch we create working threads for each runtime of pipeline. the threads would be terminated once the runtime class gets destroyed. We also add a notification mechanism derived from the 'binding configuration' of the runtime to forward the data notification.
* address review comments.
* address review comments.
* fix typo.
* fix typo.
* trigger build.
* address review comments.
* address review comments.
* address review comments.
* address review comments.
* [Hexagon] RPC server/client for simulator (#10361) This is the C++ code for running Hexagon code on simulator via the RPC mechanism. It is intended to be integrated into the current HexagonLauncher, although the integration will require further changes to the launcher python code. The final goal is to be able to run the same file.py on either hardware or simulator without needing to edit the python file, but simply by changing the configuration of the execution platform (i.e. something like --exectute-on=simulator as a command line or in an environment variable). The exact details are still to be determined.
* [TIR, Relay] improve bfloat16 support (#10112)
* update AMP table to enable ResNet50 conversion
* add runtime datatype dispatch for BFloat16
* skip asserts for uint16 for bf16 compatibility
* add bf16 cast for the unary intrinsic operators
* enable "bf16<-->fp32<-->any dtype" casting
* support inconsistent input for bf16 BIOP legalize
* add treatments for bfloat16 in if statements
* add bfloat16 dtype casts in binary OP
* delete unnecessary treatments for bfloat16
* add test for bfloat16 building
* code style
* restore the modifications in .gitignore
* restore the changes to AMP lists
* fix typos
* fix lint errors
* fix typo
* [ci] Check more events before pinging reviewers (#10208)
* [ci] Check more events before pinging reviewers This was missing some events before (reviews without comments, PR updated from a draft -> ready for review) so these were being ignored when finding the latest event. This PR adds them and restructures the code a bit to make it more clear what is happening for each PR. This addresses some of the issues from #9983
* fix tests
Co-authored-by: driazati
* Lower cache_read and cache_write to Hexagon DMA via tensorize (#10365)
* Lower cache_read and cache_write to Hexagon DMA via tensorize
* rework test to be compatible with launcher
* remove cpu device api mem_copy implementation and test
* [microNPU] adding more tests with USMP (#10362) Adding a few tests to confirm memory usage with and without USMP.
- Supporting the toggle to disable storage_rewrite.
- There is a slight change to tir_to_cs_translator to add index of Load nodes associated with NpuAddressRange objects * [RELAY] [VIRTUALDEVICE] Change syntax for device planning and store parameter virtual devices in virtual_device_ field (#10352) * parent 33082e0 author electriclilies <[email protected]> 1643141097 -0800 committer Lily Orth-Smith <[email protected]> 1645560059 -0800 Store function param virtual devices in virtual_device_ field Fix test_annotation.py and change result_virtual_device to virtual_device * Change plan devices tests to use the new syntax for function parameters * Fix free var problem * Fix attribute parsing if there is virtual device; most device planning tests passgit status * fixed lambda lifting * Debugging high order functions -- right now FunctionOnDevice and Bind are mutually recursive. This needs to not be the case. * tests pass wootgit status * Remove FunctionOnDevice from device planner * Don't use MaybeFunctionOnDevice in VM compiler * Remove MaybeFunctionOnDevice from lambda lifter * Delete FunctionOnDevice and MaybeFunctionOnDevice! * Reomve GetFunctionResultVirtualDevice * Remove GetFunctionParamVirtualDevice * lint * lint * Python formatting * Remove FunctionOnDevice python test * Fix bug in binds & debug output * Fix text printer * lint * Remove function on device from fold constant tests * Mark nits * Revert behavior of bind * clean up debug * Make ExprBinder public interface and use instead of Bind * Fix lambda lift * This is broken but not sure how to fix * passes all device planning tests yay! * Add substitution helper and use in device planner * Remove unnecessary check * Respond to comments * Update comment * [VirtualMachine] new method allowing to set one input tensor by its index or name (#10293) * set_input_with_index was implemented for VM * clean code * add getInputIndexFromName. add function descriptions. 
lint fix * fix lint * transfer comparison of parameter names number and assigned devices number to VMFunction constructor * add GetVMFunctionWithName to Executable API * clean code * add SetInputWithName (set_input_with_name) to VM API * join SetInputWithIndex and SetInputWithName to SetOneInputTensor (set_one_input) to VM API, the joined methods were removed * fix lint * some fixes after review * add set_one_input method to python API of VirtualMachine * pytests for set_input and set_one_input methods of VirtualMachine were implemented and checked * CI restart * construct simple model for pytests by relay instead of onnx tools (need for correct CI) Co-authored-by: Valery Chernov <[email protected]> * [Hexagon] Replace strlen in constant initialization with sizeof (#10381) Strlen is not constexpr everywhere, so replace it with sizeof. In C++ sizeof("string") works fine, since "string" has type "const char [...]". * check to avoid crash in opt_level=0 vm build (#10347) * [DOCS] Add how to contribute TVM docs with images. (#10287) * [MetaSchedule] Update Tuning Interfaces. (#10367) This PR is further improvement of the meta schedule project (apache/tvm#8473). Co-authored-by: Junru Shao <<[email protected]>> Co-authored-by: Bohan Hou <<[email protected]>> Co-authored-by: Ruihang Lai <<[email protected]>> Co-authored-by: Hongyi Jin <<[email protected]>> Co-authored-by: Wuwei Lin <<[email protected]>> Co-authored-by: Siyuan Feng <<[email protected]>> * [Bugfix][TVMScript] Convert BufferSlice to BufferLoad when used as range/loop start and end (#10370) A quick fix of the parser issue mentioned in #10327 . Ranges and loops require `start` and `stop` to be PrimExpr, however, `BufferSlice` is not always scalar so it's not a `PrimExpr`. This PR performs the transformation. * [FIX,PROFILING] Add extra precision to numbers when serializing to json (#10392) Numbers were serialized with too little precision when serializing profiling reports to json. 
Deserialization can then sometimes round the number differently than if the full precision was available. Fixes #10382. * Fix plint error. (#10394) plint complain error in parser.py and test_vm.py just fix it. * meta schedule misc update (#10389) * Fix tvmc run error message when inputs aren't found. (#10017) * [Runtime][PipelineExecutor] Polish the name and comments of variable. (#10395) Polish comments and variable name * Enable groups argument for conv2d_transpose on the cudnn backend (#10396) * wip * reset conv2d_transpose topi conv_mode to 1 * fix for 'Error: identifier “hfabs” is undefined' * address @masahi's comments in pytorch test_forward Co-authored-by: Masahiro Masuda <[email protected]> * Fixed a bug in the convert_fully_connected() function (#10371) In case we need to change the output shape, need to convert the output_shape tuple to list before the change. * [TensorIR] Renormalize split pattern (#10401) * [MetaSchedule] Arithmetic analysis (#10403) This PR changes the normal form of the affine detector and supports a single var predicate. It also enhances ModularSet detector to enable floor mod patterns. * Add @slow decorator to run tests on `main` (#10057) * Add @slow decorator to run tests on `main` This adds the infrastructure discussed in https://discuss.tvm.apache.org/t/rfc-ci-skip-slow-tests-on-prs/11910, but without affecting any tests. As we investigate reasons behind [slow tests](https://gist.github.com/driazati/e009f09ff44c6bc91c4d95a8e17fd6f1) in CI, this decorator will allow us to move these to run only on `main` and not PRs after checking with all concerned parties. * cleanup Co-authored-by: driazati <[email protected]> * [microTVM] Zephyr: refactor _find_openocd_serial_port (#10346) Refactor _find_openocd_serial_port() as a generic USB serial port finder since other runners beyond openocd use it (e.g. jlink runner). Also instead of using redundant hardcoded values in BOARD_USB_FIND_KW dict, use idVendor and idProduct from boards.json. 
And don't use 'usb' module to first find the serial number of the port and then pass it to 'serial' module to obtain the port path, instead search for the port path directly via 'serial' module using the serial number (if provided) or use idVendor and idProduct values taken from boards.json. Signed-off-by: Gustavo Romero <[email protected]> * [microTVM][RVM] Skip USB device attach if device is already attached (#8737) * [microTVM][RVM] Skip USB device attach if device is already attached Currently, when the VirtualBox provider is selected, if base-box-tool.py 'test' command is used and a VM is already running with the USB device necessary to perform the tests already attached to it the command fails because it tries to blindly attach again the USB device without checking if device is already attached. The failure can be reproduced by first running a VM for testing (the tests need to fail and leave the VM running): $ ./base-box-tool.py --provider virtualbox test --microtvm-board=stm32f746g_disco then one tries to re-run the tests without building the whole VM again: $ ./base-box-tool.py --provider virtualbox test --skip-build zephyr --microtvm-board=stm32f746g_disco This commit fixes that error by checking and properly skipping the USB device attach if it's already attached to the VirtualBox VM. Signed-off-by: Gustavo Romero <[email protected]> * areusch review: Use --machinereadable for the output Use 'showvminfo --machinereadable' output to parse for more robustness to updates in VBoxManage. * Realize the function op during forward rewrite (#10410) * [ci][1/2] Shard `frontend: GPU` job into 2 jobs (#10413) This is the longest individual CI job by about an hour, meaning everything else is usually done and waiting on this job for a while before the entire build completes. This PR breaks it up into two roughly equal jobs (based on timings in https://ci.tlcpack.ai/job/tvm/job/main/2623/testReport/, both should take about 90 minutes). 
If capacity is available, this means CI jobs could potentially take 1 hour less. If not available, besides an insignificant queueing delay this PR has no effect. This is a two part PR since the Jenkinsfile changes cannot be bundled in this PR, so they will need to be in a follow up. cc @areusch Co-authored-by: driazati <[email protected]> * RelayViz Graphviz renderer (#10400) Following apache/tvm#10085, this PR adds a graphviz backend. It requires python `graphviz` package and `dot` executable in the PATH, similar to `tedd.py`. This implementation is much like a porting of `visualize` function in https://tvm.apache.org/2020/07/14/bert-pytorch-tvm, except that `node_attr_dict` is replaced with a callback `get_node_attr`. `get_node_attr` can be somehow used to emphasize a set of nodes. It might be useful if we encounter problems in inferences and want to find nodes with certain types and attributes. An example is provided in https://github.com/chiwwang/tvm/blob/graphviz_renderer_example/test_viz.py Its outputs are (conv2d with NCHW layout is green-colored): https://github.com/chiwwang/tvm/blob/graphviz_renderer_example/mod_with_subgraph.pdf https://github.com/chiwwang/tvm/blob/graphviz_renderer_example/mod_wo_subgraph.pdf * [Runtime][ThreadPool]Refactor affinity function and support CPU affinity list setting. (#9802) * [Runtime][ThreadPool] Refactor affinity function and support CPU affinity list setting. Issue: 1. There are multiple affinity function using "LINUX" and "ANDROID" macro check and the multiple check make the logic maintain and change become complex. 2. Current logic of tvm [Runtime][ThreadPool] assume all of the cpu resources are available for a single backend runtime to do the data flow computation. 
But such assumption may not true when user running multiple task on the system and not want tvm task exhaust all of the cpu resource, or when user going to run multiple backend runtime of tvm on the system, each backend runtime of tvm should use different cpu affinity settings to achieve best performance. Solution: 1.Refactor the affinity functions to move the "LINUX" and "ANDROID" check into one function. 2.In this solution, we introduce a new "CPU AffinityMode type" named "kSpecify", by using "kSpecify" and the function named "tvm::runtime::threading ::Configure" user can specify the cpu list for the cpu affinity of a backend runtime. This solution reused the existing per thread thread pool logic of [Runtime][Threadpool] that created a worker thread pool for current thread which can running a particular runtime. for a multiple runtime use case, user can first launch multiple threads, then call "tvm::runtime::threading ::Configure" with cpu list to create tvm data flow worker thread pool, after doing this the execution of the multiple runtime on the multiple threads will use different cpu resource list. * fix windows build issue. * fix build issue. * fix build issue. * fix windows build issue. * fix plint issue * polish comments. * address review comments. * address reivew comments. * address review comments. * address review comments. Co-authored-by: hua jiang <[email protected]> * [CI][1/2] Update the Python version of pyxir (#10406) Currently the CMake file for pyxir is looking for things in Python3.6, so it needs to be upgraded to use 3.7 now that we have moved to use 3.7. Otherwise the build fails when the docker images are updated since the 3.6 can't find the pyxir packages which have moved to 3.7. Additionally, there seems to be a problem with the newer version of setuptools installing the pyxir libraries, so reverting these versions to the previous versions as a workaraound. 
Note that this has to be done in two patches for the changes to go through the current CI, this patch downgrades the pip and setuptools versions. * Modify debug output (#10372) 1. Modify debug output to make it more readable 3. Replace magic number with a variable `error_ct_threshold` 3. Add function to set error counter threshold externally for debug purposes * Fix relative include path (#10402) * [ci][2/2] Shard `frontend: GPU` job into 2 jobs (#10414) * [TensorIR] Update VerifyGPU (#10405) * update VerifyGPU * address comments * [Bugfix][Arith] Fix TryFuseIter (#10427) * Lily -> Committer (#10417) * Add group_conv2d_transpose_nchw to CUDA backend (#10423) * add group_conv2d_transpose_nchw to CUDA backend * simplify significantly, just add groups argument to conv2d_transpose_nchw * [MISC] Add miss Type2Str and remove compile warnings (#10430) * [MISC] Add miss Type2Str and remove compile warnings * fix lint * [cleanup] Log compile errors for AOT tests (#10214) * [cleanup] Log compile errors for AOT tests See #10213 * Update tests/python/relay/aot/aot_test_utils.py * removed the encode of msg that is already str Co-authored-by: lhutton1 <[email protected]> Co-authored-by: driazati <[email protected]> Co-authored-by: Manupa Karunaratne <[email protected]> Co-authored-by: lhutton1 <[email protected]> * [skip ci][CI][Fix] Fixing lint (#10445) A linting issue was introduced in #10423, fixing this up. Change-Id: I06c518194e30dcaa755005f06b8b7280c237d386 * [CMSIS-NN] enable USMP with CMSIS-NN (#10224) This commit mainly enables the USMP with CMSIS-NN codegen. In order to do that, CMSIS-NN functions needed to contain BufferMaps. This commit adds the necessary BufferMaps as well. All the tests are modified to run with USMP while the networks tests run with and without USMP. * Fix plint complain for some files. (#10433) * Fix a Uninitialized Variable Warnings. (#10436) There is a 'Uninitialized Variable' Warning in building process, just fix it. 
* [Frontend][TFLite] Added broadcasting to prelu alpha. (#10435) * Update prelu test cases * Add broadcasting to prelu alpha * [Relay] Fix shape func for strided slice (#10418) * fix dyn strided slice * add tests * remove stuff * jostle ci * jostle ci * jostle * [skip-ci][COMMUNITY] leandron to PMC (#10448) * [Hexagon] Allow execution on target or simulator from HexagonLauncher (#10454) Setting ANDROID_SERIAL_NUMBER=simulator will execute the tests on simulator instead of a hardware device. This patch also introduces an environment variable HEXAGON_RPC_LIB_DIR to specify the location of the hexagon_api binaries. If unset, the code will look for the binaries in the same way as before this patch. * [microNPU][5] Convert Proposals to te.Schedules (#10062) * [microNPU][5] Convert Proposals to te.Schedules Change-Id: I6771578f1007b8fea02e2dec7d0c797a6ef6aa5e * Fixes Change-Id: Id062ca7793656be4e870ac48ba41a34aa83276d2 * Fix test Change-Id: Ib0fd55b99459c26425e1805df19d12367244e1b0 * hot fix (#10464) * [ci] Add workflow to cc teams (#10322) As discussed in https://discuss.tvm.apache.org/t/rfc-remove-codeowners/12095/2?u=driazati, this adds a mechanism to auto-tag people based on PR/issue titles and labels. This should improve visibility across the project and make it easy for interested people to subscribe to various topics. Details on usage will be posted in the relevant issue: #10317 Co-authored-by: driazati <[email protected]> * just a typo fixed (#10442) * minor typo fixed * to trigger CI * to trigger CI * fixed formatting issues * black formatted file * [runtime] AOTExecutor implementation and c target code-generator (#10283) * Add memory pools to Metadata classes. * Move ShapeToJSON to utils. * Track returned TensorType from AOTExecutorCodegen. * Support calling Relay functions with Tuple. * Expand supported TIR calling conventions to work with C++ runtime. * Rename MetadataModule to ConstLoaderModule. * Add runtime AOT executor module. * Add AOT code-generation. 
* Add a runtime Module to mux between .text Metadata and live Metadata. * Move launch_param to namespace * Add test of c++ AOT. * Fix incongruity between kTvmRuntimeCrt constant * Expand ExecutorCodegenMetadata to include AOT runtime metadata. * commit cpp test * Make Metadata compile under C. * Ignore ephemeral metadata_module export_model_library_format. * This module does not need to be exported, since it is merely a C++ wrapper around get_c_metadata, and get_metadata is not used in C. * address manupa, kparszsyc, masahi comments. * further address comments * clang and python format * Fix broken test * Address lingering comments from masahi, kparszyzc * [Runtime][ThreadPool] Handle the default value of affinity mode. (#10434) * [Runtime][ThreadPool] Handle the default value of affinity mode and a corner case of function 'SetMaxConcurrency'. 1. Handle the default value of affinity mode. 2. After calling the function 'SetMaxConcurrency' with a non-zero value, if calling the function 'SetMaxConcurrency' again with a zero value , then the second setting can not correctly set the max_concurrency value into zero. use new logic to fix this issue. * address review comments. * polish the warning message. * [Relay] Fix output dtype for conv2d wgrad when the original one is void (#10459) * [Relay] Fix output dtype for conv2d wgrad when the original one is void * fix cpplint * also add out dtype information to dgrad * also use out_dtype for wgrad * remove redundant import * [skip ci][ci] Remove -i from lint scripts (#10469) This was changed in #8509 to run without checking the file formatting, which would lead to pylint errors like we saw on `main` in apache/tvm@0c836b7. 
Co-authored-by: driazati <[email protected]> * Modify Jenkinsfile to prevent builds from triggering on branch indexing (#10432) Co-authored-by: Noah <[email protected]> * [skip ci][ci] Skip actions on forks (#10468) * [ci] Use available CPUs in builds (#10359) * [ci] Use sccache in builds * trigger ci * update Co-authored-by: driazati <[email protected]> * [ci] Fix slow test script permissions (#10457) This is failing silently, e.g.: https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-10359/4/pipeline cc @areusch Co-authored-by: driazati <[email protected]> * [runtime][Hexagon] AOTExecutor implementation for C Codegen (#10311) * Hexagon AOT tests work * fix and address comments * [microTVM] Zephyr: add B-U585I-IOT02A board support (#10416) * [MetaSchedule] Fix Cyclic Dependency in PyClass Family (#10368) Following the design of module_pass, we developed a mechanism, a decorator named derived_obj, to systematically allow derivation from TVM objects in pure Python and being passed into any language, without cyclic dependency. This PR introduces the new mechanism to all PyClasses in meta schedule. * [Hotfix] Black format (#10482) * [MetaSchedule] Keep Task / Trial / Iter / Postproc Number Consistent in Log (#10478) This PR fixes some inconsistency in log printing and make sure all numbers start from zero for tasks, trials, iters and postprocs. I think it's better for debugging if any task or trail went wrong in the future. * [Torch] fix torch version check (#10481) old code checkout "1.10.2" greater_than "1.5.0" if false, fix it * [microNPU] Remove unused code from testing infra (#10462) Removing some legacy code from infra.py that is not called by anything. 
* [MetaSchedule] Enable AutoTVM-style template-based search space (#10461) * [MetaSchedule] Enable AutoTVM-style template-based search space * Fix lint * suppress mypy * [MetaSchedule] update misc parts (#10444) Co-authored-by: Junru Shao <[email protected]> * [Arith] Handle mod/floormod in modular set analysis (#10453) * Correctly enable architecture extensions in CMSIS-NN Zephyr Demo (#10458) * Correctly enable architecture extensions in CMSIS-NN Zephyr Demo Without `CONFIG_FPU` being set the correct architecture extensions weren't being applied which means the buffer sizes didn't necessarily match up - this corrects it so that they align. * Fix memory allocation in demo The stack allocator forcibly aligns memory by removing parts of it which causes there not to be enough memory and the CMSIS-NN integration uses more stack than the demo with pure TVM operators (we should look to remove some of our stack usage) Co-authored-by: Leo-arm <[email protected]> Co-authored-by: Masahiro Masuda <[email protected]> Co-authored-by: Andrew Reusch <[email protected]> Co-authored-by: Matthew Brookhart <[email protected]> Co-authored-by: Dmitriy Smirnov <[email protected]> Co-authored-by: Giuseppe Rossini <[email protected]> Co-authored-by: Alan MacDonald <[email protected]> Co-authored-by: Tristan Konolige <[email protected]> Co-authored-by: Mehrdad Hessar <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Christopher Sidebottom <[email protected]> Co-authored-by: Sevin F. 
Varoglu <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Manupa Karunaratne <[email protected]> Co-authored-by: Yaxing Cai <[email protected]> Co-authored-by: SebastianBoblestETAS <[email protected]> Co-authored-by: David Riazati <[email protected]> Co-authored-by: driazati <[email protected]> Co-authored-by: Ziheng Jiang <[email protected]> Co-authored-by: Jinkun Lin <[email protected]> Co-authored-by: Qiang Zhang <[email protected]> Co-authored-by: albert qing <[email protected]> Co-authored-by: sqing <[email protected]> Co-authored-by: Matthew Barrett <[email protected]> Co-authored-by: Hua Jiang <[email protected]> Co-authored-by: Krzysztof Parzyszek <[email protected]> Co-authored-by: Youlei Yang <[email protected]> Co-authored-by: Adam Straw <[email protected]> Co-authored-by: Lily Orth-Smith <[email protected]> Co-authored-by: Valery Chernov <[email protected]> Co-authored-by: Valery Chernov <[email protected]> Co-authored-by: wrongtest <[email protected]> Co-authored-by: Christian Convey <[email protected]> Co-authored-by: Junru Shao <<[email protected]>> Co-authored-by: Bohan Hou <<[email protected]>> Co-authored-by: Ruihang Lai <<[email protected]>> Co-authored-by: Hongyi Jin <<[email protected]>> Co-authored-by: Wuwei Lin <<[email protected]>> Co-authored-by: Siyuan Feng <<[email protected]>> Co-authored-by: Zihao Ye <[email protected]> Co-authored-by: Hans Brouwer <[email protected]> Co-authored-by: Ophir Frish <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Gustavo Romero <[email protected]> Co-authored-by: chiwwang <[email protected]> Co-authored-by: hua jiang <[email protected]> Co-authored-by: Elen Kalda <[email protected]> Co-authored-by: Kirill Snezhko <[email protected]> Co-authored-by: Ben Greiner <[email protected]> Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Haichen Shen <[email protected]> Co-authored-by: Cody 
Yu <[email protected]> Co-authored-by: lhutton1 <[email protected]> Co-authored-by: blackkker <[email protected]> Co-authored-by: AndrewZhaoLuo <[email protected]> Co-authored-by: Tianqi Chen <[email protected]> Co-authored-by: Sebastian Boblest <[email protected]> Co-authored-by: Noah Kontur <[email protected]> Co-authored-by: Noah <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: yogurfrul <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Christopher Sidebottom <[email protected]>
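
One entry in the commit log above, "[Torch] fix torch version check (#10481)", says the old check evaluated "1.10.2" greater-than "1.5.0" as false. The log does not show the code, but that is the result a plain string comparison gives, because strings compare character by character. The sketch below reproduces that pitfall and shows one common remedy: comparing numeric tuples. The helper names `version_tuple` and `is_version_greater_than` are hypothetical and are not taken from the TVM source, which may solve this differently.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric release of a version string.

    Hypothetical helper for illustration only (not TVM code).
    '1.10.2+cu113' becomes (1, 10, 2); suffixes such as '+cu113' are ignored.
    """
    release = re.match(r"\d+(?:\.\d+)*", version)
    if release is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part) for part in release.group(0).split("."))


def is_version_greater_than(current: str, required: str) -> bool:
    """Compare versions numerically, component by component."""
    return version_tuple(current) > version_tuple(required)


# String comparison is lexicographic: at the third character '1' < '5',
# so "1.10.2" sorts before "1.5.0" even though 1.10.2 is the newer release.
naive_result = "1.10.2" > "1.5.0"

# Integer tuples compare 10 against 5 as numbers, which gives the intended order.
fixed_result = is_version_greater_than("1.10.2", "1.5.0")

print(naive_result, fixed_result)
```

Tuple comparison treats a shorter release as older than a longer one with the same prefix, so `(1, 10)` sorts before `(1, 10, 0)`. If that matters, pad the tuples to equal length or use a dedicated version-parsing library.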

This PR is further improvement of the meta schedule project (apache#8473). Co-authored-by: Junru Shao <<[email protected]>> Co-authored-by: Bohan Hou <<[email protected]>> Co-authored-by: Ruihang Lai <<[email protected]>> Co-authored-by: Hongyi Jin <<[email protected]>> Co-authored-by: Wuwei Lin <<[email protected]>> Co-authored-by: Siyuan Feng <<[email protected]>>

tqchen · 2022-07-26T19:24:13Z

It would be good to get a status update, @junrushao1994. I suggest we move the follow-up non-infra parts to separate tracking issues to keep things traceable.

junrushao mentioned this issue Jul 14, 2021

[RFC] Meta Schedule (AutoTIR) apache/tvm-rfcs#5

Merged

comaniac added the type:rfc-tracking RFC progress tracking. Ref: https://github.com/apache/tvm-rfcs label Jul 27, 2021

This was referenced Aug 1, 2021

[MetaSchedule][M3a] Instruction and Trace #8615

Merged

[MetaSchedule][M3a] Traced Schedule #8623

Merged

junrushao mentioned this issue Aug 24, 2021

[MetaSchedule][M3a] Add Sampling Primitive SampleCategorical. #8817

Merged

Hzfengsy mentioned this issue Aug 27, 2021

[RFC][Tracking Issue] TensorIR Scheduling #7527

Closed

29 tasks

comaniac mentioned this issue Sep 15, 2021

[RFC][Tracking Issue] Pipeline Executor For Compute graph pipeline #8596

Closed

15 tasks

junrushao mentioned this issue Sep 18, 2021

[MetaSchedule][M3b] Builder #9044

Merged

zxybazh mentioned this issue Sep 20, 2021

[MetaSchedule][M3a] TuneContext #9053

Merged

This was referenced Sep 21, 2021

[MetaSchedule][M3b] Argument Info #9059

Merged

[MetaSchedule][M3b] Database #9061

Merged

zxybazh mentioned this issue Sep 22, 2021

[MetaSchedule][M3a] SpaceGenerator #9079

Merged

junrushao mentioned this issue Sep 24, 2021

[MetaSchedule][M3b] Runner #9111

Merged

zxybazh mentioned this issue Sep 26, 2021

[MetaSchedule][M3a] SearchStrategy #9132

Merged

shingjan mentioned this issue Sep 29, 2021

[MetaSchedule][M4a] Local runner #9153

Merged

zxybazh mentioned this issue Sep 29, 2021

[MetaSchedule][M3a] TaskScheduler #9154

Merged

zxybazh mentioned this issue Oct 14, 2021

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs #9291

Closed

zxybazh mentioned this issue Jan 5, 2022

[MetaSchedule][M4a] Add EvolutionarySearch Search Strategy #9836

Merged

junrushao mentioned this issue Jan 6, 2022

[MetaSchedule][M3c] XGB-based Cost Model #9859

Merged

junrushao mentioned this issue Jan 6, 2022

[MetaSchedule][M3c] Add Per-Store-Feature #9860

Merged

denise-k mentioned this issue Jan 11, 2022

TVM Roadmap RFC apache/tvm-rfcs#50

Merged

zxybazh mentioned this issue Jan 24, 2022

[MetaSchedule][M4b] Add ApplyHisotryBest Meta Schedule Context #10049

Merged

zxybazh mentioned this issue Jan 27, 2022

[MetaSchedule][M4a] User-API: Tune-TE/TIR/Relay #10079

Merged

Hzfengsy mentioned this issue Jan 27, 2022

[MetaSchedule][M4a] Rewrite-Cooperative-Fetch #10081

Merged

MasterJH5574 mentioned this issue Jan 28, 2022

[MetaSchedule][M4a] Mutator: Mutate-Tile-Size #10092

Merged

This was referenced Feb 24, 2022

[MetaSchedule] Add Gradient Based Task Scheduler #10366

Merged

[MetaSchedule] Update Tuning Interfaces. #10367

Merged

wenxcs mentioned this issue Mar 4, 2022

1 (#6) wenxcs-msft/tvm.dx#7

Closed

areusch added the needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it label Oct 19, 2022

Lunderberg removed the needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it label Oct 28, 2022

hpanda-naut added the tune:meta_schedule src/meta_schedule, python/tvm/meta_schedule label Dec 1, 2022

tqchen closed this as completed Sep 20, 2024

junrushao commented Jul 14, 2021 • edited by vinx13

cbalint13 commented Jan 26, 2022 • edited

junrushao commented Jan 26, 2022

cbalint13 commented Jan 26, 2022

junrushao commented Jan 27, 2022

tqchen commented Jul 26, 2022

Steps

[M3a] Core infrastructure

[M3b] Enable measurement

[M3c] Enhance search

[M4a] Performance & Coverage

[M4b] Relay integration

M5. Operator coverage with all backends for auto tensorization

M6. Memory optimization

M7. Unblock end-to-end experiments

M8. Broader Set of Intrinsics and Optimization
