Implementation of relay_to_tir target hook #8423

Merged: 1 commit, Sep 16, 2021

@Mousius (Member) commented Jul 8, 2021

Part of: #8589

This is an example of how the first new hook proposed in the Additional Target Hooks RFC could be added. Longer term, the compilation should move to using Target proper, but this unblocks our current work whilst illustrating the eventual interface via Target in target_kind.cc.

I'll clean up the duplication if we move forwards with this approach 😸

@mbaret (Contributor) commented Jul 29, 2021

Ping for further reviews @manupa-arm @zhiics @comaniac @jroesch @tqchen @electriclilies (also feel free to take a look at the RFC linked above).

@Mousius force-pushed the additional-hook-1-relay_to_tir branch 2 times, most recently from f143a9f to ab08175, on August 2, 2021 08:59

@Mousius (Member, Author) commented Aug 2, 2021

@mbaret I think I've resolved most of your comments here, could you take another look? 😸

@Mousius force-pushed the additional-hook-1-relay_to_tir branch from ab08175 to 9071aa1 on August 4, 2021 15:45

@areusch (Contributor) left a comment

so i think we should resolve the concern about compiler pass vs hook first before we merge this. i'll update this thread or the RFC with some specifics/proposals hopefully later on today or tomorrow.

@areusch (Contributor) commented Aug 5, 2021

@mbs-octoml could you take a look at this?

@Mousius force-pushed the additional-hook-1-relay_to_tir branch from 9071aa1 to 7c02f5f on August 13, 2021 12:44

@areusch (Contributor) left a comment

thanks @Mousius, left some feedback here. cc @mbs-octoml

@mbs-octoml (Contributor) commented

I've started looking at #8849 and was surprised it's not depending on this extension point. I must have missed something? Anything I can do to help this one along?

@mbs-octoml (Contributor) commented

Ah, I think I see why -- #8849 (or parent thereof) bypasses LowerTE in favor of its own transformation which calls out to compile_engine.py directly.

@Mousius (Member, Author) commented Aug 26, 2021

Hi @mbs-octoml,

Ideally it would use the hooks so that the entire TIR can be inspected and planned. You can see workarounds to add unplanned memory, such as:
https://github.com/apache/tvm/pull/8849/files#diff-46f4f7142e5b8253554f139b4d2d33f25c77cc593da034ef50873d771d790d74R242

We'd also like to use this as part of CMSIS-NN but it's likely going to be a later stage refactoring as this is currently pending discussions around the RFC at apache/tvm-rfcs#10 and https://discuss.tvm.apache.org/t/pre-rfc-additional-target-hooks/10430/4

{out_var, out_buffer},
};

tir::PrimFunc replacement_func = tir::PrimFunc({x_var, y_var, out_var}, math_loop, VoidType(),
Contributor

It all looks pretty slick to me, you've convinced me of the 'it's just a Pass' approach. The only part not demoed here I can think of is caching. Would you be up for adding that too? Then I think we shouldn't get hung up on trying to fold any of this handling back into the te_compiler (eg by inheriting from some 'GenericLowerTE' class or something). Thanks.

Member Author

My first thought is the Pass infra works well as a composition approach where we give a series of tools to the user rather than having them extend from specific classes - to incorporate caching I'd change the signature to something like tvm::transform::Pass(const CompileCache& cache) or similar so we can pass the cache between the intermediary passes?

Splitting the cache from te_compiler.cc and factoring it in like that seems big enough for a separate PR?

Contributor

Agree on reuse by composition rather than inheritance. I wasn't proposing to refactor caching to be shared, rather proposing to show it in your example as something that's so easy to do directly that we shouldn't even worry about refactoring.

After writing my initial comment I remembered the prim<->prim shape function handling is also a bit subtle, but that's quite possibly something this sort of extension mechanism won't need to support anyway. Or, by the time it does, we will have cleaned that part up enough that the way to handle it will be obvious.

Member Author

The reason I'd suggest using a shared cache passed between the passes is that it encapsulates the logic of:

(Function, Target) -> PrimFunc / External Function / Empty Node

This includes things like tracking the UniqueName of a node. We could use the IRModule itself as a form of cache in relay_to_tir.cc, but it'd likely be better to give the full cache capability to the hooks so we don't have to re-implement too much of that.

That's what I think justifies factoring the cache out and using it between the constituent passes rather than a local cache per Pass. What do you think?
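
To make the shared-cache idea above concrete, here is a minimal sketch in plain Python rather than TVM code. Everything in it is an assumption for illustration: `CompileCache`, `unique_name`, `lookup_or_lower`, and the use of strings as stand-ins for functions, targets, and lowered results are hypothetical, and the actual cache in `te_compiler.cc` may be organised quite differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical signature of a lowering callback: (function, target, unique name) -> result.
# A result of None stands in for the "Empty Node" case.
LowerFn = Callable[[str, str, str], Optional[str]]


@dataclass
class CompileCache:
    """Illustrative cache shared between passes: (function, target) -> lowered result."""

    entries: Dict[Tuple[str, str], Optional[str]] = field(default_factory=dict)
    name_counts: Dict[str, int] = field(default_factory=dict)

    def unique_name(self, hint: str) -> str:
        """Return `hint` on first use, then `hint_1`, `hint_2`, ... on later uses."""
        count = self.name_counts.get(hint, 0)
        self.name_counts[hint] = count + 1
        return hint if count == 0 else f"{hint}_{count}"

    def lookup_or_lower(self, func: str, target: str, lower: LowerFn) -> Optional[str]:
        """Lower `func` for `target` once; later requests reuse the stored result."""
        key = (func, target)
        if key not in self.entries:  # membership test, so a cached None is still a hit
            self.entries[key] = lower(func, target, self.unique_name(func))
        return self.entries[key]
```

In this sketch a single `CompileCache` instance would be handed to each constituent pass, so the same `(function, target)` pair is only lowered once and name allocation stays consistent across passes.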

Contributor

I don't see that it's necessary.
Within a pass it's a few lines to ensure two calls to the same attrs::kPrimitive Function are rewritten to call the same PrimFunc. We could probably even use the memoization built into ExprMutator.
Between passes there's nothing left to say -- it has all been encapsulated within the IRModule itself.
Even if it were necessary for some peculiar reason then I'd turn the conversation to figuring out how to extend IRModule or attributes or whatever to again ensure there's no special state between passes other than what is spelled out in the IRModule.
Does that make sense?

BTW LGTM for this one exactly as is, since I can see the caching issue deserves more conversation, whose outcome can easily go into a follow-up. Thanks for pushing on the 'just a Pass' approach, it's so much better :-)
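
The "few lines within a pass" point can be sketched roughly as follows. This is a plain-Python illustration, not the TVM API: `rewrite_calls`, `lower_primitive`, and the tuple encoding of a call are hypothetical stand-ins for a Relay mutator, a lowering routine, and Relay call nodes.

```python
from typing import Callable, Dict, List, Tuple

# (callee name, argument name): a toy stand-in for a call to a primitive function.
Call = Tuple[str, str]


def rewrite_calls(calls: List[Call], lower_primitive: Callable[[str], str]) -> List[Call]:
    """Rewrite each call so that calls to the same primitive share one lowered function."""
    lowered: Dict[str, str] = {}  # memo that lives only for this pass invocation
    rewritten: List[Call] = []
    for callee, arg in calls:
        if callee not in lowered:
            # First time this primitive is seen: lower it exactly once.
            lowered[callee] = lower_primitive(callee)
        rewritten.append((lowered[callee], arg))
    return rewritten
```

Because the memo is a local variable, nothing survives between passes except the rewritten program itself, which matches the argument that all state should be spelled out in the module.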

@mbs-octoml (Contributor) commented

Just a few comments and I'll be happy to lgtm after those.

@Mousius force-pushed the additional-hook-1-relay_to_tir branch 2 times, most recently from 69300e7 to df34c95, on September 15, 2021 09:03

This is the first new hook proposed in the Additional Target Hooks RFC. Longer term, the compilation should move to using `Target` proper, but this unblocks our current work whilst illustrating the eventual interface via `Target` in `src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc`.

Ideally the host target would be annotated onto the `IRModule` so that this `Pass` could use it instead of defaulting to C, but this is fine for now.

@Mousius force-pushed the additional-hook-1-relay_to_tir branch from df34c95 to 055bd20 on September 15, 2021 09:07
# specific language governing permissions and limitations
# under the License.

file(GLOB EXAMPLE_TARGET_HOOKS_SRC src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc)
Contributor

everything in this PR looks great, except that i'm a little concerned we're linking in test-only C++ here into libtvm.so. Possible to do this with TVMScript in Python? i feel like we would need to e.g. add a tests/libtest/src/*.cc plus a separate cmake build target to create a .so for the code in there that links against libtvm.so, then a pytest fixture to load it in one time at the start of testing and provide the module to tests for use. and that sounds like a lot of extra ask for this pr :/
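
The "load it in one time at the start of testing" part of that suggestion could look roughly like the sketch below. It is plain Python with hypothetical names (`load_test_library`, the `_LOADED` registry); in a real suite it would probably sit behind a session-scoped pytest fixture, with `loader` being something like `ctypes.CDLL`, and the library path is whatever the separate CMake target would produce.

```python
from typing import Callable, Dict

# Module-level registry so each library path is loaded at most once per test session.
_LOADED: Dict[str, object] = {}


def load_test_library(path: str, loader: Callable[[str], object]) -> object:
    """Load `path` with `loader` on first use and return the same handle afterwards."""
    if path not in _LOADED:
        _LOADED[path] = loader(path)
    return _LOADED[path]
```

Passing the loader in as an argument keeps the sketch independent of any particular build layout and makes the load-once behaviour easy to check with a fake loader.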

Contributor

it looks like there's precedent for this: c44b7bf/cmake/modules/contrib/CODEGENC.cmake

so let's not block this PR on that. we will likely need to come to a solution for this in the future.

@areusch areusch merged commit be37923 into apache:main Sep 16, 2021
AndrewZhaoLuo added a commit to AndrewZhaoLuo/tvm that referenced this pull request Sep 16, 2021
* main: (102 commits)
  Implementation of relay_to_tir target hook (apache#8423)
  [Onnx] Fix NLL Loss tests (apache#8971)
  [Bugfix] Fix other div zero errors also in rewrite_simplify (apache#8983)
  [ONNX] enable the onnx tests after PR apache#8274 merged (apache#9019)
  [Hexagon] Disable `thread_local` on Hexagon (apache#9025)
  [Hexagon] Allow undefined symbols in libtvm_runtime.so on Hexagon (apache#9024)
  [Onnx] Add momentum (apache#9000)
  fix (apache#9021)
  [Community] @AndrewZhaoLuo -> Reviewer (apache#9020)
  [Hexagon] Implement model launcher (apache#8986)
  [Relay][Pass] Add ExtractOperators pass (apache#8996)
  [BYOC][TensorRT] Add TensorRT own int8 calibration support to TensorRT BYOC integration (apache#8808)
  [ONNX] Add Einsum converter (apache#8985)
  Add standalone_crt/ to be part of the wheel package, when available. (apache#9005)
  [Relay] Remove memory planing from LowerTEPass  (apache#8974)
  [Hexagon] Treat floats as float32 when passing args to offloaded kernels (apache#9010)
  [Runtime] Pipeline Executor Initial patch. (apache#8702)
  [Hexagon] `llvm-options` attribute is an array of strings (apache#9011)
  disable cuda int8 schedule for non-cuda gpu target (apache#9014)
  [Torch] Add an option to make imported models compatible with the Relay text parser (apache#9015)
  ...
mikepapadim pushed a commit to mikepapadim/tvm that referenced this pull request Sep 17, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
areusch pushed a commit that referenced this pull request Oct 4, 2021
… only to `/docs` (#9031)

* Add script to look for changes in doc dir

* Modify Jenkinsfile

* Minor changes in scripts

* Working Jenkinsfile on selective stages on docs

* Pass groovy formatter on Jenkinsfile

* Implementation of relay_to_tir target hook (#8423)

This is the first new hook proposed in the Additional Target Hooks RFC. Longer term, the compilation should move to using `Target` proper, but this unblocks our current work whilst illustrating the eventual interface via `Target` in `src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc`.

Ideally the host target would be annotated onto the `IRModule` so that this `Pass` could use it instead of defaulting to C, but this is fine for now.

* [CUDA] Fix dense tensorcore legalize type error when units is specified (#9030)

* Fix dense tensorcore legalize type error when units is specified

* revert black change due to different version from CI

* [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op (#9017)

* [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op

* Fix linter error for variable name and else after return

* Separate quantized avg_pool impl and add TODO for global_avg_pool

* Fix comment typo

* Fix line break in `setup.py` (#9029)

* [Onnx] Add SoftmaxCrossEntropyLoss (#8906)

* nll loss v1

* add converter

* decode strings in byte form

* decode variable length inputs

* make shapes correct

* unsqueeze

* proper weight handling

* simplify if statement

* fix tests

* add comment about tests

* delete extra file

* lint

* so cool

* Update CI Lint Image Version (#8841)

* Update CI Lint Image Version

* trigger

* [BUG] ToBasicBlockNormalForm immutability (#8778)

* ToBasicBlockNormalForm immutability

* better comment on ToBasicBlock

* refine comment of ToBasicBlockForm

* [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (#8807)

* [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm

This new benchmarking function is just a convenience function for
calling time_evaluator on the underlying module. Hopefully this should
make it easier for users to get good benchmarks of their code.
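
As a rough picture of what such a convenience wrapper does, here is a plain-Python sketch. It does not use TVM: `benchmark` and its `repeat`/`number` parameters are hypothetical names chosen for illustration, and the timing is done with `time.perf_counter` rather than `time_evaluator`.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(func: Callable[[], object], repeat: int = 3, number: int = 5) -> Dict[str, float]:
    """Take `repeat` measurements, each averaging `number` back-to-back calls of `func`."""
    samples: List[float] = []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(number):
            func()
        # Mean seconds per call for this measurement.
        samples.append((time.perf_counter() - start) / number)
    return {"mean": statistics.mean(samples), "min": min(samples), "max": max(samples)}
```

Averaging several calls per measurement reduces timer noise for very short functions, while repeating the measurement gives a spread to report alongside the mean.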

* formatting

* import order

* more test, more comments, more precision

* fix tests

* add seconds descriptions to doc

* Apply CPPLint to CRT Tests (#8844)

This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to pass linting.

* [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (#8584)

* [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost.

Added initial tunable autotvm templates for depthwise conv2d with
NHWC layout for Mali and Bifrost.

* [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost.

- Fix assert for Bifrost.
- Set reasonable default axis splits to avoid using tophub for NHWC.
- Fixed typo: arm cpu -> Mali.

* [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost.

* Support for CMSIS-NN in Corstone300 Makefile (#8831)

Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a

* [microtvm][Zephyr] Increase timeout to fix flaky tests (#8846)

* increase timeout

* trigger

* [AMP] Bump up tolerance on flaky test (#8850)

* bumpy up tol

* bumped tolerance up even more

* jostle ci

* [Hexagon] Rework tvm.target.hexagon() interface (#8823)

* [Hexagon] Rework tvm.target.hexagon() interface

Make the tvm.target.hexagon() function take most options as keyword
parameters. This will allow adding additional parameters without changing
the interface.

No changes are required to existing code, except for changing positional
parameters following the CPU version to keyword parameters, and updating
the names of the keyword parameters:
  sim_args  -> sim_options,
  llvm_args -> llvm_options,
although the old names will be accepted for the time being.
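
The renaming scheme described above, where old keyword names keep working for a while, can be sketched like this. It is a generic Python illustration and not the actual `tvm.target.hexagon()` implementation: `normalize_kwargs` and `hexagon_like_target` are hypothetical, and only the two renames listed in the message are taken from the source.

```python
import warnings
from typing import Any, Dict

# Old keyword name -> new keyword name, as listed in the commit message.
_RENAMED = {"sim_args": "sim_options", "llvm_args": "llvm_options"}


def normalize_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Map deprecated keyword names to their replacements, warning on each use."""
    result: Dict[str, Any] = {}
    for name, value in kwargs.items():
        new_name = _RENAMED.get(name, name)
        if new_name != name:
            warnings.warn(
                f"'{name}' is deprecated, use '{new_name}'", DeprecationWarning, stacklevel=2
            )
        if new_name in result:
            # Both the old and the new spelling were supplied for the same option.
            raise TypeError(f"'{new_name}' given more than once")
        result[new_name] = value
    return result


def hexagon_like_target(cpu_ver: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for a target constructor that takes keyword options."""
    return {"cpu_ver": cpu_ver, **normalize_kwargs(kwargs)}
```

Keeping the alias table in one place means a later release can drop the old names by emptying `_RENAMED`, without touching the constructor itself.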

* formatting

* change ' to "

* Rename 'args' to 'config' for clarity

* Use 'strip' instead of 'replace'

* Restart build

* [Pattern matching] Add an option to rewrite the graph only once (#8843)

* [Pattern matching] Add an option to rewrite the graph only once

If the graph returned from the callback consists of the original
pattern, the rewriter will run in the loop, which is not always desired.
So this patch proposes an option to run the rewriter only once.
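
The looping behaviour described here can be reproduced with a toy string rewriter. This is only a sketch under stated assumptions: `rewrite` works on strings instead of Relay graphs, and `max_iters` is a guard added so the example terminates; neither reflects the real pattern-matching API beyond the `rewrite_once` idea itself.

```python
from typing import Callable


def rewrite(expr: str, pattern: str, callback: Callable[[str], str],
            rewrite_once: bool = False, max_iters: int = 100) -> str:
    """Replace `pattern` using `callback`, repeating until nothing matches or changes."""
    for _ in range(max_iters):
        if pattern not in expr:
            return expr
        new_expr = expr.replace(pattern, callback(pattern))
        if rewrite_once or new_expr == expr:
            # Either a single pass was requested or a fixed point was reached.
            return new_expr
        expr = new_expr
    raise RuntimeError("rewriter did not reach a fixed point")
```

With a callback such as `lambda p: "relu(add)"`, whose result still contains the pattern `add`, the default mode keeps matching its own output until the guard trips, whereas `rewrite_once=True` stops after the first replacement.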

Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06

* Make rewrite_once default to false

Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0

* update gpu and cpu (#8853)

* VTA cmake change to include Verilator header for building tsim library (#8797)

* VTA cmake file requires Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI

* Refactor Verilator check conditions

* Build TSIM only for CPU target. CPU target doesn't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target.

* remove build/libvta_tsim.so from non tsim targeting builds

* Revert to enable TSIM build i386. Revert to -Werror in CPU config. Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings

* [FIX] Bug fix for a floormod rewrite simplify rule (#8852)

* Update rewrite_simplify.cc

* Update test_arith_rewrite_simplify.py

* Update test_arith_rewrite_simplify.py

* Update test_arith_rewrite_simplify.py

* move rust lint script (#8726)

* [AMP] Disallow fp16 conversion for summation-like ops (#8810)

* [AMP] Disallow fp16 conversion for summation-like ops

* test only structural equality

* [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (#8605)

* [topi] add spconv2d_3x3 nhwc

* [relay] sparse_conv2d: add kernel_size attr

* [relay] add strategy for spconv2d_3x3 nhwc

* [relay] pass to convert spconv2d with const args

* [relay] convert sparse conv2d pass fixes

* use array for sparse conv2d attr

* fixup 1x1 tests; new 3x3 tests

* extend repeat_interleave op for relay.Expr (#8839)

Co-authored-by: Valery Chernov

* Change AOT from ExprVisitor to MixedModeVisitor (#8856)

This should allow better scalability for AOT when targeting larger networks.

* Add a PaddlePaddle Frontend (#8645)

* fix some problems for matmul

* fix some problems for matmul

* add alpha parameter for matmul

* remove unnecessary condition

* add TranslatedLayer which support model loaded by jit.load

* add mul operator support

* Add padding mode support for conv/pool2d

* support 4 two-tuples

* add paddle test case

* add paddle conv2d  case

* update test_forward.py

* fix paddle convert_matmul

* add paddle multiply and matmul op test case

* add test case and fix bug

* delete import pandas

* add paddlepaddle tests

* modify the variable name of convert_reshape

* formatting

* formatting

* use black to format python code

* pylint check

* Remove fluid api

* black format

Co-authored-by: root
Co-authored-by: wjj19950828
Co-authored-by: heliqi
Co-authored-by: Junru Shao

* [Runtime] add set_output_zero_copy (#8497)

* Update graph_executor.h

* Update graph_executor.cc

* modify zero copy UT add set input zero copy

* modify C style

* add runtime test

* realy build  generatr the json

Co-authored-by: hwstaff

* [Hexagon] Change declaration order of unique_ptr objects to fix crash (#8859)

A crash occurs when automatically deleting an instance of
CodeGenHexagon because the LLVMContext object has already been
freed. Objects of both types are created using unique_ptr, but
the object managed by the LLVMContext unique_ptr is passed to
CodeGenHexagon object (not as a unique_ptr).

This crash is fixed by moving the declaration of the LLVMContext
object before the CodeGenHexagon object. I'm not sure if this
is the best way to fix this, but it does fix the crash. Also,
in other files, the LLVMContext object is always created first.

Co-authored-by: Cahoon, Brendon

* [Graph Executor, VM] Add end to end benchmarking of models (#8858)

Add benchmarking that includes overhead of transferring inputs and
outputs to and from the device. This should give an accurate measurement
of the runtime a user would see when using the model. This is
accomplished by adding functions that run from inputs to return values
into the graph executor and the VM.

* [UnitTests] Expose TVM pytest helpers as plugin (#8532)

* [UnitTests] Expose TVM pytest helpers as plugin

Previously, pytest helper utilities such as automatic parametrization
of `target`/`dev`, or `tvm.testing.parameter` were only available for
tests within the `${TVM_HOME}/tests` directory.  This PR extracts the
helper utilities into an importable plugin, which can be used in
external tests (e.g. one-off debugging).

* [UnitTests] Refactor the plugin-specific logic out into plugin.py.

* [UnitTests] Moved marker definition out to global variable.

* Remove AOT Executor header from Arduino project (#8857)

* [Community] @mdw-octoml -> Reviewer (#8868)

* [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (#8855)

* init

* fix

* Update src/tir/transforms/plan_update_buffer_allocation_location.cc

Co-authored-by: Ruihang Lai

* Update src/tir/transforms/plan_update_buffer_allocation_location.cc

Co-authored-by: Ruihang Lai

* address

Co-authored-by: Junru Shao
Co-authored-by: Ruihang Lai

* [Autoscheduler] Configurable workload keys (#8862)

* change workload keys

* remove binary string comparison

* append the tuple not every integer

* clean up

* lint

* dump workload keys to dags

* fix things

* change some strings

* misc fixes, add tests

* jostle ci

* [Tutorial][Executor] Fix the usage of executors in tutorials (#8586)

* fix: executor usage for keras tutorial

* fix: executor usage for onnx tutorial

* [Tutorial][Executor] Fix executors in tutorials

* [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (#8867)

* Simplify onnx input since name accesses are no longer supported.

* move Celu importer.

* [TIR] GetBlockReadWriteRegion (#8875)

* [TIR] GetBlockReadWriteRegion

* Fix black issue

* Use constant reference for the interface

* Fix lint issue

* [RISCV] Add support for llvm parameter -mabi (-target-abi) (#8860)

* [Community] @manupa-arm -> Committer (#8870)

* adding Manupa to the contributors list

* re-trigger CI

* [RPC] Fix ios_rpc build (#8864)

* [Vulkan][Target] Added the driver name to the vulkan target string. (#8882)

Driver name (e.g. "NVIDIA", "radv", "AMD open-source driver") is read
from the `driverName` property in
[VkPhysicalDeviceDriverProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceDriverProperties.html),
or is left as `"unknown_driver_name"` if the driver does not support
querying the driver name.

* [ONNX][TOPI] Support select_last_index for argmin/max (#8816)

* support select_last_index for argmin/max

* reverse conditions which made on accident

* forward args in reduce.py

* make proper nodes for reduction ops

* remove complicated nested lambdas

* fix lambda capture for conversion

* forward more arguments

* forward more args

* enable onnx tests

* wrapping casts to remove ambiguity

* revert changes extraneous

* correct incorrect attrs being used for ops

* change attributes

* remove old impl

* register new attribute node

* clean up test

* reformat

* reformat

* coolio

* stable comparison

* casts to avoid ambiguity

* casting more

* correct arg passing

* fix broken input

* OneElementReduceAttrs-->ArgReduceAttrs

* reduce boilerplate

* change names

* remove log statement

* jostle ci

Co-authored-by: Andrew Zhao Luo

* refactor optimize GEMM on CPU tutorial (#8825)

* refactor optimize GEMM on CPU tutorial

* fix lint errors

* fix more lint errors

* fix typo

* fix problem with redefinition of `k`
add TODO and comments around loop unrolling
clarify note on the array packing figure

* reword general description of array packing

* grap kaxis from compute definition

* remove duplicate comments on unrolling

* Change target string to Target object in the TE compiler and interpreter (#8835)

* # This is a combination of 2 commits.
# This is the 1st commit message:

Initial changes

# This is the commit message #2:

Ftarget string -> Target object works!

* Fix remaining target strings

* fix bad rebase

* Fix typo

* 1 more bad rebase fix

* Lint

* typo

* Forgot to commit this

* Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn

* Passing most tests, yay

* remove some comments

* lint

* target-str-to-target-object

* Respond to change requests

Co-authored-by: Jared Roesch

* [TensorIR][M2a] CacheRead/Write (#8863)

Co-authored-by: Junru Shao
Co-authored-by: Wuwei Lin
Co-authored-by: Ruihang Lai
Co-authored-by: Hongyi Jin
Co-authored-by: Siyuan Feng
Co-authored-by: Bohan Hou

* [CI] make pre-commit hooks to run on every push instead of every commit (#8888)

* [TVMScript] Fix printing ForNode annotations (#8891)

* [1/10] CMSIS-NN graph partitioner for softmax (#8653)

* cmsis graph partitioner for softmax

Change-Id: I80ecd7bc5351f241b4674ef53b36e4398c8adb83

* Updated docstring in the partitioning function

Change-Id: Ieb4b623e5929cfdb6aa0235db64c825fac8d7055

* [microTVM][RVM] Add Arduino RVM (#8748)

* Functioning Arduino Vagrant VM

Begin building Arduino Vagrant VM

Mostly working Vagrant VM

Changes for debugging

Add ignored json file

Fix venv path

* Generalize parts of RVM for multiple platforms

cwd hack

Add unit tests from apps directory to task_python_microtvm.sh

Generalize parts of RVM for multiple platforms

* Add Vagrantfile lint exceptions

* Address PR comments

Address Mehrdad's PR comments

More PR comments

Documentation tweaks

Add dialout group to user

* Rerun tests

* Spresense fix

* Rerun CI tests

* Rerun tests

* sce loss example

* add comments, remove other tests

* lint

* lint

* jostle

* lint up

* jostle

* uncomment some tests

* proper return

* clean up

* lint

* minor merge errors

Co-authored-by: Andrew Zhao Luo
Co-authored-by: Mehrdad Hessar
Co-authored-by: Jiawei Liu
Co-authored-by: Tristan Konolige
Co-authored-by: Christopher Sidebottom
Co-authored-by: Anastasia Stulova
Co-authored-by: Ashutosh Parkhi
Co-authored-by: Krzysztof Parzyszek
Co-authored-by: Elen Kalda
Co-authored-by: Anton Sorokin
Co-authored-by: Chenfan
Co-authored-by: masahi
Co-authored-by: Tantalus13A98B5F
Co-authored-by: Valery Chernov
Co-authored-by: Jason
Co-authored-by: root
Co-authored-by: wjj19950828
Co-authored-by: heliqi
Co-authored-by: Junru Shao
Co-authored-by: Swift.Sun
Co-authored-by: hwstaff
Co-authored-by: Cahoon, Brendon
Co-authored-by: Lunderberg
Co-authored-by: Yizhi Liu
Co-authored-by: Siyuan Feng
Co-authored-by: Ruihang Lai
Co-authored-by: Josh Fromm
Co-authored-by: Alexander Pivovarov
Co-authored-by: Thierry Moreau
Co-authored-by: Egor Churaev
Co-authored-by: Adam Straw
Co-authored-by: Lily Orth-Smith
Co-authored-by: Jared Roesch
Co-authored-by: Wuwei Lin
Co-authored-by: Hongyi Jin
Co-authored-by: Bohan Hou
Co-authored-by: Michalis Papadimitriou
Co-authored-by: Gavin Uberti

* [Hexagon] Don't use {} initialization with FastRPC structures (#9033)

The data members in FastRPC structures aren't guaranteed to remain
in the same order. Replace aggregate initialization with direct,
member-by-member initialization.

* Test

* Minor checkstyle issue

* Test

* Test file

* Revert changed in unit tests

* Change script name

* Test

* Revert format on groovy file

* Remove test file

* Minor change in script

* Minor formating changes

* Revert logic in conditions for changed files

Co-authored-by: Christopher Sidebottom
Co-authored-by: masahi
Co-authored-by: Anirudh Sundar
Co-authored-by: Leandro Nunes
Co-authored-by: AndrewZhaoLuo
Co-authored-by: Andrew Zhao Luo
Co-authored-by: Mehrdad Hessar
Co-authored-by: Jiawei Liu
Co-authored-by: Tristan Konolige
Co-authored-by: Anastasia Stulova
Co-authored-by: Ashutosh Parkhi
Co-authored-by: Krzysztof Parzyszek
Co-authored-by: Elen Kalda
Co-authored-by: Anton Sorokin
Co-authored-by: Chenfan
Co-authored-by: Tantalus13A98B5F
Co-authored-by: Valery Chernov
Co-authored-by: Jason
Co-authored-by: root
Co-authored-by: wjj19950828
Co-authored-by: heliqi
Co-authored-by: Junru Shao
Co-authored-by: Swift.Sun
Co-authored-by: hwstaff
Co-authored-by: Cahoon, Brendon
Co-authored-by: Lunderberg
Co-authored-by: Yizhi Liu
Co-authored-by: Siyuan Feng
Co-authored-by: Ruihang Lai
Co-authored-by: Josh Fromm
Co-authored-by: Alexander Pivovarov
Co-authored-by: Thierry Moreau
Co-authored-by: Egor Churaev
Co-authored-by: Adam Straw
Co-authored-by: Lily Orth-Smith
Co-authored-by: Jared Roesch
Co-authored-by: Wuwei Lin
Co-authored-by: Hongyi Jin
Co-authored-by: Bohan Hou
Co-authored-by: Gavin Uberti
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
… only to `/docs` (apache#9031)

* Add script to look for changes in doc dir

* Modify Jenkinsfile

* Minor changes in scripts

* Working Jenkinsfile on selective stages on docs

* Pass groovy formatter on Jenkinsfile

* Implementation of relay_to_tir target hook (apache#8423)

This is the first new hook proposed in the Additional Target Hooks RFC. Longer
term, compilation should move to using `Target` proper, but this unblocks our current work whilst illustrating the eventual interface via `Target` in `src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc`.

Ideally the host target would be annotated onto the `IRModule` so that this `Pass` could use it instead of defaulting to C, but this is fine for now.

* [CUDA] Fix dense tensorcore legalize type error when units is specified (apache#9030)

* Fix dense tensorcore legalize type error when units is specified

* revert black change due to different version from CI

* [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op (apache#9017)

* [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op

* Fix linter error for variable name and else after return

* Separate quantized avg_pool impl and add TODO for global_avg_pool

* Fix comment typo

* Fix line break in `setup.py` (apache#9029)

* [Onnx] Add SoftmaxCrossEntropyLoss (apache#8906)

* nll loss v1

* add converter

* decode strings in byte form

* decode variable length inputs

* make shapes correct

* unsqueeze

* proper weight handling

* simplify if statement

* fix tests

* add comment about tests

* delete extra file

* lint

* so cool

* Update CI Lint Image Version (apache#8841)

* Update CI Lint Image Version

* trigger

* [BUG] ToBasicBlockNormalForm immutability (apache#8778)

* ToBasicBlockNormalForm immutability

* better comment on ToBasicBlock

* refine comment of ToBasicBlockForm

* [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (apache#8807)

* [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm

This new benchmarking function is just a convenience function for
calling time_evaluator on the underlying module. Hopefully this should
make it easier for users to get good benchmarks of their code.

* formatting

* import order

* more test, more comments, more precision

* fix tests

* add seconds descriptions to doc

* Apply CPPLint to CRT Tests (apache#8844)

This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to pass linting.

* [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (apache#8584)

* [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost.

Added initial tunable autotvm templates for depthwise conv2d with
NHWC layout for Mali and Bifrost.

* [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost.

- Fix assert for Bifrost.
- Set reasonable default axis splits to avoid using tophub for NHWC.
- Fixed typo: arm cpu -> Mali.

* [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost.

* Support for CMSIS-NN in Corstone300 Makefile (apache#8831)

Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a

* [microtvm][Zephyr] Increase timeout to fix flaky tests (apache#8846)

* increase timeout

* trigger

* [AMP] Bump up tolerance on flaky test (apache#8850)

* bump up tol

* bumped tolerance up even more

* jostle ci

* [Hexagon] Rework tvm.target.hexagon() interface (apache#8823)

* [Hexagon] Rework tvm.target.hexagon() interface

Make the tvm.target.hexagon() function take most options as keyword
parameters. This will allow adding additional parameters without changing
the interface.

No changes are required to existing code, except for changing positional
parameters following the CPU version to keyword parameters, and updating
the names of the keyword parameters:
  sim_args  -> sim_options,
  llvm_args -> llvm_options,
although the old names will be accepted for the time being.

* formatting

* change ' to "

* Rename 'args' to 'config' for clarity

* Use 'strip' instead of 'replace'

* Restart build
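
The keyword rename described in this entry (`sim_args` -> `sim_options`, `llvm_args` -> `llvm_options`, with the old names still accepted for now) could be handled roughly as in the sketch below. This is not the `tvm.target.hexagon()` implementation: the function name `make_hexagon_config`, the default `cpu_ver` value, and the returned dictionary are assumptions made purely for illustration.

```python
import warnings

# Renames taken from the commit message: old keyword -> new keyword.
_RENAMED = {"sim_args": "sim_options", "llvm_args": "llvm_options"}


def make_hexagon_config(cpu_ver="v66", **kwargs):
    """Hypothetical helper: collect options, accepting deprecated keyword names.

    The default cpu_ver and the shape of the result are illustrative only.
    """
    config = {"sim_options": None, "llvm_options": None}
    for name, value in kwargs.items():
        if name in _RENAMED:
            # Old name: warn, then treat it as the new name.
            new_name = _RENAMED[name]
            warnings.warn(
                f"'{name}' is deprecated, use '{new_name}'",
                DeprecationWarning,
                stacklevel=2,
            )
            name = new_name
        if name not in config:
            raise TypeError(f"unexpected keyword argument '{name}'")
        config[name] = value
    config["cpu_ver"] = cpu_ver
    return config
```

Because every option after the CPU version is a keyword, new options can be added to the `config` defaults without disturbing existing callers.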

* [Pattern matching] Add an option to rewrite the graph only once (apache#8843)

* [Pattern matching] Add an option to rewrite the graph only once

If the graph returned from the callback consists of the original
pattern, the rewriter will run in the loop, which is not always desired.
So this patch proposes an option to run the rewriter only once.

Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06

* Make rewrite_once default to false

Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0
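
To see why a run-once option is useful, here is a toy sketch of the looping behaviour described in this entry. It rewrites plain strings rather than Relay graphs, and `rewrite`, `max_iters`, and the string-based "pattern" are all hypothetical stand-ins rather than the pattern-matching API itself.

```python
def rewrite(expr, pattern, callback, rewrite_once=False, max_iters=100):
    """Toy rewriter: replace `pattern` in `expr` with `callback(pattern)`.

    By default it repeats until nothing changes. If the callback's result
    still contains the pattern, that loop never settles, so `max_iters`
    (an illustrative safety limit) turns the hang into an error.
    """
    for _ in range(max_iters):
        new_expr = expr.replace(pattern, callback(pattern))
        if new_expr == expr or rewrite_once:
            return new_expr
        expr = new_expr
    raise RuntimeError("rewriter did not reach a fixed point")


# The callback wraps the match, so its output contains the original pattern.
wrapped = rewrite("a+b", "a", lambda p: f"({p})", rewrite_once=True)
print(wrapped)  # (a)+b
```

With `rewrite_once=False` the same call would keep wrapping `a` in further parentheses until the iteration limit is hit, which is the situation the option is meant to avoid.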

* update gpu and cpu (apache#8853)

* VTA cmake change to include Verilator header for building tsim library (apache#8797)

* VTA cmake file requires Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI

* Refactor Verilator check conditions

* Build TSIM only for CPU target. CPU target doesn't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target.

* remove build/libvta_tsim.so from non tsim targeting builds

* Revert to enable TSIM build i386. Revert to -Werror in CPU config. Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings

* [FIX] Bug fix for a floormod rewrite simplify rule (apache#8852)

* Update rewrite_simplify.cc

* Update test_arith_rewrite_simplify.py

* Update test_arith_rewrite_simplify.py

* Update test_arith_rewrite_simplify.py

* move rust lint script (apache#8726)

* [AMP] Disallow fp16 conversion for summation-like ops (apache#8810)

* [AMP] Disallow fp16 conversion for summation-like ops

* test only structural equality

* [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (apache#8605)

* [topi] add spconv2d_3x3 nhwc

* [relay] sparse_conv2d: add kernel_size attr

* [relay] add strategy for spconv2d_3x3 nhwc

* [relay] pass to convert spconv2d with const args

* [relay] convert sparse conv2d pass fixes

* use array for sparse conv2d attr

* fixup 1x1 tests; new 3x3 tests

* extend repeat_interleave op for relay.Expr (apache#8839)

Co-authored-by: Valery Chernov <[email protected]>

* Change AOT from ExprVisitor to MixedModeVisitor (apache#8856)

This should allow better scalability for AOT when targeting larger networks.

* Add a PaddlePaddle Frontend (apache#8645)

* fix some problems for matmul

* fix some problems for matmul

* add alpha parameter for matmul

* remove unnecessary condition

* add TranslatedLayer which supports models loaded by jit.load

* add mul operator support

* Add padding mode support for conv/pool2d

* support 4 two-tuples

* add paddle test case

* add paddle conv2d  case

* update test_forward.py

* fix paddle convert_matmul

* add paddle multiply and matmul op test case

* add test case and fix bug

* delete import pandas

* add paddlepaddle tests

* modify the variable name of convert_reshape

* formatting

* formatting

* use black to format python code

* pylint check

* Remove fluid api

* black format

Co-authored-by: root <[email protected]>
Co-authored-by: wjj19950828 <[email protected]>
Co-authored-by: heliqi <[email protected]>
Co-authored-by: Junru Shao <[email protected]>

* [Runtime] add set_output_zero_copy (apache#8497)

* Update graph_executor.h

* Update graph_executor.cc

* modify zero copy UT add set input zero copy

* modify C style

* add runtime test

* relay build generate the json

Co-authored-by: hwstaff <[email protected]>

* [Hexagon] Change declaration order of unique_ptr objects to fix crash (apache#8859)

A crash occurs when automatically deleting an instance of
CodeGenHexagon because the LLVMContext object has already been
freed. Objects of both types are created using unique_ptr, but
the object managed by the LLVMContext unique_ptr is passed to
CodeGenHexagon object (not as a unique_ptr).

This crash is fixed by moving the declaration of the LLVMContext
object before the CodeGenHexagon object. I'm not sure if this
is the best way to fix this, but it does fix the crash. Also,
in other files, the LLVMContext object is always created first.

Co-authored-by: Cahoon, Brendon <[email protected]>

* [Graph Executor, VM] Add end to end benchmarking of models (apache#8858)

Add benchmarking that includes overhead of transferring inputs and
outputs to and from the device. This should give an accurate measurement
of the runtime a user would see when using the model. This is
accomplished by adding functions that run from inputs to return values
into the graph executor and the VM.
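
The difference between timing only the computation and timing a call end to end can be sketched as follows. This is a plain-Python illustration, not the graph executor or VM API: `benchmark`, `make_end_to_end`, and the `copy_in`/`run`/`copy_out` callables are hypothetical names, and the `repeat`/`number` parameters are only assumed to behave like averaging counts.

```python
import time


def benchmark(func, repeat=3, number=5, timer=time.perf_counter):
    """Return `repeat` measurements, each the mean time of `number` calls."""
    results = []
    for _ in range(repeat):
        start = timer()
        for _ in range(number):
            func()
        results.append((timer() - start) / number)
    return results


def make_end_to_end(copy_in, run, copy_out):
    """Wrap `run` so each call also pays for input and output transfers."""

    def call(*inputs):
        device_inputs = [copy_in(x) for x in inputs]  # host -> device
        return copy_out(run(*device_inputs))          # device -> host

    return call
```

Timing `benchmark(lambda: end_to_end(data))` instead of `benchmark(lambda: run(device_data))` then includes the transfer cost in every measurement, which is closer to what a user of the model would observe.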

* [UnitTests] Expose TVM pytest helpers as plugin (apache#8532)

* [UnitTests] Expose TVM pytest helpers as plugin

Previously, pytest helper utilities such as automatic parametrization
of `target`/`dev`, or `tvm.testing.parameter` were only available for
tests within the `${TVM_HOME}/tests` directory.  This PR extracts the
helper utilities into an importable plugin, which can be used in
external tests (e.g. one-off debugging).

* [UnitTests] Refactor the plugin-specific logic out into plugin.py.

* [UnitTests] Moved marker definition out to global variable.

* Remove AOT Executor header from Arduino project (apache#8857)

* [Community] @mdw-octoml -> Reviewer (apache#8868)

* [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (apache#8855)

* init

* fix

* Update src/tir/transforms/plan_update_buffer_allocation_location.cc

Co-authored-by: Ruihang Lai <[email protected]>

* Update src/tir/transforms/plan_update_buffer_allocation_location.cc

Co-authored-by: Ruihang Lai <[email protected]>

* address

Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>

* [Autoscheduler] Configurable workload keys (apache#8862)

* change workload keys

* remove binary string comparison

* append the tuple not every integer

* clean up

* lint

* dump workload keys to dags

* fix things

* change some strings

* misc fixes, add tests

* jostle ci

* [Tutorial][Executor] Fix the usage of executors in tutorials (apache#8586)

* fix: executor usage for keras tutorial

* fix: executor usage for onnx tutorial

* [Tutorial][Executor] Fix executors in tutorials

* [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (apache#8867)

* Simplify onnx input since name accesses are no longer supported.

* move Celu importer.

* [TIR] GetBlockReadWriteRegion (apache#8875)

* [TIR] GetBlockReadWriteRegion

* Fix black issue

* Use constant reference for the interface

* Fix lint issue

* [RISCV] Add support for llvm parameter -mabi (-target-abi) (apache#8860)

* [Community] @manupa-arm -> Committer (apache#8870)

* adding Manupa to the contributors list

* re-trigger CI

* [RPC] Fix ios_rpc build (apache#8864)

* [Vulkan][Target] Added the driver name to the vulkan target string. (apache#8882)

Driver name (e.g. "NVIDIA", "radv", "AMD open-source driver") is read
from the `driverName` property in
[VkPhysicalDeviceDriverProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceDriverProperties.html),
or is left as `"unknown_driver_name"` if the driver does not support
querying the driver name.

* [ONNX][TOPI] Support select_last_index for argmin/max (apache#8816)

* support select_last_index for argmin/max

* reverse conditions which were made by accident

* forward args in reduce.py

* make proper nodes for reduction ops

* remove complicated nested lambdas

* fix lambda capture for conversion

* forward more arguments

* forward more args

* enable onnx tests

* wrapping casts to remove ambiguity

* revert extraneous changes

* correct incorrect attrs being used for ops

* change attributes

* remove old impl

* register new attribute node

* clean up test

* reformat

* reformat

* coolio

* stable comparison

* casts to avoid ambiguity

* casting more

* correct arg passing

* fix broken input

* OneElementReduceAttrs-->ArgReduceAttrs

* reduce boilerplate

* change names

* remove log statement

* jostle ci

Co-authored-by: Andrew Zhao Luo <[email protected]>

* refactor optimize GEMM on CPU tutorial (apache#8825)

* refactor optimize GEMM on CPU tutorial

* fix lint errors

* fix more lint errors

* fix typo

* fix problem with redefinition of `k`
add TODO and comments around loop unrolling
clarify note on the array packing figure

* reword general description of array packing

* grab kaxis from compute definition

* remove duplicate comments on unrolling

* Change target string to Target object in the TE compiler and interpreter (apache#8835)

* # This is a combination of 2 commits.

Initial changes

Ftarget string -> Target object works!

* Fix remaining target strings

* fix bad rebase

* Fix typo

* 1 more bad rebase fix

* Lint

* typo

* Forgot to commit this

* Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn

* Passing most tests, yay

* remove some comments

* lint

* target-str-to-target-object

* Respond to change requests

Co-authored-by: Jared Roesch <[email protected]>

* [TensorIR][M2a] CacheRead/Write (apache#8863)

Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Wuwei Lin <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>

* [CI] make pre-commit hooks to run on every push instead of every commit (apache#8888)

* [TVMScript] Fix printing ForNode annotations (apache#8891)

* [1/10] CMSIS-NN graph partitioner for softmax (apache#8653)

* cmsis graph partitioner for softmax

Change-Id: I80ecd7bc5351f241b4674ef53b36e4398c8adb83

* Updated docstring in the partitioning function

Change-Id: Ieb4b623e5929cfdb6aa0235db64c825fac8d7055

* [microTVM][RVM] Add Arduino RVM (apache#8748)

* Functioning Arduino Vagrant VM

Begin building Arduino Vagrant VM

Mostly working Vagrant VM

Changes for debugging

Add ignored json file

Fix venv path

* Generalize parts of RVM for multiple platforms

cwd hack

Add unit tests from apps directory to task_python_microtvm.sh

Generalize parts of RVM for multiple platforms

* Add Vagrantfile lint exceptions

* Address PR comments

Address Mehrdad's PR comments

More PR comments

Documentation tweaks

Add dialout group to user

* Rerun tests

* Spresense fix

* Rerun CI tests

* Rerun tests

* sce loss example

* add comments, remove other tests

* lint

* lint

* jostle

* lint up

* jostle

* uncomment some tests

* proper return

* clean up

* lint

* minor merge errors

Co-authored-by: Andrew Zhao Luo <[email protected]>
Co-authored-by: Mehrdad Hessar <[email protected]>
Co-authored-by: Jiawei Liu <[email protected]>
Co-authored-by: Tristan Konolige <[email protected]>
Co-authored-by: Christopher Sidebottom <[email protected]>
Co-authored-by: Anastasia Stulova <[email protected]>
Co-authored-by: Ashutosh Parkhi <[email protected]>
Co-authored-by: Krzysztof Parzyszek <[email protected]>
Co-authored-by: Elen Kalda <[email protected]>
Co-authored-by: Anton Sorokin <[email protected]>
Co-authored-by: Chenfan <[email protected]>
Co-authored-by: masahi <[email protected]>
Co-authored-by: Tantalus13A98B5F <[email protected]>
Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: wjj19950828 <[email protected]>
Co-authored-by: heliqi <[email protected]>
Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Swift.Sun <[email protected]>
Co-authored-by: hwstaff <[email protected]>
Co-authored-by: Cahoon, Brendon <[email protected]>
Co-authored-by: Lunderberg <[email protected]>
Co-authored-by: Yizhi Liu <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Josh Fromm <[email protected]>
Co-authored-by: Alexander Pivovarov <[email protected]>
Co-authored-by: Thierry Moreau <[email protected]>
Co-authored-by: Egor Churaev <[email protected]>
Co-authored-by: Adam Straw <[email protected]>
Co-authored-by: Lily Orth-Smith <[email protected]>
Co-authored-by: Jared Roesch <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Wuwei Lin <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Michalis Papadimitriou <[email protected]>
Co-authored-by: Gavin Uberti <[email protected]>

* [Hexagon] Don't use {} initialization with FastRPC structures (apache#9033)

The data members in FastRPC structures aren't guaranteed to remain
in the same order. Replace aggregate initialization with direct,
member-by-member initialization.

* Test

* Minor checkstyle issue

* Test

* Test file

* Revert changes in unit tests

* Change script name

* Test

* Revert format on groovy file

* Remove test file

* Minor change in script

* Minor formatting changes

* Revert logic in conditions for changed files

Co-authored-by: Christopher Sidebottom <[email protected]>
Co-authored-by: masahi <[email protected]>
Co-authored-by: Anirudh Sundar <[email protected]>
Co-authored-by: Leandro Nunes <[email protected]>
Co-authored-by: AndrewZhaoLuo <[email protected]>
Co-authored-by: Andrew Zhao Luo <[email protected]>
Co-authored-by: Mehrdad Hessar <[email protected]>
Co-authored-by: Jiawei Liu <[email protected]>
Co-authored-by: Tristan Konolige <[email protected]>
Co-authored-by: Christopher Sidebottom <[email protected]>
Co-authored-by: Anastasia Stulova <[email protected]>
Co-authored-by: Ashutosh Parkhi <[email protected]>
Co-authored-by: Krzysztof Parzyszek <[email protected]>
Co-authored-by: Elen Kalda <[email protected]>
Co-authored-by: Anton Sorokin <[email protected]>
Co-authored-by: Chenfan <[email protected]>
Co-authored-by: Tantalus13A98B5F <[email protected]>
Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: wjj19950828 <[email protected]>
Co-authored-by: heliqi <[email protected]>
Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Swift.Sun <[email protected]>
Co-authored-by: hwstaff <[email protected]>
Co-authored-by: Cahoon, Brendon <[email protected]>
Co-authored-by: Lunderberg <[email protected]>
Co-authored-by: Yizhi Liu <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Josh Fromm <[email protected]>
Co-authored-by: Alexander Pivovarov <[email protected]>
Co-authored-by: Thierry Moreau <[email protected]>
Co-authored-by: Egor Churaev <[email protected]>
Co-authored-by: Adam Straw <[email protected]>
Co-authored-by: Lily Orth-Smith <[email protected]>
Co-authored-by: Jared Roesch <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Wuwei Lin <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Gavin Uberti <[email protected]>
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
This is the first new hook proposed in the Additional Target Hooks RFC. Longer
term, compilation should move to using `Target` proper, but this unblocks our current work whilst illustrating the eventual interface via `Target` in `src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc`.

Ideally the host target would be annotated onto the `IRModule` so that this `Pass` could use it instead of defaulting to C, but this is fine for now.
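
As a rough mental model of a per-target lowering hook, the sketch below keeps a registry from target kind to a pass-like callable and applies whichever hooks are registered for the kinds present in a module. It is a conceptual toy, not TVM code: the registry, `register_relay_to_tir`, `apply_relay_to_tir`, the `"example_kind"` name, and the dict-based "module" are all assumptions for illustration.

```python
# Hypothetical registry: target kind name -> hook taking and returning a module.
RELAY_TO_TIR_HOOKS = {}


def register_relay_to_tir(kind):
    """Decorator that attaches a lowering hook to a target kind."""

    def decorator(hook):
        RELAY_TO_TIR_HOOKS[kind] = hook
        return hook

    return decorator


@register_relay_to_tir("example_kind")
def lower_example(module):
    """Mark functions assigned to 'example_kind' as lowered; copy the rest."""
    lowered = {}
    for name, func in module.items():
        if func.get("compiler") == "example_kind":
            func = {**func, "ir": "tir"}
        lowered[name] = func
    return lowered


def apply_relay_to_tir(module):
    """Run the registered hook, if any, for each target kind in the module."""
    kinds = {f["compiler"] for f in module.values() if f.get("compiler")}
    for kind in sorted(kinds):
        hook = RELAY_TO_TIR_HOOKS.get(kind)
        if hook is not None:
            module = hook(module)
    return module
```

Functions whose kind has no registered hook are left untouched here, standing in for whatever the default compilation path would do with them.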
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
… only to `/docs` (apache#9031)

* Add script to look for changes in doc dir

* Modify Jenkinsfile

* Minor changes in scripts

* Working Jenkinsfile on selective stages on docs

* Pass groovy formatter on Jenkinsfile

* Implementation of relay_to_tir target hook (apache#8423)

This is the first new hook proposed in the Additional Target Hooks RFC. Longer
term, compilation should move to using `Target` proper, but this unblocks our current work whilst illustrating the eventual interface via `Target` in `src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc`.

Ideally the host target would be annotated onto the `IRModule` so that this `Pass` could use it instead of defaulting to C, but this is fine for now.

* [CUDA] Fix dense tensorcore legalize type error when units is specified (apache#9030)

* Fix dense tensorcore legalize type error when units is specified

* revert black change due to different version from CI

* [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op (apache#9017)

* [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op

* Fix linter error for variable name and else after return

* Separate quantized avg_pool impl and add TODO for global_avg_pool

* Fix comment typo

* Fix line break in `setup.py` (apache#9029)

* [Onnx] Add SoftmaxCrossEntropyLoss (apache#8906)

* nll loss v1

* add converter

* decode strings in byte form

* decode variable length inputs

* make shapes correct

* unsqueeze

* proper weight handling

* simplify if statement

* fix tests

* add comment about tests

* delete extra file

* lint

* so cool

* Update CI Lint Image Version (apache#8841)

* Update CI Lint Image Version

* trigger

* [BUG] ToBasicBlockNormalForm immutability (apache#8778)

* ToBasicBlockNormalForm immutability

* better comment on ToBasicBlock

* refine comment of ToBasicBlockForm

* [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (apache#8807)

* [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm

This new benchmarking function is just a convenience function for
calling time_evaluator on the underlying module. Hopefully this should
make it easier for users to get good benchmarks of their code.

* formatting

* import order

* more test, more comments, more precision

* fix tests

* add seconds descriptions to doc

* Apply CPPLint to CRT Tests (apache#8844)

This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to pass linting.

* [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (apache#8584)

* [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost.

Added initial tunable autotvm templates for depthwise conv2d with
NHWC layout for Mali and Bifrost.

* [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost.

- Fix assert for Bifrost.
- Set reasonable default axis splits to avoid using tophub for NHWC.
- Fixed typo: arm cpu -> Mali.

* [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost.

* Support for CMSIS-NN in Corstone300 Makefile (apache#8831)

Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a

* [microtvm][Zephyr] Increase timeout to fix flaky tests (apache#8846)

* increase timeout

* trigger

* [AMP] Bump up tolerance on flaky test (apache#8850)

* bump up tol

* bumped tolerance up even more

* jostle ci

* [Hexagon] Rework tvm.target.hexagon() interface (apache#8823)

* [Hexagon] Rework tvm.target.hexagon() interface

Make the tvm.target.hexagon() function take most options as keyword
parameters. This will allow adding additional parameters without changing
the interface.

No changes are required to existing code, except for changing positional
parameters following the CPU version to keyword parameters, and updating
the names of the keyword parameters:
  sim_args  -> sim_options,
  llvm_args -> llvm_options,
although the old names will be accepted for the time being.

* formatting

* change ' to "

* Rename 'args' to 'config' for clarity

* Use 'strip' instead of 'replace'

* Restart build

* [Pattern matching] Add an option to rewrite the graph only once (apache#8843)

* [Pattern matching] Add an option to rewrite the graph only once

If the graph returned from the callback consists of the original
pattern, the rewriter will run in the loop, which is not always desired.
So this patch proposes an option to run the rewriter only once.

Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06

* Make rewrite_once default to false

Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0

* update gpu and cpu (apache#8853)

* VTA cmake change to include Verilator header for building tsim library (apache#8797)

* VTA cmake file requires Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI

* Refactor Verilator check conditions

* Build TSIM only for CPU target. CPU target doesn't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target.

* remove build/libvta_tsim.so from non tsim targeting builds

* Revert to enable TSIM build i386. Revert to -Werror in CPU config. Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings

* [FIX] Bug fix for a floormod rewrite simplify rule (apache#8852)

* Update rewrite_simplify.cc

* Update test_arith_rewrite_simplify.py

* Update test_arith_rewrite_simplify.py

* Update test_arith_rewrite_simplify.py

* move rust lint script (apache#8726)

* [AMP] Disallow fp16 conversion for summation-like ops (apache#8810)

* [AMP] Disallow fp16 conversion for summation-like ops

* test only structural equality

* [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (apache#8605)

* [topi] add spconv2d_3x3 nhwc

* [relay] sparse_conv2d: add kernel_size attr

* [relay] add strategy for spconv2d_3x3 nhwc

* [relay] pass to convert spconv2d with const args

* [relay] convert sparse conv2d pass fixes

* use array for sparse conv2d attr

* fixup 1x1 tests; new 3x3 tests

* extend repeat_interleave op for relay.Expr (apache#8839)

Co-authored-by: Valery Chernov <[email protected]>

* Change AOT from ExprVisitor to MixedModeVisitor (apache#8856)

This should allow better scalability for AOT when targeting larger networks.

* Add a PaddlePaddle Frontend (apache#8645)

* fix some problems for matmul

* fix some problems for matmul

* add alpha parameter for matmul

* remove unnecessary condition

* add TranslatedLayer which supports models loaded by jit.load

* add mul operator support

* Add padding mode support for conv/pool2d

* support 4 two-tuples

* add paddle test case

* add paddle conv2d  case

* update test_forward.py

* fix paddle convert_matmul

* add paddle multiply and matmul op test case

* add test case and fix bug

* delete import pandas

* add paddlepaddle tests

* modify the variable name of convert_reshape

* formatting

* formatting

* use black to format python code

* pylint check

* Remove fluid api

* black format

Co-authored-by: root <[email protected]>
Co-authored-by: wjj19950828 <[email protected]>
Co-authored-by: heliqi <[email protected]>
Co-authored-by: Junru Shao <[email protected]>

* [Runtime] add set_output_zero_copy (apache#8497)

* Update graph_executor.h

* Update graph_executor.cc

* modify zero copy UT add set input zero copy

* modify C style

* add runtime test

* relay build generate the json

Co-authored-by: hwstaff <[email protected]>

* [Hexagon] Change declaration order of unique_ptr objects to fix crash (apache#8859)

A crash occurs when automatically deleting an instance of
CodeGenHexagon because the LLVMContext object has already been
freed. Objects of both types are created using unique_ptr, but
the object managed by the LLVMContext unique_ptr is passed to
CodeGenHexagon object (not as a unique_ptr).

This crash is fixed by moving the declaration of the LLVMContext
object before the CodeGenHexagon object. I'm not sure if this
is the best way to fix this, but it does fix the crash. Also,
in other files, the LLVMContext object is always created first.

Co-authored-by: Cahoon, Brendon <[email protected]>

* [Graph Executor, VM] Add end to end benchmarking of models (apache#8858)

Add benchmarking that includes overhead of transferring inputs and
outputs to and from the device. This should give an accurate measurement
of the runtime a user would see when using the model. This is
accomplished by adding functions that run from inputs to return values
into the graph executor and the VM.

* [UnitTests] Expose TVM pytest helpers as plugin (apache#8532)

* [UnitTests] Expose TVM pytest helpers as plugin

Previously, pytest helper utilities such as automatic parametrization
of `target`/`dev`, or `tvm.testing.parameter` were only available for
tests within the `${TVM_HOME}/tests` directory.  This PR extracts the
helper utilities into an importable plugin, which can be used in
external tests (e.g. one-off debugging).

* [UnitTests] Refactor the plugin-specific logic out into plugin.py.

* [UnitTests] Moved marker definition out to global variable.

* Remove AOT Executor header from Arduino project (apache#8857)

* [Community] @mdw-octoml -> Reviewer (apache#8868)

* [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (apache#8855)

* init

* fix

* Update src/tir/transforms/plan_update_buffer_allocation_location.cc

Co-authored-by: Ruihang Lai <[email protected]>

* Update src/tir/transforms/plan_update_buffer_allocation_location.cc

Co-authored-by: Ruihang Lai <[email protected]>

* address

Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>

* [Autoscheduler] Configurable workload keys (apache#8862)

* change workload keys

* remove binary string comparison

* append the tuple not every integer

* clean up

* lint

* dump workload keys to dags

* fix things

* change some strings

* misc fixes, add tests

* jostle ci

* [Tutorial][Executor] Fix the usage of executors in tutorials (apache#8586)

* fix: executor usage for keras tutorial

* fix: executor usage for onnx tutorial

* [Tutorial][Executor] Fix executors in tutorials

* [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (apache#8867)

* Simplify onnx input since name accesses are no longer supported.

* move Celu importer.

* [TIR] GetBlockReadWriteRegion (apache#8875)

* [TIR] GetBlockReadWriteRegion

* Fix black issue

* Use constant reference for the interface

* Fix lint issue

* [RISCV] Add support for llvm parameter -mabi (-target-abi) (apache#8860)

* [Community] @manupa-arm -> Committer (apache#8870)

* adding Manupa to the contributors list

* re-trigger CI

* [RPC] Fix ios_rpc build (apache#8864)

* [Vulkan][Target] Added the driver name to the vulkan target string. (apache#8882)

Driver name (e.g. "NVIDIA", "radv", "AMD open-source driver") is read
from the `driverName` property in
[VkPhysicalDeviceDriverProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceDriverProperties.html),
or is left as `"unknown_driver_name"` if the driver does not support
querying the driver name.

* [ONNX][TOPI] Support select_last_index for argmin/max (apache#8816)

* support select_last_index for argmin/max

* reverse conditions that were made by accident

* forward args in reduce.py

* make proper nodes for reduction ops

* remove complicated nested lambdas

* fix lambda capture for conversion

* forward more arguments

* forward more args

* enable onnx tests

* wrapping casts to remove ambiguity

* revert extraneous changes

* correct incorrect attrs being used for ops

* change attributes

* remove old impl

* register new attribute node

* clean up test

* reformat

* reformat

* coolio

* stable comparison

* casts to avoid ambiguity

* casting more

* correct arg passing

* fix broken input

* OneElementReduceAttrs-->ArgReduceAttrs

* reduce boilerplate

* change names

* remove log statement

* jostle ci

Co-authored-by: Andrew Zhao Luo <[email protected]>

* refactor optimize GEMM on CPU tutorial (apache#8825)

* refactor optimize GEMM on CPU tutorial

* fix lint errors

* fix more lint errors

* fix typo

* fix problem with redefinition of `k`
add TODO and comments around loop unrolling
clarify note on the array packing figure

* reword general description of array packing

* grab kaxis from compute definition

* remove duplicate comments on unrolling

* Change target string to Target object in the TE compiler and interpreter (apache#8835)

* Initial changes

Ftarget string -> Target object works!

* Fix remaining target strings

* fix bad rebase

* Fix typo

* 1 more bad rebase fix

* Lint

* typo

* Forgot to commit this

* Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn

* Passing most tests, yay

* remove some comments

* lint

* target-str-to-target-object

* Respond to change requests

Co-authored-by: Jared Roesch <[email protected]>

* [TensorIR][M2a] CacheRead/Write (apache#8863)

Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Wuwei Lin <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>

* [CI] make pre-commit hooks to run on every push instead of every commit (apache#8888)

* [TVMScript] Fix printing ForNode annotations (apache#8891)

* [1/10] CMSIS-NN graph partitioner for softmax (apache#8653)

* cmsis graph partitioner for softmax

Change-Id: I80ecd7bc5351f241b4674ef53b36e4398c8adb83

* Updated docstring in the partitioning function

Change-Id: Ieb4b623e5929cfdb6aa0235db64c825fac8d7055

* [microTVM][RVM] Add Arduino RVM (apache#8748)

* Functioning Arduino Vagrant VM

Begin building Arduino Vagrant VM

Mostly working Vagrant VM

Changes for debugging

Add ignored json file

Fix venv path

* Generalize parts of RVM for multiple platforms

cwd hack

Add unit tests from apps directory to task_python_microtvm.sh

Generalize parts of RVM for multiple platforms

* Add Vagrantfile lint exceptions

* Address PR comments

Address Mehrdad's PR comments

More PR comments

Documentation tweaks

Add dialout group to user

* Rerun tests

* Spresense fix

* Rerun CI tests

* Rerun tests

* sce loss example

* add comments, remove other tests

* lint

* lint

* jostle

* lint up

* jostle

* uncomment some tests

* proper return

* clean up

* lint

* minor merge errors

Co-authored-by: Andrew Zhao Luo <[email protected]>
Co-authored-by: Mehrdad Hessar <[email protected]>
Co-authored-by: Jiawei Liu <[email protected]>
Co-authored-by: Tristan Konolige <[email protected]>
Co-authored-by: Christopher Sidebottom <[email protected]>
Co-authored-by: Anastasia Stulova <[email protected]>
Co-authored-by: Ashutosh Parkhi <[email protected]>
Co-authored-by: Krzysztof Parzyszek <[email protected]>
Co-authored-by: Elen Kalda <[email protected]>
Co-authored-by: Anton Sorokin <[email protected]>
Co-authored-by: Chenfan <[email protected]>
Co-authored-by: masahi <[email protected]>
Co-authored-by: Tantalus13A98B5F <[email protected]>
Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: wjj19950828 <[email protected]>
Co-authored-by: heliqi <[email protected]>
Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Swift.Sun <[email protected]>
Co-authored-by: hwstaff <[email protected]>
Co-authored-by: Cahoon, Brendon <[email protected]>
Co-authored-by: Lunderberg <[email protected]>
Co-authored-by: Yizhi Liu <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Josh Fromm <[email protected]>
Co-authored-by: Alexander Pivovarov <[email protected]>
Co-authored-by: Thierry Moreau <[email protected]>
Co-authored-by: Egor Churaev <[email protected]>
Co-authored-by: Adam Straw <[email protected]>
Co-authored-by: Lily Orth-Smith <[email protected]>
Co-authored-by: Jared Roesch <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Wuwei Lin <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Michalis Papadimitriou <[email protected]>
Co-authored-by: Gavin Uberti <[email protected]>

* [Hexagon] Don't use {} initialization with FastRPC structures (apache#9033)

The data members in FastRPC structures aren't guaranteed to remain
in the same order. Replace aggregate initialization with direct,
member-by-member initialization.
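
A minimal sketch of the pattern described above, not the code from the commit itself. `RemoteBuffer` and `MakeBuffer` are hypothetical names standing in for an SDK-provided FastRPC structure, whose real types and fields are not shown in this log. The point is only that positional `{}` initialization depends on member order, while assignment by name does not.

```cpp
#include <cstdint>

// Hypothetical stand-in for an SDK-provided structure (assumption: the real
// FastRPC types and member names differ and are not shown in this log).
struct RemoteBuffer {
  void* data;
  std::uint32_t size;
  std::uint32_t flags;
};

// Fragile form, shown only as a comment:
//   RemoteBuffer buf{ptr, size, flags};
// Aggregate initialization binds values by position, so it silently changes
// meaning (or stops compiling) if the SDK reorders the members.

// Robust form: assign every member by name, independent of declaration order.
RemoteBuffer MakeBuffer(void* ptr, std::uint32_t size, std::uint32_t flags) {
  RemoteBuffer buf;
  buf.data = ptr;
  buf.size = size;
  buf.flags = flags;
  return buf;
}
```

Every member is assigned before the structure is returned, so leaving out the braces does not leave any field uninitialized in this sketch.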

* Test

* Minor checkstyle issue

* Test

* Test file

* Revert changes in unit tests

* Change script name

* Test

* Revert format on groovy file

* Remove test file

* Minor change in script

* Minor formatting changes

* Revert logic in conditions for changed files

Co-authored-by: Christopher Sidebottom <[email protected]>
Co-authored-by: masahi <[email protected]>
Co-authored-by: Anirudh Sundar <[email protected]>
Co-authored-by: Leandro Nunes <[email protected]>
Co-authored-by: AndrewZhaoLuo <[email protected]>
Co-authored-by: Andrew Zhao Luo <[email protected]>
Co-authored-by: Mehrdad Hessar <[email protected]>
Co-authored-by: Jiawei Liu <[email protected]>
Co-authored-by: Tristan Konolige <[email protected]>
Co-authored-by: Christopher Sidebottom <[email protected]>
Co-authored-by: Anastasia Stulova <[email protected]>
Co-authored-by: Ashutosh Parkhi <[email protected]>
Co-authored-by: Krzysztof Parzyszek <[email protected]>
Co-authored-by: Elen Kalda <[email protected]>
Co-authored-by: Anton Sorokin <[email protected]>
Co-authored-by: Chenfan <[email protected]>
Co-authored-by: Tantalus13A98B5F <[email protected]>
Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: Valery Chernov <[email protected]>
Co-authored-by: Jason <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: wjj19950828 <[email protected]>
Co-authored-by: heliqi <[email protected]>
Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: Swift.Sun <[email protected]>
Co-authored-by: hwstaff <[email protected]>
Co-authored-by: Cahoon, Brendon <[email protected]>
Co-authored-by: Lunderberg <[email protected]>
Co-authored-by: Yizhi Liu <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Ruihang Lai <[email protected]>
Co-authored-by: Josh Fromm <[email protected]>
Co-authored-by: Alexander Pivovarov <[email protected]>
Co-authored-by: Thierry Moreau <[email protected]>
Co-authored-by: Egor Churaev <[email protected]>
Co-authored-by: Adam Straw <[email protected]>
Co-authored-by: Lily Orth-Smith <[email protected]>
Co-authored-by: Jared Roesch <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Wuwei Lin <[email protected]>
Co-authored-by: Hongyi Jin <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Gavin Uberti <[email protected]>
Labels
status: need RFC need RFC discussion