[CI] Include static builds of the runtime as part of CI #3

Open · wants to merge 231 commits into base: main
Conversation

gigiblender
Owner

No description provided.

masahi and others added 5 commits December 13, 2022 20:10
… buffer (apache#13605)

* Fix PlanAndUpdateBufferAllocationLocation not visiting constant buffer

* add comment
…ache#13414)

Enable depthwise conv2d NHWC with HWIO kernel layout.  The default kernel layout is HWOI, matched to previous behavior.
…che#13602)

* Add support for SequenceAt and SplitToSequence to onnx importer

* Formatting

* Change keepdims comparison

* Only unify non-tuples in If
…#13606)

* introduce LowerToPrimFunc to lower Relay func to TIR prim func

* add doc

* expose to python

* adding test

* another minor doc update

* Verify that the input is a primitive function
…CopyConstants scheduler (apache#13588)

In Ethos-U, the CopyConstants scheduler currently copies weights for all operators. But in Vela, there are a number of scenarios where the weights are not buffered in SRAM, and the FullyConnected case is one of them.
mehrdadh and others added 24 commits December 14, 2022 09:25
Pass `std::nullopt` to initialization of `PassBuilder` for `PGOOptions`.
LLVM is moving away from its own `Optional` type to `std::optional`.
…13616)

default_rng was introduced in numpy 1.19, which is not present
even in Ubuntu 20.04 (it comes with 1.17.4).
…abase (apache#13611)

[Metaschedule] Align get_top_k logic in MemoryDatabase and JSONDatabase
…ase (apache#13618)

* fixed tensor core batch_matmul legalize for transpose_b = False case

* add test

* clean up
…che#13615)

In the Relay Matmul shape relation, we are a little overeager about unifying dynamic shapes. If one of the shapes is static, it does not need to be unified. This change only rewrites dynamic shapes to the required static constraints.

* Remove overwriting of matmul shapes when they are static

* Simplify nesting

* Add shape check to dense tests.
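The dynamic-vs-static unification rule described above can be sketched in plain Python. This is a hedged illustration only: `unify_matmul_dim` and the `"?"` dynamic-dim marker are invented for this sketch and are not TVM's actual relation code.

```python
def unify_matmul_dim(lhs, rhs, dyn="?"):
    # A static dim is never overwritten; only a dynamic dim is
    # rewritten to the static constraint required by the other side.
    if lhs == dyn and rhs != dyn:
        return rhs
    if rhs == dyn and lhs != dyn:
        return lhs
    # Both static (assumed already checked for equality) or both dynamic.
    return lhs

print(unify_matmul_dim("?", 128))  # 128
print(unify_matmul_dim(128, "?"))  # 128
print(unify_matmul_dim(64, 64))    # 64
```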
[Frontend] [ONNX] Support sequence_lens of GRU.

Support converting the sequence_lens input of GRU.
* [ETHOSN] Add support for experimental compiler option

The support library currently supports enabling the experimental
cascading compiler option via an environment variable
`FORCE_EXPERIMENTAL_COMPILER`. This commit exposes the ability to
enable this option through TVMC.
…#13622)

* Fix round-trippable printing of multi-thread env binding

* add unittest
* [TOPI][Hexagon] Implement global_avg_pool2d for hexagon

* Fix name

* Fix lint issues

* Use get_hexagon_target()
apache#13613)

* Add check for non-contiguous memory access when lowering to async dma copies.

* lint

* lint and nits

* lint
* [MetaSchedule] Fix tensorcore winograd task extraction

* add test

* fixed target
Please join us in welcoming @multiverstack-intellif as a new reviewer to
TVM. Min contributed key features in TIR scheduling, specifically
cache_read/write that are aware of cache location.
These are huge improvements that are technically profound and helpful
to the overall TVM stack. Therefore, it would be good to give him more
opportunities to participate more deeply in the community.

- [Commits History](https://github.com/apache/tvm/commits?author=multiverstack-intellif)
- [Code Review](https://github.com/apache/tvm/pulls?q=reviewed-by:multiverstack-intellif)
This PR addresses fixes for UMA target registration.
* Fixes the doc issue apache#13304
* Continues the stalled PR apache#12731

Changes:
* Incorporates all proposed fixes from the mentioned [PR apache#12731](apache#12731)
* Addresses test case concerns and discussions from [PR apache#12731](apache#12731)
* **NEW:** An already existing target cannot be created again; an explicit error is raised in this case.
* **NEW:** Attributes with a special/reserved scope cannot be created explicitly.

It also adds proper test cases for all of the above.
…afe (apache#13609)

Protect CheckSemaphore with a mutex, ensuring that only one thread can add a semaphore when it doesn't already exist.
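This is the classic check-then-insert race. A minimal Python sketch of the mutex-guarded pattern (illustrative only: `SemaphorePool` and its method names are invented here and are not TVM's actual C++ API):

```python
import threading

class SemaphorePool:
    def __init__(self):
        self._lock = threading.Lock()
        self._semaphores = {}

    def check_semaphore(self, key):
        # Hold the lock across both the existence check and the insert,
        # so two threads cannot both observe "missing" and create twice.
        with self._lock:
            if key not in self._semaphores:
                self._semaphores[key] = threading.Semaphore(0)
            return self._semaphores[key]
```

Without the lock, two threads could interleave between the membership check and the assignment and each create a distinct semaphore for the same key.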
…rsions above 3.8 too (apache#13635)

Currently, `relay.testing.py_converter` checks for _exactly_ Python 3.8 in order to use certain updated signatures in the `ast` library. However, those signatures are also correct for versions _above_ 3.8. This PR changes the bounds check so that the converter will also work above 3.8.
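The bounds change amounts to replacing an equality test on the version tuple with a lower-bound test. A hedged sketch (the helper name is invented for illustration, not the converter's actual code):

```python
import sys

def use_new_ast_signatures(version_info=sys.version_info):
    # Old, buggy check: version_info[:2] == (3, 8)
    # Fixed check: 3.8 or any later version.
    return tuple(version_info[:2]) >= (3, 8)

print(use_new_ast_signatures((3, 8, 0)))   # True
print(use_new_ast_signatures((3, 11, 2)))  # True
print(use_new_ast_signatures((3, 7, 9)))   # False
```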
…3623)

* add set_output and test for set_output_zero_copy in python

* clean up

* clean up test

* test finished

* remove set output

* remove setoutput from header

* use zero copy for params

* fix typo

* address comments

* address comments

* add second test for set_input params

* add requires_torch

* add requires torch

* remove pytest

* add error handling for c graph executor

* better handling
This PR fixes a parser crash that occurs when the old value of a var is an array but the new value is not. For example:

```python
import numpy as np
from tvm.script import tir as T
def func_wrapper(shape, dtype):
    @T.prim_func
    def test_case():
        a = T.alloc_buffer(shape, dtype=dtype)
    
    return test_case


if __name__ == "__main__":
    a = np.zeros((10, 10), dtype="int8")
    print(func_wrapper((256, 256), dtype="int8").script())
```

In the above code, there are two assignments to the var 'a'. In the global scope its value is a numpy array, but inside the prim function it is a Buffer. A table named 'name2value' tracks the value of vars like 'a' here.
When the parser wants to update a var's value, it compares the new assignment against the old one. Here the problem arises: when we use '==' to compare an array with a value, the result is an array too, which cannot be used directly as the condition of an if statement. So the code above emits an error:

```shell
error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
 --> /workspace/code_newest/tvm/private_test/test_meta_programming.py:16:9
    |  
 16 |          a = T.alloc_buffer(shape, dtype=dtype)
    |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

This PR fixes this by changing "==" to "is".

Co-authored-by: lightzhan-intellif <[email protected]>
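The ambiguity described above is easy to reproduce in plain numpy, which also shows why identity comparison is safe here:

```python
import numpy as np

a = np.zeros((2, 2), dtype="int8")

# Elementwise "==" yields an array; coercing it to a single bool raises.
try:
    if a == a:
        pass
except ValueError as err:
    print("ambiguous:", err)

# "is" always yields a single bool, so it is safe as an if-condition.
print(a is a)                 # True
print(a is np.zeros((2, 2)))  # False
```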
…pache#13325)

* [TRANSFORM] Fix virtual device annotation issue with BYOC subgraphs

A heterogeneous module partitioned by BYOC has function nodes without any VirtualDevice
definition (i.e. with a FullyUnconstrained device). Ignoring the device here leaves
expr_virtual_devices_ empty when PopVirtualDevice is called on the assumption that the
preceding PushVirtualDevice succeeded. PushVirtualDevice and PopVirtualDevice occur as a
pair around a function body, so it is better to push the VirtualDevice even for the
unconstrained case and pop it subsequently.

* * Test case

Co-authored-by: Siva Rama Krishna Reddy B <[email protected]>
This PR tries to fix the following bug:

```python
from tvm.script import tir as T


def test_var_capturing_order():
    b = 2

    @T.prim_func
    def test_case():
        k: T.int32 = b


if __name__ == "__main__":
    b = 1
```

In the prim func `test_case`, the value of `b` should be 2, rather than 1. The parser wrongly let global vars shadow the values of nonlocal vars; the order should be the reverse.

Co-authored-by: lightzhan-intellif <[email protected]>
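Python itself resolves a free variable through the enclosing (nonlocal) scope before the global scope, which is the order the parser fix restores. A plain-Python illustration, without TVMScript:

```python
b = 1  # global binding

def make_closure():
    b = 2  # enclosing (nonlocal) binding

    def inner():
        # The nonlocal b must win over the global b.
        return b

    return inner

print(make_closure()())  # 2
```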
Removing some minor code path that is not used any longer.
Temporarily constrain the version of numpy to workaround the
deprecated value used in mxnet. See apache#13647.

Change-Id: Ib271c223447c76b855fe35cc8a1e77411a3fa441
- New process memory allocator is used to create buffers and networks.
- Support for the 22.08 stack has been kept intact in the sources and tests until the new docker image is built and used.
- Tests were modified to meet limitations imposed on input zero point and kernel size by the NPU software.
- Removed defining ETHON_API_VERSION from the cmake infra.
mehrdadh and others added 29 commits February 1, 2023 11:30
This commit updates the microTVM code to use Zephyr 3.2 and SDK 0.15.2.
                                                                       
As a result of this change, there are a few other changes that are     
included:           

- A launch script was added for Zephyr and Arduino template project to 
run the Project API server from a different Python ENV than TVM Python 
ENV;

- For Zephyr, the launch script uses global python3.8 which is where   
west is registered. However, for Arduino it uses a separate virtual ENV
with python3 version that exists in the host;
                                                                       
- tests/micro/project_api/test_arduino_microtvm_api_server.py was      
removed since these tests were using Arduino microTVM API server by    
importing it from TVM. We no longer support Arduino/Zephyr dependencies
in TVM testing python ENV;
                                                                       
- Disables a demo and test due to a CMSIS-NN bug:                      
[Bug] CMSIS-NN BYOC fails with Zephyr 3.2 apache#13856;                      
                
There will be a follow up work to move Zephyr to a completely separate 
virtual ENV as it was done in this commit for Arduino in the launch
script.
…e#13891)

This PR adds Tensor Core intrinsics with `shared.dyn` scope and changes the default rules to use `shared.dyn`.

Here are the performance improvements for GEMM 1024x1024x1024 on my device (RTX-3080):

| | Use `shared` | Use `shared.dyn` | Speedup |
| ----------- | ----------------- | ----------------- | ------- |
| fp 16-16-16 | 66399.8766 GFLOPs | 71778.3808 GFLOPs | 8.1% |
| fp 16-16-32 | 44292.5893 GFLOPs | 49070.2514 GFLOPs | 10.8% |

cc @vinx13 @junrushao @masahi
normalize test_crt.py temporary directory usage
remove test_crt.py unused code and variables
fix test_crt.py to pass pylint and add to pylint script
Dear community:

Please join us to welcome Wrongtest (@wrongtest-intellif) as a new PMC member.

Wrongtest has been actively improving and contributing to the TIR Schedule and arithmetic analysis implementation. He has been actively involved in several components of TVM, top to bottom,
including but not limited to:

- Relay
- TOPI
- MetaSchedule
- TIR
- TVMScript
- Arithmetic analysis in TIR
- Codegen for Cuda and LLVM
- Runtime

He has a deep understanding of one of TVM's complete compilation flows (Relay-TIR-Codegen). He focuses on applying TVM to his company's NPU backend, contributing features back to the community, and cultivating more active community contributors at his company.

In addition, he has been very actively participating in the community, discussing and sharing his ideas in the forum. He has also been actively managing PRs and issues.

* [Commits History](https://github.com/apache/tvm/commits?author=wrongtest-intellif)
* [Code Review](https://github.com/apache/tvm/pulls?q=+reviewed-by%3Awrongtest-intellif+)
* [Community Forum Summary](https://discuss.tvm.apache.org/u/wrongtest/summary)
* feat: combine cutlass and ansor

* use sm80 and disable run_benchmark

* fix lint error

* use tempfile; fix dangerous default value

* merge cutlass_ansor test into test_cutlass.py

* fix lint

---------

Co-authored-by: hanqingchang <[email protected]>
This PR extends the values accepted by the USE_LIBBACKTRACE cmake variable. It adds the ability to search for a pre-compiled libbacktrace in system paths or a user-provided path. The behavior of this variable will be:
- OFF: Don't use libbacktrace.
- ON: Find libbacktrace in system paths. Fail the build generation if libbacktrace is not found.
- COMPILE: Build and link to libbacktrace from 3rdparty/libbacktrace.
- \<PATH\>: Look for the libbacktrace header and static lib in a user-provided path. Fail the build generation if libbacktrace is not found.
- AUTO:
  - Find libbacktrace in system paths.
  - If not found, fall back to COMPILE on Linux or MacOS, and to OFF on Windows and other platforms.
…apache#13849)

* Assertion failed during tuning

* Cleanup

* Do not commit

* Do not commit

* Undo fix + provide test for multithread random filling

* Random fill test with fix enabled

* Isolate the effect of this test on the other tests

* Correct the typo in the function name

* Import threading + lint
…e#13877)

* initial commit

* update additional use cases

* typo

* asf header, summary

* clean up

* lint

* move code to src/runtime/file_utils.h

* file utils is cool
…3826)

[Torch] Fix advanced indexing with NoneType index
* [Hexagon][QNN] Improve performance of qnn.mul

This commit improves the performance of the qnn.mul operation without QNN
canonicalization.

* [QNN][Relay][Topi] Add qnn.dense with weight layout

This commit adds a new Relay operation "qnn.contrib_dense_pack" that supports
a different weights layout (nn.dense and qnn.dense do not support this
attribute). This new operation is a full analog of the "nn.contrib_dense_pack"
operation, but in QNN space.
…pache#13874)

* [TOPHUB] use keys as a keyword for searching of existing statistics

In the case of ARM we might not specify -device, in which case llvm will
be used; but even then we can determine the proper filename for the
statistics, since the keys have the architecture defined. The same
situation holds for x86.

* Add test on target not having arm_cpu device

* minor fix, add comment

* Fix pylint

* Fix comment
* [OpenCL] Implement save/load pre-compiled programs

Using pre-compiled programs might significantly improve inference time
of the first run.

- Added methods `SupportPreCompiledPrograms` which reports if the module
  supports using pre-compiled programs.
- Method `GetPreCompiledPrograms` returns string with bytes of
  pre-compiled programs.
- Method `SetPreCompiledPrograms` allows user to pass pre-compiled
  programs to the module.

* Fix lint

* Apply comment: PackedFunc is used

* Fix build

* Fix CI and rename functions

* Apply comments
- Fix a bug where `buffer_decl`, combined with certain
  usage patterns of the resulting buffer, caused a TVM-internal
  assert failure during TIR-compilation.
* [CLML][CODEGEN] CLML native codegen utility

This util generates native CLML code given a DNN model.
It does import via tvmc, extracts clml_modules, get the json source and
finally generates clml_models.cc that holds source for various sub graphs.
The cpp_clml tool has additional infrastructure to compile it as a standalone
binary that runs these models.

This PR adds the symbol name to the generated JSON graph.
It also extends the const_loader interface to get constant params.

* * review comments

* * review

* * review
…d.py script (apache#13905)

As the PR title states, the purpose of this PR is to pass the path parameter given to cmake_build on to the task_build.py script. With this PR we will be able to control compilation in different directories.
* add base class for bitwise operations. BitwiseAnd, BitwiseNot, BitwiseOr and BitwiseXor were implemented

* add test for BitwiseAnd, BitwiseNot, BitwiseOr, BitwiseXor operations to ONNX front-end

* add test of BitShift for ONNX front-end

* fix dtype for test

* skip test due to old version of ORT

---------

Co-authored-by: Valery Chernov <[email protected]>
A simple tvmc tune command currently results in a huge wall of warnings
about target_host parameter being deprecated, even when the user hasn't
provided a target-host cmd line argument.

We can prevent that from happening by simply not providing the default
target-host to tvmc. Also, ensure that when the user does provide
target-host, we print the warning once, not 500 times.
)

This PR updates the order of arithmetic analyzer simplification, by
adding a stage of canonical simplification at the very beginning so
that every simplification always starts with a canonical round. This
is because the rewrite simplification may destroy some PrimExpr property
that the canonical simplification can make use of. Therefore, adding
the canonical one in the front can maximize the use of canonical
simplification.
Issue apache#13911 reported that newer versions of setuptools
crash with the version constraint "<=1.23.*", this commit
implements the suggested fix by using "<=1.23".
This PR is the bug fix reported in apache#13892. Initially, we mix the logic of `LetStmt` docsifying method with and without concise scoping. For example, in
```python
x = T.var("int32")
with T.let(x, 0):
```
`x` in the `LetStmt` works as a right value, while in
```python
x: T.int32 = 0
```
`x` in the `LetStmt` works as a left value instead.
Our old logic mixed the two together and generated wrong code for the first case.
Meanwhile, during the fix, we found another bug in concise scoping check. For example, we have
```python
x = T.var("int32")
y = T.var("int32")
with T.let(x, y):
  with T.let(y, 0):
```
here we should not output
```python
x = T.var("int32")
y = T.var("int32")
with T.let(x, y):
  y: int32 = 0
```
because this will actually define a new `y_1: int32 = 0`, due to the variable shadowing logic of the parser, which is different from the `y` we defined and refer to.
Our concise scoping `v: ... = ...` should launch if and only if the `v` is never defined before.
Otherwise, we use `with T.let(v, ...):` instead.
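The rule can be sketched as a small decision helper. This is a hedged illustration: `render_let` and the hard-coded `int32` are inventions for this sketch, not the printer's real code.

```python
def render_let(var, value, defined_vars):
    # Concise form `v: int32 = value` only when v was never defined
    # before; otherwise an explicit scope avoids shadowing a prior v.
    if var not in defined_vars:
        defined_vars.add(var)
        return f"{var}: int32 = {value}"
    return f"with T.let({var}, {value}):"

defined = set()
print(render_let("y", "0", defined))  # y: int32 = 0
print(render_let("y", "1", defined))  # with T.let(y, 1):
```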
…ter (apache#13859)

This PR refactors the output of `assert_structural_equal`. Unlike the old version, which directly printed the mismatching nodes, the improved version prints the whole scripts with the mismatching nodes underlined, and also prints the `ObjectPath` to the mismatching nodes for easier debugging. For example, given the following functions

```python
@T.prim_func
def func1(a: T.handle, b: T.handle):
  A = T.match_buffer(a, (128, 128))
  B = T.match_buffer(b, (128, 128))

@T.prim_func
def func2(a: T.handle, b: T.handle):
  A = T.match_buffer(a, (128, 128))
  B = T.match_buffer(b, (128, 256))
```

the log of `assert_structural_equal(func1, func2)` will be like

```python
ValueError: StructuralEqual check failed, caused by lhs at <root>.buffer_map[b].shape[1].value:
# from tvm.script import tir as T

@T.prim_func
def main(a: T.handle, b: T.handle):
  A = T.match_buffer(a, (128, 128))
  B = T.match_buffer(b, (128, 128))
                              ^^^
  T.evaluate(0)
and rhs at <root>.buffer_map[b].shape[1].value:
# from tvm.script import tir as T

@T.prim_func
def main(a: T.handle, b: T.handle):
  A = T.match_buffer(a, (128, 128))
  B = T.match_buffer(b, (128, 256))
                              ^^^
  T.evaluate(0)
```

instead of

```python
ValueError: StructuralEqual check failed, caused by lhs:
128
and rhs:
256
```

which is sometimes hard to read.
Rescale parameters have been added for binary elementwise operation in accordance with the Vela API (rescale field in NpuElementWiseOperation https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/refs/tags/3.5.0/ethosu/vela/api.py#381). This PR is preparation for implementation of softmax operation.
This patch undoes the change that was put in place to prevent the build and installation of NNPACK from failing due to a renaming of the default branch to main by the NNPACK external dependency cpuinfo.

See apache#13871

The issue has been fixed at the source by PR apache#214, which is now merged into NNPACK, so the change to `ubuntu_install_nnpack.sh` is no longer required:

Maratyszcza/NNPACK#214
Leaving class definitions was not correctly handled when recreating
scoping information. The fix correctly pops scope whenever the
indentation level becomes less than the current scope.
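A minimal sketch of the indentation-based popping (illustrative only; `pop_closed_scopes` is invented for this sketch and is not the actual scope-tracking code):

```python
def pop_closed_scopes(scope_stack, indent):
    # Each stack entry is the indentation level that opened a scope.
    # When the current line is indented less than the innermost scope
    # (e.g. after leaving a class body), that scope has closed.
    while scope_stack and indent < scope_stack[-1]:
        scope_stack.pop()
    return scope_stack

print(pop_closed_scopes([0, 4, 8], 4))  # [0, 4]
print(pop_closed_scopes([0, 4, 8], 0))  # [0]
```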
The rules that rewrite `min(floordiv(x + (A-1), A) * A, max(x, A))`
and `min(truncdiv(x + (A-1), A) * A, max(x, A))` into `max(x, A)` did
not have sufficiently tight bounds.  The `truncdiv` rule required that
`x >= 0`, while the `floordiv` rule had no requirement on `x`.  In
both cases, the simplification was incorrect when `x==0`, as it would
result in a rewrite from `min(0, max(0, A))` into `max(0, A)`.

This commit updates each of these rules to require that `x >= 1`, which
excludes the failing `x == 0` case.
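The counterexample is easy to check numerically, with plain Python standing in for the TIR rewrite:

```python
def lhs(x, a):
    # min(floordiv(x + (A-1), A) * A, max(x, A)), with floordiv as //
    return min((x + (a - 1)) // a * a, max(x, a))

def rhs(x, a):
    # The (previously unguarded) rewrite target.
    return max(x, a)

print(lhs(0, 4), rhs(0, 4))  # 0 4  -> the rewrite is wrong at x == 0
print(lhs(1, 4), rhs(1, 4))  # 4 4
print(lhs(5, 4), rhs(5, 4))  # 5 5
```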
On machines with several OpenCL platforms (e.g. Intel CPU and
NVidia GPU) it was only possible to use an OpenCL device from one
platform. And when Intel was the first in the platforms list,
it wasn't possible to run a model on the NVidia GPU.

In this PR the init function was modified, and it is now possible to use
OpenCL devices from different platforms. When there are several GPU
accelerators, it is possible to select one of them by device id. The
code below prints the device names of two OpenCL devices:
```python
import tvm

print("opencl 0: ", tvm.opencl(0).device_name)
print("opencl 1: ", tvm.opencl(1).device_name)
```

If the machine doesn't contain any GPUs, we will try to
use the CPU, provided an OpenCL runtime is available.
Certain tests that make use of `pytest_wrapper.py`, such as those
triggered by `task_python_integration.sh` will fail when a "PLATFORM"
environment variable is not set within the Docker container.

When using `build.sh` to both create a container, and run a command to
execute one of the tests that requires a "PLATFORM", an error will occur
due to the missing environment variable.

This patch is necessary to add support for this environment variable to
`build.sh` and prevent such errors.