[CI] Include static builds of the runtime as part of CI #3

Open · wants to merge 231 commits into base: main
Conversation

gigiblender
Owner

No description provided.

masahi and others added 5 commits December 13, 2022 20:10
… buffer (apache#13605)

* Fix PlanAndUpdateBufferAllocationLocation not visiting constant buffer

* add comment
…ache#13414)

Enable depthwise conv2d NHWC with HWIO kernel layout.  The default kernel layout is HWOI, matched to previous behavior.
…che#13602)

* Add support for SequenceAt and SplitToSequence to onnx importer

* Formatting

* Change keepdims comparison

* Only unify non-tuples in If
…#13606)

* introduce LowerToPrimFunc to lower Relay func to TIR prim func

* add doc

* expose to python

* adding test

* another minor doc update

* Verify that the input is a primitive function
…CopyConstants scheduler (apache#13588)

In Ethos-U, the CopyConstants scheduler currently copies weights for all operators. But in Vela, there are a number of scenarios where the weights are not buffered in SRAM, and the FullyConnected case is one of them.
mehrdadh and others added 24 commits December 14, 2022 09:25
Pass `std::nullopt` to initialization of `PassBuilder` for `PGOOptions`.
LLVM is moving away from its own `Optional` type to `std::optional`.
…13616)

default_rng was introduced in numpy 1.19, which is not present
even in Ubuntu 20.04 (it comes with 1.17.4).
…abase (apache#13611)

[Metaschedule] Align get_top_k logic in MemoryDatabase and JSONDatabase
…ase (apache#13618)

* fixed tensor core batch_matmul legalize for transpose_b = False case

* add test

* clean up
…che#13615)

In the Relay Matmul shape relation, we are a little overeager about unifying dynamic shapes. If one of the shapes is static, it does not need to be unified. This change only rewrites dynamic shapes to the required static constraints.

* Remove overwriting of matmul shapes when they are static

* Simplify nesting

* Add shape check to dense tests.
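The dynamic-vs-static unification rule described above can be sketched in plain Python. This is a hedged illustration only: `unify_matmul_dim` and the `"?"` dynamic-dim marker are invented for this sketch and are not TVM's actual relation code.

```python
def unify_matmul_dim(lhs, rhs, dyn="?"):
    # A static dim is never overwritten; only a dynamic dim is
    # rewritten to the static constraint required by the other side.
    if lhs == dyn and rhs != dyn:
        return rhs
    if rhs == dyn and lhs != dyn:
        return lhs
    # Both static (assumed already checked for equality) or both dynamic.
    return lhs

print(unify_matmul_dim("?", 128))  # 128
print(unify_matmul_dim(128, "?"))  # 128
print(unify_matmul_dim(64, 64))    # 64
```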
[Frontend] [ONNX] Support sequence_lens of GRU.

Support converting the sequence_lens input of GRU.
* [ETHOSN] Add support for experimental compiler option

The support library currently supports enabling the experimental
cascading compiler option via an environment variable
`FORCE_EXPERIMENTAL_COMPILER`. This commit exposes the ability to
enable this option through TVMC.
…#13622)

* Fix round-trippable printing of multi-thread env binding

* add unittest
* [TOPI][Hexagon] Implement global_avg_pool2d for hexagon

* Fix name

* Fix lint issues

* Use get_hexagon_target()
apache#13613)

* Add check for non-contiguous memory access when lowering to async dma copies.

* lint

* lint and nits

* lint
* [MetaSchedule] Fix tensorcore winograd task extraction

* add test

* fixed target
Please join us in welcoming @multiverstack-intellif as a new reviewer to
TVM. Min contributed key features in TIR scheduling, specifically
cache_read/write that are aware of cache location.
These are huge improvements that are technically profound and helpful
to the overall TVM stack. Therefore, it would be good to give him more
opportunities to participate more deeply in the community.

- [Commits History](https://github.com/apache/tvm/commits?author=multiverstack-intellif)
- [Code Review](https://github.com/apache/tvm/pulls?q=reviewed-by:multiverstack-intellif)
This PR addresses fixes for UMA target registration.
* Fixes the doc issue apache#13304
* Continues the stalled PR apache#12731

Changes:
* Incorporates all proposed fixes from the mentioned [PR apache#12731](apache#12731)
* Addresses test case concerns and discussions from [PR apache#12731](apache#12731)
* **NEW:** An already existing target cannot be created again; an explicit error is raised in this case.
* **NEW:** Attributes with a special/reserved scope cannot be created explicitly.

It also adds proper test cases for all of the above.
…afe (apache#13609)

Protect CheckSemaphore with a mutex, ensuring that only one thread can add a semaphore when it doesn't already exist.
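This is the classic check-then-insert race. A minimal Python sketch of the mutex-guarded pattern (illustrative only: `SemaphorePool` and its method names are invented here and are not TVM's actual C++ API):

```python
import threading

class SemaphorePool:
    def __init__(self):
        self._lock = threading.Lock()
        self._semaphores = {}

    def check_semaphore(self, key):
        # Hold the lock across both the existence check and the insert,
        # so two threads cannot both observe "missing" and create twice.
        with self._lock:
            if key not in self._semaphores:
                self._semaphores[key] = threading.Semaphore(0)
            return self._semaphores[key]
```

Without the lock, two threads could interleave between the membership check and the assignment and each create a distinct semaphore for the same key.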
…rsions above 3.8 too (apache#13635)

Currently, `relay.testing.py_converter` checks for _exactly_ Python 3.8 in order to use certain updated signatures in the `ast` library. However, those signatures are also correct for versions _above_ 3.8. This PR changes the bounds check so that the converter will also work above 3.8.
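The bounds change amounts to replacing an equality test on the version tuple with a lower-bound test. A hedged sketch (the helper name is invented for illustration, not the converter's actual code):

```python
import sys

def use_new_ast_signatures(version_info=sys.version_info):
    # Old, buggy check: version_info[:2] == (3, 8)
    # Fixed check: 3.8 or any later version.
    return tuple(version_info[:2]) >= (3, 8)

print(use_new_ast_signatures((3, 8, 0)))   # True
print(use_new_ast_signatures((3, 11, 2)))  # True
print(use_new_ast_signatures((3, 7, 9)))   # False
```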
…3623)

* add set_output and test for set_output_zero_copy in python

* clean up

* clean up test

* test finished

* remove set output

* remove setoutput from header

* use zero copy for params

* fix typo

* address comments

* address comments

* add second test for set_input params

* add requires_torch

* add requires torch

* remove pytest

* add error handling for c graph executor

* better handling
This PR fixes a parser crash that occurs when the old value of a var is an array but the new value is not. For example:

```python
import numpy as np
from tvm.script import tir as T
def func_wrapper(shape, dtype):
    @T.prim_func
    def test_case():
        a = T.alloc_buffer(shape, dtype=dtype)
    
    return test_case


if __name__ == "__main__":
    a = np.zeros((10, 10), dtype="int8")
    print(func_wrapper((256, 256), dtype="int8").script())
```

In the above code, there are two assignments to the var 'a'. In the global scope its value is a numpy array, but inside the prim function it is a Buffer. A table named 'name2value' tracks the value of vars like 'a' here.
When the parser wants to update a var's value, it compares the new assignment against the old one. Here the problem arises: when we use '==' to compare an array with a value, the result is an array too, which cannot be used directly as the condition of an if statement. So the code above emits an error:

```shell
error: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
 --> /workspace/code_newest/tvm/private_test/test_meta_programming.py:16:9
    |  
 16 |          a = T.alloc_buffer(shape, dtype=dtype)
    |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```

This PR fixes this by changing "==" to "is".

Co-authored-by: lightzhan-intellif <[email protected]>
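The ambiguity described above is easy to reproduce in plain numpy, which also shows why identity comparison is safe here:

```python
import numpy as np

a = np.zeros((2, 2), dtype="int8")

# Elementwise "==" yields an array; coercing it to a single bool raises.
try:
    if a == a:
        pass
except ValueError as err:
    print("ambiguous:", err)

# "is" always yields a single bool, so it is safe as an if-condition.
print(a is a)                 # True
print(a is np.zeros((2, 2)))  # False
```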
…pache#13325)

* [TRANSFORM] Fix virtual device annotation issue with BYOC subgraphs

A heterogeneous module partitioned by BYOC has function nodes without any VirtualDevice
definition (i.e. with a FullyUnconstrained device). Ignoring the device here leaves
expr_virtual_devices_ empty when PopVirtualDevice is called on the assumption that the
preceding PushVirtualDevice succeeded. PushVirtualDevice and PopVirtualDevice occur as a
pair around a function body, so it is better to push the VirtualDevice even for the
unconstrained case and pop it subsequently.

* * Test case

Co-authored-by: Siva Rama Krishna Reddy B <[email protected]>
This PR tries to fix the following bug:

```python
from tvm.script import tir as T


def test_var_capturing_order():
    b = 2

    @T.prim_func
    def test_case():
        k: T.int32 = b


if __name__ == "__main__":
    b = 1
```

In the prim func `test_case`, the value of `b` should be 2, rather than 1. The parser wrongly let global vars shadow the values of nonlocal vars; the order should be the reverse.

Co-authored-by: lightzhan-intellif <[email protected]>
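Python itself resolves a free variable through the enclosing (nonlocal) scope before the global scope, which is the order the parser fix restores. A plain-Python illustration, without TVMScript:

```python
b = 1  # global binding

def make_closure():
    b = 2  # enclosing (nonlocal) binding

    def inner():
        # The nonlocal b must win over the global b.
        return b

    return inner

print(make_closure()())  # 2
```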
Removing some minor code path that is not used any longer.
Temporarily constrain the version of numpy to workaround the
deprecated value used in mxnet. See apache#13647.

Change-Id: Ib271c223447c76b855fe35cc8a1e77411a3fa441
- New process memory allocator is used to create buffers and networks.
- Support for the 22.08 stack has been kept intact in the sources and tests until the new docker image is built and used.
- Tests were modified to meet limitations imposed on input zero point and kernel size by the NPU software.
- Removed defining ETHON_API_VERSION from the cmake infra.
mehrdadh and others added 29 commits February 1, 2023 11:30
This commit updates the microTVM code to use Zephyr 3.2 and SDK 0.15.2.
                                                                       
As a result of this change, there are a few other changes that are     
included:           

- A launch script was added for Zephyr and Arduino template project to 
run the Project API server from a different Python ENV than TVM Python 
ENV;

- For Zephyr, the launch script uses global python3.8 which is where   
west is registered. However, for Arduino it uses a separate virtual ENV
with python3 version that exists in the host;
                                                                       
- tests/micro/project_api/test_arduino_microtvm_api_server.py was      
removed since these tests were using Arduino microTVM API server by    
importing it from TVM. We no longer support Arduino/Zephyr dependencies
in TVM testing python ENV;
                                                                       
- Disables a demo and test due to a CMSIS-NN bug:                      
[Bug] CMSIS-NN BYOC fails with Zephyr 3.2 apache#13856;                      
                
There will be a follow up work to move Zephyr to a completely separate 
virtual ENV as it was done in this commit for Arduino in the launch
script.
…e#13891)

This PR adds Tensor Core intrinsics with `shared.dyn` scope and changes the default rules to use `shared.dyn`.

Here are the performance improvements for GEMM 1024x1024x1024 on my device (RTX-3080):

| | Use `shared` | Use `shared.dyn` | Speedup |
| ----------- | ----------------- | ----------------- | ------- |
| fp 16-16-16 | 66399.8766 GFLOPs | 71778.3808 GFLOPs | 8.1% |
| fp 16-16-32 | 44292.5893 GFLOPs | 49070.2514 GFLOPs | 10.8% |

cc @vinx13 @junrushao @masahi
normalize test_crt.py temporary directory usage
remove test_crt.py unused code and variables
fix test_crt.py to pass pylint and add to pylint script
Dear community:

Please join us to welcome Wrongtest (@wrongtest-intellif) as a new PMC member.

Wrongtest has been actively improving and contributing to the TIR Schedule and arithmetic analysis implementation. He has been actively involved in several components of TVM, top to bottom,
including but not limited to:

- Relay
- TOPI
- MetaSchedule
- TIR
- TVMScript
- Arithmetic analysis in TIR
- Codegen for Cuda and LLVM
- Runtime

He has a deep understanding of one of TVM's complete compilation flows (Relay-TIR-Codegen). He focuses on applying TVM to his company's NPU backend, contributing features back to the community, and cultivating more active community contributors at his company.

In addition, he has been very actively participating in the community, discussing and sharing his ideas in the forum. He has also been actively managing PRs and issues.

* [Commits History](https://github.com/apache/tvm/commits?author=wrongtest-intellif)
* [Code Review](https://github.com/apache/tvm/pulls?q=+reviewed-by%3Awrongtest-intellif+)
* [Community Forum Summary](https://discuss.tvm.apache.org/u/wrongtest/summary)
* feat: combine cutlass and ansor

* use sm80 and disable run_benchmark

* fix lint error

* use tempfile; fix dangerous default value

* merge cutlass_ansor test into test_cutlass.py

* fix lint

---------

Co-authored-by: hanqingchang <[email protected]>
This PR extends the values accepted by the USE_LIBBACKTRACE cmake variable. It adds the ability to search for a pre-compiled libbacktrace in system paths or a user-provided path. The behavior of this variable will be:
- OFF: Don't use libbacktrace.
- ON: Find libbacktrace in system paths. Fail the build generation if libbacktrace is not found.
- COMPILE: Build and link to libbacktrace from 3rdparty/libbacktrace.
- \<PATH\>: Look for the libbacktrace header and static lib in a user-provided path. Fail the build generation if libbacktrace is not found.
- AUTO:
  - Find libbacktrace in system paths.
  - If not found, fall back to COMPILE on Linux or MacOS, and to OFF on Windows and other platforms.
…apache#13849)

* Assertion failed during tuning

* Cleanup

* Do not commit

* Do not commit

* Undo fix + provide test for multithread random filling

* Random fill test with fix enabled

* Isolate the effect of this test on the other tests

* Correct the typo in the function name

* Import threading + lint
…e#13877)

* initial commit

* update additional use cases

* typo

* asf header, summary

* clean up

* lint

* move code to src/runtime/file_utils.h

* file utils is cool
…3826)

[Torch] Fix advanced indexing with NoneType index
* [Hexagon][QNN] Improve performance of qnn.mul

This commit improves the performance of the qnn.mul operation without QNN
canonicalization.

* [QNN][Relay][Topi] Add qnn.dense with weight layout

This commit adds a new Relay operation "qnn.contrib_dense_pack" that supports
a different weights layout (nn.dense and qnn.dense do not support this
attribute). This new operation is a full analog of the "nn.contrib_dense_pack"
operation, but in QNN space.
…pache#13874)

* [TOPHUB] use keys as a keyword for searching of existing statistics

In the case of ARM we might not specify -device, in which case llvm will
be used; but even then we can determine the proper filename for the
statistics, since the keys have the architecture defined. The same
situation holds for x86.

* Add test on target not having arm_cpu device

* minor fix, add comment

* Fix pylint

* Fix comment
* [OpenCL] Implement save/load pre-compiled programs

Using pre-compiled programs might significantly improve inference time
of the first run.

- Added methods `SupportPreCompiledPrograms` which reports if the module
  supports using pre-compiled programs.
- Method `GetPreCompiledPrograms` returns string with bytes of
  pre-compiled programs.
- Method `SetPreCompiledPrograms` allows user to pass pre-compiled
  programs to the module.

* Fix lint

* Apply comment: PackedFunc is used

* Fix build

* Fix CI and rename functions

* Apply comments
- Fix a bug where `buffer_decl`, combined with certain
  usage patterns of the resulting buffer, caused a TVM-internal
  assert failure during TIR-compilation.
* [CLML][CODEGEN] CLML native codegen utility

This util generates native CLML code given a DNN model.
It does import via tvmc, extracts clml_modules, get the json source and
finally generates clml_models.cc that holds source for various sub graphs.
The cpp_clml tool has additional infrastructure to compile it as a standalone
binary that runs these models.

This PR adds the symbol name to the generated JSON graph.
It also extends the const_loader interface to get constant params.

* * review comments

* * review

* * review
…d.py script (apache#13905)

As the PR title states, the purpose of this PR is to pass the path parameter given to cmake_build on to the task_build.py script. With this PR we will be able to control compilation in different directories.
* add base class for bitwise operations. BitwiseAnd, BitwiseNot, BitwiseOr and BitwiseXor were implemented

* add test for BitwiseAnd, BitwiseNot, BitwiseOr, BitwiseXor operations to ONNX front-end

* add test of BitShift for ONNX front-end

* fix dtype for test

* skip test due to old version of ORT

---------

Co-authored-by: Valery Chernov <[email protected]>
A simple tvmc tune command currently results in a huge wall of warnings
about target_host parameter being deprecated, even when the user hasn't
provided a target-host cmd line argument.

We can prevent that from happening by simply not providing the default
target-host to tvmc. Also, ensure that when the user does provide
target-host, we print the warning once, not 500 times.
)

This PR updates the order of arithmetic analyzer simplification, by
adding a stage of canonical simplification at the very beginning so
that every simplification always starts with a canonical round. This
is because the rewrite simplification may destroy some PrimExpr property
that the canonical simplification can make use of. Therefore, adding
the canonical one in the front can maximize the use of canonical
simplification.
Issue apache#13911 reported that newer versions of setuptools
crash with the version constraint "<=1.23.*", this commit
implements the suggested fix by using "<=1.23".
This PR is the bug fix reported in apache#13892. Initially, we mix the logic of `LetStmt` docsifying method with and without concise scoping. For example, in
```python
x = T.var("int32")
with T.let(x, 0):
```
`x` in the `LetStmt` works as a right value, while in
```python
x: T.int32 = 0
```
`x` in the `LetStmt` works as a left value instead.
Our old logic mixed the two together and generated wrong code for the first case.
Meanwhile, during the fix, we found another bug in concise scoping check. For example, we have
```python
x = T.var("int32")
y = T.var("int32")
with T.let(x, y):
  with T.let(y, 0):
```
here we should not output
```python
x = T.var("int32")
y = T.var("int32")
with T.let(x, y):
  y: int32 = 0
```
because this will actually define a new `y_1: int32 = 0`, due to the variable shadowing logic of the parser, which is different from the `y` we defined and refer to.
Our concise scoping `v: ... = ...` should launch if and only if the `v` is never defined before.
Otherwise, we use `with T.let(v, ...):` instead.
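The rule can be sketched as a small decision helper. This is a hedged illustration: `render_let` and the hard-coded `int32` are inventions for this sketch, not the printer's real code.

```python
def render_let(var, value, defined_vars):
    # Concise form `v: int32 = value` only when v was never defined
    # before; otherwise an explicit scope avoids shadowing a prior v.
    if var not in defined_vars:
        defined_vars.add(var)
        return f"{var}: int32 = {value}"
    return f"with T.let({var}, {value}):"

defined = set()
print(render_let("y", "0", defined))  # y: int32 = 0
print(render_let("y", "1", defined))  # with T.let(y, 1):
```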
…ter (apache#13859)

This PR refactors the output of `assert_structural_equal`. Unlike the old version, which directly printed the mismatching nodes, the improved version prints the whole scripts with the mismatching nodes underlined, and also prints the `ObjectPath` to the mismatching nodes for easier debugging. For example, given the following functions

```python
@T.prim_func
def func1(a: T.handle, b: T.handle):
  A = T.match_buffer(a, (128, 128))
  B = T.match_buffer(b, (128, 128))

@T.prim_func
def func2(a: T.handle, b: T.handle):
  A = T.match_buffer(a, (128, 128))
  B = T.match_buffer(b, (128, 256))
```

the log of `assert_structural_equal(func1, func2)` will be like

```python
ValueError: StructuralEqual check failed, caused by lhs at <root>.buffer_map[b].shape[1].value:
# from tvm.script import tir as T

@T.prim_func
def main(a: T.handle, b: T.handle):
  A = T.match_buffer(a, (128, 128))
  B = T.match_buffer(b, (128, 128))
                              ^^^
  T.evaluate(0)
and rhs at <root>.buffer_map[b].shape[1].value:
# from tvm.script import tir as T

@T.prim_func
def main(a: T.handle, b: T.handle):
  A = T.match_buffer(a, (128, 128))
  B = T.match_buffer(b, (128, 256))
                              ^^^
  T.evaluate(0)
```

instead of

```python
ValueError: StructuralEqual check failed, caused by lhs:
128
and rhs:
256
```

which is sometimes hard to read.
Rescale parameters have been added for binary elementwise operation in accordance with the Vela API (rescale field in NpuElementWiseOperation https://review.mlplatform.org/plugins/gitiles/ml/ethos-u/ethos-u-vela/+/refs/tags/3.5.0/ethosu/vela/api.py#381). This PR is preparation for implementation of softmax operation.
This patch undoes the change that was put in place to prevent the build and installation of NNPACK from failing due to a renaming of the default branch to main by the NNPACK external dependency cpuinfo.

See apache#13871

The issue has been fixed at the source by PR apache#214, which is now merged into NNPACK, so the change to `ubuntu_install_nnpack.sh` is no longer required:

Maratyszcza/NNPACK#214
Leaving class definitions was not correctly handled when recreating
scoping information. The fix correctly pops scope whenever the
indentation level becomes less than the current scope.
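A minimal sketch of the indentation-based popping (illustrative only; `pop_closed_scopes` is invented for this sketch and is not the actual scope-tracking code):

```python
def pop_closed_scopes(scope_stack, indent):
    # Each stack entry is the indentation level that opened a scope.
    # When the current line is indented less than the innermost scope
    # (e.g. after leaving a class body), that scope has closed.
    while scope_stack and indent < scope_stack[-1]:
        scope_stack.pop()
    return scope_stack

print(pop_closed_scopes([0, 4, 8], 4))  # [0, 4]
print(pop_closed_scopes([0, 4, 8], 0))  # [0]
```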
The rules that rewrite `min(floordiv(x + (A-1), A) * A, max(x, A))`
and `min(truncdiv(x + (A-1), A) * A, max(x, A))` into `max(x, A)` did
not have sufficiently tight bounds.  The `truncdiv` rule required that
`x >= 0`, while the `floordiv` rule had no requirement on `x`.  In
both cases, the simplification was incorrect when `x==0`, as it would
result in a rewrite from `min(0, max(0, A))` into `max(0, A)`.

This commit updates each of these rules to require that `x >= 1`, which
excludes the failing `x == 0` case.
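The counterexample is easy to check numerically, with plain Python standing in for the TIR rewrite:

```python
def lhs(x, a):
    # min(floordiv(x + (A-1), A) * A, max(x, A)), with floordiv as //
    return min((x + (a - 1)) // a * a, max(x, a))

def rhs(x, a):
    # The (previously unguarded) rewrite target.
    return max(x, a)

print(lhs(0, 4), rhs(0, 4))  # 0 4  -> the rewrite is wrong at x == 0
print(lhs(1, 4), rhs(1, 4))  # 4 4
print(lhs(5, 4), rhs(5, 4))  # 5 5
```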
On machines with several OpenCL platforms (e.g. Intel CPU and
NVidia GPU) it was only possible to use an OpenCL device from one
platform. And when Intel was the first in the platforms list,
it wasn't possible to run a model on the NVidia GPU.

In this PR the init function was modified, and it is now possible to use
OpenCL devices from different platforms. When there are several GPU
accelerators, it is possible to select one of them by device id. The
code below prints the device names of two OpenCL devices:
```python
import tvm

print("opencl 0: ", tvm.opencl(0).device_name)
print("opencl 1: ", tvm.opencl(1).device_name)
```

If the machine doesn't contain any GPUs, we will try to
use the CPU, provided an OpenCL runtime is available.
Certain tests that make use of `pytest_wrapper.py`, such as those
triggered by `task_python_integration.sh` will fail when a "PLATFORM"
environment variable is not set within the Docker container.

When using `build.sh` to both create a container, and run a command to
execute one of the tests that requires a "PLATFORM", an error will occur
due to the missing environment variable.

This patch is necessary to add support for this environment variable to
`build.sh` and prevent such errors.