upate #8

jiangjiajun · 2021-08-13T09:56:30Z

Thanks for contributing to TVM! Please refer to guideline https://tvm.apache.org/docs/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from Reviewers by @ them in the pull request thread.

* [Refactor] Avoid Override Generic Op Strategy in "hls.py" * Fix The Broken CI Test Cases

Set the number of cores for scripts and builds that run inside the RVM based on the number of cores specified for the VM.
Currently Vagrant does not set the environment variable TVM_CI_NUM_CORES to the number of cores available in the VM it creates. As a consequence, the scripts and builds that run inside the VM after it is created (such as the ones used to build TVM and QEMU) fall back to the default of only 2 CPUs and leave the remaining cores unused when the VM has more than 2.
This commit sets TVM_CI_NUM_CORES to the number of cores available in the VM created by Vagrant, so the builds, which read that environment variable to decide how many CPUs to use, can use all of them and finish faster.
Signed-off-by: Gustavo Romero <[email protected]>
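The fallback behaviour described above can be sketched in Python. This is only an illustration: `ci_num_cores` is a hypothetical helper, not a function from the TVM scripts, and the default of 2 is taken from the commit message.

```python
import os


def ci_num_cores(default: int = 2) -> int:
    """Return the core count from TVM_CI_NUM_CORES, or `default` if unusable.

    Hypothetical helper: the real build scripts may read the variable differently.
    """
    raw = os.environ.get("TVM_CI_NUM_CORES", "")
    try:
        cores = int(raw)
    except ValueError:
        # Variable unset or not a number: fall back to the default.
        return default
    return cores if cores > 0 else default
```

With the variable unset the helper returns 2, which matches the under-utilisation the commit describes; once Vagrant exports the variable, the same call returns the VM's core count.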

- Move "from_device" argument definition from "vulkan" target to all targets.
- Add device querying to TargetInternal::FromConfig, using "from_device" argument. If present, these have lower priority than explicitly-specified attributes, but higher priority than the default attribute values.
- Add default no-op DeviceAPI::GetTargetProperty.
Co-authored-by: Eric Lunderberg <[email protected]>

* [Runtime] Add graph_executor get_input_index API. In the graph_executor use case, a user can call set_input with an input index to set an input parameter, but there is no straightforward way to get the correct index from an input name. This commit provides a get_input_index API for that purpose.
* Update python/tvm/contrib/graph_executor.py Co-authored-by: Cody Yu <[email protected]>
* Update python/tvm/contrib/graph_executor.py Co-authored-by: Cody Yu <[email protected]>
* Update src/runtime/graph_executor/graph_executor.cc Co-authored-by: Cody Yu <[email protected]>
* Update python/tvm/contrib/graph_executor.py Co-authored-by: Cody Yu <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
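The name-to-index lookup can be illustrated with a toy stand-in. `ToyExecutor` is hypothetical and is not the TVM graph executor; returning -1 for an unknown name is an assumption made for this sketch, not a documented guarantee.

```python
class ToyExecutor:
    """Toy stand-in for a graph executor; NOT the TVM class."""

    def __init__(self, input_names):
        self._input_names = list(input_names)
        self._inputs = [None] * len(self._input_names)

    def get_input_index(self, name):
        """Return the position of `name`, or -1 if unknown (assumed convention)."""
        try:
            return self._input_names.index(name)
        except ValueError:
            return -1

    def set_input(self, index, value):
        """Set an input by its integer index."""
        self._inputs[index] = value


executor = ToyExecutor(["data", "weight"])
idx = executor.get_input_index("weight")
if idx >= 0:
    executor.set_input(idx, [1.0, 2.0])
```

The point of the lookup is that callers who only know the input name can still use the index-based setter without hard-coding positions.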

* [Target] Allow for spaces in target attributes. Some target parameters, such as the device_name on vulkan, have spaces in them. This prevented round-trips between string and Target objects, which can occur in some cases.
* [Vulkan] Fixed "device_name" property querying.
* [Target] Switched from escaped spaces to quoted spaces. An attribute previously written as -attr=value\ with\ spaces is now written as -attr='value with spaces'.
Co-authored-by: Eric Lunderberg <[email protected]>
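A minimal sketch of the quoted-space round trip, using only the Python standard library. `format_attrs` and `parse_attrs` are hypothetical helpers and do not reflect how TVM's Target parser is implemented; the attribute values are illustrative, and values containing a single quote are not handled.

```python
import shlex


def format_attrs(attrs):
    """Render attributes as -key=value, single-quoting values that contain spaces."""
    parts = []
    for key, value in attrs.items():
        text = str(value)
        if " " in text:
            text = "'" + text + "'"
        parts.append(f"-{key}={text}")
    return " ".join(parts)


def parse_attrs(text):
    """Parse the string form back into a dict of strings, honouring quotes."""
    attrs = {}
    for token in shlex.split(text):
        key, _, value = token.lstrip("-").partition("=")
        attrs[key] = value
    return attrs


original = {"device_name": "Some GPU Model", "max_num_threads": "256"}
encoded = format_attrs(original)
decoded = parse_attrs(encoded)
```

Here `encoded` is `-device_name='Some GPU Model' -max_num_threads=256`, and `decoded` equals `original`, which is the round-trip property the commit is after.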

* [AMP] Do not allow fp16 cast on arange inputs * add test * Add comment explaining the issue with fp16 "end"

Platform boards passed to base-box-tool.py need to be a subset of the platform boards supported by 'tests/micro/zephyr --microtvm-platforms='. Currently base-box-tool.py only accepts the 'stm32f746xx' ST board, which is not supported by 'tests/micro/zephyr --microtvm-platforms='. As a consequence, if one passes '--microtvm-platform=stm32f746xx' to base-box-tool.py, the 'tests/micro/zephyr' test will fail. This commit fixes it by adding two new platforms to base-box-tool.py ('stm32f746xx_nucleo' and 'stm32f746xx_disco'), which are supported by tests/micro/zephyr, and by removing the nonexistent 'stm32f746xx' platform. The new platform boards are quite similar and share the same USB VID and PID. Signed-off-by: Gustavo Romero <[email protected]>

- Pass parameters through TVMRetValue as std::string instead of runtime::String
- Remove escaping of spaces inside quotes for target attributes. Updated unit test to verify round-trip behavior.
- Added missing "device_type" query for Vulkan.
Co-authored-by: Eric Lunderberg <[email protected]>

We kill the RPC server in the del function. When a server coexists with remote resources in the same function scope, the destruction order is not determined. This can cause the server to be destructed before the actual remote array. As a side effect, it can sometimes cause a test to time out while waiting on the socket.

* Fix support for linking to only libtvm_runtime also ensures that the ResNet example uses the new support. * Fix build.rs to rebuild if the Python script changes Co-authored-by: Jared Roesch <[email protected]>

* fix * lint

#8660) Co-authored-by: Eric Lunderberg <[email protected]>

* Add transpose support for tensorrt batch_matmul * Address PR comment * Refactor to add ONNX_DEFAULT_CONFIGS

* fix * fix * lint

* [TENSORIR] Add `from_legacy_te_schedule` attr to TE PrimFuncs. The `from_legacy_te_schedule` attr marks PrimFuncs created from TE scheduling. Passes that only operate on TE scheduling check this attr and are no-ops if it is not found. If `from_legacy_te_schedule` is false or not set, the PrimFunc is assumed to be from TensorIR. Passes specific to TensorIR now check for the absence of this attr.
* formatting
* enable passes regardless of te or not
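The gating described in the first bullet might look roughly like the sketch below. Plain dictionaries stand in for PrimFuncs, and every function name here is hypothetical; this is not TVM code, and the last bullet suggests the actual gating may have been relaxed afterwards.

```python
# Hypothetical attr dictionaries stand in for PrimFunc attrs; this is not TVM code.
LEGACY_TE_ATTR = "from_legacy_te_schedule"


def is_from_legacy_te(attrs):
    """True only when the attr is present and truthy."""
    return bool(attrs.get(LEGACY_TE_ATTR, False))


def te_only_pass(func):
    """A TE-only pass: a no-op unless the function came from TE scheduling."""
    if not is_from_legacy_te(func["attrs"]):
        return func
    return {**func, "log": func.get("log", []) + ["te_only_pass"]}


def tensorir_only_pass(func):
    """A TensorIR-only pass: a no-op when the legacy TE attr is set."""
    if is_from_legacy_te(func["attrs"]):
        return func
    return {**func, "log": func.get("log", []) + ["tensorir_only_pass"]}


te_func = {"attrs": {LEGACY_TE_ATTR: True}}
tir_func = {"attrs": {}}
```

A missing attr and a false attr are treated the same way, matching the statement that an unset attr means the function is assumed to come from TensorIR.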

* Move flake8 to ci_lint This fixes the scenario where you lint with ci_lint but it can still fail in PR due to flake8 being injected only into the Mac build. * Disable flake8 until the docker changes have landed

* Add linear congruential engine. * Fix typo. * Minor fix. * Fix comments and intros. * Change to unsigned. * Minor comment fix. * Fix unsigned rand state to signed.
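For readers unfamiliar with the technique, a linear congruential engine advances its state as `state = (a * state + c) mod m`. The sketch below uses the constants of C++ `std::minstd_rand` (a = 48271, c = 0, m = 2^31 - 1); that choice is an assumption for illustration and is not taken from the commit. The class name is hypothetical.

```python
class LinearCongruentialSketch:
    """Illustrative LCG using std::minstd_rand constants (assumed, not from the commit)."""

    MODULUS = 2147483647  # 2**31 - 1
    MULTIPLIER = 48271
    INCREMENT = 0

    def __init__(self, seed=1):
        self.state = seed % self.MODULUS
        if self.state == 0:
            # With a zero increment, a zero state would stay zero forever.
            self.state = 1

    def next(self):
        """Advance the state and return it; values lie in [1, MODULUS - 1]."""
        self.state = (self.MULTIPLIER * self.state + self.INCREMENT) % self.MODULUS
        return self.state


engine = LinearCongruentialSketch(seed=1)
first = engine.next()
second = engine.next()
```

Because the whole generator is one integer of state, it is cheap to copy and to seed deterministically, which is the usual reason to prefer it for reproducible sampling.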

* fuse dense sum
* remove excess copying
* dev LSTM in ONNX
* alternative implementation of LSTM in onnx frontend. It is quicker than current one without tuning
* LSTM_dev2 was implemented in onnx frontend
* LSTM dev in pytorch frontend
* LSTM cell implementation was transferred to common place. Unnecessary code was removed
* lint fixes
* Weights permutation for LSTM layer in onnx frontend
* LSTM cell description was added
* arguments and values were renamed. descriptions of some methods were added
* LSTM output shape and activations input format were fixed in onnx frontend
* empty. tvm-ci test
* unbind method was transferred from onnx frontend to common.py
* unbind method was transferred from pytorch frontend to common.py
* lstm cell was transferred from op/layers.py to frontend/common.py
* clean up weight dictionary initialization
* fix pytorch frontend wrapper over unbind method
* minor fix of comments
* empty. tvm-ci test restart
* empty. tvm-ci test restart
Co-authored-by: Valery Chernov <[email protected]>

…d target (#8542)
* [Onnx][UnitTests] Excluded additional onnx tests
  - The onnx tests `test_basic_convinteger`, `test_convinteger_with_padding`, `test_range_float_type_positive_delta_expanded`, and `test_range_int32_type_positive_delta_expanded` don't run correctly on CUDA targets, so they are added to the exclusion.
  - Parametrized over the relative directory name, rather than the full directory name. This improves readability of the pytest output, and keeps the same parametrized test name across different python versions.
  - Changed the target-specific skips to check the target kind, rather than the full target string.
* [UnitTests] Apply correct requires_gpu() pytest marks for parametrized target. Previously, the addition of tvm.testing._target_to_requirement pytest marks was handled by the parametrize_targets function. The _auto_parametrize_target function assumed that a unit test that was already parametrized had all markings needed. If a unit test was explicitly parametrized using @pytest.mark.parametrize, these marks would be missing. In most cases, this explicit use of @pytest.mark.parametrize('target', ...) should be avoided, but it has value in the case of marking with multiple parameters with @pytest.mark.parametrize('target,other', ...). This use case isn't yet supported by the tvm.testing.parameters function. Therefore, if this occurs, detect it and add the appropriate marks.
* [UnitTest] Bugfix, applying requires_* markers to parametrized targets. The initial implementation did not work correctly with @tvm.testing.parametrize_targets. Also, went through all cases where "target" is used to parametrize on something other than a target string, and renamed them.
* [Onnx] Switched from using pytest.skip to tvm.testing.known_failing_targets. After merging of the `tvm.testing.parametrize_targets` and `tvm.testing._auto_parametrize_target` code paths, `known_failing_targets` can be used in both cases.
* [Testing] Enable `Target` object as argument to _target_to_requirement. Previously, tvm.testing._target_to_requirement required the argument to be a string. This commit allows it to be either a string or a `tvm.target.Target`.
* [Testing] Auto-target parametrization, handle pytest ParameterSet. If the unit test has already been parametrized with pytest.params to add parameter-specific marks, respect those existing marks. This can happen in some cases in the CI, and it is not yet clear what is causing them. It may be pytest-xdist related, but it is difficult to reproduce locally.
Co-authored-by: Eric Lunderberg <[email protected]>

…rgv (#8671)

… shared queue (#8658)

* add hex indicator to message * add pytest skip * trigger * trigger

* conv2d working, fixing conv2d_depthwise * Depthwise conv2d working. * Make convinteger work on cuda. * Simplify code and add tests. * Formatting. * Fixed fallback broadcasting. * Fix fallback broadcasting. * Formatting. * Fix lint * Merge with new test parameterization.

…#8529)
* [Topi][Testing] Minor cleanup for python reference implementations
  - Use input dtype for dilate/conv2d accumulate in python impl. Previously, the python implementations of dilation and conv2d would use numpy default dtype in some cases, rather than the input data's dtype.
  - Added fallback for datatypes not supported by scipy.signal.convolve2d (e.g. float16).
  - Refactored to avoid duplication, use common get_pad_tuple functionality.
* [Topi][UnitTests] Added float16 tests to test_topi_dense.py
* [Topi][UnitTests] Added float16 to test_topi_conv2d_nchw.py
* [Topi][Float16] Added float16 tests for depthwise conv2d.
* [UnitTests] Explicitly set seed for float16 tests. Intended to avoid flaky test failures later due to rounding errors.
* [UnitTests] Fixed a few failing unit tests.
  - ref_data must be a test fixture, not acquired through request.getfixturevalue, in order to have the random_seed be known.
  - dilate_python's return value didn't follow `out_dtype`.
  - The test_topi_conv3d tests had the reference results computed in float64, due to dilate_python() not respecting the input data type. With the correct dtype, the tolerances needed to be slightly widened.
Co-authored-by: Eric Lunderberg <[email protected]>

* Add Arduino CLI support to ci-qemu * Install latest version of Arduino SDK * Remove unnecessary --fix-missing * Tweak to clarify what URLs go with what * Retrigger CI * Temporarily replace buggy Spresense core

…ut (#8677) * add timeout * rename timeout and change timeout to a reasonable value * fix tests after project api merge * retrigger because of flaky test

Co-authored-by: Valery Chernov <[email protected]>

* Fix Rust CI * Turn Rust CI back on

* [Docs] Added documentation on pytest target parametrization. Follow-up from #8542, to document existing features. * [Docs] Updated pytest parametrization documentation following review Co-authored-by: Eric Lunderberg <[email protected]>

* Fix obvious memory leak in function.rs * Update object pointer

GPU memory is only released once the PackedFunc for evaluating the model is garbage-collected by Python. In CI we're noticing intermittent 'CUDA: Out of memory' failures while processing the tutorials, and tracing showed that no garbage collection was happening between items. Not confident this will solve the problem, but it is worth a try.
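The idea of forcing a collection between items can be sketched with the standard `gc` module. `Resource` and `run_with_gc` are hypothetical stand-ins; the actual change to the tutorial runner may be structured differently.

```python
import gc
import weakref


class Resource:
    """Stand-in for an object holding device memory (hypothetical)."""

    def __init__(self):
        # A reference cycle: reference counting alone cannot free this object.
        self.cycle = self


def run_with_gc(tasks):
    """Run each task, forcing a full garbage collection after every one."""
    for task in tasks:
        task()
        gc.collect()


live = []  # weak references let us observe whether each Resource was freed


def make_task():
    def task():
        live.append(weakref.ref(Resource()))

    return task


run_with_gc([make_task(), make_task()])
```

Without the explicit `gc.collect()`, objects caught in reference cycles are only reclaimed whenever the collector next happens to run, which may be after the next item has already tried to allocate.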

* refactor host to qemu * remove unused variables * remove skip-build arg * fix microtvm test script

* [Docker] Refactor/clean-up of docker/bash.sh
  - Added detailed help message, displayed using `-h` or `--help`.
  - Optional flags handled using `getopt`, can now occur in any order.
  - `--mount` flag may occur more than once.
  - Switched from short arguments to docker-run to long arguments (e.g. `--volume` instead of `-v`). Short arguments are good shortcuts for interactive work, but can be more difficult to read in longer scripts.
  - Mount the `.tvm_test_data` folder, to avoid re-downloading test data already available in the host environment.
* [Docker] docker/bash.sh CI fix. Dash-prefixed arguments as part of the command now require prefixing with -- to separate them from arguments intended for docker/bash.sh
* [Docker] docker/bash.sh, consistent quoting
* [Docker] Added --repo-mount-point for docker/bash.sh
* [Docker] Updated command-line parsing of docker/bash.sh
  - Maintained previous behavior, any unrecognized flags after the docker/bash.sh are part of the command, no -- is needed. (e.g. docker/bash.sh ci_gpu make -j2)
  - Reverted changes to Jenkinsfile to add a --, no longer needed.
* [Docker] Fixed multi-argument commands
* [Docker] docker/bash.sh check permissions before mounting ~/.tvm_test_data
* [Docker] Consistent workplace directory in docker/bash.sh for Jenkins. Some locations in the CI perform build commands outside of the build steps (e.g. tests/scripts/task_ci_setup.sh#L38), and cmake doesn't like it if the build directory changes. These should probably be moved into the build steps of the CI, and be packed in tvm_multilib in the Jenkinsfile, but for the meantime maintaining a consistent /workspace directory on all CI nodes allows cmake to run.
* [Docker] Updated bash.sh for MacOS compatibility. MacOS has an older version of bash that handles arrays slightly differently. All instances of array expansion `"${ARRAY[@]}"` should instead be written as `${ARRAY[@]+"${ARRAY[@]}"}`. Otherwise, `set -u` will erroneously complain about an undefined variable. See https://stackoverflow.com/a/61551944 for details. Even though this is an older version of bash (observed in version 3.2.57), this is the last major version available under GPLv2 and is therefore the default version on MacOSX. At some point, the `docker/bash.sh` could be migrated to python for ease of maintenance/testing.

* [Docs][UnitTest] Updated target parametrization documentation The intended audience are developers writing unit tests, or debugging unit tests that have failed. Therefore, moving the recommended style to the top of the section, and the implementation details to the bottom. * Documentation updates as recommended by tkonolige

* Refactor AOT Test Utils parameters into object. `compile_and_run` was getting quite complicated to understand as well as being mostly duplicated by `compile_and_run_multiple_models`. This patch pulls out some common parameters into a data class `AOTTestNetwork`, which makes it clearer what each parameter is doing and provides documentation.
* Rename Network -> Model and sizebytes -> size_bytes

* Convert AOT to TECompiler. This removes the dependency on "compile_engine.h" from aot_executor_codegen.cc. This required a few changes to how AOT was operating:
  * AOT run_model is now based on the post lowering main_module
  * AOTOnDemandAllocator is run twice to ensure SIDs are updated post-lowering
  * Moved to using tec::UpdateFunctionMetadata
  Tests are passing, but would appreciate other validation 😸
* Clarify reasoning behind replanning memory later
* Use main_func_info rather than bespoke logic in AOT. This moves from using the bespoke AOT UpdateMainWorkspaceSize to the LoweredModule main_func_info property to unify with Graph executor codegen.

* clean up typerel * add layout transform when input is 3D * add test * update doc to clarify that only 2D input data is supported * add weight_layout attribute in dense * remove explicit layout transform from dense_alter_op.py * Add DensePackInferCorrectLayout to insert layout transform * relax type rel * revert type rel relax and add check on dim * introduce DensePackAttrs to avoid breaking dense op * try fixing arm compute lib test * Update tests/python/contrib/test_arm_compute_lib/test_dense.py Co-authored-by: lhutton1 <[email protected]> * formatting Co-authored-by: lhutton1 <[email protected]>

Co-authored-by: Wuwei Lin <[email protected]>

* [UnitTest] Updated tolerances to avoid flaky unit test. The result was correct, but the atol was just small enough to trigger a CI error for a value that was close to zero in an unrelated PR at #8670. https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/PR-8670/16/pipeline/#step-236-log-1703 * Also updated 32-bit version of test_conv2d_nchw
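Why a small atol trips on values near zero can be seen from the usual NumPy-style closeness rule, `|actual - desired| <= atol + rtol * |desired|`. The sketch below is a scalar version with made-up numbers; `allclose_scalar` is a hypothetical helper and the values are not the ones from the failing CI run.

```python
def allclose_scalar(actual, desired, rtol=1e-5, atol=0.0):
    """Scalar form of the NumPy-style rule: |a - d| <= atol + rtol * |d|."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# Illustrative values only: the reference is close to zero, so the
# relative term rtol * |desired| contributes almost nothing.
desired = 1e-6
actual = 3e-6

tight = allclose_scalar(actual, desired, rtol=1e-5, atol=1e-7)
loose = allclose_scalar(actual, desired, rtol=1e-5, atol=1e-5)
```

With the tight atol the check fails even though the absolute error is only 2e-6; with the looser atol it passes. For reference values near zero, atol is effectively the only tolerance in play.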

* alternative chunk op was implemented in pytorch frontend. aten::unsafe_chunk was added to op map in pytorch frontend
* chunk was replaced by new one in pytorch frontend. it is 2.5 times faster
Co-authored-by: Valery Chernov <[email protected]>

This PR is part of the TensorIR upstreaming effort (#7527), which adds the one schedule primitive storage_align. Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Junru Shao <[email protected]>

* WIP support per-channel quantization * more WIP * More WIP * fix issue with per-channel bias_add * Fix fake quantize tests (#4) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Add Relu * One more little one (#5) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Fix requantize shape bug. * Non-working Per-channel Dense * Fix legalization for non spatial operators. (#6) * Fix legalization for non spatial operators. * Fix axis checks for end2end functionality. * fix axis normalization fix lint fix lint again * Per channel fq2i (#8) * WIP support per-channel quantization * more WIP * More WIP * fix issue with per-channel bias_add * Fix fake quantize tests (#4) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Add Relu * One more little one (#5) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Fix requantize shape bug. * Non-working Per-channel Dense * Fix legalization for non spatial operators. (#6) * Fix legalization for non spatial operators. * Fix axis checks for end2end functionality. * fix axis normalization fix lint fix lint again * Fix bug in requantize dimension expansion. * Format. Co-authored-by: Josh Fromm <[email protected]> * respond to review comments respond to review comments Co-authored-by: Josh Fromm <[email protected]>

* WIP support per-channel quantization * more WIP * More WIP * fix issue with per-channel bias_add * Fix fake quantize tests (#4) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Add Relu * One more little one (#5) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Fix requantize shape bug. * Non-working Per-channel Dense * Fix legalization for non spatial operators. (#6) * Fix legalization for non spatial operators. * Fix axis checks for end2end functionality. * fix axis normalization fix lint fix lint again * Per channel fq2i (#8) * WIP support per-channel quantization * more WIP * More WIP * fix issue with per-channel bias_add * Fix fake quantize tests (#4) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Add Relu * One more little one (#5) * Fixed fake quantize issues. * Formatting. * Cleanup unused imports * Fix real int8 tests. * Fix requantize shape bug. * Non-working Per-channel Dense * Fix legalization for non spatial operators. (#6) * Fix legalization for non spatial operators. * Fix axis checks for end2end functionality. * fix axis normalization fix lint fix lint again * Fix bug in requantize dimension expansion. * Format. Co-authored-by: Josh Fromm <[email protected]> * respond to review comments * start dtos * wip depth_to_space * dtos ident Co-authored-by: Matthew <[email protected]> Co-authored-by: Josh Fromm <[email protected]>

Matthew Brookhart and others added 30 commits August 3, 2021 09:28

Parametrize ONNX Unit tests (#8621)

5140d90

[Refactor] Avoid Override Generic Op Strategy in "hls.py" (#8614)

4b9d43e

* [Refactor] Avoid Override Generic Op Strategy in "hls.py" * Fix The Broken CI Test Cases

[Relay] Change Default "opt_level" of Sequential from 2 to 0 (#8634)

b9204cd

[AMP] Disallow fp16 conversion for arange op (#8644)

0ce7f6c

* [AMP] Do not allow fp16 cast on arange inputs * add test * Add comment explaining the issue with fp16 "end"

Fix rust rt link (#8631)

5b9b16c

* Fix support for linking to only libtvm_runtime also ensures that the ResNet example uses the new support. * Fix build.rs to rebuild if the Python script changes Co-authored-by: Jared Roesch <[email protected]>

[Frontend][Pytorch] add support for 'aten::upsample_bicubic2d' (#8648)

fe2cdf3

* fix * lint

[Bugfix][Target] Correct passing of target-queried bool/int parameters (

a495f95

#8660) Co-authored-by: Eric Lunderberg <[email protected]>

[TensorRT] Add transpose_a/b for TensorRT batch_matmul (#8607)

26c2a9a

* Add transpose support for tensorrt batch_matmul * Address PR comment * Refactor to add ONNX_DEFAULT_CONFIGS

[Fix][Frontend][TOPI] minor bugs (#8622)

874ea7a

* fix * fix * lint

[AutoScheduler] Fix deserialization of workload registry entry (#8662)

cdfae39

Move flake8 to ci_lint (#8652)

69ddb9b

* Move flake8 to ci_lint This fixes the scenario where you lint with ci_lint but it can still fail in PR due to flake8 being injected only into the Mac build. * Disable flake8 until the docker changes have landed

[Support] Linear Congruential Random Engine (#8642)

dc5da05

* Add linear congruential engine. * Fix typo. * Minor fix. * Fix comments and intros. * Change to unsigned. * Minor comment fix. * Fix unsigned rand state to signed.

[VM] Add get_input_index support. (#8661)

e1bb7ac

[VTA] Fix vta rpc server, refactor launch cond to not depend on sys.a…

a0cf2e9

…rgv (#8671)

[FIX] Fix threadpool reset by killing threads before destroying their…

40de9ce

… shared queue (#8658)

[microTVM][Zephyr] Add skip for AOT test (#8628)

338940d

* add hex indicator to message * add pytest skip * trigger * trigger

Allow rust tvm build configuration through cargo features (#8665)

49756a5

Add batch_matmul conversion to FQ2I pass (#8635)

392a757

guberti and others added 27 commits August 9, 2021 13:57

[microTVM] Add Arduino CLI support to ci-qemu (#8504)

39571c1

* Add Arduino CLI support to ci-qemu * Install latest version of Arduino SDK * Remove unnecessary --fix-missing * Tweak to clarify what URLs go with what * Retrigger CI * Temporarily replace buggy Spresense core

[AutoScheduler] Fix FLOPS estimation (#8695)

768becd

Improve the error message in module.cc (#8694)

b893774

[FIX] Correctly link to PAPI (#8691)

ad83636

[Torch] Fix ELU conversion (#8699)

b7488ef

Rev ci-qemu to 0.07 (#8698)

ade2d4d

[microTVM][Zephyr] Fix: Test fails on hardware because of short timeo…

334a021

…ut (#8677) * add timeout * rename timeout and change timeout to a reasonable value * fix tests after project api merge * retrigger because of flaky test

add in-place methods used by Tacotron2 to pytorch frontend (#8692)

1abd248

Co-authored-by: Valery Chernov <[email protected]>

[Rust] Restore the Rust CI testing after Docker image update (#8657)

55864b2

* Fix Rust CI * Turn Rust CI back on

[Rust][Fix] Memory leak (#8714)

09b989d

* Fix obvious memory leak in function.rs * Update object pointer

[microTVM] Zephyr Test Refactor (#8713)

e88fe77

* refactor host to qemu * remove unused variables * remove skip-build arg * fix microtvm test script

increase atol for float32 (#8712)

9586ee2

Remove qemu installation from Zephyr RVM (#8701)

5e20ef9

[TIR] Use PopenPool instead of multiprocessing.pool (#8492)

4dd7f68

Co-authored-by: Wuwei Lin <[email protected]>

[CI] Add Arm Compute Library to Arm CI unit test pipeline (#8734)

3e37bb5

enhance tir signed-unsigned cast (#8706)

395b308

[TVMC] Switch profile flag to use new profiler (#8710)

ccc09fa

jiangjiajun merged commit 74cc942 into jiangjiajun:main Aug 13, 2021
