Test merge unity #2
Closed
Conversation
This PR contains a minor fix for RCCL integration.
This commit adds two new operations (R.quantize and R.dequantize) and supports them in the LegalizeOps pass.
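As a rough illustration, assuming these ops follow the usual affine quantization scheme (an assumption about their semantics, not taken from this PR), the computation they describe is:

```python
import numpy as np

def quantize(x, scale, zero_point, dtype="int8"):
    # q = clip(round(x / scale) + zero_point, representable range of dtype)
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize(q, scale, zero_point):
    # x ≈ (q - zero_point) * scale
    return (q.astype("float32") - zero_point) * scale

x = np.random.randn(2, 3).astype("float32")
q = quantize(x, scale=0.05, zero_point=0)
print(np.abs(dequantize(q, 0.05, 0) - x).max())  # small round-off error
```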
…he#15861) Prior to this commit, the `RewriteCUDAGraph` pass would unconditionally rewrite an `IRModule`, and was conditionally included as a lowering pass for use in `relax.build`, based on the current `PassContext`. This commit moves the check on the `PassContext` from the `relax.build` method to the `RewriteCUDAGraph` pass itself. This allows the pass to be part of a lowering flow that is constructed once and later used when the `PassContext.current()` may have changed.
This PR uses static NCCL instead of the dynamically linked one to ensure out-of-the-box use of the TVM Unity wheel.
This PR fixes a bug introduced in apache#15827, since which the CUDA graph's stream had been discarded.
This commit adds `-lrt` to the TVM runtime when linking against static NCCL. Static NCCL depends on the symbol `shm_unlink`, which comes from librt.
Removed instances of accidentally repeated words from comments. There are cases where duplicated words appear legitimately; those cases remain unmodified.
* [Unity] Use PrimValue as offset in R.tril and R.triu. This mirrors the support in `topi`, which accepts a `PrimExpr` as the offset of the diagonal.
* Update implementation to avoid the warning. I believe the `-Wsequence-point` warning raised by gcc is spurious, as the `index++` occurs within a braced-initialization list, which has a defined left-to-right execution order. However, it is better to avoid the warning altogether.
* Updated attr usage to args
* Correct relax op names in msc
* Parametrize failing MSC unit tests, mark with xfail
* Lint fix
* Marked relay to relax tests as known failures
* reconstruct codegen * minor fix * minor fix * minor fix * update tests * minor fix
The Disco worker originally imported `tvm.testing.disco` automatically for convenient unit testing. However, `tvm.testing` is a special subpackage that introduces many unnecessary dependencies, for example pytest. This PR removes such dependencies by moving the testing function registration logic directly into the entry file.
This PR adds support for ReLU in the NN module and op, and also adds support for GELU in the NN modules.
…pache#15883) * [Unity][Transform] Allow static Relax arguments to dynamic PrimFunc. Prior to this commit, the `relax.transform.FuseTIR` transform required that the shape arguments passed into a `PrimFunc` be structurally equivalent to the shapes of the parameters, and that any replacement of a symbolic `tir.Var` be with a symbolic `tir.Var` in the fused function. This commit updates the `SymbolicMatcher` to instead extract a `Map<tir::Var, PrimExpr>`. As a result, a Relax tensor with statically-known shape can be passed into a TIR PrimFunc with dynamic shape. The resulting fused TIR function is in terms of the statically-known shape, and no longer contains the symbolic variable.
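Conceptually, the new matching behaves like the following standalone sketch (a hypothetical helper that uses strings for symbolic dims and ints for static dims, not the actual `SymbolicMatcher` API):

```python
# Minimal sketch: map each symbolic dimension of the PrimFunc parameter to the
# concrete dimension of the Relax argument; the real pass works on
# tir::Var / PrimExpr rather than strings and ints.
def match_shapes(param_shape, arg_shape):
    binding = {}
    for sym, concrete in zip(param_shape, arg_shape):
        if isinstance(sym, str):          # symbolic tir.Var
            prev = binding.setdefault(sym, concrete)
            assert prev == concrete, f"Inconsistent binding for {sym}"
        else:                             # static dim must match exactly
            assert sym == concrete
    return binding

# A PrimFunc written for dynamic shape [n, m] can accept a static [16, 32] tensor:
print(match_shapes(["n", "m"], [16, 32]))   # {'n': 16, 'm': 32}
```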
…_tir_inplace` (apache#15878) * Add call_inplace_packed operator * Whitespace
…the normalized_shape (apache#15894) * Fix the KeyError and correctly use the normalized_shape * Update test_frontend_from_fx.py
* Fix MaxPool TypeError * Add regression test case.
* delete unused import and add class docstring * add test for fast math transform * Update test_fast_math_transform.py
This commit adds debugging information to checks in the `FuseOps` pass. While the existing checks indicate where an error occurred in the `FuseOps` code, this adds information on the relax expressions that caused the error.
Prior to this commit, `relax::ExternFunc` nodes would be de-duplicated as part of the `EliminateCommonSubexpr` pass. This commit instead ignores the `relax::ExternFunc` nodes, retaining the in-line definitions.
…e#15884)
* add support for torch.tensor as index
* still doesn't fit in array indexing
* support at most one tensor index, to avoid errors
* correct tests for tensor as index
* code style
* code style
* code style
* code style
…ache#15893) * Fix the erroneous pad_einsum documentation * Update schedule.py
apache#15804)
* [Unity] Fix TVMError when loading ONNX model with CumSum operator
* Add regression test for loading ONNX model with CumSum operator
* Fix formatting
* Fix spacing errors
* [Unity] Fix TVMScript Issues in Testcases. Due to frequent syncs with upstream, some of the testcases are broken because of changes in TVMScript. This PR fixes the broken testcases.
…ache#15699)
* [Unity][Analysis] Implemented DefinableTIRVarsInStructInfo. The existing utility `TIRVarsInStructInfo` returns all TIR variables, regardless of whether they are suitable for a variable definition or are usage sites. This utility walks over the struct info once, returning both the definable symbolic variables and the used symbolic variables.
* [Unity][Analysis] Accept relax::Expr arg in Defined/FreeSymbolicVars. Prior to this commit, this utility could only be used with a `relax::Function` argument. This allows individual expressions to be inspected, even if they are not part of a complete function.
* [Unity] Propagate symbolic variables in LiftTransformParams
* Updated LiftTransformParams to use support::OrderedSet
* Fixed import after rebase
…pache#15923) Prior to this commit, the `tvm::script::printer::AttrPrinter` class took the attribute path as a `const ObjectPath&`. In both places where an `AttrPrinter` is called, the temporary object `n_p->Attr("attrs")` is passed for this argument. While binding a temporary object to a const reference can extend the lifetime of the temporary, this requires the const reference to be in the same scope as the temporary, and does not apply in this case (see [this stackoverflow post](https://stackoverflow.com/a/2784304)). Therefore, this reference is only valid through the construction of `AttrPrinter printer`, and is invalid during its usage on the following line. This dangling reference has caused segfaults in CI for unrelated changes ([example](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-unity/detail/PR-15904/3/pipeline)), and can be reproduced with the following test case.

```python
import pytest

from tvm.script import relax as R


@pytest.mark.parametrize("iter", range(10000))
def test_argmax_without_specified_axis(iter):
    @R.function
    def func(x: R.Tensor((1, 2, 3, 4), "float32")):
        return R.argmax(x)

    func.script(show_meta=True)
```

This test case is not included in this commit, as the reproduction is not consistent, with failure requiring on the order of 10k iterations to trigger. In addition, reproduction was sensitive to the following conditions.

* The function being printed must contain at least one `relax::Call` node, with an operation that has attributes.
* TVM must be built with optimization enabled. In gcc, the `-ftree-dse` optimization, which is part of `-O1`, is required to trigger the bug.
* Python's default allocation must be used. If `PYTHONMALLOC=malloc` is set to instead use the system's `malloc`, the segfault was no longer triggered.

This commit updates `AttrPrinter` to accept the `ObjectPath` by value. With the change applied, the above test ran 100k times without error.
…he#15822) * [Unity][VM] Improved error message in CodeGenVM::EmitKillObject This was implemented while debugging CI failures in apache#15810, but is not otherwise related to the changes in that PR. * ci bump
…pache#15904)
* [Unity][Transform] Canonicalize and use CSE between pattern matches. The `PatternRewriter` is intended to iterate until no matching patterns remain. Prior to this commit, this only involved repeating the pattern match rewrite rules. However, intermediate results produced by pattern replacement could cause the iterative pattern matching to terminate early.
  * If two rewrite rules each introduce the same intermediate, there will exist two copies of that intermediate, which can prevent `only_used_by` patterns from matching. Applying `EliminateCommonSubexpr` allows the pattern matching to continue.
  * Applying a rewrite rule may result in dangling intermediates that are no longer used. These dangling intermediates may prevent the next application of a rewrite rule that uses the `only_used_by` constraint. Applying `RemoveAllUnused` allows the pattern matching to continue.
  * A rewrite rule that returns a `relax::Var` or `relax::TupleGetItem` as the replacement introduces a trivial var-to-var rebinding, which is not tracked by `PatternRewriter`. Applying `CanonicalizeBindings` allows the pattern matching to continue.

  While this could be fixed externally by repeatedly applying `rewrite_call`, this would require re-inspecting the entire function, and not just the dataflow block in which the replacement occurred. (A sketch of the resulting fixed-point loop follows this entry.)
* Fix tests for removing redundant reshapes
* Fixed failing unit tests, along with edge case in CSE
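The loop described above is conceptually a fixed-point iteration over a set of cleanup passes. The sketch below is an illustrative outline only, using toy string rewrites as stand-ins for the actual passes and not the real `PatternRewriter` internals:

```python
def rewrite_until_fixed_point(block, passes, max_iters=100):
    """Apply each pass in order until no pass changes the block."""
    for _ in range(max_iters):
        before = block
        for apply_pass in passes:
            block = apply_pass(block)
        if block == before:      # no further changes: fixed point reached
            return block
    raise RuntimeError("rewrite did not converge")

# Toy usage: "blocks" are strings, and each "pass" is a string rewrite
# standing in for EliminateCommonSubexpr / RemoveAllUnused / CanonicalizeBindings.
passes = [
    lambda s: s.replace("aa", "a"),
    lambda s: s.replace("xa", "a"),
]
print(rewrite_until_fixed_point("xxaaaa", passes))  # "a"
```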
…#15917) * support torch.arange()+ (int) in dynamo * code style * code style
This PR introduces the PagedKVCache object to the Relax runtime for KV cache value management in batching settings. One test file is included. Note that this file does not contain tests of the attention function/kernel; that part will be uploaded and tested separately.
* update dp4a tensor intrin * update dp4a tensor intrin * lint --------- Co-authored-by: Lufang CHEN 陈橹方 <[email protected]>
If a matrix multiplication cannot be performed due to incompatible shapes, the error message now specifies the arguments, the shape of each argument, and which dimension of the shape has a mismatch. Previously, this error message only provided the dimension of the mismatch.
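The kind of diagnostic described above can be sketched in plain Python as follows (a hypothetical helper, not the actual Relax shape-inference code):

```python
def check_matmul_shapes(lhs_name, lhs_shape, rhs_name, rhs_shape):
    """Raise an error naming the arguments, their shapes, and the offending dims."""
    k_lhs, k_rhs = lhs_shape[-1], rhs_shape[-2]
    if k_lhs != k_rhs:
        raise ValueError(
            f"Cannot multiply {lhs_name} (shape {lhs_shape}) with "
            f"{rhs_name} (shape {rhs_shape}): reduction dimension "
            f"{k_lhs} != {k_rhs}"
        )

check_matmul_shapes("A", (32, 64), "B", (128, 16))
# ValueError: Cannot multiply A (shape (32, 64)) with B (shape (128, 16)):
# reduction dimension 64 != 128
```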
…16307) Prior to this commit, an error would be raised in `ExprMutator::ReEmitBinding` if the struct info is missing from the generated value. However, because this error was generated from inside `GetStructInfo`, it didn't include sufficient context for debugging. This commit checks the struct info explicitly, and includes the context of the updated variable in the error message.
) Prior to this commit, the `BundleModelParams` would replace model parameters with `param_tuple[index]` within expressions. These nested expressions would then be normalized, resulting in `gv = param_tuple[index]` or `lv = param_tuple[index]` variable definitions. These auto-generated `gv` and `lv` names make it quite difficult to determine which model parameter is being used. This commit updates the `BundleModelParams` transform to explicitly produce the bound variable, `orig_param_name = param_tuple[index]`, preserving human-readable names from the parameters.
…pache#16367) Resolve a bug that caused undefined relax variables in the output of `CanonicalizeBindings` for cases where `VisitVarDef(const Var&)` replaces a variable, and `VisitExpr_(const VarNode*)` returns a value with different struct info, both occurring within the same `VarBinding`. The ExprMutator is only allowed to update a variable's struct info if the value bound to it has new struct info. When CanonicalizeBindings replaces a trivial binding, this may provide better struct info as a result.

Prior to this commit, `ExprMutator::ReEmitBinding` defined a remap for `binding->var->vid`, even if the derived class defined a replacement by overriding `VisitVarDef`. If the derived class defines a new variable binding by overriding `VisitVarDef`, and also causes a variable replacement by overriding `VisitExpr` and returning a type with different struct info, then `ExprMutator` must check for both `binding->var->vid` *AND* `new_var->vid`: the former may be present in the unmodified graph, and the latter may be produced by the derived class before delegating to the base class. This commit updates `ExprMutator::ReEmitBinding` to define entries for both replacements that may be required.
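In pseudocode, the remap update described above behaves roughly as follows (a plain-Python illustration using dicts; the real logic lives in the C++ `ExprMutator`):

```python
# Minimal sketch: var_remap maps an old variable id to the variable that
# should be used in its place after mutation.
def re_emit_binding(var_remap, old_vid, derived_vid, emitted_var):
    # The original id may still be referenced by expressions from the
    # unmodified graph.
    var_remap[old_vid] = emitted_var
    # The derived class may already have rewritten the definition to a new
    # id before delegating to the base class; later uses refer to that id.
    if derived_vid != old_vid:
        var_remap[derived_vid] = emitted_var
    return var_remap

print(re_emit_binding({}, "lv0", "lv0_canonical", "lv0_with_struct_info"))
```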
apache#16362) This PR adds a sanity check to ensure that all `tir_var_upper_bound` attrs used by static memory planning have integer value types. This check helps avoid mistakes from using the wrong value type. The check is needed because `func->GetAttr<Map<String, IntImm>>` does not perform a type check.
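For illustration, the attribute maps TIR variable names to integer upper bounds, e.g. `{"n": 1024}`. A plain-Python version of the kind of sanity check described might look like this (a sketch only; the actual check is performed in the C++ memory planner):

```python
def check_tir_var_upper_bound(attrs):
    """Reject non-integer values in a tir_var_upper_bound-style attribute dict."""
    upper_bounds = attrs.get("tir_var_upper_bound", {})
    for var_name, bound in upper_bounds.items():
        if not isinstance(bound, int):
            raise TypeError(
                f"tir_var_upper_bound[{var_name!r}] must be an integer, "
                f"got {type(bound).__name__}: {bound!r}"
            )

check_tir_var_upper_bound({"tir_var_upper_bound": {"n": 1024}})    # ok
check_tir_var_upper_bound({"tir_var_upper_bound": {"n": "1024"}})  # raises TypeError
```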
…pache#16310) Prior to this commit, several diagnostics in the `WellFormedChecker` would explicitly extract the name from `relax::Var`, `tir::Var`, and `GlobalVar` instances. This is unnecessary, as these classes can be printed directly, and the explicit extraction bypasses any changes to the default printing behavior (e.g. printing of variable addresses) that may be useful while debugging.
…ache#16306)
* [Unity][Transform] Update LambdaLift to use name of lifted lambda. Prior to this commit, the `LambdaLift` pass named each function as `"lifted_func_" + i`, in incremental order of occurrence. This provided unique names for each function, but could make it difficult to read or to refer to the lifted functions. This commit updates the naming scheme to use the location at which the lifted lambda occurs to generate a unique name for the new `GlobalVar`.
* Update variable names and comments for unique function naming
* Add unit test for conflicting name
CI images should also be updated to install cmake 3.24
…e` kernel and add test (apache#16376) Fix a typo bug and add a test for the vllm reconstruct_from_cache kernel.
apache#16349)
* [Unity][MSC] Avoid depending on trivial bindings in Relax intermediate. The conversion from tensorflow to MSC is done by first converting from tensorflow to relay, then converting from relay to executable python code, executing that python code to generate relax, and finally converting from relax to MSC. During the relax phase of this conversion, some relax `IRModule` passes are applied, including `FuseOpsByPattern`. The test cases in `test_msc/test_translate_tensorflow.py` rely on `FuseOpsByPattern` preserving trivial bindings (e.g. `var_1 = var_2`) in the relax IRModule. If these trivial bindings are removed by `CanonicalizeBindings`, then the test cases in this file fail. The presence or absence of trivial bindings after `FuseOpsByPattern` should be considered an implementation detail, and relax passes should not be required to preserve trivial bindings. This PR updates the relay-to-executable-python step of the tensorflow-to-MSC conversion to remove trivial bindings and output a variable name that matches the expected value in the test case. While not an ideal resolution, as other variable name changes could still reintroduce the same test failures, it ensures that `FuseOpsByPattern` may canonicalize bindings as an internal pre- or post-processing step without breaking these unit tests.
* Update implementation to remove dataflow block in MSC codegen. The potential for duplicate variable names was introduced by the `block_builder.emit_output` call, which is only required to export values from a dataflow block. The dataflow block is not used in any later MSC conversion, and its removal avoids this re-export of variables. If the dataflow block is required in the future, it can be generated using `tvm.relax.transform.ConvertToDataflowBlock`.
* Make failing test cases be close to the same structural form
* Updated tests to validate output after compilation
* Lint fixes
…e#16314)
* [Unity][Analysis] Add utility for collecting compile-time bindings. Whether an optimization should be performed may depend on when the variables in an expression are known. For example, consider a LoRA-adjusted model, with base weights `W` of shape `[m,n]`, LoRA components `A` and `B` with shapes `[r,n]` and `[m,r]` respectively, and activations `x` with shape `[n,1]`. The LoRA-adjusted matmul could be computed either as `(W + B*A)*x` or as `(W*x + B*(A*x))`. If `A` and `B` are provided at run-time, then computing `(W*x + B*(A*x))` requires significantly fewer operations (see the worked arithmetic after this list).
  * `(W + B*A)*x`: `m*n*(2*r + 3)` operations
    1. `B*A`: `2*m*n*r` operations using a naive matmul
    2. Adding `W` to (1): `m*n` operations
    3. Multiplying `x` by (2): `2*m*n` operations
  * `(W*x + B*(A*x))`: `2*m*n + 2*r*(n + m) + m` operations
    1. `W*x`: `2*m*n` operations
    2. `A*x`: `2*r*n` operations
    3. Multiplying `B` by (2): `2*m*r` operations
    4. Adding (1) and (3): `m` operations

  However, if `A` and `B` are known at compile-time, then computing `(W + B*A)*x` groups all compile-time values together, allowing them to be computed earlier (i.e. using `LiftTransformParams`).
  * `(W + B*A)*x`: `2*m*n` operations
    1. `B*A`: 0 operations, computed at compile-time
    2. Adding `W` to (1): 0 operations, computed at compile-time
    3. Multiplying `x` by (2): `2*m*n` operations

  Since the choice of optimized expression depends on which parameters can be computed at compile-time, it is useful to have a utility that identifies values that can be computed at compile-time.
* [Unity] QoL improvements for Dataflow matching
  - Update the zero-parameter `WildcardPattern` constructor to produce a valid instance. Previously, the zero-parameter constructor produced a null instance of `WildcardPattern`, which resulted in an error when used. The `WildcardPattern` was expected to be constructed through the `Wildcard` function instead. Since all other `DFPattern` child classes could be constructed explicitly, this could lead to unexpected outcomes.
  - Check for `pattern.defined()` when performing a pattern-match. If a null instance of a pattern is provided, this gives an error message with more context than the one raised by `DFPatternFunctor`.
  - Expose `RewriteCall` for use in C++. Previously, it had only been exposed through the FFI registry, and had no declaration in a header file.
* [Unity][Transform] Implement relax.transform.AdjustMatmulOrder. Reorder `x*(A*B)` to `(x*A)*B`. Intended for optimization of LoRA models, for which `(x*A)*B` has a much smaller memory footprint.
* Fix copy-paste error
* Check for re-orderings from the LHS, skip if cannot prove a benefit
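As a quick sanity check on the operation counts above, here is a small arithmetic sketch using the formulas from this entry (illustrative sizes only):

```python
# Operation counts for the two evaluation orders of a LoRA-adjusted matmul.
def ops_fused(m, n, r):        # (W + B*A) * x, when A and B are run-time values
    return m * n * (2 * r + 3)

def ops_factored(m, n, r):     # W*x + B*(A*x)
    return 2 * m * n + 2 * r * (n + m) + m

m, n, r = 4096, 4096, 16       # example: square weights, small LoRA rank
print(ops_fused(m, n, r))      # 587,202,560 operations
print(ops_factored(m, n, r))   # 33,820,672 operations, far fewer at run-time
```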
This PR supports PagedKVCache by leveraging TIR kernels. Right now we do not have sufficient TIR kernels for multi-level sequences in PagedKVCache; therefore, `Fork` in PagedKVCache is disabled when such a function does not exist. This PR adds a "reduced" creator of PagedKVCache, where some auxiliary functions, such as the begin/end forward functions of prefill/decode, default to None. CUDA tests are added to ensure correctness. Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Bohan Hou <[email protected]>
* finalize * fix ci
* [Unity][nnModule] Dynamic shape support in nn Module
fix onnx frontend Co-authored-by: cheng wen <chengven027-intellif>
…6396) This PR enhances PagedKVCache with inline RoPE computation, which unblocks the movement towards sliding window and attention sink. Both the FlashInfer and TIR kernels are updated in this PR with the RoPE calculation. Note that FlashInfer is bumped in order to include the RoPE update. The previous standalone kernel used for RoPE application is thereby removed. --- Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]>
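For reference, the rotary position embedding (RoPE) being folded into the attention kernels rotates pairs of feature dimensions by a position-dependent angle. A minimal NumPy sketch of one common formulation (not the actual TIR/FlashInfer kernel, and the half-split pairing below is an assumption) is:

```python
import numpy as np

def apply_rope(x, positions, theta=10000.0):
    """x: [seq_len, head_dim] with even head_dim; positions: [seq_len]."""
    seq_len, head_dim = x.shape
    half = head_dim // 2
    freqs = theta ** (-np.arange(half) / half)            # [half]
    angles = positions[:, None] * freqs[None, :]          # [seq_len, half]
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

q = np.random.randn(5, 8).astype("float32")
print(apply_rope(q, np.arange(5)).shape)  # (5, 8)
```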
…che#16111) This PR enhances the static block memory planning pass. Prior to this PR, the memory planning only worked on memory allocations that are not externally referenced. In dynamic shape settings, such memory allocations are not fully static and may lead to memory fragmentation. This PR enhances the behavior so that, for such memory allocations, we first allocate a storage with regard to its estimated upper bound (when known), and then allocate the tensor with the actual dynamic shape out of that storage. This ensures static memory allocation and avoids memory fragmentation.
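The idea can be illustrated outside of TVM with plain NumPy: reserve a storage buffer sized to the upper bound once, then carve dynamically sized tensors out of it. This is a conceptual sketch only, not the Relax memory-planning primitives:

```python
import numpy as np

UPPER_BOUND = 1024                                # compile-time upper bound on n

storage = np.empty(UPPER_BOUND, dtype="float32")  # static allocation, done once

def alloc_from_storage(n):
    """View a dynamically sized tensor out of the statically planned storage."""
    assert n <= UPPER_BOUND, "dynamic shape exceeds planned upper bound"
    return storage[:n]                            # no new allocation, no fragmentation

for n in (300, 700, 512):    # dynamic sizes seen only at run-time
    x = alloc_from_storage(n)
    x[:] = 1.0               # compute into the statically planned buffer
    print(x.shape)
```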
* [Unity] Split DecomposeOpsForTraining into two steps. Prior to this commit, the `DecomposeOpsForTraining` transform directly replaced `relax.nn.batch_norm` with more primitive relax operations. This required the decomposed form of `relax.nn.batch_norm` to be duplicated with `DecomposeOpsForInference`. This commit refactors the pass to occur in two steps: first applying training-specific mutations, and then decomposing. Having a dedicated `DecomposeOps` pass also provides a single, clear location for operator decomposition, which may be migrated into the operator definition in the future, similar to `FLegalize`. (A sketch of the decomposition follows this entry.)
* Updated ApplyPassToFunction utility to use a regex
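For context, decomposing `batch_norm` into primitive ops amounts to the usual normalize-scale-shift computation. A NumPy sketch of the inference-mode decomposition (an illustration of the math, not the actual pass output):

```python
import numpy as np

def batch_norm_decomposed(x, gamma, beta, moving_mean, moving_var, eps=1e-5):
    """Inference-mode batch_norm expressed with primitive ops (channels last)."""
    x_hat = (x - moving_mean) / np.sqrt(moving_var + eps)  # normalize
    return gamma * x_hat + beta                            # scale and shift

x = np.random.randn(8, 4).astype("float32")
out = batch_norm_decomposed(
    x,
    gamma=np.ones(4, "float32"),
    beta=np.zeros(4, "float32"),
    moving_mean=np.zeros(4, "float32"),
    moving_var=np.ones(4, "float32"),
)
print(out.shape)  # (8, 4)
```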