
[PatternLang] Add ConstantPattern #5689

Merged: 2 commits merged into apache:master on May 28, 2020

Conversation

@comaniac (Contributor) commented May 28, 2020

Discussion: https://discuss.tvm.ai/t/patternlang-match-constant-nodes/6835

Changelog:

  • Add ConstantPattern to the pattern language.
  • Add unit tests of constant pattern.
  • Eliminate pylint errors in the dataflow pattern unit test file.
  • Update the document.

cc @mbrookhart @zhiics @mbaret

@mbrookhart (Contributor) left a comment

To make other syntactic sugar, you could add an is_const() that simply creates a new ConstantPattern, but I don't think it's strictly necessary.
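
A minimal sketch of that sugar, assuming the Python binding exposes ConstantPattern (the is_const() helper itself is hypothetical here):

from tvm.relay.dataflow_pattern import ConstantPattern

def is_const():
    """Syntactic sugar: create a pattern that matches any relay Constant node."""
    return ConstantPattern()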

For my own edification, what did you use to auto-format the test file? I tried autopep8 on it a while back and got really bad results, so I ignored it after that (as does the linter).

@comaniac (Contributor, Author) commented May 28, 2020

To make other syntactic sugar, you could add an is_const() that simply creates a new ConstantPattern, but I don't think it's strictly necessary.

I could do that. How about is_tuple and is_tuple_get_item?

For my own edification, what did you use to auto-format the test file? I tried autopep8 on it a while back and got really bad results, so I ignored it after that (as does the linter).

I used yapf with the .style.yapf I put under the TVM home.
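
For reference, a sketch of formatting the file programmatically with that style file (assumes yapf is installed and that the unit test lives at the path below):

from yapf.yapflib.yapf_api import FormatFile

# Reformat the dataflow pattern unit test in place using the repository's .style.yapf.
FormatFile("tests/python/relay/test_dataflow_pattern.py",
           style_config=".style.yapf",
           in_place=True)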

@mbaret (Contributor) commented May 28, 2020

Thanks for the PR so soon! Is there an example of how partition works on a constant match? In particular, does the constant remain propagated into the body?

On a general point, it's helpful if reformats are kept separate from feature additions for reviews so it's easy to see what the changes are.

@mbrookhart (Contributor) commented May 28, 2020

@mbaret Originally the partition pass embedded constants in the function body, but @comaniac filed #5662, and I responded with #5663, so it currently will lift the constants to the function arguments.

Do you prefer embedded constants?

@comaniac (Contributor, Author) commented May 28, 2020

Thanks for the PR so soon! Is there an example of how partition works on a constant match? In particular, does the constant remain propagated into the body?

An example can be found in the unit test: https://github.com/apache/incubator-tvm/pull/5689/files#diff-f9920485e5e341787129348ce1985db9R213

Also, could you provide an example showing your expectation of constant propagation? I've checked that all merge composite unit tests pass with the current implementation, even the DNNL codegen unit tests that use composite functions with parameter binding. It would be better if you could include all your use cases in the unit tests so that we can be on the same page.

On a general point, it's helpful if reformats are kept separate from feature additions for reviews so it's easy to see what the changes are.

Yeah, the bugfix should be put in a separate PR for sure. As for the re-formatting, I didn't intentionally reformat the unit test file: VSCode by default auto-formats the file once I paste some code. I was even considering reverting the auto-formatting for the reason you pointed out. However, I checked the re-formatting and it mostly just touches blank lines and too-long lines, so I think it should be fine for this small feature PR.

@mbaret (Contributor) commented May 28, 2020

%2 = fn (%input: Tensor[(1, 224, 224, 3), uint8], Composite="qnn_conv2d") -> Tensor[(1, 224, 224, 64), uint8] {
    %0 = qnn.conv2d(%input, meta[relay.Constant][0] /* ty=Tensor[(3, 3, 3, 64), uint8] */ /* ty=Tensor[(3, 3, 3, 64), uint8] */, 115 /* ty=int32 */, 134 /* ty=int32 */, 1f /* ty=float32 */, 0.00503911f /* ty=float32 */, padding=[1, 1, 1, 1], channels=64, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO", out_dtype="int32") /* ty=Tensor[(1, 224, 224, 64), int32] */;
    %1 = nn.bias_add(%0, meta[relay.Constant][1] /* ty=Tensor[(64), int32] */ /* ty=Tensor[(64), int32] */, axis=3) /* ty=Tensor[(1, 224, 224, 64), int32] */;
    qnn.requantize(%1, 0.00503911f /* ty=float32 */, 0 /* ty=int32 */, 3.26957f /* ty=float32 */, 0 /* ty=int32 */, out_dtype="uint8") /* ty=Tensor[(1, 224, 224, 64), uint8] */
  };

The above would be an example of the composite function I'd expect if I matched the weight of the conv2d as a constant (this is the old MergeComposite behaviour for that case).

I think the most logical behaviour is for constants to remain embedded where they are explicitly part of the pattern and to lift them where they are not.
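
For concreteness, a sketch of a pattern where the weight is explicitly a constant (uses the ConstantPattern added in this PR plus the existing is_op/wildcard sugar; nn.conv2d is a simplified stand-in for the qnn case above):

from tvm.relay.dataflow_pattern import ConstantPattern, is_op, wildcard

# The weight is explicitly a ConstantPattern, so under this proposal a match
# should keep the matched constant embedded in the partitioned function body,
# while the other inputs remain function arguments.
weight = ConstantPattern()
conv = is_op("nn.conv2d")(wildcard(), weight)
pattern = is_op("nn.bias_add")(conv, wildcard())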

@masahi (Member) commented May 28, 2020

I might be missing something, but I expect constants to be available to external codegen at compile time. Otherwise we cannot do constant folding. Sorry, I didn't put much thought into this issue when merging #5663.

@comaniac Under the current implementation of merge composite + pattern partitioning, do you hit this visitor https://github.com/apache/incubator-tvm/blob/master/src/relay/backend/contrib/dnnl/codegen.cc#L156 during DNNL codegen?

@masahi self-assigned this May 28, 2020
@mbrookhart (Contributor) commented May 28, 2020

☝️ I'll let you guys discuss the appropriate behavior; there seems to be some complication to this.

In the meantime, this was my initial assumption:

I expect constants to be available to external codegen at compile time. Otherwise we cannot do constant folding.

But it doesn't match the fusion pass.

I think the most logical behaviour is for constants to remain embedded where they are explicitly part of the pattern and to lift them where they are not.

I'll think about how to do that.

@comaniac (Contributor, Author) commented
I made an example and tested it with the old MergeComposite pass (master commit 6100112). It behaves as I expected: the partitioned composite function still has 3 arguments even though %w has been bound. @mbaret could you double check?

import tvm
from tvm import relay
from tvm import tir
from tvm.relay.testing import run_opt_pass

from tvm.relay.build_module import bind_params_by_name
import numpy as np


# Make a graph
x = relay.var('x', shape=(1, 3, 224, 224))
w = relay.var('w', shape=(3, 3, 3, 3))
b = relay.var('b', shape=(3,))

conv2d = relay.op.nn.conv2d(x, w)
out = relay.op.nn.bias_add(conv2d, b)
func = relay.Function([x, w, b], out)
mod = tvm.IRModule.from_expr(func)

mod["main"] = bind_params_by_name(mod["main"],
                                  {'w': tvm.nd.array(np.ones(shape=(3, 3, 3, 3)))})
print('=== Before ===')
print(mod['main'].body)


def pat():
    x = relay.var('x', shape=(1, 3, 224, 224))
    w = relay.var('w', shape=(3, 3, 3, 3))
    b = relay.var('b', shape=(3,))

    conv2d = relay.op.nn.conv2d(x, w)
    out = relay.op.nn.bias_add(conv2d, b)
    return out

pattern_table = [('pat', pat())]
result = run_opt_pass(mod['main'], relay.transform.MergeComposite(pattern_table))
print('=== After ===')
print(result)
=== Before ===
free_var %x: Tensor[(1, 3, 224, 224), float32]
%0 = nn.conv2d(%x, meta[relay.Constant][0] /* ty=Tensor[(3, 3, 3, 3), float64] */ /* ty=Tensor[(3, 3, 3, 3), float64] */, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 3, 222, 222), float32] */;
free_var %b: Tensor[(3), float32]
nn.bias_add(%0, %b) /* ty=Tensor[(1, 3, 222, 222), float32] */
// meta data omitted. you can use show_meta_data=True to include meta data
=== After ===
fn (%x: Tensor[(1, 3, 224, 224), float32], %b: Tensor[(3), float32]) -> Tensor[(1, 3, 222, 222), float32] {
  %1 = fn (%x1: Tensor[(1, 3, 224, 224), float32], %w: Tensor[(3, 3, 3, 3), float64], %b1: Tensor[(3), float32], Composite="pat") -> Tensor[(1, 3, 222, 222), float32] {
    %0 = nn.conv2d(%x1, %w, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 3, 222, 222), float32] */;
    nn.bias_add(%0, %b1) /* ty=Tensor[(1, 3, 222, 222), float32] */
  };
  %1(%x, meta[relay.Constant][0] /* ty=Tensor[(3, 3, 3, 3), float64] */ /* ty=Tensor[(3, 3, 3, 3), float64] */, %b) /* ty=Tensor[(1, 3, 222, 222), float32] */
}
// meta data omitted. you can use show_meta_data=True to include meta data

@masahi I just checked with this PR. The unit test hits the line you pointed out twice.

@mbaret (Contributor) commented May 28, 2020

Try changing w in the pattern to a relay.const rather than a var.
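
Concretely, something along these lines (a sketch of the suggested change; the constant value here just mirrors the ones bound in the example above):

import numpy as np
from tvm import relay

def pat():
    x = relay.var('x', shape=(1, 3, 224, 224))
    # A constant weight instead of relay.var('w', shape=(3, 3, 3, 3)).
    w = relay.const(np.ones(shape=(3, 3, 3, 3)))
    b = relay.var('b', shape=(3,))
    conv2d = relay.op.nn.conv2d(x, w)
    return relay.op.nn.bias_add(conv2d, b)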

@masahi (Member) commented May 28, 2020

@masahi I just checked with this PR. The unit test hits the line you pointed out twice.

hmm something seems off to me. On the dnnl fused mobilenet tests, I think all params in the network should go through that visitor. In #5310, @zhiics improved how to handle big constants, which made running fused mobilenet possible. If the visitor is called only twice, we shouldn't have had such issues at all.

@comaniac (Contributor, Author) commented May 28, 2020

OK I can finally reproduce this case:

fn (%x: Tensor[(1, 3, 224, 224), float32], %b: Tensor[(3), float32]) -> Tensor[(1, 3, 222, 222), float32] {
  %1 = fn (%x1: Tensor[(1, 3, 224, 224), float32], %b1: Tensor[(3), float32], Composite="pat") -> Tensor[(1, 3, 222, 222), float32] {
    %0 = nn.conv2d(%x1, meta[relay.Constant][0] /* ty=Tensor[(3, 3, 3, 3), float64] */ /* ty=Tensor[(3, 3, 3, 3), float64] */, padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 3, 222, 222), float32] */;
    nn.bias_add(%0, %b1) /* ty=Tensor[(1, 3, 222, 222), float32] */
  };
  %1(%x, %b) /* ty=Tensor[(1, 3, 222, 222), float32] */
}
// meta data omitted. you can use show_meta_data=True to include meta data

Combining this with the previous example, we can see that the old MergeComposite preserves a constant node when the corresponding pattern node is also a constant; otherwise it lifts constant nodes. In the DNNL codegen, the pattern inputs are all specified as VarNode, so the case of matching a pattern with constant nodes never shows up in the unit tests, which is why we missed it.

However, I personally think this behavior is weird. Generally speaking, whether we specify a var or a const in the pattern, we may or may not want constant lifting. It seems to me that this partition behavior should not be bound to the pattern but should be a partition option.

Anyway, I think the behavior of constant lifting should be discussed in a separate topic; it is not specific to this PR.

@mbaret (Contributor) commented May 28, 2020

It's not really the case that old MergeComposite explicitly 'lifts' constant nodes if they're vars; in old MergeComposite, vars indicate inputs to the pattern (so they don't need to explicitly match against a VarNode).

My expectation would be if the pattern requires there to be a constant node, I should see that constant node in the partitioned function. In other words, the body of the function should still match the pattern used to create it.
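
Concretely, that invariant could be checked as follows (a sketch; pattern and partitioned_func are placeholder names):

# After partitioning, the body of each composite function should still be
# matched by the pattern that produced it, even with constants embedded.
assert pattern.match(partitioned_func.body)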

@mbrookhart (Contributor) commented
I think I agree with @mbaret here. I think I see a simple way to make that the default behavior; why don't I post another PR in that direction after this goes in?

@comaniac (Contributor, Author) commented
That's the behavior I just observed with the above examples. In this case, what would be the behavior if we specify VarPattern('x') | ConstantPattern()?

@mbrookhart (Contributor) commented
I think it depends on which one matched? If it's a var, we lift it; if it's a constant, we embed it?
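
For example (a sketch of the case in question):

from tvm.relay.dataflow_pattern import ConstantPattern, VarPattern, is_op, wildcard

# The weight may be either a named input var or a constant; "|" builds an
# AltPattern that accepts both. Which side actually matched would decide
# whether the argument is lifted or embedded.
weight = VarPattern("w") | ConstantPattern()
pattern = is_op("nn.conv2d")(wildcard(), weight)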

@comaniac (Contributor, Author) commented May 28, 2020

OK, so here is a conclusion for a follow-up PR to deal with constant lifting.

  • Pattern input ConstantPattern:
    • Only match constant nodes and keep them in the partitioned functions. In this case, the number of arguments of the partitioned function is reduced.
  • Pattern input VarPattern('x'):
    • Match only the var node with the specified name hint. In this case, the arguments of the partitioned function are fixed.
  • Pattern input wildcard or VarPattern('x') | ConstantPattern:
    • Match both constant and var nodes but lift constant nodes. In this case, the arguments of the partitioned function are fixed whether it matches a constant or a var.

btw, IMHO, for clearer semantics, it might be better if we supported VarPattern() that matches a var node with any name hint but not a constant node.

@mbrookhart (Contributor) commented
I'd like to refine slightly (a code sketch follows the list below):

  • Pattern input ExprPattern(relay.const):
    • Only match constant nodes that match the value and embed them in the partitioned functions. In this case, the number of arguments of the partitioned function is reduced.
  • Pattern input ConstantPattern:
    • Only match constant nodes and embed them in the partitioned functions. In this case, the number of arguments of the partitioned function is reduced.
  • Pattern input VarPattern('x'):
    • Match only the var node, with an optional specified name hint. In this case, the arguments of the partitioned function are fixed.
  • Pattern input wildcard:
    • Match anything. In this case, the arguments of the partitioned function are fixed.
  • Pattern AltPattern:
    • Match either lhs or rhs. Depending on which side matched, recursively apply this matching logic and embed constants appropriately based on the contained pattern.
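
A sketch of the pattern inputs listed above (the embed/lift behaviour described here is the proposal for the follow-up PR, not what this PR implements):

from tvm import relay
from tvm.relay.dataflow_pattern import ConstantPattern, ExprPattern, VarPattern, wildcard

exact_const = ExprPattern(relay.const(1.0))  # match a constant with this exact value; embed it
any_const   = ConstantPattern()              # match any constant node; embed it
named_input = VarPattern("x")                # match a var node with the given name hint; keep as argument
anything    = wildcard()                     # match anything; keep as argument
alt         = named_input | any_const        # AltPattern: apply the rule of whichever side matched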

@mbrookhart (Contributor) commented
@comaniac The input VarPattern name is already optional; if you use is_input() it will match any VarNode:
https://github.com/apache/incubator-tvm/blob/a072da0588c542757d2815832b7f010f530b2428/src/relay/ir/dataflow_matcher.cc#L386-L395
https://github.com/apache/incubator-tvm/blob/a072da0588c542757d2815832b7f010f530b2428/python/tvm/relay/dataflow_pattern/__init__.py#L171-L185

I'm not sure that made it into the documentation or test cases; I'll make sure to update it in the follow-up PR.
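
A quick sketch of that sugar (assuming is_input takes an optional name, per the linked lines):

from tvm.relay.dataflow_pattern import is_input

any_var   = is_input()     # matches any relay Var node
named_var = is_input("x")  # matches only a Var whose name hint is "x"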

@@ -131,6 +136,44 @@ The next example is matching a pattern of batch_norm -> get(0) -> relu:
out = relay.nn.relu(tuple_get_item_node)
pat.match(out)

The next example is matching a constant node regarding its values. This is useful to check
if a specific parameter in a subgraph has been bind or not.
Member review comment on the doc diff above:

bound, can be fixed in the next PR

@masahi (Member) commented May 28, 2020

@comaniac @mbrookhart good to go?

@comaniac (Contributor, Author) commented
@comaniac @mbrookhart good to go?

Yeah I'm fine with that.

@masahi merged commit 95b3ad9 into apache:master May 28, 2020
@comaniac deleted the add_const_to_pattern branch May 29, 2020 00:03
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 9, 2020
* Add ConstantPattern

* update doc
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 16, 2020
* [TFLITE]Select op support for tflite frontend (#5486)

* [TFLITE]Select/Where op support for tflite frontend

* Review comment fixed

* Review comment fixed

* [FRONTEND][TFLite] Fully connected op conversion made in sync with TFLite (#5510)

* [FRONTEND][TFLite] Fully connected op conversion made in sync with TFLite

* [1] Test case added

* [2] Review comments handled

* [3] Prints removed

* [TOPI][Winograd] Optimization of Conv2d Winograd algorithm on Tensor Core (#5485)

* Cache PrimExpr instead of raw pointers in bound analyzer (#5533)

The objects that the raw pointers point to can be deallocated and new
objects can be allocated at the same address, all while these pointers
are still in the cache. This can lead to unexpected behavior, for
example to calculated bound conflicts with previously cached values.

Caching PrimExpr will prevent the objects from being deallocated while
the cache is active.

* fix a few bugs with shape inference and types in the onnx importer (#5534)

* [Frontend][TFLite] ADD_N operator  (#5474)

* [WEB][RUNTIME] TVM WebAssembly JS Runtime (#5506)

* [WEB] Remove the old web runtime

* [WEB][RUNTIME] TVM WebAssembly Runtime

This PR introduces a brand new TVM web runtime based on the WASM standard API.
Main highlights:

- The new runtime is rewritten using the Typescript.
- The new runtime now directly interfaces with WebAssembly's standard API,
  instead of relying on emscripten's API.
  This change will make the js runtime more portable to runtime variants.
  For example, we could also try to make it interface with the tvm's rust runtime implementation.
- System library can be provided through WASI
  - We also build a hack to enable Emscripten to generate a WASI like
    bundle for runtime environment on the Web.
- The wasm generation now uses the mainlin LLVM.
- Dynamic link(dlopen) is not used due to limitation of wasm,
  instead we rely on the recent new RPC refactor to directly
  restart a new session for each wasm binary sent to the RPC.

* Address review comments

* Skip tensorcore test

* [RELAY][ONNX]ReduceLogSumExp Operator support (#5453)

* [RELAY]LogSumExp Op Support

* [ONNX]LogSumExp Op Support

* [RPC][BUGFIX] Fix remote device sync (#5538)

* [Refactor][std::string --> String] IRModule is updated with String (#5523)

* [std::string --> String] IRModule is updated with String

* [1] Packedfunction updated

* [2] Lint error fixed

* [3] Remove std::string variant

* [RUNTIME] Store nullptr PackedFunc as nullptr for better error propagation (#5540)

* [Relay-TFLite] FP32 and Quantized Object Detection Model (#5479)

* TFlite e2e FP32 Object detection model

* Fix test

* [Relay-TFLite] Quantized activations

* Flexbuffer parsing

* Lint

* Relaxing checks.

* Github reviews

* comments

Co-authored-by: Ubuntu <[email protected]>

* Changes to cpp_rpc to make it work on Android (+ Hexagon offloading) (#5535)

* Changes to cpp_rpc to make it work on Android (+ Hexagon offloading)

- Implement getNextString to break up std::string into words. stringstream
  just doesn't work on Android.
- string::find_last_of doesn't look for the last substring, but the
  last character from a given string.
- Use SIGTERM to terminate processes (this isn't necessary, but using
  SIGKILL is not a good practice).
- Convert "./rpc" to a full path. When a module is uploaded and offloaded
  to Hexagon, the dlopen on Hexagon needs an absolute path (or a path
  without directories).

* Only set the absolute patch on non-Windows platforms

Windows has different macros for the maximum path length.

* Add Onnx Pad v11 (#5539)

* fix restructured text (#5541)

* [CRT]fix to reduce RAM size during loading model (#5507)

* [CRT]fix to reduce RAM size during loading model

* Release graph_json memory immediately after reading

* Load platform specific lib for tvmdsoop instead of only so (#5542)

* [RPC] Improve RPCServer AsyncIO support. (#5544)

* [RPC] Improve RPCServer AsyncIO support.

When the RPCServer is in the async IO mode, it is possible for the server
to directly serve async function that may return its value via a callback in the future.
This mode is particular useful to the web environment, where blocking is not an option.

This PR introduces the Async support to the RPCSession, allowing the AsyncIO driven servers
to serve the async functions. These functions will still be presented as synchronized version
on the client side.

Followup PR will refactor the web runtime to make use of this feature.

* Address comments

* [Rust] Add first stage of updating and rewriting Rust bindings. (#5526)

* Add tvm-sys

* Use as_mut_ptr

* Address CR feedback

* Update rust/tvm-sys/src/datatype.rs

Co-authored-by: Nick Hynes <[email protected]>

* Final CR comments

* Fix find and replace error in frontend

Co-authored-by: Nick Hynes <[email protected]>

* [TE] Fix MakeLoopNest for warp memory (#5382)

* [TIR][Printer] text format printer considering future parsing use (#5483)

* [Optimization] Warp level reduction support for CUDA (#5498)

- Added the warp level reduction support

- Upgraded shfl intrinsics to the sync version.

- This is the building block for scheduling softmax like operations.

Signed-off-by: Wei Pan <[email protected]>

* A clone of test/python/unittest/test_runtime_micro.py, however (#5546)

modified to run specifically on ARM cortex-M hardware, which
currently is just the STM32F746 discovery board.

Signed-off-by: Tom Gall <[email protected]>

* [CI] Install wasmtime for WebAssembly tests (#5494)

* Apparently, ONNX Conv with no 'pads' defaults to zero padding (#5548)

* [WEB] WebGPU support (#5545)

This PR introduces WebGPU support to tvm.
The WebGPU runtime is directly built in javascript(as WebGPU uses JS as the first class citizen API)
and exposes back to the tvm's runtime via PackedFuncs.

One important note is that `ctx.sync` is not async.
This is due to the fact that WebGPU is a purely async API and we cannot block in the web environment.

So the current best way to use the js api is to wrap things in an async function.
When copy a GPU array to CPU, `await ctx.sync()` need to be called to wait for copy completion.

We use a AsyncIO rpc server to serve the async functions to the clients.

* [TOPI][RELAY][TENSORFLOW]Math ops added (#5502)

* [TOPI][RELAY][TENSORFLOW]Math ops added

* Extra newline removed

* CI fix

* Review comments fixed

* Review comments fixed

* [RUNTIME] Hexagon driver for offloading kernels to simulator (#5492)

* [RUNTIME] Hexagon driver for offloading kernels to simulator

* Add sim_dev as external project when building with Hexagon/sim support

* Change target CPU for sim_dev to v60

* [LINT] clang-format the h,cc,m files. (#5557)

This PR prepares for our migration to use the clang-format
as part of the linter system.

* [BYOC, MergeComposite] Add additional check before re-using the cached match (#5552)

* Add additional check before re-using the cached match in merge composite

* clean up ExtractPattern calls

* [WEB] Setup lint, doc, test (#5556)

* [CI] Update ci-cpu to bionic (#5555)

* [CI] Update ci-cpu to bionic (#5554)

* [Fix] Fix conv2d alter op for arm cpu (#5532)

* [FRONTEND]onnx, mxnet, pytorch mathops added (#5561)

* Fix topi test for tensorcore (#5563)

* [Refactor][std::string --> String] IR is updated with String (#5547)

* [std::string --> String] GlobalTypeVar is updated with String

* [std::string --> String] GlobalVar is updated with String

* [std::string --> String][IR] ADT is updated with String

* [std::string --> String][IR] OP is updated with String

* [std::string --> String][IR] Attrs is updated with String input

* [std::string --> String][IR] GlobalVar is updated with String

* [std::string --> String][Test] Pyconverter is updated with String change

* [DOCKER] Fix vulkansdk in the ci-gpu (#5566)

* [CI] reintroduce docker stage for wasm tests (#5565)

* [DOCKER] Introduce ci-wasm

* Add Jenkinsfile

* Rename prepare to prepwasm so it won't run by default

* [CI] Update ci-lint to use the latest image that contains clang-format (#5568)

* [DOCKER] Add clang-format and nodejs to ci-lint (#5567)

* [TARGET] Phase out WebGL (#5570)

The graphics API is moving towards next generation.
Vulkan/Metal on the native and WebGPU on the web.

Due to the limited programming model, we cannot get the best compute performance in WebGL.
Now that the mainline already have both WebGPU and vulkan support, this PR phases out WebGL.

* [LINT] Enable clang-format. (#5572)

* [LINT] Enable clang-format.

* Add more docs

* [CI] Update the ci-gpu to the lastest build with the new vulkansdk. (#5571)

* [Relay] enable blocking format in x86 conv2d and fold scale axis (#5357)

* [CI] Fix clang-format error (#5577)

* Allow ubuntu_install_darknet.sh to work in both 18.04 and 16.04 (#5574)

* [PYTORCH]expand bug fix (#5576)

* [CI] Enable llvm-11 and llvm-10 in build tests, recover webdocs. (#5579)

This PR ties up the last loosen end of the recent CI update.

* [PYTORCH] Support max_pool2d_with_indices (#5549)

* Use real output name instead of node_name

* Add pytorch max_pool2d_with_indices converter.

* Add test for maxpool2d with indices

* Add explicit assert for single output

* Only consume output (not indices) from max pool 2d with indices

* undo change

* [Relay] Fixed bug in attribute parsing for pool layers. (#5582)

* Fixed pooling bug.

* Added tests and fixed more cases.

* [RELAY][TF] Support symbolic newshape for Reshape (#5429)

* [RELAY][TF] Support symbolic newshape for Reshape

* Only need to pass data

* Use MakeReshape() in Reshape()

* Change newshape to Expr

* Create a template for Array<T>

* Fuse reshape when newshape is constant

* Make newshape Optional

* Use bool() of Optional

Co-authored-by: Li Xiaoquan <[email protected]>

* Add prim::device op (#5584)

* Fix the runtime raise error (#5586)

* [RELAY][Convert Layout] Specify additional layouts in convert layout pass (#5422)

* [RELAY] Specify additional layouts in convert layout pass

* This patch means that you can specify an additional layout, rather than using the layout chosen by default during conversion.
* This is specifically useful for external codegen when a 3rd party library needs to target a specific kernel layout for example.

Change-Id: I3ef9cf45ead574801870a38af9768f93e29aab10

* Use mapping of op name to list of desired layouts

Change-Id: Ibd691a3cb93e73a394f36112668ad52a84c7d5a2

* Fix issue with code block

Change-Id: Ibb4e38c05ad4312b7dea845be699b8d5d57e0a94

* Address comments, Improve tutorial

Change-Id: Ib824eead329d551c338234de3b2d814693afd0ec

* Fix linting

Change-Id: Ie9e1891f590b3a7496a56ff8362cdda9d4b5fa75

* Test uses NCHW default layout. Unrelated issue with NHWC.

Change-Id: I1c16f0db73db56f5e9536db3fe5eb2624c3b595c

* Fix mistake in tutorial

Change-Id: I944041245d27af262dc96f1cd8117f1f19272062

* Address multiple comments

Change-Id: If33a1e34acd8fc37d1c7797ee189a6448a392672

* Improve tutorial

Change-Id: Ib04142c94c7958ab5067947d2ff4c84354e3d0c5

* Fix Clang-format

Change-Id: Ieff39e3f0817d22579c68b3287e972a3b0fcfbc8

* Add a quantized conv2 unit test for the tflite front-end (#5558)

Signed-off-by: Giuseppe Rossini <[email protected]>

* [Relay][Transform] Safe check added for Merge Composite (#5562)

* [MXNET]abs, round, reciprocal, sign, softsign, hard_sigmoid (#5587)

* [Hexagon] One more fix for concurrency count (#5589)

* Fix JSON graph dumping. (#5591)

* Previously this function placed a JSON-escaped string containing
   the JSON-encoded graph.

* [DOCS] Improve document in reflection (#5593)

* Overestimate binary size for microTVM compiled binaries. (#5590)

* Overestimate binary size for microTVM compiled binaries.

 * Currently uTVM binary section sizes are computed by summing the
   sizes of all symbols in the section.
 * This method produces errors because it presumes the linker works in
   a particular way, rather than analyzing the linked output.
 * As we intend to move away from linking inside TVM (RFC
   forthcoming), just using this stopgap to make forward progress
   until then.

* address weberlo comments

* fix regression (use 64 bit word size)

* [TFLite Runtime] Fix bug and re-enable RPC execution test (#5436)

* [Relay][VM] Memory planner (part 1) (#5144)

* Start on memory planning

WIP

Move to test_memory_passes.py

Work on memory planning

Post-rebase and VM changes

Plumb through the offsets

Basic tests all pass, fix offset to data buffer.

Fix compile errors

Fix ws

Apply suggestions from code review

Co-Authored-By: Haichen Shen <[email protected]>

Address CR

Update src/runtime/vm/vm.cc

Co-Authored-By: Haichen Shen <[email protected]>

Fix another comment

Fix lint

Fix

Fix

Fix

Lint is done?

Fix

More fix

Trying to debug

No clue

Fix lint

* Fix docs

* Disable aggressive constant eval

* It works

* Fix lint

* Found issue with dynamic

* Fix the pass, but runtime segfaults

* fix scalar tensor, test_any_elemwise passes

* Fix split pass

* Fix 0-rank issues

* Fix

* debug

* apply Haichen's patch and clean up

* lintgit add .

* fix serializer and test_tyck_alloc_tensor test

* Fix the constant lift pass in presence of closures

* Restore old finder

* Fix rebase issues

* Fix

* Fix

* Fix issue coercing the shapes incorrectly from i64 to i32

* Fix linting

* Fix clang format

* Format memory.cc

* Fix 0-rank case

* Add fix for (0,) shape

* Ignore shapes for now

* Apply suggestions from code review

Co-authored-by: Zhi <[email protected]>

* Update src/runtime/vm/executable.cc

Co-authored-by: Zhi <[email protected]>

* Fix

* lint

Co-authored-by: Zhi Chen <[email protected]>
Co-authored-by: Zhi <[email protected]>

* Add ostream formatters for TargetPtr/TargetVal. (#5592)

* Pattern Language, Matcher, Rewriter, and Function Paritioner (#5231)

* [Reduction] Fix cross thread redunction (#5551)

- The predictions were not correctly applied after transformation.
  This leads to normal reduction itervar appearing outside of the loop,
  which is undefined. See detailed comments.

Signed-off-by: Wei Pan <[email protected]>

* Fix TVMArray layout on device (#5599)

* [LLVM] Represent alignment information in LLVM IR (#5598)

* Add debug mode to tempdir() (#5581)

* [PYTORCH]ImplicitTensorToNum support added (#5603)

* [PYTORCH]Matmul fix for batch_matmul (#5604)

* fix rpc server bug on VTA (#5607)

* [REFACTOR][IR] Streamline ir/op Registry (#5609)

* [REFACTOR][IR] Streamline ir/op Registry

This PR refactors the attrregistry mechanism in the ir/op into
a separate common base. The common base will provide a foundation
for other attr related registries such as target and pass.

We also streamlines the terminology of the registry API.

- Use AttrMap for the column maps returned by the registry
- Use RegEntry to refer to the registry entry.

* Address review comments

* [TFLITE]GATHER_ND (#5508)

Signed-off-by: Dhruva Ray <[email protected]>

* [CUDA] Fix codegen for warp shuffle intrinsics (#5606)

* fix shfl intrin

* improve test_lower_warp_memory_cuda_half_a_warp

* Fix a typo. (#5611)

Co-authored-by: Zeng Liyong <[email protected]>

* fix pattern topological order (#5612)

* [BYOC] Remove kCompiler attr from external functions (#5615)

Functions destined for external codegen keep their kCompiler attribute which means SkipFunction returns true when running a pass over such functions during the codegen step. This makes sense during graph partitioning, however when lowering the functions for codegen the is no reason to keep this behaviour.

Allowing this behaviour will mean a codegen can run a pass on functions only intended for one 3rd party library. Specifically, allowing pre-processing of a series of sub-graphs right before it is passes through codegen. This helps ensure that the functions destined for the 3rd party library are in the expected format. For example, we may want to ensure that these functions have a kernel layout of OHWI because the 3rd party library only supports OHWI. This wouldn't be possible before partitioning the graph as we don't know how the graph will be partitioned ahead of time.

Change-Id: Ia68b9da335ef1acfc405a8528aac823de60a65c2

* [Relay]Improve Shape Func handling for Tuple inputs (#5467)

* Improve Shape Func handling for Tuple inputs

* Fix lint

* Improve

* Fix build

* [Relay][Refactor][std::string --> String] Relay updated with String (#5578)

* [KERAS]Global MaxPool3d and AvgPool3d support (#5098)

* [IOS] Fix build error of iOS RPC (#5621)

* [IOS] Fix build error of iOS RPC

- Update to C++14
- Use the latest RPC protocol
- Resolve CoreML dependency

* Fix clang-format error

* Fix three typos (#5620)

Co-authored-by: Zeng Liyong <[email protected]>

* [Frontend][Tensorflow] Gather nd bug fix for one dim support in tensorflow (#5588)

* [Frontend][Tensorflow] Gather_nd one dim support added

* Test case added

* Doc error handled

* Review comment handled: reverting new attr introduced

* Check added at mxnet frontend

* Doc error handled

* TFLite test case failure resolved

* [MXNET]MaxPool3d and AvgPool3d Ops support added (#5614)

* [PYTORCH]ReflectionPad2d op (#5624)

* [BYOC][MergeComposite] if root->args[i] isn't a CallNode, then Donwcast<Call> will check fail (#5623)

we needn't execute L131 "call_map->Set(arg, new_arg)", because when arg
is CallNode and root->args[i] is not CallNode, new_arg will be a null
pointer. There is no point in caching null pointer.

Signed-off-by: windclarion <[email protected]>

* [DOCS] Move the api docs to the api subfolder (#5626)

* [DOCS] Move the api docs to the api subfolder

* Update numpydoc location

* Ignore 403

* make sure folder exists

* [RELAY][BYOC] Fix the creation of tuple of tuples in PartitionGraph (#5616)

* [RELAY][BYOC] Fix the creation of tuple of tuples in PartitionGraph

If the annotated compiler region contains multiple outputs where
some of the outputs are tuple output, the current PartitionGraph will
create tuple of tuples. This will not be handled by the runtime.
This commit flattens the such tuples and re-create them after the
call site of the partitioned function.

Change-Id: I4e7ccbda73c129a9f4ae8705d5c9f2af6ab99ef6

* [RELAY][BYOC] Fix the creation of tuple of tuples in PartitionGraph

    *code refactor : extracted the passes as a sequential

Change-Id: If4bc00b00a96fa244358d602fc1a361498342f46

* [RELAY][BYOC] Fix the creation of tuple of tuples in PartitionGraph
   *further refactor

Change-Id: I69ddd0e835e88ef97da8a3a3b949be3f7b619c02

* [RELAY][BYOC] Fix the creation of tuple of tuples in PartitionGraph
    *class description comment amended

Change-Id: I55720bf0467c96e979e1ab56c40d9d209e0f9456

* [NODE][PASS] Introduce config to PassContext. (#5631)

This PR introduces a new config field to the PassContext
to allow it store arbitary config values.

To make sure that the config is validated, we allow each pass
to register the config key they would expect and the corresponding types.

We also introduce a CreateObject from Map<str, Object> to allow config creation
from a json-nest(like in vscode) in python.

We added an example of UnrollLoopConfig.

Followup PR should migrate the passes to use the new config field.

* another cmake fix (#5630)

* Fix typo in test script (#5635)

* Label Pattern Partitions (#5627)

* Label Pattern Partitions with a default label to prevent nested partitions and an optional user supplied-label

* Add node names in topological order to Partitioned attribute

* respond to review comments

* move partition tag into const in attr namespace

* [RELAY][PYTORCH]Resize3d, Upsample3d op support (#5633)

* [TUTORIAL]TFLite QNN Tutorial (#5595)

* [TUTORIAL]TFLite QNN Tutorial

* Review comments

* Extend AttrPattern to support CallNode and FunctionNode attributes (#5637)

* Extend AttrPattern to support CallNode and FunctionNode attributes

* Update tutorial and add breaks

* add func attr test

* [DOCS] Fix the QNN TFLite tutorial build (#5641)

* [TUTORIAL] Fix execution error of TFLite quantized tutorial

* Assign TensorCore to docs build

* [RUNTIME][VULKAN] Seg fault in WorkspacePool's destructor (#5632) (#5636)

* [RUNTIME][VULKAN] Seg fault in WorkspacePool's destructor (#5632)
* fixed this issue by changing WorkspacePool's destruction order

* make line < 100 charactors long

* [PYTORCH]Padding support (#5638)

* Remove unnecessary print (#5642)

* [CI] Allow CI_PYTEST_ADD_OPTIONS to be unbound. (#5644)

This patch allows the test script to execute normally
when CI_PYTEST_ADD_OPTIONS is not available.

* [Runtime] Introduce runtime::Array (#5585)

* Introduce runtime::Array

* Sync with dmlc-core

* Tests added: size, capacity, empty, front, back, push_back, pop_back, insert * 2, erase * 2, resize, reserve, clear

* [CI] Add log check to the sphinx gallery docs (#5643)

* [CI] Add log check to the sphinx gallery docs

This PR add log check to sphinx gallery tutorials to prevent
the case when sphinx failed to capture the error in tutorials.

* Fix the status

* [RELAY][BYOC] Preserve type information in Merge Composite (#5640)

Keep the type information when extracting patterns
so that it can be used as part of 'check' functions.

Change-Id: I16cc70c3d013a794d2ceefb5bec815129c7b8825

* Add a check Callback to the Pattern Paritioner (#5646)

* add a check callback to the paritioner

* fix doc string

* fix unit test spelling

* add a test with types

* [Relay, Topi][OP] Correlation (#5628)

* [Relay,Topi] Correlation

* fix

* move

* typo

* Update test_topi_correlation.py

* HG: Commit message of changeset 6281661. (#5622)

[Relay] Move compiler_begin/end_op to local static objects

* [AutoTVM] Update XGBoost verbosity option (#5649)

* [RUNTIME] Resolve constexpr issue in debug mode. (#5651)

static constexpr is a bit weird before c++17.
They are not inlined by default and does not have symbols after compilation.
It usually isn't a problem when they are inlined(in c++17 they are inlined by default).
But will create compilation error when passed to functions that take (const)references.
This PR fixes the problem so that we can compile on debugmode.

* µtvm debug improvements (#5648)

* Forever loop in UTVMDone to aid debugging

* Use parameter and callback function as a micro debug hook.

 * Previously, users had to uncomment a region of code in
   micro_session.cc and recompile to debug. Now they can pass in a
   key in the micro.Session config:

       config = tvm.micro.device....generate_config()
       config['debug_func'] = _python_launch_gdb
       with micro.Session(config) as sess:
         ....

* clang-format

* Only forever loop on device (on host this blocks unittests)

* [REFACTOR][IR] Migrate IRModule ObjectRef to not-null (#5654)

* Upgrade XGBoost to latest (#5658)

* Increase bss section size. (#5660)

* Likely broken in PR 5590.

* [PatternLang] Convert PatternGrouper to do pre-order, non-recursive analysis (#5653)

* make the PatternGrouper iterate over the input Expr in a non-recursive pre-order fasion

* add a comment

* [Relay,Topi][OP] affine_grid and grid_sample (#5657)

* [Relay,Topi][OP] affine_grid and grid_sample

* lint

* [TIR][BUILD] Remove buffer params from pass config. (#5652)

Buffer configurations can be passed during construction
and does not need to be part of the build config.

This is a refactor step to simplify the BuildConfig for the PassContext migration.

* handle likely in IRMutatorWithAnalyzer (#5665)

* [TOPI] Improve CUDA softmax scheduling (#5600)

- Do not use multiple kernels

- Schedule with warp reductions

- Fixed a bug on the lower warp memory pass

- Fixed warp shuffle intrinsics for the nvptx backend.

Signed-off-by: Wei Pan <[email protected]>

* [Relay][Op]Support symbolic TopK, Ones, Zeros and Full (#5459)

* Support symbolic TopK, Ones, Zeros and Full

* Fix pylint

* Add docstring for topk shape func

* Fix grad

* Fix lazy_gradient_init

* Fix parser

* Fix print ir text

* Fix lint

* Improve pattern_util

* Fix topk

* Fix build

* Use Optional for attribute

* Fix clang-format

* Minot fix

* Fix pylint

* Fix build warning

* Fix parser

* Move ToScalar

* Fix lint

* Fix lint

* Make topk shape func as data independent when k is constant.

* Fix lint

* Minor fix

* [PYTHON] Add buffer name when creating tensor bindings (#5670)

* [REFACTOR][TIR][API-Change] Migrate BuildConfig to PassContext. (#5668)

* [REFACTOR][TIR] Migrate BuildConfig to PassContext.

This PR migrates the TIR configurations from BuildConfig to the
PassContext used by the unified IR.
Moving forward, PassContext will be the unified way to configure passes in the TVM stack.

Changes

- Refactored TVM_PASS_REGISTER_CONFIG_OPTION to take in the reference type.
- Removed BuildConfig.
- Migrated the passes to use PassContext.

* Update include/tvm/ir/attrs.h

Co-authored-by: Zhi <[email protected]>

Co-authored-by: Zhi <[email protected]>

* [Doc] Misc doc fix (#5672)

* [C++ RPC] Fix C++ RPC build problem on Linux (#5671)

* enable amd_apu device on vulkan target (#5659)

* [AutoTVM][TOPI] AutoTVM incorrect measurement (#5511)

* [AutoTVM][TOPI] AutoTVM incorrect measurement

* create new placeholder with converted layout

* update _schedule_winograd

* [POC][PatternLang]Remove constants from partitioned functions (#5663)

* remove constants from partitioned functions

* remove print statements

* [TF] Support TupleWrapper as direct ancestor of control flow ops (#5639)

* add tvm.micro pydoc to sphinx (#5661)

* add tvm.micro pydoc to sphinx

* making build pass and addressing tqchen comments

* add a check for null function attributes (#5674)

* [BYOC] Pattern Language MergeComposite (#5656)

* Pattern Language MergeComposite

* fix DNNL pattern

* Use builtin binary operator syntax for demo

* Improve unit test

* add a testcase for #5674 (#5677)

* Call previous excepthook in tvm_excepthook. (#5675)

* Call previous excepthook in tvm_excepthook.

* Rename prev_excepthook.

* Create a tvm_wrap_excepthook to wrap a given excepthook with tvm custom excepthook work
and call it on system previous excepthook.

* Add docstring.

* Fix the shift column for scale_shift_nchw and scale_shift_nhwc in C topi (#5679)

* [Bugfix] Fix Python debugger segfaults with TVM built with LLVM (#5685)

* Import readline before loading libtvm

* make lint happy

* [DOC] Improve Pattern Language Docs (#5676)

* [DOC] Improve Pattern Language Docs

* address comments

* address comments

* [TFLITE]Quantize & Dequantize op (#5394)

* [TFLITE]Quantize & Dequantize op

* Testcases added

* Review comment fixed

* [TIR][REFACTOR] std::string -> String Migration in TIR nodes (#5596)

* [TIR][REFACTOR] std::string -> String Migration for Var node and SizeVar Node

* update json_compact.py

* [PatternLang] Add ConstantPattern (#5689)

* Add ConstantPattern

* update doc

* [PYTORCH]Minor bug fixes (#5683)

* [PYTORCH]Minor bug fixes

* Review comment fix, testcase added

* Added testcase for bert model

* [Relay] Fix dataflow_pattern.rewrite() hang if Match in IR (#5680)

rewrite() quits only if graph stop changing, but ExprMutator
  always creates new Match node. This patch fixes this.

* [RELAY] Fix segfault in pretty print when ObjectRef is null (#5681)

* [RELAY] Fix segfault in pretty print when ObjectRef is null

Encountered when pretty printing module with function attribute equal to NullValue<ObjectRef>().

Change-Id: I2e7b304859f03038730ba9c3b9db41ebd3e1fbb5

* Add test case

Change-Id: I579b20da3f5d49054823392be80aaf78a055f596

* [REFACTOR][RELAY] move fallback_device to config (#5690)

* @zhiics -> PPMC (#5692)

* [COMMUNITY] @masahi -> PPMC (#5691)

* Support more dtypes for TVMDSOOp (#5694)

* [ONNX]LpPool Support added (#5696)

* In memory_plan, check if value is not None, instead of just checking value as boolean. (#5700)

* [PatternLang]Conditionally Embedding Constants in Partitioned Functions (#5693)

* Embed constants in the partition function if the pattern explicity requests constants

fix rst

fix pylint

* improve comments based on Cody's feedback

* [ONNX] Skip ADD inside Gemm op when vector is zero (#5697)

* [BYOC] Support Tuple Output in C/DNNL Codegen (#5701)

* Support tuple output runtime

* fix unit test

* [REFACTOR][RELAY] Replace build_config with PassContext (#5698)

* [PYTORCH]floor_divide support for squeezenet (#5702)

https://github.com/apache/incubator-tvm/issues/5133#issuecomment-636330705

* [AutoTVM][TOPI] Fix bifrost spatial packing conv2d auto tune (#5684)

* [AutoTVM][TOPI] Fix bifrost spatial packing conv2d auto tune

* [AutoTVM][TOPI] Putting placeholder replacement in compute

* Fix winograd kernel replacement

* Fix sanity check: Line too long

* [Arith] ExtendedEuclidean merge impl to int_operator (#5625)

* fix typo: anchor windoes should be anchor windows (#5706)

* [REFACTOR][PY] relay.op.Op -> tvm.ir.Op (#5705)

* [REFACTOR][PY] relay.op.Op -> tvm.ir.Op

* Improve the error check

* [PatternLang] Simplify Pattern API Implementations (#5703)

* Add syntatic sugar; include pattern to API docs

* fix doc warnings

* [PYTORCH]ReplicationPad support added (#5708)

* Remove deprecated opengl files (#5711)

* Remove opengl runtime and cmake (#5712)

* [BUGFIX][CRT] Fix Compilation Error in CRT (#5713)

* Rename tvm_dso_op to libtvm_dso_op (#5714)

* [Object] Unify StrMapNode and MapNode (#5687)

* Pass cpptest and py unittest

* fix graph runtime

* right fix

* fix a bug that runtime::String's operator < is actually compare by address

* Update container.py

* Renaming

* Address comments

* lint

* Replace ObjectHash in object.py

* [MXNET]Softmin, trunc op support added (#5715)

* Avoid downloading when TOPHUB_LOCATION is NONE (#5720)

* [Object][FFI] Introduce runtime::String::CanConvertFrom (#5718)

* [Object][FFI] Introduce runtime::String::CanConvertFrom

* Update container.h

* [Object] Restore the StrMap behavior in JSON/SHash/SEqual (#5719)

* Fix generating types like float44 and float88 (#5722)

* [ONNX]ReduceL1, ReduceL2, ReduceSumSquare, ReduceLogSum ops added (#5721)

* [TENSORFLOW]StatefulPartitionedCall/PartitionedCall Ops support added  (#5617)

* Implemented functionInvocation Unit Test for StatefulPartitionedCall operator(working) and initial changes for placeholder(not working as of now)

* Placeholder exercises with tvm

* placeholder interim

* SPOP Test cases structure

* New test cases for spop

* miscellaneous test cases for spop

* Placeholder samples..working with shapes explicitly passed

* Variables test case. Works with the same fix of shape_dict

* SPOP Positive test cases first iteration

* support output tensors as function args, multiple functions

* Corrected Indentation

* filewritter is only for debug purpose

* support variables in function args

* First working iteration of positive spop test cases

* Removed commented code, simplified code

* Code Reorganization- First working iteration of positive spop test cases

* corrected variable name after refactor

* Code Reorganization- First working iteration of positive spop test cases

* move code inside mapped operator function

* Removed extra line

* support variables in function args

* Removed commented code, simplified code

* move code inside mapped operator function

* Code Reorganization- First working iteration of positive spop test cases

# Conflicts:
#	tests/python/frontend/tensorflow/test_forward.py

* Code Reorganization- First working iteration of positive spop test cases

* Function invocation more test cases

* Simplified & Merged different Function Invocation Test cases

* support invocation of nested callables

no need to explicitly handle paratitioned and
statefulPartitioned condition in convert_operator function

* Simplified and Uniform testcases

* support invocation of nested callables

no need to explicitly handle paratitioned and
statefulPartitioned condition in convert_operator function

* Simplified and Uniform testcases

* removed duplicate and renamed testcase

* Negative scenario added for testing operator statefulness. Only Exception to stateful operators are Partitioned & StatefulPartitionedOp which have capability to execute even stateless operators within them

* Miscellaneous reorganization changes for spop scenarios

* Miscellaneous reorganization changes for spop scenarios

* Corrected import of tensorflow modules safely using try except and other code reorganization

* Negative scenario for resource variables handled

* Documentation update for code

* SPOP change in function handling

* handle nested subgraph

* refactor

* get op def compatible with tf 1x & 2x

* Fixed liniting issues

* added doctsring and few nits

* Merged changes for positive test cases and negative test cases

* Moved StatefulPartitionedCall test case to the end of the TC list

* Fixed some typos and semantics

* dmlc-core

* dmlc-core

* fixes

* Addressing Review comments in the PR for SPOP support

* Fixed pylint errors

* Corrected tensorflow import syntax

* Placed the op_def_registry module import outside of for loop

* Removed new stateful operators list and combined these operators with missing operators to display as single list. Also removed throwing seperate exception for stateful ops

Co-authored-by: Prashant Sail <[email protected]>
Co-authored-by: maheshambule <[email protected]>

* [AutoTVM, Relay] Clear compile engine after task extraction (#5724)

* Fix runtime::String backward compatibility in JSON (#5725)

* codegen llvm: move nvptx-specific intrinsic handling into codegen_nvptx (#5726)

See discussion in #5600.

I'm also throwing in a pointer lifetime fix for the context held by
NVPTX because otherwise topi/tests/python/test_topi_softmax.py
would sefault for me. With the test, I can also run resnet-18 on
the nvptx target in gpu_imagenet_bench.py.

* [TOPI,RELAY][TFLITE] Sparse to dense operator (#5447)

* [Relay][Frontend][TFLite] Add parser support for shape and range

Signed-off-by: Dhruva Ray <[email protected]>

* [TOPI,RELAY][TFLITE] Sparse to dense operator

Signed-off-by: Dhruva Ray <[email protected]>

* use param name in documentation

Signed-off-by: Dhruva Ray <[email protected]>

* sphinx doc errors fixed

Signed-off-by: Dhruva Ray <[email protected]>

* incorporated review comments

Signed-off-by: Dhruva Ray <[email protected]>

* Missing a blank line...

Signed-off-by: Dhruva Ray <[email protected]>

* use get_tensor_expr

Signed-off-by: Dhruva Ray <[email protected]>

* Accidently removed this function in the rebase...

Signed-off-by: Dhruva Ray <[email protected]>

* support default value for default_value

Signed-off-by: Dhruva Ray <[email protected]>

* clang format fixes

Signed-off-by: Dhruva Ray <[email protected]>

* topi pylint fixes

Signed-off-by: Dhruva Ray <[email protected]>

* [Frontend][TFLite] Add parser support for shape and range (#5329)

* [Relay][Frontend][TFLite] Add parser support for shape and range

Signed-off-by: Dhruva Ray <[email protected]>

* Incorporated review comments and used new functions

Signed-off-by: Dhruva Ray <[email protected]>

* Few cosmetic changes

Signed-off-by: Dhruva Ray <[email protected]>

* Removed an extra line added by rebase...

Signed-off-by: Dhruva Ray <[email protected]>

* [REFACTOR] Separate ArgTypeCode from DLDataTypeCode (#5730)

We use a single enum(TypeCode) to represent ArgTypeCode and DLDataTypeCode.
However, as we start to expand more data types, it is clear that argument
type code(in the FFI convention) and data type code needs to evolve separately.
So that we can add first class for data types without having changing the FFI ABI.

This PR makes the distinction clear and refactored the code to separate the two.

- [PY] Separate ArgTypeCode from DataTypeCode
- [WEB] Separate ArgTypeCode from DataTypeCode
- [JAVA] Separate ArgTypeCode from DataTypeCode

* [ONNX]MaxRoiPool, Mod & Xor op support added (#5729)

* ROCm: Add warp shuffles and enable reductions (#5727)

Thank you @masahi and @wpan11nv for the feedback

* Change 'delete's in Relay VM Instruction dtor to 'delete[]'s (#5735)

* Fix reshape usage in ARM Winograd (#5732)

* [TEST] Fix flaky topi/tests/python/test_topi_pooling.py:test_adaptive_pool (#5736)

* Fix the values for test_fmod since it fails way too often otherwise (#5723)

* fix small bug about dense_grad (#5695)

* [REFACTOR][ARITH] Remove legacy compute_expr.h (#5738)

Replaces most of the ComptuteReduce using foldl.

* Add some docs on downstream consistency (#5742)

https://github.com/apache/incubator-tvm/pull/5730#issuecomment-639567636

* sequential cpp test (#5745)

* [REFACTOR][TE][TIR] Call::Halide => ProducerLoad, DSL/TIR decouple. (#5743)

In the HalideIR's design, DSL components and IR are mixed together.
For example, Call::Halide can containa reference to a function which is
constructed in the tensor expression language.

While this coupled design simplifies certain aspect of the DSL construction,
it prevents the TIR to evolve as a clean standalone IR:

- The additional tensor expression provided in the function is opaque to the IR
  and may become obsolete as we transform them.
- The duplication of the information in the DSL tensor and IR makes it hard to
  design a stand-alone text format (when there are elements shared in the tensor
  expression and normal statements).

This PR aims to clearly de-couple the TIR from high-level DSL structures(tensor expression),
while still provide clear extensions to build DSLs on top of the TIR.

We introduce a DataProducer as a base class for high level tensor expressions objects
that produce data. We then introduce ProducerLoad to replace the Call::Halide usage,
so that the Call node can always be self contained and used for low-level calls.

The high-level tensor expression DSL can still generate a PrimExpr that contains a ProducerLoad.
These PrimExprs contains fragments of information that can be combined together to
generate a low-level TIR PrimFunc.

We also state clearly that DataProducer **should not** appear in any TIR PrimFunc.
Instead, the high-level DSL layer should lowered DataProducers to Buffers and TIR statements
that produces these buffers. We can further provide verifications to validate such invariance.

Changes:
- Introduce DataProducer to serve as a base class for Tensor in tensor expressions.
- Migrate use of Call::Halide to ProducerLoad
- Migrate the other usages of Calls.

We will also create follow-up PRs to migrate the remaining two DSL related IR nodes(Realize/Provide)
to use the DataProducer.

* Don't add cast for TF batch norm when type isn't changing (#5731)

* [ARITH][BACKPORT-0.6] fix a min/max simplify bug (#5749)

* fix a min/max simplify bug

* fix cpplint

* turn into oposite when c1val<0 and add more case

* fix c1=0

Co-authored-by: xqdan <[email protected]>

* [TOPI][Relay][OP] support dynamic NMS(Non Maximum Suppression), symbolic begin, end, and strides for strided_slice (#4312)

* [TOPI][Relay][OP] Dynamic NMS and strided_slice

* Incorporate comments

* fix nnvm compatibility issues

* fix InferCorrectLayout

* Minor fix

* fix for fuse

* Workaround to pass batch_size into hybrid function to handle dynamic shape

* Seperate rearrange

* fix lint

* fix ci, comments

* change attr to Optional<T>

* clang format

* remove empty lines

* partial ignore for end of strided_slice

* pylint

* add out_indices for gpu get_valid_counts

* change to slice_mode

* clang-format, fix comments

* fix comment

* change slice_mode to string

* fix CI

* update docstring

Co-authored-by: Yao Wang <[email protected]>

* Update dmlc_tvm_commit_id.txt

* Update TRT Integration to reflect upstream changes

* Sync submodules

* Fix jenkinsfile

* git-clang-format against origin/dev instead of origin/master

* Fix formatting.

* Remove is_empty in export_lib (used for old trt)

* Disable test_forward_qnn_mobilenet_v2_net

* Add Scatter to Topi/Relay/ONNX via hybrid script (#5619)

* I can construct scatter but not embed it in a Relay Graph

* working 1-4 dimesion scatter

* add scatter to ONNX

fix lint

* isolate tests to cpu backend

* Fix i386 test

* fix gpu tolerance

* use elemwise_shape_func for scatter

* fix incorrect rebase

* [Minor][Test] Clean WASM environment before build (#5759)

* [Bugfix] Fix reshape (#5739)

* Fix reshape

* fix doc warning

* fix ci

* address comments

* [REFACTOR][TIR] Provide->ProducerStore, Realize->ProducerRealize. (#5750)

This PR finishes up the final step for DSL/TIR de-coupling to refactor
Provide/Realize to use the DataProducer.

As in the case of ProducerLoad, ProducerStore/Realize are not supposed
to appear in a vaid TIR function ans are only used by high-level DSLs
as intermediate structures.

* [Rust] Second stage of Rust Refactor (#5527)

* Add tvm-rt crate

* Backport changes from frontend branch

* Format

* Add ASF headers

* Address self-code review

* Replace with helper

* Fix lint

* Fix

* Clean up repro debugging

* WIP

* Remove global resgistry to fix one memory issue

* Fix

* Format

* Format

* Update rust/tvm-rt/README.md

Co-authored-by: Jason Knight <[email protected]>

* Format

* Duplicate TVM macros

* Split macros

* Restore old macro for old crates

* Repair macros

* Fix format

* Format

Co-authored-by: Jason Knight <[email protected]>

* [topi] block sparse dense on cuda (#5746)

* [Relay] Fix for recursive let (#5757)

* Make let processing iterative

* Try again

* Fix pretty printer overflow

* cleanup

* fix lint

* Fix text printer

Co-authored-by: Jared Roesch <[email protected]>
Co-authored-by: Jared Roesch <[email protected]>

* [TOPI][RELAY][PYTORCH]Conv3d_transpose op support added (#5737)

* [TOPI][RELAY][PYTORCH]Conv3d_transpose op support added

* Test cases in topi/relay

* conv3d_transpose_ncdhw_python added

* Review comments fixed

* Fix gelu in PyTorch frontend, tighten numerical checks (#5763)

Previously, the PyTorch frontend approximated gelu with fastgelu.
To provide a more faithful conversion, we implement gelu instead.

We also tighten the numerical comparisons between PyTorch and
TVM-from-PyTorch to 1e-5. The object detection models need an
increased tolerance of 1e-4 to pass.

I had to throw in a few fixes for missing conversions
(probably due to working with very new PyTorch).

I must admit the GoogLeNet/NasNet test didn't run on my machine,
probably due to problems at my end.
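For context, a quick sketch of the two formulations (standard definitions, not the frontend's exact code): the erf-based gelu now used for the conversion versus the tanh-based approximation ("fastgelu") used before.

```python
import math

def gelu_exact(x):
    # erf-based definition: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh_approx(x):
    # tanh-based approximation commonly called "fastgelu"
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

# The two differ by a small amount, which is why the comparison tolerance
# between PyTorch and TVM matters here.
print(gelu_exact(1.0), gelu_tanh_approx(1.0))
```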

* Add ShapePattern and DataTypePattern (#5760)
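A small usage sketch (assuming the Python sugar exposed alongside these patterns, mirroring the style of the pattern-language unit tests):

```python
from tvm import relay
from tvm.relay.dataflow_pattern import has_dtype, has_shape

# Match any expression with a given dtype or shape.
x = relay.var("x", shape=(10, 10), dtype="float32")
print(has_dtype("float32").match(x))  # expected: True
print(has_shape((10, 10)).match(x))   # expected: True
```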

* Make batch matrix multiplication on GPU tunable (#5752)

This is primarily aimed at the AMD GPU backend and done as part
of a project for AMD, but should work for all users of the GPU
schedule.

* [TIR][REFACTOR][API-Change] Migrate the tvm/tir/expr.h to construct style. (#5773)

This PR migrates tvm/tir/expr.h to the new constructor style that is
consistent with the rest of the codebase and changes the affected files accordingly.

* [TIR][REFACTOR][API-Change] Migrate tir/stmt.h to use constructor. (#5778)

This PR migrates tvm/tir/stmt.h to the new constructor style that is
consistent with the rest of the codebase and changes the affected files accordingly.

* [Frontend][TensorFlow] Improve Control Flow and TensorArray (#5699)

* Improve TF parser control flow and tensor array

* Fix tf tensor array scatter

* Add ssd test

* Add back static ta test

* Minor fix for frontend and test_forward

* SplitRel for dynamic shape

* Fix test ssd

* Fix loop var naming issue

* Minor improve

* Fix format

* Fix clang format

* Fix tensor array in pytorch frontend

* Fix stack size issue for ssd test

* Address comments

* Fix slice size

* Fix build

* Rebase

* [DOC][FIX] Fix some typos in git-clang-format.sh (#5786)

* fix #5686: remove an overstrict assert in MakeAllreduce (#5686) (#5785)

* [RUNTIME] Add compile_shared option to linux compile utility fn (#5751)

* feat: Add compile_shared option to linux compile fn

* feat: Add compile_shared option for linux compile util fn

* fix: Fix minrpc testcase to use executable compilation

* fix: Fix binutil case where create_shared was called to create an executable

Co-authored-by: baoxinqi <[email protected]>
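A hedged sketch of how these helpers are meant to be used (assuming the tvm.contrib.cc utilities these commits refer to; the object-file names are placeholders):

```python
from tvm.contrib import cc

# Build a shared library from object files (the default, compile_shared path).
cc.create_shared("deploy.so", ["mod0.o", "mod1.o"])

# Link the same kind of objects into a standalone executable instead
# (the executable-compilation path the minrpc test case exercises).
cc.create_executable("deploy_bin", ["main.o", "mod0.o"])
```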

* [REFACTOR][API-Change] Migrate all Object construction to constructor. (#5784)

This PR migrates all the remaining object constructions to the new constructor style
that is consistent with the rest of the codebase and changes the affected files accordingly.

Other changes:

- ThreadScope::make -> ThreadScope::Create
- StorageScope::make -> StorageScope::Create

* [Topi] pass-by-value -> pass-by-const-reference (#5783)

* [topi][relay] Add operation gather to relay. (#5716)

* [CODEGEN][CONTRIB] CoreML codegen (#5634)

* [CODEGEN][CONTRIB] CoreML codegen

* import coremltools only when it is necessary

* fix pylint errors

* don't import contrib.coreml when using runtime lib

* skip coreml codegen test in CI

* don't register relay.ext.coremlcompiler in __init__.py

* move tvm/contrib/coreml.py to tvm/contrib/target/coreml.py

* use existing transformers for graph partitioning

* skip test only when coremltools is not available

* add check for annotation

* move _register_coreml_op to python/tvm/relay/op/contrib/coreml.py

* skip compile when xcode is unavailable

* relay.op.Op -> tvm.ir.Op

* set USE_COREML on

* refine test

* fix calibration pass to support multiple functions (#5768)

Co-authored-by: Ubuntu <[email protected]>

* [cmake] update vulkan rules (#5777)

* Add ignore storage_order attribute to onnx pooling parser. (#5781)

* [BYOC][FIX] Infer types in MergeComposite (#5766)

If InferType isn't run between partitioning passes,
function calls are inserted which don't have a type.
This can result in failures for patterns which want
to check types.

This works around it simply by running InferType after
every partitioning.

Change-Id: Ie0887f0564a41eb0913bfe42a362e8effe9681b9
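A minimal sketch of that workaround (pattern_table and the "mytarget" codegen name are placeholders, not part of this change):

```python
import tvm
from tvm import relay

def partition(mod, pattern_table):
    # Re-run type inference after each partitioning-style pass so that
    # later patterns can inspect checked types.
    seq = tvm.transform.Sequential([
        relay.transform.MergeComposite(pattern_table),
        relay.transform.InferType(),
        relay.transform.AnnotateTarget("mytarget"),
        relay.transform.InferType(),
        relay.transform.PartitionGraph(),
        relay.transform.InferType(),
    ])
    return seq(mod)
```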

* [FRONTEND]Darknet support batch size for yolo (#5688)

Fix the issue reported in 
https://discuss.tvm.ai/t/yolov3-tiny-batch-input-test-failed/6796

* Update dmlc_tvm_commit_id.txt

* Skip tflite test_forward_mediapipe_hand_landmark

* Increase stack limit for failing tflite tests. Skip TF tests which require TF 1.x

* [PYTORCH]aten::norm support added (#5776)

* [TENSORFLOW]Conv3d Transpose OP added (#5775)

* [TENSORFLOW]Conv3d Transpose OP added

* Testcase updated, tf cpu supports only ndhwc

* [TF] Support symbolic inputs of Fill (#5762)

* [TF] Support symbolic inputs of Fill

* Rebase and simplify. The value has already been converted to a constant if it is a tf.Constant

* [COMMUNITY] @wpan11nv -> Reviewer (#5790)

* Edit onnx parser to infer values in post order (#5755)

* edit onnx parser to infer values in post order to speed up onnx imports with many calls to infer_value

* fix pylint

* [TIR][REFACTOR] Cleanup unused classes (#5789)

* Fix tf parser (#5794)

* support aten::type_as in the pytorch frontend (#5787)

* support aten::type_as in the pytorch frontend

* use _convert_data_type to convert torch type to tvm type and add more types in the type_as test

* [TIR][REFACTIR] Update TIR nodes std::string->String. (#5793)

This PR updates the remaining TIR nodes' members to use
String instead of std::string.

* [TEST] Temporary disable fp16 type_as test for PyTorch Frontend (#5799)

* [ONNX] Skip multiply with 1.0f constant for GEMM import (#5800)

* [ONNX] Skip ADD inside Gemm op when vector is zero

* [ONNX] Skip multiply with 1.0f constant for GEMM import
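The two bullets above amount to a constant-folding shortcut over ONNX Gemm semantics (Y = alpha * A @ B + beta * C). A rough sketch, not the importer's actual code:

```python
import numpy as np

def gemm(A, B, C=None, alpha=1.0, beta=1.0):
    out = A @ B
    if alpha != 1.0:                 # skip the multiply for the common alpha == 1.0 case
        out = alpha * out
    if C is not None and np.any(C):  # skip the add when the bias is all zeros
        out = out + beta * C
    return out
```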

* [TIR][REFACTOR] Add tir prefix to type keys (#5802)

* [QUANTIZE] Add config switch for nn.dense layer type. (#5801)

* [topi] fix sparse dense schedule on cuda (#5803)

* Allow RPCWrappedFunc to rewrite runtime::String as std::string (#5796)

* [topi] fix strategy for sparse dense cuda (#5782)

* [CI] Move cpu-only frontend tests to a CPU stage (#5807)

* [MXNET]conv3d and conv3d_transpose added (#5814)

* Pin hand landmark network to version 0.7.4. (#5813)

* Versions above 0.7.4 are broken due to changes in the
   quantization operations in the model, which are currently
   not supported by TVM.

Fixes #5774.

* [CI] Limit number of threads in all jobs (#5815)

* Update dmlc_tvm_commit_id.txt

* Disable tensorflow.test_forward_ssd because the stack limit of 100 MB is exceeded by WellFormedChecker

Co-authored-by: Samuel <[email protected]>
Co-authored-by: ANSHUMAN TRIPATHY <[email protected]>
Co-authored-by: wsl-inspur <[email protected]>
Co-authored-by: Krzysztof Parzyszek <[email protected]>
Co-authored-by: Matthew Brookhart <[email protected]>
Co-authored-by: Mahesh Ambule <[email protected]>
Co-authored-by: Tianqi Chen <[email protected]>
Co-authored-by: Animesh Jain <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Thierry Moreau <[email protected]>
Co-authored-by: tobe <[email protected]>
Co-authored-by: Jared Roesch <[email protected]>
Co-authored-by: Nick Hynes <[email protected]>
Co-authored-by: Tang, Shizhi <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: Wei Pan <[email protected]>
Co-authored-by: Tom Gall <[email protected]>
Co-authored-by: MORITA Kazutaka <[email protected]>
Co-authored-by: masahi <[email protected]>
Co-authored-by: Haichen Shen <[email protected]>
Co-authored-by: Ramana Radhakrishnan <[email protected]>
Co-authored-by: Menooker <[email protected]>
Co-authored-by: Josh Fromm <[email protected]>
Co-authored-by: lixiaoquan <[email protected]>
Co-authored-by: Li Xiaoquan <[email protected]>
Co-authored-by: Candy <[email protected]>
Co-authored-by: LiangLiu <[email protected]>
Co-authored-by: lhutton1 <[email protected]>
Co-authored-by: Giuseppe Rossini <[email protected]>
Co-authored-by: Andrew Reusch <[email protected]>
Co-authored-by: Liangfu Chen <[email protected]>
Co-authored-by: Michal Piszczek <[email protected]>
Co-authored-by: Zhi Chen <[email protected]>
Co-authored-by: Zhi <[email protected]>
Co-authored-by: Dhruva Ray <[email protected]>
Co-authored-by: Liyong Zeng <[email protected]>
Co-authored-by: Zeng Liyong <[email protected]>
Co-authored-by: Yao Wang <[email protected]>
Co-authored-by: windclarion <[email protected]>
Co-authored-by: manupa-arm <[email protected]>
Co-authored-by: Wuwei Lin <[email protected]>
Co-authored-by: Yi Wang <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
Co-authored-by: Junru Shao <[email protected]>
Co-authored-by: mbaret <[email protected]>
Co-authored-by: hlu1 <[email protected]>
Co-authored-by: Philip Hyunsu Cho <[email protected]>
Co-authored-by: Zhao Wu <[email protected]>
Co-authored-by: Mei Ye <[email protected]>
Co-authored-by: Neo Chien <[email protected]>
Co-authored-by: notoraptor <[email protected]>
Co-authored-by: Balint Cristian <[email protected]>
Co-authored-by: Rand Xie <[email protected]>
Co-authored-by: abergeron <[email protected]>
Co-authored-by: Deepak <[email protected]>
Co-authored-by: Prashant Sail <[email protected]>
Co-authored-by: maheshambule <[email protected]>
Co-authored-by: Thomas Viehmann <[email protected]>
Co-authored-by: akosik-anyvision <[email protected]>
Co-authored-by: handar423 <[email protected]>
Co-authored-by: xqdan <[email protected]>
Co-authored-by: xqdan <[email protected]>
Co-authored-by: Yong Wu <[email protected]>
Co-authored-by: Jason Knight <[email protected]>
Co-authored-by: Zijing Gu <[email protected]>
Co-authored-by: Jared Roesch <[email protected]>
Co-authored-by: majiang31312 <[email protected]>
Co-authored-by: wrongtest <[email protected]>
Co-authored-by: baoxinqi <[email protected]>
Co-authored-by: Yi-Hsiang (Sean) Lai <[email protected]>
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Bing Xu <[email protected]>
Co-authored-by: Leandro Nunes <[email protected]>
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 18, 2020
* Add ConstantPattern

* update doc
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Jun 18, 2020
* Add ConstantPattern

* update doc