
[Relay] External codegen #4482

Merged: 11 commits from zhiics:external_codegen into apache:master on Dec 18, 2019
Conversation

@zhiics (Member) commented Dec 8, 2019

Part of #4258 to make the review process easier.

This PR adds external codegen for Relay. It contains the following changes (a usage sketch follows the list):

  • C-source-style codegen that generates a TVM-compatible C library wrapper, which can be compiled and linked together with the DSOModule
  • DNNL codegen that generates DNNL kernel wrappers to execute part of a Relay program
  • A DNNL library kernel that initializes the DNNL engine for execution
  • Graph runtime execution
  • All external functions are collected for codegen; an array of external runtime modules is returned and imported into the DSOModule
  • Various unit tests
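
For readers who want to try this out, here is a minimal usage sketch in the style of this PR's unit tests. It is an approximation, not the PR's literal test code: the attribute spellings ("Primitive", "Compiler", "global_symbol") follow the convention this review converges on, and API details such as func.with_attr and the (graph, lib, params) return of relay.build vary across TVM versions.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# Subgraph to be handled by the external C-source codegen.
x = relay.var("x", shape=(2, 2), dtype="float32")
y = relay.var("y", shape=(2, 2), dtype="float32")
ext_func = relay.Function([x, y], relay.add(x, y))

# Mark the function for external codegen; "ccompiler" selects the
# C-source backend added in this PR.
ext_func = ext_func.with_attr("Primitive", tvm.tir.IntImm("int32", 1))
ext_func = ext_func.with_attr("Compiler", "ccompiler")
ext_func = ext_func.with_attr("global_symbol", "ccompiler_0")

# Call the external function from main.
mod = tvm.IRModule()
gv = relay.GlobalVar("ccompiler_0")
mod[gv] = ext_func
a = relay.var("a", shape=(2, 2), dtype="float32")
b = relay.var("b", shape=(2, 2), dtype="float32")
mod["main"] = relay.Function([a, b], gv(a, b))

# The generated C wrapper is compiled and linked together with the
# DSO module, then run through the graph runtime as usual.
graph, lib, params = relay.build(mod, target="llvm")
rt = graph_runtime.create(graph, lib, tvm.cpu())
rt.set_input("a", np.ones((2, 2), dtype="float32"))
rt.set_input("b", np.ones((2, 2), dtype="float32"))
rt.run()
print(rt.get_output(0).asnumpy())  # expect all 2.0
```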

Follow-up PRs will be sent separately to cover the following aspects:

  • VM execution
  • More comprehensive DNNL kernels
  • Annotation and partitioning

CC @tqchen @soiferj @masahi @jroesch @icemelon9 @u99127 @comaniac

zhiics force-pushed the external_codegen branch 4 times, most recently from a52e18b to 422e9b9, on December 9, 2019.
Resolved review threads on: src/runtime/contrib/dnnl/dnnl_kernel.h, src/relay/backend/contrib/csource/codegen.cc, src/relay/backend/contrib/dnnl/codegen.cc, cmake/config.cmake, cmake/modules/contrib/Extern.cmake
@masahi (Member) left a comment:

LGTM modulo minor issues. I've also verified the dnnl example.
Looking forward to upcoming PRs :)

@tqchen (Member) left a comment:

I see several places that contain the "extern" keyword. It would be great if we could avoid the "extern" terminology and instead focus on generating C code, which is clearer.

e.g., contrib/codegen_c/codegen_c.h

Resolved review threads on: CMakeLists.txt, src/codegen/build_module.cc, src/relay/backend/contrib/dnnl/codegen.cc, src/relay/backend/contrib/contrib_codegen.h
tqchen self-assigned this Dec 13, 2019
@comaniac (Contributor) commented:

Thanks @tqchen, we've addressed your comments. Please take another look when you get a chance.

@tqchen (Member) commented Dec 16, 2019

Some additional comments on API design:

  • The keyword IsExternal is still quite confusing. I would suggest making the attribute "compiler"; alternatively, we could call it "backend". When it is set, we will look up the customized compiler hook. The value "default" invokes the default compilation pipeline.

  • Let us deliberate a bit on the FuncName keyword. While I understand such an attribute is necessary before lowering to ensure consistent symbol lookup, is it the right name?

More broadly, as we start to introduce more attributes to functions, it would be great to come up with a naming convention and document them.
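
As background for the "customized compiler hook" lookup mentioned above: the pipeline dispatches to an external codegen by name. Below is a minimal sketch of registering such a hook from Python, assuming the "relay.ext.<compiler>" lookup pattern; the backend name my_backend is hypothetical.

```python
import tvm

# When a function's compiler attribute is set to something other than
# "default", the build pipeline looks up the packed function
# "relay.ext." + <attribute value> and hands the function to it.
@tvm.register_func("relay.ext.my_backend")
def my_backend_codegen(func):
    # Must return a runtime::Module implementing the function's symbol,
    # e.g. by emitting C source or a vendor-library wrapper the way the
    # DNNL codegen in this PR does.
    raise NotImplementedError("emit code for my_backend here")
```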

@zhiics (Member, Author) commented Dec 16, 2019

@tqchen how about this?

> The keyword IsExternal is still quite confusing. I would suggest making the attribute "compiler"; alternatively, we could call it "backend". When it is set, we will look up the customized compiler hook. The value "default" invokes the default compilation pipeline.
>
> Let us deliberate a bit on the FuncName keyword. While I understand such an attribute is necessary before lowering to ensure consistent symbol lookup, is it the right name?

Change "external" to "compiler". The value is compiler+id, e.g. "dnnl_1" and "dnnl_2", so that we have a unique name for symbol lookup. Then we can just remove the "FuncName" attribute. Or do you have other suggestions?

> More broadly, as we start to introduce more attributes to functions, it would be great to come up with a naming convention and document them.

We have moved them to relay::attrs and documented them.
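
To make the compiler+id proposal concrete, here is a hedged sketch (the helper name is made up, and with_attr follows later TVM spellings): a single compiler-style attribute whose value encodes compiler+id would give both the codegen dispatch key and a unique symbol, making a separate FuncName attribute unnecessary.

```python
def mark_external(func, compiler, idx):
    # Encode compiler+id in one attribute value, e.g. "dnnl_1", so the
    # same string serves for codegen dispatch and symbol lookup.
    return func.with_attr("compiler", "%s_%d" % (compiler, idx))
```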

Resolved review thread on: include/tvm/relay/expr.h
@tqchen (Member) commented Dec 16, 2019

The compiler attr sounds good. I think we want something like FuncName to indicate the symbol name, but I'm not sure about the naming choice.

@zhiics (Member, Author) commented Dec 16, 2019

@tqchen How about CustomFuncSymbol?

@tqchen (Member) commented Dec 16, 2019

Let us also hear others' opinions; perhaps create a discuss thread in the forum and list candidates. I think something along the lines of symbol, export_func_name, or unique_id might make sense. Note that the same attribute might be used for compiling non-custom functions (in the case where the user wants to force a certain name).

@zhiics (Member, Author) commented Dec 17, 2019

@tqchen We've resolved the naming issues, could you please take another look? Thanks.

zhiics force-pushed the external_codegen branch 3 times, most recently from d4d559b to b341a03, on December 17, 2019.
@zhiics (Member, Author) commented Dec 18, 2019

@tqchen Any other outstanding issues or concerns?

tqchen merged commit c44b7bf into apache:master on Dec 18, 2019
@tqchen (Member) commented Dec 18, 2019

Thanks @comaniac @zhiics @liangfu @masahi !

zhiics deleted the external_codegen branch on December 20, 2019.
zhiics added a commit to zhiics/tvm that referenced this pull request Dec 31, 2019
zhiics added a commit to neo-ai/tvm that referenced this pull request Jan 9, 2020
* Change upstream url

* Fix bias_add gradient (apache#4516)

* Fix bias_add gradient

A change caused collapse_sum_like to reject implicit dimension
broadcasting for bias_add gradient, so switch to explicit sum reduction
on the non-bias axis dimensions.

* Lint fix

* [Bugfix][Frontend][TFlite] Fix wrong function call in TANH tests (apache#4517)

* Replace sigmoid() with tanh() in tests for TANH

* Fixed extra reshape parameter bug. (apache#4524)

* Use the best tuner possible (apache#4397)

* Use the best tuner possible

* Add comment denoting availability of better tuners

* Fix typos and wording

* [ir] use DataType instead of Type for readability because Type has been deprecated (apache#4513)

* add bfloat16 typeflag support (apache#4525)

* fix empty config caused KeyError (apache#4520)

* fix onnx shape dtype (apache#4528)

* fix crash issue in tsim backend (apache#4527)

* PIL is deprecated and should be replaced with Pillow (a fork of PIL) (apache#4533)

Change-Id: If2075df5475505f2da87dae7145af5a7ab83d8a4

* [Relay] External codegen (apache#4482)

* Update legacy places from nnvm to relay. (apache#4535)

* Update legacy places from nnvm to relay.

This PR prepares the current mainline to remove nnvm compiler dep.

* remove legacy stage

* Implement 1d deconvolution (apache#4476)

* [relay][op] add expand op (from ONNX) to relay frontend (apache#4483)

* Add Expand to onnx.py

* add test function for expand

* Fix a onnx frontend test

* Add tests for the value itself instead of shape only on test_expand

* Cleaned up some unnecessary modifications.

* [TOPI] Allow batch matmul to be fused into injective ops (apache#4537)

* [TOPI] Fixed nms max_output_size loop (apache#4541)

One of the loops in hybrid_nms used for
performing the max_output_size reordering
was incorrectly designated as parallel,
resulting in incorrect behaviour. This patch
changes that loop to a serial loop.

Change-Id: I97184f5887f5f028d8ab339fa2808eb7630a4017

* [DOCS] Mention Ninja build system in install/from_source.rst (apache#4554)

* [DOCS] Mention Ninja build system in install/from_source.rst

* Address comments

* [PYTHON][FFI] Cythonize NDArray.copyto (apache#4549)

* [PYTHON][FFI] Cythonize NDArray.copyto

* Cythonize the shape property

* vm external codegen (apache#4544)

* [COMMUNITY] @cchung100m -> reviewer (apache#4557)

* [VTA] improved virtual memory mapping (apache#4545)

* [VTA] improved virtual memory mapping

* Update virtual_memory.cc

* [IR] fix style in ir_mutator and ir_visitor (apache#4561)

* [RUNTIME][VULKAN] Fix compiler warning (apache#4559)

* [REFACTOR][DTYPE] Isolate dtype to runtime (apache#4560)

dtype.h -> runtime/data_type.h

Changes:
- Rename all old reference of tvm::Type to DataType
- ExprNode.type -> ExprNode.dtype
- Expr.type() -> Expr.dtype()
- Change Expr related functions to expr_operator.
  - DataType::min() -> min_value(DataType)
  - DataType::max() -> max_value(DataType)
- Move type constructor Int, UInt, Float, Handle, Bool into DataType.
  - Int(bits) -> DataType::Int(bits)
  - UInt(bits) -> DataType::UInt(bits)

* Support standardize runtime module (apache#4532)

* [Relay][Frontend][ONNX] Support auto_pad in Conv and ConvTranspose (apache#4563)

* [TEST] Remove nnvm related code in topi and test script (apache#4562)

* [TEST] Remove nnvm related code in topi and test script

* Remove docs dep

* [Relay] add max_pool3d in relay and TF converter (apache#4551)

* [Relay] add max_pool3d in relay and TF converter

* fix comments

* Remove nnvm (apache#4565)

* [VTA][Chisel] End-to-end Inference with Chisel VTA (apache#4574)

* [VTA][Chisel] End-to-end Inference with Chisel VTA

* Update TensorAlu.scala

* remove unnecessary cast to int32 (apache#4573)

* Fix llvm-enabled build by adding missing intrinsics headers (apache#4575)

* [DEPRECATION] Remove NNVM compiler (apache#4571)

* Remove NNVM compiler

* [Relay/Topi][Op] Added native DepthToSpace and SpaceToDepth Operators (apache#4566)

* Added tvm function stencil for subpixel operations to topi.

* Topi subpixel operators added and tested.

* Added subpixel attrs.

* Added depth_to_space relay attributes.

* depth_to_space fully working.

* Fixed NHWC shape bug.

* SpaceToDepth in and all tests passing.

* lint fixes.

* Added string include

* Fixed topi formatting.

* Added DCR/CDR mode to depthtospace operator.

* [DOC] fix doc in api.py (apache#4580)

* [DEPRECATION] Cleanup legacy verilog support (apache#4576)

This PR cleans up the leftover code for legacy Verilog support, which was experimental.
The new hardware backend path is now supported by VTA via TSIM.

* [RUNTIME] Remove Extension VTable in favor of Unified Object system. (apache#4578)

Before the unified object protocol, we supported passing
additional extension objects around by declaring a type as an extension type.
The old extension mechanism requires the types to register their
constructor and deleter to a VTable and does not enjoy the benefit of the
self-contained deletion property of the new Object system.

This PR upgrades the extension example to make use of the new object system
and removed the old Extension VTable.

Note that the register_extension function on the Python side continues to work
when the passed argument does not require explicit container copy/deletion,
which covers the current use cases of the extension mechanism.

* Some Windows and MSVC fixes (apache#4569)

* fix python exception creation in Windows

* better string conversion for msvc

* fix cpp style issue

* [NEWS] add v0.6 release (apache#4558)

* [NEWS] add v0.6 release

* remove link prefix

* fix issue number

* [DOCS]fix typos in autotvm tutorial (apache#4585)

* [Quantization, Calibrate] Fix context creation when current_target is explicitly set (apache#4582)

* [Container] Fix NDArray SaveDLTensor declaration and implementation signature different (apache#4586)

* [TOPI][AutoTVM] NHWC conv2d templates for ARM (apache#3859)

* [AutoTVM][TOPI] NHWC conv2d templates (spatial pack) for ARM

As some frontends (tflite for example) are using NHWC as the default
layout, we are enabling NHWC schedule templates in TOPI and AutoTVM.

* some comments fix

* [FIX][TOPI][X86] schedule dense pack (apache#4539)

* [Relay] Convert Layout Pass. (apache#4335)

* [Relay][AlterLayout] Broadcast with scalar shape (apache#4577)

* [TOPI] add 3D upsampling Op. (apache#4584)

* [TOPI] add 3D upsampling Op.

* fix lint issues

* change align_corners to coordinate_transformation_mode

* fix resize3d half_pixel

* make a simple function and clean up trilinear_resize3d_python

* fix doc

* [Runtime] add necessary const qualifier for NDArray container of parameters (apache#4590)

* [autotvm] fix typos in comment (apache#4591)

* fix tf.compat.v1 issue for tf version <= 1.12 (apache#4593)

* [FRONTEND][TF] conv2d_transpose 'SAME' support kernel more than 1x1 (apache#4484)

* [FRONTEND][TF] conv3d_transpose 'SAME' support kernel more than 1x1

* revised per as review comments

* add more fallback workarounds to make all tests pass

* [GraphRuntime] Support parameter out in the graph runtime debug (apache#4598)

* [GraphRuntime] Support parameter out in the graph runtime debug

* Dummy commit to trigger build

* [Perf] Add CublasLt extern support for better Igemm performance (apache#4550)

* cublaslt added

* fix lint

* address comments

* address more comments

* Trigger CI

* Trigger CI

* fix codegenc (apache#4597)

* [REFACTOR][RUNTIME] Update NDArray use the Unified Object System (apache#4581)

* [REFACTOR][RUNTIME] Move NDArray to Object System.

Previously NDArray has its own object reference counting mechanism.
This PR migrates NDArray to the unified object protocol.

The calling convention of NDArray remained intact.
That means NDArray still has its own type_code and
its handle is still DLTensor compatible.

In order to do so, this PR added minimal runtime type
detection in TVMArgValue and RetValue, only when the corresponding
type is a base type (ObjectRef) that could also refer to NDArray.

This means that even if we return a base reference object ObjectRef
that refers to an NDArray, the type_code will still be translated
correctly as kNDArrayContainer.
If we assign a non-base type (say Expr) that we know is not compatible
with NDArray during compile time, no runtime type detection will be performed.

This PR also adopts the object protocol for NDArray sub-classing and
removed the legacy NDArray subclass protocol.
Examples in apps/extension are now updated to reflect that.

Making NDArray as an Object brings all the benefits of the object system.
For example, we can now use the Array container to store NDArrays.

* Address review comments

* [Relay][Convert Layout] Handling batch norm layout change. (apache#4600)

* [relay][refactor] Cache Op::Get in passes to reduce lookup overhead (apache#4594)

* Refactor to use IsOp utility

* retrigger CI

* Update dmlc_tvm_commit_id.txt

* disable one test_batch_norm unit test for now to check CI

* enable test_batch_norm

Co-authored-by: SWu <[email protected]>
Co-authored-by: Ina Dobreva <[email protected]>
Co-authored-by: Josh Fromm <[email protected]>
Co-authored-by: miheer vaidya <[email protected]>
Co-authored-by: Liang ZOU <[email protected]>
Co-authored-by: YixinBao <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
Co-authored-by: masahi <[email protected]>
Co-authored-by: Liangfu Chen <[email protected]>
Co-authored-by: lhutton1 <[email protected]>
Co-authored-by: Tianqi Chen <[email protected]>
Co-authored-by: Alex Gladkov <[email protected]>
Co-authored-by: Takato Yamada <[email protected]>
Co-authored-by: Haichen Shen <[email protected]>
Co-authored-by: mbarrett97 <[email protected]>
Co-authored-by: Hideto Ueno <[email protected]>
Co-authored-by: Siyuan Feng <[email protected]>
Co-authored-by: Zhao Wu <[email protected]>
Co-authored-by: Neo Chien <[email protected]>
Co-authored-by: Yong Wu <[email protected]>
Co-authored-by: Dmitri Makarov <[email protected]>
Co-authored-by: Bohan Hou <[email protected]>
Co-authored-by: kice <[email protected]>
Co-authored-by: Yizhi Liu <[email protected]>
Co-authored-by: Wang Yucheng <[email protected]>
Co-authored-by: 王振华(Zhenhua WANG) <[email protected]>
Co-authored-by: deepIgnorance <[email protected]>
Co-authored-by: Animesh Jain <[email protected]>
Co-authored-by: optima2005 <[email protected]>
Co-authored-by: zhuochen <[email protected]>
Co-authored-by: Leyuan Wang <[email protected]>
zhiics added a commit to neo-ai/tvm that referenced this pull request Jan 11, 2020