[LLVM/RUNTIME] Support Parallel for on CPU #54
Merged
Conversation
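For context on the feature under review: below is a minimal sketch of how a parallel loop is expressed in TVM and lowered for CPU. It uses the present-day `te` schedule API rather than the 2017-era API this PR was written against, so the exact spellings (`te.create_schedule`, `s[B].parallel`) are illustrative assumptions, not code from this change.

```python
# Sketch only: modern TVM `te` API, not the 2017-era code in this PR.
import numpy as np
import tvm
from tvm import te

# A simple elementwise computation over a 1-D buffer.
n = 1024
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

# Mark the loop over i as parallel; with target="llvm" this is lowered to
# a call into the runtime's CPU parallel-for launcher (what this PR adds).
s = te.create_schedule(B.op)
s[B].parallel(B.op.axis[0])

f = tvm.build(s, [A, B], target="llvm")
a = tvm.nd.array(np.random.rand(n).astype("float32"))
b = tvm.nd.array(np.zeros(n, dtype="float32"))
f(a, b)
np.testing.assert_allclose(b.numpy(), a.numpy() + 1.0)
```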
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
icemelon approved these changes on Feb 26, 2017
lgtm
tqchen added a commit to tqchen/tvm that referenced this pull request on May 26, 2018
* [TEST] Xavier initialization for benchmarks
* remove additional line
tqchen added a commit that referenced this pull request on May 29, 2018
* [TEST] Xavier initialization for benchmarks
* remove additional line
tqchen added a commit to tqchen/tvm that referenced this pull request on Jul 6, 2018
* [TEST] Xavier initialization for benchmarks
* remove additional line
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request on Aug 8, 2018
* [TEST] Xavier initialization for benchmarks
* remove additional line
jroesch added a commit to jroesch/tvm that referenced this pull request on Aug 29, 2018
* Start on Relay documentation
* Add more docs
* Copy over old manual text and setup document hierarchy
* Add sphinx_autodoc_annotation
icemelon pushed a commit to icemelon/tvm that referenced this pull request on Apr 14, 2020
* Add tensorrt backend. Fix merge Fix merge and clean up logs Add BiasAdd, Concat, padding ceil mode, and clean up code Fix formatting and remove unused headers uncomment models Fix bug with variable input, clean up Don't split batch norm Move TRT execution to TrtExecutor Clean up Clean up Add partitioning Implement graph_runtime execution for Relay/TRT Fix bug in extern op Fix compilation Add EnableTrt pass to perform same modification as previous wholegraphannotator Re-enable NNVM TRT Remove SimplifyBatchnorm, add rules for converting ops Fix format, remove unused tests Enable multiple outputs Fix multiple outputs Fix activation lookup Fix no newline at eof Add license header. Add consistency test to models Add method to check TRT used. Improve comments Fix lint Add util to check TRT version Add if guards around TRT5.1 APIs Add env var for workspace size, fix logger fix build Add TRT versioning to EnableTrt pass Fix build error in DLR Fix compile for DLR Update dmlc-core, fix copyright header, undo change to includes Remove unused headers Fix IsTrtCompatible visitor and move op list to constructor Add dropout to compatible ops for CheckTrtCompatible only. Add not compatible test Add squeeze, transpose, reshape, pad, and reduce ops. Add transpose on weights workaround Fix formatting. Add unit tests Support transpose on weights for conv2d and dense. Support asymmetric padding. Temp fix for 1D inputs. Add unit tests for all ops. Support StridedSlice, AdaptivePooling approximation, Pytorch addmm fixer pass Support (2,3,0,1) transpose on weights Allow stride to be incomplete. Support ConstantNode -> kWeight Fix pass serialized graph by value in runtime. Allow inclusive count for strided pool Comments, disable failing test Fix CI lint Removed unused variables from TrtBuilder. Add more comments Fix build for TRT4 Add GetTrtVersion(), Move convert map to function, remove unneeded include, make batch_size_, logger_ TrtBuilder members, check output existence Use shared_ptr for converters. Add check for num outputs and inputs Support image.resize Make GetOpConverters return a shared_ptr Clarify count inclusive padding weirdness Use external codegen/runtime Move to src/runtime/contrib/tensorrt. Add Save and Load methods for tensorrt module. Rename some classes Require format to be tensorrt so that loader knows how to load FoldConstants Destroy engine and context after use. Store TRT weights from op converters. Formatting Always apply ConvertLayout to NCHW Clean up Add ASF header Change ObjectRef -> NodeRef Fix lint Fix pylint Fix bug with scalar weights Making TRT cmake more informative Make tensorrt tests dependent on whether trt codegen is enabled Add serialization test.
* Refactor EnableTRT checkers
* Fix const weight detection
* remove tensorrt_module.h, add test for multiple outputs. Use normal GetShape. Remove GetType. Add flag for additional model testing Undo add comments to prevent conflicts
* Separate TRT from relay. Add docstrings and more comments. Move all passes to python. Remove double lookup for Execute Formatting Fix lint Fix pylint Rename codegen_tensorrt. Check registry get. Add comments Make trt codegen off by default.
* disable for ci
* TRT codegen can be turned on independently
* Fix tests
* Fix build without runtime
* Enable AvgPool approximation
* Remove change to cmake config
* Move passes to PreprocessForTrt. Use op.name. Rename LegalizeLayoutTransform.
* Add newline to EOF. Remove else. Reserve space for vectors
* Remove AdaptivePool2D commented-out code. Add comment for transposed weight workaround
* Rename IsCompatibleFn
* Use ++i instead of i++
* Improve incompatible messages, use string::empty, small improvements
* Use constructor to fill func_params
* Remove std::move
* Use opt level 3, add helper to check whether to run test, improve load_params
* Replace TransposeRSCKtoCKRS/KCRS with TransposeWeights4D
* Clean up VisitExpr(CallNode) for args
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request on Feb 26, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request on Mar 3, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
vinx13 pushed a commit to vinx13/tvm that referenced this pull request on Mar 9, 2022
rebased
[TIR][Schedule] fix reorder/buffer_flatten & finish CPU demo (apache#59)
[CPU DEMO] Update cpu gemm demo and fix bug (apache#58)
* [TIR][Schedule] introduce parallel and fix bugs for cpu demo
* [TIR][Schedule] update cpu demo
* [TIR][Schedule] fix lint
* [TIR][Schedule] fix
rebased
[TIR][Schedule] introduce reduction block and CPU demo (apache#53)
* [TIR] reduction : split_reduction
* [TIR] reduction : split_reduction
* [TIR] reduction : fuse_reduction
* [TIR] reduction : cpu demo
* [TIR] reduction : fix
* [TIR] reduction : pattern detect remains
* [TIR] reduction : pattern detect remains
* [TIR] reduction : pattern match done
* [TIR] reduction : fix lint
* [TIR] reduction : fix
* [TIR] reduction : fix
* [TIR] reduction : fix
* [TIR] reduction : fix
* [TIR] reduction : rebased
* [TIR] reduction : rebased
[TIR][Schedule] introduce cache_read cache_write (apache#54)
* [TIR][Schedule] introduce cache_read cache_write
* [TIR][Schedule] add more comments
* [TIR][Schedule] fix problem and add comments
* [TIR][Schedule] address comments
[TIR] schedule: introduce vectorize, unroll, loop validation (apache#47)
* [TIR] vectorize : basically complete
* [TIR] vectorize&unroll : update comments&unroll
* [TIR] vectorize&unroll : rebased
* [TIR] vectorize, unroll, cpu_demo: done
* [TIR] vectorize, unroll, cpu_demo: simplify
* [TIR] vectorize, unroll, cpu_demo: fix
* [TIR] reduction : rebased
* [TIR] reduction : fix
[TIR][Schedule] fix sref and scopes problem during replace and compute_at (apache#50)
* [TIR][Schedule] fix sref and scopes problem during replace and compute_at
* [TIR][Schedule] fix
* [TIR][Schedule] fix
[TIR][Refactor] move function to ScheduleNode
[TIR] Schedule: introduce primitive compute_at (apache#36)
* [TIR] Schedule: introduce primitive compute_at
* [TIR] Schedule: address comments
* [TIR] Schedule: address comments
* [TIR] Schedule: address comments
* [TIR] Schedule: add check to compute_at
* [TIR] Schedule: address comments
* [TIR] Schedule: address comments
[TIR] Schedule: introduce primitive reorder (apache#37)
* [Schedule] debug
* [TIR] Schedule: reorder, loop type detect remains
* [TIR] reorder complete
* [TIR] reorder complete
* [TIR] fix
* [TIR] reorder : rebased complete
* [TIR] reorder : fix container.h
* [TIR] reorder : fix
* [TIR] reorder : fix
* [TIR] reorder : fix
* [TIR] reorder : simplify
* [TIR] reorder : simplify
* [TIR] reorder : simplify
* [TIR] reorder : fix
* [TIR] reorder : fix
* [TIR] reorder : rebased
* [TIR] reorder : rebased
rebase
[TIR] Schedule: introduce BlockRealize and Block SRef reuse (apache#39)
* [TIR] BlockRealize: schedule refactor
* [TIR] BlockRealize: debug
* [TIR] BlockRealize finish
* [TIR] BlockRealize finish
* [TIR] BlockRealize fix
* [TIR] BlockRealize update test
* [TIR] BlockRealize: add loop var reuse
* [TIR] BlockRealize: add loop var reuse
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
[TIR] compare for module (apache#38)
* [TIR] compare for module
* [TIR] fix
* [TIR] fix
* [TIR] fix
* [TIR] fix
* [TIR] fix
* [TIR] fix
[Hybrid] Module init
[Hybrid] Module print
[Hybrid] Module print with meta
[Hybrid] adjust
[Hybrid] finished but without lint and comment check
[Hybrid] fix lint
[Hybrid] comments
[Hybrid] fix script decoration API
[Hybrid] using IRModule
[Hybrid] fix
[Hybrid] adjust API
[Hybrid] fix
[Hybrid] fix
[Hybrid] fix
[Hybrid] fix symbol table, adjust API, introduce meta_mutator and resolve import issue
[Hybrid] fix lint
[TIR] introduce pass BufferFlatten (apache#32)
* [TIR] introduce pass BufferFlatten
* [Tir] add comments & remove old TeLower
* [TIR] split GatherRegion and BufferFlatten to two Visitor/Mutator
* [TIR] address comments: Only consider stmt scope
* [TIR] BufferFlatten: address comments
* [TIR] BufferFlatten: fold BlockFlattener into BufferFlattener
* [TIR] BufferFlatten: add asserts
* [TIR] BufferFlatten: use Equal in testcase
* [TIR] Equal Pass: Enhanced the pass
* [TIR] Equal Pass: add comments
[Hybrid] refactor using Doc, introduce annotation, enhance parser (apache#28)
* [Hybrid] refactor printer, enhance parser
* [Hybrid] refactor
* [Hybrid] fix
* [Hybrid] fix
* [Hybrid] fix namespace issue
* [Hybrid] compare using Equal
[TIR] rebased
[TE] fix replace again and add primitive fuse and split (apache#27)
* [TE] add: schedule primitive fuse
* [TE] add: schedule primitive split
* [TE] address comments: add IRSubstitueInScope and other minor fix
* [TE] address comments: Enhance Equal api and fix split by nparts
* [TE] address comments
[Hybrid] introduce printer (apache#25)
* [Hybrid] substitute Block with SeqStmt, change block() syntax
* [Hybrid] add printer, type declare intrin
* [Hybrid] refactor
* [Hybrid] meta
* [Hybrid] refactor
* [Hybrid] macro
[TE] fix replace (apache#23)
* [TE] fix replace
* [TE] fix replace: add more tests
* [TE] fix replace: add more tests
[TE] rebased
[Hybrid] python syntax parser (apache#20)
* [Hybrid] python syntax parser
* [Hybrid] add a testcase
* [Hybrid] improve comments and fix bugs
* [Hybrid] improve comments, refactor __internal_assert, add new testcases
* [Hybrid] improve error report message, refactor intrin
* [Hybrid] separate ScopeEmitter from parser
* [Hybrid] refactor type check
* [Hybrid] refactor intrin
* [Hybrid] refactor intrin, allow register external functions with argument type checking, add a testcase
* [Hybrid] address comments, fix a bug in te/ir.h
* [Hybrid] remove type check
* [Hybrid] python syntax parser
* [Hybrid] add a testcase
* [Hybrid] improve comments and fix bugs
* [Hybrid] improve comments, refactor __internal_assert, add new testcases
* [Hybrid] improve error report message, refactor intrin
* [Hybrid] separate ScopeEmitter from parser
* [Hybrid] refactor type check
* [Hybrid] refactor intrin
* [Hybrid] refactor intrin, allow register external functions with argument type checking, add a testcase
* [Hybrid] address comments, fix a bug in te/ir.h
* [Hybrid] remove type check
* [Hybrid] refactor intrin, scope_handler, special_stmt
* [Hybrid] address comments
* [Hybrid] clean code, improve error reporting & testcase
* [Hybrid] clean code
* [Hybrid] clean code
[IR] introduce dependency graph and write map
[TE] refactor and clean codebase
[TE] refactor IR
[TE] introduce schedule, dependency graph and support fuse and split (apache#17)
* fix lint
* introduce dependency graph
* enable create schedule
* support get axes
* fix lint
* revert Set
* add schedule primitive fuse
* address comment
* support split
[IR] Introduce SeqStmt
add TeLower pass and enable to run Te IR (apache#15)
* add function data structure add TeLower pass to transform Te to current IR enable to run Te IR
* address comments
* unify terminology
TensorIR data structure init (apache#14)
* init te data structure
* finish printer and enhanced ir_builder
* address the comments
Co-authored-by: Bohan Hou <[email protected]>
jinhongyii pushed a commit to jinhongyii/tvm that referenced this pull request on Jun 20, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
cyx-6 added a commit to cyx-6/tvm that referenced this pull request on Jun 29, 2022
* `ast.Expr` and concise scope
* add `BufferStore`
junrushao pushed a commit to cyx-6/tvm that referenced this pull request on Jul 4, 2022
* `ast.Expr` and concise scope
* add `BufferStore`
cyx-6 added a commit to cyx-6/tvm that referenced this pull request on Jul 13, 2022
* `ast.Expr` and concise scope
* add `BufferStore`
Hzfengsy pushed a commit to Hzfengsy/tvm that referenced this pull request on Jul 30, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
Hzfengsy pushed a commit to Hzfengsy/tvm that referenced this pull request on Jul 30, 2022
* `ast.Expr` and concise scope
* add `BufferStore`
areusch pushed a commit to areusch/tvm that referenced this pull request on Sep 20, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
gigiblender pushed a commit to gigiblender/tvm that referenced this pull request on Nov 3, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request on Nov 20, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
yelite pushed a commit to yelite/tvm that referenced this pull request on Feb 17, 2023
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
vinx13 pushed a commit to vinx13/tvm that referenced this pull request on Mar 27, 2023
It's that time again: another merge with tvm/unity to grab the latest improvements.
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request on Sep 27, 2023
Develop pr. Approved-by: Mikael Sevenier
masahi added a commit to masahi/tvm that referenced this pull request on Mar 13, 2024
elvin-n pushed a commit to Deelvin/tvm that referenced this pull request on Mar 19, 2024
vinx13 pushed a commit to vinx13/tvm that referenced this pull request on Mar 19, 2024
krishnaraj36 pushed a commit to krishnaraj36/tvm_mainline that referenced this pull request on Aug 9, 2024
We can now build one binary and use it across targets.
Co-authored-by: Siva <[email protected]>
LeiWang1999 added a commit to LeiWang1999/tvm that referenced this pull request on Nov 8, 2024
* improve e4m3 decoding.
* append fp16xint1
* Update submodule commit reference
* chore: Update shared memory scope for float32 output dtype
* BUGFIX: UINT8/INT8 Decoding
* feat: Add rasterization options for roller module
* Refactor tensorcore_legalization method to optimize tensor core usage
* feat: Add function to collect variables from expression, improve for splitk
* chore: Update typing import in __init__.py
* chore: Refactor CPU execution of operators
* Refactor matmul implementation for splitk layout
* Refactor matmul implementation for splitk layout
* Refactor matmul implementation for splitk layout
* chore: Update version to 0.0.1.dev8
* chore: Enable debug output in bitblas.set_debug_level()
* Refactor Linear module matmul implementation for splitk layout
* Refactor matmul implementation for splitk layout
* Refactor CUDA kernel launch string for dynamic symbolic set
* Bump version to v0.0.1.dev9
* Refactor CUDA kernel launch string for dynamic symbolic set
* Bump version to v0.0.1.dev10
* Refactor CUDA kernel launch string for dynamic symbolic set
* Bump version to v0.0.1.dev12 and add MatmulConfigWithSplitK and MatmulWithSplitK
Co-authored-by: LeiWang199 <leiwang199>
#50 Need to wait for #53 to be merged