[LLVM/RUNTIME] Support Parallel for on CPU #54
Merged
Conversation
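For context on the feature under review: below is a minimal sketch of how a parallel loop is expressed in TVM and lowered for CPU. It uses the present-day `te` schedule API rather than the 2017-era API this PR was written against, so the exact spellings (`te.create_schedule`, `s[B].parallel`) are illustrative assumptions, not code from this change.

```python
# Sketch only: modern TVM `te` API, not the 2017-era code in this PR.
import numpy as np
import tvm
from tvm import te

# A simple elementwise computation over a 1-D buffer.
n = 1024
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

# Mark the loop over i as parallel; with target="llvm" this is lowered to
# a call into the runtime's CPU parallel-for launcher (what this PR adds).
s = te.create_schedule(B.op)
s[B].parallel(B.op.axis[0])

f = tvm.build(s, [A, B], target="llvm")
a = tvm.nd.array(np.random.rand(n).astype("float32"))
b = tvm.nd.array(np.zeros(n, dtype="float32"))
f(a, b)
np.testing.assert_allclose(b.numpy(), a.numpy() + 1.0)
```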
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
icemelon approved these changes on Feb 26, 2017
lgtm
tqchen added a commit to tqchen/tvm that referenced this pull request on May 26, 2018
* [TEST] Xavier initialization for benchmarks
* remove additional line
tqchen added a commit that referenced this pull request on May 29, 2018
* [TEST] Xavier initialization for benchmarks
* remove additional line
tqchen added a commit to tqchen/tvm that referenced this pull request on Jul 6, 2018
* [TEST] Xavier initialization for benchmarks
* remove additional line
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request on Aug 8, 2018
* [TEST] Xavier initialization for benchmarks
* remove additional line
jroesch added a commit to jroesch/tvm that referenced this pull request on Aug 29, 2018
* Start on Relay documentation
* Add more docs
* Copy over old manual text and setup document hierarchy
* Add sphinx_autodoc_annotation
icemelon pushed a commit to icemelon/tvm that referenced this pull request on Apr 14, 2020
* Add tensorrt backend. Fix merge Fix merge and clean up logs Add BiasAdd, Concat, padding ceil mode, and clean up code Fix formatting and remove unused headers uncomment models Fix bug with variable input, clean up Don't split batch norm Move TRT execution to TrtExecutor Clean up Clean up Add partitioning Implement graph_runtime execution for Relay/TRT Fix bug in extern op Fix compilation Add EnableTrt pass to perform same modification as previous wholegraphannotator Re-enable NNVM TRT Remove SimplifyBatchnorm, add rules for converting ops Fix format, remove unused tests Enable multiple outputs Fix multiple outputs Fix activation lookup Fix no newline at eof Add license header. Add consistency test to models Add method to check TRT used. Improve comments Fix lint Add util to check TRT version Add if guards around TRT5.1 APIs Add env var for workspace size, fix logger fix build Add TRT versioning to EnableTrt pass Fix build error in DLR Fix compile for DLR Update dmlc-core, fix copyright header, undo change to includes Remove unused headers Fix IsTrtCompatible visitor and move op list to constructor Add dropout to compatible ops for CheckTrtCompatible only. Add not compatible test Add squeeze, transpose, reshape, pad, and reduce ops. Add transpose on weights workaround Fix formatting. Add unit tests Support transpose on weights for conv2d and dense. Support asymmetric padding. Temp fix for 1D inputs. Add unit tests for all ops. Support StridedSlice, AdaptivePooling approximation, Pytorch addmm fixer pass Support (2,3,0,1) transpose on weights Allow stride to be incomplete. Support ConstantNode -> kWeight Fix pass serialized graph by value in runtime. Allow inclusive count for strided pool Comments, disable failing test Fix CI lint Removed unused variables from TrtBuilder. Add more comments Fix build for TRT4 Add GetTrtVersion(), Move convert map to function, remove unneeded include, make batch_size_, logger_ TrtBuilder members, check output existence Use shared_ptr for converters. Add check for num outputs and inputs Support image.resize Make GetOpConverters return a shared_ptr Clarify count inclusive padding weirdness Use external codegen/runtime Move to src/runtime/contrib/tensorrt. Add Save and Load methods for tensorrt module. Rename some classes Require format to be tensorrt so that loader knows how to load FoldConstants Destroy engine and context after use. Store TRT weights from op converters. Formatting Always apply ConvertLayout to NCHW Clean up Add ASF header Change ObjectRef -> NodeRef Fix lint Fix pylint Fix bug with scalar weights Making TRT cmake more informative Make tensorrt tests dependent on whether trt codegen is enabled Add serialization test.
* Refactor EnableTRT checkers
* Fix const weight detection
* remove tensorrt_module.h, add test for multiple outputs. Use normal GetShape. Remove GetType. Add flag for additional model testing Undo add comments to prevent conflicts
* Separate TRT from relay. Add docstrings and more comments. Move all passes to python. Remove double lookup for Execute Formatting Fix lint Fix pylint Rename codegen_tensorrt. Check registry get. Add comments Make trt codegen off by default.
* disable for ci
* TRT codegen can be turned on independently
* Fix tests
* Fix build without runtime
* Enable AvgPool approximation
* Remove change to cmake config
* Move passes to PreprocessForTrt. Use op.name. Rename LegalizeLayoutTransform.
* Add newline to EOF. Remove else. Reserve space for vectors
* Remove AdaptivePool2D commented-out code. Add comment for transposed weight workaround
* Rename IsCompatibleFn
* Use ++i instead of i++
* Improve incompatible messages, use string::empty, small improvements
* Use constructor to fill func_params
* Remove std::move
* Use opt level 3, add helper to check whether to run test, improve load_params
* Replace TransposeRSCKtoCKRS/KCRS with TransposeWeights4D
* Clean up VisitExpr(CallNode) for args
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request on Feb 26, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request on Mar 3, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
vinx13 pushed a commit to vinx13/tvm that referenced this pull request on Mar 9, 2022
rebased
[TIR][Schedule] fix reorder/buffer_flatten & finish CPU demo (apache#59)
[CPU DEMO] Update cpu gemm demo and fix bug (apache#58)
* [TIR][Schedule] introduce parallel and fix bugs for cpu demo
* [TIR][Schedule] update cpu demo
* [TIR][Schedule] fix lint
* [TIR][Schedule] fix
rebased
[TIR][Schedule] introduce reduction block and CPU demo (apache#53)
* [TIR] reduction : split_reduction
* [TIR] reduction : split_reduction
* [TIR] reduction : fuse_reduction
* [TIR] reduction : cpu demo
* [TIR] reduction : fix
* [TIR] reduction : pattern detect remains
* [TIR] reduction : pattern detect remains
* [TIR] reduction : pattern match done
* [TIR] reduction : fix lint
* [TIR] reduction : fix
* [TIR] reduction : fix
* [TIR] reduction : fix
* [TIR] reduction : fix
* [TIR] reduction : rebased
* [TIR] reduction : rebased
[TIR][Schedule] introduce cache_read cache_write (apache#54)
* [TIR][Schedule] introduce cache_read cache_write
* [TIR][Schedule] add more comments
* [TIR][Schedule] fix problem and add comments
* [TIR][Schedule] address comments
[TIR] schedule: introduce vectorize, unroll, loop validation (apache#47)
* [TIR] vectorize : basically complete
* [TIR] vectorize&unroll : update comments&unroll
* [TIR] vectorize&unroll : rebased
* [TIR] vectorize, unroll, cpu_demo: done
* [TIR] vectorize, unroll, cpu_demo: simplify
* [TIR] vectorize, unroll, cpu_demo: fix
* [TIR] reduction : rebased
* [TIR] reduction : fix
[TIR][Schedule] fix sref and scopes problem during replace and compute_at (apache#50)
* [TIR][Schedule] fix sref and scopes problem during replace and compute_at
* [TIR][Schedule] fix
* [TIR][Schedule] fix
[TIR][Refactor] move function to ScheduleNode
[TIR] Schedule: introduce primitive compute_at (apache#36)
* [TIR] Schedule: introduce primitive compute_at
* [TIR] Schedule: address comments
* [TIR] Schedule: address comments
* [TIR] Schedule: address comments
* [TIR] Schedule: add check to compute_at
* [TIR] Schedule: address comments
* [TIR] Schedule: address comments
[TIR] Schedule: introduce primitive reorder (apache#37)
* [Schedule] debug
* [TIR] Schedule: reorder, loop type detect remains
* [TIR] reorder complete
* [TIR] reorder complete
* [TIR] fix
* [TIR] reorder : rebased complete
* [TIR] reorder : fix container.h
* [TIR] reorder : fix
* [TIR] reorder : fix
* [TIR] reorder : fix
* [TIR] reorder : simplify
* [TIR] reorder : simplify
* [TIR] reorder : simplify
* [TIR] reorder : fix
* [TIR] reorder : fix
* [TIR] reorder : rebased
* [TIR] reorder : rebased
rebase
[TIR] Schedule: introduce BlockRealize and Block SRef reuse (apache#39)
* [TIR] BlockRealize: schedule refactor
* [TIR] BlockRealize: debug
* [TIR] BlockRealize finish
* [TIR] BlockRealize finish
* [TIR] BlockRealize fix
* [TIR] BlockRealize update test
* [TIR] BlockRealize: add loop var reuse
* [TIR] BlockRealize: add loop var reuse
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
* [TIR] BlockRealize: fix
[TIR] compare for module (apache#38)
* [TIR] compare for module
* [TIR] fix
* [TIR] fix
* [TIR] fix
* [TIR] fix
* [TIR] fix
* [TIR] fix
[Hybrid] Module init
[Hybrid] Module print
[Hybrid] Module print with meta
[Hybrid] adjust
[Hybrid] finished but without lint and comment check
[Hybrid] fix lint
[Hybrid] comments
[Hybrid] fix script decoration API
[Hybrid] using IRModule
[Hybrid] fix
[Hybrid] adjust API
[Hybrid] fix
[Hybrid] fix
[Hybrid] fix
[Hybrid] fix symbol table, adjust API, introduce meta_mutator and resolve import issue
[Hybrid] fix lint
[TIR] introduce pass BufferFlatten (apache#32)
* [TIR] introduce pass BufferFlatten
* [Tir] add comments & remove old TeLower
* [TIR] split GatherRegion and BufferFlatten to two Visitor/Mutator
* [TIR] address comments: Only consider stmt scope
* [TIR] BufferFlatten: address comments
* [TIR] BufferFlatten: fold BlockFlattener into BufferFlattener
* [TIR] BufferFlatten: add asserts
* [TIR] BufferFlatten: use Equal in testcase
* [TIR] Equal Pass: Enhanced the pass
* [TIR] Equal Pass: add comments
[Hybrid] refactor using Doc, introduce annotation, enhance parser (apache#28)
* [Hybrid] refactor printer, enhance parser
* [Hybrid] refactor
* [Hybrid] fix
* [Hybrid] fix
* [Hybrid] fix namespace issue
* [Hybrid] compare using Equal
[TIR] rebased
[TE] fix replace again and add primitive fuse and split (apache#27)
* [TE] add: schedule primitive fuse
* [TE] add: schedule primitive split
* [TE] address comments: add IRSubstitueInScope and other minor fix
* [TE] address comments: Enhance Equal api and fix split by nparts
* [TE] address comments
[Hybrid] introduce printer (apache#25)
* [Hybrid] substitute Block with SeqStmt, change block() syntax
* [Hybrid] add printer, type declare intrin
* [Hybrid] refactor
* [Hybrid] meta
* [Hybrid] refactor
* [Hybrid] macro
[TE] fix replace (apache#23)
* [TE] fix replace
* [TE] fix replace: add more tests
* [TE] fix replace: add more tests
[TE] rebased
[Hybrid] python syntax parser (apache#20)
* [Hybrid] python syntax parser
* [Hybrid] add a testcase
* [Hybrid] improve comments and fix bugs
* [Hybrid] improve comments, refactor __internal_assert, add new testcases
* [Hybrid] improve error report message, refactor intrin
* [Hybrid] separate ScopeEmitter from parser
* [Hybrid] refactor type check
* [Hybrid] refactor intrin
* [Hybrid] refactor intrin, allow register external functions with argument type checking, add a testcase
* [Hybrid] address comments, fix a bug in te/ir.h
* [Hybrid] remove type check
* [Hybrid] python syntax parser
* [Hybrid] add a testcase
* [Hybrid] improve comments and fix bugs
* [Hybrid] improve comments, refactor __internal_assert, add new testcases
* [Hybrid] improve error report message, refactor intrin
* [Hybrid] separate ScopeEmitter from parser
* [Hybrid] refactor type check
* [Hybrid] refactor intrin
* [Hybrid] refactor intrin, allow register external functions with argument type checking, add a testcase
* [Hybrid] address comments, fix a bug in te/ir.h
* [Hybrid] remove type check
* [Hybrid] refactor intrin, scope_handler, special_stmt
* [Hybrid] address comments
* [Hybrid] clean code, improve error reporting & testcase
* [Hybrid] clean code
* [Hybrid] clean code
[IR] introduce dependency graph and write map
[TE] refactor and clean codebase
[TE] refactor IR
[TE] introduce schedule, dependency graph and support fuse and split (apache#17)
* fix lint
* introduce dependency graph
* enable create schedule
* support get axes
* fix lint
* revert Set
* add schedule primitive fuse
* address comment
* support split
[IR] Introduce SeqStmt
add TeLower pass and enable to run Te IR (apache#15)
* add function data structure add TeLower pass to transform Te to current IR enable to run Te IR
* address comments
* unify terminology
TensorIR data structure init (apache#14)
* init te data structure
* finish printer and enhanced ir_builder
* address the comments
Co-authored-by: Bohan Hou <[email protected]>
jinhongyii pushed a commit to jinhongyii/tvm that referenced this pull request on Jun 20, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
cyx-6 added a commit to cyx-6/tvm that referenced this pull request on Jun 29, 2022
* `ast.Expr` and concise scope
* add `BufferStore`
junrushao pushed a commit to cyx-6/tvm that referenced this pull request on Jul 4, 2022
* `ast.Expr` and concise scope
* add `BufferStore`
cyx-6 added a commit to cyx-6/tvm that referenced this pull request on Jul 13, 2022
* `ast.Expr` and concise scope
* add `BufferStore`
Hzfengsy pushed a commit to Hzfengsy/tvm that referenced this pull request on Jul 30, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
Hzfengsy pushed a commit to Hzfengsy/tvm that referenced this pull request on Jul 30, 2022
* `ast.Expr` and concise scope
* add `BufferStore`
areusch pushed a commit to areusch/tvm that referenced this pull request on Sep 20, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
gigiblender pushed a commit to gigiblender/tvm that referenced this pull request on Nov 3, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
MasterJH5574 pushed a commit to MasterJH5574/tvm that referenced this pull request on Nov 20, 2022
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
yelite pushed a commit to yelite/tvm that referenced this pull request on Feb 17, 2023
* nn module
* address comments.
* Add nn.init_params
* Remove nn.Builder and use BlockBuilder instead.
* Rebase.
* Refactor block builder and add tests.
* Address comments.
* Update.
vinx13 pushed a commit to vinx13/tvm that referenced this pull request on Mar 27, 2023
It's that time again: another merge with tvm/unity to grab the latest improvements.
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request on Sep 27, 2023
Develop pr. Approved-by: Mikael Sevenier
masahi added a commit to masahi/tvm that referenced this pull request on Mar 13, 2024
elvin-n pushed a commit to Deelvin/tvm that referenced this pull request on Mar 19, 2024
vinx13 pushed a commit to vinx13/tvm that referenced this pull request on Mar 19, 2024
krishnaraj36 pushed a commit to krishnaraj36/tvm_mainline that referenced this pull request on Aug 9, 2024
We can now build one binary and use it across targets.
Co-authored-by: Siva <[email protected]>
LeiWang1999 added a commit to LeiWang1999/tvm that referenced this pull request on Nov 8, 2024
* improve e4m3 decoding.
* append fp16xint1
* Update submodule commit reference
* chore: Update shared memory scope for float32 output dtype
* BUGFIX: UINT8/INT8 Decoding
* feat: Add rasterization options for roller module
* Refactor tensorcore_legalization method to optimize tensor core usage
* feat: Add function to collect variables from expression, improve for splitk
* chore: Update typing import in __init__.py
* chore: Refactor CPU execution of operators
* Refactor matmul implementation for splitk layout
* Refactor matmul implementation for splitk layout
* Refactor matmul implementation for splitk layout
* chore: Update version to 0.0.1.dev8
* chore: Enable debug output in bitblas.set_debug_level()
* Refactor Linear module matmul implementation for splitk layout
* Refactor matmul implementation for splitk layout
* Refactor CUDA kernel launch string for dynamic symbolic set
* Bump version to v0.0.1.dev9
* Refactor CUDA kernel launch string for dynamic symbolic set
* Bump version to v0.0.1.dev10
* Refactor CUDA kernel launch string for dynamic symbolic set
* Bump version to v0.0.1.dev12 and add MatmulConfigWithSplitK and MatmulWithSplitK
Co-authored-by: LeiWang199 <leiwang199>
#50 Need to wait for #53 to be merged