Return empty CSourceModule when no lowered_funcs exists in Relay mod #4847

kumasento · 2020-02-08T20:46:34Z

This PR implements the dummy function idea as mentioned in #4748 - when the whole Relay module is optimized to empty, we can insert a dummy operator that allows TVM to still produce a library.

tqchen · 2020-02-10T19:22:51Z

cc @zhiics @FrozenGene

mbaret

Thanks for contributing this - I've hit this issue myself in the case where the entire graph is off-loaded to external codegen. I wonder whether there's a case for making tvm::build return a 'dummy' module in the case of no lowered_funcs being provided? That way we could avoid having to put the workaround here.

mbaret · 2020-02-10T20:09:47Z

src/relay/backend/build_module.cc

+      Stmt body = EvaluateNode::make(0);
+      Array<ObjectRef> api_args;
+      auto dummy_func = MakeAPI(body, "__dummy__", api_args, 0, false);
+      lowered_funcs.Set("llvm", Array<LoweredFunc>({dummy_func}));


Is defaulting to LLVM the correct behaviour here (e.g. will this fall over if we build without LLVM support)?

I think we should set it to target_host_. Even if we have LLVM support, defaulting to LLVM is not correct either: imagine our target host is ARM.

Thank you guys for your kind comments. The latest commit no longer needs to set a target.

mbaret · 2020-02-10T20:11:23Z

src/relay/backend/build_module.cc

-        lowered_funcs,
-        target_host_,
-        BuildConfig::Current());
+      LOG(WARNING) << "No lowered funcs exist in the compiled module, "


Do we need to retain this warning? With external codegen, having no lowered funcs can be a perfectly normal mode of operation.

Thanks @mbarrett97, I've removed that log.

tqchen · 2020-02-10T22:16:02Z

I agree that perhaps an empty module provides a useful middle ground. The closest thing so far might be CSourceModule with an empty string: https://github.com/apache/incubator-tvm/blob/master/src/target/source/source_module.cc#L190

zhiics · 2020-02-11T03:17:24Z

CSourceModule with an empty string looks good to me as well. @kumasento could you do that instead of creating a dummy LLVM module? Thanks.

kumasento · 2020-02-11T21:46:24Z

Thank you guys for all your kind reviews! @mbarrett97 @FrozenGene @tqchen @zhiics

I've updated this PR to fulfill the following revisions:

Used CSourceModule with an empty string as the returned module object ret_.mod, instead of creating a dummy lowered function.
Removed the warning based on @mbarrett97's review.

Now the generated module looks clean and tidy. No redundant dummy function is generated, and no extra design decisions need to be made.

Please let me know if there is anything else you feel should be done. Thanks!

zhiics · 2020-02-11T22:01:22Z

src/relay/backend/build_module.cc

@@ -438,13 +439,14 @@ class RelayBuildModule : public runtime::ModuleNode {

    auto lowered_funcs = graph_codegen_->GetLoweredFunc();
    if (lowered_funcs.size() == 0) {
-      LOG(WARNING) << "no lowered funcs exist in the compiled module";
+      ret_.mod = tvm::codegen::CSourceModuleCreate("", "");


Sorry for the back and forth. Could you please add a comment here so that people know what we are doing?

You can force push so that your previous CI could be terminated earlier.

Sure thing, just added that

mbaret · 2020-02-12T12:20:59Z

I don't think the empty CSourceModule method works. There's a check in source_module.cc that fails when you try and create one with an empty string.

kumasento · 2020-02-12T13:09:00Z

Hi @mbarrett97, thanks for your comment. I haven't hit such an issue in my testing so far. Would you mind letting me know which assertion you were referring to?

mbaret

Here's the assert:
https://github.com/apache/incubator-tvm/blob/a5661611472c8e92b20bbe4d074333b8183f2878/src/target/source/source_module.cc#L101

There's a problem that you hit before that, though. You need to remove the check here:
https://github.com/apache/incubator-tvm/blob/a5661611472c8e92b20bbe4d074333b8183f2878/src/relay/backend/build_module.cc#L452
This is because the new module still has no lowered functions.

kumasento · 2020-02-12T14:54:34Z

@mbarrett97 Thanks, I just noticed that the base of my PR is not the latest commit. I will update it soon.

tqchen · 2020-03-10T02:38:18Z

ping @FrozenGene, please follow up

FrozenGene · 2020-03-11T03:15:16Z

src/target/llvm/llvm_module.cc

+  auto target = args[0].operator std::string();
+  auto module_name = args[1].operator std::string();
+
+  // create a default data layout


Sorry for the late response. Minor comment: the logic here creates not only a default data layout but also a default target triple, so we should update the comment.

Thanks @FrozenGene, I've updated the code comments based on your feedback :)

FrozenGene

LGTM

FrozenGene · 2020-03-11T11:52:07Z

@mbaret @zhiics @tqchen could you take another look?

tqchen · 2020-03-16T19:53:02Z

Thanks @kumasento @FrozenGene @mbaret @zhiics. This PR is now merged.

trevor-m · 2020-03-17T17:29:08Z

This commit is causing segfaults when my entire relay program is offloaded to an external codegen.

kumasento · 2020-03-17T17:50:47Z

Hi Trevor, Thanks for the info. Can you provide more information about the bug you've got? Thanks.

trevor-m · 2020-03-17T18:12:23Z

Hi @kumasento, the graph runtime is trying to get my external codegen functions from the empty LLVM module.

Stack trace shows the segfault is coming from LLVMModuleNode, so I added a print statement after this line which showed that GetFunction() was called on the LLVM module with name=tensorrt_29 which should be going to my external codegen module instead.

Stack trace:

[18:07:04] /data/neo-ai-tvm/src/target/llvm/llvm_module.cc:60: LLVMModuleNode::GetFunction() func name: tensorrt_29

  [bt] (0) /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x2e6b140) [0x7f3b510c0140]
  [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f3bdd7134b0]
  [bt] (2) /lib/x86_64-linux-gnu/libc.so.6(strlen+0x26) [0x7f3bdd769746]
  [bt] (3) /data/neo-ai-tvm/build/libtvm.so(tvm::codegen::LLVMModuleNode::LazyInitJIT()+0x894) [0x7f3bcca32ad4]
  [bt] (4) /data/neo-ai-tvm/build/libtvm.so(tvm::codegen::LLVMModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)+0x488) [0x7f3bcca334b8]
  [bt] (5) /data/neo-ai-tvm/build/libtvm.so(tvm::runtime::ModuleNode::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)+0x45) [0x7f3bcca757f5]
  [bt] (6) /data/neo-ai-tvm/build/libtvm.so(tvm::runtime::GraphRuntime::CreateTVMOp(tvm::runtime::TVMOpParam const&, std::vector<DLTensor, std::allocator<DLTensor> > const&, unsigned long)+0x4d5) [0x7f3bccad8e05]
  [bt] (7) /data/neo-ai-tvm/build/libtvm.so(tvm::runtime::GraphRuntime::SetupOpExecs()+0x661) [0x7f3bccadb0d1]
  [bt] (8) /data/neo-ai-tvm/build/libtvm.so(tvm::runtime::GraphRuntime::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module, std::vector<DLContext, std::allocator<DLContext> > const&)+0x260) [0x7f3bccadcf60]

kumasento · 2020-03-17T19:16:45Z

Thanks @trevor-m

@FrozenGene sorry for bothering you but does this ring a bell?

It seems odd to me that GetFunction doesn't return an invalid/empty value directly when the function name cannot be found. Why is LazyInitJIT called at all in this scenario?
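
To make the question concrete, here is a toy sketch of the two lookup orders being contrasted. Every name in it (LazyToyModule, GetFunctionEager, GetFunctionGuarded, init_count) is hypothetical. It is not TVM's LLVMModuleNode and it is not a proposed patch. It only shows that checking the name first lets an unknown symbol return an empty function without touching the expensive initialization.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>

// Toy model of a lazily initialized module; NOT TVM's LLVMModuleNode.
struct LazyToyModule {
  std::set<std::string> symbols;  // names this module actually defines
  int init_count = 0;             // how many times the expensive init ran

  // Stands in for JIT engine setup (the step that crashed in the trace).
  void LazyInit() { ++init_count; }

  // Unguarded order: initialize first, then resolve the name.
  std::function<int()> GetFunctionEager(const std::string& name) {
    LazyInit();
    if (symbols.count(name) == 0) return {};
    return [] { return 0; };
  }

  // Guarded order: an unknown name returns an empty function
  // without ever triggering initialization.
  std::function<int()> GetFunctionGuarded(const std::string& name) {
    if (symbols.count(name) == 0) return {};
    LazyInit();
    return [] { return 0; };
  }
};
```

With an empty symbols set, GetFunctionGuarded("tensorrt_29") returns an empty function and leaves init_count at 0, whereas GetFunctionEager runs LazyInit once before giving the same answer.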

FrozenGene · 2020-03-18T02:37:48Z

Please see: https://github.com/apache/incubator-tvm/pull/4847/files#diff-8baddb83a9684e8373691bb48a946900R469-R474
When the entire program is offloaded, the previous logic would do:

    // Execute the whole module using external runtime.
    ret_.mod = ext_mods[0];

However, the current logic does:

    // Import all external runtime modules.
    for (const auto& it : ext_mods)
      ret_.mod.Import(it);

Could you double-check that the logic is the same as before? Thanks.
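
The difference between the two snippets can be seen in a toy model of the resulting module tree. This is only a sketch: ToyMod, MakeMod, AssembleByReplacement and AssembleByImport are hypothetical names, not TVM APIs, and the single-module size check in the replacement path is an assumption of the sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy module tree; NOT tvm::runtime::Module.
struct ToyMod {
  std::string kind;                              // e.g. "llvm" or "ext"
  std::vector<std::shared_ptr<ToyMod>> imports;  // imported submodules
};
using ToyModPtr = std::shared_ptr<ToyMod>;

inline ToyModPtr MakeMod(const std::string& kind) {
  auto m = std::make_shared<ToyMod>();
  m->kind = kind;
  return m;
}

// Replacement: with no lowered funcs, the single external module
// becomes the root module itself.
inline ToyModPtr AssembleByReplacement(const std::vector<ToyModPtr>& ext_mods) {
  assert(ext_mods.size() == 1);  // assumption of this sketch
  return ext_mods[0];
}

// Import: an (empty) host module is always created and every external
// module is imported into it, so the root is the host, not the external.
inline ToyModPtr AssembleByImport(const std::vector<ToyModPtr>& ext_mods) {
  auto host = MakeMod("llvm");
  for (const auto& m : ext_mods) host->imports.push_back(m);
  return host;
}
```

Under replacement the root's kind is "ext". Under import the root's kind is "llvm" and the external module sits one level down, so any lookup starts at the empty host module.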

kumasento · 2020-03-21T09:47:33Z

Hi @FrozenGene

Thanks for your explanation. I do have a question about this part, hope you won't mind:

Before this PR, we replaced ret_.mod with ext_mods[0] when there were no lowered_funcs, which is sensible since no ret_.mod is available in that case.
Now we always create a ret_.mod. What should the logic be if there is only one external module (the condition for the ret_.mod replacement)? Should we replace or import?

Also @trevor-m would you mind sending me a minimal workable example? I would like to do the tracing myself as well.

Thanks

FrozenGene · 2020-03-21T10:05:23Z

I think we should do the ret_.mod replacement, as before this PR. @zhiics wrote the external codegen part, so he can answer this with more authority. @zhiics, could you help answer this question?

zhiics · 2020-03-21T17:00:13Z

I think changing it to an LLVM module and importing all submodules is okay. Now, if you only have an external module, you will need to create an LLVM module first and then import the external module into it.

Stepping into llvm module to find the symbol is not wrong because we will always try to find the symbol from the host module first. If it is not found, we will then try to check each imported module. See the code here:

https://github.com/apache/incubator-tvm/blob/050f2bde2c694af9b5569ca954ca041c3767787b/src/runtime/module.cc#L65

A minimal example to reproduce this and track the root cause would be more helpful.
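
As a rough illustration of the lookup order described above (host module first, then each imported module), here is a toy sketch. ToyModule and MakeOffloadedExample are hypothetical names, and the code is modeled only on the description in this comment, not copied from module.cc.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a runtime module; NOT tvm::runtime::ModuleNode.
struct ToyModule {
  std::map<std::string, std::function<int()>> funcs;  // symbols defined here
  std::vector<std::shared_ptr<ToyModule>> imports;    // imported submodules

  void Import(std::shared_ptr<ToyModule> m) {
    assert(m != nullptr);
    imports.push_back(std::move(m));
  }

  // Look in this module first; if the symbol is missing and query_imports
  // is set, try each imported module in turn (depth-first).
  // Returns an empty std::function when nothing defines the symbol.
  std::function<int()> GetFunction(const std::string& fname,
                                   bool query_imports) const {
    auto it = funcs.find(fname);
    if (it != funcs.end()) return it->second;
    if (query_imports) {
      for (const auto& m : imports) {
        auto f = m->GetFunction(fname, true);
        if (f) return f;
      }
    }
    return {};
  }
};

// The situation from this thread: an empty host module importing one
// external module that defines "tensorrt_29".
inline std::shared_ptr<ToyModule> MakeOffloadedExample() {
  auto host = std::make_shared<ToyModule>();
  auto ext = std::make_shared<ToyModule>();
  ext->funcs["tensorrt_29"] = [] { return 29; };
  host->Import(ext);
  return host;
}
```

In this model the host is always consulted first, so an empty host module must be able to answer "not found" safely before the imported external module gets a chance to resolve the name.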

trevor-m · 2020-03-24T16:59:08Z

You can reproduce this by running test_extern_dnnl() after commenting out this line: https://github.com/apache/incubator-tvm/blob/master/tests/python/relay/test_pass_partition_graph.py#L203

kumasento · 2020-03-25T11:41:29Z

Hi @trevor-m
Thanks for this information. I ran the test and reproduced the bug. I've traced the segfault to the create function of llvm::EngineBuilder (this line).

Now I'm looking at the internal logic in LLVM to find out what the actual cause is. Please bear with me for 1-2 days.

kumasento · 2020-03-25T14:07:59Z

@trevor-m a tentative fix has been posted in #5146

…pache#4847)

* Use dummy func when no lowered_funcs exists in Relay mod
* Dummy func -> CSourceModule with empty code str
* Added comments describing the empty CSouceModule
* Always import external modules w/o assertions
* Use CSourceModule as a fallback for LLVMModule
* Changed cond for target == llvm
* Create an empty LLVM module w/o using dummy func
* Avoid using IR str concat to create LLVM module
* Improved comments for codegen.LLVMModuleCreate
* Satisfy the linter for LLVMModuleCreate

kumasento requested review from jroesch, tqchen, zhiics and FrozenGene and removed request for FrozenGene February 8, 2020 20:46

Use dummy func when no lowered_funcs exists in Relay mod

d6301d6

kumasento force-pushed the dev-relay-dummy branch from 3bcb8aa to d6301d6 Compare February 8, 2020 20:48

kumasento mentioned this pull request Feb 8, 2020

[RELAY] Support RelayBuild with Only Constants #4748

Closed

tqchen assigned FrozenGene and zhiics Feb 10, 2020

tqchen added the status: need review label Feb 10, 2020

mbaret reviewed Feb 10, 2020

comaniac mentioned this pull request Feb 11, 2020

[Relay] Ignore Primitive functions in Visitors #4864

Closed

kumasento force-pushed the dev-relay-dummy branch 3 times, most recently from b3b15f2 to 1f13b9e Compare February 11, 2020 21:52

zhiics reviewed Feb 11, 2020

Dummy func -> CSourceModule with empty code str

adbe5a2

kumasento force-pushed the dev-relay-dummy branch from 1f13b9e to adbe5a2 Compare February 11, 2020 21:59

zhiics reviewed Feb 11, 2020

Added comments describing the empty CSouceModule

ca6e419

mbaret reviewed Feb 12, 2020

kumasento changed the title ~~Use dummy func when no lowered_funcs exists in Relay mod~~ Return empty CSourceModule when no lowered_funcs exists in Relay mod Feb 12, 2020

Avoid using IR str concat to create LLVM module

ccded94

kumasento force-pushed the dev-relay-dummy branch from 7a79f6b to ccded94 Compare March 2, 2020 19:33

FrozenGene reviewed Mar 11, 2020

kumasento added 2 commits March 11, 2020 10:45

Improved comments for codegen.LLVMModuleCreate

bb55935

Satisfy the linter for LLVMModuleCreate

41450a5

FrozenGene approved these changes Mar 11, 2020

tqchen approved these changes Mar 16, 2020

tqchen merged commit 11ee1a0 into apache:master Mar 16, 2020

tqchen added status: accepted and removed status: need review labels Mar 16, 2020

kumasento deleted the dev-relay-dummy branch March 16, 2020 21:34

kumasento mentioned this pull request Mar 25, 2020

Handle empty LLVMModule in GetFunction #5146

Merged

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed
