Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CodeGen] Generate blob use LLVM directly #4657

Merged
merged 1 commit into from
Jan 10, 2020

Conversation

FrozenGene
Copy link
Member

This is one prior work of Module based Model Runtime Interface RFC. In this RFC, we want to serialize params (weight) into shared library directly.

However, previous way of serializing the blob is to use PackImportToC, i.e. we will serialize the blob into char array __tvm_dev_blob, which stored the serialized result (0x...0x...), then we write this into dev.cc and pass to the compiler compiling. If we don't serialize params, it is ok. However, when we serialize params, the __tvm_dev_blob and dev.cc will become very large (resnet18 workload of dev.cc will be 380M, without serializing weight is 2.6M). The compiling time will increase very much (Even 10x slower).

According to the investigation, the bottleneck is the parsing time (see GCC time report).

So we decide to avoid the C++ compiler (i.e. avoid the parsing time) and generate LLVM module directly. After testing, even we we serialize the params into shared library, the compiling time of generating blob is very fast. The compiling time of resnet18 on one cloud machine I tested before could boost from 128.46s to 13.28s. see previous test

After this work, I find it also have some a little improvement of compiling time compared previous way testing on my local machine (Intel i7-7700 CPU @ 3.60GHz).

Workload is restnet18 on CUDA. Run 10 times and get the average

PackImportToC PackImportToLLVM
8.586s 8.082s

@tqchen @zhiics @yzhliu @icemelon9

@FrozenGene FrozenGene force-pushed the codegen_blob branch 2 times, most recently from 92b9d98 to 4f1772c Compare January 8, 2020 14:03
src/codegen/codegen.cc Outdated Show resolved Hide resolved
src/codegen/build_module.cc Outdated Show resolved Hide resolved
@tqchen
Copy link
Member

tqchen commented Jan 8, 2020

cc @zhiics @yzhliu @ajtulloch

Copy link
Member

@zhiics zhiics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Halfway done. Will spend some time to review the llvm part. The major concern I have so far is that I think we probably don't need to have the setter and getter in Target as we only use each of them once and we know where we need it.

include/tvm/build_module.h Outdated Show resolved Hide resolved
include/tvm/codegen.h Outdated Show resolved Hide resolved
python/tvm/build_module.py Outdated Show resolved Hide resolved
python/tvm/module.py Outdated Show resolved Hide resolved
src/codegen/build_module.cc Outdated Show resolved Hide resolved
src/codegen/build_module.cc Outdated Show resolved Hide resolved
src/codegen/llvm/codegen_blob.cc Show resolved Hide resolved
@FrozenGene
Copy link
Member Author

FrozenGene commented Jan 8, 2020

@tqchen @zhiics About the LLVM target part, I also think current way maybe is not good. I also think of it and cost my some time. So I want to discuss it too.

Firstly Let me describe why we need llvm target host.

When we use PackImportToLLVM, we will need to use LLVM to create LLVM module.

However, we must need one target machine, this is answered to @tqchen 's question. We can not create one generic target. Because we have platform's specific feature to handle. See:https://github.com/apache/incubator-tvm/blob/4f1772c0598ae708477f457a993f5be52fa71eb9/src/codegen/llvm/codegen_blob.cc#L107 and https://github.com/apache/incubator-tvm/blob/4f1772c0598ae708477f457a993f5be52fa71eb9/src/codegen/llvm/codegen_blob.cc#L167. Another thing is if we leave the target is empty, i.e. no module->setTargetTriple(target_triple.str()); On llvm 6.0 of Mac will report problem : assert error Target-incompatible DataLayout.

More import thing is we need to consider the target host is not x86 cpu. For example, target host is llvm -target aarch64, if we leave it empty and build it on x86 server, we will generate devc.o into x86 format, but in fact we want aarch64 of devc.o. So in the codegen_blob part, we should know the correct llvm target host to generate correct file.

Current way is a little ugly I think too. Current way will create target host based on LLVM when we have detected target host finally (because tvm.build / relay.build's api could leave target_host be None, we have some logic to handle this situation). One simple way of course is let users specific the target host of llvm representation, but I use this ugly way is just to detect it automatically and don't want to let users do. I also want to listen to your advices. Thanks in advance.

@tqchen
Copy link
Member

tqchen commented Jan 8, 2020

If we indeed needs to pass target triple, perhaps the best way to do so is to pass that as an argument of PackImportToLLVM

@FrozenGene
Copy link
Member Author

FrozenGene commented Jan 8, 2020

If we indeed needs to pass target triple, perhaps the best way to do so is to pass that as an argument of PackImportToLLVM

@tqchen So, your opinion is let the users specify. The API will become this:

  def export_library(self,
                      file_name,
                      fcompile=None,
                      llvm_target=None,
                      **kwargs):
   if enabled("llvm"):
      path_obj = temp.relpath("devc.o")
      m = _PackImportsToLLVM(self, is_system_lib,  llvm_target)

If users doesn't set llvm_target, we use the default value. (system target triple).

if users want to compile into another os / another cpu, they have to do this:
module.export_library(..., llvm_target="llvm -target aarch64-linux-gnu-none") now.

Does it make sense to you?

@tqchen
Copy link
Member

tqchen commented Jan 8, 2020

Hmm, i meant to have someway to automatically discover the triple and pass around, let us think a bit more about it. For example, if there is already an LLVM module in the modules, then we can get that from that module.

@tqchen
Copy link
Member

tqchen commented Jan 8, 2020

Some ideas about automatic detection, in the order of things that can be tried

  • If the module lists already contains an LLVM module, we can get triples from those modules
  • The property could be part of fcompile.get_target_triple()
    • Use hasattr to detect if fcompile contains the property
    • The function can return None, which means unable to detect
    • Note that for gcc and clang gcc -dumpmachine will give you the triple
    • Add support for this function in create_shared(using gcc -dumpmachine)
  • If we cannot detect using the above approach, fallback to PackToC, note that this is super unlikely

@FrozenGene
Copy link
Member Author

Some ideas about automatic detection, in the order of things that can be tried

  • If the module lists already contains an LLVM module, we can get triples from those modules

  • The property could be part of fcompile.get_target_triple()

    • Use hasattr to detect if fcompile contains the property
    • The function can return None, which means unable to detect
    • Note that for gcc and clang gcc -dumpmachine will give you the triple
    • Add support for this function in create_shared(using gcc -dumpmachine)
  • If we cannot detect using the above approach, fallback to PackToC, note that this is super unlikely

Good idea. One quick question, how about Windows's compiler cl? The cl (https://docs.microsoft.com/en-us/cpp/build/reference/compiler-options-listed-by-category?view=vs-2019) seems doesn't have option to show arch information of its self.

@tqchen
Copy link
Member

tqchen commented Jan 9, 2020

if cl is used, I think we can safely assume the target is windows, we only need to know whether if it is win32 or win64, i am not that familar with cl to know for sure, but i guess in that case we can pass and use LLVM module detection, or fallback to c

@FrozenGene
Copy link
Member Author

if cl is used, I think we can safely assume the target is windows, we only need to know whether if it is win32 or win64, i am not that familar with cl to know for sure, but i guess in that case we can pass and use LLVM module detection, or fallback to c

Yes. We could assume the target is Windows for sure. However, i think besides we need to know win32 / win64, one more thing I am a little worried. Windows could run ARM CPU (cl.exe could run Windows 10 on ARM? I can not make sure yet too).

@FrozenGene
Copy link
Member Author

FrozenGene commented Jan 9, 2020

if cl is used, I think we can safely assume the target is windows, we only need to know whether if it is win32 or win64, i am not that familar with cl to know for sure, but i guess in that case we can pass and use LLVM module detection, or fallback to c

Yes. We could assume the target is Windows for sure. However, i think besides we need to know win32 / win64, one more thing I am a little worried. Windows could run ARM CPU (cl.exe could run Windows 10 on ARM? I can not make sure yet too).

Ah...One tricky way I just see on the YoutuBe...
image
Maybe we could use DUMPBIN of cl.exe to make sure it is ARM or x86...But, it is so tricky, I personally think we maybe could fallback to c on Windows if we have fallen into fcompile.get_target_triple(). Because I think

If the module lists already contains an LLVM module, we can get triples from those modules

could solve most of our cases.

How about your opinion? @tqchen

@tqchen
Copy link
Member

tqchen commented Jan 9, 2020

sounds good @FrozenGene i agree with all your points

@FrozenGene FrozenGene force-pushed the codegen_blob branch 4 times, most recently from 8fed25b to c24fddf Compare January 9, 2020 06:53
@FrozenGene
Copy link
Member Author

Now, we could have new elegant and simpler automatic detection of LLVM target triple (thanks @tqchen 's good idea). @tqchen @zhiics could you help to review new code again? Thanks!

include/tvm/codegen.h Outdated Show resolved Hide resolved
python/tvm/contrib/cc.py Outdated Show resolved Hide resolved
python/tvm/contrib/cc.py Outdated Show resolved Hide resolved
python/tvm/contrib/ndk.py Outdated Show resolved Hide resolved
python/tvm/contrib/cc.py Outdated Show resolved Hide resolved
src/codegen/llvm/llvm_module.cc Outdated Show resolved Hide resolved
src/codegen/llvm/codegen_blob.cc Outdated Show resolved Hide resolved
src/codegen/llvm/codegen_blob.h Outdated Show resolved Hide resolved
@FrozenGene FrozenGene force-pushed the codegen_blob branch 3 times, most recently from dc08bfa to 0da6116 Compare January 10, 2020 06:31
python/tvm/contrib/cc.py Outdated Show resolved Hide resolved
python/tvm/contrib/cc.py Outdated Show resolved Hide resolved
@zhiics
Copy link
Member

zhiics commented Jan 10, 2020

Thanks @FrozenGene @tqchen

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants