[CodeGen] Generate blob use LLVM directly #4657

FrozenGene · 2020-01-08T12:15:26Z

This is one prior work of Module based Model Runtime Interface RFC. In this RFC, we want to serialize params (weight) into shared library directly.

However, previous way of serializing the blob is to use PackImportToC, i.e. we will serialize the blob into char array __tvm_dev_blob, which stored the serialized result (0x...0x...), then we write this into dev.cc and pass to the compiler compiling. If we don't serialize params, it is ok. However, when we serialize params, the __tvm_dev_blob and dev.cc will become very large (resnet18 workload of dev.cc will be 380M, without serializing weight is 2.6M). The compiling time will increase very much (Even 10x slower).

According to the investigation, the bottleneck is the parsing time (see GCC time report).

So we decide to avoid the C++ compiler (i.e. avoid the parsing time) and generate LLVM module directly. After testing, even we we serialize the params into shared library, the compiling time of generating blob is very fast. The compiling time of resnet18 on one cloud machine I tested before could boost from 128.46s to 13.28s. see previous test

After this work, I find it also have some a little improvement of compiling time compared previous way testing on my local machine (Intel i7-7700 CPU @ 3.60GHz).

Workload is restnet18 on CUDA. Run 10 times and get the average

PackImportToC	PackImportToLLVM
8.586s	8.082s

@tqchen @zhiics @yzhliu @icemelon9

src/codegen/codegen.cc

src/codegen/build_module.cc

tqchen · 2020-01-08T17:00:28Z

cc @zhiics @yzhliu @ajtulloch

zhiics

Halfway done. Will spend some time to review the llvm part. The major concern I have so far is that I think we probably don't need to have the setter and getter in Target as we only use each of them once and we know where we need it.

include/tvm/build_module.h

include/tvm/codegen.h

python/tvm/build_module.py

python/tvm/module.py

src/codegen/build_module.cc

src/codegen/llvm/codegen_blob.cc

FrozenGene · 2020-01-08T18:37:35Z

@tqchen @zhiics About the LLVM target part, I also think current way maybe is not good. I also think of it and cost my some time. So I want to discuss it too.

Firstly Let me describe why we need llvm target host.

When we use PackImportToLLVM, we will need to use LLVM to create LLVM module.

However, we must need one target machine, this is answered to @tqchen 's question. We can not create one generic target. Because we have platform's specific feature to handle. See:https://github.com/apache/incubator-tvm/blob/4f1772c0598ae708477f457a993f5be52fa71eb9/src/codegen/llvm/codegen_blob.cc#L107 and https://github.com/apache/incubator-tvm/blob/4f1772c0598ae708477f457a993f5be52fa71eb9/src/codegen/llvm/codegen_blob.cc#L167. Another thing is if we leave the target is empty, i.e. no module->setTargetTriple(target_triple.str()); On llvm 6.0 of Mac will report problem : assert error Target-incompatible DataLayout.

More import thing is we need to consider the target host is not x86 cpu. For example, target host is llvm -target aarch64, if we leave it empty and build it on x86 server, we will generate devc.o into x86 format, but in fact we want aarch64 of devc.o. So in the codegen_blob part, we should know the correct llvm target host to generate correct file.

Current way is a little ugly I think too. Current way will create target host based on LLVM when we have detected target host finally (because tvm.build / relay.build's api could leave target_host be None, we have some logic to handle this situation). One simple way of course is let users specific the target host of llvm representation, but I use this ugly way is just to detect it automatically and don't want to let users do. I also want to listen to your advices. Thanks in advance.

tqchen · 2020-01-08T19:27:11Z

If we indeed needs to pass target triple, perhaps the best way to do so is to pass that as an argument of PackImportToLLVM

FrozenGene · 2020-01-08T19:34:34Z

If we indeed needs to pass target triple, perhaps the best way to do so is to pass that as an argument of PackImportToLLVM

@tqchen So, your opinion is let the users specify. The API will become this:

  def export_library(self,
                      file_name,
                      fcompile=None,
                      llvm_target=None,
                      **kwargs):
   if enabled("llvm"):
      path_obj = temp.relpath("devc.o")
      m = _PackImportsToLLVM(self, is_system_lib,  llvm_target)

If users doesn't set llvm_target, we use the default value. (system target triple).

if users want to compile into another os / another cpu, they have to do this:
module.export_library(..., llvm_target="llvm -target aarch64-linux-gnu-none") now.

Does it make sense to you?

tqchen · 2020-01-08T20:00:27Z

Hmm, i meant to have someway to automatically discover the triple and pass around, let us think a bit more about it. For example, if there is already an LLVM module in the modules, then we can get that from that module.

tqchen · 2020-01-08T20:10:10Z

Some ideas about automatic detection, in the order of things that can be tried

If the module lists already contains an LLVM module, we can get triples from those modules
The property could be part of fcompile.get_target_triple()
- Use hasattr to detect if fcompile contains the property
- The function can return None, which means unable to detect
- Note that for gcc and clang gcc -dumpmachine will give you the triple
- Add support for this function in create_shared(using gcc -dumpmachine)
If we cannot detect using the above approach, fallback to PackToC, note that this is super unlikely

FrozenGene · 2020-01-09T03:04:07Z

Some ideas about automatic detection, in the order of things that can be tried

If the module lists already contains an LLVM module, we can get triples from those modules

The property could be part of fcompile.get_target_triple()

Use hasattr to detect if fcompile contains the property

The function can return None, which means unable to detect

Note that for gcc and clang gcc -dumpmachine will give you the triple

Add support for this function in create_shared(using gcc -dumpmachine)

If we cannot detect using the above approach, fallback to PackToC, note that this is super unlikely

Good idea. One quick question, how about Windows's compiler cl? The cl (https://docs.microsoft.com/en-us/cpp/build/reference/compiler-options-listed-by-category?view=vs-2019) seems doesn't have option to show arch information of its self.

tqchen · 2020-01-09T03:55:28Z

if cl is used, I think we can safely assume the target is windows, we only need to know whether if it is win32 or win64, i am not that familar with cl to know for sure, but i guess in that case we can pass and use LLVM module detection, or fallback to c

FrozenGene · 2020-01-09T04:01:00Z

if cl is used, I think we can safely assume the target is windows, we only need to know whether if it is win32 or win64, i am not that familar with cl to know for sure, but i guess in that case we can pass and use LLVM module detection, or fallback to c

Yes. We could assume the target is Windows for sure. However, i think besides we need to know win32 / win64, one more thing I am a little worried. Windows could run ARM CPU (cl.exe could run Windows 10 on ARM? I can not make sure yet too).

FrozenGene · 2020-01-09T04:07:30Z

if cl is used, I think we can safely assume the target is windows, we only need to know whether if it is win32 or win64, i am not that familar with cl to know for sure, but i guess in that case we can pass and use LLVM module detection, or fallback to c

Yes. We could assume the target is Windows for sure. However, i think besides we need to know win32 / win64, one more thing I am a little worried. Windows could run ARM CPU (cl.exe could run Windows 10 on ARM? I can not make sure yet too).

Ah...One tricky way I just see on the YoutuBe...

Maybe we could use DUMPBIN of cl.exe to make sure it is ARM or x86...But, it is so tricky, I personally think we maybe could fallback to c on Windows if we have fallen into fcompile.get_target_triple(). Because I think

If the module lists already contains an LLVM module, we can get triples from those modules

could solve most of our cases.

How about your opinion? @tqchen

tqchen · 2020-01-09T04:24:20Z

sounds good @FrozenGene i agree with all your points

FrozenGene · 2020-01-09T06:54:56Z

Now, we could have new elegant and simpler automatic detection of LLVM target triple (thanks @tqchen 's good idea). @tqchen @zhiics could you help to review new code again? Thanks!

include/tvm/codegen.h

python/tvm/contrib/cc.py

python/tvm/contrib/ndk.py

python/tvm/contrib/cc.py

src/codegen/llvm/llvm_module.cc

src/codegen/llvm/codegen_blob.cc

src/codegen/llvm/codegen_blob.h

python/tvm/contrib/cc.py

zhiics · 2020-01-10T17:00:53Z

Thanks @FrozenGene @tqchen

FrozenGene force-pushed the codegen_blob branch 2 times, most recently from 92b9d98 to 4f1772c Compare January 8, 2020 14:03

tqchen requested changes Jan 8, 2020

View reviewed changes

src/codegen/codegen.cc Outdated Show resolved Hide resolved

src/codegen/build_module.cc Outdated Show resolved Hide resolved

tqchen added the status: need review label Jan 8, 2020

zhiics requested changes Jan 8, 2020

View reviewed changes

FrozenGene force-pushed the codegen_blob branch 4 times, most recently from 8fed25b to c24fddf Compare January 9, 2020 06:53

tqchen requested changes Jan 9, 2020

View reviewed changes

zhiics reviewed Jan 9, 2020

View reviewed changes

src/codegen/llvm/codegen_blob.cc Outdated Show resolved Hide resolved

src/codegen/llvm/codegen_blob.h Outdated Show resolved Hide resolved

FrozenGene force-pushed the codegen_blob branch 3 times, most recently from dc08bfa to 0da6116 Compare January 10, 2020 06:31

tqchen requested changes Jan 10, 2020

View reviewed changes

python/tvm/contrib/cc.py Outdated Show resolved Hide resolved

python/tvm/contrib/cc.py Outdated Show resolved Hide resolved

[CodeGen] Generate blob use LLVM directly

f354a00

FrozenGene force-pushed the codegen_blob branch from 0da6116 to f354a00 Compare January 10, 2020 06:47

FrozenGene mentioned this pull request Jan 10, 2020

[Doc] Introduction to module serialization #4564

Merged

tqchen approved these changes Jan 10, 2020

View reviewed changes

tqchen assigned zhiics Jan 10, 2020

zhiics approved these changes Jan 10, 2020

View reviewed changes

zhiics merged commit 54fb55a into apache:master Jan 10, 2020

zhiics added status: accepted and removed status: need review labels Jan 10, 2020

FrozenGene deleted the codegen_blob branch January 13, 2020 03:47

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 26, 2020

[CodeGen] Generate blob use LLVM directly (apache#4657)

135691d

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 28, 2020

[CodeGen] Generate blob use LLVM directly (apache#4657)

d60aaa2

zhiics pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2020

[CodeGen] Generate blob use LLVM directly (apache#4657)

8a8291a

trevor-m mentioned this pull request Jul 8, 2020

32-bit arm-linux-android compilation for OpenCL broken after PR #4657 ([CodeGen] Generate blob use LLVM directly) #6019

Closed

trevor-m mentioned this pull request Jul 27, 2020

Fix -mfloat-abi=soft compilation for ARM with OpenCL target #6150

Merged

ZihengJiang mentioned this pull request Sep 17, 2020

TVM v0.7 Release Note Candidate #6486

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CodeGen] Generate blob use LLVM directly #4657

[CodeGen] Generate blob use LLVM directly #4657

FrozenGene commented Jan 8, 2020

tqchen commented Jan 8, 2020

zhiics left a comment

FrozenGene commented Jan 8, 2020 •

edited

Loading

tqchen commented Jan 8, 2020

FrozenGene commented Jan 8, 2020 •

edited

Loading

tqchen commented Jan 8, 2020

tqchen commented Jan 8, 2020 •

edited

Loading

FrozenGene commented Jan 9, 2020

tqchen commented Jan 9, 2020

FrozenGene commented Jan 9, 2020

FrozenGene commented Jan 9, 2020 •

edited

Loading

tqchen commented Jan 9, 2020

FrozenGene commented Jan 9, 2020

zhiics commented Jan 10, 2020

[CodeGen] Generate blob use LLVM directly #4657

[CodeGen] Generate blob use LLVM directly #4657

Conversation

FrozenGene commented Jan 8, 2020

tqchen commented Jan 8, 2020

zhiics left a comment

Choose a reason for hiding this comment

FrozenGene commented Jan 8, 2020 • edited Loading

tqchen commented Jan 8, 2020

FrozenGene commented Jan 8, 2020 • edited Loading

tqchen commented Jan 8, 2020

tqchen commented Jan 8, 2020 • edited Loading

FrozenGene commented Jan 9, 2020

tqchen commented Jan 9, 2020

FrozenGene commented Jan 9, 2020

FrozenGene commented Jan 9, 2020 • edited Loading

tqchen commented Jan 9, 2020

FrozenGene commented Jan 9, 2020

zhiics commented Jan 10, 2020

FrozenGene commented Jan 8, 2020 •

edited

Loading

FrozenGene commented Jan 8, 2020 •

edited

Loading

tqchen commented Jan 8, 2020 •

edited

Loading

FrozenGene commented Jan 9, 2020 •

edited

Loading