
[Runtime][WIP] Add prototype Relay AoT compiler directly into TVM #6219

Closed
wants to merge 4 commits

Conversation

slyubomirsky
Contributor

In reference to this RFC, this PR is intended to incorporate the existing external Relay ahead-of-time (AoT) compiler, which was primarily written by @MarisaKirisame, into TVM. To start, I am simply including most of the files from the AoT compiler repo nearly verbatim, though the interfaces should be changed to better adhere to the high-level vision for TVM (especially since the initial code comes from a research prototype).

The prototype AoT compiler operates by translating Relay ASTs directly into C++ code and using TVM's JIT compiler to register all primitive functions (i.e., the C++ code calls into TVM's operator cache to handle operators). This process produces a C++ file and requires calling the system's C++ compiler (which the prototype assumes to be clang).

I would be curious to hear others' thoughts (e.g., @jroesch @weberlo @tqchen) about how this compiler can be better integrated into TVM's systems. Based on the discussion in the RFC, it sounds like the interface should be made to take an IRModule and produce a runtime module that can call the compiled functions. Ideally, the system could be made modular to allow for target languages other than C++.

@tqchen
Member

tqchen commented Aug 6, 2020

Thanks @slyubomirsky. Some high-level comments.

First of all, AOT should not be part of the runtime. The runtime contains the minimum set of things we need to execute the program, and most of the AOT logic is actually part of the target translation phase. Taking the current organization of Relay into account, it should be moved to relay/backend (eventually a better place might be target).

Of course, there are additional runtime features and wrappers needed to run a program compiled from AOT. From the interface point of view, we should remove this wrapper code and rely completely on the PackedFunc and runtime.Module interfaces.

So the AOT compilation should take in an IRModule and output a runtime.Module, which contains the functions necessary to run the generated program. Ideally, the runtime.Module should expose an interface similar to other compiled programs, such as the VM and the graph runtime.
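To illustrate the shape of that interface, here is a minimal mock in Python. These class and function names are illustrative stand-ins, not actual TVM API: the point is only that compilation consumes a module of functions and returns a runtime.Module-like object whose entry points are fetched by name and invoked like PackedFuncs, mirroring how the VM and graph runtime modules are used.

```python
class AotModuleMock:
    """Stand-in for runtime.Module: holds named entry points that are
    looked up by name and then called like PackedFuncs."""

    def __init__(self, functions):
        self._functions = functions  # name -> callable

    def get_function(self, name):
        return self._functions[name]


def aot_compile_mock(ir_functions):
    """Stand-in for the proposed aot_compile(IRModule) -> runtime.Module.
    A real implementation would lower and compile; the mock just wraps."""
    return AotModuleMock(dict(ir_functions))


mod = aot_compile_mock({"main": lambda x: x + 1})
entry = mod.get_function("main")
```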

@tqchen tqchen self-assigned this Aug 6, 2020
@slyubomirsky
Contributor Author

Thank you for the suggestions! I will move the files to backend. I will see about getting PackedFuncs for operators directly instead of using the JIT to register them (this might fix some of the other weird bugs we've seen in the research prototype).

@slyubomirsky
Contributor Author

slyubomirsky commented Aug 7, 2020

Hm, if the CI fails because TVM_HOME can't be assumed to be defined, is there any way for the produced C++ files to refer to TVM data structure definitions like NDArray? I will see if there is a way to reference the compiled TVM .so file from aot.py

edit: I guess I can use find_include_path() and find_lib_path() from libinfo
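A sketch of how that fallback might look, assuming the usual tvm Python package layout; the helper name `find_tvm_paths` is hypothetical:

```python
import os


def find_tvm_paths():
    """Hypothetical helper: locate TVM's include and library directories,
    preferring TVM_HOME but falling back to the installed tvm package's
    libinfo helpers when the environment variable is unset."""
    tvm_home = os.environ.get("TVM_HOME")
    if tvm_home:
        return [os.path.join(tvm_home, "include")], os.path.join(tvm_home, "build")
    # Fall back to the tvm Python package's own path discovery.
    from tvm._ffi import libinfo
    include_paths = libinfo.find_include_path()
    lib_dir = os.path.dirname(libinfo.find_lib_path()[0])
    return include_paths, lib_dir
```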

if lib_path is None:
lib_path = os.curdir

debug_source_path = os.path.join(lib_path, 'source.cc')
Contributor

can you put this functionality behind a debug flag?

Contributor Author

Will do

with open(debug_source_path, 'w') as source_file:
source_file.write(source)

# with tempfile.TmporaryDirectory() as tmpdir:
Contributor

remove

@manupak
Contributor

manupak commented Aug 11, 2020

@slyubomirsky Thanks for the work! -- Some very high-level comments.

IIUC, this is bypassing the TIR lowering as it stands today, and thus possibly losing the benefits of TIR scheduling-based optimizations in the AoT compilation path. I just wanted to know what the roadmap looks like if we are to re-target the AoT codegen at the TIR level. Would it be incremental work on top of this, or would it require a complete rewrite?

@MarisaKirisame
Contributor

@manupa-arm the primitive functions are still lowered to TIR; we only compile the Relay fragments to C++. This is in accordance with how Relay has worked for the interpreter/VM: Relay fragments are handled separately, while primitive functions are lowered to TIR and handled by TVM.

If you are talking about lowering everything to TIR, the biggest problem is the design implication: it would make TIR less like Fortran and much more like SML.

@manupak
Contributor

manupak commented Aug 11, 2020

@MarisaKirisame , thanks for the clarification!

Contributor

@manupak manupak left a comment

Some minor comments, see if you agree.

return res if not record_time else (res, end - begin)
return _wrapper

def compile_prog(func, mod, ctx, tgt, name='default', record_time=False):
Contributor

Would it be a good idea to make func the "main" inside mod? That way we would only need to pass a mod containing a "main".

Contributor Author

Yes, this was definitely a flaw in the research prototype that we never took the time to correct. I agree that using the main function would be a much more sensible convention.
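A minimal sketch of that convention, using a stand-in class since this is illustrative rather than TVM's actual IRModule API:

```python
class ModuleMock:
    """Stand-in for tvm.IRModule, just enough to show the 'main' convention."""

    def __init__(self):
        self._functions = {}

    def __setitem__(self, name, func):
        self._functions[name] = func

    def __getitem__(self, name):
        return self._functions[name]


def compile_prog(mod, ctx=None, tgt=None):
    # The entry point is found by convention instead of being passed
    # alongside the module as a separate func argument.
    main = mod["main"]
    return main  # a real implementation would compile this, not return it


mod = ModuleMock()
mod["main"] = lambda x: x * 2
entry = compile_prog(mod)
```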

f"-L{TVM_PATH}/build"
]

if system == 'Darwin':
Contributor

The compiler config could be a class variable -- a dictionary, to be precise.
Maybe look up the flags there and refactor the rest, as they are mostly the same.
A comment explaining the special-cased flags for "Darwin" could also be useful.

Contributor Author

Yes, I think this should be written in a more maintainable manner. I will see if it can be made to programmatically match up with TVM's own C++ build configuration, for example.
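One way to sketch that refactor. The flag values below are assumptions based on the snippet under review, not verified against the prototype; the Darwin flags let undefined symbols resolve at runtime against the already-loaded libtvm:

```python
import os

TVM_PATH = os.environ.get("TVM_HOME", ".")

# Per-platform linker flags looked up from a single table instead of
# branching inline on os.uname()[0].
PLATFORM_FLAGS = {
    "Darwin": ["-shared", "-undefined", "dynamic_lookup"],
    "Linux": ["-shared", "-fPIC"],
}


def linker_flags(system):
    """Combine the platform-specific flags with the flags shared by all
    platforms (here, just the TVM library search path)."""
    common = [f"-L{TVM_PATH}/build"]
    return PLATFORM_FLAGS[system] + common
```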

_LIB_COUNTER += 1
return lib_name, packed_name

def _mk_wrapper(func, ctx, constants, record_time):
Contributor

@manupak manupak Aug 12, 2020

This wrapper seems to enable runtime performance measurement; maybe add a comment describing the functionality.
Also, it would be better to make record_time default to False.
[Suggestion] Make this a decorator, remove the wrapper from the implementation, and use the decorator where the performance measurement is needed.

Contributor Author

I may remove the time recording feature entirely for now, as it was something we included ad hoc for a single experiment
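For reference, the suggested decorator shape might look like this. This is a generic sketch, not the prototype's actual wrapper: timing is opted into at the definition site instead of threading a record_time flag through every call.

```python
import functools
import time


def record_time(func):
    """Decorator returning (result, elapsed_seconds) for each call."""

    @functools.wraps(func)
    def _wrapper(*args, **kwargs):
        begin = time.perf_counter()
        res = func(*args, **kwargs)
        end = time.perf_counter()
        return res, end - begin

    return _wrapper


@record_time
def add(a, b):
    return a + b


result, elapsed = add(2, 3)  # result == 5, elapsed is a small non-negative float
```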

func = compiler.visit(func)
lib_name, packed_name = lib_and_func_name(name)
constants, source_code = to_source.to_source(func, compiler.gv_map, ctx, packed_name)
lib_name = f"librelay_aot_{_LIB_COUNTER}.so"
Contributor

Is it possible for the user to optionally give the paths for the artifacts: the .so and .cc files?

Contributor Author

That would be a good option, I will include that in my next revision.
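A sketch of such an option; the helper name and defaults here are hypothetical:

```python
import os
import tempfile


def artifact_paths(lib_path=None, source_path=None):
    """Hypothetical helper: let the caller choose where the generated .so
    and .cc files are written, defaulting to a fresh temporary directory."""
    if lib_path is None or source_path is None:
        tmpdir = tempfile.mkdtemp(prefix="relay_aot_")
        lib_path = lib_path or os.path.join(tmpdir, "librelay_aot.so")
        source_path = source_path or os.path.join(tmpdir, "source.cc")
    return lib_path, source_path
```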

@manupak
Contributor

manupak commented Aug 12, 2020

[Clarification Question] How is memory allocated for the tensors passed between primitive functions? Please point me to the code if it's there -- it seems I have missed it. Do you do storage_id usage optimizations such as those done in graph plan memory?


must_run_process(["clang-format", "-i", debug_source_path])

system = os.uname()[0]
Contributor

Could we make this an argument to the C++ compile function?
In the future, that would make it easier to enable cross-compilation.

@slyubomirsky
Contributor Author

[Clarification Question] How is memory allocated for the tensors passed between primitive functions? Please point me to the code if it's there -- it seems I have missed it. Do you do storage_id usage optimizations such as those done in graph plan memory?

There are not, to my knowledge (@MarisaKirisame wrote most of the compiler), any memory planning optimizations in the AoT prototype, though it would certainly be a good addition. I never specifically looked into the memory allocation behavior (it was an area we ignored in the prototype altogether), but I believe allocations simply happen when the NDArray constructor is called in the generated code -- I will check that.

@tqchen tqchen changed the base branch from master to main October 11, 2020 18:21
@jroesch
Member

jroesch commented Aug 27, 2021

I am doing triage on old PRs, so I am going to close this; please feel free to follow up if you would still like to merge these changes. Thanks for your contributions!

@jroesch jroesch closed this Aug 27, 2021