[TOPI][x86] Introduce schedule_injective_from_existing and unify external schedules for all targets #3983
topi/python/topi/generic/extern.py
Outdated
for out in outs:
    if isinstance(out.op, tvm.tensor.ExternOp):
        continue
    _schedule_injective(out.op, s)
@vinx13, I have moved this logic from cuda/extern.py to generic/extern.py. Will schedule_injective still call the correct overridden function per target? Or will this just call the default one?
If it just calls the default one, it seems like I have to add a new file, x86/extern.py.
Update: it seems like it does call the right one.
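For readers following along, the dispatch behavior asked about here can be illustrated with a toy model. This is only a sketch: the `GenericFunc` class, the `target` context manager, and the `_current_target` stack below are hypothetical stand-ins written for this example, not TVM's actual implementation. It shows why generic code that calls through a generic function picks up the per-target override at call time.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the active target stack (not TVM's real mechanism).
_current_target = ["llvm"]


class GenericFunc:
    """Toy generic function: one default body plus per-target overrides."""

    def __init__(self, default):
        self._default = default
        self._overrides = {}

    def register(self, target_name):
        def decorator(fn):
            self._overrides[target_name] = fn
            return fn
        return decorator

    def __call__(self, *args):
        # The implementation is looked up when the call happens,
        # using whichever target is active at that moment.
        impl = self._overrides.get(_current_target[-1], self._default)
        return impl(*args)


@contextmanager
def target(name):
    _current_target.append(name)
    try:
        yield
    finally:
        _current_target.pop()


@GenericFunc
def schedule_injective(out):
    return "default injective schedule"


@schedule_injective.register("cuda")
def _schedule_injective_cuda(out):
    return "cuda injective schedule"


def schedule_extern(outs):
    # Target-independent code: it never names a target itself.
    return [schedule_injective(out) for out in outs]
```

Under these assumptions, calling `schedule_extern` inside `with target("cuda"):` reaches the cuda override, and calling it outside reaches the default body.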
@soiferj if you change the default behavior of schedule_extern in topi/python/topi/generic/extern.py, you should update topi/include/topi/generic/extern.h too. I would rather move your implementation of schedule_extern in x86/extern.h to generic/extern.h, and use it from both python and x86 cpp.
Sure, I can work on that. Can you give me a pointer for how to call the cpp schedule from Python?
The following might be useful. Look for cpp.generic or cpp.nn
Sorry for all of the questions, I'm new to this code
I see, you are right about the lack of target dispatch mechanism in our c++ topi (introduced for python in #556). It seems such dynamism is not possible in c++ topi at the moment. I can think of two options:
Personally, I feel that the more logic we put in C++, the better. That way, we only have to implement things once, and users can use the Python or C++ API with the exact same features. How about this: I'll implement this change like your first suggestion (duplicate logic in I'll also post an RFC on the forum about adding a target dispatch API for C++. In fact, the schedules are already registered as generic functions in C++ here. If we have an API like What do you think?
@soiferj I agree that putting more things to C++ side is better. Lack of target dispatch in C++ can be confusing to users as they only need to call
@soiferj Great! I thought implementing the target dispatch in C++ would not be straightforward, but if you are willing to do it I'm happy to help. You can continue with this PR as you see fit and I'll merge this ASAP.
topi/include/topi/generic/extern.h
Outdated
    continue;
  }
  Array<Tensor> new_outs = { out };
  tvm::GenericFunc::Get("schedule_injective")(new_outs);
@masahi or @vinx13, this call seems to work as expected. It calls the correct schedule_injective function for the current target. However, the function that it calls creates its own schedule (see cuda/injective.py). This causes failures in the unit tests. This is probably why the previous implementation used a helper function, _schedule_injective. Can you think of any way to fix this when calling from C++?
What's the problem with cuda/injective.py? Is overriding native generic from Python side a problem in your case?
The specific error is "Direct host side access to device memory is detected in fused_nn_conv2d_multiply_add_nn_relu_1. Did you forget to bind?". This is being hit in tutorials/frontend/using_external_lib.py
I think it has to do with the fact that schedule_injective creates its own schedule, but I'm not totally sure.
Update: I just tried changing override_native_generic_func to generic_func and it has the same issue.
Thanks a lot for all of the help, btw. I'm just trying to avoid duplicating code :( I think if we solve this, we can open the door to cleaner refactoring.
You are right, we can't create a new schedule using schedule_injective. So the problem is that we don't want to duplicate the helper function in C++, right?
Yes, exactly. Maybe we should create a new generic function? Or have a generic function that is overridden with more arguments? Is that possible?
I don't think generic func can be overridden by number of args. We can register the function as a new generic func
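The distinction being discussed, between a schedule function that creates its own schedule and one that works on a schedule it is handed, can be sketched as follows. This is a toy model under stated assumptions: the `Schedule` class and the `is_extern` predicate are hypothetical and exist only for this illustration; only the function names and the two-argument shape (schedule, output) come from the thread.

```python
class Schedule:
    """Toy schedule: records how each op was scheduled (None = untouched)."""

    def __init__(self, ops):
        self.applied = {op: None for op in ops}


def schedule_injective_from_existing(sch, op):
    # Core injective logic: modifies the schedule it is given, creates nothing.
    sch.applied[op] = "injective"
    return sch


def schedule_injective(outs):
    # Creates a fresh schedule, then delegates to the shared helper.
    sch = Schedule(outs)
    for op in outs:
        schedule_injective_from_existing(sch, op)
    return sch


def schedule_extern(outs, is_extern):
    # Owns one schedule for all outputs; extern ops are left untouched,
    # everything else reuses the injective logic on the same schedule.
    sch = Schedule(outs)
    for op in outs:
        if is_extern(op):
            continue
        schedule_injective_from_existing(sch, op)
    return sch
```

In this sketch, calling `schedule_injective` from inside `schedule_extern` would return a second, unrelated `Schedule` object, which is the situation the comments above describe.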
I just added a new generic function I see a lot of areas to refactor - in another change I think I'll move all of the
Yeah, I am a little confused by that too. I tried to mirror what In my testing, yes, the C++ code calls back into Python. I can try to remove target from the C++ interface and fix that unit test. Is that alright? In the future, we should also remove target from the schedule function signature.
Yes, I prefer fixing the unit test and cleaning up the interface, if you can.
Thanks a lot. Sorry for the long back-and-forth. I’m still pretty new to this codebase and am trying to make sure I do this the right way.
This is a CPP unit test - how can I set the current target?
@masahi or @vinx13 I am testing this new change, and while my code is being called as expected, I'm still confused as to whether this actually works. For example, I am testing the last few ops of BERT base, Would one of you be able to look at this script and verify whether the behavior is expected? Edit: That being said, the performance issue does seem to be resolved, I'm just a little confused as to why it works :)
@soiferj Only the final output of the fused group will be passed to the schedule function, and that's why we need to use
I see. But if you look at line 140, extern schedule returns immediately. In my example, the fused op is never fully traversed since it calls It almost seems like the callback should be returning
For your case, I'm expecting that
Thanks, it seems you're right. AutoInlineInjective seems to do the right thing here. Also, regarding the CPP unit test, the current target can be gotten by using
Let's fix the test. As schedules are target-specific, we expect target to be set before calling it. We can the target (
Ok got the tests working :)
topi/src/topi.cc
Outdated
 */
inline PackedFunc WrapScheduleFromExisting(FTVMScheduleFromExistingBuilder builder) {
  return PackedFunc([builder](TVMArgs args, TVMRetValue* ret) {
    *ret = builder(args[1], args[2]);
Shouldn't it be args[0] and args[1]?
You're right, fixed
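The indexing issue raised in this review comment can be reproduced in a few lines. This is a Python analogy, not the C++ code from topi.cc: `wrap_schedule_from_existing` and `wrap_buggy` are hypothetical names, and the wrapper is assumed to receive exactly two positional arguments (the schedule, then the output).

```python
def wrap_schedule_from_existing(builder):
    """Adapt a two-argument builder to a packed-style call taking *args."""
    def packed(*args):
        # Positional arguments are zero-indexed: schedule first, output second.
        return builder(args[0], args[1])
    return packed


def wrap_buggy(builder):
    """Same wrapper with the off-by-one indices from the outdated snippet."""
    def packed(*args):
        # With two arguments, args[2] does not exist.
        return builder(args[1], args[2])
    return packed
```

With two arguments, the first wrapper forwards both in order, while the off-by-one variant raises `IndexError` in Python.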
Great @soiferj I will merge after CI.
Awesome, thanks again for all of your help!
…rnal schedules for all targets (apache#3983)
* Fix extern schedule for x86
* Register x86::schedule_extern
* Fix
* Fix
* Replace extern.py with extern.h
* Introduce new generic function schedule_injective_from_existing
* Fix
* Fix
* Add back to C++
* Fix style
* Injective schedule calls local schedule_injective_from_existing
* Fix
* Remove target arg from schedule_injective_from_existing
* Fix docs
* Try to fix unit test
* Fix test
* Fix other tests
* Fix bug
Currently x86 schedule_extern does not work properly, and will treat extern ops as injective ops. This PR introduces a new generic function, schedule_injective_from_existing, that has the core logic of schedule_injective for each target. schedule_extern then calls this method. This ends up fixing schedule_extern for many targets besides just x86.
Related to the discussion here.
@masahi @vinx13 would you be able to take a look?