Semantic discrepancy on requires_grad after compiling Tensor.detach #1052
Comments
cc @bdhirsh: might be totally off, but is this related to any of the work you were doing to make requires_grad track correctly on proxies?
Hmm, I don't think so. It looks like it's because we're compiling the whole thing (including the detach).

@albanD brought up a good point: in AOT autograd, we already run the forward once to get the expected output(s) to pass to the joint graph for tracing, and at that point we should know the expected requires-gradness of every forward output. We can use autograd.Function's mark-non-differentiable API to (statically) mark those outputs as not requiring gradients, which would fix this problem. That would technically make the resulting `autograd.Function` wrong for inputs whose `requires_grad` pattern differs from what was traced, but dynamo should guard on that and trigger a recompile.
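For illustration, a minimal sketch (not the actual AOT autograd code; the Function and its body below are made up) of how autograd.Function's `ctx.mark_non_differentiable` statically pins an output to `requires_grad=False`:

```python
import torch

class CompiledFwd(torch.autograd.Function):
    # Hypothetical stand-in for the autograd.Function that AOT autograd
    # generates; the real one wraps a compiled forward graph.
    @staticmethod
    def forward(ctx, x):
        out = x + 1
        # Tracing showed this output should not require grad
        # (e.g. the user detached it), so pin that statically.
        ctx.mark_non_differentiable(out)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        # Never invoked: the only output is non-differentiable.
        return grad_out

x = torch.ones(3, requires_grad=True)
y = CompiledFwd.apply(x)
print(y.requires_grad)  # False, even though x requires grad
```

Without the `mark_non_differentiable` call, `y` would come back with `requires_grad=True` because `x` requires grad, which is exactly the discrepancy being reported here.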
Here's a potential fix based on my discussion with Alban: pytorch/pytorch#86838
Thanks to @bdhirsh for the quick fix. However, the following program still fails after cherry-picking the PR:
It looks like that repro runs ok with the aot-eager backend, but not with inductor:
When I print the output of inductor's codegen, I get:
In that output, it looks like inductor is treating the detach as if it weren't there. To be fair, one could argue that inductor shouldn't have to worry about autograd details like requires_grad at all.
Yes, Inductor treats detach that way.
Agreed. It would be good if AOT autograd could handle this automatically, although, in theory, Inductor could generate code to handle it as well.
@albanD Does this sound like correct behavior to you? If it sounds like incorrect behavior, I can dig into it further.
I guess I found the reason why the fix doesn't take effect here. Basically, if the output is a view, then the non-differentiable marking does not stick to it. Maybe a stupid question, but can we just apply a real detach in that case?
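For reference, a small eager-mode sketch of the view behavior in question: `Tensor.detach()` returns an alias of its input (same storage) that nonetheless reports `requires_grad=False`, which is the semantics a compiled view output needs to preserve:

```python
import torch

x = torch.ones(4, requires_grad=True)
y = x.detach()

print(y.data_ptr() == x.data_ptr())  # True: y aliases x's storage
print(y.requires_grad)               # False: cut off from the autograd graph
print(y.grad_fn)                     # None
```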
@sangongs nice catch. I'll defer to Alban, but I think that seems reasonable (it feels bad, because it looks like we're trying to ignore the view relationship between the output and its base).
I came up with a work-around in Inductor to deal with this special case.
This is a good catch. In this case, if the user explicitly states that this is not differentiable, then we should properly detach, as it is a non-differentiable view.
@albanD to confirm: you think that this is something that should be handled transparently by autograd, aka if one of the tensors that the user marks with `mark_non_differentiable` is a (non-differentiable) view, autograd should properly detach it?
Yes
…that don't require grad"

Fixes pytorch/functorch#1052

I got here after some discussion with Alban. Today, if you `aot_function()` trace a program where some of its inputs have `requires_grad=True`, but some outputs are expected to have `requires_grad=False`, we will incorrectly set all outputs to have `requires_grad=True`.

A simple solution is to use autograd.Function's API for marking outputs as non-differentiable, based on what we witnessed when we traced the forward. This will make the `autograd.Function` that we return **wrong** if you created it using inputs that required grad and tried to re-use it with inputs that have a different `requires_grad` field. But as long as we're hiding behind dynamo, which should guard on requires_grad, we'll re-run `aot_function()` and get a new compiled function that does the right thing.
Looks like the issue is still not fixed for backends like inductor that do not handle detach.
Reproduce:
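A minimal sketch of the kind of repro under discussion (hypothetical names, and written against the current `torch.compile` entry point rather than the torchdynamo API available in the 1.13 nightlies; not the exact program from this comment):

```python
import torch

def fn(x):
    # The output is explicitly detached, so eager mode returns a
    # tensor with requires_grad=False.
    return (x * 2).detach()

x = torch.ones(4, requires_grad=True)

eager_out = fn(x)
compiled_out = torch.compile(fn, backend="inductor")(x)

print(eager_out.requires_grad)     # False
print(compiled_out.requires_grad)  # should also be False; the bug reports True
```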
PyTorch version: 1.13.0.dev20220929+cu116
Not sure if this is related to #376.