
[WIP] Fix .view().detach() not handled correctly by AOT autograd #1661

Closed

Conversation

sangongs
Contributor

This is to fix issue pytorch/pytorch#93677

Depends on pytorch/pytorch#86838

@Chillee
Contributor

Chillee commented Oct 14, 2022

@sangongs I don't think this is the right solution. Inductor should not be responsible for handling things like autograd.

@sangongs
Contributor Author

@Chillee Thanks for taking a look at this.

Inductor should not be responsible for handling things like autograd.

Agreed.

This is just a tentative workaround for an AOT autograd issue. AOT autograd currently does not handle detach() correctly. According to @bdhirsh:

it's because we're compiling the whole thing (including the detach() call) into an autograd.Function, and autograd.Function will unconditionally mark all of its forward outputs as requiring gradients.
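
For reference, here is a minimal standalone sketch of that behavior (class and variable names are made up): a detach() done inside an autograd.Function's forward is invisible to callers, because the wrapper re-marks every output as requiring grad.

```python
import torch

class CompiledChunk(torch.autograd.Function):
    # Stand-in for the graph that AOT autograd compiles, including the detach().
    @staticmethod
    def forward(ctx, x):
        return (x * 2).detach()

    @staticmethod
    def backward(ctx, grad_out):
        return 2 * grad_out

x = torch.randn(4, requires_grad=True)
out = CompiledChunk.apply(x)
# In eager mode, (x * 2).detach() has requires_grad=False, but here the
# autograd.Function wrapper marks the output as requiring grad.
print(out.requires_grad)
```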

@bdhirsh made a fix:

We can use autograd.function's mark-nondifferentiable API, to (statically) mark those outputs as not requiring gradients

But the fix does not work as expected if the output is a view, because mark_non_differentiable has no effect in that case:
https://github.com/pytorch/pytorch/blob/ae45dab57e22e3d04516e7dd81ef8dbefd51bfe3/torch/csrc/autograd/custom_function.cpp#L290-L299
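
To illustrate that caveat (again with made-up names, assuming the behavior described above): mark_non_differentiable is respected for a plain output, but reportedly not when the output is a view.

```python
import torch

class NonViewOutput(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        out = x * 2                      # plain (non-view) output
        ctx.mark_non_differentiable(out)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out

class ViewOutput(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        out = (x * 2).view(-1)           # output is a view
        ctx.mark_non_differentiable(out)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out

x = torch.randn(4, requires_grad=True)
print(NonViewOutput.apply(x).requires_grad)  # False: the mark is respected
print(ViewOutput.apply(x).requires_grad)     # reportedly True: the mark is ignored for views
```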

See the AOT autograd issue for more details: pytorch/functorch#1052

In my view, there are four options to fix this:

  1. Leave autograd.Function as it is and work around the is_view() problem in Inductor, as this PR does.
  2. Work around the problem in AOT autograd. We are discussing this in the AOT autograd issue, but @bdhirsh apparently has concerns.
  3. Work around the problem in Dynamo. This may take a lot of effort.
  4. Update autograd.Function so that mark_non_differentiable() takes effect when an output is a view. But I am not sure whether this is viable.

@Chillee Do you have any suggestions?

@bdhirsh
Contributor

bdhirsh commented Oct 14, 2022

Oh, to be clear, my vote is probably still for (2): manually add some extra .detach() calls in AOTAutograd, so Inductor isn't forced to worry about requires-grad-ness (thread)
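
Roughly, the idea would be something like the sketch below (purely illustrative; the helper and its arguments are hypothetical, not existing AOTAutograd code): record which outputs required grad in the eager program and re-detach the rest after the compiled autograd.Function runs.

```python
from typing import Callable, Sequence

import torch

def detach_non_grad_outputs(
    compiled_fn: Callable[..., Sequence[torch.Tensor]],
    output_requires_grad: Sequence[bool],
):
    """Hypothetical wrapper: re-detach the outputs that did not require grad
    in the original eager program, so the caller never sees the spurious
    requires_grad=True coming from the compiled autograd.Function."""
    def wrapper(*args):
        outs = compiled_fn(*args)
        return [
            out if needs_grad else out.detach()
            for out, needs_grad in zip(outs, output_requires_grad)
        ]
    return wrapper
```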

@jansel
Contributor

jansel commented Oct 15, 2022

We have migrated torchdynamo to torch._dynamo and will use the pytorch/pytorch repo for future development. Please resubmit this PR to https://github.com/pytorch/pytorch/

More details and instructions to port this PR over can be found in #1588

jansel closed this Oct 15, 2022