Add initial support for torch.utils.checkpoint #1127

Merged
merged 6 commits into main from functional-autograd-checkpoint on Oct 18, 2024

Conversation

IvanYashchuk
Collaborator

A checkpointed function doesn't save any intermediates from forward to backward. Instead, all required values are recomputed during the backward pass. Because fewer intermediates are saved, peak memory usage usually decreases.

This PR introduces support for recognizing torch.utils.checkpoint.checkpoint calls and inserting a new bound symbol into the initial trace. In the forward-backward generation pass, this bound symbol is then converted into the augmented-forward and backward parts of the computation. This step requires the function argument to thunder.torch.checkpoint to be a Thunder function. Currently, no PyTorch->Thunder conversion is implemented, so this works only for simple functions that are recognized by both Thunder and PyTorch, for example when only tensor methods are used.

The PyTorch function needs to be converted to a Thunder function in Thunder's JIT. Previously we could simply use thunder.preprocess, which is no longer available. When I attempted to implement redispatching/reinterpretation of PyTorch functions using general_thunder_jit, I hit the following bug: #1126.

Example:

import thunder
import torch

def f(x):
    return torch.utils.checkpoint.checkpoint(lambda x: x.sin().cos().exp(), x)

jf = thunder.jit(f)
x = torch.randn(3, 4, device="cuda", requires_grad=True)
jf(x).backward(x)
print(thunder.last_traces(jf)[-1])
print(thunder.last_backward_traces(jf)[-1])

Forward execution trace:

def augmented_forward_fn(x):
  # x: "cuda:0 f32[3, 4]"
  [t2] = nvFusion0(x)
    # t0 = prims.sin(x)  # t0: "cuda:0 f32[3, 4]"
    # t1 = prims.cos(t0)  # t1: "cuda:0 f32[3, 4]"
    # t2 = prims.exp(t1)  # t2: "cuda:0 f32[3, 4]"
  return {'output': t2, 'flat_args': [x], 'flat_output': (t2,)}, ((x,), ())

Backward execution trace:

def backward_fn(saved_for_backward, cotangents):
  # saved_for_backward: "Collection"
  # cotangents: "Collection"
  C0, _, = saved_for_backward
  clear_mutable_collection(saved_for_backward)
  del saved_for_backward
  t3, = cotangents
  clear_mutable_collection(cotangents)
  del cotangents
  x, = C0
  clear_mutable_collection(C0)
  del C0
  [t12] = nvFusion0(x, t3)
    # t4 = prims.sin(x)  # t4: "cuda:0 f32[3, 4]"
    # t11 = prims.cos(x)  # t11: "cuda:0 f32[3, 4]"
    # t5 = prims.cos(t4)  # t5: "cuda:0 f32[3, 4]"
    # t8 = prims.sin(t4)  # t8: "cuda:0 f32[3, 4]"
    # t6 = prims.exp(t5)  # t6: "cuda:0 f32[3, 4]"
    # t7 = prims.mul(t3, t6)  # t7: "cuda:0 f32[3, 4]"
    # t9 = prims.neg(t8)  # t9: "cuda:0 f32[3, 4]"
    # t10 = prims.mul(t7, t9)  # t10: "cuda:0 f32[3, 4]"
    # t12 = prims.mul(t10, t11)  # t12: "cuda:0 f32[3, 4]"
  del x, t3
  return (t12,)

Collaborator

@mruberry mruberry left a comment

Cool! @t-vi, do you want to take a look?

t-vi
t-vi previously requested changes Sep 9, 2024
Collaborator

@t-vi t-vi left a comment

I'm not convinced of the design.
Why would we not just let the function be any function and have a state "currently checkpointing" that informs Thunder to add a tag to the proxies that are generated during the checkpointing instead?
We would need to clear that tag on the outputs, but that would be easier than reentrant jit and higher order functions.
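For illustration only, here is a minimal sketch of that tag-based idea; the names (CHECKPOINTING, TAG_RECOMPUTE, make_proxy) are hypothetical and are not existing Thunder APIs:

from contextlib import contextmanager

CHECKPOINTING = False  # the "currently checkpointing" interpreter state
TAG_RECOMPUTE = "recompute_in_backward"

@contextmanager
def checkpointing():
    # Entered while interpreting the body of a checkpointed function.
    global CHECKPOINTING
    prev, CHECKPOINTING = CHECKPOINTING, True
    try:
        yield
    finally:
        CHECKPOINTING = prev

def make_proxy(value, tags=()):
    # While the flag is set, every newly created proxy gets the recompute tag;
    # the tag would then be cleared on the outputs of the checkpointed region.
    tags = set(tags)
    if CHECKPOINTING:
        tags.add(TAG_RECOMPUTE)
    return {"value": value, "tags": tags}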

@IvanYashchuk
Collaborator Author

Why would we not just let the function be any function and have a state "currently checkpointing" that informs Thunder to add a tag to the proxies that are generated during the checkpointing instead? We would need to clear that tag on the outputs, but that would be easier than reentrant jit.

Do you have ideas about how the "currently checkpointing" approach would generalize to supporting, for example, torch.cond? Please continue in the issue #1134.

@t-vi
Collaborator

t-vi commented Sep 10, 2024

I don't have immediate ideas, but I don't see that we should be having higher order functions right now.
If anything it's the wrong sequencing.

@syed-ahmed
Collaborator

@IvanYashchuk You might wanna check out selective activation checkpointing available in PyTorch nightlies: https://pytorch.org/docs/main/checkpoint.html#torch.utils.checkpoint.create_selective_checkpoint_contexts to specify which activations to save for backward.

@IvanYashchuk
Collaborator Author

@IvanYashchuk You might wanna check out selective activation checkpointing available in PyTorch nightlies: https://pytorch.org/docs/main/checkpoint.html#torch.utils.checkpoint.create_selective_checkpoint_contexts to specify which activations to save for backward.

Awesome, thanks for the link, Syed! Not a fan of ATen ops leaking into the PyTorch Python interface with torch.matmul becoming torch.ops.aten.mm.default, but I will check out how it could be recognized by Thunder.
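
A minimal usage sketch of the linked API on the PyTorch side, following the linked documentation and assuming a PyTorch build that ships create_selective_checkpoint_contexts; the function f and the tensors below are made up for illustration:

import functools

import torch
from torch.utils.checkpoint import checkpoint, create_selective_checkpoint_contexts

# Ops listed here are saved for backward; everything else inside the
# checkpointed region is recomputed. Note the ATen-level naming
# (torch.ops.aten.mm.default) rather than torch.matmul.
ops_to_save = [torch.ops.aten.mm.default]
context_fn = functools.partial(create_selective_checkpoint_contexts, ops_to_save)

def f(x, w):
    return torch.matmul(torch.sin(x), w).cos().sum()

x = torch.randn(4, 4, requires_grad=True)
w = torch.randn(4, 4, requires_grad=True)
out = checkpoint(f, x, w, use_reentrant=False, context_fn=context_fn)
out.backward()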

@IvanYashchuk IvanYashchuk merged commit 3f3d46a into main Oct 18, 2024
37 checks passed
@IvanYashchuk IvanYashchuk deleted the functional-autograd-checkpoint branch October 18, 2024 10:30