Add option to skip optim steps for 0 grad params #636
Conversation
- Adds configuration field `optimizer.record_update_metrics`, which defaults to `False`, but when set to `True` will trigger AdamW to collect the step size norm and absolute max for each parameter.
- Changes the behavior of the Lion optimizer to only record the update cosine similarity when `optimizer.record_update_metrics` is `True`, in order to be consistent with the API.
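For reference, here is a minimal sketch of how the recorded metrics can be derived from AdamW's optimizer state. The helper name and signature are illustrative only, not the PR's actual implementation:

```python
import torch

def adamw_step_metrics(
    exp_avg: torch.Tensor,
    exp_avg_sq: torch.Tensor,
    step: int,
    lr: float,
    betas=(0.9, 0.999),
    eps: float = 1e-8,
):
    """Hypothetical helper: reconstruct the AdamW update for one parameter
    from optimizer state and return its L2 norm and absolute max."""
    beta1, beta2 = betas
    bias_correction1 = 1 - beta1 ** step
    bias_correction2 = 1 - beta2 ** step
    denom = exp_avg_sq.sqrt() / bias_correction2 ** 0.5 + eps
    update = -(lr / bias_correction1) * exp_avg / denom
    return torch.linalg.vector_norm(update), update.abs().max()
```

Keeping the default at `False` avoids this extra per-parameter work on the normal training path.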
olmo/optim.py
Outdated
# Perform step weight decay
mask: Optional[torch.Tensor] = None
if self._selective_updates:
    mask = grad != 0
thought: you could instead do `mask = grad != 0 if self._selective_updates else 1`, and assume the mask is always present in subsequent logic.
good call: 1024122
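For illustration, a toy sketch of the suggested pattern (made-up function and parameter names, not the actual `olmo/optim.py` code), where `1` acts as a broadcastable no-op mask:

```python
import torch

def lion_like_step(p, grad, exp_avg, lr, wd, betas=(0.9, 0.99), selective_updates=False):
    """Sketch of a Lion-style update where the mask is always defined."""
    beta1, beta2 = betas
    mask = grad != 0 if selective_updates else 1  # 1 broadcasts as "update everything"

    # Decoupled weight decay, applied only where the mask is on.
    p.data.mul_(1 - mask * (lr * wd))

    # Sign-based update, gated by the same mask.
    update = exp_avg.clone().mul_(beta1).add_(grad, alpha=1 - beta1).sign_()
    p.data.add_(mask * update, alpha=-lr)

    # Momentum update; where the mask is 0 the gradient is 0 too, so the add is a no-op there.
    exp_avg.mul_(1 - (1 - beta2) * mask).add_(grad, alpha=1 - beta2)
```

Because `1` broadcasts against any tensor, none of the downstream lines need an `if mask is not None` branch.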
olmo/optim.py
Outdated
@@ -373,9 +376,12 @@ def __init__(
        super().__init__(params, defaults)
        for group in self.param_groups:
            group["initial_lr"] = group["lr"]
        self._selective_updates = selective_updates
nit: Like in the other PR, this could be moved into the parent class
done: e597e5f
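A rough sketch of what hoisting the flags into the shared base class could look like (illustrative only; the real base class in `olmo/optim.py` is more involved and its signature may differ):

```python
import torch

class Optimizer(torch.optim.Optimizer):
    """Illustrative base class: the flags shared by AdamW and Lion live here,
    so each concrete optimizer doesn't repeat the assignments."""

    def __init__(self, *args, record_update_metrics: bool = False,
                 selective_updates: bool = False, **kwargs):
        self._record_update_metrics = record_update_metrics
        self._selective_updates = selective_updates
        super().__init__(*args, **kwargs)
```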
@@ -510,16 +512,20 @@ def step(self, closure=None) -> None:
class AdamW(torch.optim.AdamW, Optimizer):
    def __init__(self, *args, record_update_metrics: bool = False, selective_updates: bool = False, **kwargs):
        super().__init__(*args, **kwargs)
        self._record_step_size = record_update_metrics

        # Need to set these here just like in our base `Optimizer` class since our `Optimizer.__init__`
Any reason we don't call `Optimizer.__init__` too? Because multiple inheritance is complicated?
Yeah, this gets messy because our `Optimizer.__init__()` also calls PyTorch's `Optimizer.__init__()`, which would then get called twice here unless we didn't call `torch.optim.AdamW.__init__()` (via `super().__init__()`), but then we'd have to copy over all the other code that happens within `torch.optim.AdamW.__init__()`.
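A toy example (stand-in class names, no real torch classes) of how explicitly invoking both parent initializers ends up running the shared base `__init__` twice:

```python
class TorchOptimizer:                 # stand-in for torch.optim.Optimizer
    def __init__(self):
        print("torch.optim.Optimizer.__init__")

class TorchAdamW(TorchOptimizer):     # stand-in for torch.optim.AdamW
    def __init__(self):
        super().__init__()

class OurOptimizer(TorchOptimizer):   # stand-in for the repo's Optimizer base class
    def __init__(self):
        super().__init__()

class AdamW(TorchAdamW, OurOptimizer):
    def __init__(self):
        TorchAdamW.__init__(self)     # roughly what super().__init__() reaches in the PR
        OurOptimizer.__init__(self)   # adding this call re-initializes the shared base

AdamW()  # prints the base-class message twice
```

With cooperative `super()`, the first explicit call already walks the rest of the MRO, so the second explicit call runs `OurOptimizer` and the shared base a second time.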
#605 should be reviewed and merged first.
This PR adds the ability to skip optimizer updates for the parts of parameters that have zero gradients, such as the embedding rows for tokens not present in the current batch (assuming no weight tying).
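To make the intent concrete, here is a toy demonstration (plain PyTorch, no OLMo APIs) of which embedding entries carry non-zero gradients and would therefore still be updated:

```python
import torch

emb = torch.nn.Embedding(10, 4)
tokens = torch.tensor([1, 3, 3])   # only rows 1 and 3 appear in this "batch"
emb(tokens).sum().backward()

grad = emb.weight.grad
mask = grad != 0                   # per-element mask of entries the batch touched
print(mask.any(dim=-1))            # True only at indices 1 and 3
```

With `selective_updates` enabled, weight decay, the moment updates, and the parameter step are applied only where this mask is true, so the untouched embedding rows stay exactly as they were.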