This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Don't cache reinit_modules #5543

Open · wants to merge 6 commits into main

Conversation

@JohnGiorgi (Contributor) commented on Jan 16, 2022:

Fixes #5505 (comment)

Changes proposed in this pull request:

  • Don't cache transformers when `reinit_modules` is provided.
  • Remove `reinit_modules` from the transformer spec.
  • Always load a new model when `reinit_modules` is not `None` (sketched below).
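
A minimal sketch of the intended behaviour, assuming a heavily simplified `cached_transformers.get` (the real `TransformerSpec` fields, override-weights handling, and re-initialization logic are omitted here):

```python
import copy
from typing import Dict, NamedTuple, Optional, Tuple

import transformers


class TransformerSpec(NamedTuple):
    # Note: reinit_modules is deliberately not part of the cache key.
    model_name: str
    override_weights_file: Optional[str] = None


_model_cache: Dict[TransformerSpec, transformers.PreTrainedModel] = {}


def get(
    model_name: str,
    make_copy: bool = False,
    override_weights_file: Optional[str] = None,
    reinit_modules: Optional[Tuple[int, ...]] = None,
    **kwargs,
) -> transformers.PreTrainedModel:
    if reinit_modules is not None:
        # A re-initialized model must never be shared, so skip the cache entirely.
        model = transformers.AutoModel.from_pretrained(model_name, **kwargs)
        # ... re-initialize the requested layers/modules here ...
        return model

    spec = TransformerSpec(model_name, override_weights_file)
    if spec in _model_cache:
        transformer = _model_cache[spec]
    else:
        transformer = transformers.AutoModel.from_pretrained(model_name, **kwargs)
        _model_cache[spec] = transformer
    return copy.deepcopy(transformer) if make_copy else transformer
```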

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.
  • codecov/patch reports high test coverage (at least 90%).
    You can find this under the "Actions" tab of the pull request once the other checks have finished.

@JohnGiorgi changed the title from "Dont cache reinit_modules" to "Don't cache reinit_modules" on Jan 16, 2022.
@epwalsh (Member) left a comment:

I think one of @dirkgr's points was that we don't want to add a reinitialized transformer to the cache. So maybe only run _model_cache[spec] = transformer when reinit_modules is None.
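
A hedged sketch of that suggestion, shown as an excerpt that would replace the unconditional cache write in the simplified `get` above (hypothetical, not the exact library code):

```python
    # Instead of branching early on reinit_modules, load (and re-initialize) the
    # transformer through the usual code path, but only write it back to the cache
    # when its weights were left untouched.
    if reinit_modules is None:
        _model_cache[spec] = transformer
```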

CHANGELOG.md (Outdated)

@@ -26,6 +26,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Removed a spurious error message "'torch.cuda' has no attribute '_check_driver'" that would be appear in the logs
   when a `ConfigurationError` for missing GPU was raised.
 - Load model on CPU post training to save GPU memory.
+- Don't cache models with `cached_transformers` when `reinit_modules` is not `None`.

Member:

Better to omit this actually since this feature hasn't been released yet.

Contributor (Author):

Removed!

@JohnGiorgi (Contributor, Author) commented:

> I think one of @dirkgr's points was that we don't want to add a reinitialized transformer to the cache. So maybe only run _model_cache[spec] = transformer when reinit_modules is None.

Oh right, good catch. I have fixed this.

@dirkgr (Member) commented on Jan 18, 2022:

The way reinit should work is that if reinit is specified, it loads the model first using the normal cached_transformers code path, with make_copy = True, and then re-inits the layers it needs. If you already have the weights loaded (which happens a lot in AllenNLP), then it'll be much faster the second time.

I would almost say that we should have an entirely separate function that reinits some layers from a given transformer model. It doesn't have to be part of cached_transformers at all. But I don't know how you're using this downstream, so maybe that's not practical.
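
A rough sketch of that idea as a separate helper next to `cached_transformers`; the function name `get_reinitialized`, the integer-index convention for `reinit_modules`, and the BERT-style layer access are all assumptions for illustration, not existing AllenNLP API:

```python
from typing import Tuple, Union

import transformers

from allennlp.common import cached_transformers


def get_reinitialized(  # hypothetical helper, not an existing AllenNLP function
    model_name: str,
    reinit_modules: Tuple[Union[int, str], ...],
    **kwargs,
) -> transformers.PreTrainedModel:
    """Load a transformer through the normal cached path, then re-init selected layers."""
    # make_copy=True gives us our own copy of the (possibly cached) weights,
    # so mutating them cannot affect other users of the cache.
    model = cached_transformers.get(model_name, make_copy=True, **kwargs)

    # Illustrative re-initialization: treat integers as indices of encoder layers
    # to reset to their randomly initialized state (assumes a BERT-style model).
    for module in reinit_modules:
        if isinstance(module, int):
            model.encoder.layer[module].apply(model._init_weights)
    return model
```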

@JohnGiorgi (Contributor, Author) commented:

@dirkgr I had originally added this functionality to PretrainedTransformerEmbedder, but then I figured it made sense to move it to cached_transformers, so any part of the library that loads a transformer could reinit some layers/modules. Is there somewhere else in the library it could live that makes more sense? If it were its own function, how would a user access it, e.g. in a training config?

@dirkgr (Member) commented on Feb 24, 2022:

I'd say for ease of use, just have it in the cached_transformers module as well. While strictly speaking it doesn't have to be used with cached transformers, I think in practice that's what's going to happen.
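
Downstream code (e.g. `PretrainedTransformerEmbedder`) could then call it roughly like this; `get_reinitialized` and the layer-index convention are the same hypothetical ones sketched above:

```python
from allennlp.common import cached_transformers

# Hypothetical call site: reuse the cached "bert-base-uncased" weights,
# then reset the top two encoder layers before training.
transformer = cached_transformers.get_reinitialized(
    "bert-base-uncased",
    reinit_modules=(10, 11),
)
```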

@dirkgr self-assigned this on Feb 24, 2022.