
Fix _init_weights for ResNetPreTrainedModel #31851

Merged
merged 25 commits into from
Jul 9, 2024

Conversation

ydshieh
Collaborator

@ydshieh ydshieh commented Jul 9, 2024

What does this PR do?

Fix #31841. The cause is clearly explained by @williford.

I still have to check why this is not captured by CI.

OK, it's because the test added in # (test_mismatched_shapes_have_properly_initialized_weights) only checks AutoModelForSequenceClassification but not AutoModelForImageClassification.

I will have to update it.
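For context, a minimal sketch of the kind of fix involved (assumed shape of the change, not the exact diff from this PR): ResNet's `_init_weights` handled convolution and normalization layers, so a classification head added on top (an `nn.Linear`) could be left with uninitialized values. The hypothetical `init_weights` helper below names are illustrative.

```python
import torch
from torch import nn

# Sketch: an _init_weights-style function. Without the nn.Linear branch,
# a freshly created classifier head is never explicitly initialized.
def init_weights(module: nn.Module) -> None:
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
    elif isinstance(module, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(module.weight, 1)
        nn.init.constant_(module.bias, 0)
    elif isinstance(module, nn.Linear):
        # The kind of branch this PR is about: initialize the linear head too.
        module.weight.data.normal_(mean=0.0, std=0.02)
        if module.bias is not None:
            module.bias.data.zero_()

head = nn.Linear(512, 10)
init_weights(head)
print(head.bias.abs().sum().item())  # 0.0 after initialization
```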

Collaborator

@amyeroberts amyeroberts left a comment


Thanks for digging into this and fixing!

I think there's a wider issue outside of mismatched shapes due to the fast_init logic. I started looking into it here: #30451 but haven't had time to finish as there's quite a few models (notably audio models) which will not have all of their parameters properly initialized.

@@ -3179,9 +3179,9 @@ def test_mismatched_shapes_have_properly_initialized_weights(self):
continue

# TODO: ydshieh
if model_class.__name__ in ["Wav2Vec2ForSequenceClassification", "Wav2Vec2ForSequenceClassification"]:
if model_class.__name__ in ["Wav2Vec2ForSequenceClassification", "Wav2Vec2ForSequenceClassification", "CLIPForImageClassification", "RegNetForImageClassification"]:
Collaborator


Hmmm - skipping a test because it's failing isn't the best reason.... As this is already done, I'm OK with it provided there's an issue to track it so this gets resolved.

Collaborator Author


@amyeroberts There are 2 kinds of failures here:

  • one is parameters not being initialized at all, giving very large or even NaN values: this is important and should be fixed
  • one is about whether we use config.initializer_factor to control the initialization, but that is really more for testing purposes (historically)
        initializer_factor (`float`, *optional*, defaults to 1.0):
            A factor for initializing all weight matrices (should be kept to 1, used internally for initialization
            testing).

and that is less important. The model classes I added here all seem to belong to the 2nd case.
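To make the 2nd case concrete, here is a small sketch (assumed behavior, based on the docstring quoted above) of how a factor like `initializer_factor` scales the init standard deviation, so test suites can set it near zero and assert that parameter means land near 0.0:

```python
# Sketch: initializer_factor multiplies the base init std, so a tiny factor
# makes freshly initialized weights near zero for testing purposes.
BASE_STD = 0.02  # a typical initializer_range value

def effective_std(initializer_factor: float, base_std: float = BASE_STD) -> float:
    """Std actually used for weight init, given the test-time factor."""
    return initializer_factor * base_std

print(effective_std(1.0))    # normal init: 0.02
print(effective_std(1e-10))  # test-time init: effectively 0
```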

Collaborator Author


Regarding #30451, it might contain other issues/cases I haven't observed here, though.

Collaborator


#30451 would fall under the first kind of error - for some models and some parameters we get unpredictable values, e.g. NaNs, because the empty arrays are never properly initialized as they're skipped during initialization. It's due to how our initialization logic works, where a layer or parameter can be marked as initialized even if it's only partially done - for example, only the bias being initialized. The main issue is that it's not properly caught in our tests at the moment.
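An illustrative sketch of that failure mode (not the transformers implementation; `mark_initialized` is a hypothetical helper): if a module is flagged as initialized as soon as any of its parameters is touched, a layer whose bias was set but whose weight was skipped slips through fast_init untouched.

```python
# Sketch: a coarse, module-level "initialized" flag hides partially
# initialized layers, e.g. a bias set while the weight stays empty.
def mark_initialized(initialized: set, touched_params: list) -> set:
    for name in touched_params:
        module = name.rsplit(".", 1)[0]  # "classifier.bias" -> "classifier"
        initialized.add(module)          # whole module marked done
    return initialized

done = mark_initialized(set(), ["classifier.bias"])  # only the bias was set
print("classifier" in done)  # True, although classifier.weight was never initialized
```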

@ydshieh
Collaborator Author

ydshieh commented Jul 9, 2024

For example

FAILED tests/models/wav2vec2/test_modeling_wav2vec2.py::Wav2Vec2ModelTest::test_mismatched_shapes_have_properly_initialized_weights - AssertionError: 0.4431610107421875 not found in [0.0, 1.0] : Parameter wav2vec2.masked_spec_embed of model <class 'transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification'> seems not properly initialized

FAILED tests/models/wav2vec2/test_modeling_wav2vec2.py::Wav2Vec2RobustModelTest::test_mismatched_shapes_have_properly_initialized_weights - AssertionError: 0.4431610107421875 not found in [0.0, 1.0] : Parameter wav2vec2.masked_spec_embed of model <class 'transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification'> seems not properly initialized


FAILED tests/models/clip/test_modeling_clip.py::CLIPForImageClassificationModelTest::test_mismatched_shapes_have_properly_initialized_weights - AssertionError: 0.0022945739328861237 not found in [0.0, 1.0] : Parameter classifier.weight of model <class 'transformers.models.clip.modeling_clip.CLIPForImageClassification'> seems not properly initialized

FAILED tests/models/regnet/test_modeling_regnet.py::RegNetModelTest::test_mismatched_shapes_have_properly_initialized_weights - AssertionError: 0.0011449280427768826 not found in [0.0, 1.0] : Parameter regnet.embedder.embedder.convolution.weight of model <class 'transformers.models.regnet.modeling_regnet.RegNetForImageClassification'> seems not properly initialized

Values like 0.4431610107421875, 0.0022945739328861237 or 0.0011449280427768826 are all reasonable, even though none of them is 0.0 or 1.0.
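The point can be checked directly: the old assertion looks for a parameter mean of exactly 0.0 or 1.0, while the means from the failures above are perfectly sane for normally-initialized weights and only need a band check.

```python
# Means taken from the test failures quoted above.
observed_means = [0.4431610107421875, 0.0022945739328861237, 0.0011449280427768826]

for m in observed_means:
    assert m not in (0.0, 1.0)   # strict check: these values fail it
    assert -1.0 <= m <= 1.0      # band check: these values pass it
print("all observed means fall in [-1.0, 1.0]")
```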

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

"Wav2Vec2ForSequenceClassification",
"CLIPForImageClassification",
"RegNetForImageClassification",
"ResNetForImageClassification",
Collaborator Author


@amyeroberts I realized we can't just skip by looking at the model class (otherwise we wouldn't have detected ResNetForImageClassification before the fix in this PR).

Comment on lines +3189 to +3209
special_param_names = [
r"wav2vec2\.masked_spec_embed",
r"wav2vec2\.feature_extractor\.conv_layers\..+\.conv\.weight",
r"wav2vec2\.feature_projection\.projection\.weight",
r"wav2vec2\.feature_projection\.projection\.bias",
r"wav2vec2\.encoder\.pos_conv_embed\.conv\.parametrizations\.weight\.original.",
r"classifier\.weight",
r"regnet\.embedder\.embedder\.convolution\.weight",
r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.convolution\.weight",
r"regnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.weight",
r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.bias",
r"classifier\..+\.weight",
r"classifier\..+\.bias",
r"resnet\.embedder\.embedder\.convolution\.weight",
r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.convolution\.weight",
r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.weight",
r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.bias",
]
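A sketch of how an allowlist of patterns like the one above can be applied (assumed mechanics; `is_special` is a hypothetical helper, and whether the test uses `re.fullmatch` or another matching call is an assumption): match parameter names against regexes instead of comparing model class names, so the check survives renames and new architectures.

```python
import re

# A small subset of the patterns quoted above, for illustration.
special_param_names = [
    r"wav2vec2\.masked_spec_embed",
    r"classifier\.weight",
    r"resnet\.embedder\.embedder\.convolution\.weight",
]

def is_special(param_name: str) -> bool:
    """True if the parameter gets the relaxed [-1.0, 1.0] mean check."""
    return any(re.fullmatch(p, param_name) for p in special_param_names)

print(is_special("wav2vec2.masked_spec_embed"))  # True
print(is_special("resnet.pooler.weight"))        # False
```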
Collaborator Author


So let's not skip, but allow the mean values to be in [-1.0, 1.0] for some parameters of certain model classes.

Comment on lines +3239 to +3248
self.assertGreaterEqual(
param_mean,
-1.0,
msg=f"Parameter {name} of model {model_class} seems not properly initialized",
)
self.assertLessEqual(
param_mean,
1.0,
msg=f"Parameter {name} of model {model_class} seems not properly initialized",
)
Collaborator Author


This is the new block.

@ydshieh ydshieh merged commit 4c8149d into main Jul 9, 2024
22 checks passed
@ydshieh ydshieh deleted the fix_init_resnet branch July 9, 2024 18:09
ydshieh added a commit that referenced this pull request Jul 9, 2024
ydshieh added a commit that referenced this pull request Jul 9, 2024
Revert "Fix `_init_weights` for `ResNetPreTrainedModel` (#31851)"

This reverts commit 4c8149d.
ydshieh added a commit that referenced this pull request Jul 10, 2024
* Revert "Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" (#31868)"

This reverts commit b45dd5d.

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

* fix

* [test_all] check

---------

Co-authored-by: ydshieh <[email protected]>
@williford

Thanks @ydshieh and @amyeroberts ! Good job finding other impacted models and fixing!

amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Jul 19, 2024
MHRDYN7 pushed a commit to MHRDYN7/transformers that referenced this pull request Jul 23, 2024
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jul 24, 2024

Successfully merging this pull request may close these issues.

Finetuning doesn't initialize microsoft/resnet classifier weights with _fast_init
4 participants