Fix `_init_weights` for `ResNetPreTrainedModel` #31851

ydshieh · 2024-07-09T08:37:50Z

What does this PR do?

Fix #31841. The cause is clearly explained by @williford.

I still have to check why this is not captured by CI.

OK, it's because the test added in # (test_mismatched_shapes_have_properly_initialized_weights) only checks AutoModelForSequenceClassification but not AutoModelForImageClassification.

I will have to update it.

amyeroberts

Thanks for digging into this and fixing!

I think there's a wider issue outside of mismatched shapes due to the fast_init logic. I started looking into it here: #30451 but haven't had time to finish, as there are quite a few models (notably audio models) which will not have all of their parameters properly initialized.

amyeroberts · 2024-07-09T09:32:10Z

tests/test_modeling_common.py

@@ -3179,9 +3179,9 @@ def test_mismatched_shapes_have_properly_initialized_weights(self):
                continue

            # TODO: ydshieh
-            if model_class.__name__ in ["Wav2Vec2ForSequenceClassification", "Wav2Vec2ForSequenceClassification"]:
+            if model_class.__name__ in ["Wav2Vec2ForSequenceClassification", "Wav2Vec2ForSequenceClassification", "CLIPForImageClassification", "RegNetForImageClassification"]:


Hmmm - skipping a test because it's failing isn't the best reason... As this is already done, I'm OK with it provided there's an issue to track so this gets resolved.

@amyeroberts Here there are 2 kinds of failures:

one is about parameters not being initialized at all, giving very large or even nan values: this is important and should be fixed

the other is about whether we use config.initializer_factor to control the initialization, but that is really more for testing purposes (historically)

initializer_factor (`float`, *optional*, defaults to 1.0): A factor for initializing all weight matrices (should be kept to 1, used internally for initialization testing).

and that is less important. The model classes I added here all seem to belong to the 2nd case.

Regarding #30451, it might contain other issues/cases that I didn't observe here, though.

#30451 would fall under the first kind of error: for some models and some parameters we get unpredictable values, e.g. nans, because the empty arrays are never properly initialized, as they're skipped during initialization. It's due to how our initialization logic works, where a layer or parameter can be marked as initialized even if that's only partially done, for example when only the bias is initialized. The main issue is that it's not properly caught in our tests at the moment.
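To make the "marked as initialized even if it's only partially done" case concrete, here is a minimal pure-Python sketch. It does not use the library's real classes: `FakeLinear`, `init_bias_only`, `uninitialized_params` and the `is_initialized` flag are all hypothetical names made up for illustration, and NaN stands in for whatever garbage an empty tensor might hold.

```python
import math


class FakeLinear:
    """Stand-in for a layer whose storage starts out as uninitialized memory."""

    def __init__(self):
        # NaN plays the role of garbage left in a freshly allocated empty tensor.
        self.weight = [math.nan] * 4
        self.bias = [math.nan] * 2
        # Hypothetical flag, for illustration only.
        self.is_initialized = False


def init_bias_only(layer):
    """An init routine that forgets the weight but still marks the layer as done."""
    layer.bias = [0.0] * len(layer.bias)
    layer.is_initialized = True


def uninitialized_params(layer):
    """Return the names of parameters that still contain NaN."""
    return [
        name
        for name in ("weight", "bias")
        if any(math.isnan(v) for v in getattr(layer, name))
    ]


layer = FakeLinear()
init_bias_only(layer)
leftover = uninitialized_params(layer)
print(layer.is_initialized, leftover)
```

The layer reports itself as initialized while its weight is still untouched, so any later pass that trusts the flag would skip it. A test that only looks at the flag, or only at some parameters, would not notice.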

ydshieh · 2024-07-09T09:46:13Z

For example

FAILED tests/models/wav2vec2/test_modeling_wav2vec2.py::Wav2Vec2ModelTest::test_mismatched_shapes_have_properly_initialized_weights - AssertionError: 0.4431610107421875 not found in [0.0, 1.0] : Parameter wav2vec2.masked_spec_embed of model <class 'transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification'> seems not properly initialized

FAILED tests/models/wav2vec2/test_modeling_wav2vec2.py::Wav2Vec2RobustModelTest::test_mismatched_shapes_have_properly_initialized_weights - AssertionError: 0.4431610107421875 not found in [0.0, 1.0] : Parameter wav2vec2.masked_spec_embed of model <class 'transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForSequenceClassification'> seems not properly initialized


FAILED tests/models/clip/test_modeling_clip.py::CLIPForImageClassificationModelTest::test_mismatched_shapes_have_properly_initialized_weights - AssertionError: 0.0022945739328861237 not found in [0.0, 1.0] : Parameter classifier.weight of model <class 'transformers.models.clip.modeling_clip.CLIPForImageClassification'> seems not properly initialized

FAILED tests/models/regnet/test_modeling_regnet.py::RegNetModelTest::test_mismatched_shapes_have_properly_initialized_weights - AssertionError: 0.0011449280427768826 not found in [0.0, 1.0] : Parameter regnet.embedder.embedder.convolution.weight of model <class 'transformers.models.regnet.modeling_regnet.RegNetForImageClassification'> seems not properly initialized

Values like 0.4431610107421875, 0.0022945739328861237 or 0.0011449280427768826 are all reasonable, even though they are neither 0.0 nor 1.0.
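The failure messages above come from a membership check against `[0.0, 1.0]`. The sketch below shows why such a check can reject a perfectly healthy parameter. It assumes the test shrinks the configured init scale to nearly zero before building the model, and that some init routines ignore that setting. `init_param`, `rounded_mean`, `ZERO_STD` and the fixed `0.5` fallback are illustrative stand-ins, not the library's code.

```python
import random
import statistics

ZERO_STD = 1e-10  # assumed "almost zero" init scale used only for testing


def init_param(size, std, respects_config=True, seed=0):
    """Draw weights; an init that ignores the config uses its own fixed scale."""
    rng = random.Random(seed)
    effective_std = std if respects_config else 0.5  # 0.5 is an arbitrary stand-in
    return [rng.gauss(0.0, effective_std) for _ in range(size)]


def rounded_mean(values):
    """Mean rounded to 9 decimals, so tiny noise collapses to 0.0."""
    return round(statistics.fmean(values), 9)


ok = rounded_mean(init_param(1000, ZERO_STD))
off = rounded_mean(init_param(1000, ZERO_STD, respects_config=False))

print(ok in [0.0, 1.0])   # init follows the config: mean collapses to 0.0
print(off in [0.0, 1.0])  # init ignores the config: small but non-zero mean
```

Under these assumptions the second parameter is properly initialized, yet it fails the strict check simply because its init does not scale with the config value.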

HuggingFaceDocBuilderDev · 2024-07-09T10:00:56Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ydshieh · 2024-07-09T13:08:46Z

tests/test_modeling_common.py

+                "Wav2Vec2ForSequenceClassification",
+                "CLIPForImageClassification",
+                "RegNetForImageClassification",
+                "ResNetForImageClassification",


@amyeroberts I realized we can't just skip by looking at the model class (otherwise we wouldn't have detected ResNetForImageClassification before the fix in this PR)

ydshieh · 2024-07-09T13:09:35Z

tests/test_modeling_common.py

+            special_param_names = [
+                r"wav2vec2\.masked_spec_embed",
+                r"wav2vec2\.feature_extractor\.conv_layers\..+\.conv\.weight",
+                r"wav2vec2\.feature_projection\.projection\.weight",
+                r"wav2vec2\.feature_projection\.projection\.bias",
+                r"wav2vec2\.encoder\.pos_conv_embed\.conv\.parametrizations\.weight\.original.",
+                r"classifier\.weight",
+                r"regnet\.embedder\.embedder\.convolution\.weight",
+                r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.convolution\.weight",
+                r"regnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
+                r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.weight",
+                r"regnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.bias",
+                r"classifier\..+\.weight",
+                r"classifier\..+\.bias",
+                r"resnet\.embedder\.embedder\.convolution\.weight",
+                r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
+                r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.convolution\.weight",
+                r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
+                r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.weight",
+                r"resnet\.encoder\.stages\..+\.layers\..+\.layer\..+\.attention\..+\.bias",
+            ]


So let's not skip, but allow the mean values to be in [-1.0, 1.0] for some parameters of certain model classes.
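A rough sketch of how the patterns could be used to pick the relaxed check per parameter. Only three of the patterns from the diff are repeated here, `is_special` is a hypothetical helper, and matching the whole name with `re.fullmatch` is an assumption made for this example; the actual test may match differently.

```python
import re

# A few of the patterns from the diff above (not the full list).
special_param_names = [
    r"wav2vec2\.masked_spec_embed",
    r"classifier\.weight",
    r"resnet\.encoder\.stages\..+\.layers\..+\.shortcut\.convolution\.weight",
]


def is_special(name):
    """True if the whole parameter name matches one of the patterns (hypothetical helper)."""
    return any(re.fullmatch(pattern, name) for pattern in special_param_names)


names = [
    "wav2vec2.masked_spec_embed",
    "resnet.encoder.stages.0.layers.1.shortcut.convolution.weight",
    "resnet.encoder.stages.0.layers.1.layer.0.normalization.weight",
    "classifier.bias",
]
for name in names:
    # Special parameters get the relaxed range check, the rest keep the strict one.
    check = "mean in [-1.0, 1.0]" if is_special(name) else "mean in {0.0, 1.0}"
    print(f"{name}: {check}")
```

Selecting by parameter name rather than by model class means every other parameter of the same model still goes through the strict check.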

ydshieh · 2024-07-09T13:09:48Z

tests/test_modeling_common.py

+                                self.assertGreaterEqual(
+                                    param_mean,
+                                    -1.0,
+                                    msg=f"Parameter {name} of model {model_class} seems not properly initialized",
+                                )
+                                self.assertLessEqual(
+                                    param_mean,
+                                    1.0,
+                                    msg=f"Parameter {name} of model {model_class} seems not properly initialized",
+                                )


This is the new block.
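One property of this relaxed check worth spelling out: it still rejects the failures that matter. A quick sketch, where `mean_in_range` is a hypothetical helper that folds the two assertions into one boolean:

```python
import math


def mean_in_range(param_mean, low=-1.0, high=1.0):
    """Boolean equivalent of assertGreaterEqual(mean, low) plus assertLessEqual(mean, high)."""
    return param_mean >= low and param_mean <= high


checks = {
    "reasonable": mean_in_range(0.4431610107421875),
    "huge": mean_in_range(3.2e37),
    # Every comparison with NaN is False, so NaN cannot pass either bound.
    "nan": mean_in_range(math.nan),
}
print(checks)
```

Means like the ones reported in the failures above pass, while very large or NaN means, the first kind of failure discussed earlier, are still caught.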

Revert "Fix `_init_weights` for `ResNetPreTrainedModel` (#31851)" This reverts commit 4c8149d.

* Revert "Revert "Fix `_init_weights` for `ResNetPreTrainedModel`" (#31868)" This reverts commit b45dd5d. * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check * fix * [test_all] check --------- Co-authored-by: ydshieh <[email protected]>

williford · 2024-07-10T13:00:07Z

Thanks @ydshieh and @amyeroberts ! Good job finding other impacted models and fixing!

ydshieh added 8 commits July 9, 2024 10:27

init ca450de
init d4100af
init f613f76
init a0c6c31
test 8251848
test 8283e77
test 5aad5b8
test 5fcc811

amyeroberts approved these changes Jul 9, 2024

amyeroberts reviewed Jul 9, 2024

test dcc2a08

ydshieh added 16 commits July 9, 2024 12:18

test 57b42f3
test 79d4671
test eb83035
test e260ea1
test dad34a2
test 52a3fa8
test a21cf43
test 6351996
test 6891fb1
test 62120d7
test 4e79b5b
test c315a5f
test 7537757
test b629be4
test d264eed
test fc4b3ca

ydshieh commented Jul 9, 2024

ydshieh merged commit 4c8149d into main Jul 9, 2024
22 checks passed

ydshieh deleted the fix_init_resnet branch July 9, 2024 18:09

ydshieh added a commit that referenced this pull request Jul 9, 2024

Revert "Fix _init_weights for ResNetPreTrainedModel (#31851)"

2dd3bb4

This reverts commit 4c8149d.

ydshieh mentioned this pull request Jul 9, 2024

Revert "Fix _init_weights for ResNetPreTrainedModel" #31868

Merged

ydshieh added a commit that referenced this pull request Jul 9, 2024

Revert "Fix _init_weights for ResNetPreTrainedModel" (#31868)

b45dd5d

Revert "Fix `_init_weights` for `ResNetPreTrainedModel` (#31851)" This reverts commit 4c8149d.

ydshieh mentioned this pull request Jul 10, 2024

Fix failed tests in #31851 #31879

Merged
