
Update BridgeTowerModelTester #23029

Merged
ydshieh merged 10 commits into main from fix_bridge on Apr 27, 2023

Conversation

ydshieh (Collaborator) commented Apr 27, 2023

What does this PR do?

Update BridgeTowerModelTester to use small values for config.
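For illustration, a minimal sketch of the idea, with assumed values and sub-config fields rather than the tester's exact code: a deliberately tiny BridgeTowerConfig makes the randomly initialized test model cheap to build and run.

# Rough sketch of the motivation (values assumed, not the exact tester code):
# small configs keep the test model at a few million parameters instead of
# hundreds of millions, so CI tests construct and run it quickly.
from transformers import BridgeTowerConfig, BridgeTowerModel

tiny_config = BridgeTowerConfig(
    text_config={"vocab_size": 99, "hidden_size": 128, "num_hidden_layers": 2,
                 "num_attention_heads": 4, "intermediate_size": 256},
    vision_config={"hidden_size": 128, "num_hidden_layers": 2,
                   "image_size": 64, "patch_size": 16},
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=256,
)
model = BridgeTowerModel(tiny_config)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")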

HuggingFaceDocBuilderDev commented Apr 27, 2023

The documentation is not available anymore as the PR was closed or merged.

@@ -54,87 +60,169 @@
from transformers import BridgeTowerProcessor


class BridgeTowerModelTester:
class BridgeTowerTextModelTester:
ydshieh (Collaborator, Author):
There is no BridgeTowerTextModelTest, however: we just use this tester class to create the text config and text inputs.
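For illustration, a minimal sketch of such a tester (names and values assumed, including the top-level export of BridgeTowerTextConfig; not the PR's exact code): it only prepares a config and inputs, and no unittest class consumes it directly.

# Minimal sketch (assumed shape, not the PR's exact code) of a tester class
# with no matching BridgeTowerTextModelTest: the composite BridgeTowerModelTester
# calls it to build the text sub-config and text inputs.
import torch
from transformers import BridgeTowerTextConfig

class BridgeTowerTextModelTester:
    def __init__(self, parent, batch_size=2, seq_length=7, vocab_size=99, hidden_size=128):
        self.parent = parent
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size

    def get_config(self):
        # small values keep the text encoder tiny and fast
        return BridgeTowerTextConfig(
            vocab_size=self.vocab_size,
            hidden_size=self.hidden_size,
            num_hidden_layers=2,
            num_attention_heads=4,
            intermediate_size=256,
        )

    def prepare_config_and_inputs(self):
        input_ids = torch.randint(0, self.vocab_size, (self.batch_size, self.seq_length))
        attention_mask = torch.ones_like(input_ids)
        return self.get_config(), input_ids, attention_mask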



class BridgeTowerImageModelTester:
ydshieh (Collaborator, Author):

Same as mentioned for the text model tester above.

Comment on lines +174 to +177
hidden_size=128,
num_hidden_layers=2,
num_attention_heads=4,
intermediate_size=256,
ydshieh (Collaborator, Author):

This model requires some attributes to be defined in the top config (BridgeTowerConfig).
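A minimal sketch of the point (call shape assumed): the four values from the hunk above are set on BridgeTowerConfig itself, presumably because parts of the model read them from the top-level config rather than from the nested text/vision configs, which keep their own copies of these fields.

from transformers import BridgeTowerConfig

# These attributes live on the top-level config, not only on the sub-configs.
config = BridgeTowerConfig(
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=256,
)
assert config.hidden_size == 128              # defined on the top config
assert config.text_config.hidden_size == 768  # nested text config keeps its own default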

ydshieh marked this pull request as ready for review on April 27, 2023 15:52
@@ -225,6 +319,18 @@ class BridgeTowerModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestC
test_resize_embeddings = False
has_attentions = False

@unittest.skip(reason="Does not work on the tiny model as we keep hitting edge cases.")
def test_cpu_offload(self):
ydshieh (Collaborator, Author) commented Apr 27, 2023:
With the large version, this test passes.

pass

@unittest.skip(reason="Does not work on the tiny model as we keep hitting edge cases.")
def test_disk_offload(self):
ydshieh (Collaborator, Author):

Same as above.


@unittest.skip(reason="Does not work on the tiny model as we keep hitting edge cases.")
def test_model_parallelism(self):
pass
ydshieh (Collaborator, Author):

With the large model, there is a device issue when running the forward pass.
I tried to look into it, but constantly got GPU OOM, so I decided to update this test file instead.
I will take a look at this test with a larger model (but not too large).
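For context, test_model_parallelism roughly does the following (a simplified sketch from memory of the common test, not a verbatim copy of tests/test_modeling_common.py): save the model, reload it sharded across the visible GPUs via accelerate, then run a forward pass; the failure in the traceback further down happens in that forward pass.

# Simplified sketch (not verbatim) of what the common model-parallelism test does.
# Requires accelerate and more than one visible GPU; a full-size config is
# assumed here since the tiny one no longer triggers the multi-GPU split.
import tempfile
from transformers import BridgeTowerConfig, BridgeTowerModel

model = BridgeTowerModel(BridgeTowerConfig())
with tempfile.TemporaryDirectory() as tmp_dir:
    model.save_pretrained(tmp_dir)
    # device_map="auto" asks accelerate to split the layers across GPUs
    sharded = BridgeTowerModel.from_pretrained(tmp_dir, device_map="auto")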

ydshieh requested a review from sgugger on April 27, 2023 16:11
@@ -202,7 +297,6 @@ def prepare_config_and_inputs_for_common(self):
return config, inputs_dict


@slow
ydshieh (Collaborator, Author):

Fast now.

ydshieh (Collaborator, Author) commented Apr 27, 2023

Remark: with a larger model (but not too large), we get

FAILED tests/models/bridgetower/test_modeling_bridgetower.py::BridgeTowerModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Better to check this separately.


Here is the full log

>                   new_output = new_model(**inputs_dict_class)

tests/test_modeling_common.py:2616: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501: in _call_impl
    return forward_call(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py:165: in new_forward
    output = old_forward(*args, **kwargs)
src/transformers/models/bridgetower/modeling_bridgetower.py:1423: in forward
    image_embeds = self.vision_model.visual.transformer.resblocks[i](image_embeds).type(
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501: in _call_impl
    return forward_call(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = BridgeTowerResidualAttention(
  (attn): MultiheadAttention(
    (out_proj): NonDynamicallyQuantizableLinear(in_feature...ar(in_features=2048, out_features=512, bias=True)
  )
  (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
hidden_state = tensor([[[ 0.5531,  0.0555, -0.0248,  ...,  0.2110, -0.0403,  0.0487]],

        [[ 0.2963, -0.1709,  0.0074,  ...,  0...      [[ 0.3324, -0.0536, -0.0069,  ...,  0.0911, -0.0565, -0.2751]]],
       device='cuda:1', grad_fn=<ViewBackward0>)
attention_mask = None

    def forward(self, hidden_state: torch.Tensor, attention_mask: torch.Tensor = None):
        residual_state = hidden_state + self.attention(self.ln_1(hidden_state), attention_mask)
        hidden_state = self.ln_2(residual_state)
        for _, layer in self.mlp.items():
            hidden_state = layer(hidden_state)
>       hidden_state = residual_state + hidden_state
E       RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

src/transformers/models/bridgetower/modeling_bridgetower.py:237: RuntimeError
================================================================================================== warnings summary ==================================================================================================
../usr/local/lib/python3.8/dist-packages/detectron2/data/transforms/transform.py:46
  /usr/local/lib/python3.8/dist-packages/detectron2/data/transforms/transform.py:46: DeprecationWarning: LINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use BILINEAR or Resampling.BILINEAR instead.
    def __init__(self, src_rect, output_size, interp=Image.LINEAR, fill=0):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================== short test summary info ===============================================================================================
FAILED tests/models/bridgetower/test_modeling_bridgetower.py::BridgeTowerModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
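For reference, the usual shape of a fix for this class of error is to move the residual onto the device where the MLP output landed before the elementwise add. This is an assumption about a possible remedy, not something this PR changes; the PR only touches the test file.

# Hypothetical sketch of how BridgeTowerResidualAttention.forward could guard
# against the cross-device add under naive model parallelism; this PR does NOT
# make this change (it skips the test on the tiny model instead).
def forward(self, hidden_state, attention_mask=None):
    residual_state = hidden_state + self.attention(self.ln_1(hidden_state), attention_mask)
    hidden_state = self.ln_2(residual_state)
    for _, layer in self.mlp.items():
        hidden_state = layer(hidden_state)
    # the MLP may have been dispatched to a different GPU, so align devices first
    residual_state = residual_state.to(hidden_state.device)
    hidden_state = residual_state + hidden_state
    return hidden_state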

sgugger (Collaborator) left a comment:

Thanks a lot!

ydshieh merged commit 27b66be into main Apr 27, 2023
ydshieh deleted the fix_bridge branch April 27, 2023 16:26
ydshieh mentioned this pull request May 23, 2023
gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
* update

---------

Co-authored-by: ydshieh <[email protected]>
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
* update

---------

Co-authored-by: ydshieh <[email protected]>
Labels: none yet
Projects: none yet
3 participants