Update BridgeTowerModelTester
#23029
Conversation
The documentation is not available anymore as the PR was closed or merged.
@@ -54,87 +60,169 @@
from transformers import BridgeTowerProcessor


-class BridgeTowerModelTester:
+class BridgeTowerTextModelTester:
There is no BridgeTowerTextModelTest; however, we just use this tester class to create the text config and text inputs.
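For context, here is a minimal sketch of that tester-class pattern. It is illustrative only, not the actual contents of the test file: it assumes BridgeTowerTextConfig from the library, and the class runs no tests itself; it only builds a small text config and matching dummy inputs.

```python
# Illustrative sketch only -- not the actual test file. Assumes BridgeTowerTextConfig
# is importable from transformers; the small values mirror the spirit of this PR
# (tiny configs so the tests stay fast).
import torch

from transformers import BridgeTowerTextConfig


class BridgeTowerTextModelTester:
    def __init__(self, parent, batch_size=2, seq_length=7, vocab_size=99, hidden_size=128):
        self.parent = parent
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size

    def get_config(self):
        # Small values keep the tiny model cheap to build and run on CI.
        return BridgeTowerTextConfig(
            vocab_size=self.vocab_size,
            hidden_size=self.hidden_size,
            num_hidden_layers=2,
            num_attention_heads=4,
            intermediate_size=256,
        )

    def prepare_config_and_inputs(self):
        # Dummy token ids and a full attention mask are enough for a smoke test.
        input_ids = torch.randint(0, self.vocab_size, (self.batch_size, self.seq_length))
        attention_mask = torch.ones_like(input_ids)
        return self.get_config(), {"input_ids": input_ids, "attention_mask": attention_mask}
```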
)


class BridgeTowerImageModelTester:
Same as mentioned for the text model tester above.
hidden_size=128,
num_hidden_layers=2,
num_attention_heads=4,
intermediate_size=256,
This model requires some attributes to be defined in the top config (BridgeTowerConfig).
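A short illustration of that point (keyword names assume the public BridgeTowerConfig API; the values mirror the small ones in this diff): these attributes are set on the top-level config itself, not only on the nested text/vision sub-configs.

```python
# Illustrative: hidden_size, num_hidden_layers and num_attention_heads live on the
# top-level BridgeTowerConfig (used by the cross-modal part of the model), in addition
# to whatever the text and vision sub-configs define. Small values as in the tiny tester.
from transformers import BridgeTowerConfig

config = BridgeTowerConfig(
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)  # 128 2 4
```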
@@ -225,6 +319,18 @@ class BridgeTowerModelTest(ModelTesterMixin, PipelineTesterMixin, unittest.TestC
    test_resize_embeddings = False
    has_attentions = False

    @unittest.skip(reason="Does not work on the tiny model as we keep hitting edge cases.")
    def test_cpu_offload(self):
With the large version, this test passes.
        pass

    @unittest.skip(reason="Does not work on the tiny model as we keep hitting edge cases.")
    def test_disk_offload(self):
Same as above.
    @unittest.skip(reason="Does not work on the tiny model as we keep hitting edge cases.")
    def test_model_parallelism(self):
        pass
With the large model, there is a device issue when running the forward pass. I tried to look into it, but constantly got GPU OOM, so I decided to update this test file.
I will take a look at this test with a larger model (but not too large).
@@ -202,7 +297,6 @@ def prepare_config_and_inputs_for_common(self):
        return config, inputs_dict


-    @slow
fast now
Remark: with a larger model (but not too large), we get FAILED tests/models/bridgetower/test_modeling_bridgetower.py::BridgeTowerModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! Better to check this separately. Here is the full log:

>       new_output = new_model(**inputs_dict_class)
tests/test_modeling_common.py:2616:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/accelerate/hooks.py:165: in new_forward
output = old_forward(*args, **kwargs)
src/transformers/models/bridgetower/modeling_bridgetower.py:1423: in forward
image_embeds = self.vision_model.visual.transformer.resblocks[i](image_embeds).type(
/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501: in _call_impl
return forward_call(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = BridgeTowerResidualAttention(
(attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_feature...ar(in_features=2048, out_features=512, bias=True)
)
(ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)
hidden_state = tensor([[[ 0.5531, 0.0555, -0.0248, ..., 0.2110, -0.0403, 0.0487]],
[[ 0.2963, -0.1709, 0.0074, ..., 0... [[ 0.3324, -0.0536, -0.0069, ..., 0.0911, -0.0565, -0.2751]]],
device='cuda:1', grad_fn=<ViewBackward0>)
attention_mask = None
def forward(self, hidden_state: torch.Tensor, attention_mask: torch.Tensor = None):
residual_state = hidden_state + self.attention(self.ln_1(hidden_state), attention_mask)
hidden_state = self.ln_2(residual_state)
for _, layer in self.mlp.items():
hidden_state = layer(hidden_state)
> hidden_state = residual_state + hidden_state
E RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
src/transformers/models/bridgetower/modeling_bridgetower.py:237: RuntimeError
================================================================================================== warnings summary ==================================================================================================
../usr/local/lib/python3.8/dist-packages/detectron2/data/transforms/transform.py:46
/usr/local/lib/python3.8/dist-packages/detectron2/data/transforms/transform.py:46: DeprecationWarning: LINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use BILINEAR or Resampling.BILINEAR instead.
def __init__(self, src_rect, output_size, interp=Image.LINEAR, fill=0):
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================== short test summary info ===============================================================================================
FAILED tests/models/bridgetower/test_modeling_bridgetower.py::BridgeTowerModelTest::test_model_parallelism - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
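For reference, the traceback above fails at the residual addition in BridgeTowerResidualAttention.forward: with the model split across GPUs, residual_state and hidden_state end up on cuda:0 and cuda:1. A minimal, illustrative sketch of the kind of device alignment that avoids this class of error (not a proposed patch to the model):

```python
import torch


def add_residual(residual_state: torch.Tensor, hidden_state: torch.Tensor) -> torch.Tensor:
    # If model parallelism placed the two operands on different GPUs
    # (e.g. cuda:0 vs cuda:1), move one onto the other's device before adding;
    # otherwise PyTorch raises "Expected all tensors to be on the same device".
    if hidden_state.device != residual_state.device:
        hidden_state = hidden_state.to(residual_state.device)
    return residual_state + hidden_state
```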
Thanks a lot!
* update --------- Co-authored-by: ydshieh <[email protected]>
What does this PR do?

Update BridgeTowerModelTester to use small values for the config.
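As a hypothetical sanity check (not part of the PR; it assumes the public BridgeTower config/model classes and dict-style sub-configs), the small values can be wired into a full config and the resulting model inspected to confirm it stays tiny:

```python
# Hypothetical sanity check: build a tiny BridgeTower model from small config values
# and print its parameter count. All names and values are illustrative, not the test file.
from transformers import (
    BridgeTowerConfig,
    BridgeTowerModel,
    BridgeTowerTextConfig,
    BridgeTowerVisionConfig,
)

text_config = BridgeTowerTextConfig(
    vocab_size=99, hidden_size=128, num_hidden_layers=2, num_attention_heads=4, intermediate_size=256
)
vision_config = BridgeTowerVisionConfig(hidden_size=128, num_hidden_layers=2)

config = BridgeTowerConfig(
    hidden_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    text_config=text_config.to_dict(),
    vision_config=vision_config.to_dict(),
)

model = BridgeTowerModel(config)
print(f"{model.num_parameters():,} parameters")  # should be tiny compared to the released checkpoints
```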