I am trying to train this model with both image and text conditioning.
However, the two encoders produce embeddings of different widths.
In CraftsMan:
CraftsMan/craftsman/models/conditional_encoders/clip_encoder.py
Line 138 in 2f9ff14
vision_outputs is used without the projection, so its embedding dimension is 1024 (visual_embeds shape: torch.Size([32, 4, 257, 1024])).
In Transformers:
https://github.com/huggingface/transformers/blob/be9cf070ee2cb6a9f0d162e5be32d9d68b9df3af/src/transformers/models/clip/modeling_clip.py#L1503
image_embeds has the projection applied, so its embedding dimension is 768.
CraftsMan/craftsman/models/conditional_encoders/clip_encoder.py
Line 163 in 2f9ff14
The text_features, however, do go through the projection, so their embedding dimension is 768 (text_embeds shape: torch.Size([32, 77, 768])).
As a result, at
CraftsMan/craftsman/models/conditional_encoders/base.py
Line 97 in 2f9ff14
torch.cat fails with a shape error for these two parameters:
visual_embeds shape: torch.Size([32, 4, 257, 1024])
text_embeds shape: torch.Size([32, 77, 768])
So I cannot use the pretrained weights to fine-tune text-to-3D. I can only fine-tune image-to-3D.
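To make the mismatch concrete, here is a minimal sketch that reproduces the failure with random tensors of the reported shapes (batch size reduced from 32 to 2 to keep it light). The workaround at the end is only an assumption for illustration: `visual_proj` is a hypothetical, untrained linear layer, and merging the view and token axes before concatenating along the token dimension is a guess, not what the CraftsMan code is confirmed to do.

```python
import torch
import torch.nn as nn

# Shapes from the report, with a smaller batch size.
batch, n_views, n_img_tokens, vis_dim = 2, 4, 257, 1024
n_txt_tokens, txt_dim = 77, 768

visual_embeds = torch.randn(batch, n_views, n_img_tokens, vis_dim)
text_embeds = torch.randn(batch, n_txt_tokens, txt_dim)

# Direct concatenation fails: the tensors differ in rank (4 vs 3)
# and in feature width (1024 vs 768).
try:
    torch.cat([visual_embeds, text_embeds], dim=1)
    cat_failed = False
except RuntimeError as err:
    cat_failed = True
    print("torch.cat failed:", err)

# Hypothetical workaround (assumption, not taken from the repository):
# merge the view and token axes, then project 1024 -> 768.
visual_proj = nn.Linear(vis_dim, txt_dim)    # hypothetical layer, untrained
visual_tokens = visual_embeds.flatten(1, 2)  # [batch, 4 * 257, 1024]
visual_tokens = visual_proj(visual_tokens)   # [batch, 1028, 768]

# Both tensors are now [batch, tokens, 768] and can be joined along tokens.
fused = torch.cat([visual_tokens, text_embeds], dim=1)
print(fused.shape)  # torch.Size([2, 1105, 768])
```

A freshly initialized projection like this would have to be trained, so it may not be a drop-in fix when loading the pretrained weights. Whether the checkpoint expects 1024-wide or 768-wide visual tokens is something I could not confirm.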