
How to parse CLIP to HAR? #108

Open · jayong-sv opened this issue Jul 5, 2024 · 2 comments

@jayong-sv

I noticed that CLIP is already present in the Hailo Model Zoo (link), which suggests that conversion is possible.

I need help converting a model I trained myself. How can I parse CLIP to HAR?
After converting a ResNet-based CLIP model to ONNX, I encountered the following error when parsing torch.nn.functional.multi_head_attention_forward from ONNX to HAR.

/local/workspace/hailo_virtualenv/bin/python /local/shared_with_docker/pycharm_codes/hailo_model_zoo/tutorials/clip-reid_parsing.py 
Model has been exported to attention_pool2d.onnx
[info] Translation started on ONNX model mha_test
[info] Restored ONNX model mha_test (completion time: 00:00:00.13)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.14)
[info] Attempting to retry parsing on a simplified model, using onnx simplifier
[info] Simplified ONNX model for a retry attempt (completion time: 00:00:01.65)
Traceback (most recent call last):
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 176, in translate_onnx_model
    parsing_results = self._parse_onnx_model_to_hn(onnx_model, valid_net_name, start_node_names,
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 231, in _parse_onnx_model_to_hn
    return self.parse_model_to_hn(onnx_model, None, net_name, start_node_names, end_node_names,
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/sdk_backend/parser/parser.py", line 257, in parse_model_to_hn
    fuser = HailoNNFuser(converter.convert_model(), net_name, converter.end_node_names)
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/model_translator/translator.py", line 63, in convert_model
    self._create_layers()
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/model_translator/edge_nn_translator.py", line 26, in _create_layers
    self._add_direct_layers()
  File "/local/workspace/hailo_virtualenv/lib/python3.8/site-packages/hailo_sdk_client/model_translator/edge_nn_translator.py", line 131, in _add_direct_layers
    raise ParsingWithRecommendationException(
hailo_sdk_client.model_translator.exceptions.ParsingWithRecommendationException: Parsing failed. The errors found in the graph are:
 UnsupportedShuffleLayerError in op /Reshape_2: Unable to create shuffle layer at /Reshape_2
 UnsupportedShuffleLayerError in op /Reshape_3: Unable to create shuffle layer at /Reshape_3
 UnsupportedShuffleLayerError in op /Reshape_1: Unable to create shuffle layer at /Reshape_1
 UnsupportedShuffleLayerError in op /Reshape_5: Failed to determine type of layer to create in node /Reshape_5
 UnsupportedShuffleLayerError in op /Reshape_6: Failed to determine type of layer to create in node /Reshape_6
 UnsupportedShuffleLayerError in op /Reshape_4: Failed to determine type of layer to create in node /Reshape_4
 UnsupportedShuffleLayerError in op /Transpose_5: Failed to determine type of layer to create in node /Transpose_5
 UnsupportedShuffleLayerError in op /Reshape_7: Failed to determine type of layer to create in node /Reshape_7
Please try to parse the model again, using these end node names: /Add_3, /Add_2, /Constant_19, /Constant_22, /Constant_20, /Constant_15, /Add_1
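
For reference, following the parser's recommendation would look roughly like the sketch below (assuming translate_onnx_model in this DFC version accepts an end_node_names argument; cutting the graph there leaves the attention itself out of the HAR, so it mainly confirms where parsing stops):

from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch='hailo8')
hn, npz = runner.translate_onnx_model(
    'attention_pool2d.onnx', 'mha_test',
    net_input_shapes={'input': [1, 2048, 7, 7]},
    end_node_names=['/Add_1', '/Add_2', '/Add_3'],  # taken from the error message above
)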

Reproduction Code

The code to reproduce the error is as follows. AttentionPool2d is taken from the OpenAI CLIP code (link).

from hailo_sdk_client import ClientRunner
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.onnx

class AttentionPool2d(nn.Module):
    def __init__(self, spacial_dim: int, embed_dim: int, num_heads: int, output_dim: int = None):
        super().__init__()
        self.positional_embedding = nn.Parameter(torch.randn(spacial_dim ** 2 + 1, embed_dim) / embed_dim ** 0.5)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.c_proj = nn.Linear(embed_dim, output_dim or embed_dim)
        self.num_heads = num_heads

    def forward(self, x):
        x = x.flatten(start_dim=2).permute(2, 0, 1)  # NCHW -> (HW)NC
        x = torch.cat([x.mean(dim=0, keepdim=True), x], dim=0)  # (HW+1)NC
        x = x + self.positional_embedding[:, None, :].to(x.dtype)  # (HW+1)NC
        x, _ = F.multi_head_attention_forward(
            query=x[:1], key=x, value=x,
            embed_dim_to_check=x.shape[-1],
            num_heads=self.num_heads,
            q_proj_weight=self.q_proj.weight,
            k_proj_weight=self.k_proj.weight,
            v_proj_weight=self.v_proj.weight,
            in_proj_weight=None,
            in_proj_bias=torch.cat([self.q_proj.bias, self.k_proj.bias, self.v_proj.bias]),
            bias_k=None,
            bias_v=None,
            add_zero_attn=False,
            dropout_p=0,
            out_proj_weight=self.c_proj.weight,
            out_proj_bias=self.c_proj.bias,
            use_separate_proj_weight=True,
            training=self.training,
            need_weights=False
        )
        return x.squeeze(0)

# Model definition
spacial_dim = 7  # Adjusted to match the input data size of 7x7
embed_dim = 2048
num_heads = 8
output_dim = 1024
model = AttentionPool2d(spacial_dim, embed_dim, num_heads, output_dim)

# Generate dummy input (batch_size=1, channels=2048, height=7, width=7)
dummy_input = torch.randn(1, embed_dim, 7, 7)

# Export the model to ONNX format
onnx_path = "attention_pool2d.onnx"
model_name = "attention_pool2d"
torch.onnx.export(model, dummy_input, onnx_path,
                  export_params=True, opset_version=17,
                  do_constant_folding=True, input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})

print(f"Model has been exported to {onnx_path}")

runner = ClientRunner(hw_arch='hailo8')
hn, npz = runner.translate_onnx_model(onnx_path, 'mha_test', net_input_shapes={'input': list(dummy_input.shape)})

Execution Environment

  • HailoRT v4.16.0
  • Hailo Dataflow Compiler v3.26.0

@omerwer commented Aug 18, 2024

Hi @jayong-sv,
In the version you are working with, the CLIP model is not supported. As of the current release (DFC 3.28.0), only the clip_resnet image encoder is supported; more CLIP models, including the text encoder, will be supported in the future.
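
For the supported image encoder, the usual flow is through the Model Zoo CLI, along these lines (the network name below is an assumption; please check the Model Zoo docs for the exact name in your release):

hailomz parse clip_resnet_50x4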

Regards,

@jayong-sv

@omerwer
Hello, the error message I posted occurred when I converted the clip_resnet image encoder itself; F.multi_head_attention_forward is included in clip_resnet.
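
For reference, the same pooled attention can also be written with explicit per-head matmuls instead of F.multi_head_attention_forward, which tends to export to a simpler ONNX graph. A minimal, untested sketch (mathematically equivalent to the original module for dropout_p=0; whether its reshape/transpose pattern parses in a given DFC version would still need to be checked):

import math
import torch
import torch.nn as nn

class AttentionPool2dExplicit(nn.Module):
    # Sketch: same computation as CLIP's AttentionPool2d, but written with
    # explicit per-head matmuls instead of F.multi_head_attention_forward.
    def __init__(self, spacial_dim: int, embed_dim: int, num_heads: int, output_dim: int = None):
        super().__init__()
        self.positional_embedding = nn.Parameter(
            torch.randn(spacial_dim ** 2 + 1, embed_dim) / embed_dim ** 0.5)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.c_proj = nn.Linear(embed_dim, output_dim or embed_dim)
        self.num_heads = num_heads

    def forward(self, x):
        n, c, _, _ = x.shape
        x = x.flatten(start_dim=2).permute(0, 2, 1)              # NCHW -> N(HW)C, batch first
        x = torch.cat([x.mean(dim=1, keepdim=True), x], dim=1)   # prepend mean token: N(HW+1)C
        x = x + self.positional_embedding[None, :, :].to(x.dtype)
        d = c // self.num_heads
        # Project, then split into heads: N x heads x tokens x d
        q = self.q_proj(x[:, :1]).reshape(n, 1, self.num_heads, d).transpose(1, 2)
        k = self.k_proj(x).reshape(n, -1, self.num_heads, d).transpose(1, 2)
        v = self.v_proj(x).reshape(n, -1, self.num_heads, d).transpose(1, 2)
        # Scaled dot-product attention with the mean token as the only query
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d), dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(n, 1, c)        # merge heads
        return self.c_proj(out).squeeze(1)                       # N x output_dim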
