
Refactoring visual prompting #3789

Merged

Conversation

sungchul2
Contributor

@sungchul2 sungchul2 commented Aug 5, 2024

Summary

This PR includes:

  • Refine directory structure and merge fine-tuning and zero-shot into a single file

    src/otx/algo/visual_prompting/
    ├── backbones
    │   ├── __init__.py
    │   ├── tiny_vit.py
    │   └── vit.py
    ├── decoders
    │   ├── __init__.py
    │   └── sam_mask_decoder.py
    ├── encoders
    │   ├── __init__.py
    │   ├── sam_image_encoder.py
    │   └── sam_prompt_encoder.py
    ├── __init__.py
    ├── losses **
    │   ├── __init__.py
    │   └── sam_loss.py
    ├── sam.py **
    ├── utils
    │   ├── __init__.py
    │   ├── layer_norm_2d.py
    │   ├── mlp_block.py
    │   └── postprocess.py
    └── visual_prompters **
        ├── __init__.py
        └── segment_anything.py **
  • Rename OTXSegmentAnything to SAM and move common modules to OTXVisualPromptingModel

    Before:
        class OTXSegmentAnything(OTXVisualPromptingModel): ...
        class OTXZeroShotSegmentAnything(OTXZeroShotVisualPromptingModel): ...
    After:
        class SAM(OTXVisualPromptingModel): ...
        class ZeroShotSAM(OTXZeroShotVisualPromptingModel): ...
  • Move functions related to OTX functionalities to OTXModel

    Before:
    class SegmentAnything(nn.Module):
        ...
        def __init__(...) -> None:
        def forward(self) -> Any: ...
        def forward_train(self) -> Tensor | tuple[list[Tensor], list[Tensor]]: ...
    
        # -> SAM
        def freeze_networks(self) -> None: ...
        def load_checkpoint(self) -> None: ...
        def forward_inference(self) -> tuple[Tensor, ...]: ...
        def _embed_points(self) -> Tensor: ...
        def _embed_masks(self) -> Tensor: ...
        def calculate_stability_score(self) -> Tensor: ...
        def select_masks(self) -> tuple[Tensor, Tensor]: ...
    
        # -> src/otx/algo/visual_prompting/losses/sam_loss.py
        def calculate_dice_loss(self) -> Tensor: ...
        def calculate_sigmoid_ce_focal_loss(self) -> Tensor: ...
        def calculate_iou(self) -> Tensor: ...
    
        # -> src/otx/algo/visual_prompting/utils/postprocess.py
        def postprocess_masks(self) -> Tensor: ...
        def get_prepadded_size(self) -> Tensor: ...
    After:

    # src/otx/algo/visual_prompting/segment_anything.py
    class SegmentAnything(nn.Module):
        ...
        def __init__(...) -> None:
        def forward(self) -> Any: ...
    
    class SAM(OTXVisualPromptingModel):
        ...
        def __init__(self) -> None: ...
        def freeze_networks(self) -> None: ...
        def load_checkpoint(self) -> None: ...
        def forward_for_tracing(self) -> tuple[Tensor, ...]: ...
        def _embed_points(self) -> Tensor: ...
        def _embed_masks(self) -> Tensor: ...
        def calculate_stability_score(self) -> Tensor: ...
        def select_masks(self) -> tuple[Tensor, Tensor]: ...
    
    # src/otx/algo/visual_prompting/losses/sam_loss.py
    class SAMCriterion(nn.Module):
        ...
        def __init__(self) -> None: ...
        def calculate_dice_loss(self) -> Tensor: ...
        def calculate_sigmoid_ce_focal_loss(self) -> Tensor: ...
        def calculate_iou(self) -> Tensor: ...
    
    # src/otx/algo/visual_prompting/utils/postprocess.py
    def postprocess_masks(...) -> Tensor:
    def get_prepadded_size(self) -> Tensor: ...
  • Enable the NN model to receive its loss module

    class SAM(OTXVisualPromptingModel):
        ...
        def _build_model(self) -> nn.Module:
            image_encoder = SAMImageEncoder(...)
            prompt_encoder = SAMPromptEncoder(...)
            mask_decoder = SAMMaskDecoder(...)
            criterion = SAMCriterion(image_size=self.image_size)
            return SegmentAnything(
                image_encoder=image_encoder,
                prompt_encoder=prompt_encoder,
                mask_decoder=mask_decoder,
                criterion=criterion,
                ...
            )
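The loss-module injection above boils down to passing the criterion into the network at build time, so training code never constructs the loss itself. A framework-free sketch (class bodies and numbers are illustrative stand-ins, not the actual OTX code):

```python
# Minimal sketch of the pattern described above: the network object owns
# its criterion, injected by _build_model(). Names loosely mirror the PR.
class SAMCriterion:
    def __call__(self, pred, target):
        # toy loss: sum of squared errors
        return sum((p - t) ** 2 for p, t in zip(pred, target))

class SegmentAnything:
    def __init__(self, criterion):
        self.criterion = criterion  # injected, not hard-coded

    def forward(self, pred, target):
        # in the real model `pred` comes from the mask decoder; here it is given
        return self.criterion(pred, target)

model = SegmentAnything(criterion=SAMCriterion())
print(model.forward([1.0, 2.0], [1.0, 0.0]))  # 4.0
```

Because the criterion is a constructor argument, swapping the loss requires no change to the network class itself.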

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have run e2e tests and there are no issues.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@sungchul2 sungchul2 force-pushed the refactoring-visual-prompting branch from 4b58132 to 9f35628 Compare August 5, 2024 08:27
@sungchul2 sungchul2 force-pushed the refactoring-visual-prompting branch from 5ffc596 to c2f5e64 Compare August 6, 2024 02:24
Contributor

@eunwoosh eunwoosh left a comment


Thanks for your work :) I think the code has become much clearer. I left minor comments. Please take a look.

Review comments on src/otx/core/model/visual_prompting.py and src/otx/algo/visual_prompting/segment_anything.py were marked resolved.
@sungchul2
Contributor Author

sungchul2 commented Aug 6, 2024

@kprokofi I discussed our current design with @eunwoosh and @harimkang, and they suggested using only one main model (YOLOX) instead of separate classes for specific variants (YOLOX-TINY, YOLOX-S, ...), because only a few parameters differ within the same model family and we can simply change them.
For example, in the figure below, only deepen_factor and widen_factor in the backbone, in_channels and out_channels in the neck, and in_channels and feat_channels in the bbox_head are scalable.
[screenshot: YOLOX config highlighting the scalable parameters]

We can change those values using a parameter container like the one below:
[screenshot: parameter container example]
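To make the parameter-container idea concrete, here is a hypothetical sketch. The variant names are real YOLOX sizes, but the structure and the derived channel count are illustrative, not the actual OTX configs:

```python
# One main model + a parameter container: per-variant scalable parameters
# live in a dict, and a single build function reads them, instead of each
# variant being its own subclass. Values are illustrative.
YOLOX_VARIANTS = {
    "yolox_tiny": {"deepen_factor": 0.33, "widen_factor": 0.375},
    "yolox_s": {"deepen_factor": 0.33, "widen_factor": 0.5},
    "yolox_l": {"deepen_factor": 1.0, "widen_factor": 1.0},
}

def build_yolox(variant: str) -> dict:
    """Return a toy model config for the requested variant."""
    params = YOLOX_VARIANTS[variant]
    base_channels = 256
    return {
        "backbone": params,
        # neck channels scale with widen_factor, as described above
        "neck": {"in_channels": int(base_channels * params["widen_factor"])},
    }

print(build_yolox("yolox_s")["neck"]["in_channels"])  # 128
```

Adding a new variant is then a one-line dict entry rather than a new class.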

@kprokofi
Contributor

kprokofi commented Aug 6, 2024

My proposal was to unify all our models and tasks. Here I see that instead of creating separate classes and using a simple model = YOLOXTiny(), we will have to create a lot of dicts with parameters and use "if" statements in the base class. API usability will also suffer.
If we have all parameters in common, can we just set them as defaults in the backbone and head? That would reduce the number of lines and keep the modularity of each model size.

So, @sungchul2, @harimkang, @eunwoosh, are you proposing a config-based approach for all models rather than separate classes for each model size?

@kprokofi
Contributor

kprokofi commented Aug 6, 2024

I personally like the way you refactored the VPT task (the overall structure) and vote to stick with this structure for the other tasks as well. @eunwoosh @sungchul2 @harimkang, what concerns do you have regarding this example?

@kprokofi
Contributor

kprokofi commented Aug 6, 2024

I have only one question regarding postprocessing and other methods related to the NN model. You put postprocessing in a separate file and added methods related to the NN model, like freeze_network(), _embed_point(), etc., to the SAM class, which inherits from OTXModel (as its main parent).
Isn't it better to keep postprocessing in the core NN model (SegmentAnything)? We could provide a postprocess() method, and I would also consider moving some of the SAM methods that relate to the base NN model into SegmentAnything.
That's why I proposed two levels for the NN models in each task: the first level is common base functionality like forward and init; the second level is the specific model implementation, like SegmentAnything with its model-specific methods (freeze_network(), _embed_point(), etc.).
The OTX model (SAM here) is a wrapper that connects SegmentAnything with OTXModel, and in my opinion it should contain only OTXModel-related functionality.

@sungchul2
Contributor Author

I personally like the way you refactored the VPT task (the overall structure) and vote to stick with this structure for the other tasks as well. @eunwoosh @sungchul2 @harimkang, what concerns do you have regarding this example?

I think both directions have pros and cons respectively.
I'd like to have a discussion to make our final decision.

@sungchul2
Contributor Author

sungchul2 commented Aug 6, 2024

I have only one question regarding postprocessing and other methods related to the NN model. You put postprocessing in a separate file and added methods related to the NN model, like freeze_network(), _embed_point(), etc., to the SAM class, which inherits from OTXModel (as its main parent). Isn't it better to keep postprocessing in the core NN model (SegmentAnything)? We could provide a postprocess() method, and I would also consider moving some of the SAM methods that relate to the base NN model into SegmentAnything. That's why I proposed two levels for the NN models in each task: the first level is common base functionality like forward and init; the second level is the specific model implementation, like SegmentAnything with its model-specific methods (freeze_network(), _embed_point(), etc.). The OTX model (SAM here) is a wrapper that connects SegmentAnything with OTXModel, and in my opinion it should contain only OTXModel-related functionality.

Yes, I'm considering moving postprocess_masks into SegmentAnything because it is one of the essential functionalities of SegmentAnything.
But I still think that postprocess cannot be a common function across NN models: classification models don't need postprocessing, while detection models need it for both loss calculation and getting predictions.
(Sorry if I missed this.) If the NN models for each task have independent functionalities beyond init and forward, it doesn't seem to matter whether a given model has postprocess or not.
But I'm not sure it is fine to have different method structures across tasks. (Of course, it's a minor issue :D)

Also, I think it is very important to distinguish the NN model's functionalities from the OTX model's.
In this example, postprocess can belong to the NN model, but freeze_network and _embed_point should belong to the OTX model, because the former is for lightweight training and the latter is for export, and lightweight training and export are OTX functionalities, not the NN model's responsibility.
I think this implementation already follows your opinion, except for postprocess.
What do you think?

@harimkang
Contributor

harimkang commented Aug 6, 2024

I personally like the way you refactored the VPT task (the overall structure) and vote to stick with this structure for the other tasks as well. @eunwoosh @sungchul2 @harimkang, what concerns do you have regarding this example?

@kprokofi I actually prefer the "before" approach, because even one model can have many versions depending on its parameter configuration. For example, look at EfficientNet.
[screenshot: table of EfficientNet versions and their parameter configs]
EfficientNet comes in 9 different versions depending on the param configs. If we split these to fit the "after" structure, we end up with 9 classes, and for classification we would need 27 classes once we consider multi-class, multi-label, and h-label. (We may need even more if we consider train_type.) For other architectures we might want to offer more types; increasing the number of classes can only take us so far.

Therefore, this hurts flexibility. In such cases it makes sense to have one class that is distinguished by an argument such as version, name, or mode.
However, for certain models the configuration of the backbone has to differ along with the neck and head. I think that could be a case for the "after" structure, so we should use both approaches in some cases. What do you think?

What Sungchul, Eunwoo, and I were discussing was what would be appropriate to apply across OTX's entire set of model classes, and I mentioned the big drawback of the "after" structure. So I want the design to stay flexible, depending on the model and the situation.
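A toy illustration of this trade-off. The coefficients shown are the well-known EfficientNet width/depth multipliers for b0–b2, but the class itself is a stand-in, not OTX code:

```python
# One class keyed by a `version` argument scales to many variants without
# a class explosion: N versions x M subtasks would otherwise need N*M classes.
EFFICIENTNET_COEFFS = {
    "b0": (1.0, 1.0),  # (width_mult, depth_mult)
    "b1": (1.0, 1.1),
    "b2": (1.1, 1.2),
}

class ToyEfficientNet:
    def __init__(self, version: str = "b0") -> None:
        self.width_mult, self.depth_mult = EFFICIENTNET_COEFFS[version]

print(ToyEfficientNet("b1").depth_mult)  # 1.1
```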

@eunwoosh
Contributor

eunwoosh commented Aug 6, 2024

My proposal was to unify all our models and tasks. Here I see that instead of creating separate classes and using a simple model = YOLOXTiny(), we will have to create a lot of dicts with parameters and use "if" statements in the base class. API usability will also suffer. If we have all parameters in common, can we just set them as defaults in the backbone and head? That would reduce the number of lines and keep the modularity of each model size.

So, @sungchul2, @harimkang, @eunwoosh, are you proposing a config-based approach for all models rather than separate classes for each model size?

It may look like dict parameters and a regression, but I think it's different. It's rather similar to what you said: providing a class with default parameters. I think the point is whether the API provides an interface to set the configuration using a dict. The current design just provides an interface to set the backbone by its name, so a user who wants the model with another backbone can do that with a single argument. In my view, the drawback of the "before" design is that backbone initialization is a little hidden, because it happens in a factory function rather than in the API. This is my thought and it may be wrong. Please correct me if it is, or tell me your opinion :)
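A minimal sketch of that single-argument interface. The registry contents and embed_dim values here are hypothetical; the real OTX backbones are built from tiny_vit.py and vit.py:

```python
# The model class exposes one `backbone` name argument, and a factory
# (here a registry dict) resolves it internally, so swapping backbones
# needs only a single string from the user.
BACKBONE_REGISTRY = {
    "tiny_vit": lambda: {"embed_dim": 192},
    "vit_b": lambda: {"embed_dim": 768},
}

class ToySAM:
    def __init__(self, backbone: str = "vit_b") -> None:
        # backbone construction is hidden inside the factory lookup
        self.image_encoder = BACKBONE_REGISTRY[backbone]()

print(ToySAM(backbone="tiny_vit").image_encoder["embed_dim"])  # 192
```

This is the hiddenness eunwoosh mentions: the user never sees how the backbone is built, only its name.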

@sungchul2
Contributor Author

But I cannot agree with @harimkang's comment below, because I think we should have a unified structure across all tasks rather than a flexible structure that varies by task.

However, for certain models the configuration of the backbone has to differ along with the neck and head. I think that could be a case for the "after" structure, so we should use both approaches in some cases. What do you think?

If there is any model that is difficult to handle with a parameter container, we have no choice but to go with @kprokofi's opinion.

@harimkang
Contributor

I agree with both directions, and it's up to us to decide.

Using a parameter container like a dict is similar to mmlab's direction: https://github.com/open-mmlab/mmpretrain/blob/17a886cb5825cd8c26df4e65f7112d404b99fe12/mmpretrain/models/backbones/vision_transformer.py#L173-L243
And separating classes that can take different parameters or modules is similar to torchvision and timm.

Given the examples Sungchul shared, it seems to me that the way mmlab (or a similar library) does it is the right way to go.
What I would also say is that if we think about the classification task (with its three subtasks), it would be very inefficient and redundant to create a class for each and every model version. That's why I'm against introducing the "after" approach to classification.

@sungchul2
Contributor Author

I have only one question regarding postprocessing and other methods related to the NN model. You put postprocessing in a separate file and added methods related to the NN model, like freeze_network(), _embed_point(), etc., to the SAM class, which inherits from OTXModel (as its main parent). Isn't it better to keep postprocessing in the core NN model (SegmentAnything)? We could provide a postprocess() method, and I would also consider moving some of the SAM methods that relate to the base NN model into SegmentAnything. That's why I proposed two levels for the NN models in each task: the first level is common base functionality like forward and init; the second level is the specific model implementation, like SegmentAnything with its model-specific methods (freeze_network(), _embed_point(), etc.). The OTX model (SAM here) is a wrapper that connects SegmentAnything with OTXModel, and in my opinion it should contain only OTXModel-related functionality.

Yes, I'm considering moving postprocess_masks into SegmentAnything because it is one of the essential functionalities of SegmentAnything. But I still think that postprocess cannot be a common function across NN models: classification models don't need postprocessing, while detection models need it for both loss calculation and getting predictions. (Sorry if I missed this.) If the NN models for each task have independent functionalities beyond init and forward, it doesn't seem to matter whether a given model has postprocess or not. But I'm not sure it is fine to have different method structures across tasks. (Of course, it's a minor issue :D)

Also, I think it is very important to distinguish the NN model's functionalities from the OTX model's. In this example, postprocess can belong to the NN model, but freeze_network and _embed_point should belong to the OTX model, because the former is for lightweight training and the latter is for export, and lightweight training and export are OTX functionalities, not the NN model's responsibility. I think this implementation already follows your opinion, except for postprocess. What do you think?

@kprokofi In this case, moving postprocess into SegmentAnything caused a circular import, so I think it's better to keep the current version for now.

@kprokofi
Contributor

kprokofi commented Aug 8, 2024

@sungchul2 Can we add postprocess as a method of the SegmentAnything model? Why did a circular import occur? get_prepadded_size can be an auxiliary function inside the postprocess method.

@sungchul2
Contributor Author

@sungchul2 Can we add postprocess as a method of the SegmentAnything model? Why did a circular import occur? get_prepadded_size can be an auxiliary function inside the postprocess method.

Both SegmentAnything and SAMCriterion need postprocess, but SegmentAnything also needs SAMCriterion.
If postprocess moved into SegmentAnything, SAMCriterion would have to import SegmentAnything and vice versa.
That causes a circular import.
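The import cycle and the fix can be sketched as follows; the toy postprocess_masks below only crops padding, standing in for the real resize logic:

```python
# Why the free function avoids the cycle described above.
# If postprocess_masks were a method of SegmentAnything:
#   segment_anything.py  -> imports SAMCriterion (it builds its criterion)
#   sam_loss.py          -> would import SegmentAnything for postprocessing
# Keeping it as a free function in utils/postprocess.py means both modules
# depend only on utils, and the import graph stays acyclic:
#
#   segment_anything.py ─┐
#                        ├─> utils/postprocess.py
#   sam_loss.py ─────────┘

def postprocess_masks(masks, input_size, orig_size):
    """Toy stand-in: crop away padding; the real code also interpolates
    the cropped mask up to orig_size."""
    h, w = input_size
    return [row[:w] for row in masks[:h]]

masks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
print(postprocess_masks(masks, (2, 2), (4, 4)))  # [[1, 1], [1, 1]]
```

Since the function takes everything it needs as arguments, neither the model nor the criterion module has to import the other.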

harimkang
harimkang previously approved these changes Aug 16, 2024
@github-actions github-actions bot added the DEPENDENCY and BUILD labels Aug 16, 2024

codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 82.45192% with 73 lines in your changes missing coverage. Please review.

Project coverage is 80.48%. Comparing base (7b86e2d) to head (0727c8b).
Report is 3 commits behind head on develop.

Files Patch % Lines
src/otx/core/model/visual_prompting.py 26.47% 50 Missing ⚠️
src/otx/algo/visual_prompting/sam.py 92.74% 14 Missing ⚠️
...ual_prompting/visual_prompters/segment_anything.py 89.77% 9 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3789      +/-   ##
===========================================
- Coverage    80.63%   80.48%   -0.15%     
===========================================
  Files          272      274       +2     
  Lines        27507    27541      +34     
===========================================
- Hits         22180    22167      -13     
- Misses        5327     5374      +47     
Flag Coverage Δ
py310 80.46% <82.45%> (+0.04%) ⬆️
py311 80.48% <82.45%> (-0.15%) ⬇️


@sungchul2 sungchul2 added this pull request to the merge queue Aug 16, 2024
Merged via the queue into openvinotoolkit:develop with commit d77423e Aug 16, 2024
19 of 20 checks passed
@sungchul2 sungchul2 deleted the refactoring-visual-prompting branch August 16, 2024 09:21
Labels
BUILD, DEPENDENCY, TEST
4 participants