
Refactoring visual prompting #3789

Merged

Conversation

sungchul2
Contributor

@sungchul2 sungchul2 commented Aug 5, 2024

Summary

This PR includes:

  • Refine directory structure and merge fine-tuning and zero-shot into a single file

    src/otx/algo/visual_prompting/
    ├── backbones
    │   ├── __init__.py
    │   ├── tiny_vit.py
    │   └── vit.py
    ├── decoders
    │   ├── __init__.py
    │   └── sam_mask_decoder.py
    ├── encoders
    │   ├── __init__.py
    │   ├── sam_image_encoder.py
    │   └── sam_prompt_encoder.py
    ├── __init__.py
    ├── losses **
    │   ├── __init__.py
    │   └── sam_loss.py
    ├── sam.py **
    ├── utils
    │   ├── __init__.py
    │   ├── layer_norm_2d.py
    │   ├── mlp_block.py
    │   └── postprocess.py
    └── visual_prompters **
        ├── __init__.py
        └── segment_anything.py **
  • Rename OTXSegmentAnything to SAM and move common modules to OTXVisualPromptingModel

    Before:
        class OTXSegmentAnything(OTXVisualPromptingModel): ...
        class OTXZeroShotSegmentAnything(OTXZeroShotVisualPromptingModel): ...
    After:
        class SAM(OTXVisualPromptingModel): ...
        class ZeroShotSAM(OTXZeroShotVisualPromptingModel): ...
  • Move functions related to OTX functionalities to OTXModel

    Before:
    class SegmentAnything(nn.Module):
        ...
        def __init__(...) -> None:
        def forward(self) -> Any: ...
        def forward_train(self) -> Tensor | tuple[list[Tensor], list[Tensor]]: ...
    
        # -> SAM
        def freeze_networks(self) -> None: ...
        def load_checkpoint(self) -> None: ...
        def forward_inference(self) -> tuple[Tensor, ...]: ...
        def _embed_points(self) -> Tensor: ...
        def _embed_masks(self) -> Tensor: ...
        def calculate_stability_score(self) -> Tensor: ...
        def select_masks(self) -> tuple[Tensor, Tensor]: ...
    
        # -> src/otx/algo/visual_prompting/losses/sam_loss.py
        def calculate_dice_loss(self) -> Tensor: ...
        def calculate_sigmoid_ce_focal_loss(self) -> Tensor: ...
        def calculate_iou(self) -> Tensor: ...
    
        # -> src/otx/algo/visual_prompting/utils/postprocess.py
        def postprocess_masks(self) -> Tensor: ...
        def get_prepadded_size(self) -> Tensor: ...
    After:

    # src/otx/algo/visual_prompting/segment_anything.py
    class SegmentAnything(nn.Module):
        ...
        def __init__(...) -> None:
        def forward(self) -> Any: ...
    
    class SAM(OTXVisualPromptingModel):
        ...
        def __init__(self) -> None: ...
        def freeze_networks(self) -> None: ...
        def load_checkpoint(self) -> None: ...
        def forward_for_tracing(self) -> tuple[Tensor, ...]: ...
        def _embed_points(self) -> Tensor: ...
        def _embed_masks(self) -> Tensor: ...
        def calculate_stability_score(self) -> Tensor: ...
        def select_masks(self) -> tuple[Tensor, Tensor]: ...
    
    # src/otx/algo/visual_prompting/losses/sam_loss.py
    class SAMCriterion(nn.Module):
        ...
        def __init__(self) -> None: ...
        def calculate_dice_loss(self) -> Tensor: ...
        def calculate_sigmoid_ce_focal_loss(self) -> Tensor: ...
        def calculate_iou(self) -> Tensor: ...
    
    # src/otx/algo/visual_prompting/utils/postprocess.py
    def postprocess_masks(...) -> Tensor:
    def get_prepadded_size(self) -> Tensor: ...
  • Enable the NN model to receive its loss module

    class SAM(OTXVisualPromptingModel):
        ...
        def _build_model(self) -> nn.Module:
            image_encoder = SAMImageEncoder(...)
            prompt_encoder = SAMPromptEncoder(...)
            mask_decoder = SAMMaskDecoder(...)
            criterion = SAMCriterion(image_size=self.image_size)
            return SegmentAnything(
                image_encoder=image_encoder,
                prompt_encoder=prompt_encoder,
                mask_decoder=mask_decoder,
                criterion=criterion,
                ...
            )
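The loss-module injection above boils down to passing the criterion into the network at build time, so training code never constructs the loss itself. A framework-free sketch (class bodies and numbers are illustrative stand-ins, not the actual OTX code):

```python
# Minimal sketch of the pattern described above: the network object owns
# its criterion, injected by _build_model(). Names loosely mirror the PR.
class SAMCriterion:
    def __call__(self, pred, target):
        # toy loss: sum of squared errors
        return sum((p - t) ** 2 for p, t in zip(pred, target))

class SegmentAnything:
    def __init__(self, criterion):
        self.criterion = criterion  # injected, not hard-coded

    def forward(self, pred, target):
        # in the real model `pred` comes from the mask decoder; here it is given
        return self.criterion(pred, target)

model = SegmentAnything(criterion=SAMCriterion())
print(model.forward([1.0, 2.0], [1.0, 0.0]))  # 4.0
```

Because the criterion is a constructor argument, swapping the loss requires no change to the network class itself.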

How to test

Checklist

  • I have added unit tests to cover my changes.
  • I have added integration tests to cover my changes.
  • I have run e2e tests and there are no issues.
  • I have added the description of my changes into CHANGELOG in my target branch (e.g., CHANGELOG in develop).
  • I have updated the documentation in my target branch accordingly (e.g., documentation in develop).
  • I have linked related issues.

License

  • I submit my code changes under the same Apache License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below).
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

@sungchul2 sungchul2 force-pushed the refactoring-visual-prompting branch from 4b58132 to 9f35628 Compare August 5, 2024 08:27
@sungchul2 sungchul2 force-pushed the refactoring-visual-prompting branch from 5ffc596 to c2f5e64 Compare August 6, 2024 02:24
Contributor

@eunwoosh eunwoosh left a comment


Thanks for your work :) I think the code has become much clearer. I left minor comments. Please take a look.

Review comments on src/otx/core/model/visual_prompting.py and src/otx/algo/visual_prompting/segment_anything.py were marked resolved.
@sungchul2
Contributor Author

sungchul2 commented Aug 6, 2024

@kprokofi I discussed our current design with @eunwoosh and @harimkang, and they suggested using only one main model (YOLOX) instead of separate classes for specific variants (YOLOX-TINY, YOLOX-S, ...), because only a few parameters differ within the same model family and we can simply change them.
For example, in the figure below, only deepen_factor and widen_factor in the backbone, in_channels and out_channels in the neck, and in_channels and feat_channels in the bbox_head are scalable.
[screenshot: YOLOX config highlighting the scalable parameters]

We can change those values using a parameter container like the one below:
[screenshot: parameter container example]
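To make the parameter-container idea concrete, here is a hypothetical sketch. The variant names are real YOLOX sizes, but the structure and the derived channel count are illustrative, not the actual OTX configs:

```python
# One main model + a parameter container: per-variant scalable parameters
# live in a dict, and a single build function reads them, instead of each
# variant being its own subclass. Values are illustrative.
YOLOX_VARIANTS = {
    "yolox_tiny": {"deepen_factor": 0.33, "widen_factor": 0.375},
    "yolox_s": {"deepen_factor": 0.33, "widen_factor": 0.5},
    "yolox_l": {"deepen_factor": 1.0, "widen_factor": 1.0},
}

def build_yolox(variant: str) -> dict:
    """Return a toy model config for the requested variant."""
    params = YOLOX_VARIANTS[variant]
    base_channels = 256
    return {
        "backbone": params,
        # neck channels scale with widen_factor, as described above
        "neck": {"in_channels": int(base_channels * params["widen_factor"])},
    }

print(build_yolox("yolox_s")["neck"]["in_channels"])  # 128
```

Adding a new variant is then a one-line dict entry rather than a new class.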

@kprokofi
Contributor

kprokofi commented Aug 6, 2024

My proposal was to unify all our models and tasks. Here I see that instead of creating separate classes and using a simple model = YOLOXTiny(), we will have to create a lot of dicts with parameters and use "if" statements in the base class. API usability will also suffer.
If we have all parameters in common, can we just set them as defaults in the backbone and head? That would reduce the number of lines and keep the modularity of each model size.

So, @sungchul2, @harimkang, @eunwoosh, are you proposing a config-based approach for all models rather than separate classes for each model size?

@kprokofi
Contributor

kprokofi commented Aug 6, 2024

I personally like the way you refactored the VPT task (the overall structure) and vote to stick with this structure for the other tasks as well. @eunwoosh @sungchul2 @harimkang, what concerns do you have regarding this example?

@kprokofi
Contributor

kprokofi commented Aug 6, 2024

I have only one question regarding postprocessing and other methods related to the NN model. You put postprocessing in a separate file and added methods related to the NN model, like freeze_network(), _embed_point(), etc., to the SAM class, which inherits from OTXModel (as its main parent).
Isn't it better to keep postprocessing in the core NN model (SegmentAnything)? We could provide a postprocess() method, and I would also consider moving some of the SAM methods that relate to the base NN model into SegmentAnything.
That's why I proposed two levels for the NN models in each task: the first level is common base functionality like forward and init; the second level is the specific model implementation, like SegmentAnything with its model-specific methods (freeze_network(), _embed_point(), etc.).
The OTX model (SAM here) is a wrapper that connects SegmentAnything with OTXModel, and in my opinion it should contain only OTXModel-related functionality.

@sungchul2
Contributor Author

I personally like the way you refactored the VPT task (the overall structure) and vote to stick with this structure for the other tasks as well. @eunwoosh @sungchul2 @harimkang, what concerns do you have regarding this example?

I think both directions have pros and cons respectively.
I'd like to have a discussion to make our final decision.

@sungchul2
Contributor Author

sungchul2 commented Aug 6, 2024

I have only one question regarding postprocessing and other methods related to the NN model. You put postprocessing in a separate file and added methods related to the NN model, like freeze_network(), _embed_point(), etc., to the SAM class, which inherits from OTXModel (as its main parent). Isn't it better to keep postprocessing in the core NN model (SegmentAnything)? We could provide a postprocess() method, and I would also consider moving some of the SAM methods that relate to the base NN model into SegmentAnything. That's why I proposed two levels for the NN models in each task: the first level is common base functionality like forward and init; the second level is the specific model implementation, like SegmentAnything with its model-specific methods (freeze_network(), _embed_point(), etc.). The OTX model (SAM here) is a wrapper that connects SegmentAnything with OTXModel, and in my opinion it should contain only OTXModel-related functionality.

Yes, I'm considering moving postprocess_masks into SegmentAnything because it is one of the essential functionalities of SegmentAnything.
But I still think that postprocess cannot be a common function across NN models: classification models don't need postprocessing, while detection models need it for both loss calculation and getting predictions.
(Sorry if I missed this.) If the NN models for each task have independent functionalities beyond init and forward, it doesn't seem to matter whether a given model has postprocess or not.
But I'm not sure it is fine to have different method structures across tasks. (Of course, it's a minor issue :D)

Also, I think it is very important to distinguish the NN model's functionalities from the OTX model's.
In this example, postprocess can belong to the NN model, but freeze_network and _embed_point should belong to the OTX model, because the former is for lightweight training and the latter is for export, and lightweight training and export are OTX functionalities, not the NN model's responsibility.
I think this implementation already follows your opinion, except for postprocess.
What do you think?

@harimkang
Contributor

harimkang commented Aug 6, 2024

I personally like the way you refactored the VPT task (the overall structure) and vote to stick with this structure for the other tasks as well. @eunwoosh @sungchul2 @harimkang, what concerns do you have regarding this example?

@kprokofi I actually prefer the "before" approach, because even one model can have many versions depending on its parameter configuration. For example, look at EfficientNet.
[screenshot: table of EfficientNet versions and their parameter configs]
EfficientNet comes in 9 different versions depending on the param configs. If we split these to fit the "after" structure, we end up with 9 classes, and for classification we would need 27 classes once we consider multi-class, multi-label, and h-label. (We may need even more if we consider train_type.) For other architectures we might want to offer more types; increasing the number of classes can only take us so far.

Therefore, this hurts flexibility. In such cases it makes sense to have one class that is distinguished by an argument such as version, name, or mode.
However, for certain models the configuration of the backbone has to differ along with the neck and head. I think that could be a case for the "after" structure, so we should use both approaches in some cases. What do you think?

What Sungchul, Eunwoo, and I were discussing was what would be appropriate to apply across OTX's entire set of model classes, and I mentioned the big drawback of the "after" structure. So I want the design to stay flexible, depending on the model and the situation.
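A toy illustration of this trade-off. The coefficients shown are the well-known EfficientNet width/depth multipliers for b0–b2, but the class itself is a stand-in, not OTX code:

```python
# One class keyed by a `version` argument scales to many variants without
# a class explosion: N versions x M subtasks would otherwise need N*M classes.
EFFICIENTNET_COEFFS = {
    "b0": (1.0, 1.0),  # (width_mult, depth_mult)
    "b1": (1.0, 1.1),
    "b2": (1.1, 1.2),
}

class ToyEfficientNet:
    def __init__(self, version: str = "b0") -> None:
        self.width_mult, self.depth_mult = EFFICIENTNET_COEFFS[version]

print(ToyEfficientNet("b1").depth_mult)  # 1.1
```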

@eunwoosh
Contributor

eunwoosh commented Aug 6, 2024

My proposal was to unify all our models and tasks. Here I see that instead of creating separate classes and using a simple model = YOLOXTiny(), we will have to create a lot of dicts with parameters and use "if" statements in the base class. API usability will also suffer. If we have all parameters in common, can we just set them as defaults in the backbone and head? That would reduce the number of lines and keep the modularity of each model size.

So, @sungchul2, @harimkang, @eunwoosh, are you proposing a config-based approach for all models rather than separate classes for each model size?

It may look like dict parameters and a regression, but I think it's different. It's rather similar to what you said: providing a class with default parameters. I think the point is whether the API provides an interface to set the configuration using a dict. The current design just provides an interface to set the backbone by its name, so a user who wants the model with another backbone can do that with a single argument. In my view, the drawback of the "before" design is that backbone initialization is a little hidden, because it happens in a factory function rather than in the API. This is my thought and it may be wrong. Please correct me if it is, or tell me your opinion :)
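A minimal sketch of that single-argument interface. The registry contents and embed_dim values here are hypothetical; the real OTX backbones are built from tiny_vit.py and vit.py:

```python
# The model class exposes one `backbone` name argument, and a factory
# (here a registry dict) resolves it internally, so swapping backbones
# needs only a single string from the user.
BACKBONE_REGISTRY = {
    "tiny_vit": lambda: {"embed_dim": 192},
    "vit_b": lambda: {"embed_dim": 768},
}

class ToySAM:
    def __init__(self, backbone: str = "vit_b") -> None:
        # backbone construction is hidden inside the factory lookup
        self.image_encoder = BACKBONE_REGISTRY[backbone]()

print(ToySAM(backbone="tiny_vit").image_encoder["embed_dim"])  # 192
```

This is the hiddenness eunwoosh mentions: the user never sees how the backbone is built, only its name.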

@sungchul2
Contributor Author

But I cannot agree with @harimkang's comment below, because I think we should have a unified structure across all tasks rather than a flexible structure that varies by task.

However, for certain models the configuration of the backbone has to differ along with the neck and head. I think that could be a case for the "after" structure, so we should use both approaches in some cases. What do you think?

If there is any model that is difficult to handle with a parameter container, we have no choice but to go with @kprokofi's opinion.

@harimkang
Contributor

I agree with both directions, and it's up to us to decide.

Using a parameter container like a dict is similar to mmlab's direction: https://github.com/open-mmlab/mmpretrain/blob/17a886cb5825cd8c26df4e65f7112d404b99fe12/mmpretrain/models/backbones/vision_transformer.py#L173-L243
And separating classes that can take different parameters or modules is similar to torchvision and timm.

Given the examples Sungchul shared, it seems to me that the way mmlab (or a similar library) does it is the right way to go.
What I would also say is that if we think about the classification task (with its three subtasks), it would be very inefficient and redundant to create a class for each and every model version. That's why I'm against introducing the "after" approach to classification.

@sungchul2
Contributor Author

I have only one question regarding postprocessing and other methods related to the NN model. You put postprocessing in a separate file and added methods related to the NN model, like freeze_network(), _embed_point(), etc., to the SAM class, which inherits from OTXModel (as its main parent). Isn't it better to keep postprocessing in the core NN model (SegmentAnything)? We could provide a postprocess() method, and I would also consider moving some of the SAM methods that relate to the base NN model into SegmentAnything. That's why I proposed two levels for the NN models in each task: the first level is common base functionality like forward and init; the second level is the specific model implementation, like SegmentAnything with its model-specific methods (freeze_network(), _embed_point(), etc.). The OTX model (SAM here) is a wrapper that connects SegmentAnything with OTXModel, and in my opinion it should contain only OTXModel-related functionality.

Yes, I'm considering moving postprocess_masks into SegmentAnything because it is one of the essential functionalities of SegmentAnything. But I still think that postprocess cannot be a common function across NN models: classification models don't need postprocessing, while detection models need it for both loss calculation and getting predictions. (Sorry if I missed this.) If the NN models for each task have independent functionalities beyond init and forward, it doesn't seem to matter whether a given model has postprocess or not. But I'm not sure it is fine to have different method structures across tasks. (Of course, it's a minor issue :D)

Also, I think it is very important to distinguish the NN model's functionalities from the OTX model's. In this example, postprocess can belong to the NN model, but freeze_network and _embed_point should belong to the OTX model, because the former is for lightweight training and the latter is for export, and lightweight training and export are OTX functionalities, not the NN model's responsibility. I think this implementation already follows your opinion, except for postprocess. What do you think?

@kprokofi In this case, moving postprocess into SegmentAnything caused a circular import, so I think it's better to keep the current version for now.

@kprokofi
Contributor

kprokofi commented Aug 8, 2024

@sungchul2 Can we add postprocess as a method of the SegmentAnything model? Why did a circular import occur? get_prepadded_size can be an auxiliary function inside the postprocess method.

@sungchul2
Contributor Author

@sungchul2 Can we add postprocess as a method of the SegmentAnything model? Why did a circular import occur? get_prepadded_size can be an auxiliary function inside the postprocess method.

Both SegmentAnything and SAMCriterion need postprocess, but SegmentAnything also needs SAMCriterion.
If postprocess moved into SegmentAnything, SAMCriterion would have to import SegmentAnything and vice versa.
That causes a circular import.
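The import cycle and the fix can be sketched as follows; the toy postprocess_masks below only crops padding, standing in for the real resize logic:

```python
# Why the free function avoids the cycle described above.
# If postprocess_masks were a method of SegmentAnything:
#   segment_anything.py  -> imports SAMCriterion (it builds its criterion)
#   sam_loss.py          -> would import SegmentAnything for postprocessing
# Keeping it as a free function in utils/postprocess.py means both modules
# depend only on utils, and the import graph stays acyclic:
#
#   segment_anything.py ─┐
#                        ├─> utils/postprocess.py
#   sam_loss.py ─────────┘

def postprocess_masks(masks, input_size, orig_size):
    """Toy stand-in: crop away padding; the real code also interpolates
    the cropped mask up to orig_size."""
    h, w = input_size
    return [row[:w] for row in masks[:h]]

masks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
print(postprocess_masks(masks, (2, 2), (4, 4)))  # [[1, 1], [1, 1]]
```

Since the function takes everything it needs as arguments, neither the model nor the criterion module has to import the other.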

harimkang
harimkang previously approved these changes Aug 16, 2024
@github-actions github-actions bot added the DEPENDENCY and BUILD labels Aug 16, 2024

codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 82.45192% with 73 lines in your changes missing coverage. Please review.

Project coverage is 80.48%. Comparing base (7b86e2d) to head (0727c8b).
Report is 3 commits behind head on develop.

Files Patch % Lines
src/otx/core/model/visual_prompting.py 26.47% 50 Missing ⚠️
src/otx/algo/visual_prompting/sam.py 92.74% 14 Missing ⚠️
...ual_prompting/visual_prompters/segment_anything.py 89.77% 9 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3789      +/-   ##
===========================================
- Coverage    80.63%   80.48%   -0.15%     
===========================================
  Files          272      274       +2     
  Lines        27507    27541      +34     
===========================================
- Hits         22180    22167      -13     
- Misses        5327     5374      +47     
Flag Coverage Δ
py310 80.46% <82.45%> (+0.04%) ⬆️
py311 80.48% <82.45%> (-0.15%) ⬇️


@sungchul2 sungchul2 added this pull request to the merge queue Aug 16, 2024
Merged via the queue into openvinotoolkit:develop with commit d77423e Aug 16, 2024
19 of 20 checks passed
@sungchul2 sungchul2 deleted the refactoring-visual-prompting branch August 16, 2024 09:21
Labels
BUILD, DEPENDENCY, TEST
4 participants