[Model] Added support for interactive video object tracking by SAM2 (#…
CVHub520 committed Sep 3, 2024
1 parent 70345c6 commit 0437e39
Showing 16 changed files with 689 additions and 31 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -33,18 +33,20 @@

## 🥳 What's New

- Aug. 2024:
- 🤗 Release the latest version [2.4.1](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.4.1) 🤗
- 🔥🔥🔥 Support [tracking-by-det/obb/seg/pose](./examples/multiple_object_tracking/README.md) tasks.
- ✨✨✨ Support [Segment-Anything-2](https://github.com/facebookresearch/segment-anything-2) model! (Recommended)
- 👏👏👏 Support [Grounding-SAM2](./docs/en/model_zoo.md) model.
- Support lightweight model for Japanese recognition.
- Sep. 2024:
- 🔥🔥🔥 Added support for interactive video object tracking based on [Segment-Anything-2](https://github.com/CVHub520/segment-anything-2). [[Tutorial](examples/interactive_video_object_segmentation/README.md)]

<br>

<details>
<summary>Click to view more news.</summary>

- Aug. 2024:
- Release version [2.4.1](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.4.1)
- Support [tracking-by-det/obb/seg/pose](./examples/multiple_object_tracking/README.md) tasks.
- Support [Segment-Anything-2](https://github.com/facebookresearch/segment-anything-2) model! (Recommended)
- Support [Grounding-SAM2](./docs/en/model_zoo.md) model.
- Support lightweight model for Japanese recognition.
- Jul. 2024:
- Add PPOCR-Recognition and KIE import/export functionality for training PP-OCR task.
- Add ODVG import/export functionality for training grounding task.
14 changes: 8 additions & 6 deletions README_zh-CN.md
@@ -32,18 +32,20 @@

## 🥳 新功能

- 2024年8月:
- 🤗 发布[X-AnyLabeling v2.4.1](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.4.1)最新版本 🤗
- 🔥🔥🔥 支持[tracking-by-det/obb/seg/pose](./examples/multiple_object_tracking/README.md)任务。
- ✨✨✨ 支持[Segment-Anything-2](https://github.com/facebookresearch/segment-anything-2)模型。
- 👏👏👏 支持[Grounding-SAM2](./docs/zh_cn/model_zoo.md)模型。
- 支持[日文字符识别](./anylabeling/configs/auto_labeling/japan_ppocr.yaml)模型。
- 2024年9月:
  - 🔥🔥🔥 支持基于[Segment-Anything-2](https://github.com/CVHub520/segment-anything-2)交互式视频目标追踪功能。【[教程](examples/interactive_video_object_segmentation/README.md)】

<br>

<details>
<summary>点击查看历史更新。</summary>

- 2024年8月:
- 发布[X-AnyLabeling v2.4.1](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.4.1)版本。
- 支持[tracking-by-det/obb/seg/pose](./examples/multiple_object_tracking/README.md)任务。
- 支持[Segment-Anything-2](https://github.com/facebookresearch/segment-anything-2)模型。
- 支持[Grounding-SAM2](./docs/zh_cn/model_zoo.md)模型。
- 支持[日文字符识别](./anylabeling/configs/auto_labeling/japan_ppocr.yaml)模型。
- 2024年7月:
- 新增 PPOCR 识别和关键信息提取标签导入/导出功能。
- 新增 ODVG 标签导入/导出功能,以支持 Grounding 模型训练。
8 changes: 8 additions & 0 deletions anylabeling/configs/auto_labeling/models.yaml
@@ -1,5 +1,7 @@
- model_name: "sam2_hiera_base-r20240801"
config_file: ":/sam2_hiera_base.yaml"
- model_name: "sam2_hiera_large_video-r20240901"
config_file: ":/sam2_hiera_large_video.yaml"
- model_name: "yolov5s-r20230520"
config_file: ":/yolov5s.yaml"
- model_name: "yolov5_car_plate-r20230112"
@@ -120,6 +122,12 @@
config_file: ":/sam2_hiera_small.yaml"
- model_name: "sam2_hiera_tiny-r20240801"
config_file: ":/sam2_hiera_tiny.yaml"
- model_name: "sam2_hiera_base_video-r20240901"
config_file: ":/sam2_hiera_base_video.yaml"
- model_name: "sam2_hiera_small_video-r20240901"
config_file: ":/sam2_hiera_small_video.yaml"
- model_name: "sam2_hiera_tiny_video-r20240901"
config_file: ":/sam2_hiera_tiny_video.yaml"
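The new `*_video` entries extend the registry in `models.yaml`, where each item maps a versioned `model_name` to a Qt-resource config path (the `:/` prefix). A minimal sketch of looking up an entry in such a registry — `find_model_config` is a hypothetical helper, not the project's actual loader:

```python
# Sketch: look up a model entry by name in a models.yaml-style registry.
# The data mirrors the YAML above; the helper itself is hypothetical.

def find_model_config(registry, model_name):
    """Return the config_file for model_name, or None if absent."""
    for entry in registry:
        if entry.get("model_name") == model_name:
            return entry.get("config_file")
    return None

registry = [
    {"model_name": "sam2_hiera_base-r20240801",
     "config_file": ":/sam2_hiera_base.yaml"},
    {"model_name": "sam2_hiera_large_video-r20240901",
     "config_file": ":/sam2_hiera_large_video.yaml"},
]

print(find_model_config(registry, "sam2_hiera_large_video-r20240901"))
# → :/sam2_hiera_large_video.yaml
```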
- model_name: "sam-hq_vit_b-r20231111"
config_file: ":/sam_hq_vit_b.yaml"
- model_name: "sam-hq_vit_h_quant-r20231111"
5 changes: 5 additions & 0 deletions anylabeling/configs/auto_labeling/sam2_hiera_base_video.yaml
@@ -0,0 +1,5 @@
type: segment_anything_2_video
name: sam2_hiera_base_video-r20240901
display_name: Segment Anything 2 Video (Base)
model_cfg: sam2_hiera_b+.yaml
model_path: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt
5 changes: 5 additions & 0 deletions anylabeling/configs/auto_labeling/sam2_hiera_large_video.yaml
@@ -0,0 +1,5 @@
type: segment_anything_2_video
name: sam2_hiera_large_video-r20240901
display_name: Segment Anything 2 Video (Large)
model_cfg: sam2_hiera_l.yaml
model_path: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
5 changes: 5 additions & 0 deletions anylabeling/configs/auto_labeling/sam2_hiera_small_video.yaml
@@ -0,0 +1,5 @@
type: segment_anything_2_video
name: sam2_hiera_small_video-r20240901
display_name: Segment Anything 2 Video (Small)
model_cfg: sam2_hiera_s.yaml
model_path: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt
5 changes: 5 additions & 0 deletions anylabeling/configs/auto_labeling/sam2_hiera_tiny_video.yaml
@@ -0,0 +1,5 @@
type: segment_anything_2_video
name: sam2_hiera_tiny_video-r20240901
display_name: Segment Anything 2 Video (Tiny)
model_cfg: sam2_hiera_t.yaml
model_path: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
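All four new video configs share the same five-key shape: `type`, `name`, `display_name`, `model_cfg`, `model_path`. A hedged sketch of checking a parsed config for that shape (the validator is illustrative only; the project may validate differently):

```python
# Sketch: verify a parsed per-model config carries the five keys the
# video configs above all share. Hypothetical helper, not project code.

REQUIRED_KEYS = {"type", "name", "display_name", "model_cfg", "model_path"}

def missing_keys(config):
    """Return the set of required keys absent from config."""
    return REQUIRED_KEYS - set(config)

config = {
    "type": "segment_anything_2_video",
    "name": "sam2_hiera_tiny_video-r20240901",
    "display_name": "Segment Anything 2 Video (Tiny)",
    "model_cfg": "sam2_hiera_t.yaml",
    "model_path": "https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt",
}

assert missing_keys(config) == set()  # all five keys present
```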
60 changes: 52 additions & 8 deletions anylabeling/services/auto_labeling/model_manager.py
@@ -21,6 +21,7 @@ class ModelManager(QObject):
CUSTOM_MODELS = [
"segment_anything",
"segment_anything_2",
        "segment_anything_2_video",
"sam_med2d",
"sam_hq",
"yolov5",
@@ -967,6 +968,29 @@ def _load_model(self, model_id):
return
# Request next files for prediction
self.request_next_files_requested.emit()
elif model_config["type"] == "segment_anything_2_video":
try:
from .segment_anything_2_video import SegmentAnything2Video
model_config["model"] = SegmentAnything2Video(
model_config, on_message=self.new_model_status.emit
)
self.auto_segmentation_model_selected.emit()
except Exception as e: # noqa
print(
"Error in loading model: {error_message}".format(
error_message=str(e)
)
)
self.new_model_status.emit(
self.tr(
"Error in loading model: {error_message}".format(
error_message=str(e)
)
)
)
return
# Request next files for prediction
self.request_next_files_requested.emit()
elif model_config["type"] == "efficientvit_sam":
from .efficientvit_sam import EfficientViT_SAM

@@ -1472,6 +1496,7 @@ def set_auto_labeling_marks(self, marks):
marks_model_list = [
"segment_anything",
"segment_anything_2",
"segment_anything_2_video",
"sam_med2d",
"sam_hq",
"yolov5_sam",
@@ -1498,6 +1523,7 @@ def set_auto_labeling_reset_tracker(self):
"yolov8_obb_track",
"yolov8_seg_track",
"yolov8_pose_track",
"segment_anything_2_video",
]
if (
self.loaded_model_config is None
@@ -1606,13 +1632,23 @@ def set_auto_labeling_preserve_existing_annotations_state(self, state):
"model"
].set_auto_labeling_preserve_existing_annotations_state(state)

def set_auto_labeling_prompt(self):
model_list = ['segment_anything_2_video']
if (
self.loaded_model_config is not None
and self.loaded_model_config["type"] in model_list
):
self.loaded_model_config[
"model"
].set_auto_labeling_prompt()

def unload_model(self):
"""Unload model"""
if self.loaded_model_config is not None:
self.loaded_model_config["model"].unload()
self.loaded_model_config = None

def predict_shapes(self, image, filename=None, text_prompt=None):
def predict_shapes(self, image, filename=None, text_prompt=None, run_tracker=False):
"""Predict shapes.
NOTE: This function is blocking. The model can take a long time to
predict. So it is recommended to use predict_shapes_threading instead.
@@ -1624,14 +1660,18 @@ def predict_shapes(self, image, filename=None, text_prompt=None):
self.prediction_finished.emit()
return
try:
if text_prompt is None:
if text_prompt is not None:
auto_labeling_result = self.loaded_model_config[
"model"
].predict_shapes(image, filename)
].predict_shapes(image, filename, text_prompt=text_prompt)
elif run_tracker is True:
auto_labeling_result = self.loaded_model_config[
"model"
].predict_shapes(image, filename, run_tracker=run_tracker)
else:
auto_labeling_result = self.loaded_model_config[
"model"
].predict_shapes(image, filename, text_prompt)
].predict_shapes(image, filename)
self.new_auto_labeling_result.emit(auto_labeling_result)
self.new_model_status.emit(
self.tr("Finished inferencing AI model. Check the result.")
@@ -1646,7 +1686,7 @@ def predict_shapes(self, image, filename=None, text_prompt=None):
self.prediction_finished.emit()

@pyqtSlot()
def predict_shapes_threading(self, image, filename=None, text_prompt=None):
def predict_shapes_threading(self, image, filename=None, text_prompt=None, run_tracker=False):
"""Predict shapes.
This function starts a thread to run the prediction.
"""
@@ -1675,13 +1715,17 @@ def predict_shapes_threading(self, image, filename=None, text_prompt=None):
return

self.model_execution_thread = QThread()
if text_prompt is None:
if text_prompt is not None:
self.model_execution_worker = GenericWorker(
self.predict_shapes, image, filename
self.predict_shapes, image, filename, text_prompt=text_prompt
)
elif run_tracker is True:
self.model_execution_worker = GenericWorker(
self.predict_shapes, image, filename, run_tracker=run_tracker
)
else:
self.model_execution_worker = GenericWorker(
self.predict_shapes, image, filename, text_prompt
self.predict_shapes, image, filename
)
self.model_execution_worker.finished.connect(
self.model_execution_thread.quit
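The diff above gives `predict_shapes` and `predict_shapes_threading` a three-way argument dispatch: a text prompt takes precedence, then tracker mode, else a plain call. A standalone sketch of that selection logic — `build_call` is a hypothetical stand-in for how the `GenericWorker` arguments are chosen:

```python
# Sketch of the argument-selection order the threading code now follows:
# text_prompt first, then run_tracker, else a plain predict call.
# build_call is illustrative, not the project's API.

def build_call(image, filename=None, text_prompt=None, run_tracker=False):
    """Return (args, kwargs) for the underlying predict_shapes call."""
    if text_prompt is not None:
        return (image, filename), {"text_prompt": text_prompt}
    if run_tracker:
        return (image, filename), {"run_tracker": True}
    return (image, filename), {}

print(build_call("frame_0001.jpg", "frame_0001.jpg", run_tracker=True))
# → (('frame_0001.jpg', 'frame_0001.jpg'), {'run_tracker': True})
```

Keeping the branches mutually exclusive means a video-tracking request can never accidentally carry a stale text prompt into the model call.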