[Model] Added support for interactive video object tracking by SAM2 (#…
CVHub520 committed Sep 3, 2024
1 parent 70345c6 commit 0437e39
Showing 16 changed files with 689 additions and 31 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -33,18 +33,20 @@

## 🥳 What's New

- Aug. 2024:
- 🤗 Release the latest version [2.4.1](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.4.1) 🤗
- 🔥🔥🔥 Support [tracking-by-det/obb/seg/pose](./examples/multiple_object_tracking/README.md) tasks.
- ✨✨✨ Support [Segment-Anything-2](https://github.com/facebookresearch/segment-anything-2) model! (Recommended)
- 👏👏👏 Support [Grounding-SAM2](./docs/en/model_zoo.md) model.
- Support lightweight model for Japanese recognition.
- Sep. 2024:
- 🔥🔥🔥 Added support for interactive video object tracking based on [Segment-Anything-2](https://github.com/CVHub520/segment-anything-2). [[Tutorial](examples/interactive_video_object_segmentation/README.md)]

<br>

<details>
<summary>Click to view more news.</summary>

- Aug. 2024:
- Release version [2.4.1](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.4.1)
- Support [tracking-by-det/obb/seg/pose](./examples/multiple_object_tracking/README.md) tasks.
- Support [Segment-Anything-2](https://github.com/facebookresearch/segment-anything-2) model! (Recommended)
- Support [Grounding-SAM2](./docs/en/model_zoo.md) model.
- Support lightweight model for Japanese recognition.
- Jul. 2024:
- Add PPOCR-Recognition and KIE import/export functionality for training PP-OCR task.
- Add ODVG import/export functionality for training grounding task.
14 changes: 8 additions & 6 deletions README_zh-CN.md
@@ -32,18 +32,20 @@

## 🥳 新功能

- 2024年8月:
- 🤗 发布[X-AnyLabeling v2.4.1](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.4.1)最新版本 🤗
- 🔥🔥🔥 支持[tracking-by-det/obb/seg/pose](./examples/multiple_object_tracking/README.md)任务。
- ✨✨✨ 支持[Segment-Anything-2](https://github.com/facebookresearch/segment-anything-2)模型。
- 👏👏👏 支持[Grounding-SAM2](./docs/zh_cn/model_zoo.md)模型。
- 支持[日文字符识别](./anylabeling/configs/auto_labeling/japan_ppocr.yaml)模型。
- 2024年9月:
  - 🔥🔥🔥 支持基于[Segment-Anything-2](https://github.com/CVHub520/segment-anything-2)交互式视频目标追踪功能。【[教程](examples/interactive_video_object_segmentation/README.md)】

<br>

<details>
<summary>点击查看历史更新。</summary>

- 2024年8月:
- 发布[X-AnyLabeling v2.4.1](https://github.com/CVHub520/X-AnyLabeling/releases/tag/v2.4.1)版本。
- 支持[tracking-by-det/obb/seg/pose](./examples/multiple_object_tracking/README.md)任务。
- 支持[Segment-Anything-2](https://github.com/facebookresearch/segment-anything-2)模型。
- 支持[Grounding-SAM2](./docs/zh_cn/model_zoo.md)模型。
- 支持[日文字符识别](./anylabeling/configs/auto_labeling/japan_ppocr.yaml)模型。
- 2024年7月:
- 新增 PPOCR 识别和关键信息提取标签导入/导出功能。
- 新增 ODVG 标签导入/导出功能,以支持 Grounding 模型训练。
8 changes: 8 additions & 0 deletions anylabeling/configs/auto_labeling/models.yaml
@@ -1,5 +1,7 @@
- model_name: "sam2_hiera_base-r20240801"
config_file: ":/sam2_hiera_base.yaml"
- model_name: "sam2_hiera_large_video-r20240901"
config_file: ":/sam2_hiera_large_video.yaml"
- model_name: "yolov5s-r20230520"
config_file: ":/yolov5s.yaml"
- model_name: "yolov5_car_plate-r20230112"
@@ -120,6 +122,12 @@
config_file: ":/sam2_hiera_small.yaml"
- model_name: "sam2_hiera_tiny-r20240801"
config_file: ":/sam2_hiera_tiny.yaml"
- model_name: "sam2_hiera_base_video-r20240901"
config_file: ":/sam2_hiera_base_video.yaml"
- model_name: "sam2_hiera_small_video-r20240901"
config_file: ":/sam2_hiera_small_video.yaml"
- model_name: "sam2_hiera_tiny_video-r20240901"
config_file: ":/sam2_hiera_tiny_video.yaml"
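The new `*_video` entries extend the registry in `models.yaml`, where each item maps a versioned `model_name` to a Qt-resource config path (the `:/` prefix). A minimal sketch of looking up an entry in such a registry — `find_model_config` is a hypothetical helper, not the project's actual loader:

```python
# Sketch: look up a model entry by name in a models.yaml-style registry.
# The data mirrors the YAML above; the helper itself is hypothetical.

def find_model_config(registry, model_name):
    """Return the config_file for model_name, or None if absent."""
    for entry in registry:
        if entry.get("model_name") == model_name:
            return entry.get("config_file")
    return None

registry = [
    {"model_name": "sam2_hiera_base-r20240801",
     "config_file": ":/sam2_hiera_base.yaml"},
    {"model_name": "sam2_hiera_large_video-r20240901",
     "config_file": ":/sam2_hiera_large_video.yaml"},
]

print(find_model_config(registry, "sam2_hiera_large_video-r20240901"))
# → :/sam2_hiera_large_video.yaml
```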
- model_name: "sam-hq_vit_b-r20231111"
config_file: ":/sam_hq_vit_b.yaml"
- model_name: "sam-hq_vit_h_quant-r20231111"
5 changes: 5 additions & 0 deletions anylabeling/configs/auto_labeling/sam2_hiera_base_video.yaml
@@ -0,0 +1,5 @@
type: segment_anything_2_video
name: sam2_hiera_base_video-r20240901
display_name: Segment Anything 2 Video (Base)
model_cfg: sam2_hiera_b+.yaml
model_path: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt
5 changes: 5 additions & 0 deletions anylabeling/configs/auto_labeling/sam2_hiera_large_video.yaml
@@ -0,0 +1,5 @@
type: segment_anything_2_video
name: sam2_hiera_large_video-r20240901
display_name: Segment Anything 2 Video (Large)
model_cfg: sam2_hiera_l.yaml
model_path: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt
5 changes: 5 additions & 0 deletions anylabeling/configs/auto_labeling/sam2_hiera_small_video.yaml
@@ -0,0 +1,5 @@
type: segment_anything_2_video
name: sam2_hiera_small_video-r20240901
display_name: Segment Anything 2 Video (Small)
model_cfg: sam2_hiera_s.yaml
model_path: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt
5 changes: 5 additions & 0 deletions anylabeling/configs/auto_labeling/sam2_hiera_tiny_video.yaml
@@ -0,0 +1,5 @@
type: segment_anything_2_video
name: sam2_hiera_tiny_video-r20240901
display_name: Segment Anything 2 Video (Tiny)
model_cfg: sam2_hiera_t.yaml
model_path: https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
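All four new video configs share the same five-key shape: `type`, `name`, `display_name`, `model_cfg`, `model_path`. A hedged sketch of checking a parsed config for that shape (the validator is illustrative only; the project may validate differently):

```python
# Sketch: verify a parsed per-model config carries the five keys the
# video configs above all share. Hypothetical helper, not project code.

REQUIRED_KEYS = {"type", "name", "display_name", "model_cfg", "model_path"}

def missing_keys(config):
    """Return the set of required keys absent from config."""
    return REQUIRED_KEYS - set(config)

config = {
    "type": "segment_anything_2_video",
    "name": "sam2_hiera_tiny_video-r20240901",
    "display_name": "Segment Anything 2 Video (Tiny)",
    "model_cfg": "sam2_hiera_t.yaml",
    "model_path": "https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt",
}

assert missing_keys(config) == set()  # all five keys present
```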
60 changes: 52 additions & 8 deletions anylabeling/services/auto_labeling/model_manager.py
@@ -21,6 +21,7 @@ class ModelManager(QObject):
CUSTOM_MODELS = [
"segment_anything",
"segment_anything_2",
        "segment_anything_2_video",
"sam_med2d",
"sam_hq",
"yolov5",
@@ -967,6 +968,29 @@ def _load_model(self, model_id):
return
# Request next files for prediction
self.request_next_files_requested.emit()
elif model_config["type"] == "segment_anything_2_video":
try:
from .segment_anything_2_video import SegmentAnything2Video
model_config["model"] = SegmentAnything2Video(
model_config, on_message=self.new_model_status.emit
)
self.auto_segmentation_model_selected.emit()
except Exception as e: # noqa
print(
"Error in loading model: {error_message}".format(
error_message=str(e)
)
)
self.new_model_status.emit(
self.tr(
"Error in loading model: {error_message}".format(
error_message=str(e)
)
)
)
return
# Request next files for prediction
self.request_next_files_requested.emit()
elif model_config["type"] == "efficientvit_sam":
from .efficientvit_sam import EfficientViT_SAM

@@ -1472,6 +1496,7 @@ def set_auto_labeling_marks(self, marks):
marks_model_list = [
"segment_anything",
"segment_anything_2",
"segment_anything_2_video",
"sam_med2d",
"sam_hq",
"yolov5_sam",
@@ -1498,6 +1523,7 @@ def set_auto_labeling_reset_tracker(self):
"yolov8_obb_track",
"yolov8_seg_track",
"yolov8_pose_track",
"segment_anything_2_video",
]
if (
self.loaded_model_config is None
@@ -1606,13 +1632,23 @@ def set_auto_labeling_preserve_existing_annotations_state(self, state):
"model"
].set_auto_labeling_preserve_existing_annotations_state(state)

def set_auto_labeling_prompt(self):
model_list = ['segment_anything_2_video']
if (
self.loaded_model_config is not None
and self.loaded_model_config["type"] in model_list
):
self.loaded_model_config[
"model"
].set_auto_labeling_prompt()

def unload_model(self):
"""Unload model"""
if self.loaded_model_config is not None:
self.loaded_model_config["model"].unload()
self.loaded_model_config = None

def predict_shapes(self, image, filename=None, text_prompt=None):
def predict_shapes(self, image, filename=None, text_prompt=None, run_tracker=False):
"""Predict shapes.
NOTE: This function is blocking. The model can take a long time to
predict. So it is recommended to use predict_shapes_threading instead.
@@ -1624,14 +1660,18 @@ def predict_shapes(self, image, filename=None, text_prompt=None):
self.prediction_finished.emit()
return
try:
if text_prompt is None:
if text_prompt is not None:
auto_labeling_result = self.loaded_model_config[
"model"
].predict_shapes(image, filename)
].predict_shapes(image, filename, text_prompt=text_prompt)
elif run_tracker is True:
auto_labeling_result = self.loaded_model_config[
"model"
].predict_shapes(image, filename, run_tracker=run_tracker)
else:
auto_labeling_result = self.loaded_model_config[
"model"
].predict_shapes(image, filename, text_prompt)
].predict_shapes(image, filename)
self.new_auto_labeling_result.emit(auto_labeling_result)
self.new_model_status.emit(
self.tr("Finished inferencing AI model. Check the result.")
@@ -1646,7 +1686,7 @@ def predict_shapes(self, image, filename=None, text_prompt=None):
self.prediction_finished.emit()

@pyqtSlot()
def predict_shapes_threading(self, image, filename=None, text_prompt=None):
def predict_shapes_threading(self, image, filename=None, text_prompt=None, run_tracker=False):
"""Predict shapes.
This function starts a thread to run the prediction.
"""
@@ -1675,13 +1715,17 @@ def predict_shapes_threading(self, image, filename=None, text_prompt=None):
return

self.model_execution_thread = QThread()
if text_prompt is None:
if text_prompt is not None:
self.model_execution_worker = GenericWorker(
self.predict_shapes, image, filename
self.predict_shapes, image, filename, text_prompt=text_prompt
)
elif run_tracker is True:
self.model_execution_worker = GenericWorker(
self.predict_shapes, image, filename, run_tracker=run_tracker
)
else:
self.model_execution_worker = GenericWorker(
self.predict_shapes, image, filename, text_prompt
self.predict_shapes, image, filename
)
self.model_execution_worker.finished.connect(
self.model_execution_thread.quit
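The diff above gives `predict_shapes` and `predict_shapes_threading` a three-way argument dispatch: a text prompt takes precedence, then tracker mode, else a plain call. A standalone sketch of that selection logic — `build_call` is a hypothetical stand-in for how the `GenericWorker` arguments are chosen:

```python
# Sketch of the argument-selection order the threading code now follows:
# text_prompt first, then run_tracker, else a plain predict call.
# build_call is illustrative, not the project's API.

def build_call(image, filename=None, text_prompt=None, run_tracker=False):
    """Return (args, kwargs) for the underlying predict_shapes call."""
    if text_prompt is not None:
        return (image, filename), {"text_prompt": text_prompt}
    if run_tracker:
        return (image, filename), {"run_tracker": True}
    return (image, filename), {}

print(build_call("frame_0001.jpg", "frame_0001.jpg", run_tracker=True))
# → (('frame_0001.jpg', 'frame_0001.jpg'), {'run_tracker': True})
```

Keeping the branches mutually exclusive means a video-tracking request can never accidentally carry a stale text prompt into the model call.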