Merge back 2.2.0.rc3 to develop #3963

Merged: 59 commits, Sep 25, 2024

Commits
d1bd1d5
update for releases 2.2.0rc0
yunchu Aug 20, 2024
c16f985
Fix Classification explain forward issue (#3867)
harimkang Aug 21, 2024
cba5120
Fix e2e code error (#3871)
chuneuny-emily Aug 21, 2024
b807c9d
Add documentation about configurable input size (#3870)
eunwoosh Aug 21, 2024
2835aba
Fix zero-shot e2e (#3876)
sungchul2 Aug 23, 2024
ccf2d50
Fix DeiT for multi-label classification (#3881)
harimkang Aug 23, 2024
e577b6a
Fix Semi-SL for ViT accuracy drop (#3883)
harimkang Aug 23, 2024
d1dd2b0
Update docs for 2.2 (#3884)
harimkang Aug 23, 2024
c17a923
Fix mean and scale for segmentation task (#3885)
kprokofi Aug 23, 2024
d72feeb
Update MAPI in 2.2 (#3889)
sovrasov Aug 26, 2024
00ed3a0
Improve Semi-SL for LiteHRNet (small-medium case) (#3891)
kprokofi Aug 26, 2024
2c6b4de
Improve h-cls for eff models (#3893)
sooahleex Aug 26, 2024
0dc7a29
Fix maskrcnn swin nncf acc drop (#3900)
eugene123tw Aug 27, 2024
0d6799c
Add keypoint detection recipe for single object cases (#3903)
wonjuleee Aug 28, 2024
8115b52
Improve acc drop of efficientnetv2 for h-label cls (#3907)
sooahleex Aug 29, 2024
4c8555e
Fix pretrained weight cached dir for timm (#3909)
harimkang Aug 29, 2024
52221e3
Fix keypoint detection single obj recipe (#3915)
wonjuleee Aug 30, 2024
9265c59
Fix cached dir for timm & hugging-face (#3914)
harimkang Aug 30, 2024
5170736
Fix wrong template id mapping for anomaly (#3916)
yunchu Aug 30, 2024
f611cc1
Update script to allow setting otx version using env. variable (#3913)
yunchu Aug 30, 2024
425a479
Fix Datamodule creation for OV in AutoConfigurator (#3920)
harimkang Sep 2, 2024
7f1c7da
Update tpp file for 2.2.0 (#3921)
yunchu Sep 2, 2024
51d1adf
Fix names for ignored scope [HOT-FIX, 2.2.0] (#3924)
kprokofi Sep 3, 2024
2bcf1b2
Fix classification rt_info (#3922)
sovrasov Sep 3, 2024
112b2b2
Update label info (#3925)
ashwinvaidya17 Sep 4, 2024
929132d
Fix binary classification metric task (#3928)
harimkang Sep 5, 2024
706f99b
Improve MaskRCNN SwinT NNCF (#3929)
eugene123tw Sep 5, 2024
53a7d9a
Fix get_item for Chained Tasks in Classification (#3931)
harimkang Sep 5, 2024
c3749e3
Correct Keyerror for h-label cls in label_groups for dm_label_categor…
sooahleex Sep 5, 2024
98a9cac
Remove datumaro attribute id from tiling, add subset names (#3933)
eugene123tw Sep 6, 2024
d8e6454
Fix soft predictions for Semantic Segmentation (#3934)
kprokofi Sep 6, 2024
c2705df
Update STFPM config (#3935)
ashwinvaidya17 Sep 6, 2024
c2ccfc9
Add missing pretrained weights when creating a docker image (#3938)
harimkang Sep 6, 2024
8b747f9
Change default option 'full' to 'base' in otx install (#3937)
harimkang Sep 9, 2024
d43226e
Fix auto adapt batch size in Converter (#3939)
harimkang Sep 9, 2024
1d319cd
Fix hpo converter (#3940)
eunwoosh Sep 9, 2024
aaa2765
Fix tiling XAI out of range (#3943)
eugene123tw Sep 9, 2024
ac87b49
enable model export (#3952)
ashwinvaidya17 Sep 12, 2024
8f96f27
Move templates from OTX1.X to OTX2.X (#3951)
kprokofi Sep 12, 2024
0f87c86
Add missing tile recipes and various tile recipe changes (#3942)
eugene123tw Sep 12, 2024
c7efcbc
Support ImageFromBytes (#3948)
ashwinvaidya17 Sep 12, 2024
ecef545
Change categories mapping logic (#3946)
kprokofi Sep 13, 2024
b1ec8e7
Update for 2.2.0rc1 (#3956)
yunchu Sep 13, 2024
139ece5
merge conflicts
kprokofi Sep 18, 2024
aa31dca
Include Geti arrow dataset subset names (#3962)
eugene123tw Sep 20, 2024
93f1a55
Include full image with anno in case there's no tile in tile dataset …
eugene123tw Sep 20, 2024
45f9a24
Add type checker in converter for callable functions (optimizer, sche…
harimkang Sep 20, 2024
51fcb73
Update for 2.2.0rc2 (#3969)
yunchu Sep 20, 2024
7cb8b0d
Update CHANGELOG.md
kprokofi Sep 20, 2024
901fd16
fix semantic seg tests
kprokofi Sep 20, 2024
5971993
Merge branch 'kp/merge_back_2.2.0' of https://github.com/kprokofi/tra…
kprokofi Sep 20, 2024
a30ef81
fix detection tiling
kprokofi Sep 20, 2024
27ac89c
merge develop
kprokofi Sep 20, 2024
b45aea6
Merge branch 'develop' into kp/merge_back_2.2.0
harimkang Sep 23, 2024
b45fe11
Update test_tiling.py
harimkang Sep 23, 2024
4c8ecd1
Update test_tiling.py
harimkang Sep 23, 2024
cb3d903
Merge rc2
harimkang Sep 23, 2024
31b2504
fix unit test
kprokofi Sep 23, 2024
97dad60
Merge branch 'develop' into kp/merge_back_2.2.0
kprokofi Sep 23, 2024
26 changes: 22 additions & 4 deletions CHANGELOG.md
@@ -2,7 +2,7 @@

All notable changes to this project will be documented in this file.

## \[unreleased\]
## \[2.2.0\]

### New features

@@ -45,15 +45,31 @@ All notable changes to this project will be documented in this file.
(<https://github.com/openvinotoolkit/training_extensions/pull/3769>)
- Refactoring `ConvModule` by removing `conv_cfg`, `norm_cfg`, and `act_cfg`
(<https://github.com/openvinotoolkit/training_extensions/pull/3783>, <https://github.com/openvinotoolkit/training_extensions/pull/3816>, <https://github.com/openvinotoolkit/training_extensions/pull/3809>)
- Support ImageFromBytes
(<https://github.com/openvinotoolkit/training_extensions/pull/3948>)
- Enable model export
(<https://github.com/openvinotoolkit/training_extensions/pull/3952>)
- Move templates from OTX1.X to OTX2.X
(<https://github.com/openvinotoolkit/training_extensions/pull/3951>)
- Include Geti arrow dataset subset names
(<https://github.com/openvinotoolkit/training_extensions/pull/3962>)
- Include full image with anno in case there's no tile in tile dataset
(<https://github.com/openvinotoolkit/training_extensions/pull/3964>)
- Add type checker in converter for callable functions (optimizer, scheduler)
(<https://github.com/openvinotoolkit/training_extensions/pull/3968>)

### Bug fixes

- Fix Combined Dataloader & unlabeled warmup loss in Semi-SL
(https://github.com/openvinotoolkit/training_extensions/pull/3723)
(<https://github.com/openvinotoolkit/training_extensions/pull/3723>)
- Revert #3579 to fix issues with replacing coco_instance with a different format in some dataset
(https://github.com/openvinotoolkit/training_extensions/pull/3753)
(<https://github.com/openvinotoolkit/training_extensions/pull/3753>)
- Add num_devices in Engine for multi-gpu training
(https://github.com/openvinotoolkit/training_extensions/pull/3778)
(<https://github.com/openvinotoolkit/training_extensions/pull/3778>)
- Add missing tile recipes and various tile recipe changes
(<https://github.com/openvinotoolkit/training_extensions/pull/3942>)
- Change categories mapping logic
(<https://github.com/openvinotoolkit/training_extensions/pull/3946>)

## \[v2.1.0\]

@@ -191,6 +207,8 @@ All notable changes to this project will be documented in this file.
(<https://github.com/openvinotoolkit/training_extensions/pull/3684>)
- Fix MaskRCNN SwinT NNCF Accuracy Drop
(<https://github.com/openvinotoolkit/training_extensions/pull/3685>)
- Fix MaskRCNN SwinT NNCF Accuracy Drop By Adding More PTQ Configs
(<https://github.com/openvinotoolkit/training_extensions/pull/3929>)

### Known issues

97 changes: 29 additions & 68 deletions README.md
@@ -166,83 +166,44 @@ In addition to the examples above, please refer to the documentation for tutoria

---

## Updates

### v2.1.0 (3Q24)

> _**NOTES**_
>
> OpenVINO™ Training Extensions, version 2.1.0 does not include the latest functional and security updates. OpenVINO™ Training Extensions, version 2.2.0 is targeted to be released in September 2024 and will include additional functional and security updates. Customers should update to the latest version as it becomes available.
## Updates - v2.2.0 (3Q24)

### New features

- Add a flag to enable OV inference on dGPU
- Add early stopping with warmup. Remove mandatory background label in semantic segmentation task
- RTMDet-tiny enablement for detection task
- Add data_format validation and update in OTXDataModule
- Add torchvision.MaskRCNN
- Add Semi-SL for Multi-class Classification (EfficientNet-B0)
- Decoupling mmaction for action classification (MoviNet, X3D)
- Add Semi-SL Algorithms for mv3-large, effnet-v2, deit-tiny, dino-v2
- RTMDet-tiny enablement for detection task (export/optimize)
- Enable ruff & ruff-format into otx/algo/classification/backbones
- Add TV MaskRCNN Tile Recipe
- Add rotated det OV recipe
- Add RT-DETR model for Object Detection
- Add Multi-Label & H-label Classification with torchvision models
- Add Hugging-Face Model Wrapper for Classification
- Add LoRA finetuning capability for ViT Architectures
- Add Hugging-Face Model Wrapper for Object Detection
- Add Hugging-Face Model Wrapper for Semantic Segmentation
- Enable torch.compile to work with classification
- Add `otx benchmark` subcommand
- Add RTMPose for Keypoint Detection Task
- Add Semi-SL MeanTeacher algorithm for Semantic Segmentation
- Update head and h-label format for hierarchical label classification
- Support configurable input size

### Enhancements

- Change load_stat_dict to on_load_checkpoint
- Add try - except to keep running the remaining tests
- Update instance_segmentation.py to resolve conflict with 2.0.0
- Update XPU install
- Sync rgb order between torch and ov inference of action classification task
- Make Perf test available to load previous Perf test to skip training stage
- Reenable e2e classification XAI tests
- Remove action detection task support
- Increase readability of pickling error log during HPO & fix minor bug
- Update RTMDet checkpoint url
- Refactor Torchvision Model for Classification Semi-SL
- Add coverage omit mm-related code
- Add docs semi-sl part
- Refactor docs design & Add contents
- Add execution example of auto batch size in docs
- Add Semi-SL for cls Benchmark Test
- Move value to device before logging for metric
- Add .codecov.yaml
- Update benchmark tool for otx2.1
- Collect pretrained weight binary files in one place
- Minimize compiled dependency files
- Update README & CODEOWNERS
- Update Engine's docstring & CLI --help outputs
- Align integration test to exportable code interface update for release branch
- Refactor exporter for anomaly task and fix a bug with exportable code
- Update pandas version constraint
- Include more models to export test into test_otx_e2e
- Move assigning tasks to Models from Engine to Anomaly Model Classes
- Refactoring detection modules
- Reimplement of ViT Architecture following TIMM
- Enable to override data configurations
- Enable to use input_size at transforms in recipe
- Enable to use polygon and bitmap mask as prompt inputs for zero-shot learning
- Refactoring `ConvModule` by removing `conv_cfg`, `norm_cfg`, and `act_cfg`
- Support ImageFromBytes
- enable model export
- Move templates from OTX1.X to OTX2.X
- Include Geti arrow dataset subset names
- Include full image with anno in case there's no tile in tile dataset
- Add type checker in converter for callable functions (optimizer, scheduler)

### Bug fixes

- Fix conflicts between develop and 2.0.0
- Fix polygon mask
- Fix vpm intg test error
- Fix anomaly
- Bug fix in Semantic Segmentation + enable DINOV2 export in ONNX
- Fix some export issues. Remove EXPORTABLE_CODE as export parameter.
- Fix `load_from_checkpoint` to apply original model's hparams
- Fix `load_from_checkpoint` args to apply original model's hparams
- Fix zero-shot `learn` for ov model
- Various fixes for XAI in 2.1
- Fix tests to work in a mm-free environment
- Fix a bug in benchmark code
- Update exportable code dependency & fix a bug
- Fix getting wrong shape during resizing
- Fix detection prediction outputs
- Fix RTMDet PTQ performance
- Fix segmentation fault on VPM PTQ
- Fix NNCF MaskRCNN-Eff accuracy drop
- Fix optimize with Semi-SL data pipeline
- Fix MaskRCNN SwinT NNCF Accuracy Drop
- Fix Combined Dataloader & unlabeled warmup loss in Semi-SL
- Revert #3579 to fix issues with replacing coco_instance with a different format in some dataset
- Add num_devices in Engine for multi-gpu training
- Add missing tile recipes and various tile recipe changes
- Change categories mapping logic

### Known issues

6 changes: 4 additions & 2 deletions docker/build.sh
@@ -1,7 +1,9 @@
#!/bin/bash
# shellcheck disable=SC2154
# shellcheck disable=SC2154,SC2035,SC2046

OTX_VERSION=$(python -c 'import otx; print(otx.__version__)')
if [ "$OTX_VERSION" == "" ]; then
    OTX_VERSION=$(python -c 'import otx; print(otx.__version__)')
fi
THIS_DIR=$(dirname "$0")

echo "Build OTX ${OTX_VERSION} CUDA Docker image..."
4 changes: 0 additions & 4 deletions docker/download_pretrained_weights.py
@@ -32,10 +32,6 @@ def download_all() -> None:
msg = f"Skip {config_path} since it is not a PyTorch config."
logger.warning(msg)
continue
if "anomaly_" in str(config_path) or "dino_v2" in str(config_path) or "h_label_cls" in str(config_path):
msg = f"Skip {config_path} since those models show errors on instantiation."
logger.warning(msg)
continue

config = OmegaConf.load(config_path)
init_model = next(iter(partial_instantiate_class(config.model)))
@@ -0,0 +1,116 @@
Configurable Input Size
=======================

The Configurable Input Size feature allows users to adjust the input resolution of their deep learning models
to balance training and inference speed against model performance.
This flexibility lets users tailor the input size to their specific needs without manually altering
the data pipeline configuration.

To use this feature, simply specify the desired input size as an argument to the train command.
OTX also ensures compatibility with models trained on non-default input sizes by automatically adjusting
the data pipeline to match that input size in the other engine entry points.

Usage examples:

.. tab-set::

.. tab-item:: API 1

.. code-block:: python

from otx.algo.detection.yolox import YOLOXS
from otx.core.data.module import OTXDataModule
from otx.engine import Engine

input_size = (512, 512)
model = YOLOXS(label_info=5, input_size=input_size) # should be tuple[int, int]
datamodule = OTXDataModule(..., input_size=input_size)
engine = Engine(model=model, datamodule=datamodule)
engine.train()

.. tab-item:: API 2

.. code-block:: python

from otx.core.data.module import OTXDataModule
from otx.engine import Engine

datamodule = OTXDataModule(..., input_size=(512, 512))
engine = Engine(model="yolox_s", datamodule=datamodule) # model input size will be aligned with the datamodule input size
engine.train()

.. tab-item:: CLI

.. code-block:: bash

(otx) ...$ otx train ... --data.input_size 512

.. _adaptive-input-size:

Adaptive Input Size
-------------------

The Adaptive Input Size feature intelligently determines an optimal input size for the model
by analyzing the dataset's statistics.
It operates in two distinct modes: "auto" and "downscale".
In "auto" mode, the input size may increase or decrease based on the dataset's characteristics.
In "downscale" mode, the input size will either decrease or remain unchanged, ensuring that the model training or inference speed deosn't drop.


To activate this feature, use the following command with the desired mode:

.. tab-set::

.. tab-item:: API

.. code-block:: python

from otx.algo.detection.yolox import YOLOXS
from otx.core.data.module import OTXDataModule
from otx.engine import Engine

datamodule = OTXDataModule(
...,
adaptive_input_size="auto", # auto or downscale
input_size_multiplier=YOLOXS.input_size_multiplier, # should set the input_size_multiplier of the model
)
model = YOLOXS(label_info=5, input_size=datamodule.input_size)
engine = Engine(model=model, datamodule=datamodule)
engine.train()

.. tab-item:: CLI

.. code-block:: bash

(otx) ...$ otx train ... --data.adaptive_input_size "auto | downscale"

The adaptive process includes the following steps:

1. OTX computes robust statistics from the input dataset.

2. The initial input size is set based on the typical large image size within the dataset.

3. (Optional) The input size may be further refined to account for the sizes of objects present in the dataset.
The model's minimum recognizable object size, typically ranging from 16x16 to 32x32 pixels, serves as a reference to
proportionally adjust the input size relative to the average small object size observed in the dataset.
For instance, if objects are generally 64x64 pixels in a 512x512 image, the input size would be adjusted
to 256x256 to maintain detectability (a code sketch of this adjustment follows the list).

Adjustments are subject to the following constraints:

* If the recalculated input size exceeds the maximum image size determined in the previous step, it will be capped at that maximum size.
* If the recalculated input size falls below the minimum threshold defined by MIN_DETECTION_INPUT_SIZE, the input size will be scaled up. This is done by increasing the smaller dimension (width or height) to MIN_DETECTION_INPUT_SIZE while maintaining the aspect ratio, ensuring that the model's minimum criteria for object detection are met.

4. (downscale only) Any scale-up beyond the default model input size is restricted.


.. Note::
Opting for a smaller input size can be advantageous for datasets with lower-resolution images or larger objects,
as it may improve speed with minimal impact on model accuracy. However, it is important to consider that selecting
a smaller input size could affect model performance depending on the task, model architecture, and dataset
properties.
8 changes: 7 additions & 1 deletion docs/source/guide/explanation/additional_features/hpo.rst
@@ -143,10 +143,16 @@ Here is explanation of all HPO configuration.

- **mode** (*str*, *default='max'*) - Optimization mode for the metric. It determines whether the metric should be maximized or minimized. The possible values are 'max' and 'min', respectively.

- **num_workers** (*int*, *default=1*) How many trials will be executed in parallel.
- **num_trials** (*int*, *default=None*) The number of training trials to perform during HPO. If not provided, the number of trials will be determined based on the expected time ratio. Defaults to None.

- **num_workers** (*int*, *default=None*) The number of trials that will be run concurrently.

- **expected_time_ratio** (*int*, *default=4*) How much time to spend on HPO, expressed as a multiple of a single training run's time.

- **metric_name** (*str*, *default=None*) The name of the performance metric to be optimized during HPO. If not specified, the metric will be selected based on the configured callbacks. Defaults to None.

- **adapt_bs_search_space_max_val** (*Literal["None", "Safe", "Full"]*, *default="None"*) Whether to execute `Auto-adapt batch size` prior to HPO. This step finds the maximum batch size value, which then serves as the upper limit for the batch size search space during HPO. For further information on `Auto-adapt batch size`, please refer to the `Auto-configuration` documentation. Defaults to "None".

- **maximum_resource** (*int*, *default=None*) - Maximum number of training epochs for each trial. When the number of training epochs reaches this value, the trial stops training.

- **minimum_resource** (*int*, *default=None*) - Minimum number of training epochs for each trial. Each trial will run for at least this many epochs, even if the model's performance is not improving.
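
Below is a minimal sketch of how these options can be passed to the engine when running HPO. The
import path, the ``run_hpo``/``hpo_config`` arguments, and the metric name used here are assumptions
made for illustration; check the HPO API reference for the exact signature.

.. code-block:: python

from otx.core.config.hpo import HpoConfig  # assumed import path
from otx.engine import Engine

engine = Engine(model="yolox_s", data_root="path/to/dataset")

# Map the options described above onto an HPO configuration.
hpo_config = HpoConfig(
    num_trials=8,               # run eight trials in total
    num_workers=2,              # execute two trials concurrently
    expected_time_ratio=4,      # allow roughly 4x the single-training time for HPO
    metric_name="val/accuracy", # hypothetical metric name
    maximum_resource=10,        # cap each trial at 10 epochs
    minimum_resource=2,         # train each trial for at least 2 epochs
)

engine.train(run_hpo=True, hpo_config=hpo_config)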
@@ -14,3 +14,4 @@ Additional Features
fast_data_loading
tiling
class_incremental_sampler
configurable_input_size

This file was deleted.

@@ -6,4 +6,3 @@ Action Recognition


action_classification
action_detection
4 changes: 2 additions & 2 deletions docs/source/guide/get_started/cli_commands.rst
@@ -339,11 +339,11 @@ The results will be saved in ``./otx-workspace/`` folder by default. The output

(otx) ...$ otx train --model <model-class-path-or-name> --task <task-type> --data_root <dataset-root>

For example, if you want to use the ``otx.algo.detection.atss.ATSS`` model class, you can train it as shown below.
For example, if you want to use the ``otx.algo.classification.torchvision_model.TVModelForMulticlassCls`` model class, you can train it as shown below.

.. code-block:: shell

(otx) ...$ otx train --model otx.algo.detection.atss.ATSS --model.variant mobilenetv2 --task DETECTION ...
(otx) ...$ otx train --model otx.algo.classification.torchvision_model.TVModelForMulticlassCls --model.backbone mobilenet_v3_small ...

.. note::
You can also visualize the training using ``Tensorboard`` as these logs are located in ``<work_dir>/tensorboard``.