
PIMO #1726

Merged: 62 commits into openvinotoolkit:feature/pimo, Sep 20, 2024

Conversation

@jpcbertoldo (Contributor) commented Feb 9, 2024

📝 Description

Replaces #1557, which in turn replaced the PRs from https://gist.github.com/jpcbertoldo/12553b7eaa97cfbf3e55bfd7d1cafe88 .

Implements refactors from https://github.com/jpcbertoldo/anomalib/blob/metrics/refactors/src/anomalib/utils/metrics/perimg/.refactors .

arXiv: https://arxiv.org/abs/2401.01984
Medium post: https://medium.com/p/c653ac30e802
GSoC deliverable: https://gist.github.com/jpcbertoldo/12553b7eaa97cfbf3e55bfd7d1cafe88

Closes #1728

✨ Changes

Select what type of change your PR is:

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🔨 Refactor (non-breaking change which refactors the code base)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔒 Security update

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • 📋 I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
  • 📚 I have made the necessary updates to the documentation (if applicable).
  • 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

For more information about code review checklists, see the Code Review Checklist.

@jpcbertoldo (Contributor, Author):

Another unresolved issue from the previous PR [about "ATTENTION..." in docstrings]:

> @ashwinvaidya17: Same here. We need to consider how these docstrings will be rendered in Sphinx.

How can I check that?

@samet-akcay (Contributor) commented Feb 9, 2024

> How can I check that?

The documentation is built here based on your changes:

[image]

@jpcbertoldo (Contributor, Author):

> The documentation is built here based on your changes

There is some [metadata] stuff showing up, I don't know why.

Apparently Sphinx doesn't like dataclasses?
https://anomalib--1726.org.readthedocs.build/en/1726/markdown/guides/reference/metrics/index.html#id5
This field is in the class AUPIMOResult, but it shows up as if it were a function at the root (?), while AUPIMOResult itself doesn't show at all.

@samet-akcay (Contributor):

> Apparently Sphinx doesn't like dataclasses? This field is in the class AUPIMOResult, but it shows up as if it were a function at the root (?), while AUPIMOResult itself doesn't show at all.

Tree structure is also messed up a bit. It might be an idea to split each metric into a separate section.

@jpcbertoldo (Contributor, Author):

> The documentation is built here based on your changes

https://anomalib--1726.org.readthedocs.build/en/1726/markdown/guides/reference/metrics/index.html

It's not quite working as expected:

  1. I expected per_image to show as a submenu in metrics. How could I do that?

  2. It seems not to like dataclasses; there are attributes of PIMOResult and AUPIMOResult showing as if they were functions (?), and the classes themselves don't show.

@samet-akcay samet-akcay added this to the v1.1.0 milestone Feb 29, 2024
@samet-akcay samet-akcay added the Feature label and removed the Dependencies and Tests labels Mar 25, 2024
@samet-akcay samet-akcay modified the milestones: v1.1.0, v1.2.0 May 14, 2024
@ashwinvaidya17 (Collaborator) left a comment:

A lot has changed since this PR was submitted, but I've finally gotten around to reviewing it. This is a huge PR with a lot of effort behind it. However, I have some concerns. I've gone over it once, but I think I'll need a few more passes for a more thorough review. Meanwhile, we can start the discussions on the current open points.

Review threads were opened on:

  • src/anomalib/metrics/per_image/binclf_curve.py
  • src/anomalib/metrics/per_image/_binclf_curve_numba.py
  • requirements/base.txt
  • tests/unit/metrics/per_image/test_utils.py
  • src/anomalib/metrics/per_image/_validate.py
  • src/anomalib/metrics/per_image/utils.py
  • src/anomalib/metrics/per_image/binclf_curve_numpy.py
  • src/anomalib/metrics/per_image/pimo_numpy.py
@jpcbertoldo (Contributor, Author):

> Thanks! I think we are almost there. I have two major comments remaining. First, the author tag in the headers is not consistent with the rest of the repo. The current scans check third-party-programs.txt to verify the license and proper attribution for code taken from other repositories. Second, numba adds complexity to the codebase. If it is not strictly necessary, let's not include it. But I am happy to hear other opinions.

I removed the author tag and added the original code's repo to third-party-programs.txt.

I think the decision on numba is rather up to y'all as maintainers, but it does give a nice speedup.

For the record (if I understand it correctly), it only has two runtime requirements, plus a conditional importlib_metadata backport for Python < 3.9:

```python
install_requires = [
    'llvmlite >={},<{}'.format(min_llvmlite_version, max_llvmlite_version),
    'numpy >={},<{}'.format(min_numpy_run_version, max_numpy_run_version),
    'importlib_metadata; python_version < "3.9"',
]
```

https://github.com/numba/numba/blob/d4460feb8c91213e7b89f97b632d19e34a776cd3/setup.py#L369-L373

```python
min_numpy_run_version = "1.22"
max_numpy_run_version = "1.27"
min_llvmlite_version = "0.41.0dev0"
max_llvmlite_version = "0.42"
```

https://github.com/numba/numba/blob/d4460feb8c91213e7b89f97b632d19e34a776cd3/setup.py#L25-L28

I copied these from numba==0.58.1, which is the minimum version pinned here.

@djdameln (Contributor) left a comment:

I'm getting the following error when I try to use the AUPIMO metric to evaluate an anomalib model. Can you check it out?

```
ValueError: The `.compute()` return of the metric logged as 'pixel_AUPIMO' must be a tensor. Found (PIMOResult(shared_fpr_metric='mean-per-image-fpr'), AUPIMOResult(shared_fpr_metric='mean-per-image-fpr', fpr_lower_bound=1e-05, fpr_upper_bound=0.0001, num_threshs=48480))
```

Review threads were opened on:

  • src/anomalib/metrics/per_image/pimo_numpy.py
  • src/anomalib/metrics/per_image/pimo.py
  • src/anomalib/__init__.py
On the following lines:

```python
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import numba
```
Reviewer comment:

I'm also a bit hesitant to add the numba requirement. I can see the benefit that it brings, but at the same time it adds an unnecessary dependency and increases the complexity of the code. Without numba we could have a pure PyTorch implementation of the metric, which would be much cleaner and more in line with the rest of the library.

@jpcbertoldo (Contributor, Author):

> I'm getting the following error when I try to use the AUPIMO metric to evaluate an anomalib model. Can you check it out?

This is normal: AUPIMO returns many values, so they were encapsulated in that dataclass.
We could create an option in the torchmetrics interface to optionally return the dataclass and (by default) return just the average AUPIMO instead.
Sounds good?

@djdameln (Contributor) commented Jul 9, 2024

> We could create an option in the torchmetrics interface to optionally return the dataclass and (by default) return just the average AUPIMO instead. Sounds good?

I think that would be a good idea. With the default setting, the metric should be fully compatible with Anomalib's pipeline, so users should be able to enable it from the config/CLI or API to have Anomalib report the average AUPIMO value.

@jpcbertoldo (Contributor, Author):

> I think that would be a good idea. With the default setting the metric should be fully compatible with Anomalib's pipeline.

@djdameln Done : )

I think the only remaining issue is about numba (here: #1726 (comment)).

I can remove it if that's better, but it is already optional, as you suggested in your last comment.
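
For illustration, the toggle could look roughly like this (a sketch only: the parameter name `return_average` and its default are my assumption, not verified against the diff; the module path matches this PR's layout):

```python
from anomalib.metrics.per_image.pimo import AUPIMO  # module path as in this PR

# default: compute() returns a single tensor (the average AUPIMO), so the
# metric stays compatible with Lightning's `log()` in Anomalib's pipeline
metric = AUPIMO()

# assumed flag name: return the full (PIMOResult, AUPIMOResult) pair instead
metric_full = AUPIMO(return_average=False)
```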

@jpcbertoldo (Contributor, Author):

@samet-akcay could you launch a code check here please?

About the DCO, I think it is OK to apply the correction? But I'm not 100% sure (I don't want to mess up the commit history 😬).

@samet-akcay (Contributor):

> @samet-akcay could you launch a code check here please?

done

@samet-akcay (Contributor) left a comment:

@jpcbertoldo, thanks for creating this huuuge and amazing PR! Also, thanks for your patience. It took me a while to go through this.

I've got some comments/questions.

Clarity vs. Conciseness in Naming

  • When reading the code, I would prefer clarity over conciseness. I find the abbreviations hard to follow, which slows down my reading of the code. Overall, I would prefer full words over abbreviations, for example:
    • binary_classification_curve.py (or classification_curve.py) instead of binclf_curve.py
    • thresholds instead of threshs

Validation

  • I'm wondering how important the validation stuff is for this metric's evaluation.
    • Could these checks end up being carried out multiple times, rather than just in the correct place?
    • Is it possible to store the validators in a more organized way? Currently there is a _validate.py, but quite a few validators are scattered across the sub-package.

Structure of the sub-package

  • I feel we could organise the sub-package a bit more. For example, instead of placing multiple modules horizontally, like pimo.py and pimo_numpy.py, we could create a pimo sub-package that organises these, which would be easier to follow. Similarly, it might be an idea to create something like utils/..., maybe even curves/...?

  • For example, the actual PIMO implementation starts around line 700. As a reader, I would expect to see AUPIMO(Metric) in aupimo.py and PIMO(Metric) in pimo.py almost right after the imports. The rest of the stuff is sort of util code to me.

I'm working on something similar in PR #2305 to structure the sub-packages in data, for easier navigation:

```
data
├── __init__.py
├── dataclasses
│   ├── __init__.py
│   ├── generic.py
│   ├── numpy
│   │   ├── __init__.py
│   │   ├── base.py
│   │   ├── depth.py
│   │   ├── image.py
│   │   └── video.py
│   └── torch
│       ├── __init__.py
│       ├── base.py
│       ├── depth.py
│       ├── image.py
│       └── video.py
├── datamodules
│   ├── __init__.py
│   ├── base
│   │   ├── __init__.py
│   │   ├── image.py
│   │   └── video.py
│   ├── depth
│   │   ├── __init__.py
│   │   ├── folder_3d.py
│   │   └── mvtec_3d.py
│   ├── image
│   │   ├── __init__.py
│   │   ├── btech.py
│   │   ├── folder.py
│   │   ├── kolektor.py
│   │   ├── mvtec.py
│   │   └── visa.py
│   └── video
│       ├── __init__.py
│       ├── avenue.py
│       ├── shanghaitech.py
│       └── ucsd_ped.py
├── datasets
│   ├── __init__.py
│   ├── base
│   │   ├── __init__.py
│   │   ├── depth.py
│   │   ├── image.py
│   │   └── video.py
│   ├── depth
│   │   ├── __init__.py
│   │   ├── folder_3d.py
│   │   └── mvtec_3d.py
│   ├── image
│   │   ├── __init__.py
│   │   ├── btech.py
│   │   ├── folder.py
│   │   ├── kolektor.py
│   │   ├── mvtec.py
│   │   └── visa.py
│   └── video
│       ├── __init__.py
│       ├── avenue.py
│       ├── shanghaitech.py
│       └── ucsd_ped.py
├── ...
├── transforms
│   └── ...
├── utils
│   └── ...
└── validators
    ├── __init__.py
    ├── numpy
    │   ├── __init__.py
    │   ├── depth.py
    │   ├── image.py
    │   └── video.py
    ├── path.py
    └── torch
        ├── __init__.py
        ├── depth.py
        ├── image.py
        └── video.py
```

Given your other commitments these days, one possibility would be to work on it together by merging it into a feature branch. Any thoughts? @jpcbertoldo, @ashwinvaidya17, @djdameln?

On the following lines:

```python
path: str | Path,
base_dir: str | Path | None = None,
should_exist: bool = True,
accepted_extensions: tuple[str, ...] | None = None,
```
Reviewer comment:

I think extensions would be sufficient in this case.

Suggested change:

```diff
- accepted_extensions: tuple[str, ...] | None = None,
+ extensions: tuple[str, ...] | None = None,
```

"""Validate the path.

Args:
path (str | Path): Path to validate.
base_dir (str | Path): Base directory to restrict file access.
should_exist (bool): If True, do not raise an exception if the path does not exist.
accepted_extensions (tuple[str, ...] | None): Accepted extensions for the path. An exception is raised if the
Reviewer comment:

Suggested change:

```diff
- accepted_extensions (tuple[str, ...] | None): Accepted extensions for the path. An exception is raised if the
+ extensions (tuple[str, ...] | None): Accepted extensions for the path. An exception is raised if the
```

On the following lines:

```diff
@@ -213,6 +220,11 @@ def validate_path(path: str | Path, base_dir: str | Path | None = None, should_e
     msg = f"Read or execute permissions denied for the path: {path}"
     raise PermissionError(msg)

+    # Check if the path has one of the accepted extensions
+    if accepted_extensions is not None and path.suffix not in accepted_extensions:
```
Reviewer comment:

Suggested change:

```diff
- if accepted_extensions is not None and path.suffix not in accepted_extensions:
+ if extensions is not None and path.suffix not in extensions:
```

On the following lines:

```python
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from .binclf_curve import per_image_binclf_curve, per_image_fpr, per_image_tpr
```
Reviewer comment:

I have a question regarding .binclf: is there any case where the classification is not binary? If it is binary only, can we assume that the abbreviation bin is redundant?

While reading the code, I find these abbreviations a bit hard to follow.



On the following lines:

```python
class BinclfThreshsChoice(Enum):
    """Sequence of thresholds to use."""
```
Reviewer comment:

I think expanding the docstring would be good, to explain what each choice does. For example, why would a user choose given over minmax_linspace or mean_fpr_optimized?

On the following lines:

```python
    return utils_numpy.compare_models_pairwise_ttest_rel(scores_per_model_with_arrays, alternative, higher_is_better)


def compare_models_pairwise_wilcoxon(
```
Reviewer comment:

Same as above.

Comment on lines +38 to +39:

```python
HI: str = "hi"
LO: str = "lo"
```
Reviewer comment:

Why not HIGH instead of HI, and LOW instead of LO?

On the following lines:

```python
    - 'image_idx': Index of the image in `per_image_scores` whose score is the closest to the statistic's value.
    - 'score': The score of the image at index `image_idx` (not necessarily the same as `stat_value`).

    The list is sorted by increasing `stat_value`.
```
Reviewer comment:

As a reader/user, I would love to see an example to understand how to use this.
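
For instance, something like the following (a hedged sketch: the function name `per_image_scores_stats` and its import path are assumptions on my part; the dict keys and the sort order come from the quoted docstring, and the printed output is purely illustrative):

```python
import torch

# assumed name/location of the function under review
from anomalib.metrics.per_image.utils import per_image_scores_stats

# one score per image, e.g. AUPIMO values from a previous evaluation
per_image_scores = torch.tensor([0.10, 0.55, 0.30, 0.80, 0.42])

stats = per_image_scores_stats(per_image_scores)
# expected: a list of dicts sorted by increasing `stat_value`, e.g.
# [{'stat_name': 'whislo', 'stat_value': 0.10, 'image_idx': 0, 'score': 0.10},
#  ...,
#  {'stat_name': 'whishi', 'stat_value': 0.80, 'image_idx': 3, 'score': 0.80}]
# ('stat_name' and the statistics themselves are assumed here)
```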

On the following lines:

```python
# =========================================== ARGS VALIDATION ===========================================


def _validate_is_anomaly_maps(anomaly_maps: Tensor) -> None:
```
Reviewer comment:

Would it be an idea to move these validation utils to _validate.py? These functions already use functions from _validate.


On the following lines:

```python
    Returns:
        PIMOResult: PIMO curves dataclass object. See `PIMOResult` for details.
    """
```
Reviewer comment:

I think an examples section here would be really useful: what do I need to run AUPIMO with some basic torch input and output?
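
Something along these lines, perhaps (a minimal sketch: the import path follows this PR's layout, but the constructor defaults, the update()/compute() signatures, and the need for some fully normal images are my assumptions):

```python
import torch

from anomalib.metrics.per_image.pimo import AUPIMO  # module path as in this PR

# 8 anomaly maps (per-pixel scores) with binary ground-truth masks
anomaly_maps = torch.rand(8, 256, 256)
masks = torch.zeros(8, 256, 256, dtype=torch.int32)
masks[4:, 128:, 128:] = 1  # last 4 images have an anomalous region; first 4 are normal

metric = AUPIMO()  # assumed default: compute() returns the average AUPIMO
metric.update(anomaly_maps, masks)
score = metric.compute()
print(score)  # a scalar tensor in [0, 1]
```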

@ashwinvaidya17 (Collaborator) left a comment:

I agree with Samet’s suggestion to move all validation checks to a separate module, as it would significantly improve the clarity and organization of pimo.py. Furthermore, I propose we also consider removing the NumPy-based computations. Since we aren’t using NumPy for other metrics in Anomalib, and there doesn’t seem to be a current need for NumPy in our metric computations, eliminating it could further simplify the codebase.
As Samet suggested, we can target this change to a feature branch so that the rest of the team can assist in reducing your workload, especially given the significant amount of work you’ve already done.

@jpcbertoldo (Contributor, Author):

> I propose we also consider removing the NumPy-based computations. [...] As Samet suggested, we can target this change to a feature branch so that the rest of the team can assist in reducing your workload.

OK, no worries. So should I remove the *_numpy.py stuff and move the code to the torch-only versions?

Just one warning. PIMO is fast to compute thanks to

https://github.com/jpcbertoldo/anomalib/blob/241df0e2aea82373bb69c216e73a7e43106aee22/src/anomalib/metrics/per_image/binclf_curve_numpy.py#L98

which is in numpy, so that specific one would be worth keeping.

Using torchmetrics (even on GPU) is slower. The constraint is that the thresholds are shared across images, but the binary classification (for multiple thresholds) is computed per image.

If this one is removed, then we need a torch-based implementation, which I'm sure is significantly slower for evaluation at the original resolution of the images (1000-ish pixels), which was a major argument in the paper. We went through this topic last year, but I can try to find the graphs to put numbers on this difference in execution time.

So, do we keep or remove this one? (_binclf_one_curve_python)
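
For context, the trick being discussed is roughly the following (a simplified pure-NumPy sketch, not the PR's actual `_binclf_one_curve_python`; the helper name and the (num_threshs, 2, 2) layout are illustrative):

```python
import numpy as np


def binclf_one_curve(scores: np.ndarray, gt: np.ndarray, threshs: np.ndarray) -> np.ndarray:
    """Confusion-matrix counts of ONE image at every shared threshold.

    Returns an array of shape (num_threshs, 2, 2), indexed as [thresh, gt, pred].
    """
    # sort the positive- and negative-pixel scores once so that each threshold
    # reduces to a binary search instead of a full scan over the image
    pos = np.sort(scores[gt == 1])
    neg = np.sort(scores[gt == 0])
    counts = np.empty((threshs.size, 2, 2), dtype=np.int64)
    for i, th in enumerate(threshs):
        tp = pos.size - np.searchsorted(pos, th)  # positive scores >= th
        fp = neg.size - np.searchsorted(neg, th)  # negative scores >= th
        counts[i] = [[neg.size - fp, fp], [pos.size - tp, tp]]
    return counts


# thresholds are shared across images, but counts are computed per image
scores = np.random.rand(2, 64, 64)                   # two toy anomaly maps
gts = (np.random.rand(2, 64, 64) > 0.9).astype(int)  # toy binary masks
threshs = np.linspace(0.0, 1.0, 100)
curves = np.stack([binclf_one_curve(s.ravel(), g.ravel(), threshs) for s, g in zip(scores, gts)])
```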

@ashwinvaidya17 (Collaborator):

Okay, then let's keep binclf_curve_numpy.py. I wonder how this will affect the computation when we support distributed training. Anyway, we can look into that later. Let's get this merged.

@jpcbertoldo (Contributor, Author):

> Okay, then let's keep binclf_curve_numpy.py. I wonder how this will affect the computation when we support distributed training.

(Just guessing.) In the torchmetrics class, there isn't much happening in update(); it basically stores the score maps and masks. The compute() method does everything. It's pretty much what torchmetrics does with AUROC. So it should be OK?
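
That pattern is roughly the following (a generic torchmetrics sketch of what is described above, not the PR's actual class; all names are placeholders):

```python
import torch
from torchmetrics import Metric
from torchmetrics.utilities.data import dim_zero_cat


class PIMOSketch(Metric):
    """Accumulate inputs in update(); do all the work in compute()."""

    def __init__(self) -> None:
        super().__init__()
        # list states are gathered and concatenated across processes by
        # torchmetrics, which is why this pattern also cooperates with DDP
        self.add_state("anomaly_maps", default=[], dist_reduce_fx="cat")
        self.add_state("masks", default=[], dist_reduce_fx="cat")

    def update(self, anomaly_maps: torch.Tensor, masks: torch.Tensor) -> None:
        # no computation here: just store the batch, like AUROC does
        self.anomaly_maps.append(anomaly_maps)
        self.masks.append(masks)

    def compute(self) -> torch.Tensor:
        anomaly_maps = dim_zero_cat(self.anomaly_maps)
        masks = dim_zero_cat(self.masks)
        # the heavy lifting (thresholds, binclf curves, AUC) would happen once,
        # here; a placeholder value is returned so the sketch runs end to end
        return anomaly_maps[masks == 1].mean()
```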

@ashwinvaidya17 ashwinvaidya17 changed the base branch from main to feature/pimo September 20, 2024 08:15
@ashwinvaidya17 ashwinvaidya17 merged commit 29cb912 into openvinotoolkit:feature/pimo Sep 20, 2024
3 of 5 checks passed
Development

Successfully merging this pull request may close: Add PIMO metric to anomalib (#1728)

4 participants