metrics[4 and 5 refactored] save results and compare models [GSoC23 cont'd] #12

jpcbertoldo · 2023-09-05T17:26:37Z

Description

This is a continuation of Anomaly Segmentation Metrics for Anomalib — GSoC 2023 @ OpenVINO.

Updated PRs at https://gist.github.com/jpcbertoldo/12553b7eaa97cfbf3e55bfd7d1cafe88?permalink_comment_id=4681988#gistcomment-4681988

Replaces #9 and #10 after a refactor on the PImOResult, which became a dataclass incorporating the save and load methods.

Refactors

Return types

PImOResult becomes a dataclass holding all the information for pimo curves, and the aucs from AUPImO become an analogous AUPImOResult.

Others

move demo notebooks to a common folder
change some docstrings
update the 1st notebook's text
some validations
From the previous PR:

Known issue: the plot functions in AULogPImO were not entirely encapsulated in a function. This was corrected in the next PR.

Address this and move the plotting logic from AULogPImO to a functional interface (the classes practically just pass the arguments along).

Save & Load

Make it possible to save and load pimo curves and their aucs.

PImOResult saves the curves' metadata (shared fpr type and bounds) and all the returns (fprs, tprs, shared fpr, etc.) as a dict of tensors in a .pt file, so the curves are fully recoverable.

AUPImOResult saves the aucs' metadata (shared fpr type, bounds, thresholds of the bounds) in a JSON file so it is human-readable.
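
As a rough illustration of the JSON side of this, here is a minimal sketch of a dataclass that round-trips through a human-readable file. It is not the code in this PR: the class name `AUCResultSketch`, its field names, the metric name string, and all the numbers are assumptions made up for the example.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class AUCResultSketch:
    """Hypothetical stand-in for AUPImOResult; all field names are assumptions."""

    shared_fpr_metric: str
    fpr_lower_bound: float
    fpr_upper_bound: float
    threshold_lower_bound: float
    threshold_upper_bound: float
    aucs: List[float]  # one value per image

    def save(self, path: Path) -> None:
        # Indented JSON keeps the file easy to read and to diff.
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "AUCResultSketch":
        return cls(**json.loads(path.read_text()))


# Illustrative values only.
result = AUCResultSketch("mean_perimage_fpr", 1e-5, 1e-4, 0.82, 0.61, [0.91, 0.47, 0.78])

with tempfile.TemporaryDirectory() as tmpdir:
    fpath = Path(tmpdir) / "aucs.json"
    result.save(fpath)
    reloaded = AUCResultSketch.load(fpath)
```

Keeping the metadata next to the values in one file means a reloaded result can be checked against the curve settings it was computed with.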

Compare models

Add utilities to compare two or more models on the same dataset with a per-image metric.

Two types of comparison are proposed: parametric (the metric values) and non-parametric (ranks, i.e. sorting of models). For each method, there is a statistical test (respectively, a paired t-test and a Wilcoxon signed-rank test) and a plot that visually explains the test.

Plots can show multiple (>2) methods, but the statistical tests are performed pairwise.
Results are read in a table whose rows and columns are model names (model1 and model2), sorted by their average score or rank.
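
To make the non-parametric view concrete, here is a small sketch of per-image ranking with draws, followed by sorting models by average rank. The function `average_ranks`, the model names, and the scores are hypothetical; giving tied models the average of their positions is an assumption about how draws could be handled, not a description of this PR's code.

```python
from statistics import mean


def average_ranks(scores, higher_is_better=True):
    """Rank the models on one image (1 = best); tied scores share the average position."""
    order = sorted(scores, reverse=higher_is_better)
    ranks = []
    for score in scores:
        first = order.index(score) + 1  # first position of this value
        count = order.count(score)      # number of models tied on it
        ranks.append(first + (count - 1) / 2)
    return ranks


# Hypothetical per-image metric values, one list per model, same image order.
models = {
    "model_a": [0.90, 0.40, 0.75, 0.60],
    "model_b": [0.80, 0.40, 0.70, 0.65],
    "model_c": [0.85, 0.30, 0.70, 0.50],
}
names = list(models)

# One list of ranks per image, in the same order as `names`.
ranks_per_image = [average_ranks(list(scores)) for scores in zip(*models.values())]

avg_rank = {name: mean(r[i] for r in ranks_per_image) for i, name in enumerate(names)}
ordered = sorted(names, key=avg_rank.get)  # best (lowest) average rank first
```

Passing `higher_is_better=False` flips the ordering for metrics where a lower value is better.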

Statistical tests:

Null hypothesis: methods are not significantly different.
Alternative hypothesis: method1 (row) > method2 (column).
Cell values are the "confidence to reject the null hypothesis" (1 - pvalue).
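
The cell value for one (row, column) pair can be sketched as follows. This is a self-contained, exact one-sided Wilcoxon signed-rank test that enumerates every sign assignment, so it is only practical for a handful of images; it is a stand-in written for illustration, not the implementation in this PR, which more plausibly relies on a statistics library. The function name `signed_rank_confidence` and the scores are hypothetical, and dropping zero differences is an assumption.

```python
from itertools import product


def signed_rank_confidence(scores_row, scores_col):
    """Return 1 - p for the one-sided alternative 'row > column' (exact, small n only)."""
    # Paired differences; zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(scores_row, scores_col) if a != b]
    if not diffs:
        return 0.0

    # Rank the absolute differences (1 = smallest), averaging ranks over ties.
    abs_sorted = sorted(abs(d) for d in diffs)
    ranks = [
        abs_sorted.index(abs(d)) + 1 + (abs_sorted.count(abs(d)) - 1) / 2
        for d in diffs
    ]

    # Test statistic: sum of the ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under the null hypothesis each difference is positive or negative with
    # probability 1/2, so every sign assignment is equally likely.
    n_at_least = sum(
        1
        for signs in product((0, 1), repeat=len(ranks))
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus
    )
    p_value = n_at_least / 2 ** len(ranks)
    return 1 - p_value


# Hypothetical per-image scores of two models on the same six images.
model1 = [0.90, 0.45, 0.75, 0.60, 0.82, 0.70]
model2 = [0.80, 0.40, 0.72, 0.66, 0.60, 0.30]

conf_1_over_2 = signed_rank_confidence(model1, model2)  # cell (model1, model2)
conf_2_over_1 = signed_rank_confidence(model2, model1)  # cell (model2, model1)
```

Because the alternative is one-sided, the two cells of a pair are not symmetric: a high value in (model1, model2) comes with a low value in (model2, model1).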

Example of parametric comparison (AULogPImO):

Example of non-parametric comparison from the parametric above:

List of refactors before merging feature branch:
https://github.com/jpcbertoldo/anomalib/blob/metrics/refactors/src/anomalib/utils/metrics/perimg/.refactors

Adds from this PR

Changes

Bug fix (non-breaking change which fixes an issue)
Refactor (non-breaking change which refactors the code base)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist

My code follows the pre-commit style and check guidelines of this project.
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing tests pass locally with my changes
I have added a summary of my changes to the CHANGELOG (not for minor changes, docs and tests).


jpcbertoldo · 2023-12-19T17:17:50Z

Replaced by openvinotoolkit#1557 to catch up with v1.

jpcbertoldo added 25 commits August 18, 2023 10:16

normalize axes to ax

baff328

implement save

d95231c

add and fix tests to save and load

af90b25

add demo notebook

101962e

make auLOGpimo boxplot plot functional

a5cf4ff

add imgix vs metric or rank plots

6df11f6

put models dict validation in common

19401a1

enable higher_is_better=False

baf2586

correct mistake in ranks and make rank plot show draws

c897f89

add pairwise statistical tests

20f440c

small changes and comparison of 3 models in nb

5ab33bc

add tests and fix small issues

26a76e6

[0] fpr plot bounds colors

eb0f421

complete some missing text in notebook

48d0085

complete some module docstrings

621f58e

make saturation colors and transparent superimposed colormap

837fa7a

Revert "make saturation colors and transparent superimposed colormap"

ee2cf55

This reverts commit 837fa7a.

write exception messages

c444bbe

make saturation colors configurable

9617076

add tests and fix small issues

f018d7c

Merge branch 'metrics/next04-compare-models' into metrics/next05-viz

2b2c96e

refactor save/load and comparisions

f120e7a

fix tests

31d869f

fix notebook kernelspec

104d53b

small fixes

33eef64

github-actions bot added Notebooks Tests labels Sep 5, 2023

roll back viz stuff (next pr)

7e1cd36

This was referenced Sep 5, 2023

metrics[5]: compare models [GSoC 2023 @ OpenVINO] #10

Closed

metrics[4] save results [GSoC 2023 @ OpenVINO] #9

Closed

jpcbertoldo changed the title ~~metrics[4 and 5 refactored] save results and compare models~~ metrics[4 and 5 refactored] save results and compare models [GSoC23 cont'd] Sep 6, 2023

fix types in boxplot

7e696b6

jpcbertoldo closed this Dec 19, 2023

jpcbertoldo deleted the metrics/next-03-04-save-load-compare branch February 9, 2024 12:19
