Mean Average Precision producing incorrect values, maybe? #2746

Closed
Borda opened this issue Sep 16, 2024 Discussed in #1695 · 2 comments
Comments

@Borda
Member

Borda commented Sep 16, 2024

Discussed in #1695

Originally posted by dominic-simon April 6, 2023
Hello,

I'm trying to find the Mean Average Precision (mAP) over a set of images from the MSCOCO dataset that have been modified to cause the object detector in use (Faster-RCNN, in this case) to predict objects that don't exist. To be clear, these non-existent objects generally have little to no overlap with any ground-truth objects.

My issue is that when I compare the original image's mAP to the modified image's mAP, they are exactly the same, which does not seem correct. Here is what the images look like:
[Ground truths: 14_gt]
[Original image predictions: 14_benign]
[Modified image predictions: 14_adv]

To the best of my knowledge, these two images should have different mAPs, with the original image scoring much higher than the modified image, since the modified image has many false-positive predictions (all the boxes to the left of the man on the surfboard). However, that is not the case: both images receive a mAP of 0.85.

Can someone confirm that I am correct in thinking that these two images should receive different mAPs? If I'm wrong, then perhaps an explanation of where I'm going wrong would be useful? I'm not sure what I'm misunderstanding here, so any help is appreciated.

Here's my code, in case I'm doing something wrong there:

```python
import torch
import torchvision.transforms as transforms
import torchvision.models.detection as models
from torchvision.datasets import CocoDetection
from torchmetrics.detection.mean_ap import MeanAveragePrecision

import os
import numpy as np
from PIL import Image
from tqdm import tqdm
import matplotlib.pyplot as plt

PATCH_SIZE = 100
FOLDER_PATH = f'./SAC_PGD/{PATCH_SIZE}'
DEVICE = 'cuda'

model = models.fasterrcnn_resnet50_fpn_v2(weights='FasterRCNN_ResNet50_FPN_V2_Weights.COCO_V1').eval().to(DEVICE)

mAP = MeanAveragePrecision()

ms_coco = CocoDetection('../datasets/MSCOCO/val2017', '../datasets/MSCOCO/annotations/instances_val2017.json')

for i in tqdm(range(14, 15)):
    # Reformatting the mscoco ground truths
    b = []
    l = []
    s = []
    for j in range(len(ms_coco[i][1])):
        bboxes = ms_coco[i][1][j]['bbox']
        corrected = [bboxes[0], bboxes[1], bboxes[0]+bboxes[2], bboxes[1]+bboxes[3]]
        b.append(corrected)
        l.append(ms_coco[i][1][j]['category_id'])
        s.append(1.0)
    gt = [{'boxes':torch.from_numpy(np.array(b)).to(DEVICE),
           'labels':torch.from_numpy(np.array(l)).to(DEVICE),
           'scores':torch.from_numpy(np.array(s)).to(DEVICE)}]

    # Open the original image
    original = (np.asarray(ms_coco[i][0]) / 255).astype(np.float32)
    row, col = original.shape[0], original.shape[1]

    to_tensor = transforms.Compose([transforms.Resize((row, col)),
                                    transforms.ToTensor()])

    # Open the modified image and a mask showing where the modification is
    image_path = os.path.join(FOLDER_PATH, f'{i}_adv.png')
    mask_path = os.path.join(FOLDER_PATH, f'{i}_mask.png')
    image = Image.open(image_path).convert('RGB')
    mask = Image.open(mask_path).convert('L')
    image = np.transpose(to_tensor(image).numpy(), (1, 2, 0))
    mask = np.stack([to_tensor(mask).numpy()] * 3, axis=3)[0]

    # I was having some issues with data. This solves that 
    image = np.where(mask == 1, original, image)

    # Reformat the modified image for use in the pytorch Faster-RCNN
    image = torch.from_numpy(np.transpose(image, (2, 0, 1))).to(DEVICE)

    # Get modified and original image predictions
    with torch.no_grad():
        output = model([image])
        b_out = model([torch.from_numpy(np.transpose(original, (2, 0, 1))).to(DEVICE)])

    # Draw the bounding boxes onto the images
    from PIL import ImageDraw
    to_pil = transforms.ToPILImage()
    original = to_pil((original * 255).astype(np.uint8))
    attacked_image = to_pil((image * 255).type(torch.uint8))
    drawer = ImageDraw.Draw(original)
    drawer_copy = ImageDraw.Draw(attacked_image)
    bboxes = output[0]['boxes']
    bbboxes = b_out[0]['boxes']
    for box in bboxes:
        drawer_copy.rectangle([box[0], box[1], box[2], box[3]], outline=(255, 0, 0), width=3)
    #for box in ms_coco[i][1]:
    for box in bbboxes:
        #box = box['bbox']
        drawer.rectangle([box[0], box[1], box[2], box[3]], outline=(0, 255, 0), width=3)
    original.save(f'./{i}_benign.png')
    attacked_image.save(f'./{i}_adv.png')

    # Update the predictions used in the mAP.
    # I switch b_out and output to test the original and modified images' mAP, respectively.
    mAP.update(b_out, gt)

# Compute and show the final mAP.
total_mAP = mAP.compute()
print(total_mAP['map'])
```

Hi! Thanks for your contribution, great first issue!

@SkafteNicki
Member

See related issues: #1966, #1793, #1774

TLDR: the mAP metric is extremely tricky to build an intuition for, and this mostly boils down to how the precision-recall curve is calculated and how it is interpolated. It should never be evaluated on only a few samples, because the computed values will vary a lot.

To calculate the mAP in this case I am going to assume one thing: that the two boxes around the surfer and the surfboard have the highest confidence scores in the modified case (not unreasonable, since they actually match the objects). Why does this matter? Because to calculate the mAP score we first need to construct the precision-recall curve, which is done by sorting the boxes by their confidence score. In the related issues I go into more detail on how to do this, but essentially we just need to count whether each box is a true positive (matches a ground-truth box above a certain overlap threshold) or a false positive (no match). The two boxes around the surfer and surfboard are true positives; the rest are false positives. The table below walks through the cumulative counts, and the snippet after it sketches the same computation in code.

| Matches | Cumulative TP | Cumulative FP | Precision | Recall |
|---------|---------------|---------------|-----------|--------|
| Yes | 1 | 0 | 1/(1+0) = 1 | 1/2 |
| Yes | 2 | 0 | 2/(2+0) = 1 | 2/2 = 1 |
| No | 2 | 1 | 2/(2+1) = 2/3 | 2/2 = 1 |
| No | 2 | 2 | 2/(2+2) = 1/2 | 2/2 = 1 |
| No | 2 | 3 | 2/(2+3) = 2/5 | 2/2 = 1 |
| No | 2 | 4 | 2/(2+4) = 2/6 = 1/3 | 2/2 = 1 |
| ... | ... | ... | ... | ... |
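
Here is a minimal sketch of the same bookkeeping in code. The confidence scores are made up for illustration (the exact values do not matter as long as the two true positives rank highest), and the interpolation is a simplified version of the COCO-style 101-point scheme:

```python
# Detections on the modified image as (confidence, is_true_positive) pairs,
# already sorted by descending confidence; the scores are hypothetical.
detections = [(0.95, True), (0.90, True), (0.40, False),
              (0.35, False), (0.30, False), (0.25, False)]
num_gt = 2  # ground-truth boxes: the surfer and the surfboard

tp = fp = 0
precisions, recalls = [], []
for score, is_tp in detections:
    tp += is_tp
    fp += not is_tp
    precisions.append(tp / (tp + fp))  # matches the "Precision" column above
    recalls.append(tp / num_gt)        # matches the "Recall" column above

# 101-point interpolation: at each recall threshold, take the best precision
# achievable at that recall level or higher.
recall_thresholds = [i / 100 for i in range(101)]
ap = sum(
    max((p for p, r in zip(precisions, recalls) if r >= t), default=0.0)
    for t in recall_thresholds
) / len(recall_thresholds)
print(ap)  # 1.0 -- the trailing false positives never lower the interpolated curve
```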

As we can see, adding more "bad" false-positive boxes after the two correct ones have been found does not change the recall, only the precision. If we plot this (sorry for my drawing skills) it looks something like this:

[Sketch of the precision-recall curve: the red points are the data from the table above, and the blue line is roughly what the interpolated curve used for the mAP value will look like.]

The realization here is that it does not matter how many false positives come after the true positives have been found; the interpolated curve stays the same.
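
To see the same effect end-to-end, here is a small self-contained check with torchmetrics (the boxes, scores, and labels are made up, and the default backend needs pycocotools installed): one perfect detection on its own and the same detection plus three non-overlapping, lower-confidence false positives produce the same `map` value.

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

gt = [{"boxes": torch.tensor([[10.0, 10.0, 110.0, 110.0]]),
       "labels": torch.tensor([1])}]

# One perfect detection only.
clean = [{"boxes": torch.tensor([[10.0, 10.0, 110.0, 110.0]]),
          "scores": torch.tensor([0.9]),
          "labels": torch.tensor([1])}]

# The same detection plus three far-away, lower-confidence false positives.
noisy = [{"boxes": torch.tensor([[10.0, 10.0, 110.0, 110.0],
                                 [300.0, 300.0, 350.0, 350.0],
                                 [400.0, 10.0, 450.0, 60.0],
                                 [10.0, 400.0, 60.0, 450.0]]),
          "scores": torch.tensor([0.9, 0.3, 0.3, 0.3]),
          "labels": torch.tensor([1, 1, 1, 1])}]

for name, preds in [("clean", clean), ("noisy", noisy)]:
    metric = MeanAveragePrecision()
    metric.update(preds, gt)
    print(name, metric.compute()["map"])  # the same value (1.0) in both cases
```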

The implementation therefore works as intended.
Closing the issue.
