Mean Average Precision producing incorrect values, maybe? #2746

Closed
Borda opened this issue Sep 16, 2024 Discussed in #1695 · 2 comments
Comments

@Borda
Member

Borda commented Sep 16, 2024

Discussed in #1695

Originally posted by dominic-simon April 6, 2023
Hello,

I'm trying to find the Mean Average Precision (mAP) over a set of images from the MSCOCO dataset that have been modified to cause the object detector in use (Faster-RCNN, in this case) to predict objects that don't exist. To be clear, these non-existent objects generally have little to no overlap with any ground-truth objects.

My issue is that when I compare the original image's mAP to the modified image's mAP, they are exactly the same, which does not seem correct. Here is what the images look like:
[Ground truths: 14_gt]
[Original image predictions: 14_benign]
[Modified image predictions: 14_adv]

To the best of my knowledge, these two images should have different mAPs, with the original image scoring much higher than the modified image, since the modified image has many false-positive predictions (all the boxes to the left of the man on the surfboard). However, that is not the case: both images receive a mAP of 0.85.

Can someone confirm that I am correct in thinking that these two images should receive different mAPs? If I'm wrong, then perhaps an explanation of where I'm going wrong would be useful? I'm not sure what I'm misunderstanding here, so any help is appreciated.

Here's my code, in case I'm doing something wrong there:

```python
import torch
import torchvision.transforms as transforms
import torchvision.models.detection as models
from torchvision.datasets import CocoDetection
from torchmetrics.detection.mean_ap import MeanAveragePrecision

import os
import numpy as np
from PIL import Image
from tqdm import tqdm
import matplotlib.pyplot as plt

PATCH_SIZE = 100
FOLDER_PATH = f'./SAC_PGD/{PATCH_SIZE}'
DEVICE = 'cuda'

model = models.fasterrcnn_resnet50_fpn_v2(weights='FasterRCNN_ResNet50_FPN_V2_Weights.COCO_V1').eval().to(DEVICE)

mAP = MeanAveragePrecision()

ms_coco = CocoDetection('../datasets/MSCOCO/val2017', '../datasets/MSCOCO/annotations/instances_val2017.json')

for i in tqdm(range(14, 15)):
    # Reformatting the mscoco ground truths
    b = []
    l = []
    s = []
    for j in range(len(ms_coco[i][1])):
        bboxes = ms_coco[i][1][j]['bbox']
        corrected = [bboxes[0], bboxes[1], bboxes[0]+bboxes[2], bboxes[1]+bboxes[3]]
        b.append(corrected)
        l.append(ms_coco[i][1][j]['category_id'])
        s.append(1.0)
    gt = [{'boxes':torch.from_numpy(np.array(b)).to(DEVICE),
           'labels':torch.from_numpy(np.array(l)).to(DEVICE),
           'scores':torch.from_numpy(np.array(s)).to(DEVICE)}]

    # Open the original image
    original = (np.asarray(ms_coco[i][0]) / 255).astype(np.float32)
    row, col = original.shape[0], original.shape[1]

    to_tensor = transforms.Compose([transforms.Resize((row, col)),
                                    transforms.ToTensor()])

    # Open the modified image and a mask showing where the modification is
    image_path = os.path.join(FOLDER_PATH, f'{i}_adv.png')
    mask_path = os.path.join(FOLDER_PATH, f'{i}_mask.png')
    image = Image.open(image_path).convert('RGB')
    mask = Image.open(mask_path).convert('L')
    image = np.transpose(to_tensor(image).numpy(), (1, 2, 0))
    mask = np.stack([to_tensor(mask).numpy()] * 3, axis=3)[0]

    # I was having some issues with data. This solves that 
    image = np.where(mask == 1, original, image)

    # Reformat the modified image for use in the pytorch Faster-RCNN
    image = torch.from_numpy(np.transpose(image, (2, 0, 1))).to(DEVICE)

    # Get modified and original image predictions
    with torch.no_grad():
        output = model([image])
        b_out = model([torch.from_numpy(np.transpose(original, (2, 0, 1))).to(DEVICE)])

    # Draw the bounding boxes onto the images
    from PIL import ImageDraw
    to_pil = transforms.ToPILImage()
    original = to_pil((original * 255).astype(np.uint8))
    attacked_image = to_pil((image * 255).type(torch.uint8))
    drawer = ImageDraw.Draw(original)
    drawer_copy = ImageDraw.Draw(attacked_image)
    bboxes = output[0]['boxes']
    bbboxes = b_out[0]['boxes']
    for box in bboxes:
        drawer_copy.rectangle([box[0], box[1], box[2], box[3]], outline=(255, 0, 0), width=3)
    #for box in ms_coco[i][1]:
    for box in bbboxes:
        #box = box['bbox']
        drawer.rectangle([box[0], box[1], box[2], box[3]], outline=(0, 255, 0), width=3)
    original.save(f'./{i}_benign.png')
    attacked_image.save(f'./{i}_adv.png')

    # Update the predictions used in the mAP.
    # I switch b_out and output to test the original and modified images' mAP, respectively.
    mAP.update(b_out, gt)

# Compute and show the final mAP.
total_mAP = mAP.compute()
print(total_mAP['map'])
```

Hi! Thanks for your contribution, great first issue!

@SkafteNicki
Member

See related issues: #1966, #1793, #1774

TLDR: the mAP metric is extremely tricky to build an intuition for, and this mostly boils down to how the precision-recall curve is calculated and how it is interpolated. It should never be evaluated on only a few samples, because the computed values will vary a lot.

To calculate the mAP in this case I am going to assume one thing: that the two boxes around the surfer and the surfboard have the highest confidence scores in the modified case (not unreasonable, since they actually match the objects). Why does this matter? Because to calculate the mAP score we first need to construct the precision-recall curve, which is done by sorting the boxes by their confidence score. In the related issues I go into more detail on how to do this, but essentially we just need to count whether each box is a true positive (matches a ground-truth box above a certain overlap threshold) or a false positive (no match). The two boxes around the surfer and surfboard are true positives; the rest are false positives. The table below walks through the cumulative counts, and the snippet after it sketches the same computation in code.

| Matches | Cumulative TP | Cumulative FP | Precision | Recall |
|---------|---------------|---------------|-----------|--------|
| Yes | 1 | 0 | 1/(1+0) = 1 | 1/2 |
| Yes | 2 | 0 | 2/(2+0) = 1 | 2/2 = 1 |
| No | 2 | 1 | 2/(2+1) = 2/3 | 2/2 = 1 |
| No | 2 | 2 | 2/(2+2) = 1/2 | 2/2 = 1 |
| No | 2 | 3 | 2/(2+3) = 2/5 | 2/2 = 1 |
| No | 2 | 4 | 2/(2+4) = 2/6 = 1/3 | 2/2 = 1 |
| ... | ... | ... | ... | ... |
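
Here is a minimal sketch of the same bookkeeping in code. The confidence scores are made up for illustration (the exact values do not matter as long as the two true positives rank highest), and the interpolation is a simplified version of the COCO-style 101-point scheme:

```python
# Detections on the modified image as (confidence, is_true_positive) pairs,
# already sorted by descending confidence; the scores are hypothetical.
detections = [(0.95, True), (0.90, True), (0.40, False),
              (0.35, False), (0.30, False), (0.25, False)]
num_gt = 2  # ground-truth boxes: the surfer and the surfboard

tp = fp = 0
precisions, recalls = [], []
for score, is_tp in detections:
    tp += is_tp
    fp += not is_tp
    precisions.append(tp / (tp + fp))  # matches the "Precision" column above
    recalls.append(tp / num_gt)        # matches the "Recall" column above

# 101-point interpolation: at each recall threshold, take the best precision
# achievable at that recall level or higher.
recall_thresholds = [i / 100 for i in range(101)]
ap = sum(
    max((p for p, r in zip(precisions, recalls) if r >= t), default=0.0)
    for t in recall_thresholds
) / len(recall_thresholds)
print(ap)  # 1.0 -- the trailing false positives never lower the interpolated curve
```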

As we can see, adding more "bad" false-positive boxes after the two correct ones have been found does not change the recall, only the precision. If we plot this (sorry for my drawing skills) it looks something like this:

[Sketch of the precision-recall curve: the red points are the data from the table above, and the blue line is roughly what the interpolated curve used for the mAP value will look like.]

The realization here is that it does not matter how many false positives come after the true positives have been found; the interpolated curve stays the same.
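
To see the same effect end-to-end, here is a small self-contained check with torchmetrics (the boxes, scores, and labels are made up, and the default backend needs pycocotools installed): one perfect detection on its own and the same detection plus three non-overlapping, lower-confidence false positives produce the same `map` value.

```python
import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

gt = [{"boxes": torch.tensor([[10.0, 10.0, 110.0, 110.0]]),
       "labels": torch.tensor([1])}]

# One perfect detection only.
clean = [{"boxes": torch.tensor([[10.0, 10.0, 110.0, 110.0]]),
          "scores": torch.tensor([0.9]),
          "labels": torch.tensor([1])}]

# The same detection plus three far-away, lower-confidence false positives.
noisy = [{"boxes": torch.tensor([[10.0, 10.0, 110.0, 110.0],
                                 [300.0, 300.0, 350.0, 350.0],
                                 [400.0, 10.0, 450.0, 60.0],
                                 [10.0, 400.0, 60.0, 450.0]]),
          "scores": torch.tensor([0.9, 0.3, 0.3, 0.3]),
          "labels": torch.tensor([1, 1, 1, 1])}]

for name, preds in [("clean", clean), ("noisy", noisy)]:
    metric = MeanAveragePrecision()
    metric.update(preds, gt)
    print(name, metric.compute()["map"])  # the same value (1.0) in both cases
```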

The implementation therefore works as intended.
Closing the issue.
