Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DETR] Update the processing to adapt masks & bboxes to reflect padding #28363

Merged
merged 7 commits into from
Feb 13, 2024

Conversation

amyeroberts
Copy link
Collaborator

@amyeroberts amyeroberts commented Jan 5, 2024

What does this PR do?

Fixes an issue with the processing of batches of images for DETR and DETR related models.

Previously, the annotations for the models - specifically the masks and bboxes, wouldn't be updated to account for the new image sizes if padding occurred. This PR pads the masks to match their corresponding masks.

The following images show the processing behaviour for annotations when there are two images in the batch, of different sizes.

  • Each image is resized so that it's shortest edge is 800, and the other edge is resized to maintain the aspect ratio. If the resulting longest edge is longer than 1333, then the longest edge is resized to 1333 and the shortest edge resized to maintain the aspect ratio.
  • The images are padded to the largest height and width dimensions in the batch
  • The masks are padded to match their respective image's padding.
  • The bounding box values are readjusted to account for the padded image size

Fixes #28153

Bounding boxes

In the previous processing logic, there were two possible scenarios:

  • If do_normalize=False then no action is needed. The output bboxes are not in relative format.
  • If do_normalize=True the relative coordinates of the bbox needs to be updated to account for the additional pixels when padding.

This PR:

  • Adds a new argument do_convert_annotations which enables the user to control whether the bboxes are converted independent of do_normalize. This is useful because 1) the normalization of bounding boxes is independent of the pixel values 2) The current normalize_annotations logic both normalizes AND converts to a different bbox format ((x0, y0, x1, y1) -> (center_x, center_y, width, height))
  • Conditionally updates the bounding boxes wrt the padded images only if do_convert_annotations=True. If do_convert_annotations=False this isn't necessary.

Here we see the input and output images, and their bbox annotations.

On main:
annotations_original

On this branch:
annotations_fixed

Masks

Masks are updated so they have the same padding as their corresponding image.

Here are the input and output images and masks:

On main:
masks_0_original

On this branch:
masks_0_fixed

On main:
masks_1_original

On this branch:
masks_1_fixed

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts amyeroberts changed the title Update the processing so bbox coords are adjusted for padding [DETR] Update the processing to adapt masks to reflect padding Feb 8, 2024
@amyeroberts amyeroberts marked this pull request as ready for review February 8, 2024 21:57
@amyeroberts amyeroberts changed the title [DETR] Update the processing to adapt masks to reflect padding [DETR] Update the processing to adapt masks & bboxes to reflect padding Feb 9, 2024
Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @amyeroberts , LGTM.

I would suggest to make the docstring in normalize_annotation more precisely: i.e. mention it also change the coordinates to relative one.

    def normalize_annotation(self, annotation: Dict, image_size: Tuple[int, int]) -> Dict:
        """
        Normalize the boxes in the annotation from `[top_left_x, top_left_y, bottom_right_x, bottom_right_y]` to
        `[center_x, center_y, width, height]` format.
        """
        return normalize_annotation(annotation, image_size=image_size)

And also make the fact that padding is done in bottom-right explicitly.

Both of these could be done in follow up PRs.

@amyeroberts amyeroberts merged commit bd4b83e into huggingface:main Feb 13, 2024
17 of 18 checks passed
@amyeroberts amyeroberts deleted the fix-detr-bbox-processing branch February 13, 2024 18:27
sbucaille pushed a commit to sbucaille/transformers that referenced this pull request Feb 14, 2024
…ding (huggingface#28363)

* Update the processing so bbox coords are adjusted for padding

* Just pad masks

* Tidy up, add tests

* Better tests

* Fix yolos and mark as slow for pycocotols

* Fix yolos - return_tensors

* Clarify padding and normalization behaviour
itazap pushed a commit that referenced this pull request May 14, 2024
…ding (#28363)

* Update the processing so bbox coords are adjusted for padding

* Just pad masks

* Tidy up, add tests

* Better tests

* Fix yolos and mark as slow for pycocotols

* Fix yolos - return_tensors

* Clarify padding and normalization behaviour
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Annotations not being transformed after padding on Deformable DETR preprocessing
3 participants