Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize floating point cast #27249

Merged

Conversation

amyeroberts
Copy link
Collaborator

What does this PR do?

This is done if the input image isn't of floating type. Issues can occur when do_rescale=False is set in an image processor. When this happens, the image passed to the call is of type uint8 because of the type casting that happens in resize because of the PIL image library. As the mean and std values are cast to match the image dtype, this can cause NaNs and infs to appear in the normalized image, as the floating values being used to divide the image are now set to 0.

The reason the mean and std values are cast is because previously they were set as float32 by default. However, if the input image was of type float16, the normalization would result in the image being upcast to float32 too.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Nov 2, 2023

The documentation is not available anymore as the PR was closed or merged.

Copy link
Contributor

@rafaelpadilla rafaelpadilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow! Nice finding! :)

Thank you for fixing it.

@@ -278,7 +278,7 @@ def test_resize(self):
self.assertEqual(resized_image.shape, (4, 30, 40))

def test_normalize(self):
image = np.random.randint(0, 256, (224, 224, 3)) / 255
image = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32) / 255
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this place need to cast to fp32?
(we have a test case below for it IIUC.)
If we cast here, we no longer test with python float numbers, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was just a case of my laziness keeping it in float32, because otherwise it's float64 but it all works if I remove the cast!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I didn't realize this ...


# Test image with 4 channels is normalized correctly
image = np.random.randint(0, 256, (224, 224, 4)) / 255
mean = (0.5, 0.6, 0.7, 0.8)
std = (0.1, 0.2, 0.3, 0.4)
expected_image = (image - mean) / std
expected_image = (image.astype(np.float32) - mean) / std
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question

)

# Test float32 image input keeps float32 dtype
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I mean previously: we have the test case as above ..(?)

@ydshieh
Copy link
Collaborator

ydshieh commented Nov 3, 2023

as the floating values being used to divide the image are now set to 0.

could you explain this a bit more?

@amyeroberts
Copy link
Collaborator Author

as the floating values being used to divide the image are now set to 0.

could you explain this a bit more?

Sure! So for a lot of the image processors, the default normalization constants are floats e.g. (0.5, 0.5, 0.5). In the current normalization implementation, mean and std are cast to the dtype of the image. If the input image is e.g. int32, then the normalization constants are cast to int and would become e.g. (0, 0, 0). So when normalizing with img - mean / std, we can end up with division by zero errors

@ydshieh
Copy link
Collaborator

ydshieh commented Nov 9, 2023

as the floating values being used to divide the image are now set to 0.

could you explain this a bit more?

Sure! So for a lot of the image processors, the default normalization constants are floats e.g. (0.5, 0.5, 0.5). In the current normalization implementation, mean and std are cast to the dtype of the image. If the input image is e.g. int32, then the normalization constants are cast to int and would become e.g. (0, 0, 0). So when normalizing with img - mean / std, we can end up with division by zero errors

I see, thank you a lot for the detail.

Copy link
Collaborator

@ydshieh ydshieh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

This is done if the input image isn't of floating type. Issues can occur when do_rescale=False is set in an image processor. When this happens, the image passed to the call is of type uint8 becuase of the type casting that happens in resize because of the PIL image library. As the mean and std values are cast to match the image dtype, this can cause NaNs and infs to appear in the normalized image, as the floating values being used to divide the image are now set to 0.

The reason the mean and std values are cast is because previously they were set as float32 by default. However, if the input image was of type float16, the normalization would result in the image being upcast to float32 too.
@amyeroberts amyeroberts merged commit ed115b3 into huggingface:main Nov 10, 2023
21 checks passed
@amyeroberts amyeroberts deleted the normalize-floating-point-cast branch November 10, 2023 15:35
@amyeroberts amyeroberts mentioned this pull request Nov 10, 2023
4 tasks
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
* Normalize image - cast input images to float32.

This is done if the input image isn't of floating type. Issues can occur when do_rescale=False is set in an image processor. When this happens, the image passed to the call is of type uint8 becuase of the type casting that happens in resize because of the PIL image library. As the mean and std values are cast to match the image dtype, this can cause NaNs and infs to appear in the normalized image, as the floating values being used to divide the image are now set to 0.

The reason the mean and std values are cast is because previously they were set as float32 by default. However, if the input image was of type float16, the normalization would result in the image being upcast to float32 too.

* Add tests

* Remove float32 cast
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants