Add imageArray #24577

Closed
wants to merge 16 commits into from

amyeroberts
Collaborator

@amyeroberts amyeroberts commented Jun 29, 2023

What does this PR do?

Adds a new class ImageArray for use as part of the image processing pipeline. It acts as an array container, which we can use to store information about the image e.g. the data format. This is the recommended way to create 'array-like' numpy objects with persistent attributes: https://numpy.org/doc/stable/user/basics.dispatch.html

The intention is to enable users to explicitly set information about the image e.g. data_format and have that carried through the processing pipeline in a stateful way. This addresses issues where the input image(s) information is repeatedly inferred unnecessarily in functions, or when it's ambiguous e.g. image of shape (3, 3, 3). See:

Defining __array_ufunc__ and __array_function__ means numpy operations can be applied directly to an ImageArray, e.g.

>>> from transformers.image_utils import ImageArray
>>> import numpy as np

>>> x = np.random.randint(0, 256, (2, 2, 3))
>>> img = ImageArray(x)
>>> img
ImageArray([[[ 20 232 120]
  [197 244 147]]

 [[ 47 241  95]
  [ 73 251 140]]], data_format=channels_last, num_channels=3, shape=(2, 2, 3))

# Standard array operations - multiplication, addition etc. are possible
>>> img * 2
ImageArray([[[ 40 464 240]
  [394 488 294]]

 [[ 94 482 190]
  [146 502 280]]], data_format=channels_last, num_channels=3, shape=(2, 2, 3))

>>> img + img
ImageArray([[[ 40 464 240]
  [394 488 294]]

 [[ 94 482 190]
  [146 502 280]]], data_format=channels_last, num_channels=3, shape=(2, 2, 3))

# Numpy functions and array methods can be used
>>> np.mean(img, axis=-1)
ImageArray([[124.         196.        ]
 [127.66666667 154.66666667]], data_format=none, num_channels=0, shape=(2, 2))

>>> img.mean(axis=-1)
ImageArray([[124.         196.        ]
 [127.66666667 154.66666667]], data_format=none, num_channels=0, shape=(2, 2))

# Supports slicing
>>> img[:, :, 1]
ImageArray([[232 244]
 [241 251]], data_format=none, num_channels=0, shape=(2, 2))

# Supports type casting
>>> img.astype(np.float32)
ImageArray([[[ 20. 232. 120.]
  [197. 244. 147.]]

 [[ 47. 241.  95.]
  [ 73. 251. 140.]]], data_format=channels_last, num_channels=3, shape=(2, 2, 3))

# Can be cast back as a numpy array
>>> np.array(img)
array([[[ 20, 232, 120],
        [197, 244, 147]],

       [[ 47, 241,  95],
        [ 73, 251, 140]]])

# Is an instance of np.ndarray
>>> isinstance(img, np.ndarray)
True
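
For readers unfamiliar with the dispatch mechanism, here is a minimal sketch of the container pattern from the numpy docs linked above. It is not the code in this PR: `TaggedArray`, `_wrap` and `_unwrap` are hypothetical names, and the attribute is simply copied to every array result rather than re-derived.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class TaggedArray(NDArrayOperatorsMixin):
    """Hypothetical container: an ndarray plus one persistent attribute."""

    def __init__(self, data, data_format="channels_last"):
        self._data = np.asarray(data)
        self.data_format = data_format

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(obj) recover the plain ndarray.
        return np.asarray(self._data, dtype=dtype)

    def _unwrap(self, values):
        # Replace any TaggedArray by the ndarray it holds.
        return tuple(v._data if isinstance(v, TaggedArray) else v for v in values)

    def _wrap(self, result):
        # Carry the attribute over to any array result.
        if isinstance(result, np.ndarray):
            return TaggedArray(result, self.data_format)
        return result

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Called for ufuncs such as np.multiply (and for `*` via the mixin).
        return self._wrap(getattr(ufunc, method)(*self._unwrap(inputs), **kwargs))

    def __array_function__(self, func, types, args, kwargs):
        # Called for non-ufunc NumPy functions such as np.mean.
        # Simplification: only top-level positional arguments are unwrapped.
        return self._wrap(func(*self._unwrap(args), **kwargs))


x = np.arange(12).reshape(2, 2, 3)
img = TaggedArray(x, data_format="channels_last")
doubled = img * 2
averaged = np.mean(img, axis=-1)
print(type(doubled).__name__, doubled.data_format)
print(type(averaged).__name__, averaged.data_format)
```

Note that `averaged` still reports `channels_last` even though the channel axis has been reduced away, which is the kind of bookkeeping a real implementation has to handle when a result array is created.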

🔪 🔪 🔪 Tricky bits 🔪 🔪 🔪

Although this enables ImageArray to be used directly in existing numpy logic, it does create issues when interfacing with other frameworks like torch or PIL. The following operations fail:

PIL.Image.fromarray(img)
torch.from_numpy(img)

This is because these libraries directly access the underlying memory using Python's buffer protocol. As far as I can tell, there is no direct way of exposing this on the Python side, and enabling it would require writing C code. This seems like overkill to me. The only case I know of where this causes an issue is the pix2struct image processor, which uses some torch-specific logic (which ideally would be removed).

As image processors are almost exclusively used with direct calls, i.e. image_processor(img, return_tensors="pt"), and the torch.tensor batch conversion still works, I don't expect this to cause many issues.
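
The gap between numpy's conversion path and the buffer protocol can be reproduced without torch or PIL. This is a sketch under the assumption that the failing consumers rely on the buffer protocol, as described above; `Wrapper` is a hypothetical class, not the one in this PR. (Python 3.12 later added a `__buffer__` hook via PEP 688; it is not used here.)

```python
import numpy as np


class Wrapper:
    """Hypothetical pure-Python container around an ndarray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self._data, dtype=dtype)


wrapped = Wrapper(np.zeros((2, 2, 3), dtype=np.uint8))

# NumPy's own conversion works, because it goes through __array__.
plain = np.asarray(wrapped)

# memoryview() needs the buffer protocol, which this class does not expose.
try:
    memoryview(wrapped)
    exposes_buffer = True
except TypeError:
    exposes_buffer = False

print(plain.shape, exposes_buffer)
```

A caller who needs to hand the data to a buffer-based consumer can convert explicitly with `np.asarray(...)` first, at the cost of dropping the extra attributes.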

One way of getting this to work is to return numpy arrays when array methods are called:

  • np.mean(arr) would return an ImageArray, whereas image_array.mean(...) would return a numpy array.

Tbh, I wasn't able to completely figure out the interplay between this functionality as torch.from_numpy seems to be just calling C code.

Next steps

  • Adapt functionality in image_transforms to use the
  • Add some logic for array operations to remove repeatedly finding e.g. num_channels when resulting array is created

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@amyeroberts amyeroberts requested a review from ydshieh July 3, 2023 13:46
@amyeroberts
Collaborator Author

@ydshieh Adding you as a first reviewer as you're always able to give good SWE advice :) This is quite a big design decision, so I will be following up with requests for reviews from others once we've ironed out the details here, if that's OK.

@ydshieh
Collaborator

ydshieh commented Jul 3, 2023

@amyeroberts Thanks for requesting me for the first review. I would like to hear a bit more from you despite not looking through the changes yet 🙏

  • So ImageObject will be only used inside the methods of image processor/processing, and the arguments/return values will remain as numpy array?

  • From the changes in the image processor files, I don't see how this new class helps reduce/simplify the logic of processing images. Am I missing anything ...?

@amyeroberts
Collaborator Author

@ydshieh

So ImageObject will be only used inside the methods of image processor/processing, and the arguments/return values will remain as numpy array?

Mostly. In most cases, this won't be seen by users because the image processors are called with a framework specified e.g. return_tensors="pt"

If the user doesn't specify return_tensors, then the returned objects would be a list of ImageObject. Currently, return_tensors=None returns a list of numpy arrays.

I could make sure to return numpy arrays at the end of processing with something like:

images = [image.numpy() for image in images]

before passing to BatchFeature.

In the future, once the image transforms have been adapted to use the ImageObject attributes, users will see ImageObject returned if they call methods directly, either on the image processor or from the transforms library:

from transformers.image_transforms import resize

# Returned resized_image is an ImageObject object
resized_image = image_processor.resize(image, size={"height": h, "width": w})
resized_image = resize(image, size={"height": h, "width": w})

From the changes in image processor files, I don't see how this new class helps reducing/simplifying the logic of processing images. Am I missing anything ...?

It doesn't yet. This PR just introduces the object and replaces the current numpy arrays to ensure everything still works as-is. Part of the next steps is updating the logic in e.g. image_transforms, and some of the array logic, to simplify things and reduce repeated calculations.

@ydshieh
Collaborator

ydshieh commented Jul 3, 2023

OK Thank you.

In this PR, the image argument of preprocess remains a numpy array, which is ✅ . The return values should remain numpy arrays (if not returning tensors), which you will update in this PR ✅ .

In the next PR, you will update the file src/transformers/image_transforms.py to use ImageObject ✅ .

The only thing I am a bit worried about is whether the input/output of the methods in that file will change: those are still considered public methods (right? We should discuss this with @sgugger anyway), so they should keep accepting numpy array input and keep returning numpy arrays where they currently do. This might cause conversion between numpy array <--> ImageObject several times, which I am not 100% sure you would love.

However, this is a question for the next PR, not for this PR.

Collaborator

@ydshieh ydshieh left a comment

Thank you @amyeroberts . Other than the concern I mentioned (input/output - but that's a question for the next PR), I leave a few questions.

Also, it might be a good idea to benchmark whether this new wrapper slows down the array computation. (I don't think so, at least not much, but it would be nice to measure.)
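
One possible way to take that measurement is sketched below with `timeit`. `WrappedArray` is a hypothetical thin wrapper standing in for the class in this PR, so the numbers only indicate the dispatch overhead of the pattern, not of the actual implementation.

```python
import timeit

import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class WrappedArray(NDArrayOperatorsMixin):
    """Hypothetical thin wrapper, standing in for the PR's class."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap inputs, run the ufunc on plain arrays, then re-wrap.
        raw = tuple(x._data if isinstance(x, WrappedArray) else x for x in inputs)
        result = getattr(ufunc, method)(*raw, **kwargs)
        return WrappedArray(result) if isinstance(result, np.ndarray) else result


plain = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32)
wrapped = WrappedArray(plain)

# Time the same rescale-style operation on both objects.
plain_time = timeit.timeit(lambda: plain * (1 / 255), number=200)
wrapped_time = timeit.timeit(lambda: wrapped * (1 / 255), number=200)
print(f"ndarray: {plain_time:.4f}s  wrapper: {wrapped_time:.4f}s")
```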

src/transformers/image_utils.py
Comment on lines 154 to 191
def __array_ufunc__(self, ufunc, method, *inputs: Iterable[Any], **kwargs: Mapping[str, Any]) -> Any:
    if not all(isinstance(input, (np.ndarray, ImageObject) + np.ScalarType) for input in inputs):
        return NotImplemented

    scalars = ()
    for input in inputs:
        if isinstance(input, ImageObject):
            scalars += (input._data,)
        else:
            scalars += (input,)
    result = getattr(ufunc, method)(*scalars, **kwargs)
    return _output_wrapper(result)

Collaborator

Do we need to consider both np.ndarray and ImageObject here?

Collaborator Author

For the input or the output or something else?

The inputs are unwrapped when collecting the scalars, and the output is converted to an ImageArray if it is an np.ndarray.

@amyeroberts amyeroberts force-pushed the image-object branch 2 times, most recently from 6bac3f1 to 6cd731e Compare July 13, 2023 17:09
@amyeroberts amyeroberts changed the title Add image object Add imageArray Jul 18, 2023
@ydshieh
Collaborator

ydshieh commented Jul 18, 2023

Hi @amyeroberts

Could you remind me of the reason for removing ImageObject and only using ImageArray? I just need to refresh my memory, thank you 🙏

@amyeroberts
Collaborator Author

@ydshieh Of course :) Sorry, I should have added some explanatory comments.

I actually just renamed ImageObject to ImageArray - the class hasn't been removed.

I did remove casting inputs to ImageObject / ImageArray in the image processors, as it made the PR big and required tackling a few parts of the processing logic which I believe are out of scope.

Collaborator

@ydshieh ydshieh left a comment

I left a few more comments, but in short:

  • _output_wrapper:
      - probably a recursive transformation is required (to take care of the case where `result` is a list/tuple of np.array)
      - a shape check to determine if we should call `ImageArray`

  • __init__: need shape guard

  • __getitem__: avoid failing if the fetched data is 1-D (related to _output_wrapper above)

Let me know your thoughts. It's a bit tedious, I know, as ImageArray can only handle certain shapes.

src/transformers/image_utils.py

def __getattribute__(self, __name: str) -> Any:
    if __name in ("_data",):
        return super().__getattribute__(__name)

Collaborator

This is necessary, but it would be nice to add a comment explaining this particular case, since in all other cases the results will be ImageArray, or methods that return that type (whenever possible).

Collaborator Author

@amyeroberts amyeroberts Jul 19, 2023

Agreed :) Added one in ac483ac, LMK if this comment is OK

src/transformers/image_utils.py
"""
Casts `result` to an `ImageArray` if it is a NumPy array.
"""
return ImageArray(result) if isinstance(result, np.ndarray) else result

Collaborator

(The following comment might be overkill)

If result is a tuple (as with some return values from numpy operations), we should keep it as a tuple but change its elements to ImageArray whenever possible (proper shape).

See example below.

Collaborator

One example I can find:

import numpy as np
from transformers.image_utils import ImageArray

np_data = np.ones(shape=(3, 224, 224))
img_array = ImageArray(np_data)

# A list of 3 `np.ndarray` of shape `(1, 224, 224)`
np.split(np_data, 3)

# A list of 3 `np.ndarray` of shape `(1, 224, 224)`
np.split(img_array , 3)

But should np.split(img_array , 3) return a list of 3 ImageArray?

Collaborator

The above returns a single value of type list. It's kind of OK to keep its elements as numpy arrays. But for functions/methods that return multiple values (so a tuple), it seems to make more sense to return the same structure (a tuple) but with the underlying elements transformed.

Collaborator Author

I've added logic so that _output_wrapper is called recursively if the result is a list or tuple, and added splitting as an example in the tests (a46bcca).
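
A rough sketch of what a recursive wrapper with a shape check could look like follows. This is not the code from the commit: `ImageArraySketch` and `output_wrapper` are hypothetical names, and the rule that only 2-D and 3-D arrays are wrapped is an assumption made for illustration.

```python
import numpy as np


class ImageArraySketch:
    """Hypothetical stand-in for ImageArray that only accepts 2-D or 3-D data."""

    def __init__(self, data):
        data = np.asarray(data)
        if data.ndim not in (2, 3):
            raise ValueError(f"Expected a 2-D or 3-D array, got {data.ndim}-D")
        self._data = data


def output_wrapper(result):
    """Wrap arrays of a supported shape, recursing into lists and tuples."""
    if isinstance(result, (list, tuple)):
        wrapped = [output_wrapper(item) for item in result]
        # Preserve the container kind (named tuples degrade to plain tuples).
        return wrapped if isinstance(result, list) else tuple(wrapped)
    if isinstance(result, np.ndarray) and result.ndim in (2, 3):
        return ImageArraySketch(result)
    return result


parts = output_wrapper(np.split(np.ones((3, 4, 4)), 3))
row = output_wrapper(np.ones((3, 4, 4))[0, 0])  # 1-D, so left as an ndarray
```

With this shape check, fetching a 1-D slice returns a plain array instead of failing in the constructor.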

Collaborator

@ydshieh ydshieh left a comment

Thanks for the iteration! 🚀

@amyeroberts amyeroberts requested a review from sgugger July 19, 2023 16:29
Collaborator

@sgugger sgugger left a comment

I have mixed feelings about this: it adds a lot of complexity to be able to save some additional metadata, and it seems like we will only be able to use it internally since it breaks the integration with PIL and torch. You know better than me if it's worth the extra complexity so I'll trust your judgement on that. If you think it's worth the addition, let's go for it!

@amyeroberts
Collaborator Author

@sgugger Yes, I understand. Tbh, I'd rather not have this class. Originally I wanted just a wrapper around the array that could be passed along with the image instead of additional arguments everywhere in the processing methods and functions. Unfortunately, it's necessary to wrap the array like this to have the state persist through numpy array operations without needing tonnes of extra handling code.

I'm going to quickly write up an alternative with passing arguments around and compare the two.
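
To make the comparison concrete, the argument-passing alternative could look roughly like the sketch below. The function names, the `input_data_format` parameter and the shape heuristic are all hypothetical and are not taken from any follow-up PR; the point is only that an explicit argument resolves the ambiguous (3, 3, 3) case that shape inference cannot.

```python
import numpy as np


def infer_data_format(image, channel_counts=(1, 3)):
    """Guess the channel axis from the shape alone (hypothetical heuristic)."""
    first = image.shape[0] in channel_counts
    last = image.shape[-1] in channel_counts
    if first and last:
        raise ValueError(f"Ambiguous shape {image.shape}: pass the format explicitly")
    if first:
        return "channels_first"
    if last:
        return "channels_last"
    raise ValueError(f"Cannot infer the data format from shape {image.shape}")


def to_channels_last(image, input_data_format=None):
    """Return `image` in (height, width, channels) layout."""
    if input_data_format is None:
        input_data_format = infer_data_format(image)
    if input_data_format == "channels_first":
        return np.transpose(image, (1, 2, 0))
    return image


ambiguous = np.zeros((3, 3, 3))
# Inference alone fails here, but an explicit argument resolves it.
converted = to_channels_last(ambiguous, input_data_format="channels_first")
```

The trade-off is the one described above: every function in the pipeline needs the extra argument, but the arrays stay plain numpy arrays.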

Contributor

@rafaelpadilla rafaelpadilla left a comment

Excellent work! 👏 💯

Just added an alternative solution that lets PIL.Image.fromarray treat an ImageArray object as a numpy array. I still need to investigate further to understand torch.from_numpy better.


def __init__(
    self,
    data: Union["PIL.Image.Image", np.ndarray, "torch.Tensor", "tf.Tensor"],

Contributor

I noticed that ImageArray can also receive an ImageArray object as data.
So, maybe:
data: Union["PIL.Image.Image", np.ndarray, "torch.Tensor", "tf.Tensor", "ImageArray"]
or

from __future__ import annotations
data: Union["PIL.Image.Image", np.ndarray, "torch.Tensor", "tf.Tensor", ImageArray]

    self._height = height
    self._width = width

def __getattribute__(self, __name: str) -> Any:

Contributor

I'm not very familiar with __getattribute__. Sorry if this is a silly question: why use __getattribute__ to expose all the variables outside the class, instead of exposing them individually with @property?

By removing __getattribute__, it is possible to make PIL.Image.fromarray(img_array) work using __array_interface__, with the following steps:

Step 1: Comment out __getattribute__.

Step 2: include these methods in your ImageArray:

    @property        
    def __array_interface__(self):
        return {
                "version": 3,
                "shape": self._data.shape, 
                "typestr": self._data.dtype.str,
                "strides": self._data.strides, # Pillow takes anything, but not None
                "data": self._data.data,
                }
    
    def tobytes(self):
        return self._data.tobytes()

Explanation: Pillow's fromarray(obj) function (here) makes use of __array_interface__ (a dictionary) to read obj as an array without needing to go down a level (e.g. using C code or marshalling). So, by creating the __array_interface__ as documented here, we can make our object behave like a numpy array. 🤓

I'm not sure, but I believe the problem with __getattribute__ is that when it calls _output_wrapper with the __array_interface__ dictionary, it somehow ends up using a different dictionary.

Step 3: Now you can pass an ImageArray object to PIL.Image.fromarray, like in the example below:

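import requests
import PIL.Image  # a bare `import PIL` does not load the Image submodule

from transformers.image_utils import ImageArray  # the class added in this PR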

# Mocking a pillow image
url = "https://images.pexels.com/photos/2869354/pexels-photo-2869354.jpeg"
im = PIL.Image.open(requests.get(url, stream=True).raw).convert('RGB')

# Transform pillow im to ImageArray
img_array = ImageArray(im)

# Transform ImageArray object back to pillow. PIL.Image sees the `img_array` object as a numpy array.
pillow = PIL.Image.fromarray(img_array)
pillow.save("test.jpg")  # ---> Just to visualize that the image is "reconstructed" back to PIL.Image object.

torch.from_numpy behaves differently. I need to investigate it further.
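
The `__array_interface__` idea can be checked on the numpy side alone, without Pillow. In the sketch below, `InterfaceOnly` is a hypothetical class that delegates to the wrapped array's own interface dictionary instead of building one by hand; whether a given Pillow version accepts such an object is not verified here.

```python
import numpy as np


class InterfaceOnly:
    """Hypothetical object whose only array hook is __array_interface__."""

    def __init__(self, data):
        self._data = np.ascontiguousarray(data)

    @property
    def __array_interface__(self):
        # Delegate to the wrapped array's own interface dictionary.
        return self._data.__array_interface__


src = np.arange(6, dtype=np.uint8).reshape(2, 3)
obj = InterfaceOnly(src)

# np.asarray discovers the data through __array_interface__.
out = np.asarray(obj)
print(out.shape, out.dtype)
```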

@amyeroberts amyeroberts mentioned this pull request Aug 16, 2023
5 tasks
@amyeroberts
Collaborator Author

Closing as superseded by #25464, a slightly uglier but simpler solution :)

@ydshieh @rafaelpadilla @sgugger Thank you for your extensive reviews and help on this PR.
