Add imageArray #24577

amyeroberts · 2023-06-29T14:27:41Z

What does this PR do?

Adds a new class ImageArray for use as part of the image processing pipeline. It acts as an array container, which we can use to store information about the image e.g. the data format. This is the recommended way to create 'array-like' numpy objects with persistent attributes: https://numpy.org/doc/stable/user/basics.dispatch.html

The intention is to enable users to explicitly set information about the image e.g. data_format and have that carried through the processing pipeline in a stateful way. This addresses issues where the input image(s) information is repeatedly inferred unnecessarily in functions, or when it's ambiguous e.g. image of shape (3, 3, 3). See:

Defining __array_ufunc__ and __array_function__ means numpy operations can be applied directly to an ImageArray, e.g.

>>> from transformers.image_utils import ImageArray
>>> import numpy as np

>>> x = np.random.randint(0, 256, (2, 2, 3))
>>> img = ImageArray(x)
>>> img
ImageArray([[[ 20 232 120]
  [197 244 147]]

 [[ 47 241  95]
  [ 73 251 140]]], data_format=channels_last, num_channels=3, shape=(2, 2, 3))

# Standard array operations - multiplication, addition etc. are possible
>>> img * 2
ImageArray([[[ 40 464 240]
  [394 488 294]]

 [[ 94 482 190]
  [146 502 280]]], data_format=channels_last, num_channels=3, shape=(2, 2, 3))

>>> img + img
ImageArray([[[ 40 464 240]
  [394 488 294]]

 [[ 94 482 190]
  [146 502 280]]], data_format=channels_last, num_channels=3, shape=(2, 2, 3))

# Numpy functions and array methods can be used
>>> np.mean(img, axis=-1)
ImageArray([[124.         196.        ]
 [127.66666667 154.66666667]], data_format=none, num_channels=0, shape=(2, 2))

>>> img.mean(axis=-1)
ImageArray([[124.         196.        ]
 [127.66666667 154.66666667]], data_format=none, num_channels=0, shape=(2, 2))

# Supports slicing
>>> img[:, :, 1]
ImageArray([[232 244]
 [241 251]], data_format=none, num_channels=0, shape=(2, 2))

# Supports type casting
>>> img.astype(np.float32)
ImageArray([[[ 20. 232. 120.]
  [197. 244. 147.]]

 [[ 47. 241.  95.]
  [ 73. 251. 140.]]], data_format=channels_last, num_channels=3, shape=(2, 2, 3))

# Can be cast back as a numpy array
>>> np.array(img)
array([[[ 20, 232, 120],
        [197, 244, 147]],

       [[ 47, 241,  95],
        [ 73, 251, 140]]])

# Is an instance of np.ndarray
>>> isinstance(img, np.ndarray)
True
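For readers unfamiliar with the dispatch hooks, here is a minimal sketch of an array container in the style of the NumPy guide linked above. `TaggedArray` and its tag-propagation rule are hypothetical and are not the PR's `ImageArray`; in particular, this sketch is not an `np.ndarray` instance and does not handle nested arguments or `out=`.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


class TaggedArray(NDArrayOperatorsMixin):
    """Hypothetical stand-in for ImageArray: an ndarray plus a data_format tag."""

    def __init__(self, data, data_format=None):
        self._data = np.asarray(data)
        self.data_format = data_format

    def __array__(self, dtype=None, copy=None):
        # Lets np.array(obj) / np.asarray(obj) recover the plain ndarray.
        return self._data if dtype is None else self._data.astype(dtype)

    def _unwrap(self, value):
        return value._data if isinstance(value, TaggedArray) else value

    def _wrap(self, result):
        # Assumption: keep the tag only when the shape is unchanged.
        if isinstance(result, np.ndarray):
            fmt = self.data_format if result.shape == self._data.shape else None
            return TaggedArray(result, fmt)
        return result

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Reached for ufuncs such as np.multiply, and for `*`, `+` via the mixin.
        args = [self._unwrap(x) for x in inputs]
        return self._wrap(getattr(ufunc, method)(*args, **kwargs))

    def __array_function__(self, func, types, args, kwargs):
        # Reached for non-ufunc NumPy functions such as np.mean.
        args = [self._unwrap(x) for x in args]
        return self._wrap(func(*args, **kwargs))


img = TaggedArray(np.arange(12).reshape(2, 2, 3), data_format="channels_last")
doubled = img * 2                  # still a TaggedArray, tag preserved
averaged = np.mean(img, axis=-1)   # TaggedArray of shape (2, 2), tag dropped
```

The point of the sketch is only that both hooks unwrap their inputs, call the real NumPy routine, and re-wrap the result, which is what lets metadata travel with the data.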

🔪 🔪 🔪 Tricky bits 🔪 🔪 🔪

Although this enables the ImageArray to be used directly in existing numpy logic, it does create issues when interfacing with other libraries such as torch or PIL. The following operations fail:

PIL.Image.fromarray(img)
torch.from_numpy(img)

This is because these libraries directly access the underlying memory using Python's buffer protocol. As far as I can tell, there is no direct way of exposing this on the Python side, and enabling it would require writing C code, which seems like overkill to me. The only case I know of where this causes an issue is the pix2struct image processor, which uses some torch-specific logic (which ideally would be removed).

As image processors are almost exclusively used with direct calls i.e. image_processor(img, return_tensors="pt"), and the torch.tensor batch conversion still works, I don't expect this to cause many issues.
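A small sketch of the buffer-protocol gap described above, using only NumPy. `Wrapper` is a hypothetical pure-Python container, not the PR's class; it only shows that an object holding an ndarray does not itself export a buffer.

```python
import numpy as np


class Wrapper:
    """Hypothetical pure-Python container holding an ndarray in `_data`."""

    def __init__(self, data):
        self._data = np.asarray(data)


arr = np.zeros((2, 2, 3), dtype=np.uint8)

# An ndarray exports its memory through the buffer protocol.
view = memoryview(arr)

# A plain Python wrapper does not, so asking it for a buffer raises TypeError.
try:
    memoryview(Wrapper(arr))
    exports_buffer = True
except TypeError:
    exports_buffer = False
```

Python 3.12 later added a `__buffer__` hook (PEP 688) for Python-level classes; whether that would be enough for PIL or torch is not something this thread establishes.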

One way of getting this to work is to return numpy arrays when array methods are called:

np.mean(arr) would return an ImageArray, whereas image_array.mean(...) would return a numpy array.

Tbh, I wasn't able to completely figure out the interplay between this functionality as torch.from_numpy seems to be just calling C code.

Next steps

Adapt functionality in image_transforms to use the
Add some logic for array operations to remove repeatedly finding e.g. num_channels when resulting array is created

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

HuggingFaceDocBuilderDev · 2023-06-29T15:01:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

amyeroberts · 2023-07-03T13:47:23Z

@ydshieh Adding you as a first reviewer as you're always able to give good SWE advice :) This is quite a big design decision, so I will be following up with requests for reviews from others once we've ironed out the details here, if that's OK.

ydshieh · 2023-07-03T14:06:44Z

@amyeroberts Thanks for requesting me for the first review. I would like to hear a bit more from you despite not looking through the changes yet 🙏

So ImageObject will be only used inside the methods of image processor/processing, and the arguments/return values will remain as numpy array?
From the changes in image processor files, I don't see how this new class helps reduce/simplify the logic of processing images. Am I missing anything ...?

amyeroberts · 2023-07-03T14:20:15Z

@ydshieh

So ImageObject will be only used inside the methods of image processor/processing, and the arguments/return values will remain as numpy array?

Mostly. In most cases, this won't be seen by users because the image processors are called with a framework specified e.g. return_tensors="pt".

If the user doesn't specify return_tensors, then the returned objects would be a list of ImageObject. Currently, return_tensors=None returns a list of numpy arrays.

I could make sure to return numpy arrays at the end of processing with something like:

images = [image.numpy() for image in images]

before passing to BatchFeature.

In the future, once the image transforms have been adapted to use the ImageObject attributes, users will see ImageObject returned if they call methods directly, either on the image processor or from the transforms library:

from transformers.image_transforms import resize

# Returned resized_image is an ImageObject object
resized_image = image_processor.resize(image, size={"height": h, "width": w})
resized_image = resize(image, size={"height": h, "width": w})

From the changes in image processor files, I don't see how this new class helps reduce/simplify the logic of processing images. Am I missing anything ...?

It doesn't yet. This is just introducing the object and replacing the current numpy arrays to ensure everything still works as-is. Part of the next steps is updating logic in e.g. image_transforms and in some of the array logic to simplify things and reduce repeated calculations.

ydshieh · 2023-07-03T14:49:49Z

OK Thank you.

In this PR, the image argument of preprocess remains a numpy array, which is ✅ . The return values should remain numpy arrays (when not returning tensors), which you will update in this PR ✅ .

In the next PR, you will update the file src/transformers/image_transforms.py to use ImageObject ✅ .

The only thing I am a bit worried about is whether the inputs/outputs of the methods in that file will change. Those are still considered public methods (right? We should discuss this with @sgugger anyway), so they should keep accepting numpy array inputs and keep returning numpy arrays where they currently do. This might require converting between numpy array <--> ImageObject several times, which I am not 100% sure you would love.

However, this is a question for the next PR, not for this PR.

ydshieh

Thank you @amyeroberts . Other than the concern I mentioned (input/output - but that's a question for the next PR), I've left a few questions.

Also, it might be a good idea to benchmark whether this new wrapper slows down the array computation. (I don't think so - at least not much, but it would be nice to measure)

ydshieh · 2023-07-03T15:18:56Z

src/transformers/image_utils.py

+    def __array_ufunc__(self, ufunc, method, *inputs: Iterable[Any], **kwargs: Mapping[str, Any]) -> Any:
+        if not all(isinstance(input, (np.ndarray, ImageObject) + np.ScalarType) for input in inputs):
+            return NotImplemented
+
+        scalars = ()
+        for input in inputs:
+            if isinstance(input, ImageObject):
+                scalars += (input._data,)
+            else:
+                scalars += (input,)
+        result = getattr(ufunc, method)(*scalars, **kwargs)
+        return _output_wrapper(result)


Do we need to consider both np.ndarray and ImageObject here?

For the input or the output or something else?

The input we get when collecting the scalars and the output we convert to ImageArray if the output is np.ndarray

torch.from_numpy can't accept the ImageObject. I suspect this is because the ImageObject bytes can't be viewed directly i.e. memoryview(img) is not possible

ydshieh · 2023-07-18T16:12:34Z

Hi @amyeroberts

Could you remind me of the reason for removing ImageObject and only using ImageArray? I just need to refresh my memory, thank you 🙏

amyeroberts · 2023-07-18T17:10:43Z

@ydshieh Of course :) Sorry, I should have added some explanatory comments.

I actually just renamed ImageObject to ImageArray - the class hasn't been removed.

I did remove casting inputs to ImageObject / ImageArray in the image processors as it made the PR big and required tackling a few parts of the processing logic which I believe are out of scope.

ydshieh

I left a few more comments, but in short:

_output_wrapper:

- probably a recursive transformation is required (to take care of the case where `result` is list/tuple of np.array)
- a shape checking to determine if we should call `ImageArray`

__init__: need shape guard
__getitem__: avoid failing if the fetched data is 1-D (related to _output_wrapper above)

Let me know your thoughts. It's a bit tedious, I know, as ImageArray can only handle certain shapes.

ydshieh · 2023-07-19T07:08:10Z

src/transformers/image_utils.py

+
+    def __getattribute__(self, __name: str) -> Any:
+        if __name in ("_data",):
+            return super().__getattribute__(__name)


This is necessary, but it would be nice to add a comment explaining this particular case, since in all other cases the results will be ImageArray, or methods that return that type (whenever possible).

Agreed :) Added one in ac483ac, LMK if this comment is OK

ydshieh · 2023-07-19T08:23:08Z

src/transformers/image_utils.py

+        """
+        Casts `result` to an `ImageArray` if it is a NumPy array.
+        """
+        return ImageArray(result) if isinstance(result, np.ndarray) else result


(The following comment might be overkill)

If result is a tuple (as with some return values from numpy operations), we should keep it as a tuple but change its elements to ImageArray whenever possible (proper shape).

See example below.

One example I can find:

import numpy as np
from transformers.image_utils import ImageArray

np_data = np.ones(shape=(3, 224, 224))
img_array = ImageArray(np_data)

# A list of 3 `np.ndarray` of shape `(1, 224, 224)`
np.split(np_data, 3)

# A list of 3 `np.ndarray` of shape `(1, 224, 224)`
np.split(img_array, 3)

But should np.split(img_array, 3) return a list of 3 ImageArray?

The call above returns a single value of type list, so it's kind of OK to keep the elements as numpy arrays. But for functions/methods that return multiple values (i.e. a tuple), it seems to make more sense to return the same structure (a tuple) with the underlying elements transformed.

I've added logic so that it recursively calls _output_wrapper if the result is a list or tuple, and added splitting as an example in the tests a46bcca
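The recursive wrapping discussed in this thread could look roughly like the sketch below. `ImageLike` and `output_wrapper` are hypothetical names, and the rule that only 2-D and 3-D arrays count as images is an assumption made for illustration; the PR's actual shape check is not shown in this thread.

```python
import numpy as np


class ImageLike:
    """Hypothetical stand-in for ImageArray; assumed to accept only 2-D or 3-D arrays."""

    def __init__(self, data):
        data = np.asarray(data)
        if data.ndim not in (2, 3):
            raise ValueError(f"Expected a 2-D or 3-D array, got {data.ndim}-D")
        self._data = data


def output_wrapper(result):
    """Wrap image-shaped arrays, recursing into lists and tuples."""
    if isinstance(result, (list, tuple)):
        wrapped = [output_wrapper(item) for item in result]
        # Lists stay lists; tuples (including named tuples) become plain tuples.
        return wrapped if isinstance(result, list) else tuple(wrapped)
    if isinstance(result, np.ndarray) and result.ndim in (2, 3):
        return ImageLike(result)
    # Scalars, 1-D arrays and anything else pass through unchanged.
    return result


parts = output_wrapper(np.split(np.ones((3, 4, 4)), 3))  # list of 3 ImageLike
row = output_wrapper(np.ones(5))                         # stays a plain ndarray
```

Guarding on shape before wrapping is what keeps 1-D results, such as a single row fetched by indexing, from failing in the constructor.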

ydshieh

Thanks for the iteration! 🚀

sgugger

I have mixed feelings about this: it adds a lot of complexity to be able to save some additional metadata, and it seems like we will only be able to use it internally since it breaks the integration with PIL and torch. You know better than me if it's worth the extra complexity so I'll trust your judgement on that. If you think it's worth the addition, let's go for it!

amyeroberts · 2023-07-25T10:42:56Z

@sgugger Yes, I understand. Tbh, I'd rather not have this class. Originally I wanted just a wrapper around the array that could be passed along with the image instead of additional arguments everywhere in the processing methods and functions. Unfortunately, it's necessary to wrap the array like this to have the state persist through numpy array operations without needing tonnes of extra handling code.

I'm going to quickly write up an alternative with passing arguments around and compare the two.
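For comparison, the "passing arguments around" alternative might be sketched as below. Both function names and the `input_data_format` parameter are hypothetical illustrations, not the API of the follow-up PR, and the sketch assumes a 3-D image with 1 or 3 channels.

```python
import numpy as np


def infer_channel_axis(image, num_channels=(1, 3)):
    """Guess whether a 3-D image is channels-first or channels-last from its shape."""
    first = image.shape[0] in num_channels
    last = image.shape[-1] in num_channels
    if first and last:
        # e.g. shape (3, 3, 3): the layout cannot be decided from the shape alone.
        raise ValueError(f"Ambiguous shape {image.shape}: pass the data format explicitly")
    if first:
        return "channels_first"
    if last:
        return "channels_last"
    raise ValueError(f"Cannot infer a channel axis from shape {image.shape}")


def to_channels_last(image, input_data_format=None):
    """Return `image` as (height, width, channels), inferring the format only if needed."""
    if input_data_format is None:
        input_data_format = infer_channel_axis(image)
    if input_data_format == "channels_first":
        return np.transpose(image, (1, 2, 0))
    return image


resized_input = to_channels_last(np.zeros((3, 8, 8)))  # shape (8, 8, 3)
```

The trade-off is visible here: no wrapper class is needed, but every function in the pipeline has to accept and forward the extra argument.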

rafaelpadilla

Excellent work! 👏 💯

Just added an alternative solution that lets PIL.Image.fromarray treat an ImageArray object as a numpy array. I still need to investigate torch.from_numpy further to understand it.

rafaelpadilla · 2023-07-27T18:30:43Z

src/transformers/image_utils.py

+
+    def __init__(
+        self,
+        data: Union["PIL.Image.Image", np.ndarray, "torch.Tensor", "tf.Tensor"],


I noticed that ImageArray can also receive an ImageArray object as data.
So, maybe:
data: Union["PIL.Image.Image", np.ndarray, "torch.Tensor", "tf.Tensor", "ImageArray"]
or

from __future__ import annotations

data: Union["PIL.Image.Image", np.ndarray, "torch.Tensor", "tf.Tensor", ImageArray]
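A minimal sketch of the first option (the quoted forward reference), reduced to NumPy inputs so it runs standalone. The class body is a hypothetical simplification and not the PR's constructor.

```python
from typing import Union

import numpy as np


class ImageArray:
    """Hypothetical minimal class, only here to show the self-referencing annotation."""

    def __init__(self, data: Union[np.ndarray, "ImageArray"]) -> None:
        # The quoted name is a forward reference: the class is not yet bound
        # while its own body is being executed.
        self._data = data._data if isinstance(data, ImageArray) else np.asarray(data)


a = ImageArray(np.ones((2, 2)))
b = ImageArray(a)  # accepted: unwraps the existing instance
```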

rafaelpadilla · 2023-07-27T23:45:50Z

src/transformers/image_utils.py

+        self._height = height
+        self._width = width
+
+    def __getattribute__(self, __name: str) -> Any:


I'm not very familiar with __getattribute__. Sorry if this is a dumb question: why use __getattribute__ to expose all variables outside the class, instead of exposing them individually with @property?

By removing __getattribute__, it is possible to make PIL.Image.fromarray(img_array) work using __array_interface__, with the following steps:

Step 1: Comment out __getattribute__.

Step 2: Include these methods in your ImageArray:

@property
def __array_interface__(self):
    return {
        "version": 3,
        "shape": self._data.shape,
        "typestr": self._data.dtype.str,
        "strides": self._data.strides,  # Pillow takes anything, but not None
        "data": self._data.data,
    }

def tobytes(self):
    return self._data.tobytes()

Explanation: Pillow's fromarray(obj) method (here) makes use of __array_interface__ (a dictionary) to convert obj to an array without needing to go down a level (e.g. using C code or marshalling). So, by creating the __array_interface__ as documented here, we can make our object behave like a numpy array. 🤓

I'm not sure, but I believe the problem with __getattribute__ is that when it calls _output_wrapper with the __array_interface__ dictionary, somehow a different dictionary ends up being used.

Step 3: Now you can pass an ImageArray object to PIL.Image.fromarray, like in the example below:

import requests
import PIL.Image

# Mocking a pillow image
url = "https://images.pexels.com/photos/2869354/pexels-photo-2869354.jpeg"
im = PIL.Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Transform pillow im to ImageArray
img_array = ImageArray(im)

# Transform ImageArray object back to pillow. The `img_array` object will be seen like a numpy array by PIL.Image.
pillow = PIL.Image.fromarray(img_array)
pillow.save("test.jpg")  # Just to visualize that the image is "reconstructed" back to a PIL.Image object.

torch.from_numpy behaves differently. I need to investigate it further.
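The `__array_interface__` mechanism from Step 2 can also be checked without Pillow, since NumPy itself consumes the same attribute. `Holder` below is a hypothetical container that simply forwards the wrapped array's own interface dictionary; it says nothing about how torch.from_numpy behaves.

```python
import numpy as np


class Holder:
    """Hypothetical container exposing its ndarray through __array_interface__."""

    def __init__(self, data):
        self._data = np.ascontiguousarray(data)

    @property
    def __array_interface__(self):
        # Forward the ndarray's own description of its shape, dtype and memory.
        return self._data.__array_interface__


src = np.arange(6, dtype=np.uint8).reshape(2, 3)
recovered = np.asarray(Holder(src))  # NumPy reads the interface dict to build an array
```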

amyeroberts · 2023-08-16T16:58:04Z

Closing as superseded by #25464, a bit uglier but simpler solution :)

@ydshieh @rafaelpadilla @sgugger Thank you for your extensive reviews and help on this PR.

amyeroberts requested a review from ydshieh July 3, 2023 13:46

ydshieh reviewed Jul 3, 2023

amyeroberts force-pushed the image-object branch 2 times, most recently from 6bac3f1 to 6cd731e Compare July 13, 2023 17:09

amyeroberts added 11 commits July 18, 2023 15:23

Add image object

24b088d

Update test

2f8f3bc

Add tests

00f94a9

Resolve test

307d604

Add docstrings

4e93b60

ImageObject -> ImageArray

3ec84fa

Add numpy() method

2a612fe

Remove __array__ method

977931b

Add tests for item assignment

522792b

Don't convert to image array yet in image processors

0a57acf

Update docstring

7523f9a

amyeroberts force-pushed the image-object branch from 6cd731e to 7523f9a Compare July 18, 2023 14:55

Fix up

e0039ff

amyeroberts changed the title ~~Add image object~~ Add imageArray Jul 18, 2023

ydshieh reviewed Jul 19, 2023

amyeroberts added 4 commits July 19, 2023 14:57

Only create ImageArray if valid shape and type

01cbfcb

Handle cases when lists/tuples of numpy arrays are returned

a46bcca

Update input type info

5b1fd20

Tidy up comments and add comments

ac483ac

ydshieh approved these changes Jul 19, 2023

amyeroberts requested a review from sgugger July 19, 2023 16:29

sgugger reviewed Jul 19, 2023

rafaelpadilla reviewed Jul 27, 2023

amyeroberts mentioned this pull request Aug 16, 2023

Input data format #25464

Merged

5 tasks

amyeroberts closed this Aug 16, 2023
