Add imageArray #24577
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@ydshieh Adding you as a first reviewer as you're always able to give good SWE advice :) This is quite a big design decision, so I will be following up with requests for reviews from others once we've ironed out the details here, if that's OK.
@amyeroberts Thanks for requesting me for the first review. I would like to hear a bit more from you, despite not having looked through the changes yet 🙏
Mostly. In most cases this won't be seen by users, because the image processors are called with a framework specified. If the user doesn't specify one, I could make sure to return numpy arrays at the end of processing with something like images = [image.numpy() for image in images] before passing them on. In the future, once the image transforms have been adapted to use the new object:

from transformers.image_transforms import resize
# The returned resized_image is an ImageObject in both cases
resized_image = image_processor.resize(image, size={"height": h, "width": w})
resized_image = resize(image, size={"height": h, "width": w})
It doesn't yet. This is just introducing the object and replacing the current numpy arrays to ensure everything still works as-is. Part of the next steps is updating the downstream logic, e.g. in the image transforms.
OK, thank you. In this PR you introduce the object; in the next PR you will update the transforms file. The only thing I am a bit worried about is whether the input/output of the methods in that file will be changed: those are still considered public methods (right? We should discuss this with @sgugger anyway) and we should keep them accepting numpy array input, and returning numpy arrays where they currently do. This might cause conversions between numpy array <--> ImageObject several times, which I am not 100% sure you would love. However, this is a question for the next PR, not for this one.
Thank you @amyeroberts. Other than the concern I mentioned (input/output - but that's a question for the next PR), I left a few questions.
Also, it might be a good idea to benchmark whether this new wrapper will slow down the array computation. (I don't think so - at least not much - but it would be nice to measure.)
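(For example, a rough micro-benchmark along these lines could measure the wrapper overhead. Illustrative only - the wrapper class and its import path follow the examples later in this thread.)

import timeit

import numpy as np
from transformers.image_utils import ImageArray

np_image = np.random.rand(3, 224, 224).astype(np.float32)
wrapped_image = ImageArray(np_image)

# Time the same ufunc on the raw array and on the wrapped array
raw_time = timeit.timeit(lambda: np.multiply(np_image, 0.5), number=10_000)
wrapped_time = timeit.timeit(lambda: np.multiply(wrapped_image, 0.5), number=10_000)
print(f"raw: {raw_time:.3f}s  wrapped: {wrapped_time:.3f}s")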
src/transformers/image_utils.py
def __array_ufunc__(self, ufunc, method, *inputs: Iterable[Any], **kwargs: Mapping[str, Any]) -> Any:
    if not all(isinstance(input, (np.ndarray, ImageObject) + np.ScalarType) for input in inputs):
        return NotImplemented

    scalars = ()
    for input in inputs:
        if isinstance(input, ImageObject):
            scalars += (input._data,)
        else:
            scalars += (input,)
    result = getattr(ufunc, method)(*scalars, **kwargs)
    return _output_wrapper(result)
Do we need to consider both np.ndarray and ImageObject here?
For the input or the output or something else?
The input we get when collecting the scalars, and the output we convert to ImageArray if the output is np.ndarray.
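(For context, a rough sketch of how this dispatch behaves end to end, assuming the __array_ufunc__ shown above: the wrapped data is extracted, the ufunc runs on plain arrays, and the result is re-wrapped by _output_wrapper.)

import numpy as np

img = ImageObject(np.ones((3, 224, 224)))  # hypothetical construction

# np.add is a ufunc, so numpy routes the call through __array_ufunc__ above
out = np.add(img, 1)
# `out` is expected to be an ImageObject again, wrapping an array of 2s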
Force-pushed from 6bac3f1 to 6cd731e.
torch.from_numpy can't accept the ImageObject. I suspect this is because the ImageObject's bytes can't be viewed directly, i.e. memoryview(img) is not possible.
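(A quick way to see the distinction - my assumption about why this fails: memoryview needs the C-level buffer protocol, which an ndarray provides but a plain Python wrapper object does not.)

import numpy as np

arr = np.zeros((3, 4, 4), dtype=np.uint8)
memoryview(arr)  # works: ndarray implements the buffer protocol

class Wrapper:  # a plain Python object holding an array, like ImageObject
    def __init__(self, data):
        self._data = data

try:
    memoryview(Wrapper(arr))
except TypeError as err:
    print(f"no buffer exposed by the wrapper: {err}")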
Force-pushed from 6cd731e to 7523f9a.
Hi @amyeroberts Could you remind me of the reason to remove it?
@ydshieh Of course :) Sorry, I should have added some explanatory comments. I actually just renamed ImageObject to ImageArray. I did remove the casting of inputs to ...
I left a few more comments, but in short:
- _output_wrapper: probably a recursive transformation is required (to take care of the case where `result` is a list/tuple of np.array), plus a shape check to determine if we should call `ImageArray`
- __init__: needs a shape guard
- __getitem__: avoid failing if the fetched data is 1-D (related to _output_wrapper above)
Let me know your thoughts. It's a bit tedious, I know, as ImageArray can only handle certain shapes.
def __getattribute__(self, __name: str) -> Any:
    if __name in ("_data",):
        return super().__getattribute__(__name)
This is necessary, but it would be nice to add a comment for this particular situation: in all other cases, the results will be ImageArray, or methods return that type (whenever possible).
Agreed :) Added one in ac483ac, LMK if this comment is OK
src/transformers/image_utils.py
""" | ||
Casts `result` to an `ImageArray` if it is a NumPy array. | ||
""" | ||
return ImageArray(result) if isinstance(result, np.ndarray) else result |
(The following comment might be overkill.) If result is a tuple (as some return values from numpy operations are), we should keep it as a tuple but change its elements to ImageArray whenever possible (i.e. when the shape is appropriate). See the example below.
One example I can find:
import numpy as np
from transformers.image_utils import ImageArray
np_data = np.ones(shape=(3, 224, 224))
img_array = ImageArray(np_data)
# A list of 3 `np.ndarray` of shape `(1, 224, 224)`
np.split(np_data, 3)
# Also a list of 3 `np.ndarray` of shape `(1, 224, 224)` - not ImageArray
np.split(img_array, 3)

But should np.split(img_array, 3) return a list of 3 ImageArray instead?
The above returns a single value of type list, so it's kind of OK to keep the elements as numpy arrays. But for functions/methods that return multiple values (i.e. a tuple), it seems to make more sense to return the same structure (a tuple) with the underlying elements transformed.
I've added logic so that it recursively calls _output_wrapper if the result is a list or tuple, and added splitting as an example in the tests in a46bcca.
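(Roughly, the recursive handling could look like the sketch below - an illustration of the idea rather than the exact code in a46bcca; ImageArray is the class from this PR, and the 3D/4D shape check is an assumption.)

import numpy as np

def _output_wrapper(result):
    # Preserve list/tuple structure and wrap the elements recursively
    if isinstance(result, (list, tuple)):
        return type(result)(_output_wrapper(element) for element in result)
    # Only wrap arrays with an image-like shape; everything else is returned as-is
    if isinstance(result, np.ndarray) and result.ndim in (3, 4):
        return ImageArray(result)
    return result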
Thanks for the iteration! 🚀
I have mixed feelings about this: it adds a lot of complexity to be able to save some additional metadata, and it seems like we will only be able to use it internally since it breaks the integration with PIL and torch. You know better than me if it's worth the extra complexity so I'll trust your judgement on that. If you think it's worth the addition, let's go for it!
@sgugger Yes, I understand. Tbh, I'd rather not have this class. Originally I wanted just a wrapper around the array that could be passed along with the image, instead of additional arguments everywhere in the processing methods and functions. Unfortunately, it's necessary to wrap the array like this to have the state persist across numpy array operations without needing tonnes of extra handling code. I'm going to quickly write up an alternative that passes arguments around and compare the two.
Excellent work! 👏 💯
I just added an alternative solution that lets an ImageArray object be seen as a numpy object by PIL.Image.fromarray. I still need to investigate more to understand torch.from_numpy deeper.
def __init__(
    self,
    data: Union["PIL.Image.Image", np.ndarray, "torch.Tensor", "tf.Tensor"],
I noticed that ImageArray can also receive data as an ImageArray object. So, maybe:

data: Union["PIL.Image.Image", np.ndarray, "torch.Tensor", "tf.Tensor", "ImageArray"]

or

from __future__ import annotations
data: Union["PIL.Image.Image", np.ndarray, "torch.Tensor", "tf.Tensor", ImageArray]
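(A minimal sketch of the second option - not the PR's actual signature: with postponed evaluation, the class name can appear unquoted in its own annotations.)

from __future__ import annotations

from typing import Union

import numpy as np

class ImageArray:
    def __init__(self, data: Union[np.ndarray, ImageArray]):
        # Unwrap if another ImageArray is passed in, otherwise coerce to ndarray
        self._data = data._data if isinstance(data, ImageArray) else np.asarray(data)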
        self._height = height
        self._width = width

    def __getattribute__(self, __name: str) -> Any:
I'm not very familiar with __getattribute__. Sorry if this is a silly question: why use __getattribute__ to expose all variables outside the class, instead of exposing them individually with @property?
By removing __getattribute__, it is possible to make PIL.Image.fromarray(img_array) work using __array_interface__, with the following steps:
Step 1: Comment out __getattribute__.
Step 2: Include these methods in your ImageArray:
@property
def __array_interface__(self):
    return {
        "version": 3,
        "shape": self._data.shape,
        "typestr": self._data.dtype.str,
        "strides": self._data.strides,  # Pillow takes anything, but not None
        "data": self._data.data,
    }

def tobytes(self):
    return self._data.tobytes()
Explanation: Pillow's fromarray(obj) method (here) makes use of __array_interface__ (a dictionary) to convert obj to an array without needing to go down a level (e.g. using C code or marshalling). So, by creating the __array_interface__ as documented here, we can make our object behave like a numpy array. 🤓
I'm not sure, but I believe the problem with __getattribute__ is that when it calls _output_wrapper with the __array_interface__ dictionary, somehow it ends up using a different dictionary.
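(As a quick standalone check of that mechanism - a toy class, not the PR's ImageArray - numpy itself also honours __array_interface__:)

import numpy as np

class Holder:
    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def __array_interface__(self):
        return self._data.__array_interface__

arr = np.asarray(Holder(np.ones((2, 2), dtype=np.uint8)))
print(arr.shape, arr.dtype)  # (2, 2) uint8, rebuilt from the interface dict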
Step 3: Now you can pass an ImageArray object to PIL.Image.fromarray, as in the example below:
import requests
import PIL.Image
# Mocking a pillow image
url = "https://images.pexels.com/photos/2869354/pexels-photo-2869354.jpeg"
im = PIL.Image.open(requests.get(url, stream=True).raw).convert('RGB')
# Transform pillow im to ImageArray
img_array = ImageArray(im)
# Transform the ImageArray object back to pillow. The `img_array` object will be seen as a numpy array by PIL.Image.
pillow = PIL.Image.fromarray(img_array)
pillow.save("test.jpg") # ---> Just to visualize that the image is "reconstructed" back to PIL.Image object.
torch.from_numpy behaves differently. I need to investigate it further.
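(One possible workaround, assuming the __array_interface__ from Step 2 is in place and using img_array from the Step 3 example: torch.from_numpy only accepts a real np.ndarray, but the wrapper can be materialized with np.asarray first.)

import numpy as np
import torch

# torch.from_numpy(img_array) fails because it type-checks for np.ndarray;
# converting explicitly first goes through the array protocols and works:
tensor = torch.from_numpy(np.asarray(img_array))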
Closing as superseded by #25464 - a bit uglier but simpler solution :) @ydshieh @rafaelpadilla @sgugger Thank you for your extensive reviews and help on this PR.
What does this PR do?
Adds a new class, ImageArray, for use as part of the image processing pipeline. It acts as an array container which we can use to store information about the image, e.g. the data format. This is the recommended way to create 'array-like' numpy objects with persistent attributes: https://numpy.org/doc/stable/user/basics.dispatch.html
The intention is to enable users to explicitly set information about the image, e.g. data_format, and have it carried through the processing pipeline in a stateful way. This addresses issues where information about the input image(s) is repeatedly and unnecessarily inferred in functions, or where it's ambiguous, e.g. an image of shape (3, 3, 3). See the linked issues.
Defining __array_ufunc__ and __array_function__ means ImageArray works with numpy operations, e.g. as in the sketch below.
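(Illustrative only; exact return types depend on the wrapper implementation.)

import numpy as np
from transformers.image_utils import ImageArray

img = ImageArray(np.zeros((3, 224, 224), dtype=np.float32))

# Ufuncs are dispatched through __array_ufunc__ ...
scaled = np.multiply(img, 1 / 255)
# ... and other numpy functions through __array_function__,
# so image metadata can persist through such operations
transposed = np.transpose(img, (1, 2, 0))
mean_value = np.mean(img)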
🔪 🔪 🔪 Tricky bits 🔪 🔪 🔪
Although this enables the ImageArray to be used directly in existing numpy logic, it does create issues when interfacing with other frameworks like torch or PIL: operations such as torch.from_numpy and PIL.Image.fromarray fail. This is because these libraries directly access the underlying memory using Python's buffer protocol. As far as I can tell, there is no direct way of exposing this on the Python side, and it would require writing C code to enable. This seems like overkill to me. The only case I know this to cause an issue is in the pix2struct image processor, which uses some torch-specific logic (which ideally would be removed).
As image processors are almost exclusively used with direct calls, i.e. image_processor(img, return_tensors="pt"), and the torch.tensor batch conversion still works, I don't expect this to cause many issues.
One way of getting this to work is to return numpy arrays when array methods are called: np.mean(arr) would return an ImageArray, while image_array.mean(...) would return a numpy array. Tbh, I wasn't able to completely figure out the interplay here, as torch.from_numpy seems to be just calling C code.
Next steps
- Update image_transforms to use the new object
- Handle num_channels when the resulting array is created