feat(python): design an image extension type #1272

rok · 2023-09-13T02:13:16Z

See #1199.

changhiskhan

is it possible to have a super type interface? e.g., for higher level applications, they may only care about showing an image, or getting a numpy array, and not necessarily care whether the underlying image is represented as string uri, tensor, or bytes.
can we add some methods to convert from one extension type to another?

(Totally fine if the answer is we'll address it later, let's focus on delivering the immediate use case ofc).

jrabary · 2023-09-14T12:08:59Z

Having the ability to decode 16-bit images would be great (for example, a depth map saved in a PNG file).
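To make the 16-bit request concrete: a 16-bit grayscale PNG stores every sample as two big-endian bytes, so a decoder that returns 8-bit values would lose depth precision. The following is a minimal stdlib-only sketch, not the decoder used in this PR. The helper names `make_gray16_png` and `decode_gray16_png` are hypothetical, and the sketch assumes non-interlaced grayscale data with filter type 0 only.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_gray16_png(rows):
    """Encode rows of integers (0..65535) as a 16-bit grayscale PNG."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 16, color type 0 (grayscale),
    # compression 0, filter method 0, no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 16, 0, 0, 0, 0)
    # Each scanline starts with filter byte 0 (None), then big-endian samples.
    raw = b"".join(b"\x00" + struct.pack(f">{width}H", *row) for row in rows)
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


def decode_gray16_png(data):
    """Decode the restricted PNG subset written by make_gray16_png."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, header = 8, b"", None
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        kind = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC
        if kind == b"IHDR":
            header = struct.unpack(">IIBBBBB", body)
        elif kind == b"IDAT":
            idat += body
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color, _, _, interlace = header
    if (depth, color, interlace) != (16, 0, 0):
        raise ValueError("sketch only handles 16-bit grayscale, non-interlaced")
    raw = zlib.decompress(idat)
    stride = 1 + 2 * width  # filter byte + two bytes per sample
    rows = []
    for y in range(height):
        line = raw[y * stride:(y + 1) * stride]
        if line[0] != 0:
            raise ValueError("sketch only handles filter type 0")
        rows.append(list(struct.unpack(f">{width}H", line[1:])))
    return rows


# A tiny "depth map" with values that do not fit in 8 bits.
depth_map = [[0, 256, 1000], [40000, 65535, 12345]]
decoded = decode_gray16_png(make_gray16_png(depth_map))
```

The round trip keeps values above 255 intact, which is the property a 16-bit-aware decoder would need to preserve.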

wjones127

Could we get docstrings for each of these methods? Preferably with doctest-verified examples.

python/python/lance/arrow.py

rok · 2023-09-19T17:35:23Z

@wjones127 I assume we want to stay in arrow as much as possible, but the encoder/decoder and the TF bridge use numpy here.
Do you have an idea how to avoid numpy at the arrow <-> numpy bridge?

wjones127 · 2023-09-19T18:26:15Z

> I assume we want to stay in arrow as much as possible, but the encoder/decoder and the TF bridge use numpy here.
> Do you have an idea how to avoid numpy at the arrow <-> numpy bridge?

I think there are two concerns:

1. It would be nice to not require numpy as a dependency.
2. We should make sure we aren't making unnecessary data copies.

If this is all internal, I think it's fine to accept this cost for now. We might just have to wait for the ecosystem to get more Arrow native before we can drop numpy as a dependency for this.
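The copy concern can be illustrated without Arrow or numpy. This is only an analogy using the Python buffer protocol, with a `bytearray` standing in (as an assumption) for a decoded pixel buffer; it is not how the arrow <-> numpy bridge in this PR is implemented.

```python
# Stand-in for a decoded 2x2 RGB image stored as a flat byte buffer.
pixels = bytearray(range(12))

copied = bytes(pixels)[0:3]       # materializes a new, independent bytes object
shared = memoryview(pixels)[0:3]  # zero-copy view into the same buffer

pixels[0] = 255                   # mutate the underlying buffer in place

copy_sees_change = copied[0] == 255  # the copy still holds the old value
view_sees_change = shared[0] == 255  # the view reflects the new value
```

A view that tracks the source buffer is what "no unnecessary copies" means in practice; a copy is only needed when the consumer must own or reshape the data independently.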

rok · 2023-09-20T04:36:44Z

> If this is all internal, I think it's fine to accept this cost for now. We might just have to wait for the ecosystem to get more Arrow native before we can drop numpy as a dependency for this.

Indeed. Let's keep an eye out for more arrow native ways to move images to tensors while we wait.

> Could we get docstrings for each of these methods? Preferably with doctest-verified examples.

Added.

rok · 2023-09-20T04:44:38Z

> is it possible to have a super type interface? e.g., for higher level applications, they may only care about showing an image, or getting a numpy array, and not necessarily care whether the underlying image is represented as string uri, tensor, or bytes.

I think we should add this now. I'd only propose we do it as a separate PR to manage scope here.

> can we add some methods to convert from one extension type to another?

We now have from_uris, read_uris, image_to_tensor, to_tf and to_encoded. Are we missing something we really need at this point? I suppose we are missing pytorch but I'm not sure it's being requested.
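For the separate PR, the super type idea could look roughly like the sketch below. It is a stdlib-only illustration under stated assumptions: `ImageLike`, `EncodedImage`, `UriImage`, and `total_size` are hypothetical names, not the Lance API, and the tensor-backed representation is left out to avoid a numpy dependency.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from urllib.request import urlopen


class ImageLike(ABC):
    """Hypothetical super type: callers only ask for encoded bytes."""

    @abstractmethod
    def to_encoded(self) -> bytes:
        ...


class EncodedImage(ImageLike):
    """Image already held as encoded bytes (e.g. PNG or JPEG)."""

    def __init__(self, data: bytes):
        self.data = data

    def to_encoded(self) -> bytes:
        return self.data


class UriImage(ImageLike):
    """Image referenced by URI; bytes are fetched only when requested."""

    def __init__(self, uri: str):
        self.uri = uri

    def to_encoded(self) -> bytes:
        with urlopen(self.uri) as response:
            return response.read()


def total_size(images):
    """Works on any mix of representations via the shared interface."""
    return sum(len(image.to_encoded()) for image in images)


# Usage: the same bytes reached through two different representations.
payload = b"\x89PNG-placeholder-bytes"  # placeholder, not a valid PNG
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "image.png"
    path.write_bytes(payload)
    uri = path.as_uri()  # local paths become file:// URIs
    images = [EncodedImage(payload), UriImage(uri)]
    size = total_size(images)
    same = images[0].to_encoded() == images[1].to_encoded()
```

Higher-level code such as `total_size` never inspects which representation it was given, which is the property the question asks for.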

wjones127

I think the most important change is that we add a roundtrip test to Lance.

python/python/lance/arrow.py

westonpace

Very cool, only a few thoughts

python/python/lance/arrow.py

westonpace · 2023-09-20T16:00:56Z

python/python/lance/arrow.py

+
+    def image_to_tensor(self, decoder=None):
+        """
+        Decode encoded images and return a FixedShapeImageTensorArray


Hmm...if I'm a user I might wonder what the shape of the resulting array would be. It looks like, for both tensorflow and pillow, you get a 3d array [height, width, channel]. Do you think we should document and/or mandate this?

rok · 2023-09-20

The annoying part here is that PNG can come with an extra transparency channel, among other variations. Since we're offering a custom decoder option, it'd be awkward to enforce output shapes here.
I'm not sure what a good user experience is here. Maybe we could get some user feedback?
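One possible middle ground, sketched here purely as an illustration: leave the channel count to the decoder, but check that every decoded image in a batch shares one [height, width, channel] shape, since a fixed-shape tensor array needs a single shape. The helpers `hwc_shape` and `common_shape` are hypothetical, and nested lists stand in for decoded arrays.

```python
def hwc_shape(image):
    """Return (height, width, channels) for a nested-list image."""
    height = len(image)
    width = len(image[0])
    channels = len(image[0][0])
    for row in image:
        if len(row) != width or any(len(px) != channels for px in row):
            raise ValueError("ragged image: rows or pixels differ in length")
    return (height, width, channels)


def common_shape(images):
    """Return the single shape shared by all images, or raise."""
    shapes = {hwc_shape(image) for image in images}
    if len(shapes) != 1:
        raise ValueError(f"images have mixed shapes: {sorted(shapes)}")
    return shapes.pop()


rgb = [[[255, 0, 0], [0, 255, 0]]]             # 1 x 2 pixels, 3 channels
rgba = [[[255, 0, 0, 255], [0, 255, 0, 255]]]  # same pixels plus alpha

shape = common_shape([rgb, rgb])

try:
    common_shape([rgb, rgba])
    mixed_ok = True
except ValueError:
    mixed_ok = False  # RGB and RGBA cannot share one fixed shape
```

With a check like this, the documentation could state the [height, width, channel] layout without mandating how many channels a custom decoder returns.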

rok · 2023-09-25T16:04:56Z

FixedShapeTensorType can't be subclassed at the moment, so we need to consider other options for FixedShapeImageTensor. Here are some ideas:

1. Use vanilla FixedShapeTensorType and hence FixedShapeTensorArray. The downside is that we can't add methods to these, but we could instead provide a utility module for images. This makes the image types (uri, encoded, tensor) less uniform, but we're probably adding a universal image class on top of them anyway.
2. Wait for arrow to add a way to subclass canonical extensions, then add an ImageTensor type.
3. Create our own extension class that is a copy of FixedShapeTensorType. This would work immediately and make the API nicer. We would miss some methods FixedShapeTensorArray implements in C++, but we could still use those with on-the-fly casting.

In related news, the VariableShapeTensorType proposal will go to a vote soon.

python/python/lance/arrow.py

Co-authored-by: Will Jones <[email protected]>

rok · 2023-10-09T21:12:13Z

@wjones127 can we merge this?

changhiskhan reviewed Sep 13, 2023

rok force-pushed the 1199 branch 5 times, most recently from 220b2eb to 353e4ef Compare September 19, 2023 13:53

rok marked this pull request as ready for review September 19, 2023 14:13

wjones127 requested changes Sep 19, 2023

wjones127 changed the title ~~feat(rust, python): design an image extension type~~ feat(python): design an image extension type Sep 19, 2023

rok force-pushed the 1199 branch from 145d7c1 to 1975601 Compare September 19, 2023 17:42

rok force-pushed the 1199 branch 4 times, most recently from 709698c to 99d73d1 Compare September 20, 2023 04:25

rok requested a review from wjones127 September 20, 2023 04:36

rok force-pushed the 1199 branch 3 times, most recently from 5ef6130 to 5cf5a23 Compare September 20, 2023 15:18

wjones127 requested changes Sep 20, 2023

westonpace reviewed Sep 20, 2023

rok force-pushed the 1199 branch from 30f2161 to 0894a77 Compare September 20, 2023 23:00

eddyxu reviewed Sep 25, 2023

rok force-pushed the 1199 branch from 778237d to faff155 Compare September 26, 2023 11:27

rok force-pushed the 1199 branch from bd9b32e to ea23aec Compare October 3, 2023 13:15

rok commented Oct 3, 2023

rok requested a review from wjones127 October 3, 2023 22:18

rok and others added 22 commits October 9, 2023 22:28

3ad1f81 Basic classes
f34c8a3 Adding some functionality
d5bb326 Add tensor to image encoding
e5fd5cc Review feedback
567da04 Review feedback
28195a7 Review feedback
b92a8b4 Docs
e5fee40 Update python/python/lance/arrow.py (Co-authored-by: Will Jones <[email protected]>)
5b8ddd8 Doc changes, lance roundtrip test
143f17c Minor changes
13ff8e4 Review feedback
64d539e Minor changes
79e0c2e Minor changes
24e7d30 Change __repr__
e310fc2 Add ImageArray.from_array
ad74659 Apply suggestions from code review (Co-authored-by: Will Jones <[email protected]>)
b274105 Review feedback
21a6b35 Add download to ImageURIArray.from_uris
b8b066a Minor change
3e83e5e EncodedImageArray __repr__ displays image metadata
906ca0d Local filepaths should have file:// prefix
476f3c5 Let pyarrow figure it out

rok force-pushed the 1199 branch from c967e6a to 476f3c5 Compare October 9, 2023 20:28

wjones127 approved these changes Oct 10, 2023

wjones127 merged commit 8f78332 into lancedb:main Oct 10, 2023
10 checks passed

rok mentioned this pull request Oct 11, 2023

[Python][Rust] Design a Image extension type #1199

Closed
