[Python] [FlightRPC] Index with value of 0 is out-of-bounds for array of length 0 #44160

Open
heiseish opened this issue Sep 18, 2024 · 1 comment

Comments

@heiseish

Describe the bug, including details regarding any error messages, version, and platform.

Context

Description

  • When a table is built by concatenating dictionary arrays whose dictionaries do not match (here, one chunk has an empty dictionary and the other does not), the table transmitted over Flight appears to be malformed

Reproducible code

import pyarrow.flight as fl
import pyarrow as pa
import enum

class MyEnum(enum.Enum):
    Foo = 0
    Bar = 1
    Baz = 2

schema = pa.schema({
    'col': pa.dictionary(pa.int8(), pa.string())
})

def build_data() -> pa.Table:
    non_empty = pa.table({
        'col': pa.DictionaryArray.from_arrays(pa.array([0, 2], pa.int8()), [x.name for x in MyEnum])
    }, schema=schema)
    empty = pa.table({
        'col': pa.DictionaryArray.from_arrays(pa.array([], pa.int8()), [])
    }, schema=schema)
    # If unify_dictionaries() is called on the concatenated table here, it works
    return pa.concat_tables([empty, non_empty]) # .unify_dictionaries()

class Server(fl.FlightServerBase):
    def do_get(self, context, ticket):
        table = build_data()
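        # Sanity check: to_pylist works on the server side before sending
        _ = table['col'].to_pylist()
        print('built table ', table)
        # Setting unify_dictionaries=True in the write options doesn't fix it
        return fl.RecordBatchStream(table, options=pa.ipc.IpcWriteOptions(unify_dictionaries=True))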

if __name__ == '__main__':
    server = Server()
    client = fl.FlightClient(f'grpc://localhost:{server.port}')
    client.wait_for_available()
    table = client.do_get(fl.Ticket(bytes())).read_all()
    try:
        _ = table['col'].to_pylist()
        print('got table ', table)
    except Exception as e:
        print(e)
    server.shutdown()

Expectation

  • to_pylist succeeds

Actual

  • to_pylist fails with: Index with value of 0 is out-of-bounds for array of length 0

Observation

Table before IPC

----
col: [  -- dictionary:
[]  -- indices:
[],  -- dictionary:
["Foo","Bar","Baz"]  -- indices:
[0,2]]

Table after IPC

col: [  -- dictionary:
[]  -- indices:
[],  -- dictionary:
[]  -- indices:
[0,2]]

I'm happy to open a PR if someone can point me to the relevant code. Thanks!

Component(s)

FlightRPC, Python

@heiseish heiseish changed the title Index with value of 0 is out-of-bounds for array of length 0 [Python] [FlightRPC] Index with value of 0 is out-of-bounds for array of length 0 Sep 18, 2024
@heiseish

take
