[1.11.0] [Cherry-pick] [Datasets] Fix boolean tensor column representation and slicing. #22358

clarkzinzow · 2022-02-14T19:45:07Z

Reformatted cherry-pick of 4434169.

This PR fixes our {NumPy, Pandas} <--> Arrow interop for boolean tensor columns. NumPy and Pandas represent boolean arrays with a byte per boolean, while Arrow bit-packs booleans with 8 booleans per byte. Previously, when casting NumPy arrays to tensor columns, we were interpreting NumPy's boolean array buffers as being bit-packed when they were not. This PR completes support by packing and unpacking bits for boolean arrays when creating a boolean tensor column from an ndarray and when creating an ndarray from a boolean tensor column, respectively.
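For reviewers who want to see the mismatch concretely, here is a minimal NumPy-only sketch of the two layouts. It is not the code from this PR: the helper names `pack_bool_ndarray` and `unpack_bool_buffer` are hypothetical, and the sketch assumes Arrow's least-significant-bit-first ordering within each byte, which is why `bitorder="little"` is passed.

```python
import math

import numpy as np


def pack_bool_ndarray(arr):
    """Bit-pack a boolean ndarray (hypothetical helper, not the PR's code).

    NumPy stores one byte per boolean; the result stores 8 booleans per
    byte, least-significant bit first.
    """
    flat = np.ascontiguousarray(arr, dtype=np.bool_).ravel()
    return np.packbits(flat, bitorder="little"), arr.shape


def unpack_bool_buffer(packed, shape):
    """Inverse of pack_bool_ndarray: expand packed bits to a bool ndarray."""
    count = math.prod(shape)
    # count trims the zero padding added to fill the last byte.
    bits = np.unpackbits(packed, count=count, bitorder="little")
    return bits.astype(np.bool_).reshape(shape)


arr = np.array([[True, False, True], [False, False, True]])

packed, shape = pack_bool_ndarray(arr)
roundtrip = unpack_bool_buffer(packed, shape)

# The kind of misread described above: treating the byte-per-boolean buffer
# as if it were already bit-packed reads 6 bits out of the first byte only.
raw_bytes = arr.view(np.uint8).ravel()
misread = unpack_bool_buffer(raw_bytes, shape)

print(arr.nbytes, packed.nbytes)        # 6 bytes unpacked vs. 1 byte packed
print(np.array_equal(roundtrip, arr))   # True
print(np.array_equal(misread, arr))     # False
```

The round trip only recovers the original values when the element count is carried alongside the packed buffer, since the last byte is zero-padded.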

…-project#22323)

clarkzinzow requested review from ericl and scv119 as code owners February 14, 2022 19:45

ericl approved these changes Feb 14, 2022

clarkzinzow assigned ericl and mwtian Feb 14, 2022

ericl merged commit d51df51 into ray-project:releases/1.11.0 Feb 14, 2022
