Add into_builder methods for Arrays #6430
Comments
Yes, I think maybe I can help with this. Or @ShiKaiWi may also be interested in this? |
I'm glad to work on this ticket. Thanks @Rachelint. |
take. |
@ShiKaiWi I would recommend:
Perhaps initially start with something like this:
let arr: BooleanArray = vec![true, true, false].into();
let mut builder = arr.into_builder();
// Append 2 new rows
builder.append_value(false);
builder.append_null();
let arr = builder.finish();
assert_eq!(....) |
When trying to start from the Here is the
pub struct BooleanBufferBuilder {
    buffer: MutableBuffer,
    len: usize,
}
And it assumes that the boolean values must start from the first bit, but a sliced
@alamb @Rachelint Please let me know your opinions. |
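To illustrate the alignment concern raised in the comment above, here is a minimal std-only sketch. It does not use the arrow crate: `get_bit`, `realign`, and `demo_sliced_bitmap` are hypothetical helpers written for this note, and bits are assumed to be packed least-significant-bit first, as in Arrow bitmaps. It suggests why a slice that starts at a bit offset that is not a multiple of 8 cannot simply be handed to a builder that expects its values to begin at bit 0: the bits have to be rewritten.

```rust
/// Read bit `i` from an LSB-first packed bitmap.
fn get_bit(buf: &[u8], i: usize) -> bool {
    ((buf[i / 8] >> (i % 8)) & 1) == 1
}

/// Copy `len` bits starting at bit `offset` into a new bitmap that starts at bit 0.
fn realign(buf: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        if get_bit(buf, offset + i) {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

/// Hypothetical demo: take a 4-value slice that starts at bit 3.
fn demo_sliced_bitmap() -> (Vec<bool>, Vec<u8>) {
    // Bits 0..=7, LSB first: 1 0 1 1 0 1 0 0
    let original = [0b0010_1101u8];
    let (offset, len) = (3, 4);
    // Logical values of the slice, read through the offset.
    let logical: Vec<bool> = (0..len).map(|i| get_bit(&original, offset + i)).collect();
    // The same values, repacked so that the first one sits at bit 0.
    (logical, realign(&original, offset, len))
}
```

The repacked bitmap differs byte-for-byte from the original even though the logical values are identical, which is the copy that a zero-copy conversion would have to avoid or make explicit.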
Yes, it is actually a problem for converting from In my opinion, can we define something like |
I think we can use the builders for this rather than adding a new API. I think we can use the existing structures for low level manipulation if we made it a bit easier to convert between them. I think we should make conversion from array --> builder such that it doesn't copy if possible, but copies the underlying buffers if necessary. A sliced array is likely to be shared, so copying the bitmap to modify it will be needed anyway.
The lower level BooleanBufferBuilder contains the relevant APIs, like this:
Rather than changing BooleanBuilder to allow direct manipulation of the underlying boolean buffer, maybe we could make it easier to further destructure it (see source). For example:
// convert boolean array --> BooleanBuilder
let builder = boolean_array.into_builder();
// Deconstruct the BooleanBuilder into `BooleanBufferBuilder` and `NullBufferBuilder`
let (bool_builder, null_builder) = builder.into_builders();
// ... modify the boolean/null buffers via low level builders
// put it back together
let builder = BooleanBuilder::new_from_builders(bool_builder, null_builder);
let array = builder.finish();
new_from_builders is inspired by https://docs.rs/arrow/latest/arrow/array/builder/struct.PrimitiveBuilder.html#method.new_from_buffer |
Got it!
impl<T: ArrowPrimitiveType> PrimitiveBuilder<T> {
    /// Returns the current values buffer and null buffer as a slice
    pub fn slices_mut(&mut self) -> (&mut [T::Native], Option<&mut [u8]>) {
        (
            self.values_builder.as_slice_mut(),
            self.null_buffer_builder.as_slice_mut(),
        )
    }
}
We could offer a similar API, like:
impl BooleanBuilder {
    pub fn builders_mut(&mut self) -> (&mut BooleanBufferBuilder, &mut NullBufferBuilder) {
        ...
    }
} |
@ShiKaiWi I read the code today, and I found that maybe we can use |
Thanks a lot for your responses and suggestions. @alamb @Rachelint
In conclusion, the two problems and the proposed solutions to them are as follows:
1. Whether to clone the underlying buffer for the unaligned-sliced boolean array
Basically, I agree with using However, it makes me worried that in the implementation of
And if we allow the clone for Actually, I'm in favor of totally avoiding copying for a not-shared sliced boolean array for performance and api consistency, but some breaking changes in the public api of So considering the api consistency, shall we insist on allowing clone for sliced
2. Allow direct manipulation of the underlying boolean buffer
The proposed solutions are similar: exposing the two underlying builders. However, the caller must then manipulate both builders correctly, which is not user-friendly when a single bit needs to be set or unset. I guess it would be better if we add a new method to the
pub fn set_value(&mut self, index: usize, v: Option<bool>) {
    ... manipulating the `BooleanBufferBuilder` and `NullBufferBuilder`.
}
So how about this proposal? |
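As a rough illustration of what the proposed `set_value` would need to do, here is a std-only sketch. `ToyBooleanBuilder` is a hypothetical stand-in written for this note, not arrow's `BooleanBuilder`; it assumes one LSB-first packed bitmap for values and one for validity. The point is that a single call keeps both bitmaps in sync, which is the bookkeeping a caller would otherwise do by hand on two separate builders.

```rust
/// Simplified stand-in for a boolean builder (NOT arrow's BooleanBuilder):
/// one packed bitmap for values and one for validity, both LSB-first.
struct ToyBooleanBuilder {
    values: Vec<u8>,
    validity: Vec<u8>,
    len: usize,
}

impl ToyBooleanBuilder {
    fn new() -> Self {
        Self { values: Vec::new(), validity: Vec::new(), len: 0 }
    }

    /// Append one slot, growing both bitmaps a byte at a time.
    fn append_option(&mut self, v: Option<bool>) {
        if self.len % 8 == 0 {
            self.values.push(0);
            self.validity.push(0);
        }
        let index = self.len;
        self.len += 1;
        self.set_value(index, v);
    }

    /// Overwrite the slot at `index`, keeping values and validity in sync.
    fn set_value(&mut self, index: usize, v: Option<bool>) {
        assert!(index < self.len, "index out of bounds");
        let (byte, mask) = (index / 8, 1u8 << (index % 8));
        // Clear both bits first, then set whichever ones apply.
        self.values[byte] &= !mask;
        self.validity[byte] &= !mask;
        if let Some(b) = v {
            self.validity[byte] |= mask;
            if b {
                self.values[byte] |= mask;
            }
        }
    }

    /// Read the slot at `index`; `None` means null.
    fn get(&self, index: usize) -> Option<bool> {
        assert!(index < self.len, "index out of bounds");
        let (byte, mask) = (index / 8, 1u8 << (index % 8));
        if (self.validity[byte] & mask) == 0 {
            None
        } else {
            Some((self.values[byte] & mask) != 0)
        }
    }
}
```

In this sketch a null slot also has its value bit cleared, which is one possible convention and not something the thread specifies.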
No, I think we should follow the model of PrimitiveArray rather than cloning implicitly
I think we should still permit converting Adding some additional easier to use apis like So perhaps we can first focus on more easily converting back / forth between builders, and then we can figure out what additional APIs we could add to the builders to make it easier to use? |
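The "no implicit clone" model discussed in this comment can be sketched with plain std types. `ToyArray`, `ToyBuilder`, and `into_builder_or_copy` are hypothetical names for this note, not arrow types. The shape is modeled on `PrimitiveArray::into_builder`, which hands the array back as the error value when its buffer is shared; the sketch assumes that an `Arc` reference count is a fair stand-in for buffer sharing.

```rust
use std::sync::Arc;

/// Simplified stand-in (NOT an arrow type): an immutable array whose buffer may be shared.
#[derive(Debug, Clone)]
struct ToyArray {
    data: Arc<Vec<i32>>,
}

/// Simplified stand-in for a builder that owns its buffer exclusively.
#[derive(Debug)]
struct ToyBuilder {
    data: Vec<i32>,
}

impl ToyArray {
    /// Zero-copy conversion: succeeds only if no other handle shares the buffer.
    /// Otherwise the array is returned unchanged so the caller decides whether to copy.
    fn into_builder(self) -> Result<ToyBuilder, ToyArray> {
        match Arc::try_unwrap(self.data) {
            Ok(data) => Ok(ToyBuilder { data }),
            Err(shared) => Err(ToyArray { data: shared }),
        }
    }
}

impl ToyBuilder {
    fn append_value(&mut self, v: i32) {
        self.data.push(v);
    }

    fn finish(self) -> ToyArray {
        ToyArray { data: Arc::new(self.data) }
    }
}

/// Explicit fallback: copy only when the zero-copy path is unavailable.
fn into_builder_or_copy(array: ToyArray) -> ToyBuilder {
    array
        .into_builder()
        .unwrap_or_else(|shared| ToyBuilder { data: shared.data.as_ref().clone() })
}
```

Keeping the copy in a separate, explicitly named helper means the cost is visible at the call site, while `into_builder` itself never allocates.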
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While working with various arrays and builders in DataFusion one thing that we often want to do is convert between an array and some modifiable version of it (for example, in StringView trim it would be nice to just modify existing views rather than mutate them)
I think in general it would be good to use the various Builder APIs for this
Describe the solution you'd like
I would like all the Arrays to have equivalent methods to PrimitiveArray::into_builder().
Like I want to be able to do
Similarly it would be nice to do something similar with BooleanBuilder, possibly other arrays
Describe alternatives you've considered
Additional context
Example DataFusion PRs: apache/datafusion#12395