Add better documentation, examples and builder-style API to ByteView #6479

Merged
merged 8 commits into from
Oct 1, 2024
Changes from 2 commits
23 changes: 9 additions & 14 deletions arrow-array/src/array/byte_view_array.rs
@@ -44,8 +44,11 @@ use super::ByteArrayType;
///
/// # See Also
///
/// See [`StringViewArray`] for storing utf8 encoded string data and
/// [`BinaryViewArray`] for storing bytes.
/// * [`StringViewArray`] for storing utf8 encoded string data
/// * [`BinaryViewArray`] for storing bytes
/// * [`ByteView`] to interpret the `u128` layout of the views.
///
/// [`ByteView`]: arrow_data::ByteView
///
/// # Notes
///
@@ -872,12 +875,9 @@ mod tests {
#[should_panic(expected = "Invalid buffer index at 0: got index 3 but only has 1 buffers")]
fn new_with_invalid_view_data() {
let v = "large payload over 12 bytes";
let view = ByteView {
length: 13,
prefix: u32::from_le_bytes(v.as_bytes()[0..4].try_into().unwrap()),
buffer_index: 3,
offset: 1,
};
let view = ByteView::new(13, &v.as_bytes()[0..4])
Review comment (Contributor, Author): This is an example of the kind of improvement the new API allows (i.e., not having to know to do `try_into`/`unwrap` for byte slices).

.with_buffer_index(3)
.with_offset(1);
let views = ScalarBuffer::from(vec![view.into()]);
let buffers = vec![Buffer::from_slice_ref(v)];
StringViewArray::new(views, buffers, None);
@@ -889,12 +889,7 @@ mod tests {
)]
fn new_with_invalid_utf8_data() {
let v: Vec<u8> = vec![0xf0, 0x80, 0x80, 0x80];
let view = ByteView {
length: v.len() as u32,
prefix: u32::from_le_bytes(v[0..4].try_into().unwrap()),
buffer_index: 0,
offset: 0,
};
let view = ByteView::new(v.len() as u32, &v);
let views = ScalarBuffer::from(vec![view.into()]);
let buffers = vec![Buffer::from_slice_ref(v)];
StringViewArray::new(views, buffers, None);
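
To make the before/after in these two test hunks concrete, here is a minimal, self-contained sketch. It uses a local stand-in `ByteView` struct (not the real `arrow_data::ByteView`) whose fields and builder methods mirror the ones in this PR, and checks that the builder form yields the same fields as the old struct-literal form. The function names `literal_view` and `builder_view` are made up for illustration.

```rust
use std::convert::TryInto;

/// Local stand-in for `arrow_data::ByteView`, mirroring the fields in this PR.
#[derive(Debug, Copy, Clone, Default, PartialEq)]
struct ByteView {
    length: u32,
    prefix: u32,
    buffer_index: u32,
    offset: u32,
}

impl ByteView {
    /// Mirrors the `new` added in this PR: panics unless `prefix` is exactly 4 bytes.
    fn new(length: u32, prefix: &[u8]) -> Self {
        Self {
            length,
            prefix: u32::from_le_bytes(prefix.try_into().unwrap()),
            buffer_index: 0,
            offset: 0,
        }
    }

    fn with_buffer_index(mut self, buffer_index: u32) -> Self {
        self.buffer_index = buffer_index;
        self
    }

    fn with_offset(mut self, offset: u32) -> Self {
        self.offset = offset;
        self
    }
}

/// Old style: the caller converts the prefix slice by hand.
fn literal_view(data: &[u8]) -> ByteView {
    ByteView {
        length: 13,
        prefix: u32::from_le_bytes(data[0..4].try_into().unwrap()),
        buffer_index: 3,
        offset: 1,
    }
}

/// New style: the slice-to-array conversion is hidden inside `ByteView::new`.
fn builder_view(data: &[u8]) -> ByteView {
    ByteView::new(13, &data[0..4])
        .with_buffer_index(3)
        .with_offset(1)
}

fn main() {
    let v = "large payload over 12 bytes";
    // Both construction styles produce identical field values.
    assert_eq!(literal_view(v.as_bytes()), builder_view(v.as_bytes()));
    println!("{:?}", builder_view(v.as_bytes()));
}
```

The builder version keeps the defaults (`buffer_index` and `offset` of `0`) implicit, which is why the second test in the diff shrinks to a single `ByteView::new` call.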
68 changes: 66 additions & 2 deletions arrow-data/src/byte_view.rs
@@ -21,8 +21,40 @@ use arrow_schema::ArrowError;
/// Helper to access views of [`GenericByteViewArray`] (`StringViewArray` and
/// `BinaryViewArray`) where the length is greater than 12 bytes.
///
/// See the documentation on [`GenericByteViewArray`] for more information on
/// the layout of the views.
/// See Also:
/// * [`GenericByteViewArray`] for more information on the layout of the views.
/// * [`validate_binary_view`] and [`validate_string_view`] to validate views
///
/// # Example: Create a new u128 view
///
/// ```rust
/// # use arrow_data::ByteView;
/// // Create a view for a string of length 20
/// // first four bytes are "Rust"
/// // stored in buffer 3
/// // at offset 42
/// let prefix = "Rust";
/// let view = ByteView::new(20, prefix.as_bytes())
/// .with_buffer_index(3)
/// .with_offset(42);
///
/// // create the final u128
/// let v = view.as_u128();
/// assert_eq!(v, 0x2a000000037473755200000014);
/// ```
///
/// # Example: decode a `u128` into its constituent fields
/// ```rust
/// # use arrow_data::ByteView;
/// // Convert a u128 to a ByteView
/// // See validate_{string,binary}_view functions to validate
/// let v = ByteView::from(0x2a000000037473755200000014);
///
/// assert_eq!(v.length, 20);
/// assert_eq!(v.prefix, 0x74737552);
/// assert_eq!(v.buffer_index, 3);
/// assert_eq!(v.offset, 42);
/// ```
///
/// [`GenericByteViewArray`]: https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html
#[derive(Debug, Copy, Clone, Default)]
@@ -39,6 +71,38 @@ pub struct ByteView {
}

impl ByteView {
/// Construct a [`ByteView`] for data of `length` bytes with the specified prefix.
///
/// See example on [`ByteView`] docs
///
/// Notes:
/// * the length should always be greater than 12 (data of 12 bytes or
/// fewer is stored as an inline view)
/// * `buffer_index` and `offset` are set to `0`
///
/// # Panics
/// If the prefix is not exactly 4 bytes
pub fn new(length: u32, prefix: &[u8]) -> Self {
Self {
length,
prefix: u32::from_le_bytes(prefix.try_into().unwrap()),
buffer_index: 0,
offset: 0,
}
}

/// Set the [`Self::buffer_index`] field
pub fn with_buffer_index(mut self, buffer_index: u32) -> Self {
self.buffer_index = buffer_index;
self
}

/// Set the [`Self::offset`] field
pub fn with_offset(mut self, offset: u32) -> Self {
self.offset = offset;
self
}
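
The `# Panics` note on `new` is worth spelling out: the conversion is `prefix.try_into().unwrap()`, so any slice that is not exactly 4 bytes panics. A caller who would rather get an `Option` can compute the prefix up front. The helper below, `prefix_of`, is hypothetical and is not part of this PR or of `arrow_data`; it only uses the standard library.

```rust
use std::convert::TryInto;

/// Hypothetical helper (not in `arrow_data`): returns the little-endian
/// prefix built from the first 4 bytes of `data`, or `None` if `data`
/// has fewer than 4 bytes.
fn prefix_of(data: &[u8]) -> Option<u32> {
    // `get(..4)` yields `None` for short input instead of panicking.
    let first_four: [u8; 4] = data.get(..4)?.try_into().ok()?;
    Some(u32::from_le_bytes(first_four))
}

fn main() {
    // "Rust" read as little-endian bytes matches the prefix in the doc example.
    assert_eq!(prefix_of(b"Rust is fun"), Some(0x74737552));
    // Too short to have a 4-byte prefix.
    assert_eq!(prefix_of(b"abc"), None);
}
```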

#[inline(always)]
/// Convert `ByteView` to `u128` by concatenating the fields
pub fn as_u128(self) -> u128 {
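
The diff cuts off before the body of `as_u128`. As a rough sketch of what "concatenating the fields" means, assuming the layout implied by the doc example above (length in the lowest 32 bits, then prefix, then buffer index, with offset in the highest 32 bits), packing and unpacking could look like this. The free functions `pack` and `unpack` are illustrative only and are not the crate's actual implementation.

```rust
/// Pack the four 32-bit fields into one u128, lowest field first.
fn pack(length: u32, prefix: u32, buffer_index: u32, offset: u32) -> u128 {
    (length as u128)
        | ((prefix as u128) << 32)
        | ((buffer_index as u128) << 64)
        | ((offset as u128) << 96)
}

/// Split a u128 back into (length, prefix, buffer_index, offset).
fn unpack(v: u128) -> (u32, u32, u32, u32) {
    // Each `as u32` cast keeps only the low 32 bits after the shift.
    (v as u32, (v >> 32) as u32, (v >> 64) as u32, (v >> 96) as u32)
}

fn main() {
    // Same values as the doc example: length 20, prefix "Rust", buffer 3, offset 42.
    let v = pack(20, u32::from_le_bytes(*b"Rust"), 3, 42);
    assert_eq!(v, 0x2a000000037473755200000014);
    assert_eq!(unpack(v), (20, 0x74737552, 3, 42));
}
```

Under this assumed layout the constant in the doc example reads, from the most significant end, as `2a` (offset), `00000003` (buffer index), `74737552` (prefix) and `00000014` (length).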