
Convert from arrow into Zero Copy but Copy On Write for Dora like memory #5

Merged: @haixuanTao merged 7 commits into other_datatype from data_container_cow on Aug 26, 2024

@Hennzau (Collaborator) commented on Aug 15, 2024

Objective

This PR adds support for viewing an arrow::array::ArrayData as Rust's native types without consuming it. The issue first arose with Dora. In my first approach (#3), I extracted the buffer of each field in the UnionArray and took ownership of it inside a Vec. That is not always possible, particularly with Dora: when the UnionArray passes through the shared memory server, all fields are allocated in the same buffer, so ownership of the individual sub-buffers cannot be taken.
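To make the ownership problem concrete, here is a std-only sketch (no Arrow or Dora types; `SharedRegion` and its layout are hypothetical simplifications) where two fields live in one allocation. Borrowing each field as a slice costs nothing, but an owned `Vec` for a single field can only be obtained by copying it out.

```rust
/// Hypothetical stand-in for a shared-memory region in which every field
/// sits in one contiguous allocation (a simplification, not Dora's real layout).
struct SharedRegion {
    buffer: Vec<f32>,
    data_len: usize, // number of leading values that belong to the "data" field
}

impl SharedRegion {
    /// Borrow both fields without copying: the slices point into `buffer`.
    fn view_fields(&self) -> (&[f32], &[f32]) {
        self.buffer.split_at(self.data_len)
    }

    /// An owned Vec for one field can only be produced by copying it out,
    /// because a Vec cannot take ownership of part of another allocation.
    fn copy_data_field(&self) -> Vec<f32> {
        self.buffer[..self.data_len].to_vec()
    }
}

fn main() {
    // Four bbox coordinates followed by one confidence value, in one buffer.
    let region = SharedRegion { buffer: vec![1.0, 1.0, 2.0, 2.0, 0.98], data_len: 4 };

    let (data, confidence) = region.view_fields();
    assert_eq!(data.as_ptr(), region.buffer.as_ptr()); // borrowed: same memory
    assert_eq!(confidence, &[0.98_f32][..]);

    let owned = region.copy_data_field();
    assert_ne!(owned.as_ptr(), region.buffer.as_ptr()); // owned: a new allocation
    println!("viewed {} + {} values, copied {}", data.len(), confidence.len(), owned.len());
}
```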

Solution

The solution was to use std::borrow::Cow to represent array fields in fastformat::Image (or BBox).

Now you can have both an Owned and a Borrowed BBox, with access to exactly the same methods. When a method needs to mutate borrowed data, it clones that data first (copy-on-write).

```Rust
use std::borrow::Cow;

// Example for BBox structure
pub struct BBox<'a> {
    pub data: Cow<'a, [f32]>,
    pub confidence: Cow<'a, [f32]>,
    pub label: Vec<String>,
    pub encoding: Encoding,
}
```
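The copy-on-write behaviour can be illustrated without fastformat. The sketch below uses a simplified, hypothetical `BBox` with only the two `Cow` fields (label and encoding omitted) and hypothetical `count`/`scale` methods: the read-only method behaves the same for both variants, while the mutating one goes through `Cow::to_mut`, which clones borrowed data on first write and reuses already-owned data.

```rust
use std::borrow::Cow;

/// Simplified BBox keeping only the two Cow fields (label and encoding omitted).
struct BBox<'a> {
    data: Cow<'a, [f32]>,
    confidence: Cow<'a, [f32]>,
}

impl<'a> BBox<'a> {
    /// Read-only: behaves identically for owned and borrowed boxes.
    fn count(&self) -> usize {
        self.confidence.len()
    }

    /// Mutating: `Cow::to_mut` clones borrowed data into an owned Vec the
    /// first time it is called and hands out the existing Vec when already owned.
    fn scale(&mut self, factor: f32) {
        for v in self.data.to_mut().iter_mut() {
            *v *= factor;
        }
    }
}

fn main() {
    let coords = vec![1.0_f32, 1.0, 2.0, 2.0];
    let scores = vec![0.98_f32];

    // Borrowed BBox: no copy at construction.
    let mut view = BBox {
        data: Cow::Borrowed(coords.as_slice()),
        confidence: Cow::Borrowed(scores.as_slice()),
    };
    assert_eq!(view.count(), 1);
    view.scale(2.0); // clones `data` before writing
    assert!(matches!(view.data, Cow::Owned(_)));
    assert!(matches!(view.confidence, Cow::Borrowed(_))); // untouched field stays borrowed
    assert_eq!(coords, [1.0, 1.0, 2.0, 2.0]); // the source is unchanged

    // Owned BBox: same methods, but the write reuses the existing Vec.
    let mut owned = BBox {
        data: Cow::Owned(coords.clone()),
        confidence: Cow::Owned(scores.clone()),
    };
    let before = owned.data.as_ptr();
    owned.scale(2.0);
    assert_eq!(owned.data.as_ptr(), before);
    assert_eq!(owned.data, view.data);
}
```

Only the field that is actually written gets cloned, so a borrowed box that is merely read never allocates.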

User Usage

To implement this, I created a new struct: FastFormatArrowRawData, which can be built from ArrayData, and then either consumed into an Owned Object or viewed as a Borrowed Object.

```Rust
// Arrow -> owned Image
let flat_image = vec![0; 27];
let bgr8_image = Image::new_bgr8(flat_image, 3, 3, None).unwrap();

let arrow_image = bgr8_image.into_arrow().unwrap();
let bgr8_image = Image::from_arrow(arrow_image).unwrap(); // Owned Image
```

```Rust
// Arrow -> FastFormatArrowRawData -> owned Image
let flat_image = vec![0; 27];
let bgr8_image = Image::new_bgr8(flat_image, 3, 3, None).unwrap();
let arrow_image = bgr8_image.into_arrow().unwrap();

let raw_data = Image::raw_data(arrow_image).unwrap();
let bgr8_image = Image::from_raw_data(raw_data).unwrap(); // Owned Image
```

```Rust
// Arrow -> FastFormatArrowRawData -> borrowed Image
let flat_image = vec![0; 27];
let bgr8_image = Image::new_bgr8(flat_image, 3, 3, None).unwrap();
let arrow_image = bgr8_image.into_arrow().unwrap();

let raw_data = Image::raw_data(arrow_image).unwrap();
let bgr8_image = Image::view_from_raw_data(&raw_data).unwrap(); // Borrowed Image

let rgb8_image = bgr8_image.into_rgb8().unwrap(); // This first clones the borrowed data
```
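The difference between the owned and borrowed paths is visible in buffer addresses. This std-only sketch uses a hypothetical, simplified `Image` with a `Cow<[u8]>` field (not fastformat's real type or signatures): a view shares memory with the raw bytes, converting the view to RGB8 copies first and leaves the raw bytes untouched, and converting an owned image rewrites its buffer in place.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified image: BGR8 or RGB8 bytes plus dimensions.
struct Image<'a> {
    data: Cow<'a, [u8]>,
    width: usize,
    height: usize,
}

impl<'a> Image<'a> {
    /// Zero-copy view over bytes owned elsewhere (standing in for the raw data).
    fn view(bytes: &'a [u8], width: usize, height: usize) -> Self {
        Image { data: Cow::Borrowed(bytes), width, height }
    }

    /// Swap the B and R channels. `to_mut` copies the bytes first only when
    /// `self.data` is borrowed; owned bytes are rewritten in place.
    fn into_rgb8(mut self) -> Self {
        for px in self.data.to_mut().chunks_exact_mut(3) {
            px.swap(0, 2);
        }
        self
    }
}

fn main() {
    let raw: Vec<u8> = (0..27).collect(); // a 3x3 BGR8 image

    // Borrowed path: the view shares memory with `raw` until a write happens.
    let view = Image::view(&raw, 3, 3);
    assert_eq!(view.data.as_ptr(), raw.as_ptr());
    let rgb = view.into_rgb8(); // copy-on-write: new buffer
    assert_ne!(rgb.data.as_ptr(), raw.as_ptr());
    assert_eq!(&rgb.data[..3], &[2u8, 1, 0]);
    assert_eq!(&raw[..3], &[0u8, 1, 2]); // the raw data is untouched

    // Owned path: the same conversion reuses the buffer.
    let owned = Image { data: Cow::Owned(raw.clone()), width: 3, height: 3 };
    let before = owned.data.as_ptr();
    let rgb_owned = owned.into_rgb8();
    assert_eq!(rgb_owned.data.as_ptr(), before);
    assert_eq!(rgb_owned.width * rgb_owned.height * 3, rgb_owned.data.len());
}
```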

Developer Usage

Creating a new FastFormat type and making it compatible with Arrow is now straightforward. Implement three functions for your new type:

  • A raw_data function that consumes the ArrayData and returns a FastFormatArrowRawData:
```Rust
// Example with BBox
pub fn raw_data(array_data: arrow::array::ArrayData) -> Result<FastFormatArrowRawData> {
    use arrow::datatypes::Float32Type;

    let raw_data = FastFormatArrowRawData::new(array_data)?
        .load_primitive::<Float32Type>("data")?
        .load_primitive::<Float32Type>("confidence")?
        .load_utf("label")?
        .load_utf("encoding")?;

    Ok(raw_data)
}
```
  • A from_raw_data function that consumes the FastFormatArrowRawData and returns your type:
```Rust
pub fn from_raw_data(mut raw_data: FastFormatArrowRawData) -> Result<Self> {
    use arrow::datatypes::Float32Type;

    let data = raw_data.primitive_array::<Float32Type>("data")?;
    let confidence = raw_data.primitive_array::<Float32Type>("confidence")?;
    let label = raw_data.utf8_array("label")?;
    let encoding = Encoding::from_string(raw_data.utf8_singleton("encoding")?)?;

    Ok(Self {
        data: Cow::Owned(data),
        confidence: Cow::Owned(confidence),
        label,
        encoding,
    })
}
```
  • A view_from_raw_data function that borrows the FastFormatArrowRawData and returns a Borrowed object:
```Rust
// Defined inside `impl<'a> BBox<'a>`: the returned view borrows from `raw_data` for 'a
pub fn view_from_raw_data(raw_data: &'a FastFormatArrowRawData) -> Result<Self> {
    use arrow::datatypes::Float32Type;

    let data = raw_data.primitive_array_view::<Float32Type>("data")?; // view array instead of consuming it
    let confidence = raw_data.primitive_array_view::<Float32Type>("confidence")?; // same
    let label = raw_data.utf8_array("label")?;
    let encoding = Encoding::from_string(raw_data.utf8_singleton("encoding")?)?;

    Ok(Self {
        data: Cow::Borrowed(data),
        confidence: Cow::Borrowed(confidence),
        label,
        encoding,
    })
}
```
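The consume-versus-view split in these three functions can be mimicked with std only. In this sketch `RawData`, `take` and `view` are hypothetical stand-ins for FastFormatArrowRawData, `primitive_array` and `primitive_array_view` (their real signatures may differ): consuming removes a column and moves its Vec into an owned BBox, while viewing returns a slice whose lifetime ties the BBox to the raw data.

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// Hypothetical stand-in for FastFormatArrowRawData: named f32 columns.
struct RawData {
    columns: HashMap<String, Vec<f32>>,
}

impl RawData {
    /// Owned path: move a column out (plays the role of `primitive_array`).
    fn take(&mut self, name: &str) -> Result<Vec<f32>, String> {
        self.columns.remove(name).ok_or_else(|| format!("missing field {name}"))
    }

    /// Borrowed path: lend a column (plays the role of `primitive_array_view`).
    fn view(&self, name: &str) -> Result<&[f32], String> {
        self.columns
            .get(name)
            .map(|v| v.as_slice())
            .ok_or_else(|| format!("missing field {name}"))
    }
}

struct BBox<'a> {
    data: Cow<'a, [f32]>,
    confidence: Cow<'a, [f32]>,
}

impl<'a> BBox<'a> {
    /// Consumes the raw data: the result owns its buffers.
    fn from_raw_data(mut raw: RawData) -> Result<Self, String> {
        Ok(BBox {
            data: Cow::Owned(raw.take("data")?),
            confidence: Cow::Owned(raw.take("confidence")?),
        })
    }

    /// Borrows the raw data: the result cannot outlive `raw`.
    fn view_from_raw_data(raw: &'a RawData) -> Result<Self, String> {
        Ok(BBox {
            data: Cow::Borrowed(raw.view("data")?),
            confidence: Cow::Borrowed(raw.view("confidence")?),
        })
    }
}

fn sample() -> RawData {
    let mut columns = HashMap::new();
    columns.insert("data".to_string(), vec![1.0, 1.0, 2.0, 2.0]);
    columns.insert("confidence".to_string(), vec![0.98]);
    RawData { columns }
}

fn main() {
    let raw = sample();
    let view = BBox::view_from_raw_data(&raw).unwrap();
    assert_eq!(view.data.as_ptr(), raw.columns["data"].as_ptr()); // zero copy

    let owned = BBox::from_raw_data(sample()).unwrap();
    assert!(matches!(owned.confidence, Cow::Owned(_)));

    // A missing field surfaces as an error rather than a panic.
    assert!(BBox::view_from_raw_data(&RawData { columns: HashMap::new() }).is_err());
}
```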

Benchmarks

I benchmarked these functions with Dora, comparing the transfer of a raw Vec<u8> of different sizes with the transfer of a fastformat::Image, including the entire pipeline (creating the Image, converting it to Arrow, and converting it back to an Image).

(Benchmarked on a laptop with 32 GB of RAM and a Ryzen 7 4800H.)

Raw Vec

For this benchmark, I sent 1000 raw Vec<u8> messages of different sizes:

Latency:
480P = 720 * 480 * 3 : 604.005µs
720P = 1280 * 720 * 3 : 638.604µs
1080P = 1920 * 1080 * 3 : 670.105µs
2160P = 3190 * 2160 * 3 : 663.552µs

Throughput:
480P = 720 * 480 * 3 : 2500 messages per second
720P = 1280 * 720 * 3 : 730 messages per second
1080P = 1920 * 1080 * 3 : 505 messages per second
2160P = 3190 * 2160 * 3 : 155 messages per second

FastFormat

For this benchmark, I sent 1000 Image messages of different sizes:

Latency (1000 samples):
480P = 720 * 480 * 3 : 871.699µs
720P = 1280 * 720 * 3 : 710.012µs
1080P = 1920 * 1080 * 3 : 725.182µs
2160P = 3190 * 2160 * 3 : 678.061µs

Throughput:
480P = 720 * 480 * 3 : 2349 messages per second
720P = 1280 * 720 * 3 : 548 messages per second
1080P = 1920 * 1080 * 3 : 508 messages per second
2160P = 3190 * 2160 * 3 : 152 messages per second

Conclusion

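There is no notable difference that grows with message size: the extra latency of the FastFormat pipeline is largest at 480P (about 0.27 ms) and shrinks to about 15 µs at 2160P. This is expected, as no large data is copied. See dora-benchmark.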
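The latency overhead can be checked directly against the numbers reported above. This small sketch only re-uses those figures (the `RESULTS` table and `overhead_us` helper are illustrative names): it prints the payload size of each message and the extra latency of FastFormat over the raw Vec<u8>.

```rust
/// Latencies reported above, in microseconds:
/// (label, width, height, raw Vec latency, FastFormat latency).
const RESULTS: [(&str, u64, u64, f64, f64); 4] = [
    ("480P", 720, 480, 604.005, 871.699),
    ("720P", 1280, 720, 638.604, 710.012),
    ("1080P", 1920, 1080, 670.105, 725.182),
    ("2160P", 3190, 2160, 663.552, 678.061),
];

/// Extra latency of the FastFormat pipeline over sending the raw Vec<u8>.
fn overhead_us(raw: f64, fast: f64) -> f64 {
    fast - raw
}

fn main() {
    for (label, w, h, raw, fast) in RESULTS {
        let bytes = w * h * 3; // three bytes per pixel
        println!("{label}: {bytes} bytes, +{:.1} µs", overhead_us(raw, fast));
    }
}
```

Across these rows the payload grows roughly twentyfold while the overhead shrinks, which is consistent with the conversion not copying the pixel buffer.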

@Hennzau self-assigned this on Aug 15, 2024
@Hennzau added the enhancement (New feature or request) label on Aug 15, 2024
@Hennzau marked this pull request as ready for review on August 16, 2024, 19:46
@haixuanTao (Contributor) commented:

Looks awesome!

Thanks Enzo :)

@haixuanTao merged commit d126f41 into other_datatype on Aug 26, 2024 (24 checks passed)
@haixuanTao deleted the data_container_cow branch on August 26, 2024, 07:51
Hennzau added a commit that referenced this pull request Sep 9, 2024
# Objective

This PR adds a new datatype for **BBox** to ensure that everything works
well and that it is easy to add a new datatype.

# Datatypes & Usage

- [x] Image

```Rust
use crate::image::Image;

// Conversion: RGB8 -> BGR8 reverses the channel order of every pixel
let flat_image = (0..27).collect::<Vec<u8>>();
let image = Image::new_rgb8(flat_image, 3, 3, Some("camera.test")).unwrap();

let final_image = image.into_bgr8().unwrap();
let final_image_data = final_image.data.as_u8().unwrap();

let expected_image = vec![
    2, 1, 0, 5, 4, 3, 8, 7, 6, 11, 10, 9, 14, 13, 12, 17, 16, 15, 20, 19, 18, 23, 22, 21,
    26, 25, 24,
];

assert_eq!(&expected_image, final_image_data);
```

```Rust
use crate::image::Image;

// Zero copy: the pixel buffer keeps the same address through the Arrow round trip
let flat_image = vec![0; 27];
let original_buffer_address = flat_image.as_ptr();

let bgr8_image = Image::new_bgr8(flat_image, 3, 3, None).unwrap();
let image_buffer_address = bgr8_image.as_ptr();

let arrow_image = bgr8_image.into_arrow().unwrap();

let new_image = Image::from_arrow(arrow_image).unwrap();
let final_image_buffer = new_image.as_ptr();

assert_eq!(original_buffer_address, image_buffer_address);
assert_eq!(image_buffer_address, final_image_buffer);
```
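The RGB8 to BGR8 conversion checked in the first test amounts to swapping the first and third byte of every 3-byte pixel. Here is a minimal std-only sketch of that step; `swap_rb` is a hypothetical helper, not fastformat's actual implementation. Taking the Vec by value and mutating it in place keeps the same allocation.

```rust
/// Swap the first and third byte of every 3-byte pixel (RGB8 <-> BGR8).
/// Returns None if the buffer length is not a multiple of 3.
fn swap_rb(mut pixels: Vec<u8>) -> Option<Vec<u8>> {
    if pixels.len() % 3 != 0 {
        return None;
    }
    for px in pixels.chunks_exact_mut(3) {
        px.swap(0, 2);
    }
    Some(pixels)
}

fn main() {
    let flat_image: Vec<u8> = (0..27).collect();
    let address = flat_image.as_ptr();
    let bgr = swap_rb(flat_image).unwrap();
    assert_eq!(&bgr[..6], &[2u8, 1, 0, 5, 4, 3]);
    assert_eq!(bgr.as_ptr(), address); // converted in place, no new buffer
}
```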

- [x] BBox

```Rust
use crate::bbox::BBox;

// Conversion: xyxy -> xywh
let flat_bbox = vec![1.0, 1.0, 2.0, 2.0];
let confidence = vec![0.98];
let label = vec!["cat".to_string()];

let bbox = BBox::new_xyxy(flat_bbox, confidence, label).unwrap();
let final_bbox = bbox.into_xywh().unwrap();
let final_bbox_data: &[f32] = &final_bbox.data; // `data` is a Cow<[f32]>

let expected_bbox = vec![1.0, 1.0, 1.0, 1.0];

assert_eq!(expected_bbox, final_bbox_data);
```

```Rust
use crate::bbox::BBox;

// Zero copy: the coordinate buffer keeps the same address through the Arrow round trip
let flat_bbox = vec![1.0, 1.0, 2.0, 2.0];
let original_buffer_address = flat_bbox.as_ptr();

let confidence = vec![0.98];
let label = vec!["cat".to_string()];

let xyxy_bbox = BBox::new_xyxy(flat_bbox, confidence, label).unwrap();
let bbox_buffer_address = xyxy_bbox.data.as_ptr();

let arrow_bbox = xyxy_bbox.into_arrow().unwrap();

let new_bbox = BBox::from_arrow(arrow_bbox).unwrap();
let final_bbox_buffer = new_bbox.data.as_ptr();

assert_eq!(original_buffer_address, bbox_buffer_address);
assert_eq!(bbox_buffer_address, final_bbox_buffer);
```
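The xyxy to xywh conversion in the first BBox test can be sketched as follows, assuming xywh means the top-left corner plus width and height (which matches the expected `[1.0, 1.0, 1.0, 1.0]` above). `xyxy_to_xywh` is a hypothetical helper, not the library's code; it rewrites each 4-value box in place.

```rust
/// Convert flat boxes from [x1, y1, x2, y2] to [x, y, w, h] in place,
/// assuming (x, y) is the top-left corner. Returns None for partial boxes.
fn xyxy_to_xywh(mut boxes: Vec<f32>) -> Option<Vec<f32>> {
    if boxes.len() % 4 != 0 {
        return None;
    }
    for b in boxes.chunks_exact_mut(4) {
        b[2] -= b[0]; // width  = x2 - x1
        b[3] -= b[1]; // height = y2 - y1
    }
    Some(boxes)
}

fn main() {
    let xywh = xyxy_to_xywh(vec![1.0, 1.0, 2.0, 2.0]).unwrap();
    assert_eq!(xywh, vec![1.0, 1.0, 1.0, 1.0]);
}
```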

# Quick Fixes

- Improved readability and consistency with Rust formatting,
following the guidelines mentioned in [this
comment](#1 (comment)).
- Fixed Arrow array extraction from the Union in #3
- Fixed Dora compatibility (by viewing objects when they cannot be
owned) in #5
- Improved the structure of the library, with separate packages and
`features`, since one might want to use `fastformat` only with
ndarray or only with arrow. #6