Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JSON writer benchmarks (#5314) #5317

Merged
merged 5 commits into from
Jan 23, 2024
Merged

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Part of #5314

Rationale for this change

We should get benchmarks before making changes to the serialization logic.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 20, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me -- thank you @tustvold

I also applaud your data driven approach

I had some comments / suggestions but nothing I think is required

fn create_mixed(len: usize) -> RecordBatch {
let c1 = Arc::new(create_string_array::<i32>(len, 0.));
let c2 = Arc::new(create_primitive_array::<Int32Type>(len, 0.));
let c3 = Arc::new(create_primitive_array::<UInt32Type>(len, 0.));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should also include a 64 bit type (like U64IntType rather than U32IntType)?

}

fn bench_dict_array(c: &mut Criterion) {
let c1 = Arc::new(create_string_array::<i32>(NUM_ROWS, 0.));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to include a string array in a benchmark for dictionary ? Maybe this is already covered by the separate benchmark for just strings?

do_bench(c, "bench_struct", &batch)
}

fn bench_nullable_struct(c: &mut Criterion) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯 for including structs

Though again here I wonder if it is important to also have a string array or if we should just have a single struct array column)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to have more than one column at the struct level, to break branch prediction

}

fn bench_primitive(c: &mut Criterion) {
let c1 = Arc::new(create_primitive_array::<Float32Type>(NUM_ROWS, 0.));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to lump them together or would it make more sense to have a separate one for ints and flots?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Habit, this is a good shout as float formatting is much more complex for JSON

@tustvold tustvold merged commit a0148ba into apache:master Jan 23, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants