Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve error messages while downcasting UInt32Array, UInt64Array and BooleanArray #4261

Merged
merged 1 commit into from
Nov 20, 2022

Conversation

retikulum
Copy link
Contributor

Which issue does this PR close?

Part of #3152.

Rationale for this change

This is the new PR of improving downcasting to UInt32Array, UInt64Array, BooleanArray. However, I couldn't refactor following part because error needs to be cast into ArrowError.

https://github.com/apache/arrow-datafusion/blob/822022db88d4f70c2f02ac4d1828fc9413f3e252/datafusion/core/src/physical_plan/filter.rs#L212-L227

If you have any suggestion, I will be happy to implement it.

What changes are included in this PR?

  • as_uint32_array, as_uint64_array, as_boolean_array is created in datafusion\common\src\cast.rs
  • Refactor parts where following lines are used:
as_any().downcast_ref::<UInt32Array>().unwrap()
as_any().downcast_ref::<UInt64Array>().unwrap()
as_any().downcast_ref::<BooleanArray>().unwrap()
as_boolean_array()

Are there any user-facing changes?

I am not sure about it.

@github-actions github-actions bot added core Core DataFusion crate physical-expr Physical Expressions labels Nov 17, 2022
Comment on lines +120 to +151
// Downcast ArrayRef to UInt32Array
pub fn as_uint32_array(array: &dyn Array) -> Result<&UInt32Array, DataFusionError> {
array.as_any().downcast_ref::<UInt32Array>().ok_or_else(|| {
DataFusionError::Internal(format!(
"Expected a UInt32Array, got: {}",
array.data_type()
))
})
}

// Downcast ArrayRef to UInt64Array
pub fn as_uint64_array(array: &dyn Array) -> Result<&UInt64Array, DataFusionError> {
array.as_any().downcast_ref::<UInt64Array>().ok_or_else(|| {
DataFusionError::Internal(format!(
"Expected a UInt64Array, got: {}",
array.data_type()
))
})
}

// Downcast ArrayRef to BooleanArray
pub fn as_boolean_array(array: &dyn Array) -> Result<&BooleanArray, DataFusionError> {
array
.as_any()
.downcast_ref::<BooleanArray>()
.ok_or_else(|| {
DataFusionError::Internal(format!(
"Expected a BooleanArray, got: {}",
array.data_type()
))
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a macro that you can use instead of writing these methods:

let array = downcast_value!(values, Int32Array)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should revert all pr from this issue #3152?
cc @alamb @andygrove

Copy link
Contributor

@alamb alamb Nov 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that is embarrassing 🤦

Interestingly the downcast_value doesn't appear to be used in that many places (less than 10 modules at this time):
https://github.com/search?q=repo%3Aapache%2Farrow-datafusion+downcast_value&type=code

I would prefer not to roll back the PRs as they have already simplified the code non trivially.

What is important in my opinion is to use a standard pattern to do this downcasting. I don't have a huge preference between downcast_value and as_boolean_array , though the as_boolean_array might be more discoverable in an IDE that autocompletes

If we are worried about code duplication, perhaps we can use downcast_value to implement the as_boolean_array, type methods

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good improvement that we can merge and then work on unifying with the downcast macro later. If people feel strongly about the downcast macro I can do that too

@alamb
Copy link
Contributor

alamb commented Nov 18, 2022

Thank you @retikulum

@andygrove and @liukun4515 please share your thoughts

@retikulum
Copy link
Contributor Author

retikulum commented Nov 18, 2022

I think this is a good improvement that we can merge and then work on unifying with the downcast macro later. If people feel strongly about the downcast macro I can do that too

Thanks for the feedback. I just didn't know about this macro. I can refactor this PR right after other reviewers' thoughts. Moreover, I will refactor previous PRs in a time series with bigger PRs if it is okay

@alamb
Copy link
Contributor

alamb commented Nov 18, 2022

Thanks for the feedback. I just didn't know about this macro.

Me neither 😅 which is embarrassing when I look at the github blame history 🤦

@alamb
Copy link
Contributor

alamb commented Nov 19, 2022

Unless there are any other comments, I will plan to merge this PR tomorrow and we can continue the progress to clean up the code

@alamb alamb merged commit 712b9fd into apache:master Nov 20, 2022
@ursabot
Copy link

ursabot commented Nov 20, 2022

Benchmark runs are scheduled for baseline = 880e6fc and contender = 712b9fd. 712b9fd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@liukun4515
Copy link
Contributor

I think this is a good improvement that we can merge and then work on unifying with the downcast macro later. If people feel strongly about the downcast macro I can do that too
If we are worried about code duplication, perhaps we can use downcast_value to implement the as_boolean_array, type methods
@retikulum @alamb

It's a good idea to implement as_xxx_array using the downcast_value

@retikulum
Copy link
Contributor Author

Will we keep using as_xxx_array scheme but implement these function with downcast_value? Isn't it cause code duplication? All functions will have downcast_value!(values, ArrayType) only ArrayType will change.

@retikulum
Copy link
Contributor Author

What is the last decision ? @alamb @liukun4515

@alamb
Copy link
Contributor

alamb commented Nov 21, 2022

Will we keep using as_xxx_array scheme but implement these function with downcast_value? Isn't it cause code duplication? All functions will have downcast_value!(values, ArrayType) only ArrayType will change.

I think that is fine (or maybe we can remove duplication by creating a macro that creates the entire function as_binary_array -- like make_downcast_vaue()

It would be good to eventually remove all uses of downcast_value! in favor of the as_xxx_array functions so there is single way to do this downcasting (which is the overall goal of #3152 -- to have single way in the codebase to do it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants