[CHORE] Update PyO3 and use their new Bound API #2793
CodSpeed Performance Report: Merging #2793 will not alter performance.
There seems to be a strange issue where the order of dataframe columns is wrong now. Looking into that.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2793 +/- ##
==========================================
+ Coverage 63.36% 64.06% +0.70%
==========================================
Files 1016 1007 -9
Lines 114231 112934 -1297
==========================================
- Hits 72381 72350 -31
+ Misses 41850 40584 -1266
Mostly looks good! But let's make sure to add pyo3::intern! back everywhere it was removed.
serialized
.extract::<&PyBytes>(py)
.map(|s| $crate::bincode::deserialize(s.as_bytes()).unwrap())
pub fn _from_serialized(serialized: &[u8]) -> Self {
Maybe this is a good contender for the Bound API, so we don't have to clone or copy the underlying bytes before passing them into Rust. We could instead take a GIL-bound reference to the bytes and create the underlying Rust object from that.
I'm not sure what you mean. When a pyfunction parameter is of type &[u8], pyo3 just calls .as_bytes() internally and returns a reference to the actual bytes in the Python object without copying.
https://github.com/PyO3/pyo3/blob/main/src/conversions/std/slice.rs#L42
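For the Rust side of that point, here is a minimal std-only sketch of a constructor that takes &[u8] and only borrows the caller's buffer. The Header type, its little-endian layout, and the read_u32_le and body helpers are hypothetical and stand in for the bincode-based code in this PR; no pyo3 types are involved.

```rust
/// Hypothetical payload type, used only for illustration.
#[derive(Debug, PartialEq)]
struct Header {
    version: u32,
    length: u32,
}

/// Read a little-endian u32 at `offset`, or None if the slice is too short.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes.get(offset..offset + 4)?);
    Some(u32::from_le_bytes(buf))
}

impl Header {
    /// Builds a Header from a borrowed slice; the input buffer is never cloned.
    fn from_serialized(serialized: &[u8]) -> Option<Self> {
        Some(Header {
            version: read_u32_le(serialized, 0)?,
            length: read_u32_le(serialized, 4)?,
        })
    }
}

/// Returns the bytes after the 8-byte header as a view into the same buffer.
fn body(serialized: &[u8]) -> &[u8] {
    serialized.get(8..).unwrap_or(&[])
}
```

Because both functions take a slice, the only bytes copied are the two 4-byte fields; body hands back a view into the caller's buffer rather than a new allocation.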
@@ -18,7 +18,7 @@ pub mod pylib {
pub fn read_parquet(
py: Python,
uri: &str,
columns: Option<Vec<&str>>,
columns: Option<Vec<String>>,
Why does this have to be owned now?
I don't know why pyo3 doesn't allow Vec<&str> anymore, but I suspect it's because of some sort of lifetime issue with the new Bound stuff.
convert_pyarrow_parquet_read_result_into_py(py, schema, all_arrays, num_rows, pyarrow)
}
#[allow(clippy::too_many_arguments)]
#[pyfunction]
pub fn read_parquet_bulk(
py: Python,
uris: Vec<&str>,
columns: Option<Vec<&str>>,
uris: Vec<String>,
Why are we making this owned if we just take the ref anyways?
pyo3 limitations. See the comment on read_parquet.
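One way to keep the inner code working on references despite the owned signature is to borrow views at the boundary. The sketch below is std-only and assumes hypothetical names (describe_read, read_parquet_bulk_sketch); it is not the actual Daft code and omits the pyo3 attributes.

```rust
/// Hypothetical inner function that only needs borrowed strings.
fn describe_read(uris: &[&str], columns: Option<&[&str]>) -> String {
    match columns {
        Some(cols) => format!("{} file(s), columns: {}", uris.len(), cols.join(",")),
        None => format!("{} file(s), all columns", uris.len()),
    }
}

/// Hypothetical binding-layer entry point: it accepts owned Strings (as the
/// new signatures in this PR do) and lends &str views to the inner code.
fn read_parquet_bulk_sketch(uris: Vec<String>, columns: Option<Vec<String>>) -> String {
    // Building Vec<&str> allocates one small vector of pointers,
    // but the string data itself is not copied again.
    let uri_refs: Vec<&str> = uris.iter().map(String::as_str).collect();
    let column_refs: Option<Vec<&str>> = columns
        .as_ref()
        .map(|cols| cols.iter().map(String::as_str).collect());
    describe_read(&uri_refs, column_refs.as_deref())
}
```

The owned vectors live for the whole call, so the borrowed views stay valid without any lifetime parameters on the entry point.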
src/daft-parquet/src/read.rs
Outdated
@@ -460,9 +461,9 @@ async fn stream_parquet_single(
}

#[allow(clippy::too_many_arguments)]
async fn read_parquet_single_into_arrow(
async fn read_parquet_single_into_arrow<T: AsRef<str>>(
Should this instead be ToString, since we call that right after?
Removed the generic here and just pass around Strings.
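To illustrate the trade-off discussed in this thread, here is a small std-only sketch contrasting a generic AsRef<str> parameter that is immediately converted to String with a parameter that takes owned Strings directly. Both function names are hypothetical and neither is the real read_parquet_single_into_arrow.

```rust
/// Generic variant: accepts anything string-like, but has to allocate a
/// fresh String per column because it only ever sees a borrowed &str.
fn columns_from_as_ref<T: AsRef<str>>(columns: &[T]) -> Vec<String> {
    columns.iter().map(|c| c.as_ref().to_string()).collect()
}

/// Owned variant: the caller hands over the Strings, so they are moved
/// into place and no new string buffers are allocated.
fn columns_from_owned(columns: Vec<String>) -> Vec<String> {
    columns
}
```

When the caller already holds a Vec<String>, as it does after the signature changes in this PR, the owned variant avoids the extra per-column allocation and drops a type parameter from the signature.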
Looks good!
We cannot yet update to the latest minor version, v0.22, because rust-numpy only supports v0.21 right now.
There may be some small additional memory/performance optimizations we can do with this new API, but I will leave it for another PR. This one should not have any regressions at least.
Maybe next time I'll try using Cursor to do this 😄