refactor: Zero-Field Structs and DataFrame with Height Property #19123

Open
wants to merge 17 commits into
base: main

Conversation

coastalwhite
Collaborator

@coastalwhite coastalwhite commented Oct 7, 2024

This PR refactors a large part of the code to allow for:

  • Zero-Field Structs (ZFSs)
  • Zero-Column DataFrames with non-zero height (ZCDFs)

Supporting both required a fair number of changes across the codebase.
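
To make the second bullet concrete, here is a minimal sketch of a frame that stores its height as its own field, so the row count survives even when no columns are left. `ToyFrame` and its methods are hypothetical names for illustration only; this is not the Polars `DataFrame` implementation.

```rust
/// Toy stand-in for a frame that tracks its height explicitly.
/// Hypothetical type for illustration; not the Polars `DataFrame`.
#[derive(Debug, Clone, PartialEq)]
struct ToyFrame {
    height: usize,
    columns: Vec<(String, Vec<i64>)>,
}

impl ToyFrame {
    /// Build a frame, checking that every column has exactly `height` rows.
    fn new(height: usize, columns: Vec<(String, Vec<i64>)>) -> Result<Self, String> {
        for (name, values) in &columns {
            if values.len() != height {
                return Err(format!(
                    "column {name} has length {}, expected {height}",
                    values.len()
                ));
            }
        }
        Ok(Self { height, columns })
    }

    fn width(&self) -> usize {
        self.columns.len()
    }

    fn height(&self) -> usize {
        self.height
    }

    /// Keep only the named columns. The height is carried over unchanged,
    /// even when `names` is empty and the result has zero columns.
    fn select(&self, names: &[&str]) -> Self {
        let columns = self
            .columns
            .iter()
            .filter(|(name, _)| names.contains(&name.as_str()))
            .cloned()
            .collect();
        Self { height: self.height, columns }
    }
}
```

In this sketch, selecting no columns from a three-row frame gives a frame with `width() == 0` and `height() == 3`. If the height were derived from the first column instead, it would collapse to zero.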

@github-actions bot added the internal, python, and rust labels Oct 7, 2024

codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 82.30088% with 120 lines in your changes missing coverage. Please review.

Project coverage is 79.79%. Comparing base (9dada18) to head (d87fbf8).
Report is 5 commits behind head on main.

Files with missing lines | Patch % | Lines
crates/polars-core/src/frame/mod.rs | 81.01% | 30 Missing ⚠️
crates/polars-arrow/src/array/struct_/mutable.rs | 0.00% | 19 Missing ⚠️
...tream/src/nodes/parquet_source/row_group_decode.rs | 0.00% | 17 Missing ⚠️
crates/polars-core/src/serde/series.rs | 42.85% | 12 Missing ⚠️
crates/polars-core/src/chunked_array/ops/bits.rs | 43.75% | 9 Missing ⚠️
crates/polars-arrow/src/legacy/array/mod.rs | 0.00% | 7 Missing ⚠️
crates/polars-arrow/src/array/struct_/mod.rs | 87.50% | 3 Missing ⚠️
...rates/polars-core/src/chunked_array/struct_/mod.rs | 94.23% | 3 Missing ⚠️
crates/polars-core/src/frame/row/av_buffer.rs | 82.35% | 3 Missing ⚠️
...s-pipe/src/executors/sinks/group_by/generic/mod.rs | 0.00% | 3 Missing ⚠️
... and 12 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19123      +/-   ##
==========================================
+ Coverage   79.78%   79.79%   +0.01%     
==========================================
  Files        1531     1532       +1     
  Lines      208445   208649     +204     
  Branches     2913     2913              
==========================================
+ Hits       166301   166498     +197     
- Misses      41593    41601       +8     
+ Partials      551      550       -1     

Comment on lines -110 to -111
unsafe fn _mmap_unchecked<T: AsRef<[u8]>>(
fields: &ArrowSchema,
Collaborator Author

drive-by: this function was unused

  let mut out = a
      .fields_as_series()
      .iter()
      .zip(b.fields_as_series().iter())
      .map(|(l, r)| op(l, r))
      .reduce(reduce)
-     .unwrap();
+     .unwrap_or_else(|| BooleanChunked::full(PlSmallStr::EMPTY, !value, a.len()));
Collaborator Author

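If both structs are ZFSs, there are no fields to compare and `reduce` yields `None`, so the result is always equal for `eq` and never unequal for `ne`.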
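
The fallback can be illustrated with a small sketch that uses plain vectors instead of Polars types. `toy_struct_eq` is a hypothetical helper: each inner `Vec<i64>` stands in for one struct field, and `len` is an assumed explicit row count, which is needed because zero fields carry no length.

```rust
/// Field-wise equality of two toy "struct columns".
/// Each inner `Vec<i64>` is one field; `len` is the number of rows.
/// Hypothetical helper for illustration; not the Polars implementation.
fn toy_struct_eq(a: &[Vec<i64>], b: &[Vec<i64>], len: usize) -> Vec<bool> {
    a.iter()
        .zip(b.iter())
        // Compare each pair of fields element by element.
        .map(|(l, r)| {
            l.iter()
                .zip(r.iter())
                .map(|(x, y)| x == y)
                .collect::<Vec<bool>>()
        })
        // A row is equal only if it is equal in every field.
        .reduce(|acc, mask| {
            acc.iter()
                .zip(mask.iter())
                .map(|(p, q)| *p && *q)
                .collect()
        })
        // With zero fields `reduce` returns `None`: every row is trivially equal.
        .unwrap_or_else(|| vec![true; len])
}
```

With a plain `unwrap()` in place of `unwrap_or_else`, the zero-field case would panic instead of returning an all-true mask.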

@@ -192,10 +192,7 @@ impl DataFrame {

match n.get(0) {
    Some(n) => self.sample_n_literal(n as usize, with_replacement, shuffle, seed),
    None => {
Collaborator Author

drive-by: the simplified code should behave the same as before

@@ -237,10 +234,7 @@ impl DataFrame {
        let n = (self.height() as f64 * frac) as usize;
        self.sample_n_literal(n, with_replacement, shuffle, seed)
    },
    None => {
Collaborator Author

drive-by: the simplified code should behave the same as before

@@ -32,6 +32,15 @@ impl DataFrame {
    /// - the length of all [`Column`] is equal to the height of this [`DataFrame`]
    /// - the columns names are unique
    pub unsafe fn hstack_mut_unchecked(&mut self, columns: &[Column]) -> &mut Self {
        // If we don't have any columns yet, copy the height from the given columns.
        if let Some(fst) = columns.first() {
            if self.width() == 0 {
Collaborator Author

Maybe this should also check `self.height() == 0`, but I want to leave that for future correctness work.
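
As a rough sketch of what the stricter variant might look like, the toy frame below adopts the incoming height only when it has no columns and a height of zero. `HeightFrame` and `hstack_mut` are hypothetical names, and the error handling is an assumption; the PR itself only checks `self.width() == 0`.

```rust
/// Toy frame with an explicit height.
/// Hypothetical type for illustration; not the Polars `DataFrame`.
struct HeightFrame {
    height: usize,
    columns: Vec<Vec<i64>>,
}

impl HeightFrame {
    /// Append columns. A frame with no columns and height 0 adopts the
    /// height of the incoming columns; any other frame, including a
    /// zero-column frame with non-zero height, requires a matching length.
    fn hstack_mut(&mut self, new_columns: Vec<Vec<i64>>) -> Result<(), String> {
        let Some(first) = new_columns.first() else {
            return Ok(()); // nothing to add
        };

        // The extra `height == 0` condition is the check discussed above.
        let expected = if self.columns.is_empty() && self.height == 0 {
            first.len()
        } else {
            self.height
        };

        if new_columns.iter().any(|c| c.len() != expected) {
            return Err(format!("expected columns of length {expected}"));
        }

        self.height = expected;
        self.columns.extend(new_columns);
        Ok(())
    }
}
```

Without the `height == 0` condition, stacking a three-row column onto a zero-column frame of height 2 would silently change the frame's height instead of being rejected.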

@@ -1232,6 +1292,10 @@ impl DataFrame {
        if let Some(idx) = self.get_column_index(column.name().as_str()) {
            self.replace_column(idx, column)?;
        } else {
            if self.width() == 0 {
Collaborator Author

Same as above: this may also need a `self.height() == 0` check.

@@ -1274,7 +1338,13 @@ impl DataFrame {
        debug_assert!(self.width() == 0 || self.height() == column.len());
        debug_assert!(self.get_column_index(column.name().as_str()).is_none());

        // SAFETY: Invariant of function guarantees for case `width` > 0. We set the height
        // properly for `width` == 0.
        if self.width() == 0 {
Collaborator Author

Same as above: this may also need a `self.height() == 0` check.

@@ -1288,6 +1358,10 @@ impl DataFrame {
self.replace_column(idx, c)?;
}
} else {
if self.width() == 0 {
Collaborator Author

Same as above: this may also need a `self.height() == 0` check.

@@ -657,13 +657,6 @@ fn any_values_to_list(
DataType::Categorical(Some(Arc::new(RevMapping::default())), *ordering)
},

// Structs don't support empty fields yet.
Collaborator Author

It is very nice that we can remove these here.

.iter()
.map(|s| s.new_from_index(0, num_rows).into());
Collaborator Author

drive-by: just convert into scalar column

Labels
internal (An internal refactor or improvement), python (Related to Python Polars), rust (Related to Rust Polars)
1 participant