[BUG] Add an allowlist of DataTypes that ColumnRangeStatistics supports and validation of TableStatistics #1632

Merged: jaychia merged 13 commits into main from jay/constrain-statistics-types on Nov 27, 2023

Contributor

@jaychia jaychia commented Nov 17, 2023

  1. We should disallow creation of ColumnRangeStatistics from non-comparable types to avoid issues at runtime
  2. We also add validation when creating MicroPartitions:
    • The column names in a MicroPartition's schema must be found in its ScanTask's schema
    • When creating Statistics for a MicroPartition, we cast those Statistics to the MicroPartition's schema to ensure type compatibility
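
The two checks described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `DataType` enum is a simplified stand-in, the particular types on the allowlist are an assumption, and the names `supports_range_stats` and `validate_column_names` are hypothetical.

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-in for the real DataType enum.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
    Boolean,
    List(Box<DataType>),
    Python,
}

/// Allowlist check: only types with a meaningful ordering may carry
/// min/max range statistics. Which types belong here is an assumption.
pub fn supports_range_stats(dtype: &DataType) -> bool {
    matches!(
        dtype,
        DataType::Int64 | DataType::Float64 | DataType::Utf8 | DataType::Boolean
    )
}

/// Checks that every column named in the MicroPartition's schema is
/// also present in the ScanTask's schema.
pub fn validate_column_names(
    micropartition_cols: &[&str],
    scan_task_cols: &[&str],
) -> Result<(), String> {
    let available: HashSet<&str> = scan_task_cols.iter().copied().collect();
    for name in micropartition_cols {
        if !available.contains(name) {
            return Err(format!("column `{name}` not found in ScanTask schema"));
        }
    }
    Ok(())
}
```

Returning a `Result` from the name check, rather than panicking, leaves the caller free to decide how a mismatch should surface.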

@github-actions github-actions bot added the bug Something isn't working label Nov 17, 2023
Member

@samster25 samster25 left a comment

LGTM! You do like your matching haha

codecov bot commented Nov 17, 2023

Codecov Report

Merging #1632 (e0507d8) into main (ff218e7) will not change coverage.
Report is 3 commits behind head on main.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1632   +/-   ##
=======================================
  Coverage   84.88%   84.88%           
=======================================
  Files          55       55           
  Lines        5318     5318           
=======================================
  Hits         4514     4514           
  Misses        804      804           

@jaychia jaychia force-pushed the jay/constrain-statistics-types branch from 43032c1 to 2286c38 November 23, 2023 00:25
@jaychia jaychia changed the title from "[BUG] Add an allowlist of DataTypes that ColumnRangeStatistics supports" to "[BUG] Add an allowlist of DataTypes that ColumnRangeStatistics supports and validation of TableStatistics" Nov 23, 2023
{
panic!("MicroPartition: TableStatistics and Schema have different column names\nTableStats:\n{},\nSchema\n{}", statistics, schema);
}
assert!(
Member

This should be a panic! or a debug_assert!. A regular assert! here will kill the program.

Contributor Author

assert! just calls panic! under the hood. Is it special-cased for pyo3?

https://doc.rust-lang.org/std/macro.assert.html

Asserts that a boolean expression is true at runtime.

This will invoke the panic! macro if the provided expression cannot be evaluated to true at runtime.
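
To make the point in the quoted documentation concrete, here is a small sketch using only the standard library. It shows that a failing assert! and an explicit panic! both surface as an ordinary unwinding panic that `std::panic::catch_unwind` can observe. The function names are hypothetical and not from this PR, and the sketch does not involve pyo3, so it says nothing about how pyo3 treats the panic. It also assumes the default `panic = "unwind"` setting; with `panic = "abort"` nothing can be caught.

```rust
use std::panic;

/// Hypothetical check written with `assert!`, which panics on failure
/// in every build profile.
pub fn check_with_assert(stats_cols: usize, schema_cols: usize) {
    assert!(
        stats_cols == schema_cols,
        "stats has {stats_cols} columns, schema has {schema_cols}"
    );
}

/// The same check written with an explicit `panic!`.
pub fn check_with_panic(stats_cols: usize, schema_cols: usize) {
    if stats_cols != schema_cols {
        panic!("stats has {stats_cols} columns, schema has {schema_cols}");
    }
}

/// Runs `f` and reports whether it panicked.
pub fn panics(f: impl FnOnce() + panic::UnwindSafe) -> bool {
    panic::catch_unwind(f).is_err()
}
```

By contrast, debug_assert! is compiled out unless debug assertions are enabled (by default, in non-optimized builds), so it is the only one of the three whose behavior differs between debug and release builds.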

Resolved review thread: src/daft-stats/src/column_stats/mod.rs
Resolved review thread: src/daft-stats/src/table_stats.rs
@jaychia jaychia merged commit 12bd499 into main Nov 27, 2023
39 checks passed
@jaychia jaychia deleted the jay/constrain-statistics-types branch November 27, 2023 19:06