-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parquet: Add finer metrics on operations covered by time_elapsed_opening
#12585
Changes from 4 commits
d1d00d5
e2ccd46
ee5faae
b9b7156
0ce20b8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -118,6 +118,7 @@ impl FileOpener for ParquetOpener { | |
Ok(Box::pin(async move { | ||
let options = ArrowReaderOptions::new().with_page_index(enable_page_index); | ||
|
||
let mut metadata_timer = file_metrics.metadata_load_time.timer(); | ||
let metadata = | ||
ArrowReaderMetadata::load_async(&mut reader, options.clone()).await?; | ||
let mut schema = metadata.schema().clone(); | ||
|
@@ -133,6 +134,8 @@ impl FileOpener for ParquetOpener { | |
let metadata = | ||
ArrowReaderMetadata::try_new(metadata.metadata().clone(), options)?; | ||
|
||
metadata_timer.stop(); | ||
|
||
let mut builder = | ||
ParquetRecordBatchStreamBuilder::new_with_metadata(reader, metadata); | ||
|
||
|
@@ -187,15 +190,18 @@ impl FileOpener for ParquetOpener { | |
} | ||
// If there is a predicate that can be evaluated against the metadata | ||
if let Some(predicate) = predicate.as_ref() { | ||
let mut timer = file_metrics.statistics_eval_time.timer(); | ||
row_groups.prune_by_statistics( | ||
&file_schema, | ||
builder.parquet_schema(), | ||
rg_metadata, | ||
predicate, | ||
&file_metrics, | ||
); | ||
timer.stop(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think it is a problem, but technically speaking the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This one is needed, because it only goes out of scope after the Bloom Filter evaluation. The next There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doesn't matter anymore, as I moved both calls to the callees. |
||
|
||
if enable_bloom_filter && !row_groups.is_empty() { | ||
let mut timer = file_metrics.bloom_filter_eval_time.timer(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I note that the time calculation for access_plan = p.prune_plan_with_page_index(
access_plan,
&file_schema,
builder.parquet_schema(),
file_metadata.as_ref(),
&file_metrics,
); I wonder if we should move the timer into this function for consistency (or alternatively push the timer calculation into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. I did the latter, because it's probably closer to what users expect. |
||
row_groups | ||
.prune_by_bloom_filters( | ||
&file_schema, | ||
|
@@ -204,6 +210,7 @@ impl FileOpener for ParquetOpener { | |
&file_metrics, | ||
) | ||
.await; | ||
timer.stop(); | ||
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside while I am reviewing this code, the description of what the metrics mean would be more useful if we could attach them to the Metric itself somehow 🤔
Something like:
Though then we would have to figure out how to expose that information in an explain plan 🤔