Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34338][SQL] Report metrics from Datasource v2 scan #31451
[SPARK-34338][SQL] Report metrics from Datasource v2 scan #31451
Changes from 14 commits
4d941c7
f72834c
4215ec0
d3bf283
aedb965
918c90d
e8576ec
d5d8678
cf05fb7
6dced9c
a50bf40
7d50027
b5be5ba
0f38782
50ed317
06eb9c7
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's still per-row update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, will do "update metrics per some rows" in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check whether
customMetrics
containsmetric.name()
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can throw a user-friendly message if it doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any plans to similarly support metrics in writes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at a few V2 write nodes, but seems we don't have any SQL metrics there (even number of output rows). I guess we don't provide metrics for writes generally now?
If there is interest to see metrics in writes, I think it is okay to work on it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like updating
context.taskMetrics().outputMetrics
is just in our branch. That just uses the Hadoop FS metrics collection that we use elsewhere, so it isn't metrics from the source as we want to support in this PR.I think it would be good to follow up and support metrics on the output side. It doesn't need to be done here, but metrics are really useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. I can work on it in follow up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: to make the code more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why choosing the
Long
type instead of theDouble
type here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No special reason. We can use Double to show numbers after decimal point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @viirya, the
loadExtensions
requires that class has a no-arg constructor, or a 1-arg constructor that takes aSparkConf
, do we really need this restriction forCustomMetric
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For current usage, I don't see there are other necessary args for
CustomMetric
constructor. Besides, as an interface, we cannot define constructor forCustomMetric
. I thinkCustomMetric
should be pretty simple class and I'd like to keep it as simple as possible. If there are more usages that require additional args, we can add an initialization API used to pass some parameters. But I doubt that if it is necessary.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should keep it as simple as possible. I'm asking because the simple case class does not work since it doesn't have a no-arg constructor, or a 1-arg constructor.
It took me a while to dig out because it just doesn't work and no WARN/ERROR logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Added a warning log at #37386
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: initialize