[Data] Refactor OpRuntimeMetrics to support properties #47800
Conversation
```python
@classmethod
def get_metric_keys(cls):
    """Return a list of metric keys."""
    return [f.name for f in fields(cls)] + ["cpu_usage", "gpu_usage"]
```
Dead code.
```python
@property
def average_bytes_change_per_task(self) -> Optional[float]:
    """Average size difference in bytes of input ref bundles and output ref
    bundles per task."""
    if (
        self.average_bytes_inputs_per_task is None
        or self.average_bytes_outputs_per_task is None
    ):
        return None

    return self.average_bytes_outputs_per_task - self.average_bytes_inputs_per_task
```
Dead code.
```python
@dataclass
class RunningTaskInfo:
    inputs: RefBundle
    num_outputs: int
    bytes_outputs: int


class OpRuntimesMetricsMeta(type):
```
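For context, a hedged sketch of what metaclass-based registration could look like (the `metricfield` helper, the `_METRICS` list, and the sketch class names below are illustrative stand-ins; the actual metaclass body is truncated in the diff above):

```python
from dataclasses import field

_METRICS = []


def metricfield(*, default=0, description="", metrics_group=""):
    """Stand-in helper: a dataclass field that carries metric metadata."""
    return field(
        default=default,
        metadata={"description": description, "metrics_group": metrics_group},
    )


class OpRuntimesMetricsMetaSketch(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Runs once per class definition, so each metric field is registered
        # exactly once.
        for attr_name, value in namespace.items():
            metadata = getattr(value, "metadata", None)
            if metadata and "metrics_group" in metadata:
                _METRICS.append((attr_name, dict(metadata)))


class _Example(metaclass=OpRuntimesMetricsMetaSketch):
    num_inputs_received: int = metricfield(
        description="Number of input blocks received by operator.",
        metrics_group="inputs",
    )


assert [name for name, _ in _METRICS] == ["num_inputs_received"]
```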
Is it possible to just add the metric to the list in `metricfield`, so we don't need this metaclass?
Alternatively, I think we can add that logic in `OpRuntimeMetrics.__post_init__`.
How would we get the name of the attribute in `metricfield`? `Field.name` is `None` inside `metricfield`. I'm not sure how `name` gets set exactly, but I think it's some dataclass magic that occurs after the class is defined (see the small sketch after the example below).

One alternative is to explicitly specify the name. The advantage is that we don't have to use metaclasses; the disadvantage is that you need to duplicate the name:
```python
num_inputs_received: int = metricfield(
    name="num_inputs_received",
    default=0,
    description="Number of input blocks received by operator.",
    metrics_group="inputs",
)
```
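A minimal sketch of the dataclass behavior mentioned above, assuming a simplified `metricfield` helper (not the PR's implementation): the `Field` object returned by `dataclasses.field()` has `name` set to `None` until the `@dataclass` decorator processes the class body.

```python
from dataclasses import dataclass, field, fields


def metricfield(default=0, description="", metrics_group=""):
    # At this point the Field object exists, but dataclasses has not
    # assigned it a name yet.
    f = field(
        default=default,
        metadata={"description": description, "metrics_group": metrics_group},
    )
    assert f.name is None
    return f


@dataclass
class Example:
    num_inputs_received: int = metricfield(
        description="Number of input blocks received by operator.",
        metrics_group="inputs",
    )


# Only after @dataclass has processed the class is the name filled in.
assert [f.name for f in fields(Example)] == ["num_inputs_received"]
```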
Personally, I prefer to specify the name only when needed. If you put the logic in `__post_init__`, it should have a valid `name` attribute, I think?
> Alternatively, I think we can add that logic in `OpRuntimeMetrics.__post_init__`.

We could do this. We'd just need to make sure we don't add duplicate items, since `__post_init__` is called once per instance rather than once when the class is defined.
Don't have a strong preference for `__post_init__` vs. the metaclass. Down for either.
Discussed offline: we propose putting the logic in `__post_init__` and making `_METRICS` a set so that we don't keep duplicates.
After testing it out, I think `__post_init__` might not work. `__post_init__` only works if you don't override `__init__`, and we currently override `__init__` to pass the `Operator` to the `OpRuntimeMetrics` instance.
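A minimal sketch of that limitation, using a hypothetical class rather than the real `OpRuntimeMetrics`: `__post_init__` is invoked by the `__init__` that `@dataclass` generates, so a hand-written `__init__` bypasses it entirely.

```python
from dataclasses import dataclass


@dataclass
class MetricsSketch:
    num_inputs_received: int = 0

    def __init__(self, op):
        # A custom __init__ (e.g., to accept the operator) replaces the
        # dataclass-generated one, so __post_init__ below never runs.
        self._op = op
        self.num_inputs_received = 0

    def __post_init__(self):
        print("never reached")


MetricsSketch(op=None)  # Prints nothing: __post_init__ was not called.
```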
```python
    metrics_group=value.metadata[_METRIC_FIELD_METRICS_GROUP_KEY],
    map_only=value.metadata[_METRIC_FIELD_IS_MAP_ONLY_KEY],
)
_METRICS.append(metric)
```
We're adding to `_METRICS` twice.
@alexeykudinkin wdym?
```diff
-class OpRuntimeMetrics:
-    """Runtime metrics for a PhysicalOperator.
+class OpRuntimeMetrics(metaclass=OpRuntimesMetricsMeta):
+    """Runtime metrics for a 'PhysicalOperator'.
```
These are metric definitions, not metrics themselves. Let's make that clear from the description.
@alexeykudinkin `OpRuntimeMetrics` contains the metric definitions as well as the metric values. What's the difference between what's in `OpRuntimeMetrics` and what you mean by a metric?
Why are these changes needed?

Metrics that are implemented as properties (e.g., `average_bytes_inputs_per_task`) aren't shown on the Ray Data dashboard. This PR refactors the implementation of `OpRuntimeMetrics` to fix the issue.

Additional details:

`_StatsActor` uses `fields` (a function that returns all of the fields in a dataclass) to generate the list of metrics shown on the dashboard (see `ray/python/ray/data/_internal/stats.py`, lines 266 to 280 in 1c80db5). This is an issue because `_StatsActor` assumes that 1) all metrics are implemented as fields, and 2) all fields represent metrics. This refactor makes the interface more explicit by introducing an `OpRuntimeMetrics.get_metrics` method (a hedged sketch of the idea follows).
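The sketch below is illustrative only (the class name, fields, and property-discovery logic are assumptions, not the merged implementation): rather than the stats actor calling `dataclasses.fields()` and treating every field as a metric, the metrics class exposes an explicit listing that can also cover property-based metrics.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class OpRuntimeMetricsSketch:
    num_inputs_received: int = 0
    bytes_inputs_received: int = 0

    @property
    def average_bytes_inputs_per_task(self) -> float:
        # Property-based metric: invisible to dataclasses.fields().
        return self.bytes_inputs_received / max(self.num_inputs_received, 1)

    @classmethod
    def get_metrics(cls) -> List[str]:
        # Explicit listing: field-backed metrics plus property-based ones.
        property_metrics = [
            name for name, value in vars(cls).items() if isinstance(value, property)
        ]
        return [f.name for f in fields(cls)] + property_metrics


print(OpRuntimeMetricsSketch.get_metrics())
# ['num_inputs_received', 'bytes_inputs_received', 'average_bytes_inputs_per_task']
```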
Related issue number

Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- (…) method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.