-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] Time ray data tasks (total time and udf time) #43241
Conversation
Those methods have been deprecated for 3 years and they are DeveloperAPI so it's safe to remove them. Signed-off-by: Jiajun Yao <[email protected]> Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
@@ -112,6 +114,14 @@ class OpRuntimeMetrics: | |||
default=0, metadata={"map_only": True, "export_metric": True} | |||
) | |||
|
|||
# TODO(mowen): Is this actually different than the wall time already computed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you verify this TODO / is it resolved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some offline investigation, I noticed that this 1. very closely tracked the existing walltime stats and 2. was not properly being synced to the dataset stats (i.e. the time was being aggregated appropriately in the OpRuntimeMetrics object, but the dataset stats was not seeing the correct value). As such removing this for now, might revisit post GA.
total_data_tasks_time: float = field(default=0, metadata={"export_metric": False}) | ||
|
||
# Total time spent in UDFs | ||
total_data_udfs_time: float = field(default=0, metadata={"export": False}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we change these export metric fields to True before merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually going to remove this from OpRuntimeMetrics since it is being tracked in the BlockExecStats
and then aggregated in the DatasetStats
.
python/ray/data/_internal/execution/operators/map_transformer.py
Outdated
Show resolved
Hide resolved
python/ray/data/tests/test_stats.py
Outdated
is_map=False, | ||
extra_metrics=[ | ||
"'num_output_N': N", | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change?
python/ray/data/_internal/execution/interfaces/op_runtime_metrics.py
Outdated
Show resolved
Hide resolved
total_data_tasks_time: float = field(default=0, metadata={"export_metric": False}) | ||
|
||
# Total time spent in UDFs | ||
total_data_udfs_time: float = field(default=0, metadata={"export": False}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since all the metrics are related to ray data, we can probably omit data
from the name/output. how about total_ray_tasks_time
and total_udfs_time
?
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
) -> Iterable[MapTransformFnData]: | ||
while True: | ||
try: | ||
start = time.perf_counter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if _is_udf == False
, we can skip creating the counter object to reduce overheads.
Also, I'm wondering maybe we should do timing at the MapTransformer
level. So we don't need to wrap every subclass of MapTransformFn
.
Basically, if the fn._is_udf = True, we create wrapper around it at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to push the timing up into the MapTransformer
level and it does seem much cleaner that way. The timing wrappers are only added if the the MapTransformerFn
has _is_udf
set to True
.
Signed-off-by: Matthew Owen <[email protected]>
Signed-off-by: Matthew Owen <[email protected]>
Why are these changes needed?
This allows us to track both the total time spent in Ray Data tasks, and the time specifically spent in UDFs (primarily through map and filter functions applied to the dataset). An example output can be found below:
Code to generate the above:
Related issue number
Closes #42801
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.