[Data] Re-phrase the streaming executor current usage string (ray-project#47515)

## Why are these changes needed?

The progress bar for Ray Data could still end up showing higher
utilization than the cluster currently has.
ray-project#46729 was the first attempt to fix this. It addressed the
issue in static clusters, but the issue remains for clusters that
autoscale. This change simply rephrases the string so it is less
confusing.

Before
<img width="1249" alt="image"
src="https://github.com/user-attachments/assets/049ea096-a87f-4767-ba04-6d00d7c2755d">

After
<img width="1248" alt="image"
src="https://github.com/user-attachments/assets/cb74c0dc-1f33-4b22-b31c-e83df2a5d408">

This comes from the fact that operators don't track task state (and
Ray Core currently does not even provide that API). This means Ray Data
operators do not know whether a task has been assigned to a node, so
once a task is submitted to Ray it is marked active even if it is still
pending node assignment. The dashboard does better here because it has
extra information about the task.
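
The over-reporting described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyOperator` and `actually_running_cpus` are hypothetical names invented for illustration and are not part of Ray.

```python
from dataclasses import dataclass


@dataclass
class ToyOperator:
    """Hypothetical operator that only knows which tasks it has submitted."""

    cpus_per_task: float = 1.0
    submitted_tasks: int = 0

    def submit(self, n: int) -> None:
        # No scheduling feedback is available, so every submitted task
        # is counted as active, even if it is still waiting for a node.
        self.submitted_tasks += n

    def reported_cpu_usage(self) -> float:
        return self.submitted_tasks * self.cpus_per_task


def actually_running_cpus(reported: float, cluster_cpus: float) -> float:
    # Assumption: the scheduler can place at most `cluster_cpus` worth of
    # tasks at any one time.
    return min(reported, cluster_cpus)


op = ToyOperator()
op.submit(12)
cluster_cpus = 8.0

reported = op.reported_cpu_usage()
running = actually_running_cpus(reported, cluster_cpus)
print(f"reported={reported}, running={running}, cluster={cluster_cpus}")
```

In this sketch the operator reports 12 CPUs in use on an 8-CPU cluster, which is the kind of mismatch the rephrased string tries to make less confusing.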

<img width="1493" alt="image"
src="https://github.com/user-attachments/assets/9315b884-3e61-4b32-8400-7f76e15b6a4b">

In the future we can look into adding a Ray Core API for remote state
reporting, which would allow operators to provide more detailed state
(active, pending_scheduled, pending_node_assignment).
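
As a rough sketch of what more detailed state could look like, the three states named above can be modeled as an enum. The `TaskState` class and `summarize` helper are hypothetical; no such API exists in Ray Core at the time of this commit.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class TaskState(Enum):
    # State names taken from the commit message; the enum is hypothetical.
    ACTIVE = "active"
    PENDING_SCHEDULED = "pending_scheduled"
    PENDING_NODE_ASSIGNMENT = "pending_node_assignment"


def summarize(states: Iterable[TaskState]) -> Dict[str, int]:
    """Count tasks per state so a progress bar could report them separately."""
    counts = Counter(states)
    return {state.value: counts.get(state, 0) for state in TaskState}


summary = summarize(
    [TaskState.ACTIVE] * 2
    + [TaskState.PENDING_SCHEDULED]
    + [TaskState.PENDING_NODE_ASSIGNMENT] * 3
)
print(summary)
```

With per-state counts like these, only tasks in the `active` state would need to count toward the usage shown against cluster limits.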

## Related issue number

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e.,
`git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a method in Tune, I've added it in `doc/source/tune/api/` under
the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Sofian Hnaide <[email protected]>
Co-authored-by: scottjlee <[email protected]>
Co-authored-by: matthewdeng <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
3 people authored and ujjawal-khare committed Oct 15, 2024
1 parent 1f14338 commit 9f24fd9
Showing 1 changed file with 8 additions and 3 deletions.
11 changes: 8 additions & 3 deletions python/ray/data/_internal/execution/streaming_executor.py
@@ -363,9 +363,14 @@ def _report_current_usage(self) -> None:
         pending_usage = self._resource_manager.get_global_pending_usage()
         limits = self._resource_manager.get_global_limits()
         resources_status = (
-            # TODO(scottjlee): Add dataset name/ID to progress bar output.
-            "Running Dataset. Active & requested resources: "
-            f"{running_usage.cpu:.4g}/{limits.cpu:.4g} CPU, "
+            "Active & requested resources: "
+            f"{running_usage.cpu:.4g} of {limits.cpu:.4g} available CPU, "
+            f"{running_usage.gpu:.4g} of {limits.gpu:.4g} available GPU, "
+            f"{running_usage.object_store_memory_str()} of "
+            f"{limits.object_store_memory_str()} available object_store_memory "
+            "(pending: "
+            f"{pending_usage.cpu:.4g} CPU, "
+            f"{pending_usage.gpu:.4g} GPU)"
         )
         if running_usage.gpu > 0:
             resources_status += f"{running_usage.gpu:.4g}/{limits.gpu:.4g} GPU, "
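
To see what the rephrased string looks like, here is a minimal standalone sketch that reuses the added f-string lines from the diff. `FakeResources` is a hypothetical stand-in for the resource objects returned by the resource manager, and its `object_store_memory_str()` formatting is an assumption; the real method may format sizes differently.

```python
from dataclasses import dataclass


@dataclass
class FakeResources:
    """Hypothetical stand-in for the executor's resource objects."""

    cpu: float
    gpu: float
    object_store_memory: float  # bytes

    def object_store_memory_str(self) -> str:
        # Simplified formatting chosen for this sketch only.
        return f"{self.object_store_memory / 2**30:.1f}GiB"


def format_usage(
    running: FakeResources, limits: FakeResources, pending: FakeResources
) -> str:
    # Same string layout as the added lines in the diff.
    return (
        "Active & requested resources: "
        f"{running.cpu:.4g} of {limits.cpu:.4g} available CPU, "
        f"{running.gpu:.4g} of {limits.gpu:.4g} available GPU, "
        f"{running.object_store_memory_str()} of "
        f"{limits.object_store_memory_str()} available object_store_memory "
        "(pending: "
        f"{pending.cpu:.4g} CPU, "
        f"{pending.gpu:.4g} GPU)"
    )


status = format_usage(
    running=FakeResources(cpu=12, gpu=0, object_store_memory=2 * 2**30),
    limits=FakeResources(cpu=8, gpu=0, object_store_memory=4 * 2**30),
    pending=FakeResources(cpu=4, gpu=0, object_store_memory=0),
)
print(status)
```

The values are made up. The example deliberately uses a running CPU count above the limit to show how "12 of 8 available CPU" reads under the new wording.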
