[dashboard] Add `RAY_CLUSTER_ACTIVITY_HOOK` to `/api/component_activities` #26297

nikitavemuri · 2022-07-05T11:13:01Z

Why are these changes needed?

Add an external hook to the /api/component_activities endpoint in the dashboard snapshot router.
Change the is_active field of RayActivityResponse to take an enum, RayActivityStatus, instead of a bool. This is a backward-incompatible change, but it should be OK because [dashboard] Add component_activities API #25996 wasn't included in any branch cuts. RayActivityResponse can now report that an error occurred while getting the activity observation, along with the reason.
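As a rough illustration of the response shape described above, here is a minimal sketch. Only the names `RayActivityResponse`, `RayActivityStatus`, `is_active`, `reason`, and `timestamp` come from this PR; the enum member names and the field types are assumptions made for the example.

```python
import dataclasses
from enum import Enum
from typing import Optional


class RayActivityStatus(str, Enum):
    # Member names are assumed for illustration; they are not shown in this PR.
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"
    ERROR = "ERROR"


@dataclasses.dataclass
class RayActivityResponse:
    # Enum instead of bool, so an error state can be expressed.
    is_active: RayActivityStatus
    # Free-form explanation, e.g. an error message when is_active is ERROR.
    reason: Optional[str] = None
    # Field type is an assumption.
    timestamp: Optional[float] = None


ok = RayActivityResponse(is_active=RayActivityStatus.INACTIVE)
failed = RayActivityResponse(
    is_active=RayActivityStatus.ERROR,
    reason="hook raised ValueError: bad input",
)

# The endpoint code in this PR serializes responses with dataclasses.asdict.
ok_dict = dataclasses.asdict(ok)
```

A three-valued status lets a caller distinguish "inactive" from "could not determine activity", which a bool cannot express.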

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

rkooo567 · 2022-07-05T18:09:12Z

python/ray/_private/ray_constants.py

+# activity component type (str) to
+# ray.dashboard.modules.snapshot.snapshot_head.RayActivityResponse.
+# Example: "your.module.ray_cluster_activity_hook".
+RAY_CLUSTER_ACTIVITY_HOOK = "RAY_CLUSTER_ACTIVITY_HOOK"


Can you move it to the dashboard constants?

rkooo567 · 2022-07-05T18:13:04Z

dashboard/modules/snapshot/snapshot_head.py

+                )
+                resp[component_type] = dataclasses.asdict(component_activity_output)
+        except Exception as e:
+            logger.info(


Consider adding this to reason?

I chatted with Matt and Sofian, and we felt it is probably better for the response of this endpoint to report whether there was an error getting the activity observation of any component. In that case, the reason can include the error message. What do you think?
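The idea in this thread, surfacing a hook failure in the response rather than only logging it, could look roughly like the sketch below. The function name `collect_hook_activity`, the component key `"external_component"`, and the `"ERROR"` status string are hypothetical and not taken from the diff.

```python
from typing import Any, Callable, Dict

ActivityMap = Dict[str, Dict[str, Any]]


def collect_hook_activity(hook: Callable[[], ActivityMap]) -> ActivityMap:
    """Run the hook; on failure, report the error in the response."""
    try:
        return hook()
    except Exception as e:
        # Hypothetical component key and status value, for illustration only.
        return {
            "external_component": {
                "is_active": "ERROR",
                "reason": f"{type(e).__name__}: {e}",
                "timestamp": None,
            }
        }


def broken_hook() -> ActivityMap:
    raise RuntimeError("backend unreachable")


result = collect_hook_activity(broken_hook)
```

With this shape, a client polling the endpoint can see why an observation is missing without reading the dashboard logs.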

rkooo567 · 2022-07-05T18:17:17Z

dashboard/modules/snapshot/snapshot_head.py

        driver_activity_info = await self._get_job_activity_info(timeout=timeout)

+        external_ray_cluster_activity_output = {}
+        if ray_constants.RAY_CLUSTER_ACTIVITY_HOOK in os.environ:
+            external_ray_cluster_activity_output = _load_class(


I am confused how this would work. Isn't it supposed to "run a function" instead of just loading the output class? I believe it is not possible to dynamically set env var in the running process?

We are expecting the environment variable to be set before ray is initialized, and we are not supporting the use case of the environment variable itself being updated. _load_class loads either a class or a function from an external module, and the function is run on line 130 with ()
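To make the mechanism in this reply concrete, here is a minimal sketch of resolving a dotted path from an environment variable to a callable and then invoking it. `load_callable` is a hypothetical stand-in written for this example; it is not Ray's `_load_class`, whose implementation is not shown in this PR. The value `os.getcwd` is just a zero-argument standard-library function used as a placeholder hook.

```python
import importlib
import os
from typing import Any, Callable

HOOK_ENV_VAR = "RAY_CLUSTER_ACTIVITY_HOOK"


def load_callable(path: str) -> Callable[..., Any]:
    """Resolve a dotted path such as "your.module.func" to a callable."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {path!r}")
    obj = getattr(importlib.import_module(module_name), attr_name)
    if not callable(obj):
        raise TypeError(f"{path!r} does not refer to a callable")
    return obj


# Placeholder value; in practice the variable is set before `ray start`.
os.environ[HOOK_ENV_VAR] = "os.getcwd"

if HOOK_ENV_VAR in os.environ:
    # Step 1: load the callable named by the environment variable.
    hook = load_callable(os.environ[HOOK_ENV_VAR])
    # Step 2: run it with no arguments to get a fresh result per request.
    output = hook()
```

Because the callable is invoked on every request, its result can change over time even though the environment variable itself never does.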

rkooo567 · 2022-07-05T18:20:10Z

dashboard/modules/snapshot/tests/test_snapshot.py

+        "ray.dashboard.optional_utils.ClassMethodRouteTable", MockClassMethodRouteTable
+    ).start()
+    mock_load_class = Mock(return_value=Mock(return_value=cluster_activity_hook_output))
+    os.environ[ray_constants.RAY_CLUSTER_ACTIVITY_HOOK] = "mock_module"


can you test with a function that dynamically changes the output instead of a static output?

yes, updated
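A sketch of what a dynamic hook in a mocked test might look like, using only `unittest.mock`. The names `state` and `dynamic_hook` are hypothetical; `mock_load_class` and `"mock_module"` mirror the names in the snippet above, but this is not the test that was committed.

```python
from unittest.mock import Mock

# Mutable state that the hook inspects each time it is called.
state = {"jobs": []}


def dynamic_hook():
    # Output depends on `state` at call time, not at definition time.
    return {
        "new_component": {
            "is_active": len(state["jobs"]) > 0,
            "reason": None,
            "timestamp": None,
        }
    }


# Stand-in for the patched loader: it returns the hook itself, not a fixed dict.
mock_load_class = Mock(return_value=dynamic_hook)

first = mock_load_class("mock_module")()
state["jobs"].append("job-1")
second = mock_load_class("mock_module")()
```

Compared with `Mock(return_value=Mock(return_value=static_output))`, this checks that the hook is re-run on each call rather than evaluated once.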

rkooo567

LGTM. One more test request for e2e test

rkooo567 · 2022-07-06T07:36:48Z

dashboard/modules/snapshot/tests/test_snapshot.py

+            "driver": {"is_active": False, "reason": None, "timestamp": None},
+            "new_component": {"is_active": False, "reason": None, "timestamp": None},
+        }
+    os.environ.pop(RAY_CLUSTER_ACTIVITY_HOOK)


Why don't we have a real e2e test here?

a = {}
def endpoint():
    return {"is_active": len(a) > 0, "reason": None, "timestamp": None}

RAY_CLUSTER_ACTIVITY_HOOK=tests.test_snapshot.endpoint ray start --head
# Call API and make sure `is_active == False`
a = {"a": 1}
# Call API and make sure `is_active == True`

Updated to make this an e2e test. I wasn't sure initially how to import methods from the tests folder, but this works from ray._private.test_utils

ijrsvt · 2022-07-07T15:39:30Z

dashboard/consts.py

+# Hook that is invoked on the dashboard `/api/component_activities` endpoint.
+# It does not take any arguments and should return a dictionary mapping
+# activity component type (str) to
+# ray.dashboard.modules.snapshot.snapshot_head.RayActivityResponse.
+# Example: "your.module.ray_cluster_activity_hook".
+RAY_CLUSTER_ACTIVITY_HOOK = "RAY_CLUSTER_ACTIVITY_HOOK"


Can we mention that this path should correspond to a Callable type?

ijrsvt · 2022-07-07T15:40:47Z

dashboard/modules/snapshot/snapshot_head.py

+                external_activity_output = _load_class(
+                    os.environ[RAY_CLUSTER_ACTIVITY_HOOK]
+                )()


Suggested change

-                external_activity_output = _load_class(
-                    os.environ[RAY_CLUSTER_ACTIVITY_HOOK]
-                )()
+                cluster_activity_callable = _load_class(
+                    os.environ[RAY_CLUSTER_ACTIVITY_HOOK]
+                )
+                external_activity_output = cluster_activity_callable()

This is probably not linted correctly, but it might help to be more explicit about what is going on.

sure, updated

ijrsvt

Overall LGTM.

The backward-incompatible change should be fine since this field was only introduced a week ago (and we haven't released Ray in the last week).

nikitavemuri added 4 commits July 5, 2022 01:03

add hook

7e074ef

update hook

b04208a

update test

9982fdf

linting

a6dbd92

nikitavemuri marked this pull request as ready for review July 5, 2022 11:18

nikitavemuri requested review from wuisawesome, ijrsvt, edoakes, alanwguo and architkulkarni as code owners July 5, 2022 11:18

nikitavemuri assigned rkooo567 and wuisawesome Jul 5, 2022

Merge branch 'master' of https://github.com/nikitavemuri/ray into com…

b3bc929

…ponent_activities_hook

rkooo567 reviewed Jul 5, 2022

nikitavemuri added 2 commits July 5, 2022 11:42

update test output

f69fdbd

update per comments

e8844b2

rkooo567 approved these changes Jul 6, 2022

rkooo567 added the @author-action-required label Jul 6, 2022

nikitavemuri added 2 commits July 6, 2022 12:34

add error to endpoint response

14a5134

update to e2e test

ba3bd0e

nikitavemuri removed the @author-action-required label Jul 6, 2022

nikitavemuri requested a review from rkooo567 July 6, 2022 20:47

nikitavemuri added 2 commits July 6, 2022 14:43

update test

37796da

update test

66add06

rkooo567 approved these changes Jul 7, 2022

ijrsvt reviewed Jul 7, 2022

update

30dbe15

nikitavemuri requested a review from ijrsvt July 7, 2022 15:56

ijrsvt approved these changes Jul 7, 2022

nikitavemuri and others added 5 commits July 7, 2022 11:45

Merge branch 'ray-project:master' into component_activities_hook

af456bd

fix imports in tests

a044ac8

Merge branch 'component_activities_hook' of https://github.com/nikita…

97153bf

…vemuri/ray into component_activities_hook

linting

ec12552

linting

680b33d

nikitavemuri requested review from richardliaw and ericl as code owners July 7, 2022 21:47

nikitavemuri added 2 commits July 7, 2022 14:49

undo

be17dfc

linting

e3860f1

rkooo567 merged commit 56716a1 into ray-project:master Jul 8, 2022
