From d17d62bd3967cb69d8ce7256599793c4474a15f5 Mon Sep 17 00:00:00 2001
From: Tara Charter
Date: Mon, 16 Sep 2024 12:10:09 -0500
Subject: [PATCH] docs: Add cluster overview

The docs are missing a cluster overview: how to manage the cluster via the WebUI
---
 docs/manage/_index.rst           |  6 ++
 docs/manage/cluster-overview.rst | 96 ++++++++++++++++++++++++++++++++
 2 files changed, 102 insertions(+)
 create mode 100644 docs/manage/cluster-overview.rst

diff --git a/docs/manage/_index.rst b/docs/manage/_index.rst
index b1ea76dfab36..601749f784eb 100644
--- a/docs/manage/_index.rst
+++ b/docs/manage/_index.rst
@@ -11,6 +11,12 @@
+

Historical Cluster Usage Data

diff --git a/docs/manage/cluster-overview.rst b/docs/manage/cluster-overview.rst
new file mode 100644
index 000000000000..73f37fc82685
--- /dev/null
+++ b/docs/manage/cluster-overview.rst
@@ -0,0 +1,96 @@
+.. _cluster-overview:
+
+##########################
+ Cluster Overview (WebUI)
+##########################
+
+The Cluster Overview page in the WebUI provides a comprehensive view of your Determined cluster's status, resource utilization, and configuration. This page is accessible to users with appropriate permissions and offers valuable insights into cluster performance and management.
+
+********************
+ Accessing the Page
+********************
+
+To access the Cluster Overview:
+
+1. Sign in to the WebUI.
+2. From the left navigation pane, select **Cluster**.
+3. The overview will be the default view under the Cluster section.
+
+********************
+ Page Components
+********************
+
+The Cluster Overview page consists of several key components:
+
+Resource Utilization
+====================
+
+This section displays real-time information about the cluster's resource usage:
+
+- Total GPUs/CPUs available
+- Currently active GPUs/CPUs
+- Percentage of resource utilization
+
+Resource Pools
+==============
+
+A list of configured resource pools, including:
+
+- Pool names
+- Number of GPUs/CPUs in each pool
+- Current utilization of each pool
+
+For more details on resource pools, visit :ref:`resource-pools`.
+
+Cluster Topology
+================
+
+A visual representation of the cluster's node and GPU distribution:
+
+- Each node is displayed with its unique identifier
+- The number of available and in-use slots on each node
+- GPU types (if applicable)
+
+To view detailed topology information:
+
+1. Navigate to Resource Pools from the Cluster section.
+2. Select a specific Resource Pool.
+3. Look for the **Topology** section in the resource pool details page.
+
+Job Queue
+=========
+
+An overview of the current job queue, including:
+
+- Number of queued jobs
+- Job priorities
+- Estimated start times
+
+For more information on managing the job queue, see :ref:`job-queue`.
+
+Cluster Configuration
+=====================
+
+Key configuration settings for the cluster, such as:
+
+- Master node information
+- Scheduler type
+- Version information
+
+*****************
+ Actions
+*****************
+
+From the Cluster Overview page, administrators can perform several actions:
+
+- Modify resource pool settings
+- Adjust job queue priorities
+- Access detailed logs and metrics
+
+For specific instructions on these actions, refer to the respective documentation sections.
+
+*****************
+ Troubleshooting
+*****************
+
+If you encounter issues or need more information about cluster management, visit the :ref:`troubleshooting` guide or contact your system administrator.
\ No newline at end of file
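
The new page lists a "Percentage of resource utilization" for the cluster and a "Current utilization" for each pool, but does not say how either figure is derived. As a minimal sketch, assuming utilization means active slots divided by total slots, the arithmetic could look like the following. The `PoolUsage` class, the pool names, and the slot counts are all hypothetical and are not taken from Determined's API or the WebUI.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PoolUsage:
    """Hypothetical per-pool slot counts, e.g. copied by hand from the page."""
    name: str
    total_slots: int
    active_slots: int


def utilization_percent(active: int, total: int) -> float:
    """Return active/total as a percentage; a pool with no slots counts as 0%."""
    if total <= 0:
        return 0.0
    return 100.0 * active / total


def cluster_utilization(pools: Sequence[PoolUsage]) -> float:
    """Aggregate over all pools: sum the slots first, then take the ratio."""
    total = sum(p.total_slots for p in pools)
    active = sum(p.active_slots for p in pools)
    return utilization_percent(active, total)


# Illustrative numbers only (assumed, not real cluster data).
pools = [
    PoolUsage("default", total_slots=8, active_slots=6),
    PoolUsage("compute", total_slots=16, active_slots=4),
]

for p in pools:
    print(f"{p.name}: {utilization_percent(p.active_slots, p.total_slots):.1f}%")
print(f"cluster: {cluster_utilization(pools):.1f}%")
```

Summing slots before dividing weights each pool by its size, so the cluster figure is not the plain average of the per-pool percentages. Whether the WebUI aggregates this way is an assumption worth confirming before the page describes the number in more detail.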