RDoc-2948 & RDoc-2949 Add the Cluster Health section to Cloud -> Maintenance & Troubleshooting
PFYasu committed Sep 23, 2024
1 parent 7e0abfd commit e4ca102
Showing 6 changed files with 76 additions and 0 deletions.
@@ -25,4 +25,80 @@ status.
| Low uptime - below 12 hours | Product's uptime has been below 12 hours for at least 24 hours. |
| Low uptime - below 48 hours | Product's uptime has been below 48 hours for at least 96 hours. |

{PANEL/}

{PANEL: Cluster Health}

**Cluster Health** helps you keep your cluster healthy by providing an incident history and suggestions.
This dashboard has two separate sections:

- [Incidents History](cloud-maintenance-troubleshooting#incidents-history)
- [Suggestions](cloud-maintenance-troubleshooting#suggestions)

---

## Incidents History

The *RavenDB Cloud* monitoring system tracks incidents and cluster performance.
This section lets you analyse your cluster incidents for a selected *time period*, *cluster node*, *severity* and *category*.

!["Cluster Health: Incidents History section"](images\cluster-health-incidents-history.png "Cluster Health: Incidents History section")

#### Categories and their descriptions

Incidents are split into *six* categories. The **Description** column provides additional information about the incidents in each category.

| Category name | Description |
|---------------|----------------------------------------------------------------------------------------|
| Uptime | Product's uptime has been below a specified time. |
| Memory | Product is running low on available memory. |
| CPU | Product is running low on CPU credits or experiencing a high level of CPU utilization. |
| Disk | Product's available disk space is low. |
| IO | Product is experiencing high input/output operations. |
| Availability | Product is currently not responding to input or commands or has been restarted. |

---

## Suggestions

This section displays suggestions for a selected *cluster node* based on incident trends.

!["Cluster Health: Suggestions section"](images\cluster-health-suggestions.png "Cluster Health: Suggestions section")

Suggestions are generated from a 60-day window split into two 30-day periods.

The **first period** (referred to below as the **previous period**) lasts from `now - 60 days` to `now - 30 days`.
The **second period** (referred to below as the **current period**) lasts from `now - 30 days` to `now`.
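
The two periods can be expressed as date arithmetic. The following is a minimal sketch, not RavenDB Cloud's actual implementation: the function name `suggestion_periods` and the choice of UTC timestamps are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Each period is 30 days long, as described above.
PERIOD_LENGTH = timedelta(days=30)


def suggestion_periods(now):
    """Return ((previous_start, previous_end), (current_start, current_end)).

    Hypothetical helper: the previous period spans now-60d to now-30d,
    the current period spans now-30d to now.
    """
    current_start = now - PERIOD_LENGTH
    previous_start = now - 2 * PERIOD_LENGTH
    return (previous_start, current_start), (current_start, now)


# Example: evaluate the periods for a fixed point in time.
now = datetime(2024, 9, 23, tzinfo=timezone.utc)
previous, current = suggestion_periods(now)
print("previous period:", previous[0].date(), "to", previous[1].date())
print("current period: ", current[0].date(), "to", current[1].date())
```

The end of the previous period and the start of the current period are the same instant, so the two periods cover the full 60 days without a gap.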

#### Suggestion types

**Cluster Health** can generate suggestions for *five* usage areas:

- High CPU usage
- High IO usage
- Low memory mode
- Low CPU credits
- Server restarted due to Out of Memory

#### Analyses

Each suggestion type is evaluated with one or more of three analyses:

| Analysis type | Applicable to | Description |
|--------------------------------------------------|-----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|
| Current period above previous period by percents | High CPU usage, High IO usage, Low memory mode, Low CPU credits | The duration of a specific event was longer in the current period than in the previous period. |
| Current period above previous period by numbers | Server restarted due to Out of Memory | The number of occurrences of a specific event was higher in the current period than in the previous period. |
| Current period above threshold by percents | High CPU usage, High IO usage, Low memory mode, Low CPU credits | The duration of a specific event in the current period was longer than the threshold (5%). |
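
The three analyses can be sketched as simple comparisons. This is an illustrative sketch under stated assumptions, not the service's real logic: all function names are hypothetical, the 5% threshold is assumed to be measured against the 30-day length of the current period, and the "by numbers" check is assumed to fire when the current period has more occurrences than the previous one, as the analysis name suggests.

```python
from datetime import timedelta

PERIOD_LENGTH = timedelta(days=30)  # length of one period
THRESHOLD = 0.05                    # the 5% threshold from the table


def above_previous_by_percents(current, previous):
    """True if the event lasted longer in the current period (durations)."""
    return current > previous


def above_previous_by_numbers(current_count, previous_count):
    """True if the event occurred more often in the current period (counts)."""
    return current_count > previous_count


def above_threshold_by_percents(current, period=PERIOD_LENGTH, threshold=THRESHOLD):
    """True if the event's share of the period exceeds the threshold.

    Assumption: the percentage is duration / period length, so 5% of
    30 days corresponds to 36 hours.
    """
    return current / period > threshold


# Example: 40 hours of high CPU usage now versus 10 hours previously.
high_cpu_current = timedelta(hours=40)
high_cpu_previous = timedelta(hours=10)
print(above_previous_by_percents(high_cpu_current, high_cpu_previous))
print(above_threshold_by_percents(high_cpu_current))
```

Durations are modelled as `timedelta` values and restarts as plain integers, mirroring the table's split between duration-based and count-based suggestion types.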

Below are examples of suggestions produced by each analysis:

!["Cluster Health: An example of the `Current period above previous period by percents` analysis"](images\cluster-health-suggestions-current-month-above-previous-month-by-percents.png "Cluster Health: An example of the `Current period above previous period by percents` analysis")


!["Cluster Health: An example of the `Current period above previous period by numbers` analysis"](images\cluster-health-suggestions-current-month-above-previous-month-by-numbers.png "Cluster Health: An example of the `Current period above previous period by numbers` analysis")


!["Cluster Health: An example of the `Current period above threshold by percents` analysis"](images\cluster-health-suggestions-current-month-above-threshold-by-percents.png "Cluster Health: An example of the `Current period above threshold by percents` analysis")


{PANEL/}