feat(system_monitor): add IP packet reassembles failed monitoring to net_monitor (#1427)

* feat(system_monitor): add IP packet reassembles failed monitoring to net_monitor

Signed-off-by: v-nakayama7440-esol <[email protected]>

* fix build errors caused by merge mistakes

Signed-off-by: v-nakayama7440-esol <[email protected]>

* fix(system_monitor): change word Reasm and fix deep nesting

Signed-off-by: v-nakayama7440-esol <[email protected]>

* fix(system_monitor): fix deep nesting

Signed-off-by: v-nakayama7440-esol <[email protected]>

* fix(system_monitor): lightweight /proc/net/snmp reading

Signed-off-by: v-nakayama7440-esol <[email protected]>

* fix(system_monitor): fix index variable type to unsigned, add log output, and make index evaluation expression easier to understand

Signed-off-by: v-nakayama7440-esol <[email protected]>

* fix(system_monitor): remove unnecessary static_cast

Signed-off-by: v-nakayama7440-esol <[email protected]>

* fix(system_monitor): typo fix

Signed-off-by: v-nakayama7440-esol <[email protected]>

Signed-off-by: v-nakayama7440-esol <[email protected]>
Co-authored-by: ito-san <[email protected]>
v-nakayama7440-esol and ito-san authored Sep 20, 2022
1 parent 89154c9 commit 1cc2a07
Showing 8 changed files with 234 additions and 30 deletions.
@@ -5,3 +5,5 @@
monitor_program: "greengrass"
crc_error_check_duration: 1
crc_error_count_threshold: 1
reassembles_failed_check_duration: 1
reassembles_failed_check_count: 1
@@ -134,6 +134,12 @@
contains: [": Network CRC Error"]
timeout: 3.0

ip_packet_reassembles_failed:
type: diagnostic_aggregator/GenericAnalyzer
path: ip_packet_reassembles_failed
contains: [": IP Packet Reassembles Failed"]
timeout: 3.0

storage:
type: diagnostic_aggregator/AnalyzerGroup
path: storage
47 changes: 24 additions & 23 deletions system/system_monitor/README.md
@@ -53,29 +53,30 @@ Every topic is published in 1 minute interval.

[Usage] ✓:Supported, -:Not supported

| Node            | Message                | Intel | arm64(tegra) | arm64(raspi) | Notes |
| --------------- | ---------------------- | :---: | :----------: | :----------: | ----- |
| CPU Monitor     | CPU Temperature        |   ✓   |      ✓       |      ✓       |       |
|                 | CPU Usage              |   ✓   |      ✓       |      ✓       |       |
|                 | CPU Load Average       |   ✓   |      ✓       |      ✓       |       |
|                 | CPU Thermal Throttling |   ✓   |      -       |      ✓       |       |
|                 | CPU Frequency          |   ✓   |      ✓       |      ✓       | Notification of frequency only, normally error not generated. |
| HDD Monitor     | HDD Temperature        |   ✓   |      ✓       |      ✓       |       |
|                 | HDD PowerOnHours       |   ✓   |      ✓       |      ✓       |       |
|                 | HDD TotalDataWritten   |   ✓   |      ✓       |      ✓       |       |
|                 | HDD Usage              |   ✓   |      ✓       |      ✓       |       |
| Memory Monitor  | Memory Usage           |   ✓   |      ✓       |      ✓       |       |
| Net Monitor     | Network Usage          |   ✓   |      ✓       |      ✓       |       |
|                 | Network CRC Error      |   ✓   |      ✓       |      ✓       | Warning occurs when the number of CRC errors in the period reaches the threshold value. The number of CRC errors that occur is the same as the value that can be confirmed with the ip command. |
| NTP Monitor     | NTP Offset             |   ✓   |      ✓       |      ✓       |       |
| Process Monitor | Tasks Summary          |   ✓   |      ✓       |      ✓       |       |
|                 | High-load Proc[0-9]    |   ✓   |      ✓       |      ✓       |       |
|                 | High-mem Proc[0-9]     |   ✓   |      ✓       |      ✓       |       |
| GPU Monitor     | GPU Temperature        |   ✓   |      ✓       |      -       |       |
|                 | GPU Usage              |   ✓   |      ✓       |      -       |       |
|                 | GPU Memory Usage       |   ✓   |      -       |      -       |       |
|                 | GPU Thermal Throttling |   ✓   |      -       |      -       |       |
|                 | GPU Frequency          |   ✓   |      ✓       |      -       | For Intel platform, monitor whether current GPU clock is supported by the GPU. |
| Node            | Message                      | Intel | arm64(tegra) | arm64(raspi) | Notes |
| --------------- | ---------------------------- | :---: | :----------: | :----------: | ----- |
| CPU Monitor     | CPU Temperature              |   ✓   |      ✓       |      ✓       |       |
|                 | CPU Usage                    |   ✓   |      ✓       |      ✓       |       |
|                 | CPU Load Average             |   ✓   |      ✓       |      ✓       |       |
|                 | CPU Thermal Throttling       |   ✓   |      -       |      ✓       |       |
|                 | CPU Frequency                |   ✓   |      ✓       |      ✓       | Notification of frequency only, normally error not generated. |
| HDD Monitor     | HDD Temperature              |   ✓   |      ✓       |      ✓       |       |
|                 | HDD PowerOnHours             |   ✓   |      ✓       |      ✓       |       |
|                 | HDD TotalDataWritten         |   ✓   |      ✓       |      ✓       |       |
|                 | HDD Usage                    |   ✓   |      ✓       |      ✓       |       |
| Memory Monitor  | Memory Usage                 |   ✓   |      ✓       |      ✓       |       |
| Net Monitor     | Network Usage                |   ✓   |      ✓       |      ✓       |       |
|                 | Network CRC Error            |   ✓   |      ✓       |      ✓       | Warning occurs when the number of CRC errors in the period reaches the threshold value. The number of CRC errors that occur is the same as the value that can be confirmed with the ip command. |
|                 | IP Packet Reassembles Failed |   ✓   |      ✓       |      ✓       |       |
| NTP Monitor     | NTP Offset                   |   ✓   |      ✓       |      ✓       |       |
| Process Monitor | Tasks Summary                |   ✓   |      ✓       |      ✓       |       |
|                 | High-load Proc[0-9]          |   ✓   |      ✓       |      ✓       |       |
|                 | High-mem Proc[0-9]           |   ✓   |      ✓       |      ✓       |       |
| GPU Monitor     | GPU Temperature              |   ✓   |      ✓       |      -       |       |
|                 | GPU Usage                    |   ✓   |      ✓       |      -       |       |
|                 | GPU Memory Usage             |   ✓   |      -       |      -       |       |
|                 | GPU Thermal Throttling       |   ✓   |      -       |      -       |       |
|                 | GPU Frequency                |   ✓   |      ✓       |      -       | For Intel platform, monitor whether current GPU clock is supported by the GPU. |

## ROS parameters

2 changes: 2 additions & 0 deletions system/system_monitor/config/net_monitor.param.yaml
@@ -5,3 +5,5 @@
monitor_program: "greengrass"
crc_error_check_duration: 1
crc_error_count_threshold: 1
reassembles_failed_check_duration: 1
reassembles_failed_check_count: 1
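
The two parameters added here pair a check duration with a count threshold. Below is a minimal sketch of how such a pair could drive a warning. It assumes the monitor samples a cumulative failure counter once per second, which this diff does not state, so the window length in samples equals `reassembles_failed_check_duration`. The class `ReassemblesFailedWindow` and its `update` method are hypothetical names for illustration and are not taken from the commit.

```python
from collections import deque
from typing import Optional, Tuple


class ReassemblesFailedWindow:
    """Hypothetical helper: sums counter increases over the last N samples."""

    def __init__(self, check_duration: int, check_count: int) -> None:
        self.check_count = check_count
        # Assumption: one sample per second, so N samples == N seconds.
        self.window = deque(maxlen=check_duration)
        self.last_total: Optional[int] = None

    def update(self, total: int) -> Tuple[int, bool]:
        """Feed the latest cumulative total; return (failures in window, warn)."""
        if self.last_total is None:
            delta = 0  # the first sample has no baseline to compare against
        else:
            delta = max(total - self.last_total, 0)  # ignore counter resets
        self.last_total = total
        self.window.append(delta)  # the oldest sample drops out automatically
        in_window = sum(self.window)
        return in_window, in_window >= self.check_count
```

With the defaults shown above (`1` and `1`), a single new failure between two consecutive samples would already be enough to raise the warning in this sketch.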
14 changes: 8 additions & 6 deletions system/system_monitor/docs/ros_parameters.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,12 +53,14 @@ mem_monitor:

net_monitor:

| Name | Type | Unit | Default | Notes |
| :------------------------ | :----------: | :-----: | :-----: | :-------------------------------------------------------------------------------------------------------------- |
| devices | list[string] | n/a | none | The name of network interface to monitor. (e.g. eth0, \* for all network interfaces) |
| usage_warn | float | %(1e-2) | 0.95 | Generates warning when network usage reaches a specified value or higher. |
| crc_error_check_duration | int | sec | 1 | CRC error check duration. |
| crc_error_count_threshold | int | n/a | 1 | Generates warning when count of CRC errors during CRC error check duration reaches a specified value or higher. |
| Name | Type | Unit | Default | Notes |
| :-------------------------------- | :----------: | :-----: | :-----: | :--------------------------------------------------------------------------------------------------------------------------------------------------- |
| devices | list[string] | n/a | none | The name of network interface to monitor. (e.g. eth0, \* for all network interfaces) |
| usage_warn | float | %(1e-2) | 0.95 | Generates warning when network usage reaches a specified value or higher. |
| crc_error_check_duration | int | sec | 1 | CRC error check duration. |
| crc_error_count_threshold | int | n/a | 1 | Generates warning when count of CRC errors during CRC error check duration reaches a specified value or higher. |
| reassembles_failed_check_duration | int | sec | 1 | IP packet reassembles failed check duration. |
| reassembles_failed_check_count | int | n/a | 1 | Generates warning when count of IP packet reassembles failed during IP packet reassembles failed check duration reaches a specified value or higher. |

## <u>NTP Monitor</u>

18 changes: 18 additions & 0 deletions system/system_monitor/docs/topics_net_monitor.md
@@ -81,3 +81,21 @@
| Network [0-9]: interface name | wlp82s0 |
| Network [0-9]: total rx_crc_errors | 0 |
| Network [0-9]: rx_crc_errors per unit time | 0 |

## <u>IP Packet Reassembles Failed</u>

/diagnostics/net_monitor: IP Packet Reassembles Failed

<b>[summary]</b>

| level | message |
| ----- | ------------------ |
| OK | OK |
| WARN | reassembles failed |

<b>[values]</b>

| key | value (example) |
| --------------------------------------- | --------------- |
| total packet reassembles failed | 0 |
| packet reassembles failed per unit time | 0 |
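
The commit messages mention reading `/proc/net/snmp`, and the `total packet reassembles failed` value is a cumulative counter. On Linux, that file lists each protocol as a line of column names followed by a line of values, and the `Ip:` block includes a `ReasmFails` column. The sketch below shows one way to extract that counter. It is written in Python for illustration and is not the commit's implementation; the function name `parse_reasm_fails` is hypothetical.

```python
from typing import List, Optional


def parse_reasm_fails(snmp_text: str) -> Optional[int]:
    """Return the Ip ReasmFails counter from /proc/net/snmp text, or None."""
    header: Optional[List[str]] = None
    for line in snmp_text.splitlines():
        if not line.startswith("Ip:"):
            continue  # skip Icmp:, Tcp:, Udp: and other blocks
        fields = line.split()[1:]
        if header is None:
            header = fields  # the first "Ip:" line carries the column names
            continue
        # The second "Ip:" line carries the values in the same column order.
        if "ReasmFails" not in header:
            return None
        index = header.index("ReasmFails")
        if index >= len(fields):
            return None  # value line is shorter than the header line
        return int(fields[index])
    return None
```

On a Linux host the input could come from `open("/proc/net/snmp").read()`. Looking the column up by name rather than by a fixed position keeps the sketch independent of the exact column order.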