From f98e86258d498e5173d51da6be10d952657362c4 Mon Sep 17 00:00:00 2001
From: Andriy Moroz
Date: Fri, 3 May 2019 14:32:20 +0300
Subject: [PATCH 1/2] Add SSD Health feature design

Signed-off-by: Andriy Moroz
---
 doc/ssdhealth_design.md | 155 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)
 create mode 100644 doc/ssdhealth_design.md

diff --git a/doc/ssdhealth_design.md b/doc/ssdhealth_design.md
new file mode 100644
index 0000000000..70475e091f
--- /dev/null
+++ b/doc/ssdhealth_design.md
@@ -0,0 +1,155 @@
+## Motivation
+Add to SONiC the ability to check storage health. The basic functionality will be implemented as a CLI command. Optionally, a pmon daemon could be added for continuous disk state monitoring.
+
+## CLI
+
+### Syntax
+    show platform ssdhealth [verbose/vendor]
+
+### Output example
+#### Brief
+    admin@sonic-switch: ~$ show platform ssdhealth
+    Device Model : InnoDisk Corp. - mSATA 3ME
+    Health: 72.9%
+    Temperature: N/A
+    admin@sonic-switch: ~$
+
+#### Verbose
+    admin@sonic-switch: ~$ show platform ssdhealth verbose
+    Device Model     : InnoDisk Corp. - mSATA 3ME
+    FW Version       : S140714
+    Serial Number    : 20160429AA1134000035
+    Health           : 72.9%
+    Capacity         : 29.818199 GB
+    Temperature      : N/A
+    Power On Hours   : 1576 hours
+    Power Cycle count: 130
+    Something else???
+
+#### Vendor
+    admin@sonic-switch: ~$ show platform ssdhealth vendor
+
+    ********************************************************************************************
+    *                           Innodisk iSMART V3.9.41                       2018/05/25       *
+    ********************************************************************************************
+    Model Name: InnoDisk Corp. - mSATA 3ME
+    FW Version: S140714
+    Serial Number: 20160429AA1134000035
+    Health: 72.900%
+    Capacity: 29.818199 GB
+    P/E Cycle: 3000
+    Lifespan : 1576 (Years : 4 Months : 3 Days : 26)
+    Write Protect: Disable
+    InnoRobust: Enable
+    --------------------------------------------------------------------------------------------
+    ID    SMART Attributes                  Value       Raw Value
+    --------------------------------------------------------------------------------------------
+    [09]  Power On Hours                    [18304]     [090200646480470000000000]
+    [0C]  Power Cycle Count                 [  130]     [0C0200646482000000000000]
+    [AA]  Total Bad Block Count             [   15]     [AA0300646400000F00000000]
+    [AD]  Erase Count Max.                  [  883]     [AD020064642D037303000000]
+    [AD]  Erase Count Avg.                  [  813]     [AD020064642D037303000000]
+    [C2]  Temperature                       [    0]     [000000000000000000000000]
+    [EB]  Later Bad Block                   [    0]     [EB0200640000000000000000]
+    [EB]  Read Block                        [    0]     [EB0200640000000000000000]
+    [EB]  Write Block                       [    0]     [EB0200640000000000000000]
+    [EB]  Erase Block                       [    0]     [EB0200640000000000000000]
+    [EC]  Unstable Power Count              [    0]     [EC0200646400000000000000]
+    admin@sonic-switch: ~$
+
+## Implementation
+### Generic part
+#### 'show' utility update
+A new item is added under the `platform` menu in `show/main.py`.
+It will execute `ssdhealth -d /dev/sdX [options]`.
+
+#### ssdhealth utility
+New utility in `sonic-utilities/scripts/`
+It will import device plugin `ssdutil.py` and print the output returned by different API functions
+
+**Syntax:**
+
+    root@mts-sonic-dut:/home/admin# ssdhealth -h
+    usage: ssdhealth -d DEVICE [-h] [-v] [-e]
+
+    Show disk device health status
+
+    optional arguments:
+      -h, --help     show this help message and exit
+      -d, --device   disk device to get information for
+      -v, --verbose  show verbose output (more parameters)
+      -e, --vEndor   show vendor specific disk information
+
+    Examples:
+      ssdhealth -d /dev/sda
+      ssdhealth -d /dev/sda -v
+      ssdhealth -d /dev/sda -e
+
+
+#### Plugins design
+##### Class SsdBase
+Location: `sonic-buildimage/src/sonic-platform-common/build/lib/sonic_psu/ssd_base.py`
+Generic implementation of the API. Will use specific utilities for known disks or the `smartctl` utility for others. Since not all disk models are in smartctl's database, some information can be unavailable or incomplete.
+
+    class SsdBase:
+        ...
+
+##### Class SsdUtil
+Inherits from SsdBase. Vendors can implement it to provide detailed information about the installed disk.
+Location: `sonic-buildimage/device/{{vendor}}/platform/plugins/ssdutil.py`
+
+    class SsdUtil(SsdBase):
+        ...
+
+#### API
+* **get\_disk\_health(diskdev)**
+    * Accepts:
+        * diskdev:string - disk device name (e.g. /dev/sda)
+    * Returns:
+        * res:float - Floating point value in the range 0-100 representing disk health as a percentage. -1 if not available
+* **get\_temperature(diskdev)**
+    * Accepts:
+        * diskdev:string - disk device name (e.g. /dev/sda)
+    * Returns:
+        * res:string - Integer (floating point?) disk temperature in degrees Celsius. Zero if not available
+* **get\_model(diskdev)**
+    * Accepts:
+        * diskdev:string - disk device name (e.g. /dev/sda)
+    * Returns:
+        * res:string - Human readable string holding disk model. Empty if not available
+* **get\_firmware(diskdev)**
+    * Accepts:
+        * diskdev:string - disk device name (e.g. /dev/sda)
+    * Returns:
+        * res:string - Human readable string holding disk firmware version. Empty if not available
+* **get\_serial(diskdev)**
+    * Accepts:
+        * diskdev:string - disk device name (e.g. /dev/sda)
+    * Returns:
+        * res:string - Human readable string holding disk serial number. Empty if not available
+* **get\_vendor\_output(diskdev)**
+    * Accepts:
+        * diskdev:string - disk device name (e.g. /dev/sda)
+    * Returns:
+        * res:string - Human readable string. Output of vendor application. Empty if not available
+
+## Utilities and packages
+#### smartctl
+Part of smartmontools package (1.9M)
+PR: [https://github.com/Azure/sonic-buildimage/pull/2703](https://github.com/Azure/sonic-buildimage/pull/2703)
+
+#### iSmart
+Utility for InnoDisk Corp. SSDs (<120K)
+https://www.innodisk.com/en/iService/utility/iSMART
+Needs to be added as a binary.
+
+#### SmartCmd
+Utility for StorFly and Virtium (2.2M)
+
+## (Optional) Daemon for monitoring
+A daemon in pmon (ssdmond) which will periodically query disk health (get\_disk\_health()) and raise an alarm when the value decreases to a critical level.
+
+## Open questions
+1. Daemon and monitoring?
+2. SNMP needed?
+
From bd04b64bf17ba388e5043c66d4f924174f4c2d8d Mon Sep 17 00:00:00 2001
From: Andriy Moroz
Date: Mon, 8 Jul 2019 12:51:40 +0000
Subject: [PATCH 2/2] Fix base class path

Signed-off-by: Andriy Moroz
---
 doc/ssdhealth_design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/ssdhealth_design.md b/doc/ssdhealth_design.md
index 70475e091f..886ba826db 100644
--- a/doc/ssdhealth_design.md
+++ b/doc/ssdhealth_design.md
@@ -88,7 +88,7 @@ It will import device plugin `ssdutil.py` and print the output returned by diffe
 
 #### Plugins design
 ##### Class SsdBase
-Location: `sonic-buildimage/src/sonic-platform-common/build/lib/sonic_psu/ssd_base.py`
+Location: `sonic-buildimage/src/sonic-platform-common/sonic_platform_base/sonic_ssd/ssd_base.py`
 Generic implementation of the API. Will use specific utilities for known disks or the `smartctl` utility for others. Since not all disk models are in smartctl's database, some information can be unavailable or incomplete.
 
     class SsdBase:
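
Note on the plugin API: the design document leaves the bodies of `SsdBase` and `SsdUtil` elided, so the following is only a minimal sketch of how the six API functions and the brief report could fit together. The method names and the "not available" return values (-1, zero, empty string) come from the API section; everything else is an assumption made for illustration, including the canned values in `SsdUtil`, the helper `format_brief`, and the choice to print `N/A` for unavailable fields. It does not call `smartctl` or any vendor utility.

```python
class SsdBase:
    """Hypothetical generic base: every getter reports 'not available'."""

    def get_disk_health(self, diskdev):
        return -1.0  # -1 means health is not available

    def get_temperature(self, diskdev):
        return 0  # zero means temperature is not available

    def get_model(self, diskdev):
        return ""  # empty string means not available

    def get_firmware(self, diskdev):
        return ""

    def get_serial(self, diskdev):
        return ""

    def get_vendor_output(self, diskdev):
        return ""


class SsdUtil(SsdBase):
    """Hypothetical vendor plugin; values are canned from the brief example."""

    def get_disk_health(self, diskdev):
        return 72.9

    def get_model(self, diskdev):
        return "InnoDisk Corp. - mSATA 3ME"


def format_brief(ssd, diskdev):
    """Render a brief report, printing N/A for unavailable values."""
    health = ssd.get_disk_health(diskdev)
    temperature = ssd.get_temperature(diskdev)
    model = ssd.get_model(diskdev)
    lines = [
        "Device Model : {}".format(model or "N/A"),
        "Health: {}".format("{:.1f}%".format(health) if health >= 0 else "N/A"),
        "Temperature: {}".format("{}C".format(temperature) if temperature else "N/A"),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_brief(SsdUtil(), "/dev/sda"))
```

Keeping the "not available" defaults in the base class means a vendor plugin only has to override the getters it can actually answer, which matches the document's remark that some information can be unavailable or incomplete.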