Smart not for all devices #4720

Closed
sachaz opened this issue Sep 19, 2018 · 5 comments · Fixed by #5765
Labels
area/smart bug unexpected problem or unintended behavior

sachaz commented Sep 19, 2018

Relevant telegraf.conf:

[[outputs.prometheus_client]]
#   ## Address to listen on
listen = ":9101"

[[inputs.smart]]
devices = [ "/dev/ciss0 -a -d cciss,0", "/dev/ciss0 -a -d cciss,1", "/dev/ciss0 -a -d cciss,2", "/dev/ciss0 -a -d cciss,3", "/dev/ciss0 -a -d cciss,4", "/dev/ciss0 -a -d cciss,5", "/dev/ciss0 -a -d cciss,6", "/dev/ciss0 -a -d cciss,7" ]

System info:

FreeBSD 11.1-RELEASE-p13
telegraf-1.6.3

Steps to reproduce:

Of the 8 devices listed in telegraf.conf, I get results for only 2:

smart_device_exit_status{device="ciss0",enabled="Enabled",host="poseidon.aquilenet.fr",model="",serial_no="",wwn=""} 0
smart_device_exit_status{device="ciss0",enabled="Enabled",host="poseidon.aquilenet.fr",model="Samsung SSD 850 PRO 256GB",serial_no="S251NX0H869352H",wwn="500253884027f72e"} 4
smart_device_exit_status{device="ciss0",enabled="Enabled",host="poseidon.aquilenet.fr",model="Samsung SSD 850 PRO 256GB",serial_no="S251NX0H869353L",wwn="500253884027f72f"} 4
# HELP smart_device_udma_crc_errors Telegraf collected metric
# TYPE smart_device_udma_crc_errors untyped
smart_device_udma_crc_errors{device="ciss0",enabled="Enabled",host="poseidon.aquilenet.fr",model="Samsung SSD 850 PRO 256GB",serial_no="S251NX0H869352H",wwn="500253884027f72e"} 0
smart_device_udma_crc_errors{device="ciss0",enabled="Enabled",host="poseidon.aquilenet.fr",model="Samsung SSD 850 PRO 256GB",serial_no="S251NX0H869353L",wwn="500253884027f72f"} 0

Expected behavior:

SMART results for all disks

Actual behavior:

only 2 disks

Additional info:

On the system itself, querying the other devices directly works, for example:
/usr/local/sbin/smartctl --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/ciss0 -a -d cciss,7
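
To check every configured drive outside Telegraf, the command above can be repeated for each cciss index. The following is a minimal sketch, not part of the original report: the helper names `smart_cmd` and `check_all` are hypothetical, the index range 0 to 7 is taken from the `devices` list in this issue, and the script only prints the commands when smartctl is not found at the assumed path.

```shell
#!/bin/sh
# Assumed smartctl location, taken from the command quoted in this issue.
SMARTCTL="${SMARTCTL:-/usr/local/sbin/smartctl}"

# Hypothetical helper: print the smartctl command line for one cciss index.
smart_cmd() {
    # $1 = drive index behind the controller (0..7 in this report)
    printf '%s --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/ciss0 -a -d cciss,%s\n' "$SMARTCTL" "$1"
}

# Hypothetical helper: run (or, without smartctl, only show) the command per drive.
check_all() {
    i=0
    while [ "$i" -le 7 ]; do
        if [ -x "$SMARTCTL" ]; then
            # Run the real command and report only its exit status.
            $(smart_cmd "$i") >/dev/null 2>&1
            echo "cciss,$i exit_status=$?"
        else
            # smartctl is not available here: show what would be run.
            echo "would run: $(smart_cmd "$i")"
        fi
        i=$((i + 1))
    done
}

check_all
```

smartctl's exit status is a bitmask (see its man page), so a non-zero value does not necessarily mean that no data was returned for that drive.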

glinton added the bug (unexpected problem or unintended behavior) label Sep 20, 2018

@danielnelson
Contributor

@sachaz Can you add the output of your command:

/usr/local/sbin/smartctl --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/ciss0 -a -d cciss,7

sachaz commented Sep 21, 2018

Sure, here it is, and thanks for your answer.

poseidon|17:49|:~# /usr/local/sbin/smartctl --info --health --attributes --tolerance=verypermissive -n standby --format=brief /dev/ciss0 -a -d cciss,7
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p13 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

CHECK POWER MODE: incomplete response, ATA output registers missing
CHECK POWER MODE not implemented, ignoring -n option
=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 850 PRO 256GB
Serial Number:    S251NX0H869353L
LU WWN Device Id: 5 002538 84027f72f
Firmware Version: EXM02B6Q
User Capacity:    256 060 514 304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Sep 21 17:49:16 2018 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 136) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   099   099   010    -    1
  9 Power_On_Hours          -O--CK   094   094   000    -    26732
 12 Power_Cycle_Count       -O--CK   099   099   000    -    51
177 Wear_Leveling_Count     PO--C-   001   001   000    -    7282
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   099   099   010    -    1
181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0
182 Erase_Fail_Count_Total  -O--CK   099   099   010    -    1
183 Runtime_Bad_Block       PO--C-   099   099   010    -    1
187 Uncorrectable_Error_Cnt -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O--CK   081   069   000    -    19
195 ECC_Error_Rate          -O-RC-   200   200   000    -    0
199 CRC_Error_Count         -OSRCK   100   100   000    -    0
235 POR_Recovery_Count      -O--C-   099   099   000    -    50
241 Total_LBAs_Written      -O--CK   099   099   000    -    61956393677
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     26717         -
# 2  Short offline       Completed without error       00%     26693         -
# 3  Short offline       Completed without error       00%     26669         -
# 4  Short offline       Completed without error       00%     26645         -
# 5  Short offline       Completed without error       00%     26621         -
# 6  Short offline       Completed without error       00%     26596         -
# 7  Extended offline    Completed without error       00%     26574         -
# 8  Short offline       Completed without error       00%     26572         -
# 9  Short offline       Completed without error       00%     26548         -
#10  Short offline       Completed without error       00%     26524         -
#11  Short offline       Completed without error       00%     26500         -
#12  Short offline       Completed without error       00%     26476         -
#13  Short offline       Completed without error       00%     26452         -
#14  Short offline       Completed without error       00%     26428         -
#15  Extended offline    Completed without error       00%     26406         -
#16  Short offline       Completed without error       00%     26404         -
#17  Short offline       Completed without error       00%     26380         -
#18  Short offline       Completed without error       00%     26356         -
#19  Short offline       Completed without error       00%     26332         -
#20  Short offline       Completed without error       00%     26308         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

@ilkermanap

I have telegraf-1.7.4-1.x86_64 installed on CentOS Linux release 7.5.1804 (Core).
The disks are behind a Hewlett-Packard P420i controller.
They can be reached with /dev/sg0 -d cciss,0 ... /dev/sg0 -d cciss,23.
In telegraf.conf, I configured only one disk for testing, "/dev/sg0 -d cciss,0", with:

devices = [ "/dev/sg0 -d cciss,0"] 
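
For a controller with many drives, the full `devices` line can be generated instead of typed by hand. This is a minimal sketch under stated assumptions: `gen_devices` is a hypothetical helper, it only covers the `-d cciss,N` addressing form used in this thread, and it says nothing about whether Telegraf will report every entry.

```shell
#!/bin/sh
# Hypothetical helper: print a telegraf.conf "devices" line for N drives
# behind one controller, using the "-d cciss,N" form seen in this thread.
gen_devices() {
    # $1 = controller device node, $2 = number of drives
    dev=$1
    count=$2
    i=0
    sep=""
    printf 'devices = ['
    while [ "$i" -lt "$count" ]; do
        printf '%s "%s -d cciss,%s"' "$sep" "$dev" "$i"
        sep=","
        i=$((i + 1))
    done
    printf ' ]\n'
}

# Example: three drives behind /dev/sg0.
gen_devices /dev/sg0 3
```

Calling `gen_devices /dev/sg0 24` would cover the cciss,0 to cciss,23 range described above.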

And here is the test output for the smart plugin:

# telegraf  --test| grep smart
2018/09/25 11:33:00 I! Using config file: /etc/telegraf/telegraf.conf
> smart_device,capacity=300000000000,device=sg0,enabled=Enabled,host=xxx.xxx.xxxx exit_status=0i 1537867981000000000
#

sachaz commented Oct 21, 2018

devices = [ "/dev/sg0 -d cciss,0"]

As I said, it works for one or two disks, but not for more :(

douginoz commented Jan 13, 2021

I don't think the solution matches the problem.
I have the same problem, which is this:
If the [[inputs.smart]] section of my telegraf.conf has the following, it works:
devices = [ "/dev/sg3 -d areca,12/2"]

for which telegraf --test | grep smart shows:

2021-01-13T07:47:49Z I! Using config file: /etc/telegraf/telegraf.conf
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=POSR--,host=sophie,id=1,model=ST10000NM0086-2AA101,name=Raw_Read_Error_Rate,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=134768i,threshold=44i,value=100i,worst=64i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=PO----,host=sophie,id=3,model=ST10000NM0086-2AA101,name=Spin_Up_Time,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=0i,threshold=0i,value=93i,worst=92i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--CK,host=sophie,id=4,model=ST10000NM0086-2AA101,name=Start_Stop_Count,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=51i,threshold=20i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=PO--CK,host=sophie,id=5,model=ST10000NM0086-2AA101,name=Reallocated_Sector_Ct,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=48i,threshold=10i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=POSR--,host=sophie,id=7,model=ST10000NM0086-2AA101,name=Seek_Error_Rate,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=426905951i,threshold=45i,value=86i,worst=60i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--CK,host=sophie,id=9,model=ST10000NM0086-2AA101,name=Power_On_Hours,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=13566i,threshold=0i,value=85i,worst=85i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=PO--C-,host=sophie,id=10,model=ST10000NM0086-2AA101,name=Spin_Retry_Count,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=0i,threshold=97i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--CK,host=sophie,id=12,model=ST10000NM0086-2AA101,name=Power_Cycle_Count,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=53i,threshold=20i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--CK,host=sophie,id=184,model=ST10000NM0086-2AA101,name=End-to-End_Error,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=0i,threshold=99i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--CK,host=sophie,id=187,model=ST10000NM0086-2AA101,name=Reported_Uncorrect,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=0i,threshold=0i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--CK,host=sophie,id=188,model=ST10000NM0086-2AA101,name=Command_Timeout,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=0i,threshold=0i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O-RCK,host=sophie,id=189,model=ST10000NM0086-2AA101,name=High_Fly_Writes,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=68i,threshold=0i,value=32i,worst=32i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=Past,flags=-O---K,host=sophie,id=190,model=ST10000NM0086-2AA101,name=Airflow_Temperature_Cel,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=35i,threshold=40i,value=65i,worst=39i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--CK,host=sophie,id=191,model=ST10000NM0086-2AA101,name=G-Sense_Error_Rate,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=11417i,threshold=0i,value=95i,worst=95i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--CK,host=sophie,id=192,model=ST10000NM0086-2AA101,name=Power-Off_Retract_Count,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=555i,threshold=0i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--CK,host=sophie,id=193,model=ST10000NM0086-2AA101,name=Load_Cycle_Count,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=605i,threshold=0i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O---K,host=sophie,id=194,model=ST10000NM0086-2AA101,name=Temperature_Celsius,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=35i,threshold=0i,value=35i,worst=61i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O-RC-,host=sophie,id=195,model=ST10000NM0086-2AA101,name=Hardware_ECC_Recovered,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=134768i,threshold=0i,value=100i,worst=64i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--C-,host=sophie,id=197,model=ST10000NM0086-2AA101,name=Current_Pending_Sector,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=0i,threshold=0i,value=100i,worst=100i 1610524070000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=----C-,host=sophie,id=198,model=ST10000NM0086-2AA101,name=Offline_Uncorrectable,serial_no=ZA29PM9W,wwn=5000c500b375b078 exit_status=32i,raw_value=0i,threshold=0i,value=100i,worst=100i 1610524070000000000

etc. etc. (The actual output is about 25 lines long).

But since /dev/sg3 is a hardware RAID controller, there are many hard drives behind it, which is why you have to specify the additional parameters ("-d areca,12/2").

So I need to have multiple entries, one for each drive in the array:

devices = [ "/dev/sg3 -d areca,1/2", "/dev/sg3 -d areca,2/2", "/dev/sg3 -d areca,3/2", "/dev/sg3 -d areca,4/2"] etc etc

But as soon as I add the second one, it fails:

root@sophie:/etc/telegraf# telegraf  --test| grep smart
2021-01-13T07:57:13Z I! Starting Telegraf 1.17.0
2021-01-13T07:57:13Z I! Using config file: /etc/telegraf/telegraf.conf
> smart_device,device=sg3,host=sophie exit_status=2i 1610524634000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=PO-R--,host=sophie,id=1,model=HGST\ HDN721010ALE604,name=Raw_Read_Error_Rate,serial_no=1SJS3J5Z,wwn=5000cca26be6b0c3 exit_status=0i,raw_value=0i,threshold=16i,value=100i,worst=100i 1610524634000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=--S---,host=sophie,id=2,model=HGST\ HDN721010ALE604,name=Throughput_Performance,serial_no=1SJS3J5Z,wwn=5000cca26be6b0c3 exit_status=0i,raw_value=92i,threshold=54i,value=135i,worst=135i 1610524634000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=POS---,host=sophie,id=3,model=HGST\ HDN721010ALE604,name=Spin_Up_Time,serial_no=1SJS3J5Z,wwn=5000cca26be6b0c3 exit_status=0i,raw_value=406i,threshold=24i,value=175i,worst=175i 1610524634000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O--C-,host=sophie,id=4,model=HGST\ HDN721010ALE604,name=Start_Stop_Count,serial_no=1SJS3J5Z,wwn=5000cca26be6b0c3 exit_status=0i,raw_value=99i,threshold=0i,value=100i,worst=100i 1610524634000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=PO--CK,host=sophie,id=5,model=HGST\ HDN721010ALE604,name=Reallocated_Sector_Ct,serial_no=1SJS3J5Z,wwn=5000cca26be6b0c3 exit_status=0i,raw_value=0i,threshold=5i,value=100i,worst=100i 1610524634000000000
> smart_attribute,capacity=10000831348736,device=sg3,enabled=Enabled,fail=-,flags=-O-R--,host=sophie,id=7,model=HGST\ HDN721010ALE604,name=Seek_Error_Rate,serial_no=1SJS3J5Z,wwn=5000cca26be6b0c3 exit_status=0i,raw_value=0i,threshold=67i,value=100i,worst=100i 1610524634000000000

I can add the other, non-array drives all I want, for example this works:
devices = [ "/dev/sg3 -d areca,1/2", "/dev/sdb", "/dev/sdc", "/dev/sdd"]

I thought the problem was that it doesn't accept duplicate devices. For example, I would have thought this would fail:

devices = [ "/dev/sda", "/dev/sda"]

but it works. The only difference is that my configuration requires the '-d' parameter:
devices = [ "/dev/sg3 -d areca,1/2", "/dev/sg3 -d areca,2/2"]

which returns the 'smart_device,device=sg3,host=sophie exit_status=2i 1610524882000000000' error and only lists the first drive within the array.
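
The outputs in this thread show the same `device` tag (sg3, sg0, ciss0) for every entry that shares a device node, whatever follows `-d`. The sketch below only illustrates that observation: `device_label` is a hypothetical helper, it is not Telegraf's code, and the thread does not establish that this is what causes the failure.

```shell
#!/bin/sh
# Illustration only: derive a "device" label as the outputs above suggest
# (basename of the device path, ignoring the -d argument).
device_label() {
    # $1 = one entry from the devices list, e.g. "/dev/sg3 -d areca,1/2"
    path=${1%% *}    # keep the text before the first space
    basename "$path"
}

for entry in "/dev/sg3 -d areca,1/2" "/dev/sg3 -d areca,2/2" "/dev/sdb"; do
    printf '%-24s -> device=%s\n' "$entry" "$(device_label "$entry")"
done
```

Both areca entries map to device=sg3, so under this assumption only tags such as model, serial_no and wwn can tell the drives apart in the collected metrics.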

For reference, here is the output of smartctl using the correct values:

root@sophie:/etc/telegraf# smartctl --all --device=areca,1/2 /dev/sg3  
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-58-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     HGST HDN721010ALE604
Serial Number:    1SJS3J5Z
LU WWN Device Id: 5 000cca 26be6b0c3
Firmware Version: 83XN
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jan 13 00:05:10 2021 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (   93) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1024) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   135   135   054    Old_age   Offline      -       92
  3 Spin_Up_Time            0x0007   175   175   024    Pre-fail  Always       -       406 (Average 344)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       99
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       16231
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       99
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       823
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       823
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 17/59)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

And if I change the --device parameter value to the next drive, it correctly retrieves the 2nd drive's values:

root@sophie:/etc/telegraf# smartctl --all --device=areca,2/2 /dev/sg3 
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-58-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     HGST HDN721010ALE604
Serial Number:    1SJJ2SSZ
LU WWN Device Id: 5 000cca 26be37f54
Firmware Version: 83XN
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jan 13 00:07:06 2021 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (   93) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1123) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   134   134   054    Old_age   Offline      -       96
  3 Spin_Up_Time            0x0007   174   174   024    Pre-fail  Always       -       410 (Average 348)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       99
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       16231
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       99
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       822
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       822
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       38 (Min/Max 17/62)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

So smartctl is definitely able to retrieve each individual drive within the array correctly.
