df stats #2

thorhs · 2020-12-01T21:52:02Z

There is only information about volume groups, not about filesystem capacity:

# TYPE aix_disk_free gauge

aix_disk_free{disk="hdisk1",vgname="ngamsoft",machine_serial="21475DV",lpar="GAMAY",group_id="32772"} 49664
aix_disk_free{disk="hdisk0",vgname="rootvg",machine_serial="21475DV",lpar="GAMAY",group_id="32772"} 61824

Is it possible to add "df" information, or something similar, to track disk space usage?

Regards
Frederic

Originally posted by @fredtriplefred in #1 (comment)

thorhs · 2020-12-01T21:53:37Z

@fredtriplefred I'm sure it wouldn't be too difficult. We have traditional Nagios monitoring which has covered that aspect for us, so it never came up. This is definitely something that should be included. I'll try to dig into that soon.

thorhs · 2020-12-02T00:19:41Z

@fredtriplefred I have released v1.8.0 with support for node_filesystem_X metrics. Unfortunately we are in a change freeze at work so I can't test it out too much, but what I have tested seems to be fine. I would appreciate it if you could try to deploy this new version and see if all the numbers add up.
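
For anyone checking whether "the numbers add up" against `df`, here is a minimal Python sketch that computes percent used per mount point from a scraped metrics page. It assumes the exporter follows the upstream node_exporter naming (`node_filesystem_size_bytes`, `node_filesystem_avail_bytes`) and exposes a `mountpoint` label; the thread only says "node_filesystem_X", so treat those names as assumptions and adjust them to what your `/metrics` page shows.

```python
import re

# One labelled sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each labelled sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m:
            labels = dict(LABEL_RE.findall(m["labels"]))
            yield m["name"], labels, float(m["value"])


def used_percent(text,
                 size_metric="node_filesystem_size_bytes",    # assumed name
                 avail_metric="node_filesystem_avail_bytes"):  # assumed name
    """Return {mountpoint: percent_used} for filesystems that report both metrics."""
    size, avail = {}, {}
    for name, labels, value in parse_samples(text):
        mountpoint = labels.get("mountpoint")  # assumed label name
        if mountpoint is None:
            continue
        if name == size_metric:
            size[mountpoint] = value
        elif name == avail_metric:
            avail[mountpoint] = value
    return {
        mp: 100.0 * (1.0 - avail[mp] / size[mp])
        for mp in size
        if mp in avail and size[mp] > 0
    }
```

The resulting percentages can be compared with the `%Used` column of `df` on the LPAR; small differences are expected if reserved blocks are counted differently.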

fredtriplefred · 2020-12-02T17:39:15Z

Thanks Thors it works perfectly 👍

I have integrated it into Grafana and our alerting system.
Regarding the other metrics (CPU, disk, memory, partition, ...), have you already created specific queries for Prometheus or Grafana that summarize activity? There are a lot of metrics, and I'm not enough of a specialist to pick out the most important ones.

Another request if possible.
Would you have a version compiled for AIX 6? I get this message on our machines running that version:
exec(): 0509-036 Cannot load program ./node_exporter_aix because of the following errors:
0509-130 Symbol resolution failed for node_exporter_aix because:
0509-136 Symbol ___strcmp (number 1) is not exported from
dependent module /usr/lib/libc.a(shr.o).
0509-136 Symbol __get_lc_charmap_ptr (number 20) is not exported from
dependent module /usr/lib/libc.a(shr.o).
0509-136 Symbol cur_locale (number 113) is not exported from
dependent module /usr/lib/libc.a(shr.o).
0509-192 Examine .loader section symbols with the
'dump -Tv' command.

Thanks and regards
Frederic

thorhs · 2020-12-02T19:54:56Z

@fredtriplefred No, sorry, I don’t have access to anything below 7.1. If you install the packages in the readme you should be able to build it yourself. It would actually be an interesting experiment to see if there are any issues.

I have some graphs I could share soon. Take the info with a grain of salt, I may have misinterpreted some of the metrics. The cpu stats are especially iffy. But, they do reflect the trends :)

thorhs · 2020-12-02T23:54:46Z

@fredtriplefred I uploaded the dashboard that I use most frequently. It just went through some changes so I hope all the calculations match up. Give it a whirl and create issues if you find any. This was exported from grafana v6.5.1.

fredtriplefred · 2020-12-04T10:53:29Z

Hi thors !
Sorry for the delayed feedback, we had an outage on our Exalogic systems...
The dashboard seems to reflect reality correctly compared with another monitoring tool I have (lpar2rrd).
Just one issue with the exporter on an AIX 7.1 system:

http://saveprod:9100/metrics	DOWN	instance="saveprod:9100" job="node_exporter_aix"	50.206s ago	20.28ms	invalid UTF-8 label value
This is probably not related to the exporter itself, but if you have an idea ;)
Thanks for your work in any case
Regards
Frederic

fredtriplefred · 2020-12-04T14:03:56Z

It seems the serial number for this AIX system is badly formatted and is the source of the error:

# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1{machine_serial="/?? ^B#0 ^DZp",lpar="saveprod",group_id="32773"} 2.99446

# HELP node_load5 5m load average.
# TYPE node_load5 gauge
node_load5{machine_serial="/?? ^B#0 ^DZp",lpar="saveprod",group_id="32773"} 4.5938

Which command is used to extract it?

Regards
Frederic
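
Prometheus rejects a whole scrape with "invalid UTF-8 label value" when any label value contains bytes that do not decode as UTF-8. A minimal Python sketch for locating the offending labels in a saved copy of the `/metrics` output follows; it works on raw bytes on purpose, and the function name is hypothetical, not part of the exporter.

```python
import re

# label="value" pairs in a bytes payload; the value may contain escaped quotes
LABEL_RE = re.compile(rb'(\w+)="((?:[^"\\]|\\.)*)"')


def invalid_utf8_labels(payload: bytes):
    """Return (line_number, label_name, raw_value) for label values
    that cannot be decoded as UTF-8."""
    bad = []
    for lineno, line in enumerate(payload.split(b"\n"), start=1):
        if line.startswith(b"#"):
            continue  # HELP/TYPE comments carry no labels
        for name, value in LABEL_RE.findall(line):
            try:
                value.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((lineno, name.decode("ascii"), value))
    return bad
```

Saving the page with something like `curl -s http://saveprod:9100/metrics -o metrics.bin` and passing the file contents (opened in binary mode) to `invalid_utf8_labels` should list every sample whose `machine_serial` is corrupted.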

thorhs · 2020-12-04T15:59:27Z

This is coming from the libperfstat library. I've had issues if the system tools are not being used, for example if /opt/freeware/bin is ahead of /usr/bin. See if you can run with bog-standard PATH and LIBPATH.

fredtriplefred · 2020-12-04T17:16:34Z

Yes, it is surely related to the environment, but it works (with just an error on diskadapter) if executed manually rather than as a service:
saveprod.root / => /usr/local/bin/node_exporter_aix
Node exporter for AIX version 1.8.0.0 listening on port 9100
Error calling perfstat_diskadapter: Invalid argument

So maybe the problem lies in the environment around the service?
In the meantime, I use nohup instead of the service.

Good week-end
Frederic

thorhs · 2020-12-06T23:49:16Z

@fredtriplefred Could you give v1.10.0 a go? I'm trying to set PATH and LIBPATH to some sane values on startup to see if that helps. It works correctly if I try to start it up using a PATH string that had issues previously, so hopefully it just works now.
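
Until that is confirmed, the same idea can be tried from outside the exporter with a small wrapper. The Python sketch below starts the binary with a system-only PATH and no LIBPATH; the directory list is an assumption about a stock AIX layout, and the wrapper itself is hypothetical, not something shipped with the project.

```python
import os
import subprocess

# Assumption: system-only directories on a stock AIX install
SYSTEM_PATH = "/usr/bin:/etc:/usr/sbin:/usr/ucb:/sbin"


def clean_env(base=None):
    """Copy the environment, force a system-only PATH and drop LIBPATH."""
    env = dict(os.environ if base is None else base)
    env["PATH"] = SYSTEM_PATH
    env.pop("LIBPATH", None)  # avoid picking up non-system libraries
    return env


def start_exporter(binary="/usr/local/bin/node_exporter_aix"):
    """Launch the exporter with the sanitized environment."""
    return subprocess.Popen([binary], env=clean_env())
```

If the serial number comes out clean when started this way but not under SRC, that points at the service environment rather than at libperfstat itself.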

fredtriplefred · 2020-12-07T07:50:12Z

Hello Thorhs
Sorry, I still get these bad characters in the machine_serial label in the metrics, only for this LPAR:
machine_serial="/ท่ �$ภ �Zp"
Regards
Frederic

thorhs · 2020-12-07T09:27:32Z

@fredtriplefred Hmmm... that is odd.

Unfortunately I don't have any control over libperfstat and how it finds the machine serial number. From running a trace on the process, it seems that this command is being executed by the libperfstat library to get the machine serial number:

lscfg -vpl sysplanar0 2>/dev/null|grep -p "System VPD:" |grep "Machine/Cabinet"

I have set the path to system-only directories and emptied the LIBPATH, so there should be no outside influence. What I find most peculiar is that the command works on the command line, but fails in SRC.

If you run the above command, what is the output? On my end, I get:

[REIKNISTOFA\rb747@rba-nim-dev node_exporter_aix]$ lscfg -vpl sysplanar0 2>/dev/null|/usr/bin/grep -p "System VPD:" |grep "Machine/Cabinet"
        Machine/Cabinet Serial No...XXXXXXXX

What version of AIX is this LPAR running?

I could add a flag to manually set the machine_serial, if that would be an acceptable solution, or even read it from a file in /etc/sysconfig.
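
To make the manual override concrete, here is a minimal Python sketch of how the serial could be extracted from the `lscfg` output format shown above and sanity-checked before being written to a file. The "letters and digits only" rule is an assumption about what a valid serial looks like, and both function names are hypothetical.

```python
import re

# Matches e.g. "Machine/Cabinet Serial No...XXXXXXXX"
SERIAL_RE = re.compile(r"Machine/Cabinet Serial No\.+\s*(\S+)")


def parse_serial(lscfg_output):
    """Return the serial found in lscfg output, or None if absent."""
    m = SERIAL_RE.search(lscfg_output)
    return m.group(1) if m else None


def is_plausible_serial(value):
    """Assumed rule: a usable serial is non-empty and purely alphanumeric."""
    return value is not None and re.fullmatch(r"[A-Za-z0-9]+", value) is not None
```

A startup script could run the `lscfg` pipeline, pass its output through `parse_serial`, and only store the result when `is_plausible_serial` accepts it.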

fredtriplefred · 2020-12-07T14:26:22Z

Yes, same result as yours; it works on the command line:
saveprod.root / => lscfg -vpl sysplanar0 2>/dev/null|grep -p "System VPD:" |grep "Machine/Cabinet"
Machine/Cabinet Serial No...785A0A0
It is not really a problem in the end, since it works with nohup in the background, and I reboot the partition only when it is mandatory (the 6 => 7 update last time):
saveprod.root / => uptime
03:18PM up 453 days, 3:58, 5 users, load average: 1.63, 3.39, 4.10

What does cpu_pool_id refer to?
Is it the ID of the Shared Processor Pool used by the LPAR?

Regards
Frederic

thorhs · 2020-12-08T11:44:47Z

Ok. The cpupool_id is what is returned from perfstat_partition_config. I have not fully investigated whether this is the shared processor pool the LPAR is in. If you are not using shared processor pools, this is probably 0 for all LPARs. I'm hoping I can use this to graph the total CPU used per pool, as well as the free capacity.

thorhs self-assigned this Dec 1, 2020

thorhs mentioned this issue Dec 2, 2020

Add support for filesystems (free space, inodes, etc) #3

Merged

thorhs closed this as completed in #3 Dec 2, 2020

yashnagar mentioned this issue Jun 9, 2022

Segmentation fault while using node_exporter 1.14.3.0 or 1.12.1.0 #16

Open
