Sexigraf stopped running pull scripts #396

Open
Redicious opened this issue May 14, 2024 · 12 comments

Comments

@Redicious

Hey,

my SexiGraf instance stopped pulling data on May 2nd at 4:00 CEST from all VBR servers and unmanaged ESXi hosts (there is nothing else in the inventory). (I've been on vacation, and SexiGraf is not in use yet, still in trial - that's why I'm late to the party.)

So Grafana only shows its own metrics.

In /var/log/sexigraf/ there are no new logs for VbrPullStatistics etc.

The only logs that still get updated are:

/var/log/sexigraf/carbon/carbon-cache-a/console.log 145964 5/14/2024 1:21:30 PM
/var/log/sexigraf/carbon/carbon-cache-b/console.log 145964 5/14/2024 1:21:30 PM
/var/log/sexigraf/carbon/carbon-cache-b/query.log 18180 5/14/2024 1:19:39 PM
/var/log/sexigraf/graphite/info.log 1733424 5/14/2024 1:17:07 PM
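
A listing like the one above can also be produced by asking `find` for files modified within a recent window. This is only a sketch: the function name `recent_logs` and the 60-minute default are assumptions made for illustration and are not part of SexiGraf.

```bash
#!/bin/bash
# recent_logs DIR [MINUTES]
# Print files under DIR that were modified within the last MINUTES minutes
# (default 60). Hypothetical helper for illustration; not shipped with SexiGraf.
recent_logs() {
    local dir="$1"
    local minutes="${2:-60}"
    # -mmin -N matches files modified less than N minutes ago
    find "$dir" -type f -mmin "-$minutes" | sort
}

# Example: recent_logs /var/log/sexigraf 60
```

Anything missing from that output, such as the pull script transcripts, has not been written to within the chosen window.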

The log patterns also don't change around the time it stopped pulling data, so there is no error message hinting at what went wrong.

I also checked the standard stuff... There is enough disk space, inodes are left, RAM is good, CPU is good, etc.

If I run ViPullStatistics.ps1 manually it works, and I get a set of data points for the ESXi it is run for.

Since ViPullStatistics.ps1 basically works, and there are no transcripts in /var/log/sexigraf: what invokes it? Where should I look next?
There is nothing in crontab...

Cheers
Red

@rschitz
Member

rschitz commented May 14, 2024

Hi and thank you for your feedback.
Losing both VI and VBR is strange. Do you still have something in /etc/cron.d/?
And is collection still enabled for your entries in the credential stores?
image

@Redicious
Author

Redicious commented May 14, 2024

Thx for your response!
Files for vi and vbr are present in /etc/cron.d.
And they are enabled in the credential store - yesterday I disabled and re-enabled a few, which is reflected in the creation dates of the files in /etc/cron.d. Good to know how this works then.
I totally forgot about cron.d - and relied on crontab -l
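
For reference, `crontab -l` only shows the per-user crontab; jobs dropped into /etc/cron.d never appear there. A minimal sketch for listing the active entries of such a directory follows. The function name `list_cron_d` is made up for this example.

```bash
#!/bin/bash
# list_cron_d [DIR]
# Print every active (non-blank, non-comment) line from the files in a
# cron.d-style directory, prefixed with the name of the file it came from.
# Hypothetical helper for illustration; DIR defaults to /etc/cron.d.
list_cron_d() {
    local dir="${1:-/etc/cron.d}"
    local file line
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        # Drop blank lines and comments, then tag each remaining line
        grep -Ev '^[[:space:]]*(#|$)' "$file" | while IFS= read -r line; do
            printf '%s: %s\n' "${file##*/}" "$line"
        done
    done
}

# Example: list_cron_d /etc/cron.d
```

Checking both `crontab -l` and this directory gives the full picture of what cron is expected to run.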

However I can see the crons in syslog, like this one:

May 14 14:30:01 sexigraf CRON[219558]: (root) CMD ( /usr/bin/pwsh -NonInteractive -NoProfile -f /opt/sexigraf/ViPullStatistics.ps1 -credstore /mnt/wfs/inventory/vipscredentials.xml -server -sessionfile /tmp/vmw_.key >/dev/null 2>&1)

So I took the command and ran it manually. Starting the script the same way cron does results in:

Fatal error. Internal CLR error. (0x80131506)
Aborted
Even if I strip it down to the bare minimum:
root@sexigraf:/opt/sexigraf# /usr/bin/pwsh -f "/opt/sexigraf/ViPullStatistics.ps1"
Fatal error. Internal CLR error. (0x80131506)
Aborted

However, if I start pwsh and then run the script from within pwsh, it works.

PS /opt/sexigraf> /opt/sexigraf/ViPullStatistics.ps1 -credstore /mnt/wfs/inventory/vipscredentials.xml -server host -sessionfile /tmp/vmw__host_.key
Transcript started, output file is /var/log/sexigraf/ViPullStatistics..log
2024-05-14T15:05:44.7208885+00:00 [INFO] ViPullStatistics v0.9.1037
....

Same for a simple script:

root@sexigraf:/opt/sexigraf# echo 'return "hello world!"' >> helloWorld.ps1
root@sexigraf:/opt/sexigraf# chmod 755 helloWorld.ps1
root@sexigraf:/opt/sexigraf# /usr/bin/pwsh -f "/opt/sexigraf/helloWorld.ps1"
Fatal error. Internal CLR error. (0x80131506)
Aborted
root@sexigraf:/opt/sexigraf# pwsh
PowerShell 7.2.17
Copyright (c) Microsoft Corporation.

https://aka.ms/powershell
Type 'help' to get help.

PS /opt/sexigraf> ./helloWorld.ps1
hello world!

Looks like there is something wrong with .NET and not sexigraf. I'll keep you posted....
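
Because the cron entry discards all output with `>/dev/null 2>&1`, the exit status is the only trace such a crash leaves behind. Shells report a process killed by a signal as 128 plus the signal number, so on Linux "Aborted" (SIGABRT, 6) shows up as 134 and a segmentation fault (SIGSEGV, 11) as 139. A small sketch for decoding it, where `describe_exit` is an illustrative name:

```bash
#!/bin/bash
# describe_exit STATUS
# Translate a shell exit status into a short human-readable description.
# Hypothetical helper for illustration only.
describe_exit() {
    local status="$1"
    if [ "$status" -eq 0 ]; then
        echo "success"
    elif [ "$status" -gt 128 ]; then
        # Statuses above 128 mean the process was terminated by a signal
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Example (assumes pwsh is installed at this path):
#   /usr/bin/pwsh -f /opt/sexigraf/helloWorld.ps1 >/dev/null 2>&1
#   describe_exit $?
```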

edit: formatting

@rschitz
Member

rschitz commented May 14, 2024

Crazy stuff!
Did you update the appliance at some point?

@Redicious
Author

I hadn't updated it before now - the apt history and SSH log also show that no one touched it.

I couldn't figure out what exactly caused the issue. strace looked ok'ish - it just aborts. I spent hours googling and chatgpt'ing (is that the right word?) and grepping through logs... So I gave up on finding out what happened and just wanted it fixed.
I made a snapshot and reinstalled pwsh, and now it works. It is now 7.4.1 (was 7.2.17), although I doubt the version change is what fixed it. I think the reinstall itself fixed it, since it broke without any intentional or logged changes.

apt remove powershell-lts
wget https://raw.githubusercontent.com/PowerShell/PowerShell/master/tools/install-powershell.sh
wget https://raw.githubusercontent.com/PowerShell/PowerShell/master/tools/installpsh-debian.sh
bash install-powershell.sh

I assume pwsh itself must have kicked the bucket.
Today's bofh-excuse card says: "global warming". That must be it.

Thanks for your help!

@rschitz
Copy link
Member

rschitz commented May 16, 2024

Thanks a lot for your feedback. I also spent some time googling (didn't think about chatgpting it), but by the looks of it, it does sound related to pwsh. FYI, I always use the latest LTS version as long as everything works fine.
Really strange issue, I hope it won't affect your SexiGraf experience overall :D
cheers

@Redicious
Author

Hi,

just wanted to let you know:
The issue somewhat came back, but with a "segmentation fault" error instead. I think it's just a different flavor due to the upgrade, since the conditions leading to it are the same.

It can be fixed (maybe only temporarily) with

rm ~/.cache/powershell/StartupProfileData-NonInteractive

I found this in a PowerShell issue describing how running pwsh -c or pwsh -f leads to the CLR dying due to some optimization data being stored in the file above. Details can be found here:
PowerShell/PowerShell#18998

So I came up with this:

#!/bin/bash

# Define the log file path
log_file="/var/log/fixpwsh.log"

# Run the PowerShell script
result=$(/usr/bin/pwsh -f /opt/sexigraf/helloWorld.ps1)
ok_string="hello world!"

# Get the current timestamp
timestamp=$(date +"%Y-%m-%d %H:%M:%S")

# Check whether the result matches the expected string
if [[ "$result" == "$ok_string" ]]; then
    echo "[$timestamp] Script returned '$ok_string', quitting." | tee -a "$log_file"
else
    # Remove the cached startup profile data; -f avoids an error if it is already gone
    rm -f ~/.cache/powershell/StartupProfileData-NonInteractive
    echo "[$timestamp] Script returned something other than '$ok_string', removed file: StartupProfileData-NonInteractive" | tee -a "$log_file"
fi

And now I run it as cron every hour...
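
One possible refinement of the script above, sketched under assumptions: re-run the probe after deleting the cache file, so the log shows whether the deletion actually helped. The function name `probe_and_reset` and its three status strings are invented for this example and are not part of SexiGraf or PowerShell.

```bash
#!/bin/bash
# probe_and_reset EXPECTED CACHE_FILE CMD [ARGS...]
# Run CMD; if its output is not EXPECTED, delete CACHE_FILE and run CMD again.
# Prints "ok", "recovered" or "still failing"; returns 1 only in the last case.
# Hypothetical helper for illustration only.
probe_and_reset() {
    local expected="$1"
    local cache_file="$2"
    shift 2
    local result
    result=$("$@" 2>/dev/null)
    if [ "$result" = "$expected" ]; then
        echo "ok"
        return 0
    fi
    # Probe failed: drop the cached startup profile data and try once more
    rm -f -- "$cache_file"
    result=$("$@" 2>/dev/null)
    if [ "$result" = "$expected" ]; then
        echo "recovered"
        return 0
    fi
    echo "still failing"
    return 1
}
```

In the hourly job this could be called as `probe_and_reset 'hello world!' ~/.cache/powershell/StartupProfileData-NonInteractive /usr/bin/pwsh -f /opt/sexigraf/helloWorld.ps1`. A "still failing" result would suggest the cache file was not the cause on that run.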

@rschitz
Member

rschitz commented May 24, 2024

what kind of CPU are you running?

@Redicious
Author

2x Intel Xeon Silver 4210

@rschitz
Member

rschitz commented May 24, 2024

can you try to upgrade the vHardware on the sexigraf vm just to test?

rschitz reopened this May 24, 2024
@rschitz
Member

rschitz commented May 24, 2024

also, did you install any security tool in the appliance?

@rschitz
Member

rschitz commented May 24, 2024

also does it have access to internet?

@rschitz
Member

rschitz commented Jul 1, 2024

any updates?
