Some interactive means of monitoring of dispatched jobs #19

Open
mikucionisaau opened this issue Sep 16, 2024 · 0 comments
Labels
enhancement New feature or request


It would be nice to have some means of monitoring the resources used by the dispatched jobs.
For example, I use the following script to dispatch UPPAAL-specific jobs (single-threaded, with lots of memory):

#!/usr/bin/env bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --sockets-per-node 1
#SBATCH --cores-per-socket 1
#SBATCH --mail-type=END
#SBATCH --mail-user=......
#SBATCH --error=slurm-%j.err
#SBATCH --output=slurm-%j.log
# # S B A T C H --time=24:00:00
# # S B A T C H --partition=rome
set -e

PERIOD=10
MEMAVAIL=100000

while getopts "hm:p:" option ; do
    case $option in
        h)
            echo "$0 script launches a command and monitors the resources."
            echo "$0 exits if the process exits."
            echo "$0 kills the process if the machine runs out of available memory."
            echo "Synopsis: $0 [-h] [-p N] [-m N] command with arguments"
            echo " -h     prints this help screen"
            echo " -p N   samples every N seconds"
            echo " -m N   kills if available memory gets below N kB"
            exit;;
        m)  MEMAVAIL="$OPTARG";;
        p)  PERIOD="$OPTARG";;
        ?)  echo "Invalid option $option, consult -h"
            exit 2;;
    esac
done

shift $(($OPTIND - 1))

COMMAND="$@"

if [ "$#" -lt 1 ]; then
    echo -e "Error: no arguments, expecting command and its arguments."
    echo -e "Usage:\n\t$0 your_command your_arguments"
    exit 1
fi

"$@" &
pid=$!
# echo "Process statistics is in process-$pid-stats.txt"
exec hogwatch -p$PERIOD -m$MEMAVAIL $pid

Here hogwatch is the script that monitors specific processes and logs their resource usage:

#!/usr/bin/env bash
set -e

PERIOD=10
MEMAVAIL=100000

while getopts "hm:p:" option ; do
    case $option in
        h)
            echo "$0 script monitors processes by their PIDs and logs statistics into process-PID-stats.txt."
            echo "$0 exits if all monitored processes exit or the machine runs out of available memory."
            echo "Synopsis: $0 [-h] [-p N] [-m N] PID*"
            echo " -h     prints this help screen"
            echo " -p N   samples every N seconds"
            echo " -m N   kills watched PIDs if available memory gets below N kB"
            exit;;
        m)  MEMAVAIL="$OPTARG";;
        p)  PERIOD="$OPTARG";;
        ?)  echo "Invalid option $option, consult -h"
            exit 2;;
    esac
done

shift $(($OPTIND - 1))

PIDS="$@"

function proc_status() {
    pid=$1
    f=process-$pid-stats.txt
    if [ ! -e $f ]; then
        # print the whole command line as the first line:
        ps -o pid,args -p $pid | tail -n1 > $f
        # print the table header:
        echo -ne "DATE       " >> $f
        ps -o pcpu,pmem,cputime,etime,vsize,rss -p $pid | head -n1 >> $f
    fi
    # print date-timestamp:
    echo -ne "$(date +%s) " >> $f
    # print process resources:
    ps -o pcpu,pmem,cputime,etime,vsize,rss -p $pid | tail -n1 >> $f
}

# monitor the free memory:
mem_avail=$(free | grep Mem | gawk '{ print $7 }')
while [ $mem_avail -gt $MEMAVAIL ] ; do
    list=""
    for pid in $PIDS ; do
        if [ -e "/proc/$pid" ]; then
            proc_status $pid
            list="$list $pid"
        fi
    done
    PIDS=$list
    if [ -z "$PIDS" ]; then
        exit 0
    fi
    sleep $PERIOD
    # refresh the available memory (column 7 of free), which the loop condition tests:
    mem_avail=$(free | grep Mem | gawk '{ print $7 }')
done

echo "hogwatch: machine is out of available memory, thus killing $PIDS"
kill -9 $PIDS
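
The available-memory check above parses the output of free. As a rough alternative sketch (not part of the original scripts), the same figure could be read from /proc/meminfo on Linux. The function names parse_mem_available_kb and read_mem_available_kb are hypothetical, and the sketch assumes the kernel exposes a MemAvailable field there.

```python
from typing import Optional


def parse_mem_available_kb(meminfo_text: str) -> Optional[int]:
    """Return the MemAvailable value in kB from /proc/meminfo-style text.

    Returns None if the field is absent (assumption: the caller decides
    what to do in that case, e.g. fall back to parsing `free`).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # The line looks like: "MemAvailable:    8000000 kB"
            return int(line.split()[1])
    return None


def read_mem_available_kb(path: str = "/proc/meminfo") -> Optional[int]:
    """Read and parse the given meminfo file (Linux only)."""
    with open(path) as f:
        return parse_mem_available_kb(f.read())
```

A watchdog loop could compare the returned value against the -m threshold in the same way hogwatch compares the value taken from free.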

Then I have the following Python script to plot the memory and CPU consumption:

#!/usr/bin/env python
import matplotlib.pyplot as plt
import numpy as np
import csv
import sys
import time
import datetime
import os

date=[]
cpu=[]
memory=[]
cputime=[]
elapsed=[]
virtual=[]
working=[]
title="Memory"
columns="???"

for arg in sys.argv[1:]:
    date.clear()
    cpu.clear()
    memory.clear()
    cputime.clear()
    elapsed.clear()
    virtual.clear()
    working.clear()
    with open(arg, 'r') as csvfile:
        # first line holds the PID and command line; strip the trailing newline for the plot title
        title=csvfile.readline().strip()
        columns=csvfile.readline()
        rows = csv.reader(csvfile, delimiter=' ', skipinitialspace=True)
        for row in rows:
            if len(row) == 7:
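                # utcfromtimestamp is deprecated since Python 3.12; build a timezone-aware UTC datetime instead
                date.append(datetime.datetime.fromtimestamp(int(row[0]), tz=datetime.timezone.utc))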
                cpu.append(float(row[1]))
                memory.append(float(row[2]))
                cputime.append(row[3])
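                # ps prints elapsed time as [[DD-]hh:]mm:ss, so handle the optional day and hour parts
                days, _, hms = row[4].rpartition('-')
                parts = [int(p) for p in hms.split(':')]
                parts = [0] * (3 - len(parts)) + parts
                delta = datetime.timedelta(days=int(days or 0), hours=parts[0], minutes=parts[1], seconds=parts[2]).total_seconds()
                elapsed.append(delta)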
                virtual.append(float(row[5])/1024/1024)
                working.append(float(row[6])/1024/1024)
    fig, ax = plt.subplots(2)
    x = elapsed
    #x = date
    ax[0].plot(x, virtual, color='b', label="virtual")
    ax[0].plot(x, working, color='r', label="working")
    ax[0].set(xlabel='time (s)', ylabel='memory (GB)', title=title)
    ax[0].grid()
    ax[0].legend()
    ax[1].plot(x, cpu, color='r', label="CPU")
    ax[1].plot(x, memory, color='b', label="memory")
    ax[1].set(xlabel='time (s)', ylabel='resources (%)')
    ax[1].grid()
    ax[1].legend()

    #plt.get_current_fig_manager().canvas.manager.set_window_title(arg)
    #plt.show()
    plt.gcf().set_size_inches(15, 15)
    out = arg + ".png"
    fig.savefig(out, bbox_inches='tight')
    print("Plot saved to " + out)
    os.system("display " + out)

Currently Python cannot open a window with interactive zoom widgets, because the windowing toolkit libraries (such as Tk, Qt, GTK, etc.) are not installed (and are not available in Python virtual environments), so the script writes a PNG image and then launches display to show it.
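
Since no windowing toolkit is available, a plain-text summary could complement the PNG. The following is a minimal sketch, assuming the process-PID-stats.txt layout written by hogwatch above (command line, header line, then rows of timestamp, %CPU, %MEM, TIME, ELAPSED, VSZ and RSS, with the last two in kB as reported by ps). The names StatsSummary and summarize_stats are hypothetical, and the sample data at the end is made up for illustration.

```python
import io
from typing import NamedTuple, TextIO


class StatsSummary(NamedTuple):
    command: str
    samples: int
    peak_rss_kb: int
    peak_vsz_kb: int
    max_cpu_percent: float


def summarize_stats(stream: TextIO) -> StatsSummary:
    """Summarize a hogwatch-style stats file without plotting anything."""
    command = stream.readline().strip()  # first line: PID and command line
    stream.readline()                    # second line: table header, skipped
    samples = 0
    peak_rss = peak_vsz = 0
    max_cpu = 0.0
    for line in stream:
        fields = line.split()
        if len(fields) != 7:
            continue  # ignore truncated rows, e.g. a timestamp without ps output
        samples += 1
        max_cpu = max(max_cpu, float(fields[1]))
        peak_vsz = max(peak_vsz, int(fields[5]))
        peak_rss = max(peak_rss, int(fields[6]))
    return StatsSummary(command, samples, peak_rss, peak_vsz, max_cpu)


# Made-up sample data in the assumed layout:
sample = io.StringIO(
    "  4242 my_command --flag\n"
    "DATE       %CPU %MEM     TIME     ELAPSED    VSZ   RSS\n"
    "1726480000 99.0  1.5 00:00:10       00:10 204800 102400\n"
    "1726480010 98.5  2.5 00:00:20       00:20 409600 307200\n"
)
print(summarize_stats(sample))
```

Reading from a stream rather than a path keeps the function easy to try on in-memory data; for a real file, pass the object returned by open().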

Example result:
image

mikucionisaau added the enhancement (New feature or request) label Sep 16, 2024