Keeping track of which tasks have already completed, successfully or not,
and which are still pending can be tedious. Resuming tasks that were not
completed, or that failed, requires a level of bookkeeping you may prefer
to avoid. arange is designed to help with both issues.
Note that for this to work, your job should do its logging using alog.
Given either the CSV file or the task identifier range for a job, and its
log file as generated by alog, arange will provide statistics on the
progress of a running job, or a summary of a completed job.
If the log file's name is jobscript.slurm.log10493, and the job was based
on a CSV data file data.csv, a summary can be obtained by:
$ arange --data data.csv --log jobscript.slurm.log10493 --summary
In case a job has been resumed, you should list all log files relevant to the job to get correct results.
Since arange parses the data file, it also has the --sniff option to
specify the number of bytes used to determine the dialect of the CSV
file. For files with many columns, this number should be increased from
its default value so that the sniffer sees enough of the file to
determine its structure and dialect.
For data files that have only a single column, the sniffer gets confused.
It can be switched off using the --no_sniffer option.
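The role of the sample size can be illustrated with Python's standard csv.Sniffer. This is only a sketch: that arange's sniffing behaves like csv.Sniffer is an assumption, and sniff_dialect, nbytes, and use_sniffer are hypothetical names chosen to mirror the idea behind --sniff and --no_sniffer.

```python
import csv
import io


def sniff_dialect(text, nbytes=1024, use_sniffer=True):
    """Guess the CSV dialect from the first nbytes characters of text.

    Hypothetical helper, not arange's code: nbytes plays the role of
    --sniff, and use_sniffer=False plays the role of --no_sniffer.
    """
    if not use_sniffer:
        # Skip detection and fall back to the plain comma-separated dialect.
        return csv.excel
    # Only the leading sample is inspected, so it must cover enough
    # of the file to expose the delimiter consistently.
    return csv.Sniffer().sniff(text[:nbytes])


data = "name,temperature,pressure\nrun1,293,1.0\nrun2,303,1.5\n"
dialect = sniff_dialect(data, nbytes=64)
rows = list(csv.reader(io.StringIO(data), dialect))
print(dialect.delimiter, rows[0])
```

A wider file needs a larger sample before the sniffer sees several complete rows, which is why raising the byte count helps for files with many columns.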
Of course, arange works independently of aenv, so it also supports
keeping track of general job arrays using the -t flag:
$ arange -t 1-250 --log jobscript.slurm.log10493 --summary
Sometimes it is useful to explicitly list the task identifiers of either
failed or completed tasks as task identifier ranges. This can be done by
adding the --list_failed or --list_completed flag, respectively.
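To make the notion of task identifier ranges concrete, the following minimal sketch collapses a list of identifiers into a compact range string. The function name to_ranges and the comma-and-dash output format are assumptions for illustration, not taken from arange itself.

```python
def to_ranges(task_ids):
    """Collapse task identifiers into a compact range string.

    Hypothetical helper; the exact format arange prints may differ.
    """
    ids = sorted(set(task_ids))
    ranges = []
    start = prev = None
    for tid in ids:
        if start is None:
            # First identifier opens the first run.
            start = prev = tid
        elif tid == prev + 1:
            # Consecutive identifier extends the current run.
            prev = tid
        else:
            # Gap found: close the current run and open a new one.
            ranges.append((start, prev))
            start = prev = tid
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(to_ranges([1, 2, 3, 7, 9, 10]))  # prints 1-3,7,9-10
```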
The primary purpose of arange is in fact to help determine which task
identifiers should be redone when an array job did not complete, or when
some of its tasks failed. To get an identifier range of tasks that were
not completed, use
$ arange --data data.csv --log jobscript.slurm.log10493
or, when not using aenv,
$ arange -t 1-250 --log jobscript.slurm.log10493
If you want to include the tasks that failed, for instance after a bug
that caused the failures was fixed, simply add the --redo flag when
invoking arange.
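The selection described above can be sketched with plain sets. This is an illustration under stated assumptions rather than arange's actual implementation: tasks_to_run and its redo parameter are hypothetical names, and the sets of completed and failed identifiers are assumed to have been extracted from the log files already.

```python
def tasks_to_run(all_ids, completed, failed, redo=False):
    """Return the sorted task identifiers that still need to run.

    Sketch only: failed tasks are left out unless redo is True,
    mirroring the described effect of the --redo flag.
    """
    todo = set(all_ids) - set(completed)
    if not redo:
        # Without redo, tasks that ran and failed are not scheduled again.
        todo -= set(failed)
    return sorted(todo)


all_ids = range(1, 11)
completed = {1, 2, 3, 5, 8}
failed = {4, 9}
print(tasks_to_run(all_ids, completed, failed))             # [6, 7, 10]
print(tasks_to_run(all_ids, completed, failed, redo=True))  # [4, 6, 7, 9, 10]
```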
Help on the command is printed using the --help flag.