
Gaussian Cluster Tools

Introduction

This short document presents some tools available to manage the execution of GAUSSIAN jobs on a PBS-compatible platform (by default, Avogadro@SNS).

The following tools are available:

gxx_qsub.py

Main submission script, written in Python (version 3.5 or later recommended).

gjobadd.bash

Updates an internal data file (gjoblist.txt), adding a new job.

gjobrun.bash

Runs a job from gjoblist.txt.

gjobres.bash

Resets a job in gjoblist.txt, erasing information on previous executions.

gjobchk.bash

Returns the list of jobs with a given status over a chosen period of time.

gjobupd.py

Updates the job statuses in gjoblist.txt (generally used in automated scripts).

gxxrun.bash

Script acting as a wrapper to gxx_qsub.py, normally not run directly.

The following developer-oriented script is available (in a separate directory):

gxx_build_cluster.py

Deploys a GAUSSIAN or working tree archive, creates a microarchitecture-based tree, and compiles it on different machines.

Job submission

Submission of a GAUSSIAN job can be done with gxx_qsub.py.

Note

gxx_qsub.py does not actually submit a GAUSSIAN job itself; instead, it builds a script that is then submitted to the queue manager, generally through qsub.

Quick start

In its simplest form, gxx_qsub.py only needs a valid GAUSSIAN input file.

Execution
$ gxx_qsub.py file.gjf

All optional arguments are then filled with default values.

Description of the submission program

The program executes the following operations sequentially:

  1. Definition of the execution options and check on file existence

  2. Definition of the available resources on computing nodes where GAUSSIAN is expected to run:

    • Number of processors available

    • Total memory available

  3. Parsing of GAUSSIAN's input file and, unless specified otherwise:

    • Setup of the total number of processors for GAUSSIAN to use (%NProcShared).

    • Setup of the maximum memory to be used for the job (%Mem).

    • Definition of a permanent checkpoint file (%Chk) based on the input filename. While parsing the file, the script also collects all filenames that may be relevant to the calculation requested by the user.

  4. Creation of the script to be submitted to qsub

  5. Submission of the PBS job
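
As an illustration of step 3, assuming a node with 16 cores and 64 GB of RAM (the values and route line below are hypothetical and depend on the target node and its configured limits), the completed input generated for file.gjf could start with:

Completed input (sketch)
%NProcShared=16
%Mem=60GB
%Chk=file.chk
#p opt freq b3lyp/6-31g(d)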

Description of the PBS script

The script contains 2 parts:

  • Definition of all relevant variables

  • Sequence of operations to be run by the computing node, normally:

    1. Creation of a temporary directory on the local “scratch” space on the computing node.

    2. Copy of the input file generated by gxx_qsub.py while parsing/completing the input file given by the user.

    3. Copy of all other files relevant to the calculations (generally the checkpoint files) to the temporary directory.

    4. Execution of GAUSSIAN on the computing node, with the output directly written in the same directory as the original input (this may require a network file system).

    5. Copy back of all relevant files (generally the file specified with %Chk).

    6. Removal of the temporary directory.
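
For reference, a minimal sketch of such a script is shown below; the directives, paths and the GAUSSIAN invocation are illustrative assumptions, not the exact content generated by gxx_qsub.py:

#!/bin/bash
#PBS -N file                          # job name
#PBS -q q02curie                      # queue (hypothetical)

# Part 1: definition of all relevant variables (illustrative)
tmpdir=/local/scratch/${PBS_JOBID}    # temporary directory on the node
wrkdir=${PBS_O_WORKDIR}               # directory of the original input

# Part 2: sequence of operations
mkdir -p ${tmpdir}
cp ${wrkdir}/file.gjf ${tmpdir}/      # input completed by gxx_qsub.py
cp ${wrkdir}/file.chk ${tmpdir}/      # other relevant files, if any
cd ${tmpdir}
g16 < file.gjf > ${wrkdir}/file.log   # output written next to the original input
cp file.chk ${wrkdir}/                # copy back relevant files
cd ${wrkdir}
rm -rf ${tmpdir}                      # remove the temporary directory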

Note

Because of the way the script is submitted, the full list of commands and other important information is printed in the output .o file.
Do not remove this file until you are ABSOLUTELY sure everything went fine!

Warning

Because the job generated by gxx_qsub.py is not the actual execution of GAUSSIAN, killing the PBS job will stop not only the execution of GAUSSIAN but also the cleanup operations.
Remember to clean up the temporary storage afterwards (all relevant information is in the .o file).

Keywords

Main keywords

Only the most relevant/useful keywords are listed below.

-c, --chk

Name of the checkpoint file (default: generated from the input file)

-g, --gaussian

Version of GAUSSIAN to use, as gXXYYY where XX is the major revision, YYY the minor revision.
ex.: g16a03, g09e01

-j, --jobname

Name of the job (default: name of the input file, truncated to fit PBS requirements)

-k, --keep

Keep unchanged some parameters from the original input file in the generated input file. Possible values:

c, chk

checkpoint file

m, mem

memory

p, proc

number of processors

a, all

all of the above

Note that GAUSSIAN may fail to run if the requested resources are not available on the node. Multiple resources can be combined by invoking the keyword multiple times (ex: -kc -km).
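
For instance, to keep both the checkpoint file and the memory directives from the original input:

Example
$ gxx_qsub.py -kc -km file.gjf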

-m, --mail

Uses PBS capabilities to send emails on job execution, end and/or failure. The option depends strongly on the installation.
--mailto can be used to specify an email address (default: user@server)

--multi

Runs multiple GAUSSIAN jobs, either in serial (value: serial, the default) or in parallel (value: parallel).
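
For instance, assuming several input files can be passed on the command line (an illustrative sketch; the exact calling convention may differ):

Example
$ gxx_qsub.py --multi parallel job1.gjf job2.gjf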

-o, --out

Name of the log/output file (default: generated from the input file)

-p, --project

Name of the project for processor time accounting (default: none).

-q, --queue

A PBS or virtual queue to which the job should be submitted.
The list of available PBS queues is generated by the module hpcnodes.py.
A virtual queue has the form: queue[:[nprocs][:node_id]], with

queue

A valid PBS queue

nprocs

Number of processing units to use. Possible values:

H

Use half of the physical cores on 1 processor

S

Use a single physical core

>0

Use nprocs cores

<0

Use nprocs physical CPUs

0

Use all cores available on the node (same behavior if nprocs is entirely missing)

node_id

Name of a node on which the job must run.
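
For instance (the node name is hypothetical):

Examples
$ gxx_qsub.py -q q02curie:H file.gjf        # half of the physical cores on 1 processor
$ gxx_qsub.py -q q02curie:4 file.gjf        # 4 cores
$ gxx_qsub.py -q q02curie::node05 file.gjf  # all cores of node node05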

-r, --rwf

Name of the read-write file (default: unset).
Normally only used for development purposes.

-w, --wrkdir

List of working trees (as root directories of exe-dir)

Advanced keywords

--cpto

Copies a list of files from the local directory to the scratch directory (added to the list automatically generated by gxx_qsub.py).

--cpfrom

Copies a list of files from the scratch directory to the local directory (added to the list automatically generated by gxx_qsub.py).
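
For instance, to stage an extra data file to the scratch directory and retrieve an additional result file afterwards (hypothetical filenames; the options are assumed to accept file lists as described above):

Example
$ gxx_qsub.py --cpto extra.dat --cpfrom file.wfn file.gjf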

--group

User group to run on access-restricted nodes.

Diagnosis keywords

-M, --mach

Prints a summary on the available nodes and HPC resources. Does not run any job.

--nojob

Only runs the input analysis and job preparation steps, with no submission. Generally used with -P.

-P, --print

Prints the submission script and commands.
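
For instance, to inspect what would be submitted without running anything:

Example
$ gxx_qsub.py --nojob -P file.gjf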

Support for resource limitations

Since version 18.12.19 of the hpcnodes module, administrators (or users) can set their own limitations on the use of resources by each job, independently of the total hardware resources available on each node. Two types of limits are available, for the number of processing units and for the RAM:

soft

This limit should be followed by users for standard jobs and only exceptionally exceeded.

hard

This limit should not be exceeded, whatever the type of job.

If the user does not set the quantity of RAM or the number of processors to be used in a GAUSSIAN job, gxx_qsub.py will set the resources based on the first condition verified:

  1. if a soft limit is set, the value will be used.

  2. if a hard limit is set, the value will be used.

  3. all available resources will be used (with a small reduction on the memory to prevent the risk of swapping in some extreme cases).

To override the soft limit, the user must explicitly set the resources in the GAUSSIAN input file(s) and use the command-line option --keep to prevent gxx_qsub from resetting those values. If the requested resources do not exceed the hard limit or the total available hardware resources, gxx_qsub will proceed, simply noting that one or more soft limits have been exceeded. Otherwise, as before, it will stop, preventing the execution of the job.
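
As an illustration, assuming a soft limit of 8 processors on the target node (hypothetical values), one could request more in the input file:

%NProcShared=12
%Mem=100GB

and submit with the corresponding --keep options:

$ gxx_qsub.py -kp -km file.gjf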

Job management

gxx_qsub.py simply runs a GAUSSIAN job but does not keep track of the jobs submitted and their status. The following scripts provide a very basic structure to manage jobs, by recording them, facilitating their submission and keeping track of their status.

The list of jobs is stored by default in ${HOME}/gjoblist.txt

Add a job

The script gjobadd.bash handles the insertion of new jobs.

Minimum input
$ gjobadd.bash input.gjf

It is possible to specify the following options (positional: to set a given option, all preceding ones must also be given):

  1. input file

  2. queue name (default: q02curie)

  3. jobname (default: input file name)

  4. comment (default: false)

  5. path to the input file (default: current directory)

Example
$ gjobadd.bash input.gjf q02curie test01 "This is a test"

Run a job

To run a job present in the job list, simply run gjobrun.bash with the ID of the job (leading 0s are not necessary).

Example
$ gjobrun.bash 1

The following positional arguments can be specified:

  1. job ID

  2. Gaussian version, in a format compatible with gxx_qsub.py. (default: g16b01)

  3. option to force the copy of a file before execution of the job (only relevant for very specific structures of filenames, do not use).
    Possible values: 0 (do not copy), 1 (do copy), auto (default)

Note
  • Only jobs with status WAIT can be run. Other jobs must be reset first.

  • gxxrun.bash is needed to act as an intermediate between gjobrun.bash and gxx_qsub.py

Check jobs' statuses

A list of jobs with a specific status can be obtained with the script gjobchk.bash.

Minimum input
$ gjobchk.bash

By default, all jobs with status WAIT will be listed.

It is possible to specify the following options (positional: to set a given option, all preceding ones must also be given):

  1. job status: WAIT (default), GOOD, FAIL, QSUB

  2. starting date, when the job started running (all formats supported by date are accepted)

  3. ending date (same formats as for the starting date)
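
For instance, to list the jobs that completed successfully since yesterday (a hypothetical invocation using a date-style relative format):

Example
$ gjobchk.bash GOOD "yesterday"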

Reset a job status

When added, a job gets the status WAIT. Once submitted, it becomes QSUB and later EXEC once running on a computing node. Upon termination, it can have a status of FAIL or GOOD depending on the final result.

The status of any job can be reset to WAIT with the script gjobres.bash followed by the job ID.

Example
$ gjobres.bash 1

It is also possible to change the queue by passing it as a second argument.

Example
$ gjobres.bash 1 q07lee

Update the jobs statuses

The status of all jobs can be updated with the program gjobupd.py.

Normally, this program should not be run directly but instead periodically invoked by a job scheduler like cron.

To do this, open the crontab file

$ crontab -e

and insert the following lines:

# ENVIRONMENT VARIABLES
# We need to load python 3
PATH=/cm/shared/apps/python/3.6.2/bin:/cm/shared/apps/pbspro/12.2.4.142262/bin:/bin:$PATH
LD_LIBRARY_PATH=/cm/shared/apps/python/3.6.2/lib:$LD_LIBRARY_PATH

# CRON JOBS
*/30 * * * * /home/j.bloino/bin/gjobupd.py

where the path to gjobupd.py should be updated accordingly.

Warning

If a different frequency (default: 30 minutes) is desired, gjobupd.py should be modified accordingly.

Compilation

gxx_build_cluster.py is a basic tool to facilitate the deployment and compilation of GAUSSIAN on a heterogeneous HPC infrastructure (nodes with different CPUs).

Overview

Minimum input
$ gxx_build_cluster.py what

what can be:

  • A GAUSSIAN archive file, in the form: gAA[.]BCC[C].ext

  • A working tree archive file, in the form: working_gAA[.]BCC[C]_YYYY[-]MM[-]DD.ext

  • A GAUSSIAN version, in the form: DDD[.]BCC[C]. The script looks for the actual archive in a hardcoded directory (default: /cm/shared/gaussian/repository/src).

with:

AA

GAUSSIAN major version. (ex.: 09, 16, DV)

BCC[C]

GAUSSIAN minor revision. (ex.: B01, I04+, I04p)

YYYY[-]MM[-]DD

Working tree version as (year, month, day)

ext

Extension of the archive (supported compressions: gzip, bz2, lzma)

DDD

Alias. Either gAA or any trigram representing the working tree.

Note
For the working tree, this is only valid for a re-compilation.
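
For instance (hypothetical archive names following the patterns above):

Examples
$ gxx_build_cluster.py g16.b01.tgz
$ gxx_build_cluster.py working_g16.b01_2019-05-10.tar.bz2
$ gxx_build_cluster.py g16.b01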

Installation process

Gaussian

Default installation path: /cm/shared/gaussian

  1. checks if a previous installation exists and removes it.

  2. creates a root directory <gAA.BCC[C]> in the default installation path.

  3. for each microarchitecture (<arch>), creates a directory <gAA.BCC[C]>/<arch> and extracts the archive in the directory.

  4. prepares a PBS submission script for each architecture and submits the compilation job.

Working tree

Default installation path: ${HOME}/gxxwork

  1. checks if a previous installation exists and asks the user how to proceed:

    backup

    renames the working tree, adding bak.YYYY-MM-DD, where YYYY-MM-DD is today’s date

    remove

    deletes the previous installation

    keep

    keeps the installation, only recompiling the existing tree (jumps to step 4).

  2. creates a src directory and extracts the archive

  3. creates microarchitecture-specific directories (<arch>) at the same level as src and links the source files from src to <arch>

  4. prepares a PBS submission script for each architecture and submits the compilation job.

Keywords

-c, --compile

Only compiles an existing installation. Archive names or aliases can be used.

-m, --mach

List of microarchitectures (supported by GAUSSIAN) on which the installation/compilation will be done.

-u, --update

Updates an existing installation with the files in the archive given in input (only for the working tree).

--gpath

Root path of the Gaussian installation. The created structure will have the form:

<gpath>/
    <gAA.BCC[C]>/
        <arch>/
            <gAA>/

--wpath

Root path for the working tree. The created structure will have the form:

<wpath>/
    <gAA.BCC[C]>/
        src/
        <arch>/
            ...
            exe-dir
