IV - Specialized Jobs

Environment Variables

An environment variable describes something about your working environment. Some of them you can set or modify; others are set by the system. To see what is currently set in your terminal, run

$printenv

To set an environment variable yourself, use the export command.

$export VAR=value

When your job starts, Slurm will initialize several environment variables. Many of them correspond to options you have set in your #SBATCH preamble. Do not attempt to assign to any variable beginning with SLURM_.

Some variables that you may wish to examine or use in your scripts:

Variable                 Value
SLURM_SUBMIT_DIR         Directory from which the job was submitted
SLURM_JOB_NODELIST       List of the nodes on which the job is running
SLURM_JOB_ID             The numerical ID of the job
SLURM_NTASKS             The number of tasks (obtained from --ntasks)
SLURM_NTASKS_PER_NODE    The number of tasks per node (obtained from --ntasks-per-node)
SLURM_CPUS_PER_TASK      The number of cpus (cores) assigned to each task (obtained from --cpus-per-task)
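
For example, a minimal fragment of a job script that makes use of some of these variables might look like the following sketch (the echo messages are only illustrations; SLURM_CPUS_PER_TASK is set only if you request --cpus-per-task):

cd $SLURM_SUBMIT_DIR
echo "Job $SLURM_JOB_ID is running on: $SLURM_JOB_NODELIST"
echo "Using $SLURM_NTASKS task(s) with $SLURM_CPUS_PER_TASK cpu(s) per task"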

Interactive Jobs

Most HPC sites, including UVa's, restrict the memory and time allowed to processes on the login nodes. Most jobs can be submitted through the batch system we have been discussing, but sometimes more interactive work is required. For example:

  1. Jobs that must be or are best run through a graphical interface,
  2. Short development jobs,
  3. “Computational steering” in which a program runs for an interval, then the output is examined and parameters may be adjusted.

For most of these cases, we strongly recommend the use of the Open OnDemand Interactive Applications. JupyterLab is available to run notebooks. RStudio and MATLAB are also available to run through this interface. For more general work, including command-line options, the Desktop is usually the best option. It provides a basic terminal, but also access to other applications should they be needed.

For general-purpose interactive work with graphics, please use the Open OnDemand Desktop. The X11 service that Linux uses for graphics is very slow over a network. Even with a fast connection between two systems, the Desktop will perform better since the X11 server process and the programs that use it are running on the same computer.

If you must use a basic terminal for an interactive job, first run the salloc command, which is the general Slurm command to request resources, followed by srun to launch the processes. However, this is complex and requires knowledge of the options, so we have provided a local "wrapper" script called ijob.
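
For reference, a direct request with salloc might look like the following sketch; the resource values here are placeholders and the exact behavior depends on the cluster's Slurm configuration:

$salloc -A myalloc -p standard -c 1 -t 01:00:00 --mem=8000
$srun --pty /bin/bash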

ijob takes options similar to those used with SBATCH, most of which are actually arguments to salloc.

$ijob -c 1 -A myalloc -t <time> --mem <memory in MB> -p <partition> -J <jobname>

When the job starts you will be logged in to a bash shell in a terminal on the compute node.

Never issue an sbatch command from within an interactive job (including OOD jobs). The sbatch command must be used only to submit jobs from a login node.

Multicore and Multinode Jobs

One of the advantages of using a high-performance cluster is the ability to use many cores and/or nodes at once. This is called parallelism. There are three main types of parallelism.

You should understand whether your program can make use of more than one core or node before you request multiple cores and/or nodes. Special programming is required to enable these capabilities. Asking for multiple cores or nodes that your program cannot use will result in idle cores and wasted SUs, since you are charged for each core-hour. The seff command, run on a completed job, reports how efficiently the job used its allocated cores and memory and can help you check this.
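
For example, after a job has finished you could run (the job ID here is hypothetical):

$seff 1234567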

High Throughput Serial Parallelism

High-throughput parallelism is when many independent, nearly identical jobs are run at once, each on a single core. Examples include Monte Carlo methods, parameter searches, image processing on many related images, some areas of bioinformatics, and many others. For most cases of this type of parallelism, the best Slurm option is a job array.

When planning a high-throughput project, it is important to keep in mind that if the individual jobs are very short, less than roughly 15-30 minutes each, it is very inefficient to run each one separately, whether you do this manually or through an array. In this case you should group your jobs and run multiple instances within the same job script. Please contact us if you would like assistance setting this up.
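
As a sketch of grouping, the script below runs 20 short cases one after another inside a single job; the run_case.sh driver and the input-file names are hypothetical placeholders for your own workflow:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH -p standard
#SBATCH -A myalloc
#SBATCH -t 02:00:00

# Run several short cases back to back within a single job
for i in $(seq 1 20); do
    ./run_case.sh input.${i}.in
done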

Multicore (Threaded)

Shared-memory programs can use multiple cores, but those cores must be physically located on the same node. The appropriate Slurm option in this case is -c (equivalent to --cpus-per-task). Shared-memory programs use threading of one form or another, such as OpenMP.

Example Slurm script for a threaded program:

#!/bin/bash
#SBATCH -n 1
#SBATCH -c 25
#SBATCH -p standard
#SBATCH -A myalloc
#SBATCH -t 05:00:00
#SBATCH --mail-user=mst3k@virginia.edu
#SBATCH --mail-type=END

module load gcc
# Use as many OpenMP threads as cores requested with -c
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./ompheatedplate .0000001 plate

Multinode (MPI)

In this type of parallelism, each process runs independently and communicates with others through a library, the most widely used of which is MPI. Distributed-memory programs can run on single or multiple nodes and often can run on hundreds or even thousands of cores. For distributed-memory programs you can use the -N option to request a number of nodes, along with --ntasks-per-node to schedule a number of processes on each of those nodes.

#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=40
#SBATCH --account=myalloc
#SBATCH -p parallel
#SBATCH -t 10:00:00
#SBATCH --mail-user=mst3k@virginia.edu
#SBATCH --mail-type=END

module load gcc openmpi

srun ./mpiheatedplate .0000001 plate${SLURM_NTASKS_PER_NODE}

Hybrid MPI plus Threading

Some codes can run with distributed-memory processes, each of which can itself run in threaded mode. For this, request --ntasks-per-node=NT and --cpus-per-task=NC, keeping in mind that the total number of cores requested on each node is then NT × NC. In the example below, 4 tasks per node with 10 cpus per task requests 40 cores on each node.

#!/bin/bash
#SBATCH -N 3
#SBATCH --ntasks-per-node=4
#SBATCH -c 10
#SBATCH -p parallel
#SBATCH -A myalloc
#SBATCH -t 05:00:00
#SBATCH --mail-user=mst3k@virginia.edu
#SBATCH --mail-type=END

module load gcc openmpi
# Each MPI task runs SLURM_CPUS_PER_TASK OpenMP threads
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun ./heatedplate .0000001 plate${SLURM_NTASKS_PER_NODE}


Job Arrays

Many similar jobs can be submitted simultaneously through job arrays. There are some restrictions:

  • It must be a batch job.
  • Job arrays should be explicitly named with the -J option.
  • It is generally prudent to separate stdout and stderr with the -o and -e options.

A job array is submitted with sbatch --array=<range>, where the range is two integers separated by a hyphen.

$sbatch --array=0-30 myjobs.sh

An increment can be provided:

$sbatch --array=1-7:2 myjobs.sh

This will number the tasks 1, 3, 5, and 7.

It is also possible to provide a list:

$sbatch --array=1,3,4,5,7,9 myjobs.sh

Each job will be provided an environment variable SLURM_ARRAY_JOB_ID, and each task will be assigned a SLURM_ARRAY_TASK_ID. SLURM_ARRAY_JOB_ID is the overall job ID, whereas SLURM_ARRAY_TASK_ID takes on the values of the numbers in the specified range or list.

Slurm also provides two filename patterns, %A (the overall array job ID) and %a (the array task ID), which can be used in the -o and -e options. If they are not used, the different tasks will attempt to write to the same file, which can result in garbled output or file corruption, so please use them if you wish to redirect streams with those options.

To prepare a job array, set up any input files using appropriate names that will correspond to the numbers in your range or list, e.g.

myinput.0.in
myinput.1.in
...
myinput.30.in

You would submit a job for the above files with

$sbatch --array=0-30 myjobs.sh

In your Slurm script you would use a command such as

python myscript.py myinput.${SLURM_ARRAY_TASK_ID}.in

The script should be prepared to request resources for one instance of your program.

Complete example array job script:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --partition=standard
#SBATCH -A myalloc
#SBATCH --time=3:00:00
#SBATCH -o out%A.%a
#SBATCH -e err%A.%a

python myscript.py myinput.${SLURM_ARRAY_TASK_ID}.in

To cancel an entire array, cancel the overall array job ID:

$scancel 1283839

You can also cancel individual tasks:

$scancel 1283839_11