IV - Specialized Jobs
Environment Variables
An environment variable describes something about your working environment. Some of them you can set or modify; others are set by the system. To see what is currently set in your terminal, run
$ printenv
To set an environment variable yourself, use the export command.
$ export VAR=value
When your job starts, Slurm will initialize several environment variables. Many of them correspond to options you have set in your SBATCH preamble. Do not attempt to assign to any variable beginning with SLURM_.
Some variables that you may wish to examine or use in your scripts:
| Variable | Value |
|---|---|
| SLURM_SUBMIT_DIR | Directory from which the job was submitted |
| SLURM_JOB_NODELIST | List of the nodes on which the job is running |
| SLURM_JOB_ID | The numerical ID of the job |
| SLURM_NTASKS | The number of tasks (from --ntasks) |
| SLURM_NTASKS_PER_NODE | The number of tasks per node |
| SLURM_CPUS_PER_TASK | The number of CPUs (cores) assigned to each task |
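These variables can be used directly in a job script. Below is a minimal sketch; the resource requests are arbitrary and the account and partition options are omitted for brevity.
#!/bin/bash
#SBATCH -n 1
#SBATCH -c 1
#SBATCH -t 00:10:00
# Start in the directory from which the job was submitted
cd $SLURM_SUBMIT_DIR
# Record some information about the job in the output file
echo "Job $SLURM_JOB_ID is running on $SLURM_JOB_NODELIST"
echo "Tasks: $SLURM_NTASKS  CPUs per task: $SLURM_CPUS_PER_TASK"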
Interactive Jobs
Most HPC sites, including UVa’s, restrict the memory and time allowed to processes on the login nodes. Most jobs can be submitted through the batch system we have been discussing, but sometimes more interactive work is required. For example:
- Jobs that must be or are best run through a graphical interface,
- Short development jobs,
- “Computational steering” in which a program runs for an interval, then the output is examined and parameters may be adjusted.
For most of these cases, we strongly recommend using the Open OnDemand Interactive Applications. JupyterLab is available for running notebooks, and RStudio and the MATLAB Desktop can also be run through this interface. For more general work, including command-line work, the Desktop is usually the best option: it provides a basic terminal as well as access to other applications should they be needed.
For general-purpose interactive work with graphics, please use the Open OnDemand Desktop. The X11 service that Linux uses for graphics is very slow over a network. Even with a fast connection between two systems, the Desktop will perform better since the X11 server process and the programs that use it are running on the same computer.
If you must use a basic terminal for an interactive job, you must first use the salloc command. This is the general Slurm command to request resources. It would be followed by srun to launch the processes. However, this is complex and requires knowledge of the options, so we have provided a local “wrapper” script called ijob.
ijob takes options similar to those used with SBATCH, most of which are actually arguments to salloc.
$ ijob -c 1 -A myalloc -t <time> --mem <memory in MB> -p <partition> -J <jobname>
When the job starts you will be logged in to a bash shell in a terminal on the compute node.
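For example, to request one core for an hour with 4 GB of memory, you could run something like the following; the allocation, partition, and job name shown here are only placeholders to be replaced with your own.
$ ijob -c 1 -A hpc_training -t 1:00:00 --mem 4000 -p standard -J mytest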
Multicore and Multinode Jobs
One of the advantages of using a high-performance cluster is the ability to use many cores and/or nodes at once. This is called parallelism. There are three main types of parallelism.
You should understand whether your program can make use of more than one core or node before you request multiple cores and/or nodes. Special programming is required to enable these capabilities. Asking for multiple cores or nodes that your program cannot use will result in idle cores and wasted SUs, since you are charged for each core-hour. The seff command, which reports the resources a completed job actually used, can help with this.
High Throughput Serial Parallelism
High-throughput parallelism means running many independent, nearly identical jobs at once, each on a single core. Examples include Monte Carlo methods, parameter searches, image processing on many related images, some areas of bioinformatics, and many others. For most cases of this type of parallelism, the best Slurm option is a job array.
When planning a high-throughput project, keep in mind that if the individual jobs are very short, less than roughly 15-30 minutes each, it is very inefficient to run each one separately, whether you do this manually or through an array. In that case you should group your runs and execute several instances within the same job script, as sketched below. Please contact us if you would like assistance setting this up.
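A rough sketch of this grouping follows; the program name, input and output file names, and resource requests are all placeholders.
#!/bin/bash
#SBATCH -n 1
#SBATCH -t 02:00:00
#SBATCH -A hpc_training
# Run a block of short cases back to back inside a single job
for i in $(seq 0 9); do
    ./myprog myinput.${i}.in > myoutput.${i}.out
done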
Multicore (Threaded)
Shared-memory programs can use multiple cores, but the cores must be physically located on the same node. The appropriate Slurm option in this case is -c (equivalent to --cpus-per-task). Shared-memory programs use threading of one form or another.
Example Slurm script for a threaded program:
#!/bin/bash
#SBATCH -n 1
#SBATCH -c 25
#SBATCH -p interactive
#SBATCH -A hpc_training
#SBATCH -t 05:00:00
#SBATCH --mail-user=mst3k@virginia.edu
#SBATCH --mail-type=END
module load gcc
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./ompheatedplate .0000001 plate
Multinode (MPI)
In this type of parallelism, each process runs independently and communicates with others through a library, the most widely used of which is MPI. Distributed-memory programs can run on single or multiple nodes and often can run on hundreds or even thousands of cores. For distributed-memory programs you can use the -N option to request a number of nodes, along with --ntasks-per-node to schedule a number of processes on each of those nodes.
#!/bin/bash
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=40
#SBATCH --account=hpc_training
#SBATCH -p parallel
#SBATCH -t 10:00:00
#SBATCH --mail-user=mst3k@virginia.edu
#SBATCH --mail-type=END
module load gcc openmpi
srun ./mpiheatedplate .0000001 plate${SLURM_NTASKS_PER_NODE}
Hybrid MPI plus Threading
Some codes can run with distributed-memory processes, each of which can itself run in threaded mode. For this, request --ntasks-per-node=NT and --cpus-per-task=NC, keeping in mind that the total number of cores requested on each node is then $NT \times NC$. In the example below, 4 tasks per node with 10 cores each requests $4 \times 10 = 40$ cores on each node.
#!/bin/bash
#SBATCH -N 3
#SBATCH --ntasks-per-node=4
#SBATCH -c 10
#SBATCH -p parallel
#SBATCH -A hpc_training
#SBATCH -t 05:00:00
#SBATCH --mail-user=mst3k@virginia.edu
#SBATCH --mail-type=END
module load gcc openmpi
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./heatedplate .0000001 plate${SLURM_NTASKS_PER_NODE}
Job Arrays
Many similar jobs can be submitted simultaneously through job arrays. There are some restrictions:
- It must be a batch job.
- Job arrays should be explicitly named with -J.
- It is generally prudent to separate stdout and stderr with -o and -e.
A job array is submitted with sbatch --array=<range>, where range is two numbers separated by a hyphen.
$ sbatch --array=0-30 myjobs.sh
An increment can also be provided:
$ sbatch --array=1-7:2 myjobs.sh
This will number the tasks 1, 3, 5, 7.
It is also possible to provide a list:
$ sbatch --array=1,3,4,5,7,9 myjobs.sh
Each job will be provided an environment variable SLURM_ARRAY_JOB_ID, and each task will be assigned a SLURM_ARRAY_TASK_ID. The SLURM_ARRAY_JOB_ID is the overall job ID, whereas the SLURM_ARRAY_TASK_ID takes on the values of the numbers in the specified range or list.
Slurm also provides two variables, %A (the global array job ID) and %a (the array task ID), which can be used in the -o and -e options. If they are not used, the different tasks will attempt to write to the same file, which can result in garbled output or file corruption, so please use them if you wish to redirect streams with those options.
To prepare a job array, set up any input files using appropriate names that will correspond to the numbers in your range or list, e.g.
myinput.0.in
myinput.1.in
...
myinput.30.in
You would submit a job for the above files with
$ sbatch --array=0-30 myjobs.sh
In your Slurm script you would use a command such as
python myscript.py myinput.${SLURM_ARRAY_TASK_ID}.in
The script should be prepared to request resources for one instance of your program.
Complete example array job script:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --partition=interactive
#SBATCH -A hpc_training
#SBATCH --time=3:00:00
#SBATCH -o out%A.%a
#SBATCH -e err%A.%a
python myscript.py myinput.${SLURM_ARRAY_TASK_ID}.in
To cancel an entire array, cancel the global ID:
$ scancel 1283839
You can also cancel individual tasks:
$ scancel 1283839_11
Useful Commands
When you submit a job and it does not start, or it fails for an unknown reason, the cause may be constraints on your account, such as running out of storage space or of SUs on your allocation. It is also useful to see how busy the queue is. The following subsections describe how to identify these problems.
Allocations
Sometimes it is useful to check how many SUs are still available on your allocation. The allocations command displays information on your allocations and how many SUs are associated with them:
$ allocations
Account Balance Reserved Available
----------------- --------- --------- ---------
hpc_training 1000000 0 999882
Running allocations -a <allocation_name> provides even more detail, including when the allocation was last renewed and its members, e.g.
$ allocations -a hpc_training
Description StartTime EndTime Allocated Remaining PercentUsed Active
----------- ------------------- ---------- ----------- ---------- ----------- ------
new 2024-05-29 17:33:13 2025-05-29 1000000.000 999881.524 0.01 True
Name Active CommonName EmailAddress DefaultAccount
------ ------ ------------------------------ ------------------- ----------------------
.
.
.
Storage Quota
One way to check your storage utilization is with the hdquota command. It shows how much of your home, scratch, and leased (if applicable) storage is being used. Below is sample output for hdquota:
$ hdquota
Type Location Name Size Used Avail Use%
====================================================================================================
home /home mst3k 50G 16G 35G 32%
Scratch /scratch mst3k 12T 2.0T 11T 17%
This is a useful command to check whether you are running out of storage space or to see where files need to be cleaned up. For more detailed information on disk utilization, you may also use the du command to investigate specific directories.
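For example, to summarize how much space each top-level directory in your scratch area is using, you could run something like the following; the path is a placeholder for your own directory.
$ du -sh /scratch/mst3k/*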
Queue Limits and Usage
To gain information on the different queues, use the qlist command. It shows the list of partitions, their usage, and the SU charge rate. You can use qlimits for information on each queue's limits.
The sinfo command provides more detailed information on the health of each queue and the number of active nodes available. These commands can be useful in diagnosing why a job may not be running, or to better understand queue usage for more efficient job throughput. More information on hardware specifications and queues can be found here on our website.
Need Help
Research Computing is ready to help you learn to use our systems efficiently. You can submit a ticket. For in-person help, please attend one of our weekly office hours sessions.