III - Jobs on the Cluster

Running Jobs from Scratch

We recommend that you run your jobs out of your /scratch directory.

  • Your personal /scratch/mst3k folder has much more storage space than your home directory.
  • /scratch is on a Weka filesystem, a storage system designed specifically for fast access.
  • /scratch is connected to the compute nodes with InfiniBand, a very fast network connection.

The scratch system is not permanent storage, and files older than 90 days will be marked for deletion (purging). You should keep copies of your programs and data in more permanent locations such as your home directory, leased storage such as /project or /standard, or your lab workstation. After your jobs finish, copy the results to permanent storage.
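For example, after a job completes you might copy its output back to your home directory or to leased storage. The folder names below are placeholders; substitute your own:

cp -r /scratch/$USER/myjob_output ~/
cp -r /scratch/$USER/myjob_output /project/mygroup/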

Exercise 3

Move or copy the hello.slurm script and the hello.py script to the new folder you created in your scratch directory in Exercise 2. Submit hello.slurm.
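One possible sequence of commands, assuming the folder you created in Exercise 2 is called workshop (use whatever name you actually chose) and that the scripts are currently in your home directory:

cp ~/hello.slurm ~/hello.py /scratch/$USER/workshop
cd /scratch/$USER/workshop
sbatch hello.slurm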

Submitting a Job

Once we have navigated to the desired working directory in a terminal window, we use the sbatch command to submit the job. This assumes that your Slurm script is located in the current working directory.

sbatch myjob.slurm

The system returns a JOBID.

We do not make the script executable. The system handles that.

$sbatch myjob.slurm
Submitted batch job 36805

Always remember that you submit your job script and not your executable or interpreter script.

Monitoring a Job

Once a job is submitted, we can monitor it.

Graphical Interface

The Open OnDemand Job Viewer (Jobs tab→Active Jobs) shows a Web-based view of jobs. You can switch the dropdown between “All Jobs” and “Your Jobs.” You can also use the Filter textbox to select jobs by partition or another criterion. If you enter multiple strings in the Filter textbox, they are combined as a logical “and.”

Clicking the right-pointing arrow at the left side of a row expands a panel that shows the job status (Pending, Running, Completed) along with other useful information.

Remember that this is a Web page and you will need to reload it in order to see changes in status.

Command Line

We use the squeue command to check on jobs from the terminal.

$squeue

This shows all jobs. To narrow that down we can use the -u (user) option or the -p (partition) option.

$squeue -u mst3k
$squeue -p gpu
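The options can be combined to narrow the list further, for example to show only your own jobs in the gpu partition:

$squeue -u mst3k -p gpu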

Job status is indicated in the ST column:

  • PD pending
  • R running
  • CG exiting

Example output:

 JOBID  PARTITION      NAME   USER  ST  TIME  NODES  NODELIST(REASON)
 36805   standard  myjob.sl  mst3k   R  1:45      1  udc-aw38-34-l

Jobs should rarely be observed in the CG state. If a job is stuck in that state, it cannot be canceled by the user. You will not be charged for the time a job spends exiting.

For more information on a running job, similar to what you can see from the OOD Job Viewer, use the scontrol command.

scontrol show job <jobid>
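The output of scontrol is lengthy. If you are looking for one particular field, you can filter it with grep; for example, to check the time limit of the job submitted earlier:

scontrol show job 36805 | grep -i timelimit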

Deleting a Job

Open OnDemand

From the Job Viewer find your jobs. If the job is pending or running, a red trash-can icon will appear under the “Actions” header. Click the icon. A dialog box will appear asking you to confirm the cancellation.

Command Line

To cancel a job, use the scancel command with the job ID. You can use squeue -u $USER to list your job IDs, but you must know the ID of the specific job you wish to cancel.

$scancel 36805 #jobID
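If you need to cancel all of your jobs at once, scancel also accepts the -u option. Use it with care, since it cancels every job you have pending or running:

$scancel -u $USER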

Be aware that if a job fails due to a system failure, the time will not be charged; but if you cancel your job, or it fails due to an inadequate resource request, your allocation will be charged for the time expended.

Exercise 4

Write a Slurm script that requests 30 minutes of time. Submit a job that will run for at least 30 minutes. It can be some software you use; if you do not have anything set up yet, write the preamble and then add the line

sleep 30m

as the command. You won’t need to request a specific amount of memory. Submit this script and monitor your job’s status. Once it starts, let it run for a few minutes, then cancel it.

Example script

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
#SBATCH --partition=standard
#SBATCH --account=myalloc

module purge

sleep 30m
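One way to work through the exercise, assuming you saved the script above as sleep30.slurm (the name is arbitrary):

$sbatch sleep30.slurm
$squeue -u $USER
$scancel <jobid>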


Examining Your Utilization

When your jobs have finished, you may wish to find out how much of the requested resources you actually used. Two commands can be used for this purpose, sacct and seff.

sacct

As the name suggests, sacct returns accounting information about your job. It is built into Slurm and does not know about local policies such as SU charges, but it will show you information about the job itself. It only works for jobs that have ended.

With no options it will show output for jobs run on the current date.

JobID           JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
56220974      mpi.slurm   parallel  hpc_build         10     FAILED      9:0 
56220974.ba+      batch             hpc_build          5     FAILED      9:0 
56220974.0   mpiheated+             hpc_build         10     FAILED      1:0 
56220992      mpi.slurm   standard  hpc_build         10  COMPLETED      0:0 
56220992.ba+      batch             hpc_build         10  COMPLETED      0:0 
56220992.0   mpiheated+             hpc_build         10  COMPLETED      0:0 
56221184      mpi.slurm   standard  hpc_build         10  COMPLETED      0:0 
56221184.ba+      batch             hpc_build         10  COMPLETED      0:0 
56221184.0   mpiheated+             hpc_build         10  COMPLETED      0:0 
56221192      mpi.slurm   standard  hpc_build         10  COMPLETED      0:0 
56221192.ba+      batch             hpc_build         10  COMPLETED      0:0 
56221192.0   mpiheated+             hpc_build         10  COMPLETED      0:0 
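To see jobs that ran on earlier dates, add a start time with the -S option (the date below is only an example):

$sacct -S 2025-01-01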

For a particular job, use the -j option.

$sacct -j 56221192
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
56221192      mpi.slurm   standard  hpc_build         10  COMPLETED      0:0 
56221192.ba+      batch             hpc_build         10  COMPLETED      0:0 
56221192.0   mpiheated+             hpc_build         10  COMPLETED      0:0 

For more detail, specify the -o option with a list of fields. The list of available fields is returned by sacct -e and is lengthy. For example, if you use only one allocation you may not be interested in the Account field and can omit it.

$sacct -o jobname,jobid,ncpus,nnodes,maxrss,state,elapsed -j 56221192
   JobName JobID             NCPUS   NNodes     MaxRSS      State    Elapsed 
---------- ------------ ---------- -------- ---------- ---------- ---------- 
 mpi.slurm 56221192             10        1             COMPLETED   00:00:34 
     batch 56221192.ba+         10        1      4824K  COMPLETED   00:00:34 
mpiheated+ 56221192.0           10        1    108800K  COMPLETED   00:00:33 

The output from sacct can be heavily customized. For more information see the documentation.
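One convenience worth knowing: sacct honors the SACCT_FORMAT environment variable, so you can set your preferred field list once per session instead of repeating -o with every command. A sketch, using the same fields as above:

export SACCT_FORMAT=jobname,jobid,ncpus,nnodes,maxrss,state,elapsed
sacct -j 56221192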

Running sacct puts a load on the system and can be very slow, so please use it judiciously.

seff

The seff command returns information about the utilization (called the “efficiency”) of cores and memory.

$seff 56221192
Job ID: 56221192
Cluster: shen
User/Group: mst3k/users
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 10
CPU Utilized: 00:05:17
CPU Efficiency: 93.24% of 00:05:40 core-walltime
Job Wall-clock time: 00:00:34
Memory Utilized: 1.04 GB (estimated maximum)
Memory Efficiency: 1.18% of 87.89 GB (8.79 GB/core)

Under most circumstances, for a CPU-only job the “CPU” (core) efficiency should be around 90% or better. Please contact us if it is significantly lower than that. Note that seff may be relatively inaccurate for very short jobs.

Core efficiency is a less useful indicator for GPU jobs, since the key to efficient GPU utilization is maximizing the computation done on the GPU while minimizing CPU work. seff does not provide a GPU utilization metric at this time, but we may be able to help you if you are concerned about GPU utilization.

If your memory utilization is low and you requested a specific amount of memory, use sacct -o with at least the MaxRSS field to double-check. If you do not need as much memory as you thought, you may be able to save SUs and have a shorter queue wait time if you decrease your request.
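For example, to compare the memory you requested (ReqMem) with the peak memory the job actually used (MaxRSS):

$sacct -o jobid,reqmem,maxrss,elapsed,state -j <jobid>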

Stream Output in Slurm

When running a program interactively, any output to the Unix standard streams will be printed directly to the user’s console window. However, programs running under the control of Slurm will not have a console attached.

By default, Slurm redirects both standard output and standard error to a file called slurm-<jobid>.out.

You can change the name of this file with the -o or --output option in your script.

#SBATCH --output=<filename>

or

#SBATCH -o <filename>

You can also send standard-error output to a separate file. Even if your program does not write to standard error, Slurm does, so you may wish to keep that output distinct.

#SBATCH --error=<filename>

or

#SBATCH -e <filename>
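Slurm also supports replacement patterns in these filenames: %j expands to the job ID and %x to the job name, which keeps output from different runs from overwriting one another. For example:

#SBATCH -o myjob-%j.out
#SBATCH -e myjob-%j.err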

If your program reads from standard input, you must redirect its input from a file on the command line in your script.

./myexec < myinput.txt

As an alternative to the Slurm options, you can also redirect standard output in the usual Unix manner if you prefer.

./myexec < myinput.txt > myoutput.dat