III - Jobs on the Cluster
Running Jobs from Scratch
We recommend that you run your jobs out of your /scratch directory.
- Your personal /scratch/mst3k folder has much more storage space than your home directory.
- /scratch is on a Weka filesystem, a storage system designed specifically for fast access.
- /scratch is connected to the compute nodes with InfiniBand, a very fast network connection.
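If you are working in a terminal, you can change to your scratch directory before submitting; the $USER variable expands to your own user ID (mst3k in the example above).
$cd /scratch/$USER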
Submitting a Job
Once we have navigated to the desired working directory in a terminal window, we use the sbatch command to submit the job. This assumes that your Slurm script is located in the current working directory.
sbatch myjob.slurm
The system returns a JOBID.
We do not make the script executable. The system handles that.
$sbatch myjob.slurm
Submitted batch job 36805
Always remember that you submit your job script and not your executable or interpreter script.
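If you want to use the job ID in a shell script of your own, sbatch's --parsable option prints just the ID instead of the full "Submitted batch job" message; a minimal sketch (the variable name jobid is arbitrary):
jobid=$(sbatch --parsable myjob.slurm)
echo $jobid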
Exercise
From your working directory where hello.slurm is, submit the job.
Monitoring a Job
Once submitted, we can monitor our jobs.
Graphical Interface
The Open OnDemand Job Viewer (Jobs tab→Active Jobs) shows a Web-based view of jobs. You can switch the dropdown between “All Jobs” and “Your Jobs.” You can also use the Filter textbox to select jobs by partition or another criterion. Entering multiple strings in the Filter textbox combines them as a logical AND.
Clicking the right-pointing arrow on the left side will cause a dropdown box to appear that will show the job status (Pending, Running, Completed) along with much other useful information.
Remember that this is a Web page and you will need to reload it in order to see changes in status.
Command Line
We use the squeue command to check on jobs from the terminal.
$squeue
This shows all jobs. To narrow that down we can use the -u (user) option or the -p (partition) option.
$squeue -u mst3k
$squeue -p gpu
Job status is indicated by:
- PD pending
- R running
- CG completing (exiting)
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
36805 standard myjob.sl mst3k R 1:45 1 udc-aw38-34-l
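You can also narrow the list by state with the -t option, using the codes above; for example, to see only your pending jobs:
$squeue -u mst3k -t PD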
Jobs should rarely be observed in the CG (completing) state. If they are caught in that state they cannot be canceled by the user. You will not be charged for the time a job spends in that state.
For more information on a running job, similar to what you can see from the OOD Job Viewer, use the scontrol command.
scontrol show job <jobid>
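The output of scontrol show job is lengthy, so you may find it convenient to filter for the fields you care about, such as JobState and RunTime; one possible way:
scontrol show job <jobid> | grep -E 'JobState|RunTime'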
Deleting a Job
Open OnDemand
From the Job Viewer find your jobs. If the job is pending or running, a red trash-can icon will appear under the “Actions” header. Click the icon. A dialog box will appear asking you to confirm the cancellation.
Command Line
To cancel a job, use the scancel command with the job ID. You can use squeue -u $USER to obtain your job IDs, but you must know the ID of the specific job you wish to cancel.
$scancel 36805 #jobID
Be aware that if a job fails due to a system failure the time will not be charged, but if you cancel your job, or it fails due to an inadequate resource request, your allocation will be charged for the time expended.
Exercise 4
Write a Slurm script that requests 30 minutes of time. Submit a job that will run for at least 30 minutes. It can be some software you use; if you do not have anything set up yet, write the preamble and then add the line
sleep 30m
as the command. You won’t need to request a specific amount of memory. Submit this script and monitor your job’s status in the queue with squeue or the Active Jobs tab. Once it starts, get information about your job with scontrol, let it run for a minute, then cancel it with scancel. Practice with the terminal commands or the OOD GUI. Note that you will need your job’s ID for the last two commands.
Example script
#!/bin/bash
#SBATCH --nodes=1                # one node
#SBATCH --ntasks=1               # one task
#SBATCH --time=00:30:00          # 30-minute time limit
#SBATCH --partition=interactive
#SBATCH --account=hpc_training
module purge                     # start with a clean module environment
sleep 30m                        # stands in for a real workload
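One possible terminal sequence for this exercise, assuming the script above is saved as sleep.slurm (the name is arbitrary); replace <jobid> with the job ID reported by sbatch:
$sbatch sleep.slurm
$squeue -u $USER
$scontrol show job <jobid>
$scancel <jobid>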
Examining Your Utilization
When your jobs have finished, you may wish to find out how much of the requested resources you actually used. Two commands can be used for this purpose: sacct and seff.
sacct
As the name suggests, sacct will return accounting information about your job. It is built into Slurm and does not know about local policies such as SU charges, but it will show you information about the job. It only works for jobs that have ended.
With no options it will show output for jobs run on the current date.
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
56220974      mpi.slurm   parallel  hpc_build         10     FAILED      9:0
56220974.ba+      batch             hpc_build          5     FAILED      9:0
56220974.0   mpiheated+             hpc_build         10     FAILED      1:0
56220992      mpi.slurm   standard  hpc_build         10  COMPLETED      0:0
56220992.ba+      batch             hpc_build         10  COMPLETED      0:0
56220992.0   mpiheated+             hpc_build         10  COMPLETED      0:0
56221184      mpi.slurm   standard  hpc_build         10  COMPLETED      0:0
56221184.ba+      batch             hpc_build         10  COMPLETED      0:0
56221184.0   mpiheated+             hpc_build         10  COMPLETED      0:0
56221192      mpi.slurm   standard  hpc_build         10  COMPLETED      0:0
56221192.ba+      batch             hpc_build         10  COMPLETED      0:0
56221192.0   mpiheated+             hpc_build         10  COMPLETED      0:0
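To look at jobs from a day other than today, supply a start time with -S and, optionally, an end time with -E; the dates here are only an example:
$sacct -S 2024-07-01 -E 2024-07-31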
For a particular job, use the -j option.
$sacct -j 56221192
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
56221192      mpi.slurm   standard  hpc_build         10  COMPLETED      0:0
56221192.ba+      batch             hpc_build         10  COMPLETED      0:0
56221192.0   mpiheated+             hpc_build         10  COMPLETED      0:0
For more detail, specify the -o option with a list of fields. The list of available fields is returned by sacct -e and is lengthy. For example, if I use only one allocation I may not be interested in the Account field.
$sacct -o jobname,jobid,ncpus,nnodes,maxrss,state,elapsed -j 56221192
   JobName JobID             NCPUS   NNodes     MaxRSS      State    Elapsed
---------- ------------ ---------- -------- ---------- ---------- ----------
 mpi.slurm 56221192            10        1             COMPLETED   00:00:34
     batch 56221192.ba+         10        1      4824K  COMPLETED   00:00:34
mpiheated+ 56221192.0           10        1    108800K  COMPLETED   00:00:33
The output from sacct can be heavily customized. For more information see the sacct documentation.
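If you find yourself requesting the same fields repeatedly, you can set them as the default through the SACCT_FORMAT environment variable; the field list shown is just an example:
$export SACCT_FORMAT=jobid,jobname,ncpus,maxrss,state,elapsed
$sacct -j 56221192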
Running sacct puts a load on the system and can be very slow, so please use it judiciously.
seff
The seff command returns information about the utilization (called the “efficiency”) of core and memory.
$seff 56221192
Job ID: 56221192
Cluster: shen
User/Group: mst3k/users
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 10
CPU Utilized: 00:05:17
CPU Efficiency: 93.24% of 00:05:40 core-walltime
Job Wall-clock time: 00:00:34
Memory Utilized: 1.04 GB (estimated maximum)
Memory Efficiency: 1.18% of 87.89 GB (8.79 GB/core)
Under most circumstances, for a CPU-only job the “CPU” (core) efficiency should be around 90% or better. Please contact us if it is significantly lower than that. Note that seff may be relatively inaccurate for very short jobs.
Core efficiency is more problematic for GPU jobs, since the key to efficient GPU utilization is maximizing the GPU computations and minimizing CPU work. The seff command does not provide a GPU utilization metric at this time, but we may be able to help you if you are concerned about GPU utilization.
If your memory utilization is low and you requested a specific amount of memory, use sacct -o with at least the MaxRSS field to double-check. If you do not need as much memory as you thought, decreasing your request may save SUs and shorten your queue wait time.
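One way to make that comparison, assuming the ReqMem (requested memory) and MaxRSS fields are available in your Slurm version:
$sacct -o jobid,reqmem,maxrss,elapsed -j <jobid>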
Stream Output in Slurm
When running a program interactively, any output to the Unix standard streams will be printed directly to the user’s console window. However, programs running under the control of Slurm will not have a console attached.
By default, Slurm redirects both standard output and standard error to a file called slurm-<jobid>.out.
You can change the name of this file with the -o or --output option in your script.
#SBATCH --output=<filename>
or
#SBATCH -o <filename>
You can also separate standard-error output. Even if your program does not use standard error (not many do), Slurm uses it, so you may wish to keep that output distinct.
#SBATCH --error=<filename>
or
#SBATCH -e <filename>
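Both options accept Slurm filename patterns; %j, for example, is replaced by the job ID, which keeps output from different runs in separate files (the base name myjob here is arbitrary).
#SBATCH --output=myjob-%j.out
#SBATCH --error=myjob-%j.err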
If your program reads from standard input, you must redirect its input from a file on the command line in your script.
./myexec < myinput.txt
As an alternative to the Slurm options, you can also redirect standard output in the usual Unix manner if you prefer.
./myexec < myinput.txt > myoutput.dat