Using Containers on HPC [Apptainer]
Log on to our HPC cluster
- SSH client or FastX Web
- Run hdquota
  - Make sure you have a few GBs of free space
- Run allocations
  - Check if you have hpc_training
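A minimal sketch of these checks from a terminal (the login hostname below is a placeholder; use your site's login node or the FastX Web interface):

$ ssh <user>@<cluster-login-host>
$ hdquota        # confirm you have a few GBs of free space
$ allocations    # confirm hpc_training appears in the list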
Basic Apptainer commands
Pull
To download a container hosted on a registry, use the pull command. Docker images are automatically converted into Apptainer format.
apptainer pull [<SIF>] <URI>
<URI> (Uniform Resource Identifier)
[library|docker|shub]://[<user>/]<repo>[:<tag>]
- Default prefix: library (Singularity Library)
- user: optional; may be empty (e.g. apptainer pull ubuntu)
- tag: optional; default: latest
<SIF> (Singularity Image Format)
- Optional
- Rename image; default: <repo>_<tag>.sif
Pull lolcow from Docker Hub
apptainer pull docker://rsdmse/lolcow
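You can also choose the output file name and pin a tag explicitly; for example (the explicit :latest tag here is just illustrative, since it is the default):

$ apptainer pull lolcow.sif docker://rsdmse/lolcow:latest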
Inspect
Inspect an image before running it via inspect.
apptainer inspect <SIF>
$ apptainer inspect lolcow_latest.sif
org.label-schema.build-arch: amd64
org.label-schema.build-date: Monday_8_January_2024_10:21:0_EST
org.label-schema.schema-version: 1.0
org.label-schema.usage.apptainer.version: 1.2.2
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: rsdmse/lolcow
Inspect runscript
This is the default command of the container. (Docker ENTRYPOINT is preserved.)
apptainer inspect --runscript <SIF>
$ apptainer inspect --runscript lolcow_latest.sif
#!/bin/sh
OCI_ENTRYPOINT='"/bin/sh" "-c" "fortune | cowsay | lolcat"'
...
Run
There are three ways to run a container: run, shell, exec.
run
Execute the default command shown by inspect --runscript.
- CPU: apptainer run <SIF> (equivalent to ./<SIF>)
- GPU: apptainer run --nv <SIF> (covered later)
./lolcow_latest.sif
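Both forms below execute the same runscript, i.e. the fortune | cowsay | lolcat pipeline shown by inspect --runscript above:

$ apptainer run lolcow_latest.sif
$ ./lolcow_latest.sif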
shell
Start an Apptainer container interactively in its shell.
apptainer shell <SIF>
$ apptainer shell lolcow_latest.sif
Apptainer>
The change in prompt indicates you are now inside the container.
To exit the container shell, type exit.
exec
Execute custom commands without shelling into the container.
apptainer exec <SIF> <command>
$ apptainer exec lolcow_latest.sif which fortune
/usr/bin/fortune
Bind mount
- Apptainer bind mounts these host directories at runtime:
  - Personal directories: /home, /scratch
  - Leased storage shared by your research group: /project, /standard
  - Your current working directory
- To bind mount additional host directories/files, use --bind/-B:
apptainer run|shell|exec -B <host_path>[:<container_path>] <SIF>
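For example, to expose a host directory inside the container under a different path (both paths here are placeholders):

$ apptainer exec -B /scratch/$USER/mydata:/data lolcow_latest.sif ls /data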
Exercises
- For each of the three executables fortune, cowsay, lolcat, run which both inside and outside the lolcow container. Which one exists on both the host and the container?
- a) Run ls -l for your home directory both inside and outside the container. Verify that you get the same result.
  b) To disable all bind mounting, use run|shell|exec -c. Verify that $HOME is now empty.
- View the content of /etc/os-release both inside and outside the container. Are they the same or different? Why?
- (Advanced) Let’s see if we can run the host gcc inside the lolcow container. First load the module: module load gcc (see the sketch after this list)
  - Verify that the path to gcc (hint: which) is equal to $EBROOTGCC/bin.
  - Verify that $EBROOTGCC/bin is in your PATH.
  - Now shell into the container (hint: -B /apps) and examine the environment variables $EBROOTGCC and $PATH. Are they the same as those on the host? Why (not)?
  - In the container, add $EBROOTGCC/bin to PATH (hint: export). Is it detectable by which? Can you launch gcc? Why (not)?
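A starting-point sketch for the advanced exercise, using the module and paths from the hints above (adapt as needed):

$ module load gcc
$ which gcc                       # compare with $EBROOTGCC/bin
$ echo $EBROOTGCC
$ apptainer shell -B /apps lolcow_latest.sif
Apptainer> echo $EBROOTGCC $PATH  # compare with the host values
Apptainer> export PATH=$EBROOTGCC/bin:$PATH
Apptainer> which gcc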
Container Modules
Apptainer module
The apptainer module serves as a “toolchain” that will activate container modules. You must load apptainer before loading container modules.
See what modules are available by default:
module purge
module avail
Check the module version of Apptainer:
module spider apptainer
Load the Apptainer module and check what modules are available:
module load apptainer
module avail
You can now load container modules.
Container modules under apptainer toolchain
The corresponding run command is displayed upon loading a module.
$ module load tensorflow
To execute the default application inside the container, run:
apptainer run --nv $CONTAINERDIR/tensorflow-2.13.0.sif
$ module list
Currently Loaded Modules:
1) apptainer/1.2.2 2) tensorflow/2.13.0
- $CONTAINERDIR is an environment variable. It is the directory where containers are stored.
- After old container module versions are deprecated, the corresponding containers are placed in $CONTAINERDIR/archive. These are inaccessible through the module system, but you are welcome to use them if necessary.
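As a quick check, you can list the images that the container modules point to (the exact contents vary by site and software stack):

$ module load apptainer
$ ls $CONTAINERDIR
$ ls $CONTAINERDIR/archive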
Exercise
- What happens if you load a container module without loading Apptainer first?
  module purge
  module list
  module load tensorflow
- Check the versions of tensorflow via module spider tensorflow. How would you load a non-default version?
- What is the default command of the tensorflow container? Where was it pulled from?
Container Slurm job (TensorFlow on GPU)
- Computationally intensive tasks must be performed on compute nodes.
- Slurm is a resource manager.
- Prepare a Slurm script to submit a job.
Copy these files:
cp /share/resources/tutorials/apptainer_ws/tensorflow-2.13.0.slurm .
cp /share/resources/tutorials/apptainer_ws/mnist_example.{ipynb,py} .
Examine Slurm script:
#!/bin/bash
#SBATCH -A hpc_training # account name
#SBATCH -p gpu # partition/queue
#SBATCH --gres=gpu:1 # request 1 gpu
#SBATCH -c 1 # request 1 cpu core
#SBATCH -t 00:05:00 # time limit: 5 min
#SBATCH -J tftest # job name
#SBATCH -o tftest-%A.out # output file
#SBATCH -e tftest-%A.err # error file
VERSION=2.13.0
# start with clean environment
module purge
module load apptainer tensorflow/$VERSION
apptainer run --nv $CONTAINERDIR/tensorflow-$VERSION.sif mnist_example.py
Submit job:
sbatch tensorflow-2.13.0.slurm
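After submitting, you can monitor the job and inspect its output; the job ID in the file name is assigned by Slurm:

$ squeue -u $USER         # is the job pending or running?
$ cat tftest-<jobid>.out  # view the output once the job has finished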
What does --nv do?
$ apptainer shell $CONTAINERDIR/tensorflow-2.13.0.sif
Apptainer> ls /.singularity.d/libs
$ apptainer shell --nv $CONTAINERDIR/tensorflow-2.13.0.sif
Apptainer> ls /.singularity.d/libs
libEGL.so libGLX.so.0 libnvidia-cfg.so libnvidia-ifr.so
libEGL.so.1 libGLX_nvidia.so.0 libnvidia-cfg.so.1 libnvidia-ifr.so.1
...
With --nv, the host’s NVIDIA driver libraries are bound into the container (under /.singularity.d/libs) so that GPU applications inside the container can use the GPUs.
Custom Jupyter Kernel
“Can I use my own container on JupyterLab?”
Suppose you need to use TensorFlow 2.17.0 on JupyterLab. First, note we do not have tensorflow/2.17.0 as a module:
module spider tensorflow
Go to TensorFlow’s Docker Hub page and search for the tag (i.e. version). You’ll want to use one that has the -gpu-jupyter suffix. Pull the container in your account.
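For example, assuming the 2.17.0-gpu-jupyter tag exists on Docker Hub (verify the exact tag on the page):

$ apptainer pull docker://tensorflow/tensorflow:2.17.0-gpu-jupyter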
Installation
Manual
- Create kernel directory
DIR=~/.local/share/jupyter/kernels/tensorflow-2.17.0
mkdir -p $DIR
cd $DIR
- Write kernel.json
{
"argv": [
"/home/<user>/.local/share/jupyter/kernels/tensorflow-2.17.0/init.sh",
"-f",
"{connection_file}"
],
"display_name": "Tensorflow 2.17",
"language": "python"
}
- Write init.sh
#!/bin/bash
module load apptainer
apptainer exec --nv /path/to/sif python -m ipykernel "$@"
- Make init.sh executable
chmod +x init.sh
Easy to automate!
JKRollout
This tool is currently limited to Python. The container must have the ipykernel Python package.
Usage: jkrollout sif display_name [gpu]
sif = file name of *.sif
display_name = name of Jupyter kernel
gpu = enable gpu (default: false)
jkrollout /path/to/sif "Tensorflow 2.17" gpu
Test your new kernel
- Go to https://ood.hpc.virginia.edu
- Select JupyterLab
- Partition: GPU
- Work Directory: (location of your mnist_example.ipynb)
- Allocation: hpc_training
- Select the new “TensorFlow 2.17” kernel
- Run mnist_example.ipynb
Remove a custom kernel
rm -rf ~/.local/share/jupyter/kernels/tensorflow-2.17.0