Using Containers on HPC [Apptainer]

Log on to our HPC cluster

  • SSH client or FastX Web
  • Run hdquota
    • Make sure you have a few GB of free space
  • Run allocations
    • Check if you have hpc_training
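
Both hdquota and allocations are run from a terminal on the login node:

hdquota        # show your storage usage and quota
allocations    # list your Slurm allocations (look for hpc_training)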

Basic Apptainer commands

Pull

To download a container hosted on a registry, use the pull command. Docker images are automatically converted into Apptainer format.

apptainer pull [<SIF>] <URI>

  • <URI> (Uniform Resource Identifier)
    • [library|docker|shub]://[<user>/]<repo>[:<tag>]
    • Default prefix: library (Singularity Library)
    • user: optional; may be empty (e.g. apptainer pull ubuntu)
    • tag: optional; default: latest
  • <SIF> (Singularity Image Format)
    • Optional
    • Renames the downloaded image; default: <repo>_<tag>.sif (see the example below)
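
For instance, to pull a specific tag and choose your own output filename (mylolcow.sif is an arbitrary name):

apptainer pull mylolcow.sif docker://rsdmse/lolcow:latest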

Pull lolcow from Docker Hub

apptainer pull docker://rsdmse/lolcow

Inspect

Examine an image's metadata before running it via the inspect command.

apptainer inspect <SIF>

$ apptainer inspect lolcow_latest.sif 
org.label-schema.build-arch: amd64
org.label-schema.build-date: Monday_8_January_2024_10:21:0_EST
org.label-schema.schema-version: 1.0
org.label-schema.usage.apptainer.version: 1.2.2
org.label-schema.usage.singularity.deffile.bootstrap: docker
org.label-schema.usage.singularity.deffile.from: rsdmse/lolcow

Inspect runscript

This is the default command of the container. (Docker ENTRYPOINT is preserved.)

apptainer inspect --runscript <SIF>

$ apptainer inspect --runscript lolcow_latest.sif 
#!/bin/sh
OCI_ENTRYPOINT='"/bin/sh" "-c" "fortune | cowsay | lolcat"'
...

Run

There are three ways to run a container: run, shell, exec.

run

Execute the default command shown by inspect --runscript.

CPU: apptainer run <SIF> (equivalent to ./<SIF>, since a SIF file is executable)

GPU: apptainer run --nv <SIF> (covered later)

./lolcow_latest.sif

shell

Start an interactive shell inside the container.

apptainer shell <SIF>

$ apptainer shell lolcow_latest.sif
Apptainer>

The change in prompt indicates you are now inside the container.

To exit the container shell, type exit.

exec

Execute custom commands without shelling into the container.

apptainer exec <SIF> <command>

$ apptainer exec lolcow_latest.sif which fortune
/usr/bin/fortune

Bind mount

  • Apptainer bind mounts these host directories at runtime:
    • Personal directories: /home, /scratch
    • Leased storage shared by your research group: /project, /standard
    • Your current working directory
  • To bind mount additional host directories/files, use --bind/-B:
apptainer run|shell|exec -B <host_path>[:<container_path>] <SIF>
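
For example, to expose a hypothetical host directory /opt/data as /data inside the container (both paths are illustrative):

apptainer exec -B /opt/data:/data lolcow_latest.sif ls /data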

Exercises

  1. For each of the three executables fortune, cowsay, lolcat, run which both inside and outside the lolcow container. Which one exists on both the host and the container?
  2. a) Run ls -l on your home directory both inside and outside the container. Verify that you get the same result.
    b) To disable all bind mounting, use run|shell|exec -c. Verify that $HOME is now empty.
  3. View the content of /etc/os-release both inside and outside the container. Are they the same or different? Why?
  4. (Advanced) Let’s see if we can run the host gcc inside the lolcow container. First load the module: module load gcc
    • Verify that the path to gcc (hint: which) is equal to $EBROOTGCC/bin.
    • Verify that $EBROOTGCC/bin is in your PATH.
    • Now shell into the container (hint: -B /apps) and examine the environment variables $EBROOTGCC and $PATH. Are they the same as those on the host? Why (not)?
    • In the container, add $EBROOTGCC/bin to PATH (hint: export). Is it detectable by which? Can you launch gcc? Why (not)?

Container Modules

Apptainer module

The apptainer module serves as a “toolchain” that activates container modules. You must load apptainer before loading any container module.

See what modules are available by default:

module purge
module avail

Check the module version of Apptainer:

module spider apptainer

Load the Apptainer module and check what modules are available:

module load apptainer
module avail

You can now load container modules.

Container modules under apptainer toolchain

The corresponding run command is displayed upon loading a module.

$ module load tensorflow
To execute the default application inside the container, run:
apptainer run --nv $CONTAINERDIR/tensorflow-2.13.0.sif

$ module list
Currently Loaded Modules:
  1) apptainer/1.2.2   2) tensorflow/2.13.0
  • $CONTAINERDIR is an environment variable pointing to the directory where the containers are stored (see the example below).
  • After old container module versions are deprecated, the corresponding containers are moved to $CONTAINERDIR/archive. These are no longer accessible through the module system, but you are welcome to use them if necessary.
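
After loading apptainer and a container module, you can check this yourself (the value and contents are site-specific):

echo $CONTAINERDIR
ls $CONTAINERDIR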

Exercise

  1. What happens if you load a container module without loading Apptainer first?
    module purge
    module list
    module load tensorflow
    
  2. Check the versions of tensorflow via module spider tensorflow. How would you load a non-default version?
  3. What is the default command of the tensorflow container? Where was it pulled from?

Container Slurm job (TensorFlow on GPU)

  • Computationally intensive tasks must be performed on compute nodes.
  • Slurm is the resource manager and job scheduler on our cluster.
  • Prepare a Slurm script to submit a job.

Copy these files:

cp /share/resources/tutorials/apptainer_ws/tensorflow-2.13.0.slurm .
cp /share/resources/tutorials/apptainer_ws/mnist_example.{ipynb,py} .

Examine the Slurm script:

#!/bin/bash
#SBATCH -A hpc_training      # account name
#SBATCH -p gpu               # partition/queue
#SBATCH --gres=gpu:1         # request 1 gpu
#SBATCH -c 1                 # request 1 cpu core
#SBATCH -t 00:05:00          # time limit: 5 min
#SBATCH -J tftest            # job name
#SBATCH -o tftest-%A.out     # output file
#SBATCH -e tftest-%A.err     # error file

VERSION=2.13.0
# start with clean environment
module purge
module load apptainer tensorflow/$VERSION

apptainer run --nv $CONTAINERDIR/tensorflow-$VERSION.sif mnist_example.py

Submit job:

sbatch tensorflow-2.13.0.slurm
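
Monitor the job in the queue and, once it completes, view the output file (%A in the file name is replaced by the Slurm job ID):

squeue -u $USER    # check job status
cat tftest-*.out   # view output after the job finishes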

What does --nv do?

See the Apptainer GPU user guide.

$ apptainer shell $CONTAINERDIR/tensorflow-2.13.0.sif
Apptainer> ls /.singularity.d/libs

$ apptainer shell --nv $CONTAINERDIR/tensorflow-2.13.0.sif
Apptainer> ls /.singularity.d/libs
libEGL.so		  libGLX.so.0		       libnvidia-cfg.so			  libnvidia-ifr.so
libEGL.so.1		  libGLX_nvidia.so.0	       libnvidia-cfg.so.1		  libnvidia-ifr.so.1
...

Custom Jupyter Kernel

“Can I use my own container on JupyterLab?”

Suppose you need to use TensorFlow 2.17.0 on JupyterLab. First, note that we do not have tensorflow/2.17.0 as a module:

module spider tensorflow

Go to TensorFlow’s Docker Hub page and search for the tag (i.e. version). You’ll want one with the -gpu-jupyter suffix. Pull the container into your account, as shown below.
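
For example, assuming the 2.17.0-gpu-jupyter tag is available on Docker Hub:

apptainer pull docker://tensorflow/tensorflow:2.17.0-gpu-jupyter

By the default naming rule, this produces tensorflow_2.17.0-gpu-jupyter.sif in the current directory.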

Installation

Manual

  1. Create the kernel directory
DIR=~/.local/share/jupyter/kernels/tensorflow-2.17.0
mkdir -p $DIR
cd $DIR
  2. Write kernel.json
{
 "argv": [
  "/home/<user>/.local/share/jupyter/kernels/tensorflow-2.17.0/init.sh",
  "-f",
  "{connection_file}"
 ],
 "display_name": "Tensorflow 2.17",
 "language": "python"
}
  3. Write init.sh
#!/bin/bash
# Launch the containerized Python kernel; Jupyter passes the connection file via "$@"
module load apptainer
apptainer exec --nv /path/to/sif python -m ipykernel "$@"
  4. Make init.sh executable
chmod +x init.sh
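
The kernel directory should now contain both files:

$ ls ~/.local/share/jupyter/kernels/tensorflow-2.17.0
init.sh  kernel.json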

Easy to automate!

JKRollout

This tool automates the steps above. It is currently limited to Python, and the container must have the ipykernel Python package.

Usage: jkrollout sif display_name [gpu]
    sif          = file name of *.sif
    display_name = name of Jupyter kernel
    gpu          = enable gpu (default: false)

jkrollout /path/to/sif "Tensorflow 2.17" gpu

Test your new kernel

  • Go to https://ood.hpc.virginia.edu
  • Select JupyterLab
    • Partition: GPU
    • Work Directory: (location of your mnist_example.ipynb)
    • Allocation: hpc_training
  • Select the new “TensorFlow 2.17” kernel
  • Run mnist_example.ipynb

Remove a custom kernel

rm -rf ~/.local/share/jupyter/kernels/tensorflow-2.17.0
