GPU-Enabled Applications on Rivanna

Oct 19, 2022

In this workshop, participants are introduced to the GPU computing resources on Rivanna.

Introduction to GPU

The graphics processing unit (GPU) was invented specifically for graphics rendering. Nowadays GPUs are also used as accelerators for parallel computing; you may also hear the term “general-purpose GPU” (GPGPU).

| Property | CPU | GPU |
|---|---|---|
| Number of cores | $10^0$ to $10^1$ | $10^3$ to $10^4$ |
| Throughput | Low | High |
| Per-core performance | High | Low |
| Workload type | Generic | Specific (e.g. rendering, deep learning) |
| Memory on Rivanna | up to 1.5 TB per node | up to 80 GB per device |

Integrated vs discrete GPU

Integrated GPUs are mostly for graphics rendering and light gaming. They are built into the CPU package or the motherboard, which makes for more compact systems.

Discrete (or dedicated) GPUs are designed for resource-intensive computations.

GPU vendors and types

NVIDIA, AMD, Intel

Datacenter: H100, A100, V100, P100, K80
Workstation: A6000, Quadro
Gaming: GeForce RTX 40xx, 30xx, 20xx

(Models available on Rivanna include the A100, V100, P100, and K80.)

Myths

Myth: GPUs are better than CPUs and will eventually replace them.
Fact: CPUs and GPUs complement each other. GPUs will not replace CPUs.

Myth: If I run my CPU code on a GPU, it’ll be way faster.
Fact: This depends on whether your code can run on a GPU at all. Even if it can, there will be no acceleration unless the computation is resource-intensive enough. In fact, your code may even be slower on a GPU.

Myth: Running a GPU program on two GPU devices will be twice as fast as running it on one.
Fact: Again, this depends on whether your program can run on multiple GPU devices and on how intensive the computation is.

Myth: GPU acceleration only applies to data science and machine/deep learning.
Fact: Many scientific codes make use of GPU acceleration: VASP, Quantum ESPRESSO, GROMACS, … See here for a list compiled in 2018.
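One way to see why a second GPU rarely doubles the speed is Amdahl's law. It is not part of the workshop material and is used here only as a simplified model. The sketch below assumes that a fraction of the runtime can be spread perfectly across devices, and it ignores data-transfer and communication costs, which in practice reduce the speedup further. The function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(parallel_fraction: float, n_devices: int) -> float:
    """Ideal speedup under Amdahl's law (no transfer or communication cost).

    parallel_fraction: share of the single-device runtime that can be
    distributed across devices (an assumed input, not a measured value).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_devices)


if __name__ == "__main__":
    # Compare the ideal gain from two devices for a few parallel fractions.
    for p in (0.5, 0.9, 0.99):
        print(f"parallel fraction {p:.2f}: 2 devices give {amdahl_speedup(p, 2):.2f}x")
```

Under this model, a program that is 90% parallelizable gains about 1.8x from two devices rather than 2x, and one that is only 50% parallelizable gains about 1.3x.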

GPUs on Rivanna

Go to this page. GPUs are indicated by “GPU” under the specialty hardware column.

Command to check the current status of GPU nodes:

$ qlist -p gpu

STATE    NODE           CPUS(A/I/O/T) TOTALMEM(MB)  ALLOCMEM(MB)  AVAILMEM(MB)  GRES(M:T:A)               JOBS
==============================================================================================================
mix      udc-an28-1     8/120/0/128   1000000       40960         959040        gpu:a100:8(S:0-7):1         1
mix      udc-an28-7     28/100/0/128  1000000       680960        319040        gpu:a100:8(S:0-7):6         6
mix      udc-an33-37    12/24/0/36    384000        384000        0             gpu:v100:4(S:0-1):3         3
...

Important things to note:

CPU memory is not GPU memory; the memory columns above report node (CPU) memory
Each GPU node contains multiple GPU devices
Different GPU types have different specs (GPU memory, CPU cores, etc.)
In descending order of performance: A100, V100, P100, K80
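To read the GRES column programmatically, a small parser can help. The sketch below assumes that `GRES(M:T:A)` stands for model, total, and allocated, as the header and the sample rows suggest; that reading and the helper name `parse_gres` are assumptions, not documented Rivanna behavior.

```python
import re

# Assumed field layout: gpu:<model>:<total>(S:<sockets>):<allocated>
# e.g. "gpu:a100:8(S:0-7):1" -> model "a100", 8 devices in total, 1 allocated.
GRES_PATTERN = re.compile(
    r"^gpu:(?P<model>[^:]+):(?P<total>\d+)(?:\([^)]*\))?:(?P<allocated>\d+)$"
)


def parse_gres(field: str) -> dict:
    """Split one GRES field into model, total, allocated, and free counts."""
    match = GRES_PATTERN.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognized GRES field: {field!r}")
    total = int(match["total"])
    allocated = int(match["allocated"])
    return {
        "model": match["model"],
        "total": total,
        "allocated": allocated,
        "free": total - allocated,
    }


if __name__ == "__main__":
    # The three GRES fields from the sample output above.
    for field in ("gpu:a100:8(S:0-7):1", "gpu:a100:8(S:0-7):6", "gpu:v100:4(S:0-1):3"):
        print(parse_gres(field))
```

Read this way, the first sample node would have seven of its eight A100 devices free, while the V100 node would have one of four free.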

GPU-Enabled Applications on Rivanna

Popular GPU applications on Rivanna at a glance

| `nvhpc` | `gcc`/`goolf` | `nvompic` | `singularity` | Jupyter kernels |
|---|---|---|---|---|
| (User code) | `gromacs` | `quantumespresso` | `pytorch` | PyTorch |
| | `gpunufft` | `berkeleygw` | `tensorflow` | TensorFlow |
| | `mumax3` | `yambo` | `rapidsai` | RAPIDS |
| | | | `amptorch` | AMPTorch |
| | | | `alphafold` | |
| | | | `deeplabcut` | |
| | | | `isaacgym` | |

Modules

The nvhpc module (NVIDIA HPC SDK) provides these libraries and tools:

Compilers (nvc, nvc++, nvfortran)
CUDA
Mathematical libraries: cuBLAS, cuRAND, cuFFT, cuSPARSE, cuTENSOR, cuSOLVER
Communication libraries: NVSHMEM, NCCL
Tools: CUDA-GDB, Nsight Systems

In addition, applications are installed under three toolchains: goolfc and nvompic (compiled languages), and singularity (containers).

`goolfc`

Stands for:

GCC compilers (g)
OpenMPI (o)
OpenBLAS (o)
ScaLAPACK (l)
FFTW (f)
CUDA (c)

$ module spider goolfc

--------------------------------------------------------------------------------
  goolfc: goolfc/9.2.0_3.1.6_11.0.228
--------------------------------------------------------------------------------
    Description:
      GNU Compiler Collection (GCC) based compiler toolchain along with CUDA
      toolkit, including OpenMPI for MPI support with CUDA features enabled,
      OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK with CUDA features
      enabled.


    This module can be loaded directly: module load goolfc/9.2.0_3.1.6_11.0.228

The toolchain version consists of three version numbers joined by `_`, corresponding to the versions of gcc, openmpi, and cuda, respectively.
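As a small illustration of this naming scheme, the sketch below splits a module name into its component versions. The helper `split_toolchain_version` is hypothetical; the caller supplies the component order, which is taken from the description above.

```python
def split_toolchain_version(module_name: str, components: tuple) -> dict:
    """Split '<toolchain>/<v1>_<v2>_<v3>' into named component versions."""
    name, _, version = module_name.partition("/")
    parts = version.split("_")
    if len(parts) != len(components):
        raise ValueError(
            f"expected {len(components)} versions in {module_name!r}, found {len(parts)}"
        )
    # Pair each component name with its version string, in order.
    return {"toolchain": name, **dict(zip(components, parts))}


if __name__ == "__main__":
    info = split_toolchain_version(
        "goolfc/9.2.0_3.1.6_11.0.228", ("gcc", "openmpi", "cuda")
    )
    print(info)
```

For the module shown above, this yields gcc 9.2.0, openmpi 3.1.6, and cuda 11.0.228.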

$ module load goolfc
$ module avail

------- /apps/modulefiles/standard/mpi/gcc-cuda/9.2.0-11.0.228/openmpi/3.1.6 -------
   fftw/3.3.8     (L,D)    hoomd/2.9.6     python/3.8.8    (D)
   gromacs/2021.2          python/3.7.7    scalapack/2.1.0 (L)

----------- /apps/modulefiles/standard/compiler/gcc-cuda/9.2.0-11.0.228 ------------
   gpunufft/2.1.0    mumax3/3.10    nccl/2.7.8    openmpi/3.1.6 (L,D)

Usage instructions

GROMACS

`nvompic`

Stands for:

NVIDIA compilers (nv)
OpenMPI (ompi)
CUDA (c)

$ module spider nvompic

-----------------------------------------------------------------------------------
  nvompic: nvompic/21.9_3.1.6_11.4.2
-----------------------------------------------------------------------------------
    Description:
      NVHPC Compiler including OpenMPI for MPI support.


    This module can be loaded directly: module load nvompic/21.9_3.1.6_11.4.2

The toolchain version consists of three version numbers joined by `_`, corresponding to the versions of nvhpc, openmpi, and cuda, respectively.

$ module load nvompic
$ module avail
------------- /apps/modulefiles/standard/mpi/nvhpc/21.9/openmpi/3.1.6 -------------
   berkeleygw/3.0.1    fftw/3.3.10 (D)    quantumespresso/7.0    yambo/5.0.4
   elpa/2021.05.001    hdf5/1.12.1 (D)    scalapack/2.1.0

----------------- /apps/modulefiles/standard/compiler/nvhpc/21.9 ------------------
   hdf5/1.12.1    openblas/0.3.17 (D)    openmpi/3.1.6 (L,D)

Usage instructions

`singularity`

The popular deep learning frameworks, TensorFlow and PyTorch, are backed by containers. (To learn more about containers, see Using Containers on Rivanna.)

$ module load singularity tensorflow

On JupyterLab, you may conveniently select the kernel of the desired framework and version.

Usage instructions

Jupyter kernels

TensorFlow
PyTorch
RAPIDS
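After selecting a kernel, it can be worth confirming that the framework actually sees a GPU. The following is a minimal sketch rather than an official check: it reports how many GPUs PyTorch and TensorFlow can see, and records `None` for a framework that is not installed in the current environment. The function name `gpu_report` is illustrative.

```python
def gpu_report() -> dict:
    """Return the number of GPUs visible to each framework (None if not installed)."""
    report = {}

    try:
        import torch  # available in a PyTorch kernel or container
        report["pytorch"] = torch.cuda.device_count() if torch.cuda.is_available() else 0
    except ImportError:
        report["pytorch"] = None

    try:
        import tensorflow as tf  # available in a TensorFlow kernel or container
        report["tensorflow"] = len(tf.config.list_physical_devices("GPU"))
    except ImportError:
        report["tensorflow"] = None

    return report


if __name__ == "__main__":
    for framework, count in gpu_report().items():
        status = "not installed" if count is None else f"{count} GPU(s) visible"
        print(f"{framework}: {status}")
```

A count of 0 for an installed framework usually means the session was not started on a GPU node or no GPU was requested.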

Requesting a GPU

Open OnDemand

Select the gpu partition. If you need a specific GPU type, select it from the dropdown menu. The default assigns you the first available GPU.

Slurm script

Your Slurm script must contain these lines:

#SBATCH -p gpu
#SBATCH --gres=gpu

See here for further information.
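For illustration, the Python sketch below writes out a minimal batch script containing the two required directives. The optional type and count use Slurm's `--gres=gpu:<type>:<count>` form. The helper names, the `a100` value, and the `nvidia-smi` payload are assumptions chosen for the example; a real job typically also needs site-specific options, such as an allocation account and a time limit, that are not covered here.

```python
from typing import Optional


def gres_spec(gpu_type: Optional[str] = None, count: int = 1) -> str:
    """Build the value for --gres: 'gpu', 'gpu:<count>', or 'gpu:<type>:<count>'."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if gpu_type is None:
        return "gpu" if count == 1 else f"gpu:{count}"
    return f"gpu:{gpu_type}:{count}"


def render_batch_script(command: str, gpu_type: Optional[str] = None, count: int = 1) -> str:
    """Return the text of a minimal Slurm script that requests GPUs."""
    lines = [
        "#!/bin/bash",
        "#SBATCH -p gpu",                                # GPU partition
        f"#SBATCH --gres={gres_spec(gpu_type, count)}",  # GPU request
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Default request: first available GPU of any type.
    print(render_batch_script("nvidia-smi"))
    # Hypothetical request for two A100 devices.
    print(render_batch_script("nvidia-smi", gpu_type="a100", count=2))
```

With no arguments, the generated directives match the two lines above exactly; naming a type narrows the request to that GPU model.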

Demo (Python & MATLAB)
