GPUs on HPC

GPU         Full Name                    Year Launched   Memory         # of Tensor Cores
A100        NVIDIA A100                  2020            40GB or 80GB   432 (3rd gen)
A6000       NVIDIA RTX A6000             2020            48GB           336 (3rd gen)
A40         NVIDIA A40                   2020            48GB           336 (3rd gen)
RTX3090     NVIDIA GeForce RTX 3090      2020            24GB           328 (3rd gen)
RTX2080Ti   NVIDIA GeForce RTX 2080 Ti   2018            11GB           544 (2nd gen)
V100        NVIDIA V100                  2018            32GB           640 (1st gen)
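
If you are not sure which of these models your job landed on, you can check from inside the job. A minimal sketch using nvidia-smi, which ships with the NVIDIA driver on the GPU nodes:

# List the model and total memory of each GPU visible to the job
nvidia-smi --query-gpu=name,memory.total --format=csv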

UVA-NVIDIA DGX BasePOD

  • 10 DGX A100 nodes
    • 8 NVIDIA A100 GPUs.
    • 80 GB of GPU memory per GPU.
    • Dual AMD EPYC 7742 CPUs per node, 128 total cores, 2.25 GHz (base), 3.4 GHz (max boost).
    • 2 TB of system memory.
    • Two 1.92 TB M.2 NVMe drives for DGX OS, eight 3.84 TB U.2 NVMe drives for storage/cache.
  • Advanced Features:
    • NVLink for fast multi-GPU communication
    • GPUDirect RDMA Peer Memory for fast multi-node multi-GPU communication
    • GPUDirect Storage with 200 TB IBM ESS3200 (NVMe) SpectrumScale storage array
  • Ideal Scenarios:
    • Job needs multiple GPUs on a single node or multiple nodes
    • Job (single or multi-GPU) is I/O intensive
    • Job (single or multi-GPU) requires more than 40GB of GPU memory

Note: The POD is a good choice if your job needs multiple GPUs and very fast computation.
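
To verify that the GPUs in an allocation really are connected by NVLink rather than only PCIe, nvidia-smi can print the interconnect topology. A quick sanity check, run from inside a multi-GPU job:

# Print the GPU-to-GPU interconnect matrix; NV# entries indicate NVLink connections
nvidia-smi topo -m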

GPU access on UVA HPC

The memory you request for a UVA HPC job is CPU (host) memory.

If you request a GPU, you receive all of that GPU's onboard memory automatically.
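
As a minimal sketch of the difference (the 64G value is only a placeholder), --mem sets the CPU memory for the job, while the GPU request implicitly carries the full device memory:

#SBATCH --mem=64G	# CPU (host) memory for the job
#SBATCH --gres=gpu:1	# one full GPU, including all of its onboard memory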

  • Choose “GPU” or “Interactive” as the HPC Partition in OOD
  • Optionally, choose GPU type and number of GPUs
  • POD nodes are contained in the gpu partition with a specific Slurm constraint.
  • Slurm script:
#SBATCH -p gpu
#SBATCH --gres=gpu:a100:X	# replace X with the number of GPUs
#SBATCH -C gpupod
  • Open OnDemand: add --constraint=gpupod as an extra Slurm option (a complete example script follows this list)
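
Putting the pieces together, a complete batch script for a POD job might look like the sketch below. The GPU count, memory, walltime, and the final python train.py command are placeholders, and site-specific lines such as module loads or an allocation/account directive are omitted:

#!/bin/bash
#SBATCH -p gpu	# GPU partition
#SBATCH -C gpupod	# restrict the job to BasePOD nodes
#SBATCH --gres=gpu:a100:2	# two A100 GPUs (placeholder count)
#SBATCH --mem=64G	# CPU (host) memory (placeholder)
#SBATCH -t 01:00:00	# walltime (placeholder)

nvidia-smi	# record which GPUs the job received
python train.py	# placeholder application command

Submit it with sbatch; in Open OnDemand, the partition, GPU type/count, and constraint map onto the corresponding form fields.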

Note: A GPU is allocated to a single job at a time; it is not shared between users.
