GPU Resources on BlueBEAR¶
Some applications can benefit from GPU computing; these typically use the CUDA libraries to take advantage of the GPUs.
Access to the BlueBEAR GPU Service¶
To request access to the GPU nodes, please contact us through the IT Service Portal. We may ask for more information about the types of applications you are looking to use on these nodes before granting access to this service.
Using the BlueBEAR GPU Service¶
To use one of the GPU nodes, add the following to your job submission script:

#SBATCH --qos=bbgpu
#SBATCH --account=_projectname_
#SBATCH --gres=_gpu_

where _projectname_ is a project that has access to the GPU service, and _gpu_ is the type of GPU you wish to use in your job, specified in the form gpu:a100:1 to request one NVIDIA A100 per node in your job (see the list of available GPU nodes below).

If, when you submit a GPU job, you receive the error message

sbatch: error: Batch job submission failed: Invalid qos specification

then either you are not a member of a project that has access to the GPU service, or the project you specified does not have access to the GPU service.
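As a minimal sketch (the project name, core count, walltime and module line are placeholders for illustration, not BlueBEAR-specific values), a complete GPU job script might look like this:

#!/bin/bash
#SBATCH --qos=bbgpu
#SBATCH --account=_projectname_   # a project with access to the GPU service
#SBATCH --gres=gpu:a100:1         # one NVIDIA A100-40 per node
#SBATCH --nodes=1
#SBATCH --ntasks=18               # at most 18 CPU cores per A100 (see GPU Binding below)
#SBATCH --time=1:0:0              # adjust the walltime to your workload

set -e
# Load a GPU-enabled build of your application here (module name is hypothetical -
# check the BEAR Applications website for what is actually installed):
# module load _gpu_enabled_application_module_
nvidia-smi   # confirm that the allocated GPU is visible to the job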
GPU Binding¶
It should be noted that each GPU is bound to a particular CPU. On our most common Ice Lake A100 nodes there are 72 cores and 4 GPUs: 2 GPUs are bound to the first 36 cores and 2 are bound to the second 36 cores. This means, for example, that each node can run 2x 36-CPU-core, 2-GPU jobs or 4x 18-CPU-core, 1-GPU jobs, but it cannot run 3x 24-CPU-core, 1-GPU jobs.
Note
In general, you should request at most (total_number_of_CPU_cores / total_number_of_GPUs) CPU cores per GPU requested. This ensures efficient use of resources and avoids leaving GPUs that cannot be scheduled because no CPU cores are available for them. Recommended cores-per-GPU values are listed below.
This binding behaviour can be disabled on a per-job basis, but doing so may affect performance. Please see the Slurm documentation for more information.
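For example, a job using two A100s on one of the 72-core nodes can request up to 36 cores and stay within the 18-cores-per-GPU guideline; this is a sketch, and the exact task layout will depend on your application:

#SBATCH --qos=bbgpu
#SBATCH --account=_projectname_
#SBATCH --gres=gpu:a100:2   # two A100s on a single node
#SBATCH --nodes=1
#SBATCH --ntasks=36         # 18 CPU cores per GPU, matching the binding layout
# Binding can be relaxed per job via sbatch's --gres-flags option, but this may
# affect performance - see the Slurm documentation.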
Available GPU nodes¶
In the bbgpu QOS we have:

- Ten A100 nodes, each with:
  - 2 x 36-core Ice Lake (x86_64) CPUs
  - 4 x NVIDIA Ampere A100-40, 40GB GPUs
  - 512GB system memory
  - Max 18 CPU cores per GPU recommended
  - Request 1x A100 using: #SBATCH --gres=gpu:a100:1
- Two A100 nodes, each with:
  - 2 x 36-core Ice Lake (x86_64) CPUs
  - 4 x NVIDIA Ampere A100-80, 80GB GPUs
  - 512GB system memory
  - Max 18 CPU cores per GPU recommended
  - Request 1x A100-80 using: #SBATCH --gres=gpu:a100_80:1
- Two A100 nodes, each with:
  - 2 x 28-core Ice Lake (x86_64) CPUs
  - 2 x NVIDIA Ampere A100-40, 40GB GPUs
  - 512GB system memory
  - Max 14 CPU cores per GPU recommended (both GPUs are currently bound to one CPU)
  - Request 1x A100 using: #SBATCH --gres=gpu:a100:1
- Two A30 nodes, each with:
  - 2 x 28-core Ice Lake (x86_64) CPUs
  - 2 x NVIDIA Ampere A30, 24GB GPUs
  - 512GB system memory
  - Max 14 CPU cores per GPU recommended (both GPUs are currently bound to one CPU)
  - Request 1x A30 using: #SBATCH --gres=gpu:a30:1
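If you want to check which GPU types are configured on the nodes, you can query Slurm directly; these are generic Slurm commands rather than anything BlueBEAR-specific, so the exact output depends on the local configuration:

# Show the GRES (GPU) configuration, CPU count and memory reported per node:
sinfo -N -O nodelist,gres,cpus,memory
# Show currently queued and running jobs in the GPU QOS:
squeue --qos=bbgpu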
Software¶
We have installed a range of applications on the BEAR GPU nodes, but the software available on the different GPU types does vary and you may have to direct your job to the correct GPU type for the version of an application you wish to use. On the BEAR Applications website the available architectures for a specific version of an application will include details on which GPUs the software has been installed for.
If the software you are looking to use is not available, or not available on the nodes you wish to use it on, then please raise a Request New BEAR Software request to discuss this with us. However, we will usually only make the newer versions of an application available on the GPUs.
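As an illustration, inside a GPU job script you would typically load a GPU-enabled build of an application before running it; the module and program names below are hypothetical, so check the BEAR Applications website for the modules actually installed for your chosen GPU type:

# List the versions of an application that are installed (name is illustrative):
module avail _application_name_
# Load a specific GPU-enabled version, then run your work:
module load _application_name_/_version_
./my_gpu_program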
Baskerville and Sulis¶
Where the GPU resources available on BlueBEAR are insufficient, you may be eligible for access to Baskerville or Sulis. Please see the Tier2 HPC information for more details on applying for access to Baskerville or Sulis.