Jobs on BlueBEAR¶

Process summary¶

A simplified summary of the job submission process on BlueBEAR is as follows:

Compose a job script, which includes:
- the resources that the job requires
- the software/application modules to load
- the commands to run
Submit the job script to the cluster.
- The job will be queued by the scheduler until there are sufficient resources available for it to run. Queue times for jobs vary depending on how busy BlueBEAR is and the amount of resource that the job has requested.
View the job's output, either in realtime or once the job completes.

Further information on the mechanics of job submission can be found in the sections below.

Job scheduling with Slurm

Jobs on the cluster are controlled by the Slurm HPC scheduling system. The scheduler is configured to ensure an equitable distribution of resources over time to all users. The key means by which this is achieved are:

Jobs are scheduled according to the QOS (Quality of Service) and the resources that are requested. Information on how to request resources for your job is detailed below.
Jobs are not necessarily run in the order in which they are submitted.
Jobs requiring a large number of cores and/or long walltime will have to queue until the requested resources become available. The system will run smaller jobs, that can fit in available gaps, until all of the resources that have been requested for the larger job become available - this is known as backfill. Hence it is beneficial to specify a realistic walltime for a job so it can be fitted in the gaps.

Job Scripts¶

Example job script¶

Note

The following code is a simple example of a batch job script; the options and commands can be as complex as necessary. All of the options that can be set can be viewed on Slurm's documentation for sbatch. You can add any of these options as command line arguments, but it is recommended that you add all of your options into your job script for ease of reproducibility.

job.sh

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=5:0
#SBATCH --qos=bbshort

set -e

module purge; module load bluebear

echo "Starting a text-processing HPC batch job on $(hostname)"

# Create a small text file with sample lines
cat << 'EOF' > sample_text.txt
banana
apple
cherry
apple
eggplant
date
cherry
EOF

echo "Original file:"
cat sample_text.txt

# Sort the file and remove duplicate lines
sort sample_text.txt | uniq > processed_text.txt

echo "Processed (sorted, duplicates removed):"
cat processed_text.txt

Script Options Explained

#!/bin/bash - run the job using GNU Bourne Again Shell (the same shell as the login nodes).

#SBATCH --ntasks=1 - run the job with one core.
By default, the amount of memory (RAM) that a job is allocated is (the --ntasks value) x 4096MB.
#SBATCH --time=5:0 - run the job with a walltime of 5 minutes.
#SBATCH --qos=bbshort - run the job in the bbshort QOS / queue.
See the Job QOS section for more details on the list of available QOSes and their function in BlueBEAR jobs.

set -e - makes your script fail on first error. This is recommended as early errors can easily be missed.
module purge; module load bluebear - resets the environment to ensure that the script hasn't inherited anything from where it was submitted. This line is required and Slurm will reject the script if it isn't present – it must be included before any other module load statements.

See the Job Options and Resources section below for further details.

Overview of Job Operations¶

Submitting a job¶

Note

Slurm is aware of your current working directory when submitting the job so there is no need to manually specify it in the script.

The command to submit a job is sbatch. This reads its input from a job script file. The job is submitted to the scheduling system, using the requested resources, and will run on the first available node(s) that are able to provide the resources requested. For example, to submit the set of commands contained in the above example file job.sh, use the command:

sbatch job.sh

The system will return a job number, for example:

$ sbatch job.sh
Submitted batch job 55260

When will my job start?

Wait times vary on BlueBEAR and are dependent on a variety of factors, including:

How much general resource the job has requested: see the "Job scheduling with Slurm" info box at the top of the page for further details
If the job requires a GPU: there are limited GPU resources available so wait times for these nodes are longer
Cluster utilisation levels: other users' jobs (both number and scale) are a significant factor

Note

When the cluster is very busy it may take some time before the "start time" becomes available. Also, please be aware that the value can change in both directions and should only be treated as indicative.

The Slurm scheduler will consider the above factors when allocating your job and once it's determined a slot it will provide a "start time". This will be a timestamp that represents the latest possible time your job will begin, based on current state of the cluster – the job may start earlier if other jobs queuing or running complete before their requested walltime.
The following command will show the "start time" value for your pending jobs:

squeue --states=pending --format="%12i %8t %24S"

Monitoring a job¶

There are a number of ways to monitor the current status and output of a job:

squeue
```
squeue -j 55260
```
squeue is Slurm's command for viewing the status of your jobs. This shows information such as the job's ID and name, the QOS used (the partition, which will tell you the node type), the user that submitted the job, time elapsed and the number of nodes being used.
scontrol
```
scontrol show job 55260
```
scontrol is a powerful interface that provides an advanced amount of detail regarding the status of your job. The show command within scontrol can be used to view details regarding a specific job.
slurm.out and slurm.stats files

When your job is submitted a slurm.stats file is created and named to include the job id, e.g. slurm-55260.stats.
Once your job begins to run a slurm.out file is also created (e.g. slurm-55260.out), which contains the standard out (stdout) and standard error (stderr) outputs that would have been shown had you run the command(s) directly in a terminal shell.
Info
- These two output files are created in the directory from which you submitted the job.
- slurm-55260.out is a plain text file (i.e. not an executable). To view its contents you can, for example, run: cat slurm-55260.out
  To view this output in realtime (i.e. when your job is running), execute: tail -f slurm-55260.out
- If your job completes successfully, you will not get a notification in the terminal, but you can choose to receive email notifications by configuring the header at the start of your job script.

Job Efficiency¶

The seff command takes a job id as an argument, for example,

$ seff 24556911

Job ID: 24556911
Cluster: paddington
User/Group: wakelinj/users
State: TIMEOUT (exit code 0)
Cores: 1
CPU Utilized: 00:02:09
CPU Efficiency: 95.56% of 00:02:15 core-walltime
Job Wall-clock time: 00:02:15
Memory Utilized: 3.99 GB
Memory Efficiency: 99.77% of 4.00 GB

The seff command tells you about two aspects of job efficiency:

CPU Efficiency¶

How much CPU time was used as a percentage of the total runtime and is calculated using (CPU Time / Runtime) x 100. This tells you how efficient your software is.

Memory Efficiency¶

How much of the requested memory was actually used by the job. This tells you how accurate you were at requesting memory.

Note

Software efficiency is actually more complicated than the CPU Efficiency metric shown above. The simple calculation above does not take into account more complex features of code efficiency, such as vectorization or offloading of work to accelerators such as GPUs
A low CPU Efficiency is often due to slow or excessive IO (i.e. reading and writing of files).

Cancelling a job¶

To cancel a queued or running job use the scancel command and supply it with the job ID that is to be cancelled. For example, to cancel the previous job:

scancel 55260

Job Options and Resources¶

Resource Limits¶

The maximum duration (walltime) on BlueBEAR is 10 days per job, except...
- ... for bbshort where the maximum walltime is 10 minutes per job
Each user is limited to:
- 1344 cores and 6TB of memory (RAM) per shared CPU QOS
- 4 GPUs when using the bbgpu QOS
- These limits are summed across all running jobs in the QOS
- There are no additional limits on the sizes of individual jobs
- If any limit is exceeded, any future jobs (for that QOS) will remain queued until the usage falls below these limits again

Info

The maximum of 1344 CPU cores per job and 6 Terabytes of RAM is across all of the jobs for one person. E.g. this could be made up of:

A single job requesting 1344 cores across 12 Sapphire Rapids nodes
A single job requesting 14 cores and 6 Terabytes of RAM
12 jobs requesting 112 cores across 12 Sapphire Rapids nodes

Note

Different limits may be set on QOSes relating to user-owned resources.

Resource Utilisation¶

Some software uses multiple CPU cores on one node and some can support multiple cores on multiple nodes. Programming for multiple nodes generally requires a different approach and so this is less common. Unfortunately, increasing the number of cores will not always make your job run faster.

Job QOS¶

BlueBEAR uses the QOS (--qos or -q) option of a job to direct it to a particular set of resources. By default, there are two QOS to which you can submit jobs. These are: bbdefault and bbshort. You may also have access to bblargemem or bbgpu.
All shared QOS have a maximum job length (walltime) of 10 days, with the exception of bbshort where it is 10 minutes.
You can specify the QOS to use by adding the following line to your job script:

#SBATCH --qos=bbshort

QOS Details¶

Please select a QOS to view its details.

bbdefaultbbshortbbgpubblargemem (deprecated)

This is the default QOS and will be used if no --qos is specified in the job script.
It comprises different types of node, as described on the Standard Resources page.

This QOS contains all nodes in the cluster and is the fastest way to get your job to run. The maximum walltime is 10 minutes.

This QOS contains a mixture of GPU nodes which are available if your job requires a GPU. Please see the GPU Service page for more details on these nodes.

This QOS, pre-2022, contained a mixture of large memory nodes that were available if your job required a larger amount of memory on one node. When the Intel Ice Lake nodes were added the bblargemem QOS was retired. Please see the Large Memory Service page for more details on requesting more memory for a job.

Note

Some of the memory a node has is for running system processes and will be unavailable to jobs.

Associating Jobs with Projects¶

Every job has to be associated with a project to ensure the equitable distribution of resources. Project owners and members will have been issued a project code for each registered project, and only usernames authorised by the project owner will be able to run jobs using that project code. You can see what projects you are a member of by running the command:

my_bluebear

If a user is registered on more than one project then it should be specified using the --account option followed by the project code. For example, if your project is _project_name_ then add the following line to your job script:

#SBATCH --account=_project_name_

If a job is submitted using an invalid project, either because the project does not exist, the username is not authorised to use that project, or the project does not have access to the requested QOS, then the job will be rejected with the following error:

sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

Memory (RAM) Requests¶

By default, for each core requested (e.g. using the --ntasks option) the job will be allocated 4096MB RAM. Wherever possible we prefer this method as it ensures efficient distribution of jobs across the BlueBEAR cluster.
However, this default can be overridden by specifying one of the following options to request the amount of memory required:

Memory Units

You can specify values in megabytes (M), gigabytes (G) or terabytes (T) with the default unit being M if none is given.

`#SBATCH --mem`¶

The memory value specified against --mem will be allocated to each node on which a job is running, regardless of cores. This makes it a less suitable option for distributed jobs and it's therefore commonly combined with the --nodes=1 option.
See, for example, how it is used to run a large memory job.

`#SBATCH --mem-per-cpu`¶

Default value = 4096M.
Safer for distributed jobs as memory allocation scales with the core count.

Dedicated Resources¶

Some research groups have dedicated resources in BlueBEAR. Those users who can submit jobs to a dedicated QOS can see what jobs are running in the dedicated QOS by using the following command with _name_ replaced with the name of your QOS:

view_qos _name_

Modules (software applications)¶

Software on BlueBEAR is typically provided through our BEAR Applications modules. Therefore, to make an application or command available for use within a job script, it is necessary to load any required modules. For example, if you would like to use Python, you could add module load bear-apps/2023a; module load Python/3.11.3-GCCcore-12.3.0 after module purge; module load bluebear. This would load the specified BEAR Application version and Python version, making the python command available.

Warning

Note that if a job script loads multiple modules, these must all be from the same BEAR Application Version.

For a list of available software modules, see the BEAR Applications website.

For further information on the use of modules (and other options for accessing software), please refer to our Software pages.

Jobs on BlueBEAR¶

Process summary¶

Job Scripts¶

Example job script¶

Overview of Job Operations¶

Submitting a job¶

Monitoring a job¶

Job Efficiency¶

CPU Efficiency¶

Memory Efficiency¶

Cancelling a job¶

Job Options and Resources¶

Resource Limits¶

Resource Utilisation¶

Job QOS¶

QOS Details¶

Associating Jobs with Projects¶

Memory (RAM) Requests¶

#SBATCH --mem¶

#SBATCH --mem-per-cpu¶

Dedicated Resources¶

Modules (software applications)¶

`#SBATCH --mem`¶

`#SBATCH --mem-per-cpu`¶