Application Guide: Nextflow¶
Nextflow is a tool for creating and running reproducible scientific workflows using software containers. See the Nextflow page on the BEAR Apps website for a list of available versions.
Tip
Nextflow provides comprehensive training material. We recommend you read this before writing your own pipelines.
Using nf-core¶
The nf-core community collects a curated set of pipelines and configurations built with Nextflow.
You can run nf-core pipelines on BlueBEAR using our custom configuration profile – please refer to the BlueBEAR HPC Configuration page on the nf-core website for further guidance.
If you are not using nf-core, continue to the “Example pipeline” section for a guide to running an example Nextflow pipeline.
Example pipeline¶
Here, we show how to run Your First Script from the Nextflow training material on BlueBEAR. Create two files: a job submission script `hello.sh` and a Nextflow workflow file `hello.nf`. Change `_project_` to match your BEAR project code.
#!/bin/bash
#SBATCH --account _project_
#SBATCH --qos bbshort
#SBATCH --time 10
module purge; module load bluebear
module load bear-apps/2022b
module load Nextflow/24.04.2
nextflow run hello.nf
#!/usr/bin/env nextflow
params.greeting = 'Hello world!'
greeting_ch = Channel.of(params.greeting)
process SPLITLETTERS {
    input:
    val x

    output:
    path 'chunk_*'

    script:
    """
    printf '$x' | split -b 6 - chunk_
    """
}

process CONVERTTOUPPER {
    input:
    path y

    output:
    stdout

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}

workflow {
    letters_ch = SPLITLETTERS(greeting_ch)
    results_ch = CONVERTTOUPPER(letters_ch.flatten())
    results_ch.view { it }
}
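Before submitting, it can help to see what this workflow actually computes. The sketch below (plain Python, purely an illustration and not part of the pipeline) mirrors the two processes: `SPLITLETTERS` splits the greeting into 6-byte chunks, as `split -b 6` does, and `CONVERTTOUPPER` uppercases each chunk independently.

```python
# Illustration only: mimic the hello.nf data flow in plain Python.
greeting = "Hello world!"

# Mimic `split -b 6`: break the string into 6-byte chunks
chunks = [greeting[i:i + 6] for i in range(0, len(greeting), 6)]
print(chunks)  # ['Hello ', 'world!']

# Mimic `tr '[a-z]' '[A-Z]'` applied to each chunk
results = [chunk.upper() for chunk in chunks]
print(results)  # ['HELLO ', 'WORLD!']
```

Because each chunk is processed as an independent task, the order in which the uppercased chunks are printed by the real pipeline is not guaranteed (you may see `WORLD!` before `HELLO`).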
Submit the workflow by running:
sbatch hello.sh
Tip
The job script should be submitted with a sufficiently large time allocation for all processes to complete. If the job terminates before the workflow is complete, you may rerun the workflow with the `-resume` option. For example, updating `hello.sh` with the following line and resubmitting will continue the workflow from the last checkpoint. (See Job Checkpointing for general information.)
nextflow run hello.nf -resume
Expand the box below to see example output from the Slurm job submission.
Example output
Java/11.0.18
Nextflow/24.04.2
Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W ~ version 24.04.2
Launching `hello.nf` [kickass_meitner] DSL2 - revision: 6ac561bd2f
[- ] SPLITLETTERS -
[- ] CONVERTTOUPPER -
[- ] SPLITLETTERS | 0 of 1
[- ] CONVERTTOUPPER -
executor > local (1)
[9d/a34293] SPLITLETTERS (1) | 0 of 1
[- ] CONVERTTOUPPER -
executor > local (2)
[9d/a34293] SPLITLETTERS (1) | 1 of 1 ✔
[45/6b74b2] CONVERTTOUPPER (2) | 0 of 2
executor > local (3)
[9d/a34293] SPLITLETTERS (1) | 1 of 1 ✔
[fe/caebc0] CONVERTTOUPPER (1) | 2 of 2 ✔
WORLD!
HELLO
Slurm executor¶
By default, Nextflow runs all workflow processes locally, on the same node as the job that launched them. To make effective use of the cluster, you should configure Nextflow to use the Slurm executor.
Note
Nextflow looks in multiple places for configuration files (see the Nextflow documentation for more information).
Create a file called `nextflow.config` in the same directory as `hello.nf` and add the following configuration:
process {
    executor = 'slurm'
    clusterOptions = {
        "--account ${System.getenv("SLURM_JOB_ACCOUNT")} " + // (1)!
        "--qos ${task.time <= 10.m ? 'bbshort' : 'bbdefault'}" // (2)!
    }
}

executor {
    queueSize = 60 // (3)!
    submitRateLimit = '1sec' // (4)!
}
1. Inherit the job account from the `SLURM_JOB_ACCOUNT` environment variable, which is automatically set when the base job is submitted.
2. Automatically set the QOS to `bbshort` if the task is expected to take 10 minutes or less.
3. The maximum number of jobs allowed in the queue at any one time. Higher values may overload the scheduler.
4. Submit jobs at a rate of 1 per second. Higher rates may overload the scheduler.
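The QOS selection above is a Groovy ternary expression on the task's requested time. As an illustration only (this is Python, not Nextflow code), the same decision logic can be written as:

```python
def choose_qos(minutes: int) -> str:
    """Mirror the ternary in clusterOptions: tasks requesting
    10 minutes or less go to bbshort, everything else to bbdefault."""
    return "bbshort" if minutes <= 10 else "bbdefault"

print(choose_qos(10))   # bbshort
print(choose_qos(120))  # bbdefault
```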
You may now resubmit the base `hello.sh` job script. Each process will be submitted to the Slurm queue and Nextflow will wait until all jobs succeed before completing. Periodically use the `squeue` command to see these jobs being added to the queue.
Expand the box below to see example Slurm job output. Notice that the line `executor > local` from before has changed to `executor > slurm`.
Example output
Java/11.0.18
Nextflow/24.04.2
Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W ~ version 24.04.2
Launching `hello.nf` [shrivelled_bassi] DSL2 - revision: ee0a752bfc
[- ] SPLITLETTERS -
[- ] CONVERTTOUPPER -
[- ] SPLITLETTERS | 0 of 1
[- ] CONVERTTOUPPER -
executor > slurm (1)
[d0/2e84b3] SPLITLETTERS (1) | 0 of 1
[- ] CONVERTTOUPPER -
executor > slurm (1)
[d0/2e84b3] SPLITLETTERS (1) | 0 of 1
[- ] CONVERTTOUPPER -
executor > slurm (1)
[d0/2e84b3] SPLITLETTERS (1) | 1 of 1
[- ] CONVERTTOUPPER -
executor > slurm (3)
[d0/2e84b3] SPLITLETTERS (1) | 1 of 1 ✔
[f1/359647] CONVERTTOUPPER (1) | 0 of 2
executor > slurm (3)
[d0/2e84b3] SPLITLETTERS (1) | 1 of 1 ✔
[f1/359647] CONVERTTOUPPER (1) | 2 of 2 ✔
WORLD!
HELLO
It is good practice to allocate resources to your processes. Nextflow’s Slurm executor supports several process directives to do this. For example, let’s give the `SPLITLETTERS` process a maximum time of 5 minutes and 2 CPUs:
process SPLITLETTERS {
    time '5min'
    cpus 2

    input:
    val x

    output:
    path 'chunk_*'

    script:
    """
    printf '$x' | split -b 6 - chunk_
    """
}
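These directives can also be kept out of the workflow file: Nextflow allows per-process settings in `nextflow.config` via process selectors. A sketch equivalent to the directives above, using the `withName` selector:

```groovy
process {
    withName: SPLITLETTERS {
        time = '5min'
        cpus = 2
    }
}
```

Keeping resource settings in the configuration file makes it easier to adjust them per cluster without editing the workflow itself.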
Software dependencies¶
So far, `hello.nf` has used the `printf`, `cat` and `tr` commands, which are all available without loading additional application modules. If a process requires particular software, you need to make sure that software is available. You may do this by including `module load` commands in the process, or by running the process in a container. The latter method is recommended to ensure the reproducibility and portability of your workflow. We will go through both methods in this section.
Create a Python script which converts the contents of a file to uppercase and prints the result. We will replace the existing `CONVERTTOUPPER` process with this script later. Copy the code below into a file named `convert_upper.py`.
#!/usr/bin/env python
import sys

# Read file from first argument
with open(sys.argv[1], "r") as file:
    word = file.read()

# Convert to uppercase and print without new line
print(word.upper(), end="")
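You can check that this script behaves like the `tr` pipeline it replaces by exercising the same logic locally. A minimal sketch, using a temporary file in place of a real `chunk_*` file produced by the workflow:

```python
# Sketch: reproduce what `python convert_upper.py chunk_aa` would print,
# using a temporary file in place of a real chunk file.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "chunk_aa")
with open(path, "w") as f:
    f.write("Hello ")

# Same steps as convert_upper.py: read the file, uppercase, no newline
with open(path, "r") as f:
    word = f.read()
print(word.upper(), end="")  # prints "HELLO "
```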
Loading modules¶
While Nextflow provides a `module` directive to simplify the module-loading process, this is not currently compatible with the Slurm configuration on BlueBEAR. Instead, you can load modules manually using the `beforeScript` directive.
Edit `hello.nf` and replace the `CONVERTTOUPPER` process with the code below, which runs `convert_upper.py` in place of the `cat` and `tr` commands.
process CONVERTTOUPPER {
    beforeScript '''\
    module purge; module load bluebear
    module load bear-apps/2021b
    module load Python/3.9.6-GCCcore-11.2.0
    '''.stripIndent() // (1)!

    input:
    path y

    output:
    stdout

    script:
    """
    python $projectDir/convert_upper.py $y
    """
}
1. This removes the indentation of the multiline string, which was added for readability.
You may now resubmit `hello.sh` and verify that the output is the same as before.
In some cases you may want to run the workflow on a different machine or cluster; in that case, consider using containers.
Using containers¶
Nextflow processes can be configured to run in software containers. These are environments which have all the software dependencies installed and ready to run. BlueBEAR supports containerisation using Apptainer. This section walks through building a container and running it with Nextflow.
Add the following to your `nextflow.config` file to enable Apptainer for the workflow:
apptainer {
    enabled = true
    autoMounts = true // (1)!
}
1. This option is required so that the working directory for a process is automatically bind-mounted in the container.
You can specify a container for each process by adding the `container` directive. This can be a path to a local container image or a URI for an image stored in a container registry (see the Nextflow documentation for more information).
For this example, we will build a container for the `CONVERTTOUPPER` process. Create a file called `convert_upper.def`; this will be our Apptainer definition file. Copy the code below into this file. It defines a container based on Python 3.11 and copies the file `convert_upper.py` into `/opt` for use in the process.
Bootstrap: docker
From: python:3.11

%files
    convert_upper.py /opt
Build the container with the following command:
apptainer build convert_upper.sif convert_upper.def
Then, replace the `CONVERTTOUPPER` process in `hello.nf` with the following code. This tells the process to run our script within the container image we just built. Notice that the path to the Python script has changed to `/opt/convert_upper.py`; this is where we instructed the file to go in the `convert_upper.def` definition file.
process CONVERTTOUPPER {
    container "$projectDir/convert_upper.sif" // (1)!

    input:
    path y

    output:
    stdout

    script:
    """
    python /opt/convert_upper.py $y
    """
}
1. The path to the container image in which to run the script for this process. This can also be a path to a remote OCI repository (e.g. on DockerHub or Quay.io).
You may now resubmit `hello.sh` and verify that the output is the same as before.