Application Guide: Nextflow¶
Nextflow is a tool for creating and running reproducible scientific workflows using software containers. See the Nextflow page on the BEAR Apps website for a list of available versions.
Tip
Nextflow provides comprehensive training material. We recommend you read this before writing your own pipelines.
Using nf-core¶
The nf-core community collects a curated set of pipelines and configurations built with Nextflow.
You can run nf-core pipelines on BlueBEAR using our custom configuration profile – please refer to the BlueBEAR HPC Configuration page on the nf-core website for further guidance.
If you are not using nf-core, continue to the “Example pipeline” section for a guide to running an example Nextflow pipeline.
Example pipeline¶
Here, we show how to run Your First Script from the Nextflow training material on BlueBEAR. Create two files: a job submission script `hello.sh` and a Nextflow workflow file `hello.nf`. Change `_project_` to match your BEAR project code.
#!/bin/bash
#SBATCH --account _project_
#SBATCH --qos bbshort
#SBATCH --time 10
module purge; module load bluebear
module load bear-apps/2022b
module load Nextflow/24.04.2
nextflow run hello.nf
#!/usr/bin/env nextflow
params.greeting = 'Hello world!'
greeting_ch = Channel.of(params.greeting)
process SPLITLETTERS {
    input:
    val x

    output:
    path 'chunk_*'

    script:
    """
    printf '$x' | split -b 6 - chunk_
    """
}

process CONVERTTOUPPER {
    input:
    path y

    output:
    stdout

    script:
    """
    cat $y | tr '[a-z]' '[A-Z]'
    """
}

workflow {
    letters_ch = SPLITLETTERS(greeting_ch)
    results_ch = CONVERTTOUPPER(letters_ch.flatten())
    results_ch.view { it }
}
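Before submitting, it can help to see what this workflow actually computes. The sketch below (plain Python, purely an illustration and not part of the pipeline) mirrors the two processes: `SPLITLETTERS` splits the greeting into 6-byte chunks, as `split -b 6` does, and `CONVERTTOUPPER` uppercases each chunk independently.

```python
# Illustration only: mimic the hello.nf data flow in plain Python.
greeting = "Hello world!"

# Mimic `split -b 6`: break the string into 6-byte chunks
chunks = [greeting[i:i + 6] for i in range(0, len(greeting), 6)]
print(chunks)  # ['Hello ', 'world!']

# Mimic `tr '[a-z]' '[A-Z]'` applied to each chunk
results = [chunk.upper() for chunk in chunks]
print(results)  # ['HELLO ', 'WORLD!']
```

Because each chunk is processed as an independent task, the order in which the uppercased chunks are printed by the real pipeline is not guaranteed (you may see `WORLD!` before `HELLO`).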
Submit the workflow by running:
sbatch hello.sh
Tip
The job script should be submitted with a sufficiently large time allocation for all processes to complete. If the job terminates before the workflow is complete, you may rerun the workflow with the `-resume` option. For example, updating `hello.sh` with the following line and resubmitting will continue the workflow from the last checkpoint. (See Job Checkpointing for general information.)
nextflow run hello.nf -resume
Expand the box below to see example output from the Slurm job submission.
Example output
Java/11.0.18
Nextflow/24.04.2
Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W ~ version 24.04.2
Launching `hello.nf` [kickass_meitner] DSL2 - revision: 6ac561bd2f
[- ] SPLITLETTERS -
[- ] CONVERTTOUPPER -
[- ] SPLITLETTERS | 0 of 1
[- ] CONVERTTOUPPER -
executor > local (1)
[9d/a34293] SPLITLETTERS (1) | 0 of 1
[- ] CONVERTTOUPPER -
executor > local (2)
[9d/a34293] SPLITLETTERS (1) | 1 of 1 ✔
[45/6b74b2] CONVERTTOUPPER (2) | 0 of 2
executor > local (3)
[9d/a34293] SPLITLETTERS (1) | 1 of 1 ✔
[fe/caebc0] CONVERTTOUPPER (1) | 2 of 2 ✔
WORLD!
HELLO
Slurm executor¶
By default, Nextflow runs all workflow processes locally, on the same node as the job that launched them. To make effective use of the cluster, you should configure Nextflow to use the Slurm executor.
Note
Nextflow looks in multiple places for configuration files (see the Nextflow documentation for more information).
Create a file called `nextflow.config` in the same directory as `hello.nf` and add the following configuration:
process {
    executor = 'slurm'
    clusterOptions = {
        "--account ${System.getenv("SLURM_JOB_ACCOUNT")} " + // (1)!
        "--qos ${task.time <= 10.m ? 'bbshort' : 'bbdefault'}" // (2)!
    }
}

executor {
    queueSize = 60 // (3)!
    submitRateLimit = '1sec' // (4)!
}
1. Inherit the job account from the `SLURM_JOB_ACCOUNT` environment variable, which is automatically set when the base job is submitted.
2. Automatically set the QOS to `bbshort` if the task is expected to take 10 minutes or less.
3. The maximum number of jobs allowed in the queue at any one time. Higher values may overload the scheduler.
4. Submit jobs at a rate of 1 per second. Higher rates may overload the scheduler.
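The QOS selection above is a Groovy ternary expression on the task's requested time. As an illustration only (this is Python, not Nextflow code), the same decision logic can be written as:

```python
def choose_qos(minutes: int) -> str:
    """Mirror the ternary in clusterOptions: tasks requesting
    10 minutes or less go to bbshort, everything else to bbdefault."""
    return "bbshort" if minutes <= 10 else "bbdefault"

print(choose_qos(10))   # bbshort
print(choose_qos(120))  # bbdefault
```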
You may now resubmit the base `hello.sh` job script. Each process will be submitted to the Slurm queue and Nextflow will wait until all jobs succeed before completing. Periodically use the `squeue` command to see these jobs being added to the queue.
Expand the box below to see example Slurm job output. Notice that the line `executor > local` from before has changed to `executor > slurm`.
Example output
Java/11.0.18
Nextflow/24.04.2
Nextflow 24.04.4 is available - Please consider updating your version to it
N E X T F L O W ~ version 24.04.2
Launching `hello.nf` [shrivelled_bassi] DSL2 - revision: ee0a752bfc
[- ] SPLITLETTERS -
[- ] CONVERTTOUPPER -
[- ] SPLITLETTERS | 0 of 1
[- ] CONVERTTOUPPER -
executor > slurm (1)
[d0/2e84b3] SPLITLETTERS (1) | 0 of 1
[- ] CONVERTTOUPPER -
executor > slurm (1)
[d0/2e84b3] SPLITLETTERS (1) | 0 of 1
[- ] CONVERTTOUPPER -
executor > slurm (1)
[d0/2e84b3] SPLITLETTERS (1) | 1 of 1
[- ] CONVERTTOUPPER -
executor > slurm (3)
[d0/2e84b3] SPLITLETTERS (1) | 1 of 1 ✔
[f1/359647] CONVERTTOUPPER (1) | 0 of 2
executor > slurm (3)
[d0/2e84b3] SPLITLETTERS (1) | 1 of 1 ✔
[f1/359647] CONVERTTOUPPER (1) | 2 of 2 ✔
WORLD!
HELLO
It is good practice to allocate resources to your processes. Nextflow’s Slurm executor supports several process directives to do this. For example, let’s give the `SPLITLETTERS` process a maximum time of 5 minutes and 2 CPUs:
process SPLITLETTERS {
    time '5min'
    cpus 2

    input:
    val x

    output:
    path 'chunk_*'

    script:
    """
    printf '$x' | split -b 6 - chunk_
    """
}
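These directives can also be kept out of the workflow file: Nextflow allows per-process settings in `nextflow.config` via process selectors. A sketch equivalent to the directives above, using the `withName` selector:

```groovy
process {
    withName: SPLITLETTERS {
        time = '5min'
        cpus = 2
    }
}
```

Keeping resource settings in the configuration file makes it easier to adjust them per cluster without editing the workflow itself.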
Software dependencies¶
So far, `hello.nf` has used the `printf`, `cat` and `tr` commands, which are all available without loading additional application modules. If a process requires particular software, you need to make sure that software is available. You may do this by including `module load` commands in the process, or by running the process in a container. The latter method is recommended to ensure the reproducibility and portability of your workflow. We will go through both methods in this section.
Create a Python script which converts the contents of a file to uppercase and prints the result. We will replace the existing `CONVERTTOUPPER` process with this script later. Copy the code below into a file named `convert_upper.py`.
#!/usr/bin/env python
import sys

# Read file from first argument
with open(sys.argv[1], "r") as file:
    word = file.read()

# Convert to uppercase and print without new line
print(word.upper(), end="")
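You can check that this script behaves like the `tr` pipeline it replaces by exercising the same logic locally. A minimal sketch, using a temporary file in place of a real `chunk_*` file produced by the workflow:

```python
# Sketch: reproduce what `python convert_upper.py chunk_aa` would print,
# using a temporary file in place of a real chunk file.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "chunk_aa")
with open(path, "w") as f:
    f.write("Hello ")

# Same steps as convert_upper.py: read the file, uppercase, no newline
with open(path, "r") as f:
    word = f.read()
print(word.upper(), end="")  # prints "HELLO "
```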
Loading modules¶
While Nextflow provides a `module` directive to simplify the module-loading process, this is not currently compatible with the Slurm configuration on BlueBEAR. Instead, you can load modules manually using the `beforeScript` directive.
Edit `hello.nf` and replace the `CONVERTTOUPPER` process with the code below, which runs `convert_upper.py` in place of the `cat` and `tr` commands.
process CONVERTTOUPPER {
    beforeScript '''\
    module purge; module load bluebear
    module load bear-apps/2021b
    module load Python/3.9.6-GCCcore-11.2.0
    '''.stripIndent() // (1)!

    input:
    path y

    output:
    stdout

    script:
    """
    python $projectDir/convert_upper.py $y
    """
}
1. This removes the indentation of the multiline string, which was added for readability.
You may now resubmit `hello.sh` and verify that the output is the same as before.
In some cases you may want to run the workflow on a different machine or cluster; in that case, consider using containers.
Using containers¶
Nextflow processes can be configured to run in software containers. These are environments which have all the software dependencies installed and ready to run. BlueBEAR supports containerisation using Apptainer. This section walks through building a container and running it with Nextflow.
Add the following to your `nextflow.config` file to enable Apptainer for the workflow:
apptainer {
    enabled = true
    autoMounts = true // (1)!
}
1. This option is required so that the working directory for a process is automatically bind-mounted in the container.
You can specify a container for each process by adding the `container` directive. This can be a path to a local container image or a URI for an image stored in a container registry (see the Nextflow documentation for more information).
For this example, we will build a container for the `CONVERTTOUPPER` process. Create a file called `convert_upper.def`; this will be our Apptainer definition file. Copy the code below into this file. It defines a container based on Python 3.11 and copies the file `convert_upper.py` into `/opt` for use in the process.
Bootstrap: docker
From: python:3.11

%files
    convert_upper.py /opt
Build the container with the following command:
apptainer build convert_upper.sif convert_upper.def
Then, replace the `CONVERTTOUPPER` process in `hello.nf` with the following code. This tells the process to run our script within the container image we just built. Notice that the path to the Python script has changed to `/opt/convert_upper.py`; this is where we instructed the file to go in the `convert_upper.def` definition file.
process CONVERTTOUPPER {
    container "$projectDir/convert_upper.sif" // (1)!

    input:
    path y

    output:
    stdout

    script:
    """
    python /opt/convert_upper.py $y
    """
}
1. The path to the container image in which to run the script for this process. This can also be a path to a remote OCI repository (e.g. on DockerHub or Quay.io).
You may now resubmit `hello.sh` and verify that the output is the same as before.