
Using MPI in Containers

The information below describes how to run MPI jobs using containers on BlueBEAR.

Introduction

To run MPI jobs inside containers on BlueBEAR, the version of OpenMPI inside the container must match the version loaded outside it.
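Before submitting a job, it can be worth comparing the version strings that `mpirun --version` reports on the host and inside the container. The sketch below is illustrative: the `extract_version` helper and the hard-coded banner strings are examples, and on BlueBEAR you would capture the real banners with `mpirun --version` and `apptainer exec <image> mpirun --version`.

```shell
# Sketch: compare host and container OpenMPI versions before running a job.
# The banner strings below are examples; on BlueBEAR capture them with:
#   host_banner=$(mpirun --version | head -n 1)
#   container_banner=$(apptainer exec my_image.sif mpirun --version | head -n 1)

extract_version() {
    # "mpirun (Open MPI) 4.1.4" -> "4.1.4" (last whitespace-separated field)
    printf '%s\n' "$1" | awk '{print $NF}'
}

host_banner="mpirun (Open MPI) 4.1.4"
container_banner="mpirun (Open MPI) 4.1.4"

if [ "$(extract_version "$host_banner")" = "$(extract_version "$container_banner")" ]; then
    echo "OpenMPI versions match"
else
    echo "OpenMPI version mismatch - the job will likely fail" >&2
fi
```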

Building your own containers

We recommend using Rocky Linux as the base image for building your containers. It is a freely available distribution that closely tracks Red Hat Enterprise Linux, the distribution that runs on BlueBEAR.

Note

We will only provide support for container images that use a distribution based on RHEL 8 (e.g. Rocky Linux or AlmaLinux) and an OpenMPI version matching that of the OpenMPI-container module used.

For further general information on building Apptainer images, please see the separate documentation on building Apptainer images. Please also take note of the information on Building Software and Node Types, which also applies to container images.

Example Code

Below are an example Apptainer definition file and a complementary Slurm batch script. Adapt these scripts as required to run your own MPI jobs.

The following Apptainer definition file compiles OpenMPI inside a container image and then uses the resulting MPI compiler wrapper (e.g. mpicc) to build the OSU Micro-Benchmarks.

To build the example below, execute:

unset APPTAINER_BIND
apptainer build openmpi_example_${BB_CPU}.sif openmpi_example.def
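The `${BB_CPU}` variable in the build command names the image after the node's CPU architecture, so a separate image can be kept per node type (see Building Software and Node Types). A small illustration of the resulting file name, using an assumed example value of `icelake`:

```shell
# ${BB_CPU} is set by the BlueBEAR environment to the node's CPU
# architecture; "icelake" below is only an example value.
BB_CPU="icelake"
sif_name="openmpi_example_${BB_CPU}.sif"
echo "$sif_name"   # openmpi_example_icelake.sif
```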

openmpi_example.def

Bootstrap: docker
From: rockylinux:8.6

%help
    Container running BlueBEAR compatible OpenMPI

%environment
    export OMPI_DIR=/opt/ompi
    export PATH=${PATH}:${OMPI_DIR}/bin
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${OMPI_DIR}/lib

%post
    echo "Installing required packages..."
    dnf -y update && dnf --enablerepo=powertools -y install \
        wget \
        git \
        tar \
        bzip2 \
        perl \
        gcc \
        gcc-c++ \
        gcc-gfortran \
        make \
        file \
        rdma-core-devel \
        libfabric-devel \
        hwloc-libs \
        iproute \
        net-tools

    mkdir -p /opt

    ### DOWNLOAD AND COMPILE OPENMPI
    echo "Installing Open MPI"
    OMPI_DIR=/opt/ompi
    OMPI_VERSION=4.1.4  # this needs to match the version of the OpenMPI-container module you are using
    OMPI_URL="https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-$OMPI_VERSION.tar.bz2"

    mkdir -p /tmp/ompi
    # Download OpenMPI
    cd /tmp/ompi \
    && wget -O openmpi-$OMPI_VERSION.tar.bz2 $OMPI_URL \
    && tar -xjf openmpi-$OMPI_VERSION.tar.bz2

    # Compile and install OpenMPI
    cd /tmp/ompi/openmpi-$OMPI_VERSION \
    && ./configure --prefix=$OMPI_DIR \
    && make -j8 install

    rm -rf /tmp/ompi

    ### DOWNLOAD AND COMPILE OSU MICRO BENCHMARKS
    OSU_DIR=/opt/osu
    OSU_VERSION=6.2
    OSU_URL="https://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-$OSU_VERSION.tar.gz"

    mkdir -p /tmp/osu
    # Download OSU
    cd /tmp/osu \
    && wget -O osu-micro-benchmarks-$OSU_VERSION.tar.gz $OSU_URL \
    && tar -xzf osu-micro-benchmarks-$OSU_VERSION.tar.gz

    # Compile and install OSU
    cd /tmp/osu/osu-micro-benchmarks-$OSU_VERSION \
    && ./configure --prefix=$OSU_DIR CC=/opt/ompi/bin/mpicc CXX=/opt/ompi/bin/mpicxx \
    && make -j8 && make install

    rm -rf /tmp/osu
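Once the build completes, the image can be given a quick smoke test before being used in a batch job. The following is a sketch (the `apptainer exec` lines are commented out so they only run where Apptainer and the image are available); the paths come from the definition file above.

```shell
# Quick smoke test of the built image; paths match the definition file above.
img="openmpi_example_${BB_CPU:-unknown}.sif"
echo "Image to test: $img"
# Uncomment on BlueBEAR, where Apptainer and the image are available:
# apptainer exec "$img" /opt/ompi/bin/mpirun --version
# apptainer exec "$img" ls /opt/osu/libexec/osu-micro-benchmarks/mpi/collective
```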

This batch submission script loads an OpenMPI module built specifically for use with containers. It then uses mpirun to launch an apptainer exec command, in this case running the osu_alltoall benchmark that was built in the Apptainer definition file above. Note the various options that are passed to mpirun.

Submit the job using the following command:

sbatch job.sh

job.sh

#!/bin/bash

#SBATCH --ntasks=20
#SBATCH --nodes=2
#SBATCH --time=10

set -e

export APPTAINER_IMAGE="_PATH_TO_IMAGE_"

module purge; module load bluebear
# Load an OpenMPI-container module
# N.B. The version must match the version you are using in the container image
module load bear-apps/2022a 
module load OpenMPI-container/4.1.4

export APPTAINER_BIND=$APPTAINER_BIND,$BB_EL8_CONTAINER_SYS_BINDS

unset OMPI_MCA_btl
unset OMPI_MCA_mtl

mpirun \
    --mca btl_openib_allow_ib 1 \
    --mca orte_base_help_aggregate 0 \
    --mca btl_vader_single_copy_mechanism none \
    -np $SLURM_NTASKS \
    apptainer exec ${APPTAINER_IMAGE} /opt/osu/libexec/osu-micro-benchmarks/mpi/collective/osu_alltoall
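Since job.sh contains a placeholder image path, a small pre-submission guard can catch the common mistake of leaving `_PATH_TO_IMAGE_` unedited. A sketch, reusing the placeholder from the script above:

```shell
# Guard: refuse to submit while the placeholder path is still in job.sh.
APPTAINER_IMAGE="_PATH_TO_IMAGE_"   # as it appears in the unedited script

if [ "$APPTAINER_IMAGE" = "_PATH_TO_IMAGE_" ] || [ ! -e "$APPTAINER_IMAGE" ]; then
    echo "Edit APPTAINER_IMAGE in job.sh to point at your .sif image" >&2
    submit_ok=no
else
    submit_ok=yes
    # sbatch job.sh
fi
echo "submit_ok=$submit_ok"
```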