Using MPI in Containers
The information below describes how to run MPI jobs using containers on BlueBEAR.
Introduction
To run MPI jobs inside containers on BlueBEAR, you must use the same version of OpenMPI inside and outside the container.
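For example, you can check that the two versions match before submitting a job. A minimal check, assuming the image name and install path used in the example definition file below:

# On BlueBEAR, with the matching OpenMPI-container module loaded:
mpirun --version
# Inside the container, where the example installs OpenMPI to /opt/ompi:
apptainer exec openmpi_example_${BB_CPU}.sif /opt/ompi/bin/mpirun --version

Both commands should report the same Open MPI version.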
Building your own containers
We recommend using Rocky Linux as the base image for building your containers. It is a freely available distribution that closely tracks the Red Hat Enterprise Linux distribution that runs on BlueBEAR.
Note
We will only provide support for container images that use a distribution based on RHEL 8 (e.g. Rocky Linux or AlmaLinux) and an OpenMPI version matching that of the OpenMPI-container module used.
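If you are unsure which distribution an existing image is based on, you can inspect its OS release information inside the container; my_image.sif here is a placeholder for your own image:

apptainer exec my_image.sif cat /etc/os-release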
For further general information on building Apptainer images, please see here. Please also take note of the information on Building Software and Node Types, which also applies to container images.
Example Code
The following tabs provide an example Apptainer definition file and a complementary Slurm batch script. Adapt these scripts as required to run your own MPI jobs.
The following Apptainer definition file compiles OpenMPI inside a container image and then uses the appropriate MPI compiler wrapper (e.g. mpicc) to build the OSU Micro Benchmarks.
To build the example below, execute:
unset APPTAINER_BIND
apptainer build openmpi_example_${BB_CPU}.sif openmpi_example.def
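Here ${BB_CPU} is an environment variable set on BlueBEAR nodes that holds the microarchitecture name of the current node (e.g. icelake), so the suffix records which node type the image was built for. You can check its value with:

echo $BB_CPU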
openmpi_example.def
Bootstrap: docker
From: rockylinux:8.6
%help
Container running BlueBEAR compatible OpenMPI
%environment
export OMPI_DIR=/opt/ompi
export PATH=${PATH}:${OMPI_DIR}/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${OMPI_DIR}/lib
%post
echo "Installing required packages..."
dnf -y update && dnf --enablerepo=powertools -y install \
wget \
git \
tar \
bzip2 \
perl \
gcc \
gcc-c++ \
gcc-gfortran \
make \
file \
rdma-core-devel \
libfabric-devel \
hwloc-libs \
iproute \
net-tools
mkdir -p /opt
### DOWNLOAD AND COMPILE OPENMPI
echo "Installing Open MPI"
OMPI_DIR=/opt/ompi
OMPI_VERSION=4.1.4 # this needs to match the version of the OpenMPI-container module you are using
OMPI_URL="https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-$OMPI_VERSION.tar.bz2"
mkdir -p /tmp/ompi
# Download OpenMPI
cd /tmp/ompi \
&& wget -O openmpi-$OMPI_VERSION.tar.bz2 $OMPI_URL \
&& tar -xjf openmpi-$OMPI_VERSION.tar.bz2
# Compile and install OpenMPI
cd /tmp/ompi/openmpi-$OMPI_VERSION \
&& ./configure --prefix=$OMPI_DIR \
&& make -j8 install
rm -rf /tmp/ompi
### DOWNLOAD AND COMPILE OSU MICRO BENCHMARKS
OSU_DIR=/opt/osu
OSU_VERSION=6.2
OSU_URL="https://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-$OSU_VERSION.tar.gz"
mkdir -p /tmp/osu
# Download OSU
cd /tmp/osu \
&& wget -O osu-micro-benchmarks-$OSU_VERSION.tar.gz $OSU_URL \
&& tar -xzf osu-micro-benchmarks-$OSU_VERSION.tar.gz
# Compile and install OSU
cd /tmp/osu/osu-micro-benchmarks-$OSU_VERSION \
&& ./configure --prefix=$OSU_DIR CC=/opt/ompi/bin/mpicc CXX=/opt/ompi/bin/mpicxx \
&& make -j8 && make install
rm -rf /tmp/osu
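Once the build completes, a quick sanity check is to list the benchmarks installed into the image, using the paths from the definition file above:

apptainer exec openmpi_example_${BB_CPU}.sif ls /opt/osu/libexec/osu-micro-benchmarks/mpi/collective/

You should see osu_alltoall among the installed collective benchmarks.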
This batch submission file loads an OpenMPI module built specifically for use with containers. It then runs mpirun on an apptainer exec command, in this case executing the osu_alltoall benchmark that was built in the Apptainer definition file above. Note the various options that are passed to mpirun.
Submit the job using the following command:
sbatch job.sh
job.sh
#!/bin/bash
#SBATCH --ntasks=20
#SBATCH --nodes=2
#SBATCH --time=10
set -e
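# Set this to the full path of the container image built above,
# e.g. openmpi_example_${BB_CPU}.sif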
export APPTAINER_IMAGE="_PATH_TO_IMAGE_"
module purge; module load bluebear
# Load an OpenMPI-container module
# N.B. The version must match the version you are using in the container image
module load bear-apps/2022a
module load OpenMPI-container/4.1.4
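# Append the system bind paths that BlueBEAR provides for containerised MPI jobs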
export APPTAINER_BIND=$APPTAINER_BIND,$BB_EL8_CONTAINER_SYS_BINDS
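# Unset any MCA transport selections inherited from the environment so that
# the --mca options passed to mpirun below take effect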
unset OMPI_MCA_btl
unset OMPI_MCA_mtl
mpirun \
--mca btl_openib_allow_ib 1 \
--mca orte_base_help_aggregate 0 \
--mca btl_vader_single_copy_mechanism none \
-np $SLURM_NTASKS \
apptainer exec ${APPTAINER_IMAGE} /opt/osu/libexec/osu-micro-benchmarks/mpi/collective/osu_alltoall
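To run your own MPI application instead of the benchmark, keep the same mpirun options and replace the final argument with the path to your executable inside the image. For example, with my_app and its install path as hypothetical placeholders:

mpirun \
--mca btl_openib_allow_ib 1 \
--mca orte_base_help_aggregate 0 \
--mca btl_vader_single_copy_mechanism none \
-np $SLURM_NTASKS \
apptainer exec ${APPTAINER_IMAGE} /opt/my_app/bin/my_app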