Self-installing Python Modules for BlueBEAR

Note

The term “module” in this context refers to an extension to Python’s functionality that you use by including, for example, import flake8 in your Python code.

These are the most commonly used methods for installing Python modules:

  • pip install flake8
  • python setup.py install

Python module installation process

Where a Python module is available on the Python Package Index (PyPI), it can be installed using pip, the Python package installer. The default pip install command will not work on BlueBEAR because users do not have permission to write to the directory where pip normally places Python modules. You can pass the --user option so that pip installs into your home directory instead, but this is problematic: a home-directory installation does not distinguish between node types, so your jobs may subsequently fail.

We therefore recommend that you use a node-specific Python virtual environment. This solution applies to both the pip install method and the python setup.py install method.

The process for creating and using a node-specific virtual environment is as follows:

Creating a virtual environment and installing a Python module

  1. Load the BEAR Python module on which you want to base your virtual environment.

    • Optional: load any additionally required modules, e.g. Matplotlib, SciPy-bundle etc. (See the tips section for further details.)
  2. Change to the directory in which you want to create the virtual environment. (Alternatively you can specify the full path in the following step.)

  3. Create a virtual environment, including the environment variable ${BB_CPU} in its name to identify the node-type:

    python3 -m venv --system-site-packages my-virtual-env-${BB_CPU}
    
  4. Activate the virtual environment:

    source my-virtual-env-${BB_CPU}/bin/activate
    
  5. Run your Python module installations as normal (N.B. don’t include --user):

    pip install flake8
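
After activating the environment, it can be useful to confirm that Python really is running inside it before installing anything. The following is a minimal sketch, not part of the BlueBEAR tooling: the function name in_virtual_env is made up for illustration, and the check relies only on the standard sys.prefix and sys.base_prefix attributes.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_env())
print("sys.prefix:     ", sys.prefix)
print("sys.base_prefix:", sys.base_prefix)
```

If the first line reports False, the activate step has most likely not been run in the current shell.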
    

Using your node-specific virtual environment

  1. First load the same BEAR Python module that you used to create the virtual environment in the previous step. This is important: otherwise your Python commands will likely fail.
  2. Activate the virtual environment:

    source my-virtual-env-${BB_CPU}/bin/activate
    
  3. Execute your Python code.
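
Before running a longer job, you may want to check where Python would load a given module from. The sketch below assumes nothing beyond the standard library; module_location is a hypothetical helper name, and flake8 is used only because it is the example package on this page.

```python
import importlib.util


def module_location(name: str):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "json" ships with Python; "flake8" is only found if it has been installed.
for name in ("json", "flake8"):
    print(f"{name}: {module_location(name)}")
```

A path under your my-virtual-env-${BB_CPU} directory suggests the module comes from the virtual environment; a path elsewhere suggests it is provided by a loaded BEAR module or another location on sys.path.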

Example script

All of the above steps can be encapsulated in a script, which can be included as part of the batch script that you submit to BlueBEAR:

#!/bin/bash
set -e

module purge; module load bluebear
module load bear-apps/2021b
module load Python/3.9.6-GCCcore-11.2.0

export VENV_DIR="${HOME}/virtual-environments"
export VENV_PATH="${VENV_DIR}/my-virtual-env-${BB_CPU}"

# Create a master venv directory if necessary
mkdir -p "${VENV_DIR}"

# Check if virtual environment exists and create it if not
if [[ ! -d "${VENV_PATH}" ]]; then
    python3 -m venv --system-site-packages "${VENV_PATH}"
fi

# Activate the virtual environment
source "${VENV_PATH}/bin/activate"

# Perform any required pip installations. For reasons of consistency we would recommend
# that you define the version of the Python module – this will also ensure that if the
# module is already installed in the virtual environment it won't be modified.
pip install flake8==6.0.0

# Execute your Python scripts
python my-script.py
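
If your own Python code needs to locate the node-specific environment (for example to log which one a job used), the same naming scheme can be reproduced in Python. This is only a sketch under stated assumptions: venv_path is a hypothetical helper, the directory and environment names mirror the example script above, and "icelake" is a placeholder node-type label rather than a confirmed BB_CPU value.

```python
import os
from pathlib import Path


def venv_path(base_dir, name="my-virtual-env", node_type=None) -> Path:
    """Build the node-specific venv path, e.g. <base_dir>/my-virtual-env-<node type>."""
    # Fall back to the BB_CPU environment variable when no node type is given.
    node_type = node_type or os.getenv("BB_CPU")
    if not node_type:
        # Fail loudly rather than silently building a path ending in "-None".
        raise RuntimeError("BB_CPU is not set; cannot determine the node type")
    return Path(base_dir) / f"{name}-{node_type}"


# Placeholder node type passed explicitly so the example runs anywhere.
print(venv_path(Path.home() / "virtual-environments", node_type="icelake"))
```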

Removing user-wide Python modules

If you have installed Python modules using pip install without a virtual environment of the kind described above, you may experience a variety of issues.
For example, a pip install performed against our Python/3.9.6 module will have installed content into the following directory: ${HOME}/.local/lib/python3.9/site-packages

Our recommendation is to remove all Python directories located in ~/.local/lib by executing the following command:

rm -r "${HOME}/.local/lib/python"*
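
Before removing anything, you may wish to see what is actually installed user-wide. The following sketch uses the standard site module to report the user site-packages directory for whichever Python version runs it, and lists its contents; it only reads the directory and deletes nothing.

```python
import site
from pathlib import Path

# Directory that `pip install --user` targets for the running interpreter.
user_site = Path(site.getusersitepackages())
print("user site-packages:", user_site)

if user_site.is_dir():
    # Show what is installed there before deciding to remove anything.
    for entry in sorted(p.name for p in user_site.iterdir()):
        print("  ", entry)
else:
    print("no user-wide installation found for this Python version")
```

Note that this reports the directory for one Python version only, whereas the rm command above removes the directories for all versions.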

User virtual envs and BEAR Portal’s JupyterLab app

The process for using Python extensions installed in a virtual environment within a Python kernel running on the BEAR Portal JupyterLab app is summarised below.

Warning

BEAR Portal Interactive Apps cannot be constrained to a specific node-type so you will need to create multiple virtual environments (one for each node-type) by passing constraints in your sbatch script. See here for more information.
You will then need to pass the ${BB_CPU} environment variable in the following process, where required.

Process

  1. Start a JupyterLab Interactive App session on BEAR Portal, being sure to match the kernel to the Python version against which you created the virtual environment.

    Note

    A mismatch between the virtual environment’s Python version and the running kernel’s Python version will likely result in errors.

  2. Once connected to the JupyterLab server, load any additional modules that were also present when you created the venv.

  3. Launch a notebook (or shut down and restart the kernel for an already-running notebook).
  4. Within your running notebook, copy the following code (modifying paths where necessary) into a cell and execute it to insert your virtual environment’s site-packages path at the front of the running kernel’s sys.path.

    import os
    from pathlib import Path
    import sys
    node_type = os.getenv('BB_CPU')
    venv_dir = f'/path/to/venv-{node_type}'  # edit this line to match the venv directory format
    venv_site_pkgs = Path(venv_dir) / 'lib' / f'python{sys.version_info.major}.{sys.version_info.minor}' / 'site-packages'
    if venv_site_pkgs.exists():
        sys.path.insert(0, str(venv_site_pkgs))
    else:
        print(f"Path '{venv_site_pkgs}' not found. Check that it exists and/or that it exists for node-type '{node_type}'.")
    

Subsequent Python import statements will now search in your virtual environment’s path before any others.
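
The effect of inserting a path at the front of sys.path can be illustrated without a real virtual environment. The sketch below is purely a demonstration: it creates two temporary directories that each contain a throwaway module with the hypothetical name bb_demo_module, places one at the front of sys.path and one at the end, and shows which copy an import picks up.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    # Two copies of the same (hypothetical) module in different directories.
    Path(first, "bb_demo_module.py").write_text("ORIGIN = 'front of sys.path'\n")
    Path(second, "bb_demo_module.py").write_text("ORIGIN = 'end of sys.path'\n")

    sys.path.append(second)    # searched last
    sys.path.insert(0, first)  # searched first, like the venv site-packages above
    try:
        importlib.invalidate_caches()
        demo = importlib.import_module("bb_demo_module")
        origin = demo.ORIGIN
    finally:
        # Undo the changes so the demonstration leaves no trace.
        sys.path.remove(first)
        sys.path.remove(second)
        sys.modules.pop("bb_demo_module", None)

print("module was loaded from the", origin)
```

The copy in the directory inserted at index 0 is the one that gets imported, which is why packages in your virtual environment take precedence over same-named packages elsewhere on the path.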

Tips

  • Install the minimum of what is required. For example, if the Python module that you’re installing has a dependency on Matplotlib, load the relevant BEAR Matplotlib module first and then perform your virtual environment installations.

    Note

    Your virtual environment should add to, and not replace, the Python libraries available via the module loaded.

  • If you switch the Python modules being loaded then you must create a new virtual environment based on these new Python modules.

  • Further to the above tip, you may need to be aware of dependencies’ version constraints. For example, if a Python module needs a newer version of Matplotlib than the one we provide, first check whether BEAR Applications has the later version. If not, see whether you can install an earlier version of the module you require that will work with the BEAR Applications version of Matplotlib. This would be our recommendation, as some Python modules are complex to install. Finally, you can load the BEAR Python module instead of the BEAR Matplotlib module and then install everything yourself, although, as mentioned, this may be difficult depending on the complexity of the modules’ installation processes.
  • We strongly recommend using a BEAR Python module rather than the system Python version. Also, note that we do not recommend the use of Python 2 as it’s no longer supported by the Python developers.
  • Python libraries on PyPI can be either binary packages known as ‘wheels’, which are self-contained and include any compiled code, or source packages, which rely on external dependencies such as compiled C/C++/Fortran libraries and which compile at installation. For the latter, you may find that installing through pip fails and that you need to load additional modules from BEAR Applications before retrying the installation, or that you need to compile the dependencies yourself.
  • Some package authors recommend Python package installation via Anaconda or Miniconda. We do not recommend the use of Python packages via this method on the BlueBEAR cluster and would encourage you to contact us if the package you want to make use of suggests this method of installation.
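
Relating to the tip on dependency version constraints, one way to see which version of a package the loaded modules (plus your virtual environment) currently provide is the standard importlib.metadata module, available from Python 3.8. The helper name installed_version is made up for this sketch, and matplotlib and flake8 are simply the packages mentioned on this page.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for dist in ("matplotlib", "flake8"):
    version = installed_version(dist)
    print(f"{dist}: {version if version else 'not installed'}")
```

Running this after loading your BEAR modules and activating the virtual environment shows the versions your code will actually see, which you can compare against the requirements of the module you want to install.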