Nathan Wood


Setting Up the Matlab Python Engine on an HPC Cluster

1. Introduction

Matlab is a proprietary yet powerful numerical analysis software suite that is used across a variety of fields, from acoustics to zymography. Matlab is particularly famous for its ease of use and its strengths in signal processing and matrix/vector operations.

Python is a free and open source general purpose programming language with a permissive license. It is often employed for intensive data wrangling, machine learning, and automation tasks. Because it is interpreted, the same code generally runs across operating systems as long as an interpreter built for the target platform is available. Much like Matlab, it has numerical analysis and machine learning packages.

Some projects take advantage of both tools by using the MATLAB Engine API for Python, which is distributed as the Python package matlabengine. matlabengine is an application programming interface (API) that allows Matlab functions to be executed from a Python environment, with data passed between the two. In order to use matlabengine, you will need a licensed copy of Matlab.

On a typical graphical workstation environment, setting up Matlab, Python, and matlabengine is rather straightforward. However, some projects are resource intensive and must be conducted remotely using a high performance computing (HPC) environment such as a cluster. As a user of an HPC cluster, it is rather unlikely that you have superuser privileges, so you must work within your privileges as a user to set up the appropriate Python environment and connect matlabengine to the institution’s Matlab license(s).

This post is a brief tutorial describing how I was able to get matlabengine to communicate with Matlab R2024a on the Grace cluster at the Texas A&M High Performance Research Computing (HPRC) center. This post assumes you are accessing the cluster through SSH.

2. Creating the Conda Environment for Your Scripts

You should first create the necessary conda environment for your scripts/programs. On a workstation it's rather straightforward. However, on an HPC cluster, you may have a limited /home/user directory and a much larger /scratch/user directory, as is the case for the TAMU HPRC. To create a conda environment in a specific directory, run the following:

mkdir -p /scratch/user/$USER/.conda
mamba create --prefix /scratch/user/$USER/.conda/$directoryName

Please note that the /scratch/user directories are typically not long term storage, so please either back up your environment or export its specification to a *.yaml configuration file (for example with mamba env export).

Make sure you also install every package your scripts need into this environment. You will need to point to the conda environment again when writing a batch submission script.
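
Because the environment lives in a non-default directory, it is easy to end up running a different interpreter than intended. The following is a minimal sketch of a check that compares the running interpreter's prefix to the directory given to --prefix. The function name running_in_env and the example directory myenv are illustrative assumptions, not part of any conda tooling.

```python
import os
import sys


def running_in_env(expected_prefix, actual_prefix=None):
    """Return True if the interpreter's prefix is the expected environment directory."""
    if actual_prefix is None:
        actual_prefix = sys.prefix  # root directory of the active Python installation
    # realpath resolves symlinks so that equivalent paths compare equal
    return os.path.realpath(actual_prefix) == os.path.realpath(expected_prefix)


if __name__ == "__main__":
    # Hypothetical location; substitute the directory you passed to --prefix.
    expected = os.path.expandvars("/scratch/user/$USER/.conda/myenv")
    print("interpreter prefix:", sys.prefix)
    print("matches expected environment:", running_in_env(expected))
```

Running this at the top of a batch job is a cheap way to confirm that the activation step actually took effect before any long computation starts.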

3. Setting Up Matlab and Matlab Engine

3.1 Figure Out Which Matlab Version You Will Use

HPC clusters typically do not make all of their installed software immediately available. A cluster has to maintain multiple versions of the same software for all of its users, and those versions would otherwise clash with the dependencies of other software at some point. To manage this, most HPC clusters have a module system such as Environment Modules or Lmod. The TAMU HPRC system uses Lmod.

To find which Matlab versions are available on the HPC cluster, use the Lmod command module spider Matlab. Please note that this command is case sensitive. The spider command will list all of the software that matches the name provided. If you wish to learn more about a listed package, you can use module show Matlab.

Once you are sure of a specific package, you can load it using module load Matlab/Ryyyyx. Make sure that this command is included in any batch submission scripts that are used.

If you want to confirm that Matlab is properly loaded in an SSH session without X11 forwarding, you can use the command matlab -nodisplay -nosplash, which should start the Matlab interpreter environment. You can also check the license number using the Matlab command license, and the status of said license using license('inuse').

3.2 Figure Out Which Matlab Engine Version You Will Use

A table outlining which version of Matlab Engine corresponds to each Matlab release can be found here. You will then want to add the matching version to your Python environment. This is accomplished using the command pip install matlabengine==$version.
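
After installing, it can help to confirm which matlabengine version actually ended up in the environment, so it can be compared against the compatibility table. The sketch below uses the standard library's importlib.metadata; the helper name installed_version is an illustrative assumption.

```python
from importlib import metadata


def installed_version(package="matlabengine"):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("matlabengine is not installed in this environment")
    else:
        print("matlabengine version:", version)
```

This only reads package metadata, so it works even when Matlab itself cannot be started yet.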

3.3 Making Sure Everything Works

Once everything is installed and configured, open an interactive python3 session and execute the following:

import matlab.engine
matlab.engine.start_matlab()

If the Matlab engine is properly installed and is communicating with Matlab, there should be no issue. Chances are that on an HPC cluster (at least at Texas A&M) the execution will immediately terminate with an error mentioning that the Matlab license cannot be accessed. The rest of this guide is dedicated to figuring out how to properly communicate with Matlab on the cluster.
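
For use inside a script rather than an interactive session, the same check can be wrapped so that a failure is reported instead of ending the run. This is a minimal sketch: the function name check_engine is an assumption, and it only helps when the failure surfaces as a Python exception. If Matlab aborts the whole process, nothing here will catch it.

```python
def check_engine():
    """Try to start Matlab and run one call; return (ok, message)."""
    try:
        import matlab.engine  # provided by the matlabengine package
    except Exception as exc:  # missing package or libraries that cannot be located
        return False, f"matlabengine could not be imported: {exc}"

    try:
        eng = matlab.engine.start_matlab()
    except Exception as exc:  # a license failure may surface here
        return False, f"Matlab failed to start: {exc}"

    try:
        result = eng.sqrt(4.0)  # Matlab's sqrt, called through the engine
    except Exception as exc:
        return False, f"Matlab started but the test call failed: {exc}"
    finally:
        eng.quit()  # always release the Matlab session and its license
    return True, f"sqrt(4.0) returned {result}"


if __name__ == "__main__":
    ok, message = check_engine()
    print("OK" if ok else "FAILED", "-", message)
```

Calling eng.quit() in the finally clause matters on a shared license server, since a lingering session keeps a license checked out.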

4. Diagnosing the License Shutdown Error

Python and Matlab are not communicating properly because the Matlab software license cannot be found. You will need to figure out where and how Matlab is executed. This is accomplished using which matlab, which should return a startup script in an isolated directory. At Texas A&M, this directory is /sw/hprc/sw/Matlab/. The particular startup script is /sw/hprc/sw/Matlab/bin/matlab, and it can be viewed using less. This script contains conditional statements to load the proper Matlab version, as well as comments relating to changes that systems administrators have made. Towards the bottom there will be mention of the following:

echo "...Starting Matlab from host: $host"
${MATLABROOT}/bin/matlab -c 27000@modulus.tamu.edu
echo "Done"

These lines specify that Texas A&M HPRC uses a license server, known as modulus, to distribute Matlab licenses on an as-needed basis. The server modulus.tamu.edu is accessed via port 27000 as declared by the statement 27000@modulus.tamu.edu.
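
The manual inspection above can be sketched in Python: locate the launcher on PATH, search it for a -c port@host argument, and try a plain TCP connection to that port. The function names and the regular expression are assumptions based on the launcher excerpt shown here, so other sites may need a different pattern. A successful connection only shows that something is listening on the port, not that a license will be granted.

```python
import re
import shutil
import socket

# Matches a launcher argument of the form: -c 27000@license.host.name
LICENSE_FLAG = re.compile(r"-c\s+(\d+)@([\w.-]+)")


def find_license_server(script_text):
    """Return (port, host) from the first '-c port@host' found, else None."""
    match = LICENSE_FLAG.search(script_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


if __name__ == "__main__":
    launcher = shutil.which("matlab")  # same lookup as `which matlab`
    if launcher is None:
        print("matlab is not on PATH; load the Matlab module first")
    else:
        with open(launcher, errors="replace") as handle:
            server = find_license_server(handle.read())
        print("launcher:", launcher)
        print("license server (port, host):", server)
        if server is not None:
            port, host = server
            print("reachable:", can_connect(host, port))
```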

Texas A&M likely uses a license management service such as FlexLM to distribute the Matlab licenses on HPC instances. The name Modulus is most likely just a reference to the arithmetic operation that is often used in sanity checking. It is likely, based on documentation (1, 2), that Matlab is typically executed entirely from the SSH client, through a forwarded graphical environment via VNC, or through a specialized submission script. Because of this, it is rather cumbersome to activate Matlab through matlabengine.

To activate matlabengine through a traditional submission script that will execute a Python script, include the following statements in the submission script:

export LD_LIBRARY_PATH="/sw/hprc/sw/Matlab/R2024a/sys/os/glnxa64:$LD_LIBRARY_PATH"
export MLM_LICENSE_FILE="27000@modulus.tamu.edu"

These statements point to the correct Linux x86-64 Matlab libraries and to the Modulus license server on port 27000.
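
Both variables need to be in place before the Python interpreter starts, since the dynamic loader reads LD_LIBRARY_PATH when the process is launched. The sketch below is one way a Python script could verify this at startup and fail early with a readable message. The function name missing_settings is an assumption, and the paths are the TAMU-specific values from this post.

```python
import os


def missing_settings(env, matlab_lib_dir, license_spec):
    """Return a list of human-readable problems; an empty list means all is set."""
    problems = []
    # Both variables may hold several colon-separated entries
    lib_dirs = env.get("LD_LIBRARY_PATH", "").split(":")
    if matlab_lib_dir not in lib_dirs:
        problems.append(f"LD_LIBRARY_PATH does not contain {matlab_lib_dir}")
    license_entries = env.get("MLM_LICENSE_FILE", "").split(":")
    if license_spec not in license_entries:
        problems.append(f"MLM_LICENSE_FILE does not contain {license_spec}")
    return problems


if __name__ == "__main__":
    issues = missing_settings(
        os.environ,
        "/sw/hprc/sw/Matlab/R2024a/sys/os/glnxa64",  # site-specific library path
        "27000@modulus.tamu.edu",  # site-specific license server
    )
    if issues:
        for issue in issues:
            print("WARNING:", issue)
    else:
        print("environment looks ready for matlabengine")
```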

5. Summary

Hopefully this brief guide clarified how to do the following on an HPC cluster:

  1. Establish a conda environment in a custom directory
  2. Find and load the appropriate Matlab suite
  3. Execute the Python Matlab Engine by pointing to where the license server is located

6. Example SLURM Script

#!/bin/bash
  
#SBATCH --job-name=ilee-woodn-dummy-matlab
#SBATCH --time=08:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --gres=gpu:rtx:1
#SBATCH --output=stdout.%j
#SBATCH --error=stderr.%j
#SBATCH --mail-type=ALL
#SBATCH --mail-user=email@address


# load requisites
module purge
module load CUDA/12.2.0
module load Mamba/23.11.0-0
module load Matlab/R2024a

# bash profile stuff
source /home/woodn/.bashrc
export LD_LIBRARY_PATH="/sw/hprc/sw/Matlab/R2024a/sys/os/glnxa64:$LD_LIBRARY_PATH"
export MLM_LICENSE_FILE="27000@modulus.tamu.edu"

# activate conda environment
mamba activate /scratch/user/woodn/.conda/ilee

# execute script
python3 /scratch/user/woodn/ilee-3d/ilee-3d-mat.py

# end by unloading requisites
module purge