Parallel / MPI Job
Running a simple MPI test job
This job is very simple: it just prints a "hello world" message from every core it runs on.
Sample job submission scripts are available in the following path: /opt/examples/slurm.
In this example we will copy one of these to our home directory and submit it.
CODE
$ cp /opt/examples/slurm/mpi-job.sh ~/mpi-job.sh
New sbatch options for parallel jobs
Argument | Default | Description
---|---|---
--ntasks=X | 1 | The number of cores to run the job on. This picks X cores from a set of machines; it does not guarantee placement of those tasks on specific nodes.
--ntasks-per-node=X | none | The number of tasks per node (use in combination with --nodes).
--nodes=X | none | The number of nodes. For example, --nodes=2 --ntasks-per-node=10 would run on 20 cores total, 10 on each node (see the sketch below).
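The last two options are typically used together. A minimal sketch of a script requesting explicit placement across nodes, reusing the partition, module, and example binary from the job script below:
CODE
#!/bin/bash
#SBATCH --partition=hsw-fdr
#SBATCH --time=01:00:00
# 2 nodes x 10 tasks per node = 20 cores total
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=10
module load mvapich2-2.2a/gcc
mpirun /opt/examples/mpihello/mpihello-mvapich2-2.2a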
Job Script
CODE
#!/bin/bash
# Which partition/queue does this job need to run in? The default is 'hsw-fdr'
#SBATCH --partition=hsw-fdr
# How long does my job have to run (HH:MM:SS)? Without this option the
# limit is 5 minutes
#SBATCH --time=01:00:00
# How many cores should I run my job on? For MPI jobs this should be the
# number you'd pass to mpirun (i.e. mpirun -np X). If not specified the
# default is 1
#SBATCH --ntasks=60
# This is the memory needed per task (see above). If not specified you will
# get 3GB of RAM per CPU.
#SBATCH --mem-per-cpu=1G
# The descriptive name for your job. This may be visible to other
# users on ACTnowHPC
#SBATCH --job-name=mpi_test
# The name of the file to write stdout/stderr to. Use %j as a placeholder
# for the current job number
#SBATCH --output=mpi_test-%j.out
# load the MPI version your code was compiled with
module load mvapich2-2.2a/gcc
# issue the mpirun command with my binary; no need for the -np option, as the
# scheduler takes care of that for you
mpirun /opt/examples/mpihello/mpihello-mvapich2-2.2a
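To run your own program instead of the bundled example binary, compile it against the same MPI module the script loads. A minimal sketch, where mpihello.c is a hypothetical source file and mpicc is the compiler wrapper shipped with the MPI module:
CODE
$ module load mvapich2-2.2a/gcc
$ mpicc -o ~/mpihello ~/mpihello.c
Then point the mpirun line in the script at ~/mpihello.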
Options used in this job submission script
Option | Description
---|---
#SBATCH --partition=hsw-fdr | Run in the hsw-fdr partition
#SBATCH --time=01:00:00 | Run for 1 hour
#SBATCH --ntasks=60 | Run the job on 60 cores
#SBATCH --mem-per-cpu=1G | I will need 1GB of RAM per CPU/core for my job (in this example, 60GB total)
#SBATCH --job-name=mpi_test | I'm naming my job "mpi_test"
#SBATCH --output=mpi_test-%j.out | Write all the output to a file called mpi_test-JOBID.out
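Most of these options can also be passed to sbatch on the command line, where they override the matching #SBATCH directives in the script. An illustrative invocation (the values here are arbitrary):
CODE
$ sbatch --ntasks=120 --time=02:00:00 mpi-job.sh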
To run the job, issue the following command:
CODE
$ sbatch mpi-job.sh
Submitted batch job 1521
Check the status of the job
Check the status with squeue (see Basic SLURM commands for more information).
CODE
$ squeue --job 1521
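While the job is pending or running, squeue prints one line per job. The output below is only illustrative; the node list, user, and timings will differ on your system:
CODE
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   1521   hsw-fdr mpi_test  someone  R       0:12      3 node[001-003]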