Slurm Documentation

Running jobs on Olimp cluster:

On Olimp we are introducing the concept of partitions. Partitions are similar to queues in Torque.

We offer three different partitions:

- suspend: meant for jobs that eat up lots of processing power and next to none of the memory. A job in this partition is sent SIGSTOP whenever a job is started in any other partition on the same node, and SIGCONT as soon as that job is done.

- rude: the name of this partition says it all. Jobs in this partition use their elbows to get to the top of the cumulative queue; suspended and cancelled jobs from the aforementioned partition lie in the wake of their success. Use it only if you need to. Due to abuse in the past, only 288 threads are available.
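To see which partitions exist, their time limits and the current node states, you can ask Slurm directly (a minimal sketch; what gets listed depends on the cluster's current configuration):

> sinfo
> sinfo -p rude --long

The first command lists all partitions with their limits and node states; the second shows more detail for a single partition.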

Summary of SLURM commands, with examples:

Run a script interactively with srun:
> srun --pty -p rude -t 10 --mem 1000 --qos=rude /bin/hostname

Kill a job with scancel:
> scancel 999999

View the status of the queues with squeue:
> squeue -u blaz

Check a job by its id with sacct:
> sacct -j 999999

Submit a batch script with sbatch:
> sbatch -p suspend --qos=suspend random.sh
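A couple of variations on the commands above that often come in handy (a sketch; the job id and user name are placeholders):

> sacct -j 999999 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS
> scancel -u blaz

The --format option lets you pick the accounting fields yourself, and scancel -u cancels all of your own jobs at once.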

A couple of examples of job submission:

Requesting a single node, partition suspend (lowest priority; the QOS has the same name as the partition), a day and 2 hours of computing time, 50000 MB of memory, a specific node, 24 threads, and exclusive access to the node (other jobs will not be able to share it). The output and error log file names are specified; if this script is submitted multiple times, the output is appended to the log files.

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --partition=suspend
#SBATCH --qos=suspend
#SBATCH --time=1-2:00:00
#SBATCH --mem=50000
#SBATCH --nodelist=node01
#SBATCH --cpus-per-task 24
#SBATCH --error=/home/blaz/job.err 
#SBATCH --output=/home/blaz/job.out
#SBATCH --open-mode=append
#SBATCH --exclusive


stress-ng --cpu 24  --timeout 1h --metrics-brief
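Assuming the script above is saved as, say, suspend_stress.sh (the file name is just an example), it can be submitted and followed like this:

> sbatch suspend_stress.sh
> squeue -u blaz
> tail -f /home/blaz/job.out

sbatch prints the id of the new job; squeue shows its state (PD pending, R running, S suspended) and tail follows the appended output log.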

Asking for 4 tasks distributed across 2 nodes, running for no longer than 30 minutes, and running the (non-MPI) program "myprogram". I would like exclusive access to the nodes, i.e. I do not wish to share them during use.

#!/bin/bash

#SBATCH --ntasks 4
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:30:00
#SBATCH --partition=suspend
#SBATCH --qos=suspend
#SBATCH --exclusive

srun ./myprogram
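To check how the four tasks are distributed before running the real program, you can temporarily replace ./myprogram with a trivial command (a sketch; the order of the lines may vary):

srun /bin/hostname

Each of the two node names should be printed twice, confirming two tasks per node.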

Submit the program from the command-line with:

sbatch -p <partition_name> <my_jobscript>
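Options given on the command line override the matching #SBATCH directives in the script, so the same job script can be pushed into another partition without editing it (a sketch; myjobscript.sh is a placeholder):

> sbatch -p rude --qos=rude --time=00:10:00 myjobscript.sh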

Running two executables per node (two serial jobs).

The scripts job1.batch and job2.batch could be very simple scripts, each containing only a line with "./my_program". If you wish to rename the output and error files and get them in separate files, you can do that with the --error and --output directives; the %J in the file names is replaced with the job id, so repeated submissions do not overwrite each other's logs:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 2
#SBATCH --time=00:30:00 
#SBATCH --error=job.%J.err 
#SBATCH --output=job.%J.out
#SBATCH --partition=suspend
#SBATCH --qos=suspend


# Use '&' to move the first job to the background
srun --ntasks 1 ./job1.batch &
srun --ntasks 1 ./job2.batch

# Use 'wait' as a barrier to collect both executables when they are done.
wait
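If the two steps should also write to separate per-step log files, the --output and --error options can be passed to srun itself instead of (or in addition to) the #SBATCH directives (a sketch):

srun --ntasks 1 --output=job1.%J.out --error=job1.%J.err ./job1.batch &
srun --ntasks 1 --output=job2.%J.out --error=job2.%J.err ./job2.batch
wait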

Running an MPI job

We can use Slurm's output environment variables to write information about the run to a log file. Requesting 2 nodes, 48 cores, 1 hour, all of the available memory, and exclusive access.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --time=01:00:00
#SBATCH --mem=127000
#SBATCH --cpus-per-task 2
#SBATCH --output=random_job.%J.out
#SBATCH --partition=suspend
#SBATCH --qos=suspend
#SBATCH --exclusive

echo "Starting at `date`"
echo "Running on hosts: $SLURM_NODELIST"
echo "Running on $SLURM_NNODES nodes."
echo "Running on $SLURM_NPROCS processors."
echo "Current working directory is `pwd`"

srun --ntasks-per-node=24 ./mpi_largemem
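A few more output variables that are often worth logging (a non-exhaustive sample; which of them are set depends on the options you asked for):

echo "Job id:            $SLURM_JOB_ID"
echo "Number of tasks:   $SLURM_NTASKS"
echo "CPUs per task:     $SLURM_CPUS_PER_TASK"
echo "Submit directory:  $SLURM_SUBMIT_DIR"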

More Slurm output environment variables can be found HERE.

More about sbatch options can be found HERE.

You can always run one of the following commands on Olimp if you wish to learn more:

man sbatch
man srun
man squeue
man sinfo
man scancel
man sacct