I'm struggling to understand how to run multiple processes on the same node with SLURM.
Suppose I want to run a program with 100 different input arguments. On my laptop, for example, I would simply do:
for i in `seq 100`; do
    ./program ${i}
done
Now I have access to a cluster with 24-core nodes, so I want to run all 100 instances at the same time, spread over 5 nodes: 24 on each of the first 4 nodes, plus the remaining 4 on a 5th node.
I thought the submit script should look like this:
#!/bin/bash
#SBATCH -N 5
#SBATCH -n 100
#SBATCH --ntasks-per-node=24

for i in `seq 100`; do
    srun ./program ${i} &
done
wait
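I submit this with sbatch (e.g. sbatch job.sh -- the filename here is just an example).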
It turns out that, with this submit script, ./program is run multiple times for every value of i, even though srun is called only once per loop iteration.
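My guess is that each srun call needs to be restricted to a single task on a single node, maybe something like the sketch below, but I haven't been able to confirm that the --exclusive, -N1 and -n1 flags are the right approach here:

for i in `seq 100`; do
    # launch each instance as its own single-task, single-node job step
    srun --exclusive -N1 -n1 ./program ${i} &
done
wait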
What is going on? What is the right way to do this?