I'm struggling to understand how to run multiple processes in the same node using SLURM.

Suppose I want to run a program with 100 different input arguments. This is what I would do on my laptop for example:

for i in `seq 100`; do
  ./program ${i}
done

Now I have access to a cluster with 24-core nodes. So, I want to run 24 instances of the program on 5 nodes (24 on 4 nodes + 4 on a 5th node) at the same time.
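The node count above follows from a ceiling division. A minimal sketch of that arithmetic, where the variable names are illustrative only and not anything Slurm defines:

```shell
#!/bin/bash
total=100          # number of program instances to run
cores_per_node=24  # cores available on each node

# Ceiling division: nodes needed to run every instance at the same time.
nodes=$(( (total + cores_per_node - 1) / cores_per_node ))

# Instances that land on the last node once the other nodes are full.
last=$(( total - (nodes - 1) * cores_per_node ))

echo "nodes=${nodes} last_node_tasks=${last}"
```

For 100 instances on 24-core nodes this gives 5 nodes, with 4 instances on the last one.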

I thought the submit script should look like this:

#!/bin/bash
#SBATCH -N 5
#SBATCH -n 100
#SBATCH --ntasks-per-node=24
for i in `seq 100`; do
  srun ./program ${i} &
done
wait

It turns out that, with this submit script, ./program is run multiple times for every value of i, even though srun is called only once per loop iteration.

What is going on? What is the right way to do this?

2 Answers

By default, srun uses the full allocation it runs in, so here each srun call launches the full 100 tasks. To tell it to use only a single core, you need to run

srun --exclusive --ntasks 1 ...

From the srun manpage:

This option can also be used when initiating more than one job step within an existing resource allocation, where you want separate processors to be dedicated to each job step. If sufficient processors are not available to initiate the job step, it will be deferred. This can be thought of as providing a mechanism for resource management to the job within it's allocation.


3 Comments

Thanks Damien! I guess my misunderstanding comes from confusing job vs. job step vs. task, and node vs. processor vs. CPU. Does the --exclusive option make a node, a processor, or a CPU exclusive to a task, a job, or a job step?
No, in this context, it has no relation with that meaning.
Hmm.. Let me paraphrase. What if I don't add --exclusive to the command? Like this: srun --ntasks 1 ./program $i?

Adding --nodes 1 will get rid of the warnings.

#!/bin/bash
#SBATCH -N 5
#SBATCH -n 100
#SBATCH --ntasks-per-node=24
for i in `seq 100`; do
  srun --exclusive --nodes 1 --ntasks 1 ./program ${i} &
done
wait
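To check what the loop will launch before submitting it, one option is a dry run that prints each job-step command instead of executing it. This is only a sketch: DRY_RUN and launch are hypothetical helpers introduced here, not part of Slurm, and the srun flags are the ones from the script above.

```shell
#!/bin/bash
# DRY_RUN and launch() are hypothetical helpers, not Slurm features.
# DRY_RUN=1 (the default here) prints commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}

launch() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "$@"   # show the command that would be run
  else
    "$@" &      # start the job step in the background
  fi
}

for i in $(seq 100); do
  launch srun --exclusive --nodes 1 --ntasks 1 ./program "${i}"
done
wait   # returns immediately in a dry run; otherwise waits for all job steps
```

With the default setting this prints 100 srun lines, one per value of i, and needs no cluster. Setting DRY_RUN=0 inside a real allocation would launch the steps as in the script above.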
