I would like to do some simple fine-tuning of a transformers model using a single GPU on a server via SLURM. I haven't used SLURM before, and I am not a computer scientist, so my understanding of the field is a bit limited. I have done some research and put together the script below.
Could you please confirm if it is fit for purpose?
As far as I have understood, a node corresponds to a single computer, and "--gres=gpu:1" requests a single GPU. The only thing I haven't understood clearly is "ntasks-per-node". The way I have understood it, because I will be running a single Python script, this can be equal to 1. Is that correct?
#!/bin/bash
#SBATCH --job-name=SQuAD
#SBATCH --output=squad_job%j.out
#SBATCH --error=squad_error%j.err
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=normal
#SBATCH --time=72:00:00
python3 fine_tune_squad.py
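In case it helps, I was also thinking of adding a small sanity check near the top of fine_tune_squad.py to confirm that the job actually sees exactly one GPU (this assumes the script uses PyTorch; the rest of the fine-tuning code is omitted here):

import torch

# With --gres=gpu:1 I would expect exactly one visible device
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())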