
I would like to do some simple fine-tuning on a transformers model using a single GPU on a server via SLURM. I haven't used SLURM before and I am not a computer scientist so my understanding of the field is a bit limited. I have done some research and created the script below.

Could you please confirm if it is fit for purpose?

As far as I understand, a node corresponds to a single computer and "--gres=gpu:1" will request a single GPU. The only thing I haven't understood clearly is "ntasks-per-node". Since I will run a single Python script, I believe this can be set to 1. Is that correct?

#!/bin/bash

#SBATCH --job-name=SQuAD
#SBATCH --output=squad_job%J.out
#SBATCH --error=squad_error%J.err
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=normal
#SBATCH --time=72:00:00

python3 fine_tune_squad.py

1 Answer
Yes, it will request 1 GPU for the task. As described in the sbatch documentation:

The default is one task per node [...]

Therefore, the default value for --ntasks-per-node is already 1, which means you don't even need to define it. In fact, --nodes also has a default value of 1. Nonetheless, some consider it good practice to define them explicitly to avoid surprises, so I'd leave them as you did.
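For reference, relying on those defaults, a minimally equivalent script could look like the sketch below (assuming your cluster actually has a partition named normal; partition names vary between sites, so check with sinfo):

```shell
#!/bin/bash
# --nodes and --ntasks-per-node are omitted: both default to 1
#SBATCH --job-name=SQuAD
#SBATCH --output=squad_job%J.out
#SBATCH --error=squad_error%J.err
#SBATCH --gres=gpu:1
#SBATCH --partition=normal
#SBATCH --time=72:00:00

python3 fine_tune_squad.py
```

You would submit it with sbatch fine_tune_squad.sh and can monitor it with squeue -u $USER; the %J in the output/error filenames is replaced by the job ID, so each run writes to its own files.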
