I would like to do some simple fine-tuning of a transformers model using a single GPU on a server via SLURM. I haven't used SLURM before, and I am not a computer scientist, so my understanding of the field is a bit limited. I have done some research and put together the script below.
Could you please confirm if it is fit for purpose?
As far as I have understood, a node corresponds to a single computer, and "--gres=gpu:1" requests a single GPU. The only thing I haven't understood clearly is "ntasks-per-node". The way I have understood it, because I will be running a single Python script, this can be equal to 1. Is that correct?
#!/bin/bash
#SBATCH --job-name=SQuAD
#SBATCH --output=squad_job%j.out
#SBATCH --error=squad_error%j.err
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=normal
#SBATCH --time=72:00:00
python3 fine_tune_squad.py
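In case it helps, I was also thinking of adding a small sanity check near the top of fine_tune_squad.py to confirm that the job actually sees exactly one GPU (this assumes the script uses PyTorch; the rest of the fine-tuning code is omitted here):

import torch

# With --gres=gpu:1 I would expect exactly one visible device
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())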