Usage¶
The machine is made up of processors with 48 cores and 380 GB of memory each, for a total of 768 cores and 6 TB of memory. All computations and code executions must go through the Slurm resource manager:
- either by writing a script that reserves the resources and runs the codes: Configuration du gestionnaire de job
- or by using the interactive session reservation tools: Connexion
Interactive resource reservation¶
Resources can be reserved interactively through Slurm with the srun command. These reservations can be used for any kind of computation, code, library or other tool, and must go through the interactive partition; for example:
$ srun -p interactive -n 24 --cpus-per-task=1 --pty bash -i
...
- it is mandatory to use the interactive partition (-p interactive);
- the other options and their syntax are the same as for the usual sbatch command;
- the interactive job starts in the working directory where the srun command was issued; you can then move to any directory in your account, as illustrated in the sketch below.
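A minimal sketch of such a session (the task count and the my_project directory are illustrative assumptions, not cluster defaults):
$ srun -p interactive -n 4 --cpus-per-task=1 --pty bash -i
$ echo $SLURM_JOB_ID $SLURM_NTASKS   # the usual Slurm environment variables are set inside the session
$ cd ~/my_project                    # hypothetical directory: move wherever you need to work
$ exit                               # leaving the shell releases the reservation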
Slurm¶
Introduction¶
- An introduction to the Slurm workload manager is available at https://slurm.schedmd.com/quickstart.html.
- Cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf
Some job management commands¶
Submit a job described in the script <jobscript>:
$ sbatch <jobscript>
List the submitted jobs:
$ squeue
# or with a different output format
$ squeue -o '%A %8u %12j %3C %8N %2t %12l %12M %23e %10m %5Q'
Cancel a job (referenced by its jobid):
$ scancel <jobid>
Get information about a submitted job:
$ scontrol show job <jobid>
# or
$ sacct -j <jobid> --format=User%20,JobID%20,Jobname%60,partition,state%20,time,start,end,elapsed,MaxRss,MaxVMSize,ReqMem,nnodes,ncpus,nodelist%60,submit%40
Override options on the command line (values given on the command line take precedence over the #SBATCH directives in the script):
$ sbatch -n 96 -t 24:00:00 --qos batch_long ./script_with_small_walltime.sh
Enable e-mail notifications:
$ sbatch --mail-user=homer.simpson@springfield.fr --mail-type=BEGIN,END,FAIL submit.sh
Run a code named ex17 directly via srun:
$ srun -p interactive -n 12 --mem=48G -- mpirun -n 12 ./ex17 -dm_refine 9
MPI computation¶
Several MPI implementations are available and are managed directly through the modules: openmpi, intelmpi (or the Intel module compilers/intel/all_tools_2025.0.0, which loads all the Intel tools).
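A minimal sketch of selecting an implementation through the module system (only the names mpi/openmpi/5.0.5, compilers/gcc/ and compilers/intel/all_tools_2025.0.0 are taken from this page; what module avail actually lists may differ):
$ module avail mpi                               # list the available MPI modules
$ module add compilers/gcc/ mpi/openmpi/5.0.5    # GCC + Open MPI, as in the example script below
# or, for the Intel toolchain:
$ module add compilers/intel/all_tools_2025.0.0  # loads all the Intel tools, including Intel MPI
$ module list                                    # check what is currently loaded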
Example MPI script¶
- requesting 48 processes for 10 minutes
#!/bin/bash
#SBATCH --job-name=LULESH-MPI
#SBATCH --output=LULESH-MPI.log
#SBATCH -n 48
#SBATCH -p calcul
#SBATCH --time=10:00
module add compilers/gcc/ mpi/openmpi/5.0.5
srun ./omp_4.0/lulesh2.0 -i 20 -p
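To run this example, save the script and submit it with sbatch (the file name lulesh_mpi.sh is an assumption for illustration):
$ sbatch lulesh_mpi.sh
$ squeue -u $USER       # follow the job state
$ cat LULESH-MPI.log    # output file declared by --output in the script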
OpenMP computation¶
Example OpenMP script¶
- requesting 1 task with 48 cores (threads) for 10 minutes
#!/bin/bash
#SBATCH --job-name=LULESH-OMP4
#SBATCH --output=LULESH-OMP4.log
#SBATCH -n 1
#SBATCH --cpus-per-task=48
#SBATCH -p calcul
#SBATCH --time=10:00
module add compilers/gcc/
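# Assumption (not in the original script): export the thread count from the Slurm
# allocation so OpenMP actually uses the 48 reserved cores.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK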
./omp_4.0/lulesh2.0 -i 20 -p
Job monitoring¶
In addition to the classic squeue, sinfo and sacct commands, other tools are available (provided by the project https://github.com/OleHolmNielsen/Slurm_tools.git):
pestat: Print Slurm nodes status with 1 line per node including job info.
$ pestat -h
Usage: pestat [-p partition(s)] [-P] [-u username] [-g groupname] [-A accountname] [-a]
    [-q qoslist] [-s/-t statelist] [-n/-w hostlist] [-j joblist] [-G] [-N]
    [-f | -F | -m free_mem | -M free_mem ] [-1|-2] [-d] [-S] [-E] [-T] [-C|-c] [-V] [-h]
where:
    -p partition: Select only partion <partition>
    -P: Include all partitions, including hidden and unavailable ones
    -u username: Print only jobs of a single user <username>
    -g groupname: Print only users in UNIX group <groupname>
    -A accountname: Print only jobs in Slurm account <accountname>
    -a: Print User(Account) information after each JobID
    -q qoslist: Print only QOS in the qoslist <qoslist>
    -R reservationlist: Print only node reservations <reservationlist>
    -s|-t statelist: Print only nodes with state in <statelist>
    -n|-w hostlist: Print only nodes in hostlist
    -j joblist: Print only nodes in job <joblist>
    -G: Print GRES (Generic Resources) in addition to JobID
    -N: Print JobName in addition to JobID
    -f: Print only nodes that are flagged by * (unexpected load etc.)
    -F: Like -f, but only nodes flagged in RED are printed.
    -m free_mem: Print only nodes with free memory LESS than free_mem MB
    -M free_mem: Print only nodes with free memory GREATER than free_mem MB (under-utilized)
    -d: Omit nodes with states: down down~ idle~ alloc# drain drng resv maint boot
    -1: Default: Only 1 line per node (unique nodes in multiple partitions are printed once only)
    -2: 2..N lines per node which participates in multiple partitions
    -S: Job StartTime is printed after each JobID/user
    -E: Job EndTime is printed after each JobID/user
    -T: Job TimeUsed is printed after each JobID/user
    -C: Color output is forced ON
    -c: Color output is forced OFF
    -h: Print this help information
    -V: Version information
showuserjobs: Print the current node status and batch jobs status broken down into userids.
$ showuserjobs -h
Usage: /usr/local/bin/showuserjobs [-u username] [-a account] [-p partition] [-P] [-q QOS] [-r] [-A] [-C] [-h]
where:
    -u username: Print only user <username>
    -a account: Print only jobs in Slurm account <account>
    -A: Print only ACCT_TOTAL lines
    -C: Print comma separated lines for Excel
    -p partition: Print only jobs in partition <partition-list>
    -P: Include all partitions, including hidden and unavailable ones
    -q qos-list: Print only jobs in QOS <qos-list>
    -r: Print additional job Reason columns
    -h: Print this help information
showpartitions: Print a Slurm cluster partition status overview with 1 line per partition.
$ showpartitions -h
Usage: /usr/local/bin/showpartitions [-p partition-list] [-g] [-m] [-a|-P] [-f] [-h]
where:
    -p partition-list: Print only jobs in partition(s) <partition-list>
    -g: Print also GRES information.
    -m: Print minimum and maximum values for memory and cores/node.
    -a|-P: Display information about all partitions including hidden ones.
    -f: Show all partitions from the federation if a member of one. Only Slurm 18.08 and newer.
    -n: no headers or colors will be printed (for parsing).
    -h: Print this help information.
$ showpartitions
Partition statistics for cluster jarvis at Fri Feb 14 02:50:19 PM CET 2025
     Partition     #Nodes      #CPU_cores  Cores_pending   Job_Nodes  MaxJobTime  Cores  Mem/Node
     Name State   Total  Idle  Total  Idle  Resorc  Other   Min   Max  Day-hr:mn  /node      (GB)
     calcul:*  up     1     0    768   576       0    106     1 infin    5-00:00    768      6095
         visu  up     1     0    768   576       0      0     1 infin      12:00    768      6095
  interactive  up     1     0    768   576       0      0     1 infin      12:00    768      6095
Note: The cluster default partition name is indicated by :*
showuserlimits: Print Slurm resource user limits and usage.
$ showuserlimits -h
Usage: /usr/local/bin/showuserlimits [-u username [-A account] [-p partition] [-M cluster] [-q qos] [-l limit] [-s sublimit1[,sublimit2,...]] [-n] | -h ]
where:
    -u username: Print user <username> (Default is current user)
    -A accountname: Print only account <accountname>
    -p partition: Print only Slurm partition <partition>
    -M cluster: Print only cluster <cluster>
    -q Print only QOS=<qos>
    -l Print selected limits only
    -s Print selected sublimit1[,sublimit2,...] only
    -n Print also limits with value None
    -h Print help information
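For example, to get a quick overview of your own jobs and limits with the tools above (options taken from the help messages; they apply to whichever user runs the commands):
$ pestat -u $USER         # nodes currently running your jobs, one line per node
$ showuserjobs -u $USER   # your batch jobs broken down by user
$ showuserlimits          # limits and usage for the current user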
: Print Slurm resource user limits and usage.$ showuserlimits -h Usage: /usr/local/bin/showuserlimits [-u username [-A account] [-p partition] [-M cluster] [-q qos] [-l limit] [-s sublimit1[,sublimit2,...]] [-n] | -h ] where: -u username: Print user <username> (Default is current user) -A accountname: Print only account <accountname> -p partition: Print only Slurm partition <partition> -M cluster: Print only cluster <cluster> -q Print only QOS=<qos> -l Print selected limits only -s Print selected sublimit1[,sublimit2,...] only -n Print also limits with value None -h Print help information