Usage¶
The cluster is made up of processors with 48 cores and 380 GB of memory each, for a total of 768 cores and 6 TB of memory. All computations and code executions must go through the Slurm resource manager:
- either by writing a script that reserves the resources and runs the code (a minimal sketch is given below): Configuration du gestionnaire de job
- or by using the interactive session reservation tools: Connexion
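As a first orientation, a minimal batch script could look like the sketch below (the job name, output file and executable are illustrative; complete MPI and OpenMP examples are given further down this page):
#!/bin/bash
#SBATCH --job-name=my_job        # illustrative job name
#SBATCH --output=my_job.log      # file collecting stdout/stderr
#SBATCH -n 1                     # one task
#SBATCH -p calcul                # compute partition used in the examples below
#SBATCH --time=05:00             # 5 minutes of walltime
./my_app                         # hypothetical executable
It would then be submitted with sbatch my_job.sh (hypothetical file name).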
Reserving interactive resources¶
Resources can be reserved interactively through Slurm with the srun command. These reservations can be used for any kind of computation, code, library or other tool, via the interactive partition; for example:
$ srun -p interactive -n 24 --cpus-per-task=1 --pty bash -i
...
- it is mandatory to use the interactive partition (-p interactive)
- the other options and their syntax are the same as for the usual sbatch command
- the interactive job starts in the working directory where the srun command was executed; you can then move to any directory of your account (see the example below).
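For instance, once the interactive shell has opened on a compute node (the directory and executable below are hypothetical):
$ srun -p interactive -n 4 --cpus-per-task=1 --pty bash -i
$ pwd                      # the shell starts in the directory where srun was run
$ cd ~/my_project          # hypothetical directory; you can move anywhere on your account
$ ./my_app                 # hypothetical executable running on the reserved cores
$ exit                     # leaving the shell releases the reservation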
Slurm¶
Introduction¶
- An introduction to the Slurm workload manager is available at https://slurm.schedmd.com/quickstart.html.
- Cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf
Some job management commands¶
Submit a job described in the script <jobscript>:
$ sbatch <jobscript>
List the submitted jobs:
$ squeue
# or with a different output format:
$ squeue -o '%A %8u %12j %3C %8N %2t %12l %12M %23e %10m %5Q'
Cancel a job (referenced by its jobid):
$ scancel <jobid>
Get information about a submitted job:
$ scontrol show job <jobid>
# or
$ sacct -j <jobid> --format=User%20,JobID%20,Jobname%60,partition,state%20,time,start,end,elapsed,MaxRss,MaxVMSize,ReqMem,nnodes,ncpus,nodelist%60,submit%40
Override options on the command line:
$ sbatch -n 96 -t 24:00:00 --qos batch_long ./script_with_small_walltime.sh
Enable e-mail notifications:
$ sbatch --mail-user=homer.simpson@springfield.fr --mail-type=BEGIN,END,FAIL submit.sh
Run a code named ex17 directly via srun:
$ srun -p interactive -n 12 --mem=48G -- mpirun -n 12 ./ex17 -dm_refine 9
calcul "MPI"¶
Several MPI implementations are available and are managed directly through modules: openmpi, intelmpi (or the Intel module compilers/intel/all_tools_2025.0.0, which loads all the Intel tools).
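As a hedged sketch, listing and loading one of these stacks could look as follows (the module versions are the ones mentioned on this page; check module avail for the list actually installed):
$ module avail mpi                               # list the available MPI modules
$ module add compilers/gcc/ mpi/openmpi/5.0.5    # GCC + Open MPI stack, as in the script below
# or, for the Intel stack:
$ module add compilers/intel/all_tools_2025.0.0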
Example MPI script¶
- Request 48 MPI processes for 10 minutes:
#!/bin/bash
#SBATCH --job-name=LULESH-MPI
#SBATCH --output=LULESH-MPI.log
#SBATCH -n 48
#SBATCH -p calcul
#SBATCH --time=10:00
module add compilers/gcc/ mpi/openmpi/5.0.5
srun ./omp_4.0/lulesh2.0 -i 20 -p
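Assuming the script above is saved as lulesh_mpi.sh (hypothetical file name), it can be submitted and followed with:
$ sbatch lulesh_mpi.sh        # prints the jobid of the submitted job
$ squeue -u $USER             # check whether the job is pending or running
$ tail -f LULESH-MPI.log      # follow the output file declared in the script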
calcul "OPENMP"¶
Example OpenMP script¶
- Request 1 task with 48 cores (OpenMP threads) for 10 minutes:
#!/bin/bash
#SBATCH --job-name=LULESH-OMP4
#SBATCH --output=LULESH-OMP4.log
#SBATCH -n 1
#SBATCH --cpus-per-task=48
#SBATCH -p calcul
#SBATCH --time=10:00
module add compilers/gcc/
./omp_4.0/lulesh2.0 -i 20 -p
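Depending on the OpenMP runtime, you may want to pin the thread count explicitly to the Slurm allocation; a minimal variant of the last lines of the script above (assuming the runtime does not already pick this up from the cgroup limits):
module add compilers/gcc/
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}   # use the 48 cores reserved with --cpus-per-task
./omp_4.0/lulesh2.0 -i 20 -p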
Code profiling¶
The likwid suite is available; see Tools_(profiler).
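As a hedged illustration (the performance groups actually available depend on the CPU; see the Tools_(profiler) page for the supported workflow), a typical likwid session could look like:
$ likwid-topology                                                   # inspect sockets, caches and core numbering
$ likwid-perfctr -C 0-47 -g FLOPS_DP ./omp_4.0/lulesh2.0 -i 20 -p   # hardware-counter measurement on cores 0-47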
Job monitoring¶
In addition to the usual squeue, sinfo and sacct commands, other commands are available (provided by the project https://github.com/OleHolmNielsen/Slurm_tools.git):
pestat: Print Slurm nodes status with 1 line per node including job info.
$ pestat -h
Usage: pestat [-p partition(s)] [-P] [-u username] [-g groupname] [-A accountname] [-a] [-q qoslist] [-s/-t statelist] [-n/-w hostlist] [-j joblist] [-G] [-N] [-f | -F | -m free_mem | -M free_mem ] [-1|-2] [-d] [-S] [-E] [-T] [-C|-c] [-V] [-h]
where:
        -p partition: Select only partion <partition>
        -P: Include all partitions, including hidden and unavailable ones
        -u username: Print only jobs of a single user <username>
        -g groupname: Print only users in UNIX group <groupname>
        -A accountname: Print only jobs in Slurm account <accountname>
        -a: Print User(Account) information after each JobID
        -q qoslist: Print only QOS in the qoslist <qoslist>
        -R reservationlist: Print only node reservations <reservationlist>
        -s|-t statelist: Print only nodes with state in <statelist>
        -n|-w hostlist: Print only nodes in hostlist
        -j joblist: Print only nodes in job <joblist>
        -G: Print GRES (Generic Resources) in addition to JobID
        -N: Print JobName in addition to JobID
        -f: Print only nodes that are flagged by * (unexpected load etc.)
        -F: Like -f, but only nodes flagged in RED are printed.
        -m free_mem: Print only nodes with free memory LESS than free_mem MB
        -M free_mem: Print only nodes with free memory GREATER than free_mem MB (under-utilized)
        -d: Omit nodes with states: down down~ idle~ alloc# drain drng resv maint boot
        -1: Default: Only 1 line per node (unique nodes in multiple partitions are printed once only)
        -2: 2..N lines per node which participates in multiple partitions
        -S: Job StartTime is printed after each JobID/user
        -E: Job EndTime is printed after each JobID/user
        -T: Job TimeUsed is printed after each JobID/user
        -C: Color output is forced ON
        -c: Color output is forced OFF
        -h: Print this help information
        -V: Version information
showuserjobs: Print the current node status and batch jobs status broken down into userids.
$ showuserjobs -h
Usage: /usr/local/bin/showuserjobs [-u username] [-a account] [-p partition] [-P] [-q QOS] [-r] [-A] [-C] [-h]
where:
        -u username: Print only user <username>
        -a account: Print only jobs in Slurm account <account>
        -A: Print only ACCT_TOTAL lines
        -C: Print comma separated lines for Excel
        -p partition: Print only jobs in partition <partition-list>
        -P: Include all partitions, including hidden and unavailable ones
        -q qos-list: Print only jobs in QOS <qos-list>
        -r: Print additional job Reason columns
        -h: Print this help information
showpartitions: Print a Slurm cluster partition status overview with 1 line per partition.
$ showpartitions -h
Usage: /usr/local/bin/showpartitions [-p partition-list] [-g] [-m] [-a|-P] [-f] [-h]
where:
        -p partition-list: Print only jobs in partition(s) <partition-list>
        -g: Print also GRES information.
        -m: Print minimum and maximum values for memory and cores/node.
        -a|-P: Display information about all partitions including hidden ones.
        -f: Show all partitions from the federation if a member of one. Only Slurm 18.08 and newer.
        -n: no headers or colors will be printed (for parsing).
        -h: Print this help information.
$ showpartitions
Partition statistics for cluster jarvis at Fri Feb 14 02:50:19 PM CET 2025
    Partition     #Nodes     #CPU_cores   Cores_pending   Job_Nodes  MaxJobTime Cores  Mem/Node
 Name    State  Total Idle   Total  Idle   Resorc Other    Min  Max   Day-hr:mn /node      (GB)
 calcul:*    up     1    0     768   576        0   106      1 infin    5-00:00   768      6095
 visu        up     1    0     768   576        0     0      1 infin      12:00   768      6095
 interactive up     1    0     768   576        0     0      1 infin      12:00   768      6095
Note: The cluster default partition name is indicated by :*
showuserlimits: Print Slurm resource user limits and usage.
$ showuserlimits -h
Usage: /usr/local/bin/showuserlimits [-u username [-A account] [-p partition] [-M cluster] [-q qos] [-l limit] [-s sublimit1[,sublimit2,...]] [-n] | -h ]
where:
        -u username: Print user <username> (Default is current user)
        -A accountname: Print only account <accountname>
        -p partition: Print only Slurm partition <partition>
        -M cluster: Print only cluster <cluster>
        -q Print only QOS=<qos>
        -l Print selected limits only
        -s Print selected sublimit1[,sublimit2,...] only
        -n Print also limits with value None
        -h Print help information
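For day-to-day monitoring, a few typical invocations (the options are taken from the help outputs above):
$ pestat -p calcul            # node status of the calcul partition, one line per node
$ showuserjobs -u $USER       # batch job status summarized for a single user
$ showpartitions              # partition overview, as in the sample output above
$ showuserlimits              # your own limits and usage (defaults to the current user)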