Working on Astro


General info

This page contains general instructions for working with SLURM.

Each job submission must declare how much RAM it will need. A job that uses more memory than it initially requested may be killed.

Each job must declare how many CPUs it will require. On nodes with hyper-threading enabled, this number needs to be even (it will be rounded up if odd). The CPU limit is enforced: if the number of processes/threads exceeds the number of allocated CPUs, they will share the allocated CPUs (even if other CPUs are available).

The default number of CPUs and the default memory allocation are cluster dependent.


Quick start

First login to the cluster:

 ssh MY_USER_NAME%astro-gw@gw.cs.huji.ac.il

Submit a script (myscript) that requires 4 CPUs and 400MB of RAM, and will run for at most 2 hours:

 sbatch --mem=400m -c4 --time=2:0:0 "myscript" 

Submit a binary executable (myexecutable) that will run for at most 3 days:

 sbatch --mem=400m -c4 --time=3-0 --wrap="myexecutable"

Submit a script that requires 2 GPUs:

 sbatch --mem=500m -c2 --gres=gpu:2 "myscript" 

Run a shell for interactive work:

 srun --mem=400m -c2 --time=1-12 --pty $SHELL

To run graphical programs, you need to connect to the cluster with X11 forwarding enabled:

 srun --mem=400m -c2 --time=2:0:0 xterm

More detailed explanations are given in the sections below.


Commands

Schedule a script

Used to schedule a script to run as soon as resources are available.

usage:

sbatch [options] <script>

options:

-c n Allocate n CPUs (per task).
-t t Total run time limit (e.g. "2:0:0" for 2 hours, or "2-0" for 2 days and 0 hours).
--mem-per-cpu m Allocate m MB of RAM per CPU.
--mem m Allocate m MB of RAM per node (--mem and --mem-per-cpu are mutually exclusive).
--array=1-k Run the script k times (with indices from 1 to k). The array index of the current run is available in the SLURM_ARRAY_TASK_ID environment variable from within the script (see the example after this list).
--wrap cmd Instead of giving a script to sbatch, run the command cmd.
-n n Allocate resources for n tasks. The default is 1. Only relevant for parallel jobs, e.g. with MPI.
--gres resource Specify a generic resource to use. Currently only gpu is supported, e.g. gpu:2 for two GPUs.
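
For example, a job array that runs the same script 10 times could be submitted as follows (a minimal sketch; array_script.sh, myexecutable and the input file names are hypothetical placeholders, and the resource values follow the quick-start examples):

 sbatch --mem=400m -c2 --array=1-10 array_script.sh

where array_script.sh uses the array index, e.g.:

 #!/bin/bash
 # SLURM_ARRAY_TASK_ID holds the index (1..10) of the current array task
 ./myexecutable input_${SLURM_ARRAY_TASK_ID}.txt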

More info in "man sbatch"

Submitted jobs status

Usage:

 squeue
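
For example, to show only your own jobs (using squeue's standard -u option):

 squeue -u $USER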

More info in "man squeue"

Cancel a job

Usage:

 scancel <job id>

More info in "man scancel"

Hold and release jobs

To hold a job, i.e. prevent it from starting (e.g. to give another job a chance to run), run:

 scontrol hold <job id>

To release it:

 scontrol release <job id>

Work interactively

To run commands interactively, use the srun command. This will block until there are resources available, and will redirect the input/output of the program to the executing shell. srun has the same parameters as sbatch.

If the input/output isn't working correctly (e.g. with shell jobs), adding the --pty flag usually solves the issue.

On some of the clusters, interactive jobs have some limitations compared to batch jobs.
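
For example, to get an interactive shell with 2 CPUs and 400MB of RAM for up to 2 hours (the resource values are just the quick-start defaults):

 srun --mem=400m -c2 --time=2:0:0 --pty $SHELL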

Job statistics

Used to view statistics about previous jobs.

e.g.

sacct

Long format:

sacct -l

All users:

sacct -a

Since 1/1/2017:

sacct -S 2017-01-01

Or any combination of the options.
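
For example, to show all users' jobs since 1/1/2017 in long format:

 sacct -a -l -S 2017-01-01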

Nodes status

Shows data about the cluster and the nodes:

sinfo

Running jobs status

Show data about currently running jobs (e.g. memory usage, time):

sstat
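
sstat normally expects the id of a running job, e.g. (using the standard -j option):

 sstat -j <job id>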

Batch Scripts

The sbatch command submits a script that is executed once the resources are available.

All parameters to sbatch can be incorporated into the script itself, simplifying the submission command. The parameters inside the script file are passed via lines beginning with '#SBATCH'. These lines must come after the first line (e.g. the #!/bin/bash line) but before any real command.

This way, instead of:

sbatch --mem=400m -c4 --time=2:0:0 --gres=gpu:3 script.sh

One can use the script:

#!/bin/bash
#SBATCH --mem=400m
#SBATCH -c4
#SBATCH --time=2:0:0
#SBATCH --gres=gpu:3

some script lines
...

and submit using just:

sbatch script.sh

Inside the batch script, the 'srun' command is used to launch specific tasks that require part of the allocated resources, i.e. several srun commands can be run in the background within the script (possibly with different options). This forms a "private queue" for the allocated resources. For example, if the script was submitted using:

sbatch -n1 -c8 <script>

and the script is:

#!/bin/bash
for s in 1 2 3 4 5 6 7 8; do
    srun -n1 -c3 <command> &
done
wait

then each command will be given 3 CPUs, and each srun will wait for resources within the allocation to become available before executing its command. The batch script itself is assumed not to consume CPU or memory.

All programs will be terminated once the batch script is terminated. So if executing srun in the background, it's usually helpful to finish the batch script with the 'wait' command (assuming bash).

Graphical Commands

For a simple interactive session, srun --pty should suffice. For graphical programs, the DISPLAY should be set appropriately. The simplest method is to ssh to the astro-gw machine (with X11 forwarding enabled); this should set up everything.

Another method is to set it manually:

  1. On the machine where the X server is running (where the window will be opened), before connecting to the gw, run:
    xauth list $HOST:0
    This will return a line similar to:
    ant-87.cs.huji.ac.il:0  MIT-MAGIC-COOKIE-1  fe8332fcbfd2de8fb37d4acdf64767be
  2. Log in to the gw machine.
  3. Run:
    xauth add <line returned from step 1>
  4. Set the DISPLAY according to <host>:0. For example, when working on ant-87:
    setenv DISPLAY ant-87:0
    (or 'export DISPLAY=ant-87:0' when using bash)
  5. Verify that it works by running e.g. xeyes.
  6. Run the command, e.g.:
    srun -n1 -c4 xterm

This will open an xterm with the specified resources, but it will open only when the resources are allocated.

Priority/Scheduling

Each job is given priority according to several weighted factors:

  1. QOS - Requested quality of service
  2. Fairshare - The past resource consumption of the user/account
  3. Job age - How long the job is waiting in the queue
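
The per-job values of these factors can be inspected with the sprio command (see its man page); for example, the long format lists the weighted factors of all pending jobs:

 sprio -l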

QOS

There are four QOS levels: high, normal, low and requeue. The default is normal. To use a different QOS, use the --qos flag of sbatch.
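
For example, to submit a script with the low QOS (the resource values are taken from the quick-start example):

 sbatch --qos=low --mem=400m -c4 --time=2:0:0 "myscript"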

high

Jobs with the high QOS will be allocated resources before jobs from the other QOS levels. Don't abuse this QOS; otherwise everyone will use it and it will lose its purpose.

normal

The default QOS.

low

The low QOS is used to submit jobs that will run only if there are no other jobs to run. Currently running jobs are not killed, so if a low priority job runs for 30 days, it can still cause normal and high priority jobs to wait.

requeue

This QOS has the same priority as the low QOS, but jobs in this QOS will be killed and requeued if doing so allows jobs from the normal or high QOS to be dispatched sooner.


Fairshare

This factor takes into account past resource use by the user/account, with some decay factor. If user1 used the cluster intensively in the past week, user2 will get higher priority. But if user1 used the cluster 2 years ago, it probably won't affect the current priority.

Job age

The longer a job waits in the queue, the higher the priority it gains over younger jobs.

More information

Man pages: sbatch, srun, sacct, squeue, scancel, sinfo, sstat, sprio

Web pages:

    General: http://slurm.schedmd.com/documentation.html
    User guide: http://slurm.schedmd.com/quickstart.html