The phoenix cluster policy is based mainly on resource pool limits. Each contribution is counted according to the CPUs, memory, and GPUs added to the cluster; the specific types of CPUs or GPUs are not taken into account.
Each CS lab (PI) has a Slurm account on the phoenix cluster, and each such account is limited according to the lab's contribution to the cluster.
For example, if a lab contributed 2 nodes with 8 CPUs, 500G RAM, and 2 GPUs each, then that lab won't be able to use more than 16 CPUs, 1T RAM, and 4 GPUs at any given time.
Due to scheduling constraints, especially when the cluster is loaded, jobs might still be delayed even before the lab's limit has been reached.
To check an account's usage and limits, use the slimits utility:

```
slimits
```
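Since the limits are presumably implemented as group TRES limits on the lab's Slurm association, they should also be visible with the standard accounting tools; a minimal sketch, where `mylab` is a hypothetical account name:

```
# Show the group TRES limits (CPUs, memory, GPUs) attached to the
# lab's association; "mylab" is a placeholder account name.
sacctmgr show association account=mylab format=Account,User,GrpTRES
```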
All accounts have access to the killable account. This account has no limits, but jobs running under it can be killed (preempted) by "normal" jobs.
Courses, projects (e.g. engineering projects), and labs that didn't contribute to the cluster can only run under the killable account.
Jobs in the killable account have the lowest priority.
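For example, a job can be sent to the killable account by selecting it explicitly at submission time; the resource requests and batch script name below are only placeholders:

```
# Submit a preemptible job under the shared "killable" account.
# train.sh is a hypothetical batch script; adjust resources as needed.
sbatch --account=killable -c 4 --mem=32G --gres=gpu:1 --time=12:00:00 train.sh
```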
The priority is a number assigned to each job which determines the order in which the scheduler tries to schedule jobs.
Priorities are not related to the limits. A high-priority job might not run because its lab has reached its resource limits, in which case lower-priority jobs may start before the higher-priority ones.
Currently there are no priority differences between accounts; all accounts are equal priority-wise. A lab that bought more resources has higher limits, not higher priority.
The main priority factor is the fair-share: the more a user (or a lab) has used in the past, the lower their priority will be. Currently only CPU time is taken into account for the usage history, but this might change in the future, and different weights could be given to different resources (CPU, memory, GPU). The historic usage for the fair-share calculation has a half-life decay of 7 days.
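Both the per-job priority factors and the accumulated fair-share usage can be inspected with the standard Slurm tools, for example:

```
# Show the priority of pending jobs, broken down into its factors
# (age, fair-share, etc.).
sprio -l

# Show the raw usage and resulting fair-share factor of your
# associations (accounts/users).
sshare -l
```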
The cinder cluster is composed of old phoenix cluster nodes. In September of each year, nodes older than 7 years are moved to the cinder cluster.
The cinder cluster's policy is the same as the phoenix cluster's.
Set the GPU limit per GPU type, instead of one general limit over all GPUs (a rough configuration sketch appears after this list).
Another layer of killable jobs: labs could pay for <resource>-time (cpu-time, gpu-time, etc.). These jobs would have a priority between "killable" jobs and "normal" jobs; they could kill killable jobs and could be killed by "normal" jobs.
Slurm supports cluster federation, in which jobs are submitted to all clusters the user/account can run on. This could help users who also have access to other clusters such as hm or blaise, or generally increase utilization if killable jobs could also migrate.
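For the per-GPU-type limits mentioned in the first item above, Slurm can already express typed GPU TRES limits on an association. The following is only a hypothetical admin-side sketch: the account name and numbers are made up, and it assumes the typed GPU TRES (e.g. gres/gpu:rtx2080) are tracked via AccountingStorageTRES in slurm.conf:

```
# Hypothetical: cap a lab at 4 rtx2080 GPUs and 2 a6000 GPUs,
# instead of a single limit over all GPU types.
sacctmgr modify account mylab set GrpTRES=gres/gpu:rtx2080=4,gres/gpu:a6000=2
```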
Nodes | # | CPU (sockets:cores:threads) | RAM | GPU |
---|---|---|---|---|
ampere-01 | 1 | 32 (8:4:1) | 377G | 8 a10 (22G) |
arion-[01-02] | 2 | 128 (2:64:1) | 503G | 8 a10 (22G) |
binky-[01-05] | 5 | 48 (8:6:1) | 503G | 8 a5000 (24G) |
creek-[01-04] | 4 | 72 (2:18:2) | 251G | 8 rtx2080 (10G) |
cyril-01 | 1 | 128 (2:64:1) | 503G | 8 a6000 (48G) |
drape-[01-03] | 3 | 128 (2:64:1) | 503G | 8 a5000 (24G) |
dumfries-003 | 1 | 32 (2:8:2) | 125G | 2 rtx2080 (10G) |
dumfries-006 | 1 | 32 (2:8:2) | 125G | 3 rtx2080 (10G) |
dumfries-007 | 1 | 32 (2:8:2) | 125G | 2 rtx2080 (10G) |
dumfries-[001-002] | 2 | 32 (2:8:2) | 125G | 4 rtx2080 (10G) |
dumfries-[004-005] | 2 | 32 (2:8:2) | 125G | 4 rtx2080 (10G) |
dumfries-[008-010] | 3 | 32 (2:8:2) | 125G | 4 rtx2080 (10G) |
epona-[01-02] | 2 | 128 (2:64:1) | 503G | 8 a40 (45G) |
firefoot-[01-03] | 3 | 64 (2:32:1) | 1T | 8 l40s (45G) |
firth-[01-02] | 2 | 40 (2:10:2) | 376G | 8 rtx6000 (24G) |
gringolet-[01-06] | 6 | 128 (2:64:1) | 1T | |
hasufel-01 | 1 | 64 (2:32:1) | 1T | 8 l4 (23G) |
wadi-[01-05] | 5 | 128 (2:32:2) | 503G | |
Nodes | # | CPU (sockets:cores:threads) | RAM | GPU |
---|---|---|---|---|
cb-[05-06,08,10,14,17-19] | 8 | 16 (2:4:2) | 62G | |
cortex-01 | 1 | 16 (2:8:1) | 251G | 8 m60 (8G) |
cortex-02 | 1 | 16 (2:8:1) | 251G | 6 m60 (8G) |
cortex-[03-05] | 3 | 16 (2:8:1) | 251G | 8 m60 (8G) |
cortex-[06-08] | 3 | 24 (2:12:1) | 251G | 8 m60 (8G) |
gsm-01 | 1 | 32 (2:8:2) | 251G | |
gsm-[03-04] | 2 | 32 (2:8:2) | 251G | 3 black (6G) |
lucy-[01-03] | 3 | 48 (2:12:2) | 377G | 2 gtx980 (4G) |
ohm-[54-64] | 11 | 48 (2:12:2) | 251G | |
oxygen-[01-08] | 8 | 48 (2:12:2) | 251G | |
sm-15 | 1 | 16 (2:4:2) | 23G | |
sm-16 | 1 | 16 (2:4:2) | 46G | |
sm-[01-04,08] | 5 | 16 (2:4:2) | 46G | |
sm-[17-20] | 4 | 24 (2:6:2) | 62G | |
Resource | Phoenix | Cinder |
---|---|---|
Nodes | 45 | 52 |
CPUs | 3648 | 1640 |
Memory | 22.15T | 9.48T |
GPUs | 227 | 74 |
GPU: gtx980 (4G) | 0 | 6 |
GPU: black (6G) | 0 | 6 |
GPU: m60 (8G) | 0 | 62 |
GPU: rtx2080 (10G) | 67 | 0 |
GPU: a10 (22G) | 24 | 0 |
GPU: l4 (23G) | 8 | 0 |
GPU: a5000 (24G) | 64 | 0 |
GPU: rtx6000 (24G) | 16 | 0 |
GPU: a40 (45G) | 16 | 0 |
GPU: l40s (45G) | 24 | 0 |
GPU: a6000 (48G) | 8 | 0 |