Might I suggest that we ask RC to come meet with the group next Thursday
during our regular group meeting time? This would give us a chance to
discuss best practices for cluster usage, and perhaps we can identify ways
to make better use of our queue.
Additional points:
eldorado and serial-requeue are two other partitions that you can try using
if aspuru-guzik is being too slow.
-Martin
Martin A. Blood-Forsythe
On Fri, Jul 18, 2014 at 12:56 PM, Jarrod <jarrod.mcc(a)gmail.com> wrote:
Okay, I know this thread is longer than most people's interest (and I
promise this is my last email), but Salvatore has done some excellent
sleuthing and determined the exact formula (including our particular
parameters) by which our jobs are given priority. Anyone who runs on the
cluster should find this extremely relevant, and it can form the basis for
in-person discussions (as suggested by Tim) on cluster refinements for
general satisfaction:
-------- Original Message --------
Subject: Re: [Aspuru-Guzik group list] Queue in Odyssey
Date: Fri, 18 Jul 2014 12:30:18 -0400
From: Salvatore Mandrà <salvatore.mandra(a)gmail.com>
To: Jarrod <jarrod.mcc(a)gmail.com>
Backfill functionality is a separate issue from the primary scheduler
being FIFO (basic) vs. fair-share (multifactor); are you able to check that
as well?
Sure!
You were right, the multifactor option is activated:
$ cat /etc/slurm/slurm.conf | grep PriorityType
PriorityType=priority/multifactor
Looking at the documentation, the ranking of a job is defined as:
Job_priority =
(PriorityWeightAge) * (age_factor) +
(PriorityWeightFairshare) * (fair-share_factor) +
(PriorityWeightJobSize) * (job_size_factor) +
(PriorityWeightPartition) * (partition_factor) +
(PriorityWeightQOS) * (QOS_factor)
All of the factors in this formula are floating point numbers that range
from 0.0 to 1.0.
In our case:
PriorityWeightAge=1000
PriorityWeightFairshare=20000000
PriorityWeightJobSize=0
PriorityWeightPartition=100000000
PriorityWeightQOS=1000000000
where (see
https://computing.llnl.gov/linux/slurm/priority_multifactor.html#mfjppintro
):
*Age:* the length of time a job has been waiting in the queue, eligible
to be scheduled
*Fair-share:* the difference between the portion of the computing
resource that has been promised and the amount of resources that has been
consumed
*Job size:* the number of nodes a job is allocated
*Partition:* a factor associated with each node partition
*QOS:* a factor associated with each Quality Of Service
I guess that the job-dependent factors are age, fair-share, and job size
(while the partition and QOS factors are job-independent). As you can see,
age carries very little weight and is dominated by the fair-share factor.
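Plugging our weights into the formula above makes this concrete. Here is a
minimal sketch (only the weights come from our slurm.conf; the factor values
below are made up purely for illustration):

```python
# SLURM multifactor priority, using the weights reported above.
# Each factor is a float in [0.0, 1.0]; job_size is moot since its weight is 0.
WEIGHTS = {
    "age": 1000,
    "fairshare": 20000000,
    "job_size": 0,
    "partition": 100000000,
    "qos": 1000000000,
}

def job_priority(age, fairshare, job_size, partition, qos):
    return (WEIGHTS["age"] * age
            + WEIGHTS["fairshare"] * fairshare
            + WEIGHTS["job_size"] * job_size
            + WEIGHTS["partition"] * partition
            + WEIGHTS["qos"] * qos)

# A week-old job (age factor maxed at 1.0) still loses to a brand-new job
# whose fair-share factor is just 0.0001 higher:
old_job = job_priority(age=1.0, fairshare=0.5000, job_size=0.0,
                       partition=1.0, qos=1.0)
new_job = job_priority(age=0.0, fairshare=0.5001, job_size=0.0,
                       partition=1.0, qos=1.0)
```

With these weights, even a maxed-out age factor is worth only 1,000 priority
points, while a 0.0001 shift in fair-share is worth 2,000.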
---------------------------------------------------------
*Some analysis:*
*Age Factor*
The age factor represents the length of time a job has been sitting in
the queue and eligible to run. In general, the longer a job waits in the
queue, the larger its age factor grows. However, the age factor for a
dependent job will not change while it waits for the job it depends on to
complete. Also, the age factor will not change when scheduling is withheld
for a job whose node or time limits exceed the cluster's current limits.
At some configurable length of time (PriorityMaxAge), the age factor
will max out to 1.0.
In our case, *PriorityMaxAge = 7-0*. This means that after 7 days (am I
reading that right?), a job gets an age factor of 1.0.
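As a sketch (assuming, as the documentation describes, that the factor grows
linearly with eligible wait time up to PriorityMaxAge):

```python
# Age factor: linear growth with queued-and-eligible wait time,
# capped at 1.0 once the wait reaches PriorityMaxAge (7-0 = 7 days here).
MAX_AGE_SECONDS = 7 * 24 * 3600

def age_factor(wait_seconds):
    return min(wait_seconds / MAX_AGE_SECONDS, 1.0)
```

So a job halfway through the window has an age factor of 0.5, and waiting
beyond 7 days buys nothing further.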
*Fair-share Factor*
The fair-share component to a job's priority influences the order in which
a user's queued jobs are scheduled to run based on the portion of the
computing resources they have been allocated and the resources their jobs
have already consumed. The fair-share factor does not involve a fixed
allotment, whereby a user's access to a machine is cut off once that
allotment is reached. Instead, *the fair-share factor serves to
prioritize queued jobs such that those jobs charging accounts that are
under-serviced are scheduled first, while jobs charging accounts that are
over-serviced are scheduled when the machine would otherwise go idle*.
SLURM's fair-share factor is a floating point number between 0.0 and 1.0
that reflects the shares of a computing resource that a user has been
allocated and the amount of computing resources the user's jobs have
consumed. The higher the value, the higher is the placement in the queue of
jobs waiting to be scheduled.
The *computing resource* is currently defined to be computing cycles
delivered by a machine in units of *processor-seconds*. Future versions of
the fair-share factor may additionally include a memory integral component.
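The linked documentation gives the classic formula for this factor,
F = 2^(-U_E / S), where U_E is the account's effective usage and S its
normalized share. A quick sketch:

```python
# Classic SLURM fair-share formula from the linked documentation:
#   F = 2 ** (-effective_usage / normalized_shares)
def fairshare_factor(effective_usage, norm_shares):
    # 1.0 when an account has consumed nothing, 0.5 when usage exactly
    # matches its promised share, approaching 0.0 when over-serviced.
    return 2.0 ** (-effective_usage / norm_shares)
```

An account that has consumed exactly its promised share sits at 0.5, which
our weight of 20000000 turns into 10,000,000 priority points.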
---------------------------------------------------------
Since the age factor is tiny compared to the fair-share factor, it is
possible for jobs with a large fair-share factor to be served before older
jobs.
Cheers!
S
_____________________________________________
Aspuru-list mailing list
Aspuru-list(a)lists.fas.harvard.edu
https://lists.fas.harvard.edu/mailman/listinfo/aspuru-list