CSAR Job Scheduling Policies
Introduction
A number of factors influence the order
in which jobs are scheduled to run. The basic scheduling principles
are common to all CSAR machines, although there are some machine-specific differences
(noted in the linked web pages). A job's priority depends mostly on the
share of resources allocated to its consortium or project, and this
is managed by a locally written fair-share scheduler. Thus, if project
A has been allocated twice the resources of project B, the two projects
keep equal priority as long as project A has used twice as much as project B. If
both projects have the same share, the one that has used fewer resources
will have the higher priority. This is usually the explanation when a
recently submitted job starts running before jobs that have been queuing
for some time.
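The fair-share principle above can be illustrated with a minimal sketch. The real CSAR scheduler is locally written and its formula is not given here, so the usage-to-share ratio below is an assumption chosen only to reproduce the two cases described in the text. The names `Project`, `usage_ratio` and `rank_projects` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    name: str
    share: float  # resources allocated to the project (arbitrary units)
    used: float   # resources consumed so far (same units)


def usage_ratio(project: Project) -> float:
    """Resources used per unit of allocated share (assumed metric).

    A lower ratio means the project is further below its entitlement,
    so its jobs are treated as higher priority in this sketch.
    """
    if project.share <= 0:
        raise ValueError("share must be positive")
    return project.used / project.share


def rank_projects(projects: List[Project]) -> List[Project]:
    """Order projects from highest to lowest priority."""
    return sorted(projects, key=usage_ratio)


# Project A has double the share of B and has used double the resource,
# so the two ratios are equal and neither outranks the other.
a = Project("A", share=2.0, used=200.0)
b = Project("B", share=1.0, used=100.0)
# Project C has the same share as B but has used less, so it ranks first.
c = Project("C", share=1.0, used=50.0)

ranking = [p.name for p in rank_projects([a, b, c])]
```

In this sketch a newly submitted job from project C would be considered before queued jobs from A and B, which matches the behaviour described above.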
Factors
In practice resources are never used uniformly throughout
the life of the project, so other factors are taken into account:
- Preference is given to 'large' jobs at specific times (usually overnight
and at weekends) so they do not have to wait until all smaller jobs
have finished. Note that Capability Incentives
are given to encourage the use of large jobs. More information is
given below.
- Projects which have run a large amount of work 'recently' will have
their priority lowered to allow jobs from other projects to be run.
- Jobs from a given project may have their priority raised or lowered
relative to other jobs in the same project by including the
-q high or -q low option when the job
is submitted. (Note that you cannot influence the priority of jobs in your
project with respect to jobs from other projects.)
- It will occasionally be necessary to run down the queuing system,
either fully, to empty the machine of all jobs so that, for example, scheduled
maintenance can be performed, or partially, to allow large jobs to
be run.
- In such cases, any short jobs that can be run in the time available
will be run, while long jobs must wait until after the rundown has
completed. This point emphasises the importance of specifying resource
requests accurately - if you know your job requires only 3 hours to run,
it may well be scheduled earlier if you request only 3 hours.
- When jobs are held in this way they will be identified in qs
as being in a state of Big Job Rundown.
- The current policy is to schedule rundowns for larger jobs on
Tuesday and Thursday nights and over the weekend from Friday evening.
On Newton, very large jobs are run at the weekends. There may, however,
be other occasions when a few jobs are temporarily held to allow
a large job to be started, depending on the actual job mix and
sizes at the time.
- Capacity planning - all project PIs are encouraged to keep their
capacity plans up to date. As these plans contain information related
to resource usage, they may influence the order in which jobs are
scheduled.
- Advance reservations - it is anticipated that there will be an
increased demand for specific resources at specific times, particularly
as work is coupled across different systems. If such reservations
have been made, the queues will be run down as required, as described
above, and jobs scheduled to fit in with this. At present such reservations
are carried out manually on request, but automated mechanisms are
being developed. Please contact CSAR if you need this facility.
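The advice in the list above about accurate resource requests during a rundown can be sketched as follows. This is only an illustration, assuming a job may start if its requested wall-clock time fits into the time left before the rundown point. The names `Job` and `startable_before_rundown` are hypothetical and are not part of any CSAR tool.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    name: str
    requested_hours: float  # wall-clock time requested at submission


def startable_before_rundown(jobs: List[Job], hours_until_rundown: float) -> List[Job]:
    """Return the jobs whose requested time fits before the rundown point."""
    return [job for job in jobs if job.requested_hours <= hours_until_rundown]


# Two jobs that each need about 3 hours, but only one requests 3 hours.
queue = [
    Job("accurate_request", 3.0),
    Job("padded_request", 12.0),
]

for job in startable_before_rundown(queue, hours_until_rundown=4.0):
    print(job.name)  # prints "accurate_request"
```

With 4 hours remaining, only the job that requested 3 hours is eligible. The job with the padded 12-hour request is held until the rundown has completed, even though it would have finished in time.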
Additional system-specific information about queue
limits and related matters is provided for the SGI Altix (Newton)
and for the SGI Origin systems (Wren, Fermat and Green).
Overuse of resources
Users who exceed their resource allocations, or whose
projects have run out of resource, will be unable to run batch work.
They will be informed on the command line that this is the case.