UK National HPC Service

CSAR Job Scheduling Policies

Introduction

Several factors influence the order in which jobs are scheduled to run. The basic scheduling principles are common to all CSAR machines, although there are some machine-specific differences (noted in the linked web pages). A job's priority depends mostly on the share of resources allocated to its consortium or project, and this is managed by a locally written fair-share scheduler. Thus, if project A has double the allocation of project B, the two projects keep equal priority as long as project A has used double the resource of project B; if both projects have the same share, the one that has used fewer resources has the higher priority. This is usually the explanation when a recently submitted job starts running before jobs that have been queuing for some time.
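The fair-share principle can be illustrated with a minimal Python sketch. The CSAR scheduler is locally written and its actual formula is not given here, so the measure used below (usage divided by share, with lower values ranking first) is an assumption chosen only to reproduce the two cases in the paragraph above. The names Project, normalised_usage and order_by_priority are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    share: float  # resource allocated to the project (must be > 0)
    used: float   # resource consumed so far, in the same units


def normalised_usage(project: Project) -> float:
    """Usage per unit of allocated share (assumed measure; lower ranks first)."""
    return project.used / project.share


def order_by_priority(projects: list[Project]) -> list[Project]:
    """Return the projects ordered from highest to lowest priority."""
    return sorted(projects, key=normalised_usage)


a = Project("A", share=2.0, used=200.0)
b = Project("B", share=1.0, used=100.0)
c = Project("C", share=1.0, used=50.0)

# A has double the share and double the usage of B, so the two tie.
print(normalised_usage(a) == normalised_usage(b))
# C has the same share as B but has used less, so it is ranked ahead of B.
print([p.name for p in order_by_priority([a, b, c])])
```

A ratio was chosen because it keeps two projects level whenever their usage is in proportion to their shares; the real scheduler also weighs the other factors listed on this page.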

Factors

In practice resources are never used uniformly throughout the life of the project, so other factors are taken into account:

  • Preference is given to 'large' jobs at specific times (usually overnight and at weekends) so they do not have to wait until all smaller jobs have finished. Note that Capability Incentives are given to encourage the use of large jobs. More information is given below.
  • Projects which have run a large amount of work 'recently' will have their priority lowered to allow jobs from other projects to be run.
  • Jobs from a given project may have their priority raised or lowered relative to other jobs in the same project by including the -q high or -q low option when the job is submitted. (Note that you cannot influence the priority of jobs in your project with respect to jobs from other projects.)
  • It will occasionally be necessary to run down the queuing system, either completely, to empty the machine of all jobs so that, for example, scheduled maintenance can be performed, or partially, to allow large jobs to be run.
    • In such cases, any short jobs that can be run will be, while long jobs must wait until after the rundown has completed. This point emphasises the importance of specifying resource requests accurately - if you know your job requires only 3 hours to run, it may well be scheduled earlier if you request only 3 hours.
    • When jobs are held in this way they will be identified in qs as being in a state of Big Job Rundown.
    • The current policy is to schedule rundowns for larger jobs on Tuesday and Thursday nights and over the weekend from Friday evening. On Newton, very large jobs are run at the weekends. There may, however, be other occasions when a few jobs are temporarily held to allow a large job to be started, depending on the actual job mix and sizes at the time.
  • Capacity planning - all project PIs are encouraged to keep their capacity plans up to date. As these plans contain information related to resource usage, they may influence the order in which jobs are scheduled.
  • Advance reservations - it is anticipated that there will be an increased demand for specific resources at specific times, particularly as work is coupled across different systems. If such reservations have been made, the queues will be run down as required, as described above, and jobs will be scheduled to fit in with this. At present such reservations are carried out manually on request, but automated mechanisms are being developed. Please contact CSAR if you need this facility.
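The effect of an accurate time request during a rundown, described in the list above, can be sketched as follows. This is an illustration only: it assumes that a job may start during a rundown if its requested run time ends no later than the point at which the machine must be empty. The function name can_start_before_rundown and the 20:00 rundown time are hypothetical, not taken from the CSAR scheduler.

```python
from datetime import datetime, timedelta


def can_start_before_rundown(now: datetime,
                             requested: timedelta,
                             rundown_start: datetime) -> bool:
    """True if a job started now would reach its requested time limit
    no later than the start of the rundown (assumed rule)."""
    return now + requested <= rundown_start


# Hypothetical Tuesday-night rundown beginning at 20:00.
now = datetime(2005, 9, 6, 16, 0)
rundown_start = datetime(2005, 9, 6, 20, 0)

short_job = timedelta(hours=3)   # accurate request for a 3-hour job
long_job = timedelta(hours=12)   # over-generous request for the same job

print(can_start_before_rundown(now, short_job, rundown_start))
print(can_start_before_rundown(now, long_job, rundown_start))
```

Under this assumption the same piece of work is started at 16:00 when it requests 3 hours, but is held until after the rundown when it requests 12 hours.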

Additional system-specific information about queue limits and related matters is provided for the SGI Altix (Newton) and for the SGI Origin systems (Wren, Fermat and Green).

Overuse of resources

Users who exceed their resource allocations, or whose projects have run out of resource, will be unable to run batch work. They will be informed on the command line that this is the case.

This page last updated: Tuesday, 06-Sep-2005 11:33:40 BST