Submitting Jobs with LSF
bsub
Batch scripts are submitted to LSF with the bsub command,
using "bsub < scriptname". This section describes the most
useful bsub options. Options can be given on the command line
or placed at the top of the script, where each option line must
be prefixed with "#BSUB ". Options given on the command
line override those specified in the file.
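As a minimal sketch of the two ways of passing options, the snippet below writes a small script with its options embedded as #BSUB lines and then submits it twice, the second time overriding the run limit on the command line. The file name myjob.lsf and the option values are illustrative assumptions, and the submit step is skipped when bsub is not on the PATH.

```shell
# Write a batch script with its options embedded as #BSUB lines.
cat > myjob.lsf <<'EOF'
#BSUB -n 4
#BSUB -W 30
mpirun -np 4 ./a.out
EOF

if command -v bsub >/dev/null 2>&1; then
    # Use the options embedded in the script.
    bsub < myjob.lsf
    # Same script, but the command-line -W overrides "#BSUB -W 30".
    bsub -W 60 < myjob.lsf
else
    echo "bsub not found; would run: bsub -W 60 < myjob.lsf"
fi
```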
The two most useful options are -W and
-n. The -n option lets you request a certain
number of processors. The -W option requests a certain
amount of wall clock time; your job will be terminated automatically
once that time is used up if it has not already finished.
The -c option is similar to -W in that it
restricts how long your job runs, but -c limits the total
CPU time used rather than the elapsed time. Both -W
and -c are measured in minutes by default (an hour:minute
form such as 1:00 is also accepted).
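Because both limits are counted in minutes, it can help to convert an hour:minute value into minutes before comparing limits or passing one to bsub. The helper below is a hypothetical convenience function, not part of LSF, and assumes input of the form H:MM or a plain number of minutes.

```shell
# Convert "H:MM" (or a plain number of minutes) into whole minutes.
# Note: h and m are ordinary shell variables, not locals.
to_minutes() {
    case "$1" in
        *:*) h=${1%%:*}; m=${1#*:} ;;
        *)   h=0;        m=$1 ;;
    esac
    # Strip one leading zero so "08" is not read as octal; empty means 0.
    h=${h#0}; m=${m#0}
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

to_minutes 1:30   # prints 90
```

A possible use is bsub -W "$(to_minutes 1:30)" < scriptname, which requests a 90 minute run limit.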
You will also need to specify where your job runs,
which is done with the -q and -m
options. The -m option specifies which machine to run on,
such as green. The -q option tells LSF which queue to use
on that machine.
The output from the batch job (the stderr and stdout)
can be controlled with the -e and -o options.
The -e option specifies where the stderr should be put
and the -o option specifies where the stdout should be
put. If neither option is specified, the output is emailed to
you. If only the -o option is specified, the stdout and
stderr are merged into the specified file. It is also useful to put
"%J" at the end of the filename, as LSF replaces it with the job ID
and so creates a unique file for each job's output.
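The following sketch shows a script that keeps stdout and stderr in separate files, one pair per job. The file name outjob.lsf, the helper expected_files and the job ID 1234 are made up for illustration; the helper only mimics the %J substitution locally and does not talk to LSF.

```shell
# Batch script that writes stdout and stderr to separate per-job files.
cat > outjob.lsf <<'EOF'
#BSUB -n 1
#BSUB -W 10
#BSUB -o output.%J
#BSUB -e error.%J
./a.out
EOF

# Illustration only: list the file names a script would produce for a
# given job ID by replacing %J by hand.
expected_files() {   # usage: expected_files SCRIPT JOBID
    sed -n 's/^#BSUB -[oe] //p' "$1" | sed "s/%J/$2/"
}

expected_files outjob.lsf 1234
```

For the made-up job ID 1234 this lists output.1234 and error.1234.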
The -J option gives your job a name, which helps you
identify which of your jobs are running when
using some of the LSF monitoring commands.
There are a few other useful options, such as -w,
which creates a dependency expression for your batch jobs. Often you
may wish to chain jobs so that they run in turn. The -w
option takes a job ID and, optionally, a condition specifying that
the job has exited or is done. See the examples further on for instances
of this in operation.
A number of options make the job email you when it reaches
various stages; for example, -B sends an email when the job is
dispatched and begins execution.
You will also find more flags and more information
on the flags described here in the bsub manpage.
Summary of BSUB Commands
BSUB Command | Meaning | Notes
-B | Sends an email when the job is dispatched and begins execution. | NB Do not specify an email address after this option as it will not work. The email address is taken from the .forward file.
-b begin_time | Specifies the earliest date or time that the job can be dispatched. | -
-c cpu_time | Limits the total CPU time the job can use (hour:minute). | This time is in minutes by default.
-e file_name | Writes the stderr to file_name. | In LSF this cannot be combined with -o. See -o for more details.
-J job_name | Assigns the specified name to the job. | This can also be used to set up job arrays. See later notes for details.
-m host_name | Specifies the host to run the job on. | -
-n min_proc[,max_proc] | Specifies the number of processors to run the job on. | In LSF, if one number is specified the job will run on that number of processors. If two are specified it will run on any number of processors between the two numbers.
-o file_name | Writes the stdout to file_name. | In LSF, if -o is specified and -e isn't, then the stdout and stderr are automatically written to the same file.
-q queue_name | Submits the job to the specified queue. | gnormal is no longer required for submitting jobs to Green. Instead use the -m green option together with -q normal.
-W runtime | Sets the run limit of the job. | This is not a direct equivalence as the time is not CPU time. However, it is a per-process run time that ignores time spent whilst the job is suspended.
-w expression | Sets a dependency expression. | See Example 3 for a job that waits on a previous job.
-R string | Sets a resource request. | Sometimes you may wish to make a more specific request to LSF, such as ensuring all your processors appear on one host (useful on Newton).
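As a hedged illustration of the -R option, the script below asks for all 8 processors to be placed on a single host using the LSF resource string span[hosts=1]. The file name onehost.lsf and the other option values are assumptions chosen to match the examples in this section.

```shell
# Write a batch script that requests 8 processors on one host.
cat > onehost.lsf <<'EOF'
#BSUB -n 8
#BSUB -W 30
#BSUB -m newton
#BSUB -q normal
#BSUB -R "span[hosts=1]"
#BSUB -o output.%J
mpirun -np 8 ./a.out
EOF

# Show the resource request that will be passed to LSF.
grep '^#BSUB -R' onehost.lsf
```

The script would then be submitted in the usual way with bsub < onehost.lsf.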
Example 1
The following script runs a.out on 8 processors, asking for one
hour on green and writing the output to a file called output.
It should be submitted using bsub < scriptname
#BSUB -n 8
#BSUB -W 1:00
#BSUB -m green
#BSUB -q normal
#BSUB -o output
mpirun -np 8 ./a.out
Example 2
The following is an equivalent submission to newton,
passing the options as command line arguments:
bsub -n 8 -W 1:00 -m newton -q normal -o output < scriptname
Only the mpirun line then needs to appear in the file.
Example 3
The following batch script sets a name for the job (with -J).
A later submission can then use the -w option with a dependency
expression requiring that the previous job completed normally.
#BSUB -n 32
#BSUB -W 20
#BSUB -J myjob
#BSUB -m green
#BSUB -q normal
#BSUB -o output.%J
mpirun -np 32 ./a.out
We can then run bsub < scriptname, which submits the
job and returns a numeric job ID, let's say 1234. We can use this
to ensure a second submission does not start until job 1234 has
completed successfully, by using bsub -w 1234 < scriptname .
If you want your next job to start even if the previous one failed,
you can use bsub -w 'ended(1234)' < scriptname .
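When chaining jobs from a script, the job ID can be picked up from bsub's output rather than typed by hand. The sketch below assumes bsub prints its usual confirmation line of the form "Job <1234> is submitted to queue <normal>."; the helper name job_id_from is hypothetical, and scriptname is the placeholder used in the examples above. Where bsub is not installed, it only demonstrates the parsing on a sample line.

```shell
# Extract the numeric job ID from a bsub confirmation line read on stdin.
job_id_from() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

if command -v bsub >/dev/null 2>&1; then
    first=$(bsub < scriptname | job_id_from)
    if [ -n "$first" ]; then
        # Start the second job only after the first finished successfully.
        bsub -w "done($first)" < scriptname
    fi
else
    # No LSF here: show the parsing on a sample confirmation line.
    echo 'Job <1234> is submitted to queue <normal>.' | job_id_from
fi
```

Replacing done with ended in the dependency expression starts the second job whether or not the first one succeeded.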