PBS Summary
Batch jobs are managed by Torque, a free PBS implementation and the successor of OpenPBS, in combination with the Maui scheduler.
A PBS server treats all associated machines as nodes of one big cluster and uses queues to allocate their resources.
In the simplest case, there is just one default queue ("defq") defined.
Use the pbsnodes and qstat commands below to show the configuration in detail. Submit jobs with qsub.
Node info
pbsnodes -a # show status of all nodes
pbsnodes -a <node> # show status of specified node
pbsnodes -l # list inactive nodes
pbsnodelist # list status of all nodes (one per line)
pbsnodelist -h # help
pbssummary # cluster usage (table)
pbssummary -h # help
Queue info
qstat -Q # show all queues
qstat -Q <queue> # show status of specified queue
qstat -f -Q <queue> # show full info for specified queue
qstat -q # show all queues (alternative format)
qstat -q <queue> # show status of specified queue (alt.)
pbsuserlist # list node and core usage per user
pbsuserlist -h # help
Job submission and monitoring
qsub <jobscript> # submit to default queue
qsub -q <queue> <jobscript> # submit to specified queue
other useful qsub options:
-N <jobname> # set job name (default: name of jobscript)
-r n # don't rerun the job if system fails
# (default: yes)
-k oe # keep the job's output and error files
      # in the user's HOME directory while the job runs
      # (default: deliver them only at job end)
-M <email> # full email address for notification
      # (firewall: address must be inside HU)
-m bea # send email at beginning, end, abort of job
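As an illustration, several of these options can be combined in a single qsub call; the queue name, job name and jobscript below are placeholders in the document's own style, not site defaults:

```shell
# hypothetical submission combining the qsub options listed above
# (<email> and "jobscript" are placeholders)
qsub -q defq -N mytest -r n -m bea -M <email> jobscript
```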
qdel <job_no> # delete job (with <job_no> from qstat)
qstat -a # show all jobs
qstat -a <queue> # show all jobs in specified queue
qstat -f <job_no> # show full info for specified job
qstat -n # show all jobs and the nodes they occupy
pbsjoblist # list jobs with resource usage
pbsjoblist -h # help
Jobscript
The jobscript is a shell script with the commands to run the program, copy files etc.
Above the shell commands, it may contain qsub options in lines starting with #PBS.
Sample script for a serial job
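For orientation, a minimal serial jobscript might look like the following sketch; the job name, time limit and program name (myprog) are placeholders, not taken from the original sample:

```shell
#!/bin/sh
# hypothetical serial jobscript -- names and limits are placeholders
#PBS -N mytest               # job name
#PBS -l cput=01:00:00        # CPU time limit
#PBS -m ea                   # send mail at end or abort of job
cd $PBS_O_WORKDIR            # change to the directory where qsub was called
./myprog > myprog.out 2>&1   # run the program, capturing output and errors
```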
For parallel jobs, the queuing system allocates the processors and passes this information to the mpirun command (OpenMPI) automatically. An explicit hostfile ($PBS_NODEFILE) is available for illustration.
Sample script for a parallel job
(Under MPICH-1, the machine file and the node count have to be passed to mpirun explicitly.)
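A corresponding parallel jobscript for OpenMPI might be sketched as follows; the node request, time limit and program name (mympiprog) are placeholders:

```shell
#!/bin/sh
# hypothetical parallel jobscript for OpenMPI -- placeholders throughout
#PBS -l nodes=2:ppn=2        # request 2 nodes with 2 processors each
#PBS -l walltime=02:00:00    # wallclock time limit
cd $PBS_O_WORKDIR            # change to the directory where qsub was called
cat $PBS_NODEFILE            # show the hostfile provided by PBS (illustration)
mpirun ./mympiprog           # OpenMPI takes node list and count from PBS
```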
Note that the batch system does not perform an interactive login, but only starts the job script with a remote shell command. This means that the job does not "see" your full interactive environment. You may have to extend the PATH explicitly or specify commands with full pathname. As an illustration, run a job with the command "set" only - it writes the environment (as seen by the job script) to the output file.
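The diagnostic job mentioned above can be as small as this two-line script; its output file then contains the environment exactly as the job script sees it:

```shell
#!/bin/sh
set   # write all shell variables (the job's environment) to the output file
```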
Node properties
Possible values are defined in the PBS server's node file (/var/spool/pbs/server_priv/nodes)
and can be listed with the command pbsnodes. A more readable summary of the nodes and their properties is obtained with pbsnodelist.
A dual-processor (or dual-core) node is marked with np = 2.
If the batch system extends over machines with different architectures, you may select the type as
i386     32bit PC architecture, also called "ia32" (Intel) or "x86" (AMD)
amd64    64bit extension, introduced by AMD, also called "x86-64"
The following property names specify the type of CPU:
P450 Pentium III / 450MHz
P1000 Pentium III / 1000MHz
P1700 Pentium 4 / 1700MHz
P2600 Pentium 4 / 2600-2800MHz
P3200 Pentium 4 / 3200MHz
C1860 Core2 / 1860MHz
C2200 Core2 / 2200MHz
C2660 Core2 / 2660MHz
X2400 Xeon / 2.4GHz
X3200 Xeon / 3.2GHz
A1800 Athlon 1800+ / 1300MHz
A2000 Athlon XP 2000+ / 1670MHz
A3100 Athlon 3100+ / 2166MHz
A3200 Athlon 64 3200+ / 2000MHz
O2400 Opteron 250 / 2400MHz
dual dual core
quad quad core
In the absence of any node specification, the batch system assumes that one CPU is required and tries to find a free node on which to start the job.
If you want the job to run on a particular node, it may be addressed by name:
qsub -l nodes=pool15 ... # run on node "pool15"
For more flexibility, the node(s) can be specified by (a list of) properties:
qsub -l nodes=P3200 ... # run on any node with P4/3200MHz
qsub -l nodes=4:amd64 ... # run on four 64bit nodes
qsub -l nodes=2:ppn=2:X3200 ... # run on two dual Xeon nodes
# (a total of 4 processors)
qsub -l nodes=2:A3200+4:ppn=2:X2400 ...
# run on 2 Athlon and 4 dual Xeon nodes
# (a total of 10 processors)
qsub -l nodes=1:ppn=2:C1860 ... # run on both cores of a Core2 processor
You may set a time limit (within the queue limit, of course) as follows:
qsub -l cput=HH:MM:SS <jobscript> # limit on CPU time (serial job)
qsub -l walltime=HH:MM:SS <jobscript> # limit on wallclock time (parallel job)
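Like the node requests, these limits can also be embedded in the jobscript as #PBS lines instead of being given on the qsub command line; the values below are placeholders:

```shell
#PBS -l walltime=02:00:00    # wallclock time limit (parallel job)
#PBS -l cput=01:00:00        # CPU time limit (serial job)
```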
last change: B. Bunk, 31.10.2013