PBS Summary
Batch jobs are managed by Torque, a free version of PBS and successor of OpenPBS, in connection with the Maui scheduler.
A PBS server treats all associated machines as nodes of one big cluster and uses queues to allocate their resources.
In the simplest case, there is just one default queue ("defq") defined.
Use the pbsnodes and qstat commands below to show the configuration in detail. Submit jobs with qsub.
Node info
pbsnodes -a          # show status of all nodes
pbsnodes -a <node>   # show status of specified node
pbsnodes -l          # list inactive nodes
pbsnodelist          # list status of all nodes (one per line)
pbsnodelist -h       # help
pbssummary           # cluster usage (table)
pbssummary -h        # help
Queue info
qstat -Q             # show all queues
qstat -Q <queue>     # show status of specified queue
qstat -f -Q <queue>  # show full info for specified queue
qstat -q             # show all queues (alternative format)
qstat -q <queue>     # show status of specified queue (alt.)
pbsuserlist          # list node and core usage per user
pbsuserlist -h       # help
Job submission and monitoring
qsub <jobscript>             # submit to default queue
qsub -q <queue> <jobscript>  # submit to specified queue

other useful qsub options:
  -N <jobname>   # set job name (default: name of jobscript)
  -r n           # don't rerun the job if system fails
                 # (default: yes)
  -k oe          # keep the job's current output and error files
                 # in the user's HOME directory
                 # (default: deliver them in the end)
  -M <email>     # full email address for notification
                 # (firewall: must be inside of HU)
  -m bea         # send email at beginning, end, abort of job

qdel <job_no>        # delete job (with <job_no> from qstat)
qstat -a             # show all jobs
qstat -a <queue>     # show all jobs in specified queue
qstat -f <job_no>    # show full info for specified job
qstat -n             # show all jobs and the nodes they occupy
pbsjoblist           # list jobs with resource usage
pbsjoblist -h        # help
Jobscript
The jobscript is a shell script containing the commands to run the program, copy files, etc.
Above the shell commands, it may contain qsub options in lines starting with #PBS.
Sample script for a serial job
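The original sample script is not reproduced here; the following is a minimal sketch of what such a serial jobscript might look like. The job name, the CPU-time limit, and the program name myprog are placeholders; PBS_O_WORKDIR is set by the batch system to the directory the job was submitted from (the fallback to "." merely lets the sketch run outside the batch system as well).

```shell
#!/bin/sh
#PBS -N serial_test        # job name (placeholder)
#PBS -l cput=01:00:00      # CPU time limit (placeholder)
#PBS -m ae                 # mail on end and abort

# change to the submission directory; the batch system sets
# PBS_O_WORKDIR, fall back to "." outside the batch system
cd "${PBS_O_WORKDIR:-.}"

echo "job started on $(hostname) at $(date)"

# run the actual program here, e.g.:
# ./myprog > myprog.out
```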
For parallel jobs, the queuing system allocates the processors and passes this information to the mpirun command (OpenMPI). An explicit hostfile listing the allocated nodes is also available, for illustration.
Sample script for a parallel job
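The original sample script is not reproduced here; a hedged sketch of a parallel jobscript follows. The node request and the program name myprog_mpi are placeholders; PBS_NODEFILE is the hostfile mentioned above, set by the batch system. As the text states, OpenMPI's mpirun obtains the node allocation from the batch system itself, so no machinefile or process count needs to be passed.

```shell
#!/bin/sh
#PBS -N mpi_test               # job name (placeholder)
#PBS -l nodes=2:ppn=2          # node request (placeholder)
#PBS -l walltime=01:00:00      # wallclock limit (placeholder)

cd "${PBS_O_WORKDIR:-.}"

# for illustration: show the hostfile with the allocated nodes
# (PBS_NODEFILE is set by the batch system)
if [ -r "$PBS_NODEFILE" ]; then
    echo "allocated nodes:"
    cat "$PBS_NODEFILE"
fi

# OpenMPI picks up the node allocation from the batch system,
# so mpirun needs no -machinefile or -np arguments here
mpirun ./myprog_mpi > myprog_mpi.out
```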
(Under MPICH-1, the machine file and the node count have to be passed to mpirun explicitly).
Note that the batch system does not perform an interactive login; it only starts the job script with a remote shell command. This means that the job does not "see" your full interactive environment. You may have to extend the PATH explicitly or specify commands with their full pathnames. As an illustration, run a job whose only command is "set": it writes the environment (as seen by the job script) to the output file.
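Such a diagnostic jobscript can be as short as this (the job name is an arbitrary placeholder):

```shell
#!/bin/sh
#PBS -N envtest
# "set" prints all shell variables, i.e. the environment
# exactly as the job script sees it
set
```

Inspect the resulting output file (e.g. envtest.o<job_no>) to see which variables, in particular PATH, are actually available to your jobs.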
Node properties
Possible values are defined in the PBS server's node file (/var/spool/pbs/server_priv/nodes)
and can be listed with the command pbsnodes. A more readable summary of the nodes and their properties is obtained with pbsnodelist.
A dual processor (or dual core) node is marked with np=2.
If the batch system extends over systems with different architectures, you may select the type as
i386    32bit PC architecture, also called "ia32" (Intel), "x86" (AMD)
amd64   64bit extension, introduced by AMD, also called "x86-64"
The following property names specify the type of CPU:
P450    Pentium III / 450MHz
P1000   Pentium III / 1000MHz
P1700   Pentium 4 / 1700MHz
P2600   Pentium 4 / 2600-2800MHz
P3200   Pentium 4 / 3200MHz
C1860   Core2 / 1860MHz
C2200   Core2 / 2200MHz
C2660   Core2 / 2660MHz
X2400   Xeon / 2.4GHz
X3200   Xeon / 3.2GHz
A1800   Athlon 1800+ / 1300MHz
A2000   Athlon XP 2000+ / 1670MHz
A3100   Athlon 3100+ / 2166MHz
A3200   Athlon 64 3200+ / 2000MHz
O2400   Opteron 250 / 2400MHz
dual    dual core
quad    quad core
In the absence of any node specification, the batch system assumes that one CPU is required and tries to find a free node to start the job.
If you want the job to run on a particular node, it may be addressed by name:
qsub -l nodes=pool15 ... # run on node "pool15"
For more flexibility, the node(s) can be specified by (a list of) properties:
qsub -l nodes=P3200 ...          # run on any node with P4/3200MHz
qsub -l nodes=4:amd64 ...        # run on four 64bit nodes
qsub -l nodes=2:ppn=2:X3200 ...  # run on two dual Xeon nodes
                                 # (a total of 4 processors)
qsub -l nodes=2:A3200+4:ppn=2:X2400 ...
                                 # run on 2 Athlon and 4 dual Xeon nodes
                                 # (a total of 10 processors)
qsub -l nodes=1:ppn=2:C1860      # run on both cores of a Core2 processor
You may set a time limit (within the queue limit, of course) as follows:
qsub -l cput=HH:MM:SS <jobscript>      # limit on CPU time (serial job)
qsub -l walltime=HH:MM:SS <jobscript>  # limit on wallclock time (parallel job)
last modified: B Bunk, 31.10.2013