Humboldt-Universität zu Berlin - Mathematisch-Naturwissenschaftliche Fakultät - Institut für Physik

PBS Summary

Batch jobs are managed by Torque, a free PBS implementation and successor of OpenPBS, in conjunction with the Maui scheduler.

A PBS server treats all associated machines as nodes of one big cluster and uses queues to allocate their resources.
In the simplest case, there is just one default queue ("defq") defined.
Use the pbsnodes and qstat commands below to show the configuration in detail. Submit jobs with qsub.

Node info

      pbsnodes -a             # show status of all nodes
      pbsnodes -a <node>      # show status of specified node
      pbsnodes -l             # list inactive nodes

      pbsnodelist             # list status of all nodes (one per line)
      pbsnodelist -h          # help
      pbssummary              # cluster usage (table)
      pbssummary -h           # help

Queue info

      qstat -Q                      # show all queues
      qstat -Q <queue>              # show status of specified queue
      qstat -f -Q <queue>           # show full info for specified queue
      qstat -q                      # show all queues (alternative format)
      qstat -q <queue>              # show status of specified queue (alt.)

      pbsuserlist                   # list node and core usage per user
      pbsuserlist -h                # help      

Job submission and monitoring

      qsub <jobscript>              # submit to default queue
      qsub -q <queue> <jobscript>   # submit to specified queue

      other useful qsub options:
           -N <jobname>             # set job name (default: name of jobscript)

           -r n                     # don't rerun the job after a system failure
                                    #     (default: y, the job is rerunnable)

           -k oe                    # keep the job's current output and error files
                                    #     in the user's HOME directory
                                    #     (default: deliver them at the end)

           -M <email>               # full email address for notification
                                    #     (firewall: must be inside of HU)
           -m bea                   # send email at beginning, end, abort of job

      qdel <job_no>                 # delete job (with <job_no> from qstat)

      qstat -a                      # show all jobs
      qstat -a <queue>              # show all jobs in specified queue        
      qstat -f <job_no>             # show full info for specified job
      qstat -n                      # show all jobs and the nodes they occupy

      pbsjoblist                    # list jobs with resource usage
      pbsjoblist -h                 # help
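
The submission and monitoring commands above can be combined in a small helper. The following is only a sketch: the function name submit_and_wait and the polling interval are made up for illustration. It relies on qsub printing the job identifier and on qstat returning a non-zero status once the job is no longer known. If the server keeps completed jobs listed for a while, the loop ends correspondingly later.

```shell
# Submit a jobscript and wait until qstat no longer lists the job.
# submit_and_wait is a hypothetical helper, not part of PBS.
submit_and_wait() {
    jobscript=$1
    interval=${2:-30}                        # polling interval in seconds
    jobid=$(qsub "$jobscript") || return 1   # qsub prints the job identifier
    echo "submitted $jobid"
    # qstat fails for an unknown job id, which ends the loop
    while qstat "$jobid" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "$jobid has left the queue"
}
```

Usage: submit_and_wait myjob.sh 60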

Jobscript

The jobscript is a shell script with the commands to run the program, copy files, etc.
Before the first shell command, it may contain qsub options on lines starting with #PBS.
Sample script for a serial job
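
A serial jobscript along these lines might look as follows. This is a minimal sketch, not the original sample: the job name, the time limit and the program name my_program are hypothetical. PBS_O_WORKDIR and PBS_JOBID are set by the batch system; the fallbacks only allow a dry run of the script outside PBS.

```shell
#!/bin/sh
# Job name (hypothetical) and a CPU time limit of 10 minutes.
#PBS -N serial_test
#PBS -l cput=00:10:00
# Do not rerun the job after a system failure.
#PBS -r n

# The job starts in the home directory; change to the directory from
# which qsub was called (fallback "." allows a dry run without PBS).
cd "${PBS_O_WORKDIR:-.}" || exit 1

echo "job ${PBS_JOBID:-dry-run} started on $(uname -n) at $(date)"

# Hypothetical program; replace with the real executable.
PROGRAM=./my_program
if [ -x "$PROGRAM" ]; then
    "$PROGRAM" > my_program.out 2> my_program.err
    status=$?
else
    echo "$PROGRAM not found or not executable" >&2
    status=127
fi

echo "job finished with exit status $status"
```

The #PBS lines are ordinary comments to the shell, so the same options could also be given on the qsub command line instead.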

For parallel jobs, the queuing system allocates the processors and passes the information to the mpirun command (OpenMPI). An explicit hostfile is available for illustration.
Sample script for a parallel job
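
A parallel jobscript might be sketched as follows, assuming an OpenMPI installation built with PBS support so that mpirun takes the host list and the process count from the batch system. The job name, the resource request and the program name my_mpi_program are hypothetical. PBS_NODEFILE names the hostfile written by the batch system; it is only printed here for illustration.

```shell
#!/bin/sh
# Job name (hypothetical), two nodes with two processors each,
# and a wallclock limit of one hour.
#PBS -N parallel_test
#PBS -l nodes=2:ppn=2
#PBS -l walltime=01:00:00

cd "${PBS_O_WORKDIR:-.}" || exit 1

# The hostfile holds one line per allocated processor.
# /dev/null is a fallback for a dry run outside PBS.
NODEFILE=${PBS_NODEFILE:-/dev/null}
NPROCS=$(($(wc -l < "$NODEFILE")))
echo "allocated $NPROCS processor(s):"
sort "$NODEFILE" | uniq -c

# Hypothetical MPI program; no -np or hostfile option is passed because
# mpirun is assumed to get this information from the batch system.
PROGRAM=./my_mpi_program
if command -v mpirun >/dev/null 2>&1 && [ -x "$PROGRAM" ]; then
    mpirun "$PROGRAM"
else
    echo "mpirun or $PROGRAM not available, skipping run" >&2
fi
```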

(Under MPICH-1, the machine file and the node count have to be passed to mpirun explicitly.)
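
For the MPICH-1 case, the explicit call could be assembled as in the following sketch. The program name my_mpi_program is hypothetical, and the command is only printed here; in a real jobscript the mpirun line would be executed instead of echoed.

```shell
# Sketch for MPICH-1: pass machine file and process count explicitly.
NODEFILE=${PBS_NODEFILE:-/dev/null}      # /dev/null: fallback for a dry run
NPROCS=$(($(wc -l < "$NODEFILE")))       # one line per allocated processor

# ./my_mpi_program is a hypothetical executable.
MPIRUN_CMD="mpirun -machinefile $NODEFILE -np $NPROCS ./my_mpi_program"
echo "would run: $MPIRUN_CMD"
```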

Note that the batch system does not perform an interactive login; it only starts the job script with a remote shell command. This means that the job does not "see" your full interactive environment. You may have to extend the PATH explicitly or specify commands by their full pathnames. As an illustration, run a job containing only the command "set": it writes the environment (as seen by the job script) to the output file.
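
A jobscript for this check could look like the sketch below. The job name and the directory $HOME/bin are hypothetical examples; the second half shows one way to extend the PATH inside the script.

```shell
#!/bin/sh
#PBS -N show_env

# Write the environment as seen by the job to the job's output file.
set

# Extend the PATH explicitly; $HOME/bin is a hypothetical example directory.
PATH=$PATH:$HOME/bin
export PATH
echo "PATH is now: $PATH"
```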

Node properties

Possible property values are defined in the PBS server's node file (/var/spool/pbs/server_priv/nodes)
and can be listed with the command pbsnodes. A more readable summary of the nodes and their properties is obtained with pbsnodelist.
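
To pick only the properties out of the pbsnodes output, a small filter can help. The following is a sketch under the assumption that pbsnodes -a prints each node name on an unindented line followed by indented "key = value" lines; the function name list_properties and the two node records are made up for the dry run.

```shell
# Print "node properties" pairs from pbsnodes-style output on stdin.
# list_properties is a hypothetical helper, not part of PBS.
list_properties() {
    awk '
        /^[^ \t]/ { node = $1 }                    # unindented line: node name
        $1 == "properties" && $2 == "=" { print node, $3 }
    '
}

# Real use:  pbsnodes -a | list_properties
# Dry run with two made-up node records:
sample='pool15
     state = free
     np = 2
     properties = i386,P3200,dual
pool16
     state = free
     np = 1
     properties = amd64,A3200'
printf '%s\n' "$sample" | list_properties
```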

A dual-processor (or dual-core) node is marked np = 2.

If the batch system spans machines with different architectures, you may select the type with one of these properties:

      i386        32-bit PC architecture, also called "ia32" (Intel), "x86" (AMD)
      amd64       64-bit extension, introduced by AMD, also called "x86-64"

The following property names specify the type of CPU:

      P450        Pentium III / 450MHz
      P1000       Pentium III / 1000MHz
      P1700       Pentium 4 / 1700MHz
      P2600       Pentium 4 / 2600-2800MHz
      P3200       Pentium 4 / 3200MHz

      C1860       Core2 / 1860MHz 
      C2200       Core2 / 2200MHz
      C2660       Core2 / 2660MHz
      X2400       Xeon / 2.4GHz
      X3200       Xeon / 3.2GHz

      A1800       Athlon 1800+    / 1300MHz
      A2000       Athlon XP 2000+ / 1670MHz
      A3100       Athlon 3100+    / 2166MHz
      A3200       Athlon 64 3200+ / 2000MHz

      O2400       Opteron 250 / 2400MHz

      dual        dual core
      quad        quad core

In the absence of any node specification, the batch system assumes that one CPU is required and tries to find a free node to start the job.

If you want the job to run on a particular node, it may be addressed by name:

      qsub -l nodes=pool15 ...         # run on node "pool15"

For more flexibility, the node(s) can be specified by (a list of) properties:

      qsub -l nodes=P3200 ...          # run on any node with P4/3200MHz
      qsub -l nodes=4:amd64 ...        # run on four 64bit nodes
      qsub -l nodes=2:ppn=2:X3200 ...  # run on two dual Xeon nodes
                                       #     (a total of 4 processors)
      qsub -l nodes=2:A3200+4:ppn=2:X2400 ...
                                       # run on 2 Athlon and 4 dual Xeon nodes
                                       #     (a total of 10 processors)
      qsub -l nodes=1:ppn=2:C1860      # run on both cores of a Core2 processor
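
The processor totals quoted in the comments above follow from a simple rule: groups are separated by "+", each group contributes (node count) x (ppn), and both default to 1. The sketch below reproduces this arithmetic; count_procs is a hypothetical helper for checking a specification before submitting, not a PBS command.

```shell
# Compute the total number of processors requested by a nodes= specification.
count_procs() {
    total=0
    old_ifs=$IFS
    IFS='+'                          # groups are separated by "+"
    for group in $1; do
        count=1
        ppn=1
        first=1
        IFS=':'                      # fields within a group are separated by ":"
        for field in $group; do
            case $field in
                ppn=*) ppn=${field#ppn=} ;;
                *[!0-9]*|'') ;;      # property or node name: no effect on the count
                *) if [ "$first" -eq 1 ]; then count=$field; fi ;;
            esac
            first=0
        done
        total=$((total + count * ppn))
        IFS='+'
    done
    IFS=$old_ifs
    echo "$total"
}

count_procs "2:ppn=2:X3200"             # two dual Xeon nodes
count_procs "2:A3200+4:ppn=2:X2400"     # 2 Athlon and 4 dual Xeon nodes
```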

You may set a time limit (within the queue limit, of course) as follows:

      qsub -l cput=HH:MM:SS <jobscript>       # limit on CPU time (serial job)
      qsub -l walltime=HH:MM:SS <jobscript>   # limit on wallclock time (parallel job)

 


last modified: B Bunk, 31.10.2013