
The University of Tennessee

Unix Systems Group - HPC



The LSF Batch Queue

The cluster uses the Platform LSF (Load Sharing Facility) batch queue system to distribute computing jobs among the worker nodes of the cluster and to resolve resource contention. Any computing task that takes more than a few minutes to run should be submitted to the batch queue. Basic use of the queue system is detailed in the LSF user's manual, and advanced parallel job use is covered in the LSF HPC user's manual.

The cluster has three primary queues and two special-use queues:

Primary Queues: short, medium, and long

Nearly all compute nodes belong to these queues. They enforce maximum job run times of 2 hours, 24 hours, and unlimited, respectively. The short queue has no job slot limit: any user can run as many simultaneous jobs as there are available job slots. The medium queue enforces a maximum of 32 job slots for all non-priority users. The long queue is open only to priority cluster users and allows each of them only as many job slots as their priority usage allocation. These three queues are designed to encourage users to submit to the shortest queue that fits their job, which improves the response time of the queue system and ensures that each user gets a fair share of the available resources.
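As a rough illustration of picking the shortest suitable queue, the sketch below maps an estimated run time in minutes to a queue name and prints the matching submission. The helper name pick_queue is hypothetical, and treating a job of exactly 2 or 24 hours as fitting the limit is an assumption; in practice it is safer to leave some margin.

```shell
# Hypothetical helper: choose the shortest queue whose run-time limit
# (2 hours, 24 hours, unlimited) covers an estimated run time in minutes.
pick_queue() {
    minutes=$1
    if [ "$minutes" -le 120 ]; then
        echo short
    elif [ "$minutes" -le 1440 ]; then
        echo medium
    else
        echo long   # available to priority users only
    fi
}

# Print (rather than run) the submission for a job expected to take 90 minutes.
echo "bsub -q $(pick_queue 90) ./job.sh"
```

The '-q' option to bsub selects the queue by name.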

Special Queue: compile

The "compile" queue is assigned a dedicated cluster machine that is licensed to run the Intel compiler suite. It contains 8 dedicated slots, which are shared with the "testing" queue. This queue should be used for all code compilation that takes longer than a minute or so.
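A build could be sent to this queue along the following lines. This is only a sketch: "make" stands in for whatever build command your project uses, and the guard simply prints the command on machines where bsub is not installed.

```shell
# Build the submission as a string so it can be inspected before use.
# -q selects the queue; "make" is a placeholder for your build command.
compile_cmd="bsub -q compile make"

# Submit only where LSF is installed; otherwise just show the command.
if command -v bsub >/dev/null 2>&1; then
    $compile_cmd
else
    echo "would run: $compile_cmd"
fi
```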

Special Queue: testing

The "testing" queue is designed to enable quick testing of parallel jobs. It has two dedicated machines with a total processor count of 16.
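A quick parallel test run might be prepared as sketched below. The helper testing_cmd is hypothetical; it only prints the bsub command, and it rejects requests larger than the 16 processors mentioned above, since such a job could never start in this queue.

```shell
# Processor count of the "testing" queue, taken from the text above.
TESTING_SLOTS=16

# Hypothetical helper: print the bsub command for a parallel test run,
# refusing requests that could never fit in the testing queue.
testing_cmd() {
    nslots=$1
    shift
    if [ "$nslots" -gt "$TESTING_SLOTS" ]; then
        echo "error: $nslots slots requested, testing queue has $TESTING_SLOTS" >&2
        return 1
    fi
    echo "bsub -q testing -n $nslots $*"
}

# Show the submission for an 8-slot test of job.sh.
testing_cmd 8 ./job.sh
```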

Fairshare

All queues are configured for "Fairshare" scheduling. This scheme takes into account a user's past resource usage and priority share when determining job execution order from the queues. Fairshare is currently configured at the group level (all members of the group have equal priority). We can also set up intra-group fairshare priorities on request.

Special Job Needs

If you require specific worker node properties for your job, you can specify them with the resource requirement option to 'bsub' (i.e., the -R flag). For example, to use all four processor cores on a quad-processor node, you would use the following command line:

bsub -n 4 -R 'span[ptile=4]' ./job.sh
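To see how '-n' and 'ptile' interact, the sketch below computes how many hosts a job would span when it asks for a given number of slots with a given 'ptile' value (slots divided by ptile, rounded up). The helper name hosts_needed is hypothetical, and the 16-slot job is an illustrative case, not a recommendation for this cluster.

```shell
# Hypothetical helper: number of hosts a job occupies when it asks for
# $1 slots with span[ptile=$2], i.e. ceil(slots / ptile).
hosts_needed() {
    slots=$1
    ptile=$2
    echo $(( (slots + ptile - 1) / ptile ))
}

# -n 4 with ptile=4 fits on one quad-core node; -n 16 spreads over four.
echo "n=4  ptile=4 -> $(hosts_needed 4 4) host(s)"
echo "n=16 ptile=4 -> $(hosts_needed 16 4) host(s)"
echo "bsub -n 16 -R 'span[ptile=4]' ./job.sh"
```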

We have also grouped the compute nodes according to hardware type. Currently there are three hardware configurations: g1850, g1950, and g6850. These names are derived from the Dell server model numbers. To request usage of a specific machine type just use the '-m' bsub option:

bsub -m g1850 ./job.sh
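The '-m' option can be combined with the other options shown on this page. The sketch below uses a hypothetical helper, hw_cmd, that prints such a command and rejects group names other than the three listed above. Whether a particular hardware group is actually available to a given queue is an assumption you would need to check on the cluster.

```shell
# Hypothetical helper: print a bsub command that pins a job to one of the
# hardware groups named above, rejecting unknown group names.
hw_cmd() {
    group=$1
    shift
    case "$group" in
        g1850|g1950|g6850)
            echo "bsub -m $group $*"
            ;;
        *)
            echo "error: unknown hardware group '$group'" >&2
            return 1
            ;;
    esac
}

# Four slots on a single g1950 node, submitted to the short queue.
hw_cmd g1950 -q short -n 4 -R "'span[ptile=4]'" ./job.sh
```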

Attachments:

File              Size      Date                 Who             Comment
hpc6.2_using.pdf  1860.0 K  26 Jul 2007 - 17:03  GeriRagghianti
lsf6.2_using.pdf  504.8 K   09 Jul 2007 - 17:41  GeriRagghianti  LSF User Documentation


