The LSF Batch Queue
The cluster uses the Platform LSF (Load Sharing Facility) batch queue system to distribute computing jobs among the worker nodes and to resolve resource contention. Any computing task that takes more than a few minutes to run should be submitted to the batch queue. Basic use of the queue system is detailed in the LSF user's manual, and advanced parallel job use is covered in the LSF HPC user's manual.
The cluster has three primary queues and two special-use queues:
Primary Queues: short, medium, and long
Nearly all compute nodes belong to these queues. They enforce job run-time limits of 2 hours, 24 hours, and no limit, respectively. The short queue has no job slot limit: any user can run as many simultaneous jobs as there are available job slots. The medium queue enforces a maximum of 32 job slots for all non-priority users. The long queue is open only to priority cluster users and allows each user only as many job slots as their priority usage allocation. These three queues are designed to encourage users to submit to the shortest-running queue that fits their job. This improves the response time of the queue system and helps ensure that each user gets a fair share of the available resources.
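As a rough illustration of picking the shortest queue that fits a job, the sketch below maps an estimated run time in minutes to one of the three primary queues. The pick_queue helper is hypothetical and not part of LSF; only the queue names and run-time limits come from the description above, and -q is the standard bsub option for naming a queue. The last line prints the command rather than submitting it.

```shell
#!/bin/sh
# pick_queue MINUTES: print the shortest primary queue whose run-time limit
# (short: 2 hours, medium: 24 hours, long: unlimited) covers the estimate.
# pick_queue is a hypothetical helper, not part of LSF.
pick_queue() {
    minutes=$1
    if [ "$minutes" -le 120 ]; then
        echo short
    elif [ "$minutes" -le 1440 ]; then
        echo medium
    else
        echo long   # only priority cluster users can submit here
    fi
}

# Dry run: print the bsub command instead of submitting it.
echo "bsub -q $(pick_queue 90) ./job.sh"
```

A job estimated at 90 minutes would go to the short queue. Anything expected to run longer than 24 hours needs the long queue and therefore priority access.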
Special Queue: compile
The "compile" queue is assigned a dedicated cluster machine which is licensed to run the Intel compiler suite. It contains 8 dedicated slots which are shared with the "testing" queue. This queue should be used for all code compilation that takes longer than a minute or so.
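A build might be sent to the compile queue along the following lines. This is a sketch under assumptions: the submit wrapper, the BSUB variable, the build.%J.log file name, and the use of make are all illustrative. The -q and -o options are standard bsub options, and LSF replaces %J in an output file name with the job ID. By default the wrapper only prints the command it would run.

```shell
#!/bin/sh
# submit is a hypothetical wrapper: by default it only prints the bsub
# command line; run the script with BSUB=bsub to actually submit.
submit() {
    ${BSUB:-echo bsub} "$@"
}

# Send a build to the compile queue and keep its output in a log file
# (%J is replaced by the LSF job ID).
submit -q compile -o build.%J.log make
```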
Special Queue: testing
The "testing" queue is designed to enable quick testing of parallel jobs. It has two dedicated machines with a total processor count of 16.
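A quick parallel test run could be prepared roughly as follows. The testing_cmd helper is hypothetical and not part of LSF. It prints a bsub command for the testing queue and refuses requests larger than the 16 processors mentioned above. The -q and -n options are standard bsub options.

```shell
#!/bin/sh
# testing_cmd SLOTS SCRIPT: print a bsub command for the testing queue,
# refusing requests larger than its 16 processors.
# testing_cmd is a hypothetical helper, not part of LSF.
testing_cmd() {
    slots=$1
    script=$2
    if [ "$slots" -gt 16 ]; then
        echo "testing queue has only 16 processors (asked for $slots)" >&2
        return 1
    fi
    echo "bsub -q testing -n $slots $script"
}

# Dry run: print the command for an 8-slot test job.
testing_cmd 8 ./job.sh
```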
Fairshare
All queues are configured for "Fairshare" scheduling. This scheme takes into account a user's past resource usage and priority share when determining job execution order from the queues. Fairshare is currently configured at the group level (all members of the group have equal priority). We can also set up intra-group fairshare priorities on request.
Special Job Needs
If your job requires specific worker node properties, you can specify them with the resource requirement option to 'bsub' (i.e. the -R flag). For example, to use all four processor cores on a quad-processor node, you would use the following command line:
bsub -n 4 -R 'span[ptile=4]' ./job.sh
We have also grouped the compute nodes according to hardware type. Currently there are three hardware configurations: g1850, g1950, and g6850.
These names are derived from the Dell server model numbers. To request usage of a specific machine type just use the '-m' bsub option:
bsub -m g1850 ./job.sh
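The two options can also be combined. The sketch below assumes a hypothetical node_cmd helper (not part of LSF) that checks the hardware type against the three groups named above and prints a bsub command asking for all requested slots on one node of that type. Whether every hardware group actually offers four cores per node is not stated here, so treat the numbers as placeholders.

```shell
#!/bin/sh
# node_cmd TYPE CORES SCRIPT: print a bsub command that requests CORES slots
# per host on hardware group TYPE.
# node_cmd is a hypothetical helper, not part of LSF.
node_cmd() {
    case $1 in
        g1850|g1950|g6850) ;;
        *) echo "unknown hardware type: $1" >&2; return 1 ;;
    esac
    echo "bsub -m $1 -n $2 -R 'span[ptile=$2]' $3"
}

# Dry run: print the command for a 4-core job on a g1950 node.
node_cmd g1950 4 ./job.sh
```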