Use TORQUE Resource Manager on Fedora 12

TORQUE is an open source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project and, with more than 1,200 patches, has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many other leading edge HPC organizations.[1]

1. Install TORQUE packages

$sudo yum install torque* libtorque

2. Configure TORQUE on the server

$cd /usr/share/doc/torque-2.1.10/
$sudo vi torque.setup

Change the lines

qmgr -c 'create queue batch'
qmgr -c 'set queue batch queue_type = execution'
qmgr -c 'set queue batch started = true'
qmgr -c 'set queue batch enabled = true'
qmgr -c 'set queue batch resources_default.walltime = 1:00:00'
qmgr -c 'set queue batch resources_default.nodes = 1'

qmgr -c 'set server default_queue = batch'

to

qmgr -c 'create queue batch'
qmgr -c 'set queue batch queue_type = execution'
qmgr -c 'set queue batch started = true'
qmgr -c 'set queue batch enabled = true'

qmgr -c 'set queue batch resources_default.walltime = 72:00:00'
# walltime = 72:00:00 gives every job a default limit of 72 hours to run

qmgr -c 'set queue batch resources_default.nodes = 1'

qmgr -c 'set queue batch max_running = 2'
# max_running = 2 means at most two jobs from this queue may run at any time

qmgr -c 'set queue batch max_user_run = 5'
# max_user_run = 5 means a single user may have at most five jobs from this queue running at once

qmgr -c 'set server default_queue = batch'

Then run the script, passing the name of the user who will be the TORQUE administrator (root here):

$sudo ./torque.setup root

3. Set the server nodes
The default TORQUE configuration directory on Fedora 12 is /var/torque.
Create the file server_priv/nodes in that directory, with one line per compute node:

node01 np=2

Here node01 is the hostname of the node, and np=2 declares that the node has 2 processors.
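As a rough sketch, an entry for the local machine could be generated rather than typed by hand. This assumes the short name reported by `uname -n` is the name pbs_server should use for the node and that every online processor should be offered to TORQUE; the variable names are only illustrative.

```shell
# Build a server_priv/nodes entry for the local machine.
# Assumption: the node is known to pbs_server by the name uname -n reports.
node_name=$(uname -n)

# Number of processors currently online.
np=$(getconf _NPROCESSORS_ONLN)

entry="$node_name np=$np"
echo "$entry"
```

The printed line can then be placed in /var/torque/server_priv/nodes, for example by piping the output through `sudo tee`.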

4. Initialize/Configure TORQUE on Each Compute Node
Create the file mom_priv/config with the following contents (the leading $ is part of each parameter name, not a shell prompt):

$pbsserver localhost # note: hostname running pbs_server
$logevent 255 # bitmap of which events to log

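5. Enable and start the daemons

Enable the services at boot:

$sudo chkconfig pbs_mom on
$sudo chkconfig pbs_sched on
$sudo chkconfig pbs_server on

chkconfig only registers the services for the next boot. To bring them up right away, restart each one so that it also reads the files created above:

$sudo service pbs_server restart
$sudo service pbs_mom restart
$sudo service pbs_sched restart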

6. Test the service configuration
Verify that all nodes are reporting correctly:

$pbsnodes -a

View the server configuration:

$qmgr -c 'p s'

The configuration is now complete and the system is ready for work. To submit a job to the queue, use the qsub command:

$qsub batchjob

Here batchjob is a file containing the job settings and the command lines to run.
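A minimal sketch of what such a file might contain is shown here. The job name and the resource values are placeholders chosen for illustration, not settings from this setup; lines starting with #PBS are read by qsub as job options and are treated as ordinary comments by the shell.

```shell
#!/bin/sh
#PBS -N example_job
#PBS -q batch
#PBS -l nodes=1,walltime=00:10:00
#PBS -j oe

# TORQUE sets PBS_O_WORKDIR to the directory qsub was run from;
# fall back to the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Placeholder workload: report which node ran the job.
msg="Job running on $(uname -n)"
echo "$msg"
```

Because the #PBS lines are plain comments to the shell, the script can be tried directly with `sh batchjob` before it is handed to qsub.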

This is only a basic configuration for using TORQUE on Fedora 12. More detailed configuration is documented on the site clusterresources.com.

References
[1] http://www.clusterresources.com/products/torque-resource-manager.php
[2] ClusterResources. TORQUE Administrator's Guide, v2.3.
