LTM stands for LACE Task Manager. LTM schedules tasks across multiple nodes of a computing cluster. A node may consist of one or more processors. LTM allows one to request an integral number of nodes for an integral number of minutes. Multiple copies of LTM may be started, one for each queue. LTM queues manage mutually exclusive groups of nodes. A default queue may be defined by the environment variable LTMQUEUE.
A user must start a supervisor before using LTM. A supervisor is started with the command ltmsuper. The command:
    ltmsuper
will start a supervisor for the default queue. A supervisor for a non-default queue may be started with the command:
    ltmsuper -q ts01
The above command starts a supervisor for the queue ts01.
A supervisor is persistent. That is, once it is started it will run until it is explicitly terminated. Logging out will not kill it. An attempt to start a supervisor that is already running is ignored. It is therefore convenient to start it by putting the ltmsuper command(s) in '.bash_profile'.
An LTM session consists of requesting nodes, running commands, and returning nodes. To request nodes, one uses the command ltmbegin. The command:
    ltmbegin -n8
will allocate to the user 8 nodes from the default queue for 10 minutes. The command:
    ltmbegin -n8 -m60 -q js21
will allocate to the user 8 nodes from queue js21 for 60 minutes. To return nodes when one is finished, the command ltmend is used. If the nodes were requested from the default queue, the command:
    ltmend
is sufficient. If, however, the nodes were requested from a non-default queue, such as js21, the command:
    ltmend -q js21
must be used. The following is a complete LTM session:
    ltmbegin -n8 -m60 -q js21
    ltmend -q js21
But it does not do anything. A simple command to do something on all nodes might be:
    ltmsh -q js21 uptime
This command will run the 'uptime' command on each node allocated from the queue js21. Therefore, a session that does something might be:
    ltmbegin -n8 -m60 -q js21
    ltmsh -q js21 uptime
    ltmend -q js21
Note that the option '-q js21' must be used on all LTM commands if js21 is not the default queue. If it is the default queue, the session could be:
    ltmbegin -n8 -m60
    ltmsh uptime
    ltmend
The 'ltmsh' command may be used to start user-developed programs as well.
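If a script stops between ltmbegin and ltmend, the ltmend line is never reached. The following is a minimal sketch of a wrapper that returns the nodes whether or not the command succeeds. The function name run_on_all_nodes is hypothetical, and the sketch assumes that ltmbegin exits with a non-zero status when the request fails, which should be checked against the ltmbegin man page.

```shell
#!/bin/sh
# run_on_all_nodes NODES MINUTES COMMAND [ARGS...]
# Hypothetical helper: request nodes from the default queue, run COMMAND on
# every node with ltmsh, and always return the nodes with ltmend.
run_on_all_nodes() {
    nodes=$1
    minutes=$2
    shift 2

    # Assumption: ltmbegin exits non-zero if the request fails.
    ltmbegin -n"$nodes" -m"$minutes" || return 1

    # Remember the exit status of ltmsh without stopping the function.
    status=0
    ltmsh "$@" || status=$?

    # Return the nodes whether or not the command succeeded.
    ltmend
    return "$status"
}
```

With this function defined, 'run_on_all_nodes 8 60 uptime' corresponds to the default-queue session shown above.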
As a reminder, the user may reset the default queue with the command:
    export LTMQUEUE=js21
if the new default is to be the queue js21. This command may be placed in '.bash_profile' or '.bashrc'.
By default, LTM is set up to run one session per shell task. Suppose one wanted to allocate 2 groups of nodes, each running a different set of commands at the same time. The following script:
    ltmbegin -n8 -m60
    ltmbegin -n8 -m60
    ltmsh usercommand2
    ltmsh usercommand1
    ltmend
    ltmend
will not work. A remedy for this is to give each session a user-defined name. In the following script:
    ltmbegin -S task1 -n8 -m60
    ltmbegin -S task2 -n8 -m60
    ltmsh -S task2 usercommand2
    ltmsh -S task1 usercommand1
    ltmend -S task1
    ltmend -S task2
two groups of 8 nodes are allocated. One session, task1, will run 8 instances of 'usercommand1' and the other session, task2, will run 8 instances of 'usercommand2'. The commands 'usercommand1' and 'usercommand2' will run during the same period of time.
The command 'ltmsh' will run one instance of a job on each node specified. For example, the command:
    ltmsh uptime
will run one instance of the command 'uptime' on each node allocated for the session. The command:
    ltmsh -n4 uptime
will run one instance of the command 'uptime' on the first 4 nodes of the session. The command:
    ltmsh -s2n5 uptime
will run one instance of the command 'uptime' on 5 nodes of the session, starting at the third node. Note that the first node of a session is referred to by s0, so the following two commands are equivalent:
    ltmsh -s0n3 uptime
    ltmsh -n3 uptime
Note again that 'ltmsh' only starts one instance of a command on each node.
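Because node numbering starts at 0, the start and count in an option such as '-s2n5' are easy to misread. The sketch below only illustrates the arithmetic described above; the function node_range is hypothetical and does not call LTM.

```shell
# node_range START COUNT
# Print the zero-based node indices selected by an option such as -s2n5
# (START=2, COUNT=5). Illustration only; it does not call LTM.
node_range() {
    start=$1
    count=$2
    i=0
    out=""
    while [ "$i" -lt "$count" ]; do
        out="$out${out:+ }$((start + i))"
        i=$((i + 1))
    done
    echo "$out"
}

node_range 2 5    # nodes used by: ltmsh -s2n5 uptime
node_range 0 3    # nodes used by: ltmsh -s0n3 uptime, or ltmsh -n3 uptime
```

The first call prints '2 3 4 5 6', that is, the third through seventh nodes of the session, and the second prints '0 1 2'.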
To start more than one instance of a command on each node, one can use the command 'ltmjobs'. One can specify how many instances of a command to start. The command:
    ltmjobs -t20 usercommand
will start 20 instances of 'usercommand', spread as evenly as possible across the requested nodes. The command:
    ltmjobs -p2 usercommand
will start 2 instances of 'usercommand' on each of the requested nodes. Each instance of 'usercommand' will run in its own environment. Each environment will contain a variable 'ltmtasks' whose value is the number of instances that the 'ltmjobs' command started. Each environment will also contain a uniquely valued variable 'ltmtask', whose value is between 0 and ltmtasks-1. In this way each instance of 'usercommand' knows how many instances of itself were started and which of those instances it is.
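One common use of 'ltmtask' and 'ltmtasks' is to divide a list of work items among the instances. The following is a minimal sketch of such a 'usercommand', written as a shell script. The function name my_items, the item count of 12, and the echo that stands in for real work are all hypothetical. The defaults are an added convenience so that the script also runs outside LTM.

```shell
#!/bin/sh
# Sketch of a 'usercommand' that splits work items among the instances
# started by ltmjobs. The defaults let the script run outside LTM as well.
task=${ltmtask:-0}      # which instance this is, 0 .. ltmtasks-1
tasks=${ltmtasks:-1}    # how many instances were started

# my_items TOTAL
# Print the item numbers (0 .. TOTAL-1) that belong to this instance:
# items task, task+tasks, task+2*tasks, and so on.
my_items() {
    total=$1
    i=$task
    while [ "$i" -lt "$total" ]; do
        echo "$i"
        i=$((i + tasks))
    done
}

for item in $(my_items 12); do
    # Placeholder for the real work on one item.
    echo "instance $task of $tasks handles item $item"
done
```

Started with 'ltmjobs -t4', instance 1 would handle items 1, 5 and 9, and every item would be handled by exactly one instance.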
For example, the session:
    ltmbegin -n5 -m60
    ltmjobs -t20 usercommand
    ltmend
will start 20 instances of 'usercommand', 4 per node. The session:
    ltmbegin -n6 -m60
    ltmjobs -p3 usercommand
    ltmend
will start 18 instances of 'usercommand', 3 per node.
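When the total given with '-t' does not divide evenly by the number of nodes, some nodes receive one more instance than others. The sketch below shows the arithmetic of an even spread. The function spread is hypothetical, and giving the remainder to the lowest-numbered nodes is an assumption; the placement LTM actually chooses may differ.

```shell
# spread TOTAL NODES
# Print how many instances each node receives when TOTAL instances are spread
# as evenly as possible over NODES nodes. Assumption: any remainder goes to
# the lowest-numbered nodes; LTM's actual placement may differ.
spread() {
    total=$1
    nodes=$2
    base=$((total / nodes))
    extra=$((total % nodes))
    n=0
    out=""
    while [ "$n" -lt "$nodes" ]; do
        if [ "$n" -lt "$extra" ]; then
            count=$((base + 1))
        else
            count=$base
        fi
        out="$out${out:+ }$count"
        n=$((n + 1))
    done
    echo "$out"
}

spread 20 5    # ltmjobs -t20 on 5 nodes
spread 20 6    # a total that does not divide evenly
```

The first call prints '4 4 4 4 4', matching the 5-node session above. The second prints '4 4 3 3 3 3'.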
'ltmjob' is used in conjunction with 'ltmwait'. 'ltmjob' allows one to start a command on a specific node. The command:
    ltmjob -s7 usercommand1
starts the command 'usercommand1' on the 8th requested node (note: nodes are numbered starting with 0). The command:
    ltmjob usercommand2
will start 'usercommand2' on the node following the one on which the last command was started. This allows a series of different commands to be started on different nodes. 'ltmwait' is used to wait for commands started by 'ltmjob'. 'ltmwait' can wait for the nth task, all tasks, or any task.
To wait for any task, use:
    ltmwait
To wait for all tasks, use:
    ltmwait -a
To wait for the 6th task, use:
    ltmwait -t5
These two LTM commands can be used to work through a list of tasks while keeping only a fixed number of tasks running at the same time. This facilitates load balancing when the tasks take different amounts of time.
The following script submits a steady stream of tasks to a fixed number of nodes until all the tasks are complete.
    ltmbegin -n4
    i=0
    while [ $i -lt 8 ] ; do
        ltmjob usercommand $i
        i=`expr $i + 1`
    done
    while [ $i -lt 50 ] ; do
        ltmwait
        ltmjob usercommand $i
        i=`expr $i + 1`
    done
    ltmwait -a
    ltmend
The first 'while' loop starts 8 tasks on 4 nodes. Each usercommand is given one argument, the task counter $i. The second loop waits for a task to finish and then starts a new task on the node where the finished task ran. Once the second loop has submitted the rest of the 50 tasks, the script waits for all the tasks to finish.
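The two loops above differ only in the call to 'ltmwait', so they can be folded into one loop with the limits passed as parameters. The following is a sketch under that reading. The function name run_stream is hypothetical, and it uses only the 'ltmjob', 'ltmwait' and 'ltmwait -a' forms shown above.

```shell
# run_stream SLOTS TOTAL COMMAND [ARGS...]
# Hypothetical helper: keep up to SLOTS tasks running until TOTAL tasks have
# been started, then wait for the rest. Call it between ltmbegin and ltmend.
# Each task receives its counter as its last argument.
run_stream() {
    slots=$1
    total=$2
    shift 2
    i=0
    while [ "$i" -lt "$total" ]; do
        # Once every slot is in use, wait for any one task to finish first.
        if [ "$i" -ge "$slots" ]; then
            ltmwait
        fi
        ltmjob "$@" "$i"
        i=$((i + 1))
    done
    # Wait for the tasks that are still running.
    ltmwait -a
}
```

With this function defined, the body of the script above becomes 'run_stream 8 50 usercommand' between the ltmbegin and ltmend lines.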
'ltmpi' is like 'ltmjobs' except that the program it executes must be an MPI program. The script:
    ltmbegin -n10
    ltmpi -p1 mpi_program
    ltmend
will run 10 MPI tasks of 'mpi_program' on 10 nodes, one per node. The default number of tasks per node is 2, so the script:
    ltmbegin -n10
    ltmpi mpi_program
    ltmend
will run 20 MPI tasks on 10 nodes.
'ltmcp' allows one to copy a file to all requested nodes. The command:
    ltmcp myfile .
will copy the file 'myfile' to the user's home directory on every node.
'ltmbatch' allows one to start a script that contains LTM session commands so that the script continues to run after the user logs out. The command:
    ltmbatch canned_script
will run the script 'canned_script' even if the user logs out.
'ltmnodes' lists the nodes of an LTM session. The command:
    ltmnodes -S myses
will list the nodes of the LTM session myses.
'ltmload' prints the processing load of an LTM session. The command:
    ltmload -S myses
will print the processing load of the LTM session myses.
'ltmreport' prints the queue usage. The command:
    ltmreport
will print the user's queue usage for the month.
'ltmkill' allows one to kill an LTM session by task ID. The command:
    ltmkill 345
will kill the LTM session with the task ID (tid) 345. The task ID of a session can be found with the command 'ltmstate'.
'ltmstate' prints the state of an LTM queue. The state of a queue consists primarily of the priorities of the queue, health of the queue, and the state of the sessions waiting in the queue. The command:
    ltmstate -q js20
will print the state of the LTM queue js20.
See the following man pages for more information.
ltmbegin, ltmend, ltmsh, ltmcp, ltmjobs, ltmjob, ltmwait, ltmpi
Other relevant commands are:
ltmstate, ltmkill, ltmreport, ltmnodes, ltmload, ltmbatch