Glen Beane
2016-12-02 21:11:28 UTC
Fellow Torque users,
I'm currently experimenting with routing queues on one of my test
clusters. The motivation is that I have developed a pipeline tool
that is built on top of Torque, and in some cases we want to run thousands
of samples through a pipeline. This could result in many thousands of jobs
that are submitted to the queue, which would cause me to exceed the
'max_user_queuable' limit that has been imposed on our production
clusters. I want to submit these to a routing queue and have some fraction
of those jobs sent to an execution queue.
My dev cluster is running Torque 6.0.1 and Moab 9.0.1.
I've been testing out some routing queues with the pipeline software, and
am seeing what I feel is strange behavior. First, it seems my jobs are
routed from the routing queue into the execution queue LIFO. I expected it
to be FIFO. Second, it does not seem to take job dependencies into
account -- I had a bunch of jobs that depended on a previously submitted
job, but since they were getting routed last in first out, those jobs
made it to the execution queue only to be held waiting on a previously
submitted job that was still in the routing queue. At one point I filled
up my "max_user_queuable" limit with jobs that all had a dependency still
in the routing queue -- then none of my jobs could run.
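To make the scenario concrete, here is a toy Python model of what I think is happening. It is only a sketch and not Torque's actual routing code: the function name `route`, the job names, and the limit of 3 (standing in for max_user_queuable) are all made up for illustration.

```python
from collections import deque


def route(jobs, limit, order):
    """Toy model: move up to `limit` jobs from a routing queue to an execution queue.

    jobs  : list of (job_id, depends_on) in submission order;
            depends_on is a job_id or None
    limit : stand-in for max_user_queuable on the execution queue
    order : "fifo" or "lifo"
    Returns (routed_ids, unblocked_ids).
    """
    pending = deque(jobs)
    routed = []
    while pending and len(routed) < limit:
        # FIFO takes the oldest submission, LIFO takes the newest.
        routed.append(pending.popleft() if order == "fifo" else pending.pop())

    in_exec = {job_id for job_id, _ in routed}
    # A job is "unblocked" if it has no dependency, or if its dependency
    # also reached the execution queue and can therefore eventually finish.
    unblocked = [job_id for job_id, dep in routed if dep is None or dep in in_exec]
    return [job_id for job_id, _ in routed], unblocked


# One parent job submitted first, then five jobs that depend on it.
jobs = [("parent", None)] + [(f"child{i}", "parent") for i in range(1, 6)]

lifo_routed, lifo_unblocked = route(jobs, 3, "lifo")
fifo_routed, fifo_unblocked = route(jobs, 3, "fifo")

print("LIFO routed:", lifo_routed, "unblocked:", lifo_unblocked)
print("FIFO routed:", fifo_routed, "unblocked:", fifo_unblocked)
```

In this model the LIFO case fills every slot with child jobs whose parent never leaves the routing queue, so nothing can make progress, while the FIFO case routes the parent first.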
I would expect a routing queue to route FIFO, and it would be better if
Torque could somehow take dependencies into consideration -- don't route
jobs still waiting on dependencies when there are other eligible-to-run
jobs that could be moved into an execution queue instead.
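The selection policy I have in mind could be sketched roughly as follows. This is a hypothetical illustration in Python, not anything Torque provides as far as I know: `pick_jobs_to_route`, the `completed` set, and the job names are all assumptions.

```python
def pick_jobs_to_route(jobs, completed, free_slots):
    """Choose which jobs to move from the routing queue to the execution queue.

    jobs       : list of (job_id, depends_on) in submission order;
                 depends_on is a job_id or None
    completed  : set of job_ids that have already finished
    free_slots : how many more jobs the execution queue will accept
    """
    # Jobs with no dependency, or whose dependency has finished, can run now.
    eligible = [job_id for job_id, dep in jobs
                if dep is None or dep in completed]
    # Everything else is still waiting on a dependency.
    waiting = [job_id for job_id, dep in jobs
               if dep is not None and dep not in completed]
    # Eligible jobs go first (FIFO within each group); waiting jobs only
    # fill whatever slots are left over.
    return (eligible + waiting)[:free_slots]


jobs = [
    ("align1", None),
    ("merge", "align1"),
    ("align2", None),
    ("report", "merge"),
    ("align3", None),
]

picked = pick_jobs_to_route(jobs, completed=set(), free_slots=3)
print(picked)
```

With three free slots, this sketch routes the three independent jobs and leaves the dependent ones in the routing queue until slots open up.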
Any suggestions?