Christopher Pierce
2016-10-09 18:18:18 UTC
Hello All,
I am the maintainer of a small cluster (20 nodes) used by my
computational physics group. I use TORQUE as our job scheduler and have
run into an interesting problem. We typically have either long-running
jobs that use a single node or short jobs that are embarrassingly
parallel and should use as many nodes as possible. When users request
resources for a short job, they check the number of available nodes and
set their requested node count to match that number. This becomes a
problem when long-running jobs are submitted in the meantime: one of the
nodes is no longer available, so the short jobs sit in the queue. Is
there any mechanism for a user to submit a job that requests a range of
values for a resource, i.e., a job that will run on as many nodes as are
available, constrained to a certain range?
Thank you,
Chris Pierce
Center for Computation Nano-Science, WPI