Josep Guerrero
2011-04-22 07:04:14 UTC
Hello,
I'm using torque on a small 8-node cluster. Until now I was using version
2.4.3. Some users need to run their jobs on specific nodes, so they used to
submit their jobs like this:
qsub -q l0 -l nodes=hidra2 tmp/prova.sh
(where l0 is the name of the queue, and the nodes are named hidra0,
hidra1, ..., hidra7), and it worked. Recently I upgraded to torque 2.5.5, and now
this no longer works. If I try to run on a specific node, I always get the
same error:
qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max nodes
requirement
but I can run jobs normally if I don't specify the node, even if the
destination node ends up being the same.
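Since the error mentions the queue's max nodes requirement, I can post the queue and server limits if that helps. These are the commands I would use to dump them (standard qmgr and qstat invocations; l0 is the queue from the example above):
qmgr -c "print queue l0"
qmgr -c "print server"
qstat -Q -f l0
I would expect any resources_max.nodes, resources_default.nodes or resources_max.nodect settings to show up in that output.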
I discovered that I get the same error if I write random gibberish after
"nodes=":
hidra0:~> qsub -q l0 -l nodes=adfafda tmp/prova.sh
qsub: Job exceeds queue resource limits MSG=cannot satisfy queue max nodes
requirement
and that if I go back to 2.4.3, the "nodes=" option works again. I ran qsub
under strace, but that didn't tell me much (the only output that might be
related is that qsub opens the /etc/hosts file at several points; the nodes are
listed there, too).
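One workaround I've been considering (untested, and only if properties are acceptable to my users) is to tag each node with a property in server_priv/nodes and request the property instead of the hostname, for example:
hidra2 np=8 node2
(the np value here is just illustrative) and then submit with:
qsub -q l0 -l nodes=1:node2 tmp/prova.sh
but I'd prefer to understand why the plain "nodes=hidra2" request stopped working.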
I've searched the torque manual and the list, but I've found no reference to this
problem. I did find this warning, though I'm not sure whether it has any
relationship to the error:
========
Versions of TORQUE earlier than 2.4.5 attempted to apply queue and server
defaults to a job that didn't have defaults specified. If a setting still did
not have a value after that, TORQUE applied the queue and server maximum
values to a job (meaning, the maximum values for an applicable setting were
applied to jobs that had no specified or default value).
In TORQUE 2.4.5 and later, the queue and server maximum values are no longer
used as a value for missing settings.
========
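If that change in how defaults are handled is indeed related, I wonder whether I now need to set explicit queue defaults or maxima myself. This is roughly what I had in mind (untested, and the values are only illustrative for an 8-node cluster):
qmgr -c "set queue l0 resources_default.nodes = 1"
qmgr -c "set queue l0 resources_max.nodes = 8"
but I don't know whether that would let "nodes=<hostname>" requests through again.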
Does anyone know what the problem may be, and whether there is any way to solve
it?
Thanks!
Josep Guerrero