Fernando Silva
2017-01-12 14:52:51 UTC
Hi Everyone, Torque/Moab user here.
First off, we are noticing that every once in a while the OOM killer
kicks in and kills a job or process on our compute nodes. How is this
possible, and how can it be prevented? Torque should not allow memory
to be overcommitted per job on a node.
On another note, ideally we would like the compute nodes to never use
swap. Unfortunately, it seems you have to specify vmem with your
submissions. It's my understanding that vmem refers to virtual address
space, which includes swap, correct? Any tips on preventing jobs from
touching swap? That would probably be a step in the right direction
toward stopping the OOM killer.
Thanks.
________________________________
This e-mail may contain confidential, personal and/or health information(information which may be subject to legal restrictions on use, retention and/or disclosure) for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this e-mail in error, please contact the sender and delete all copies.