John Griffin-Wiesner
2017-02-15 22:19:56 UTC
Since updating to torque 6.1 we've had the moms on our compute
nodes frequently getting into a bad state. The nodes will show
up as down in torque. There will be a load of 1 even though
nothing is going on, and there will be some job files from
finished jobs left in /var/spool/torque/aux and
/var/spool/torque/mom_priv/jobs. The mom daemon will have to be
killed with -9.
A regular 'service pbs_mom stop' sometimes gives:
ERROR: cannot shutdown mom daemon on localhost (errno=97-Address family not supported by protocol)
Anyone else seen this?
--
John Griffin-Wiesner
HPC Systems Administrator
Minnesota Supercomputing Institute
http://www.msi.umn.edu
***@msi.umn.edu