Discussion:
[torqueusers] pbs_mom + NFS problems starting
scoggins
2006-11-15 22:24:27 UTC
I am trying to start pbs_mom with the SPOOLDIR NFS mounted. I have
included a -d option pointing to a different location for each node
and I am constantly getting this error:

pbs_mom -d /var/spool/torque/node0000

returns the error:

-bash-3.00# pwd
/var/spool/torque/node0000/mom_priv
-bash-3.00# ls -l
total 12
-rw-r--r-- 1 root root 139 Jul 15 00:39 config
-rw-r--r-- 1 root root 139 Jul 15 00:28 config.save
drwxr-x--x 2 root root 4096 May 5 2006 jobs
-bash-3.00# pbs_mom -d /var/spool/torque/node0000
pbs_mom: No locks available (37) in pbs_mom, cannot lock '/var/spool/torque/node0000/mom_priv/mom.lock' - another mom running
cannot lock '/var/spool/torque/node0000/mom_priv/mom.lock' - another mom running

-bash-3.00# df -k | grep torqu
10.0.0.1:/var/spool/torque
                       9851328    796832   8554080   9% /var/spool/torque


Node and master are running torque-2.0.0p4, which I know I need to
update. But this is the only cluster where I could never get this to
work. All other clusters worked fine.


Does anyone know what could be the problem?

Thanks

Jackie
Garrick Staples
2006-11-16 02:32:55 UTC
Post by scoggins
-bash-3.00# pbs_mom -d /var/spool/torque/node0000
pbs_mom: No locks available (37) in pbs_mom, cannot lock '/var/spool/
Looks like locking isn't working over NFS. Is lockd running on the
server and clients?
Jackie Scoggins
2006-11-16 03:35:15 UTC
Yes, lockd is running on the nodes. I once ran a debug on this and I remember it had something to do with flock. I am not familiar with the details of how this all works, but it appears that the lock file is created before pbs actually starts; then pbs_mom tries to start and complains that the file already exists. I just installed version 2.1.6 of torque and I am still having this problem. Is there something I can run to debug this better? What is interesting is that I see nfslock in /var/lock/subsys on the nodes but not on the master. What could that be from?


Thanks

Jackie
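
One way to debug this independently of pbs_mom is to try taking a lock on a scratch file in the same NFS-mounted directory. The following is a minimal sketch in Python, assuming pbs_mom takes a POSIX fcntl()-style lock on mom.lock; the function name try_lock and the scratch file path are hypothetical and not part of TORQUE. On Linux, errno 37 is ENOLCK, which matches the "No locks available (37)" message above.

```python
import errno
import fcntl
import os
import sys


def try_lock(path):
    """Try a non-blocking exclusive fcntl() lock on `path`.

    Returns (True, None) on success or (False, errno_value) on failure.
    The file is created if it does not exist.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            return False, exc.errno
        # Release the lock again so the test leaves nothing held.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True, None
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass a scratch file on the NFS mount, not mom.lock itself.
    target = sys.argv[1]
    ok, err = try_lock(target)
    if ok:
        print("lock acquired and released on", target)
    elif err == errno.ENOLCK:
        print("ENOLCK (No locks available): the NFS lock manager is not answering")
    else:
        print("lock failed:", os.strerror(err))
```

If this reports ENOLCK on the NFS mount but succeeds on a local directory such as /tmp, the problem is likely in NFS locking (lockd/statd on the client or the server) rather than in pbs_mom.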

_______________________________________________
torqueusers mailing list
http://www.supercluster.org/mailman/listinfo/torqueusers
Jackie Scoggins
2006-11-16 03:56:35 UTC
Thanks. I found it. nfslock was turned off on the master node. All the nodes were fine. Someone chkconfig'd nfslock off on the master.

Again, thanks

Jackie
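
A check like the one that found this problem can be scripted by asking each host's portmapper which RPC services are registered. The sketch below parses the output of rpcinfo -p and reports which lock-related services are missing; the function name missing_lock_services is hypothetical. It assumes the standard ONC RPC program numbers 100021 (nlockmgr, i.e. lockd) and 100024 (status, i.e. rpc.statd). Which of these disappears when nfslock is turned off depends on the distribution and kernel, so treat the result as a hint rather than a diagnosis.

```python
import subprocess
import sys

# Standard ONC RPC program numbers for the NFS locking services.
LOCK_PROGRAMS = {100021: "nlockmgr", 100024: "status"}


def missing_lock_services(rpcinfo_output):
    """Return names of lock-related RPC services absent from `rpcinfo -p` output."""
    seen = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric program number; the header row does not.
        if fields and fields[0].isdigit():
            seen.add(int(fields[0]))
    return sorted(name for prog, name in LOCK_PROGRAMS.items() if prog not in seen)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the NFS server (or a client) host name as the only argument.
    host = sys.argv[1]
    result = subprocess.run(
        ["rpcinfo", "-p", host], capture_output=True, text=True, check=True
    )
    missing = missing_lock_services(result.stdout)
    if missing:
        print(host, "is missing:", ", ".join(missing))
    else:
        print(host, "has nlockmgr and status registered")
```

Running it against the master as well as the nodes matters here, since in this thread the nodes were fine and only the server had nfslock disabled.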

