Discussion:
[torqueusers] Cannot connect to specified server host
Yousuke Itoh
2017-03-24 05:13:06 UTC
Dear all,

I am using a small cluster where CentOS 6.8 runs on the job manager and CentOS 6.6 runs on each computing node. All the nodes (the job manager and the computing nodes) run torque version 4.2.10.

While everything worked fine last week, I have been getting the following connection problem since yesterday. (Below, Hongo-Login-v3 is the name of the job manager.)

[***@Hongo-Login-v3 ~]$ qstat
Unable to communicate with Hongo-Login-v3(192.168.1.1)
Cannot connect to specified server host 'Hongo-Login-v3'.
Unable to communicate with Hongo-Login-v3(192.168.1.1)
Cannot connect to specified server host 'Hongo-Login-v3'.
Unable to communicate with Hongo-Login-v3(192.168.1.1)
Cannot connect to specified server host 'Hongo-Login-v3'.
qstat: cannot connect to server Hongo-Login-v3 (errno=111) Connection refused
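
For readers following along: errno=111 in the qstat output above is ECONNREFUSED on Linux. The sketch below spells out what that and two neighbouring errno values usually mean, and adds a small TCP probe. The function names `explain_errno` and `probe_port` are hypothetical, and the probe assumes bash and coreutils `timeout` are installed. It is a rough diagnostic aid, not part of TORQUE.

```shell
#!/bin/sh
# Hypothetical helper: describe a few Linux errno values that show up
# in TCP connection failures.
explain_errno() {
    case "$1" in
        110) echo "ETIMEDOUT: no reply from the host" ;;
        111) echo "ECONNREFUSED: host reachable, port not accepting connections" ;;
        113) echo "EHOSTUNREACH: no route to the host" ;;
        *)   echo "errno $1: not covered by this sketch" ;;
    esac
}

# Hypothetical helper: try to open a TCP connection to host $1, port $2,
# using bash's /dev/tcp pseudo-device, and give up after 3 seconds.
probe_port() {
    if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "open"
    else
        echo "closed or unreachable"
    fi
}

# The value reported by qstat in the post.
explain_errno 111
```

Assuming pbs_server uses its default port 15001, running `probe_port Hongo-Login-v3 15001` on the job manager would show whether anything is accepting connections there. A refused connection typically means no process is listening on the port yet, or a firewall actively rejected the attempt.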


[***@Hongo-Login-v3 yousuke]# ps aux | grep pbs
root 6673 1.9 7.8 2705276 2575376 ? Dl 13:22 0:48 /usr/local/sbin/pbs_server -d /var/spool/torque
root 6704 0.0 0.0 62276 7704 ? Ss 13:22 0:00 /usr/local/sbin/pbs_sched -d /var/spool/torque
root 11632 0.0 0.0 103332 896 pts/0 S+ 14:04 0:00 grep pbs
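
The STAT column in the ps output above may be worth a second look: pbs_server is listed as `Dl` while pbs_sched is `Ss`. The sketch below expands such a string using the state codes documented in the Linux ps(1) man page. The function name `decode_stat` is hypothetical, and only a subset of the codes is covered.

```shell
#!/bin/sh
# Hypothetical helper: expand a ps STAT string (e.g. "Dl") into the
# meanings given in the Linux ps(1) man page, one character per line.
decode_stat() {
    printf '%s\n' "$1" | fold -w1 | while read -r c; do
        case "$c" in
            D) echo "D: uninterruptible sleep (usually I/O)" ;;
            R) echo "R: running or runnable" ;;
            S) echo "S: interruptible sleep" ;;
            T) echo "T: stopped" ;;
            Z) echo "Z: defunct (zombie)" ;;
            s) echo "s: session leader" ;;
            l) echo "l: multi-threaded" ;;
            +) echo "+: foreground process group" ;;
            *) echo "$c: not covered by this sketch" ;;
        esac
    done
}

# STAT value shown for pbs_server in the post.
decode_stat Dl
```

A process in state D is blocked in the kernel, usually on disk I/O. One possible reading, which the thread does not confirm, is that pbs_server was still busy reading its state from disk and had not yet started accepting connections when qstat was run.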


[***@Hongo-Login-v3 yousuke]# cat /var/spool/torque/server_name
Hongo-Login-v3

[***@Hongo-Login-v3 yousuke]# pbs-config --version
4.2.10

[***@Yushima-v3-01 ~]$ pbs-config --version
4.2.10

(Yushima-v3-01 is the name of computing node 1.)


This happens even after rebooting the whole cluster.

I would really appreciate your help or any suggestions.


Best regards,
Yousuke
Gustavo Correa
2018-02-27 01:33:29 UTC
Check whether the pbs_mom daemon is running on 192.168.1.1,
or simply restart it (service pbs_mom restart).
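
A minimal sketch of that kind of check, extended to the other TORQUE 4.x daemons (pbs_server, pbs_sched, pbs_mom and trqauthd, the authorization daemon that client commands such as qstat rely on). The function name `report_daemons` is hypothetical. It only inspects `ps` output passed on stdin, and the sample input is the process list quoted in the original post.

```shell
#!/bin/sh
# Hypothetical helper: read `ps` output on stdin and report, for each
# TORQUE daemon name, whether a matching process line is present.
report_daemons() {
    ps_output=$(cat)
    for daemon in pbs_server pbs_sched pbs_mom trqauthd; do
        # Match the name as a whole word, e.g. /usr/local/sbin/pbs_mom
        if printf '%s\n' "$ps_output" | grep -Eq "(^|[ /])${daemon}( |\$)"; then
            echo "$daemon: running"
        else
            echo "$daemon: not found"
        fi
    done
}

# Sample input: the daemon lines from the original post.
sample='root 6673 1.9 7.8 2705276 2575376 ? Dl 13:22 0:48 /usr/local/sbin/pbs_server -d /var/spool/torque
root 6704 0.0 0.0 62276 7704 ? Ss 13:22 0:00 /usr/local/sbin/pbs_sched -d /var/spool/torque'

printf '%s\n' "$sample" | report_daemons
```

On a live node the same function can be fed with `ps aux | report_daemons`. Two caveats apply to the sample: the posted list was filtered with `grep pbs`, so trqauthd would not appear in it even if it were running, and pbs_mom normally runs on the compute nodes, so its absence on the job manager is not necessarily a fault.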
_______________________________________________
torqueusers mailing list
http://www.supercluster.org/mailman/listinfo/torqueusers