Yousuke Itoh
2017-03-24 05:13:06 UTC
Dear all,
I am using a small cluster where CentOS 6.8 is running on the
job manager and CentOS 6.6 is running on each computing node.
All the nodes (the job manager and the computing nodes) run
torque version 4.2.10.
While everything worked fine last week,
I have had the following connection problem since yesterday.
(Below, Hongo-Login-v3 is the name of the job manager.)
[***@Hongo-Login-v3 ~]$ qstat
Unable to communicate with Hongo-Login-v3(192.168.1.1)
Cannot connect to specified server host 'Hongo-Login-v3'.
Unable to communicate with Hongo-Login-v3(192.168.1.1)
Cannot connect to specified server host 'Hongo-Login-v3'.
Unable to communicate with Hongo-Login-v3(192.168.1.1)
Cannot connect to specified server host 'Hongo-Login-v3'.
qstat: cannot connect to server Hongo-Login-v3 (errno=111) Connection refused
[***@Hongo-Login-v3 yousuke]# ps aux | grep pbs
root 6673 1.9 7.8 2705276 2575376 ? Dl 13:22 0:48 /usr/local/sbin/pbs_server -d /var/spool/torque
root 6704 0.0 0.0 62276 7704 ? Ss 13:22 0:00 /usr/local/sbin/pbs_sched -d /var/spool/torque
root 11632 0.0 0.0 103332 896 pts/0 S+ 14:04 0:00 grep pbs
[***@Hongo-Login-v3 yousuke]# cat /var/spool/torque/server_name
Hongo-Login-v3
[***@Hongo-Login-v3 yousuke]# pbs-config --version
4.2.10
[***@Yushima-v3-01 ~]$ pbs-config --version
4.2.10
(Yushima-v3-01 is the name of computing node 1.)
This happens even after rebooting the whole cluster.
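Since errno=111 is ECONNREFUSED, one thing that could be checked is whether
anything accepts TCP connections on the pbs_server port at all, even though
the pbs_server process is listed by ps. The following is only a minimal
sketch: the function name probe is made up for illustration, and the port
15001 used in the note below is an assumption (the usual TORQUE default for
pbs_server), not something confirmed from this cluster's configuration.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host name and connects over TCP.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Same condition that qstat reports as errno=111.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable network, and so on.
        return f"error: {exc}"
```

Calling probe("Hongo-Login-v3", 15001) on the job manager and on a computing
node would show whether the refusal is seen locally as well as over the
network. A result of "refused" on the job manager itself would suggest that
pbs_server is not yet listening, rather than a firewall or routing problem.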
I would really appreciate any help or suggestions.
Best regards,
Yousuke