sudo
2017-05-22 06:08:34 UTC
Hello,
I was able to build and configure v6.1.1.1 on my RHEL 7.2 cluster, and my
example batch jobs run just fine.
However, I noticed the following errors from pbs_mom and pbs_server:
/var/log/messages
May 19 18:00:49 orion001 pbs_mom: I/O error : Permission denied
May 19 18:00:49 orion001 pbs_mom: I/O error : Permission denied
May 19 18:00:49 orion001 pbs_server: Assertion failed, bad pointer in link: file "req_select.c", line 401
Are these known issues in v6.1.1.1?
Is there any way to avoid these errors?
Here is what I have observed so far:
1) The "Assertion failed" message is logged when a user runs 'qsub'
(and it also seems to appear periodically).
2) The "I/O error : Permission denied" messages appear when a user's job
finishes. Even so, the job's output and error (.o/.e) files are returned
to the job's PBS_WORKDIR normally.
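To check the pattern in these observations (which messages repeat, and how often), a minimal shell sketch like the following could summarize the TORQUE daemon messages in a syslog file. It assumes rsyslog's default line format ("Mon DD HH:MM:SS host daemon: message") and GNU grep/sed; the function name summarize_pbs_log is hypothetical and not part of TORQUE.

```shell
# Hypothetical helper: summarize pbs_mom/pbs_server messages in a syslog file.
# Assumes rsyslog's default "Mon DD HH:MM:SS host daemon: message" format.
summarize_pbs_log() {
    pbs_log="${1:-/var/log/messages}"
    # 1) keep only pbs_mom/pbs_server lines (optionally with a [PID] suffix)
    # 2) strip the timestamp and host prefix so identical messages group together
    # 3) count each distinct message, most frequent first
    grep -E ' (pbs_mom|pbs_server)(\[[0-9]+\])?: ' "$pbs_log" \
        | sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} [^ ]+ //' \
        | sort | uniq -c | sort -rn
}
```

For example, running "summarize_pbs_log /var/log/messages" as root would print each distinct pbs_mom/pbs_server message with its count. Because the timestamps are stripped to group identical messages, the original file still has to be consulted to see exactly when each one was logged relative to qsub or job completion.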
---
I built it with HWLOC 1.9 and cgroups enabled:
./configure --enable-cgroups
On this server, torque-6.0.2-1469811694_d9a3483 previously ran fine
(without these errors).
I installed v6.1.1.1 as a fresh install (after removing the old
/var/spool/torque).
Best Regards,
--Sudo
-------
Ryuichi Sudo (***@sstc.co.jp)
-------
P.S. momctl reports that pbs_mom is v6.1.1.1:
[***@orion001 ~]# momctl -d 6
Host: orion001/orion001 Version: 6.1.1.1 PID: 1801
Server[0]: orion001 (192.168.21.101:15001)
Last Msg From Server: 127 seconds (CLUSTER_ADDRS)
Last Msg To Server: 25 seconds
HomeDirectory: /var/spool/torque/mom_priv
stdout/stderr spool directory: '/var/spool/torque/spool/' (477820591 blocks available)
NOTE: syslog enabled
MOM active: 131 seconds
Check Poll Time: 45 seconds
Server Update Interval: 45 seconds
LogLevel: 6 (use SIGUSR1/SIGUSR2 to adjust)
Communication Model: TCP
MemLocked: TRUE (mlock)
TCP Timeout: 300 seconds
Prolog: /var/spool/torque/mom_priv/prologue (enabled)
Epilog: /var/spool/torque/mom_priv/epilogue (enabled)
Prolog/Epilog Alarm Time: 300 seconds
Alarm Time: 0 of 10 seconds
Trusted Client List: 127.0.0.1:0,192.168.21.101:0,192.168.21.101:15003,192.168.21.102:15003
Copy Command: /usr/bin/scp -rpB
NOTE: no local jobs detected
diagnostics complete