Discussion:
[torqueusers] RM failure, rc: 15085, msg: 'End of File'
Clotho Tsang
2013-09-18 06:54:20 UTC
Jobs go to "R" (running) status and then change back to "Q" (queued).
The "checkjob" command shows:

job is deferred. Reason: RMFailure (cannot start job - RM failure, rc: 15085, msg: 'End of File')
Holds: Defer (hold reason: RMFailure)
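If many jobs are affected, it can help to pull the return code and message out of the checkjob text automatically. A minimal sketch, assuming only the message format shown above (the pattern and the helper name `parse_rm_failure` are hypothetical, not part of Maui or TORQUE). Whitespace is matched loosely because mail clients wrap the line, as the quoted copies in this thread show:

```python
import re
from typing import Optional, Tuple

# Matches the RM-failure text printed by checkjob in this thread.
# \s* tolerates the line wrapping seen when the output is pasted into mail.
RM_FAILURE = re.compile(r"RM failure,\s*rc:\s*(\d+),\s*msg:\s*'([^']*)'")


def parse_rm_failure(checkjob_output: str) -> Optional[Tuple[int, str]]:
    """Return (rc, msg) if the checkjob text reports an RM failure, else None."""
    match = RM_FAILURE.search(checkjob_output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


sample = """job is deferred. Reason: RMFailure (cannot start job - RM failure, rc:
15085, msg: 'End of File')
Holds: Defer (hold reason: RMFailure)"""

print(parse_rm_failure(sample))  # (15085, 'End of File')
```

The rc value alone does not name the cause; in this case it was a firewall on one node.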


Later we found that the cause was one of the nodes, which had its firewall turned on.
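A quick way to spot a node like this is to try a TCP connection to the MOM ports on every node. A minimal sketch, assuming the TORQUE default pbs_mom ports 15002 and 15003 (check your own configuration if they were changed); the function name `unreachable_ports` is hypothetical:

```python
import socket
from typing import Iterable, List, Tuple

# Assumption: TORQUE default pbs_mom ports; override if your site uses others.
DEFAULT_MOM_PORTS = (15002, 15003)


def unreachable_ports(hosts: Iterable[str],
                      ports: Iterable[int] = DEFAULT_MOM_PORTS,
                      timeout: float = 2.0) -> List[Tuple[str, int]]:
    """Return the (host, port) pairs where a TCP connect fails or times out."""
    ports = tuple(ports)
    failures = []
    for host in hosts:
        for port in ports:
            try:
                # Resolves the name and completes the TCP handshake, then closes.
                with socket.create_connection((host, port), timeout=timeout):
                    pass
            except OSError:
                # Covers refused connections, timeouts (firewall DROP) and DNS errors.
                failures.append((host, port))
    return failures
```

For example, `unreachable_ports(["node01", "node02"])` (hypothetical host names) lists the blocked pairs. This only tests reachability from the machine it runs on; since MOMs also talk to each other, it is worth running it from each compute node too.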
--
Clotho Tsang
Senior Software Engineer
Cluster Technology Limited
Email: clotho at clustertech.com
Tel: (852) 2655-6129
Fax: (852) 2994-2101
Website: www.clustertech.com
Ken Nielson
2013-09-18 17:11:41 UTC
What version of TORQUE and which scheduler are you using?
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers
--
Ken Nielson
+1 801.717.3700 office +1 801.717.3738 fax
1712 S. East Bay Blvd, Suite 300 Provo, UT 84606
www.adaptivecomputing.com
Clotho Tsang
2013-09-19 09:58:29 UTC
Torque 4.2.2
Maui 3.3.1
Ken Nielson
2013-09-19 14:59:21 UTC
Clotho,

TORQUE can put a job in the running state even though the job may still
fail when it links up with the other MOMs. That failure causes the job to
be re-queued. With the firewall enabled on one of the MOM nodes, this is
probably what happened.
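The sequence described above can be pictured with a toy model. This is a sketch under stated assumptions, not TORQUE or Maui source code: the `Job` class and `start_job` function are hypothetical, and the "Defer" hold stands in for what Maui's checkjob reported earlier in this thread:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Job:
    job_id: str
    state: str = "Q"                      # "Q" = queued, "R" = running
    holds: List[str] = field(default_factory=list)


def start_job(job: Job, sister_reachable: Dict[str, bool]) -> Job:
    """Toy model: mark the job running first, then try to reach the sister MOMs."""
    job.state = "R"                       # state is set before the join is confirmed
    unreachable = [node for node, ok in sister_reachable.items() if not ok]
    if unreachable:
        job.state = "Q"                   # join failed, so the job is re-queued
        job.holds.append("Defer")         # the scheduler (Maui here) then defers it
    return job


job = start_job(Job("123"), {"node01": True, "node02": False})
print(job.state, job.holds)  # Q ['Defer']
```

The point the model captures is ordering: a job can briefly show "R" and still end up back in "Q" if any sister node is unreachable.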