Discussion:
[torqueusers] Torque from EPEL
J Martin Rushton
2016-11-03 17:51:28 UTC
I've been trying to install Torque from the EPEL repository (used by
RHEL and derivatives). Installing MOM on a compute node worked, but I'm
having problems getting the server to start, even following the supplied
README.

The EPEL version of Torque has been built with PBS_HOME set to
/var/lib/torque rather than /var/spool/torque. In addition some of the
configuration files are links into /etc/torque. During the installation
of the MOM I got an error message concerning
/var/spool/torque/mom_priv/mom.lock, but simply touching
/var/lib/torque/mom_priv/mom.lock sorted it out.

The server has not been so easy though. Creating the server.lock in the
/var/lib tree worked, but now I'm hitting a problem trying to create the
initial database. I've tried exporting PBS_HOME and PBS_SERVER_HOME and
also explicitly specifying the path on the command, but to no avail:

# pbs_server -d /var/lib/torque -t create
PBS_Server: LOG_ERROR::No such file or directory (2) in chk_file_sec,
Security violation with "/var/spool/torque" - /var/spool/torque cannot
be lstat'd - errno=2, No such file or directory
PBS_Server: LOG_ERROR::PBS_Server, pbsd_init failed
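The error above comes from an lstat() of /var/spool/torque failing with errno 2 (ENOENT). As a minimal diagnostic sketch, assuming /var/spool/torque and /var/lib/torque are the only candidate PBS_HOME locations, the Python below performs the same lstat() on each path and reports whether it is missing, a directory, or a symlink. It does not attempt the ownership and permission checks that the "Security violation" wording suggests chk_file_sec also makes.

```python
import os
import stat

# Candidate PBS_HOME locations named in this thread (assumption: no others).
CANDIDATES = ["/var/spool/torque", "/var/lib/torque"]


def describe_pbs_home(paths=CANDIDATES):
    """Return {path: description}, using lstat() so symlinks are not followed."""
    report = {}
    for path in paths:
        try:
            st = os.lstat(path)
        except FileNotFoundError:
            # The same failure pbs_server logged: errno=2, No such file or directory.
            report[path] = "missing (lstat fails with ENOENT)"
            continue
        if stat.S_ISLNK(st.st_mode):
            report[path] = "symlink -> " + os.readlink(path)
        elif stat.S_ISDIR(st.st_mode):
            report[path] = "directory"
        else:
            report[path] = "not a directory"
    return report


if __name__ == "__main__":
    for path, desc in describe_pbs_home().items():
        print(f"{path}: {desc}")
```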

Has anyone else tried to use the EPEL repository approach and if so did
you succeed? If you did succeed, please pass on the missing information.

Thanks all,
Martin
stdweird
2016-11-03 18:06:25 UTC
hi martin,
Post by J Martin Rushton
I've been trying to install Torque from the EPEL repository (used by
RHEL and derivatives).
what version is in EPEL, and who is maintaining it? not to hijack your
original issue, but it would be really nice if someone from adaptive
would be the maintainer of the torque packages in EPEL and/or fedora.

Post by J Martin Rushton
Installing MOM on a compute node worked, but I'm
having problems getting the server to start, even following the supplied
README.
The EPEL version of Torque has been built with PBS_HOME set to
/var/lib/torque rather than /var/spool/torque.
why is this an issue? i remember long long ago issues with using a
PBS_HOME that wasn't the one specified during compilation, and also gave
up and recompiled with what i wanted (or was used to, we still use
/var/spool/pbs). however, i'm guilty of not opening an issue for that at
that time.

Post by J Martin Rushton
In addition some of the
configuration files are links into /etc/torque.
that actually makes more sense than placing config data in /var/spool or
/var/lib. maybe this is a similar configure option during compilation?

Post by J Martin Rushton
During the installation
of the MOM I got an error message concerning
/var/spool/torque/mom_priv/mom.lock, but simply touching
/var/lib/torque/mom_priv/mom.lock sorted it out.
that should be addressed by a postinstall script; but might be related
to not using the compiled PBS_HOME
Post by J Martin Rushton
The server has not been so easy though. Creating the server.lock in the
/var/lib tree worked, but now I'm hitting a problem trying to create the
initial database. I've tried exporting PBS_HOME and PBS_SERVER_HOME and
also explicitly specifying the path on the command, but to no avail:
# pbs_server -d /var/lib/torque -t create
PBS_Server: LOG_ERROR::No such file or directory (2) in chk_file_sec,
Security violation with "/var/spool/torque" - /var/spool/torque cannot
be lstat'd - errno=2, No such file or directory
PBS_Server: LOG_ERROR::PBS_Server, pbsd_init failed
check the selinux bits on the /var/lib/torque with ls -lZ and make sure
they are also set on /var/spool/torque (or just use /var/lib/torque).

a bit more hacky might be to symlink /var/lib/torque to /var/spool/torque
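The symlink workaround can be sketched in Python as follows, assuming /var/lib/torque is the real PBS_HOME and /var/spool/torque should become a link pointing at it. The function name is hypothetical. It refuses to replace an existing directory, since one left over from an older Torque install may hold configuration worth keeping, and on a real system it needs to run as root.

```python
import os


def link_legacy_pbs_home(real_home="/var/lib/torque", legacy_home="/var/spool/torque"):
    """Make legacy_home a symlink to real_home without clobbering anything.

    Returns True if a link was created, False if it already points at real_home.
    """
    if not os.path.isdir(real_home):
        raise FileNotFoundError(f"{real_home} does not exist")
    if os.path.islink(legacy_home):
        if os.path.realpath(legacy_home) == os.path.realpath(real_home):
            return False  # nothing to do, the link is already in place
        raise FileExistsError(f"{legacy_home} is a symlink to somewhere else")
    if os.path.lexists(legacy_home):
        # A real directory here may be an old (e.g. 2.x) PBS_HOME: leave it alone.
        raise FileExistsError(f"{legacy_home} exists; move it aside first")
    os.makedirs(os.path.dirname(legacy_home), exist_ok=True)
    os.symlink(real_home, legacy_home)
    return True


if __name__ == "__main__":
    print("created" if link_legacy_pbs_home() else "already linked")
```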
Post by J Martin Rushton
Has anyone else tried to use the EPEL repository approach and if so did
you succeed? If you did succeed, please pass on the missing information.
i suggest to open issues in github for each problem (and if you have
moab support, also report them to adaptive that way).

good luck!

stijn
Post by J Martin Rushton
Thanks all,
Martin
_______________________________________________
torqueusers mailing list
http://www.supercluster.org/mailman/listinfo/torqueusers
J Martin Rushton
2016-11-03 18:19:36 UTC
Inline
Post by stdweird
hi martin,
Post by J Martin Rushton
I've been trying to install Torque from the EPEL repository (used by
RHEL and derivatives).
what version is in EPEL, and who is maintaining it. not to hijack your
original issue, but it would be really nice if someone from adaptive
would be the maintainer of the torque packages in EPEL and/or fedora.
I'm working on 4.2.10-9 under CentOS 6.8 and therefore using EPEL-6.
For information the version in EPEL-5 is the same and that in EPEL-7 is
4.2.9-10
Post by stdweird
Post by J Martin Rushton
Installing MOM on a compute node worked, but I'm
having problems getting the server to start, even following the supplied
README.
The EPEL version of Torque has been built with PBS_HOME set to
/var/lib/torque rather than /var/spool/torque.
why is this an issue? i remember long long ago issues with using a
PBS_HOME that wasn't the one specified during compilation, and also gave
up and recompiled with what i wanted (or was used to, we still use
/var/spool/pbs). however, i'm guilty of not opening an issue for that at
that time.
Only an issue when it doesn't work!
Post by stdweird
Post by J Martin Rushton
In addition some of the
configuration files are links into /etc/torque.
that actually makes more sense than placing config data in /var/spool or
/var/lib. maybe this is a similar configure option during compilation?
Agreed it's nicer. I suspect that it's not implemented during
compilation but only during the build of the RPM. Hence the soft links.
Post by stdweird
Post by J Martin Rushton
During the installation
of the MOM I got an error message concerning
/var/spool/torque/mom_priv/mom.lock, but simply touching
/var/lib/torque/mom_priv/mom.lock sorted it out.
that should be addressed by a postinstall script; but might be related
to not using the compiled PBS_HOME
Also agreed, but it would be nice if the error message pointed to the
correct place, even if it were given as $PBS_HOME/mom_priv.
Post by stdweird
Post by J Martin Rushton
The server has not been so easy though. Creating the server.lock in the
/var/lib tree worked, but now I'm hitting a problem trying to create the
initial database. I've tried exporting PBS_HOME and PBS_SERVER_HOME and
also explicitly specifying the path on the command, but to no avail:
# pbs_server -d /var/lib/torque -t create
PBS_Server: LOG_ERROR::No such file or directory (2) in chk_file_sec,
Security violation with "/var/spool/torque" - /var/spool/torque cannot
be lstat'd - errno=2, No such file or directory
PBS_Server: LOG_ERROR::PBS_Server, pbsd_init failed
check the selinux bits on the /var/lib/torque with ls -lZ and make sure
they are also set on /var/spool/torque (or just use /var/lib/torque).
a bit more hacky might be to symlink /var/lib/torque to /var/spool/torque
SELinux is disabled - this is an isolated machine.
Post by stdweird
Post by J Martin Rushton
Has anyone else tried to use the EPEL repository approach and if so did
you succeed? If you did succeed, please pass on the missing information.
i suggest to open issues in github for each problem (and if you have
moab support, also report them to adaptive that way).
good luck!
stijn
Post by J Martin Rushton
Thanks all,
Martin
Tony Schreiner
2016-11-03 19:47:37 UTC
I use the EPEL version. It should not be referring to /var/spool/torque at
all. Have you modified any files to refer to such?

J Martin Rushton
2016-11-04 09:12:13 UTC
No, I removed a test version of it and deleted all the old (/var/spool,
V2) files. I then installed it, configured it and tried to start it.

Have you seen Jim Prewett's post?

Regards,
Martin
Post by Tony Schreiner
I use the EPEL version. It should not be referring to /var/spool/torque
at all. Have you modified any files to refer to such?
J Martin Rushton
2016-11-04 09:22:46 UTC
Thanks, Jim.

Fortunately this was a new installation: a change of server node and an
upgrade from Torque 2.something. I understand what you mean about EPEL;
I notice that Dokuwiki has been dropped from EPEL-7, which is of concern
to me. I prefer though to stay with the repositories if possible since
they seem to have the knack of avoiding problems when the kernel
changes. It's a bit of a pain when that doesn't happen. We run IBM's
GPFS and it needs a rebuild after every kernel update, however minor. I
suppose that is to be expected with filesystems, but I don't want it for
other packages, hence my predilection for EPEL.

Regards,
Martin
Hi Martin,
I stopped using the EPEL version of Torque quite a while ago because an
upgrade totally burned me (by switching the PBS_HOME to /var/lib/torque
and creating a new, empty configuration for Torque! :P ) . Personally,
I prefer building the RPMs from the sources and installing those. For
my shop, I've built an RPM repository that has only a few things like
Torque in it to simplify installations (with the added benefit that
*I* control when massive changes like PBS_HOME happen! :) ).
I feel somewhat betrayed by EPEL. Initially, I was excited to see all
of the additional packages, like Torque, that it provides.
Unfortunately, it feels to me like it is managed in a sort of wild west
kind of fashion (but, that's just my personal opinion). I also think
the Torque upgrade that burned me was done in a really "lazy" kind of
way: The RPM package manager provides all sorts of "good stuff" for
upgrading packages and I feel that upgrade could have been made
completely seamless (at worst, I would have been unhappy to find that
changes to configurations in /var/spool/torque didn't do anything and I
needed to instead modify configs in /var/lib/torque... ).
HTH,
Jim
Systems Team Leader LoGS: http://www.hpc.unm.edu/~download/LoGS/
Designated Security Officer OpenPGP key: pub 1024D/31816D93
HPC Systems Engineer III UNM HPC 505.277.8210
Michael Jennings
2016-11-04 19:49:04 UTC
On Fri, Nov 4, 2016 at 2:22 AM, J Martin Rushton
Post by J Martin Rushton
Fortunately this was a new installation, change of server node and
upgrade from Torque 2.something. I understand what you mean about EPEL,
I notice that Dokuwiki has been dropped from EPEL-7 which is of concern
to me. I prefer though to stay with the repositories if possible since
they seem to have the knack of avoiding problems when the Kernel
changes. It's a bit of a pain when that doesn't happen. We run IBM's
GPFS and it needs a rebuild after every kernel update, however minor. I
suppose that is to be expected with filesystems, but I don't want it for
other packages, hence my predilection for EPEL.
It is possible, with care, to make kernel modules that are tied to the
kernel ABI which the driver actually requires rather than the exact
version string of the kernel build. ELRepo does a great job with
this, and the end result is actually much more manageable even than
using DKMS, at least in my experience. (See
https://elrepo.org/tiki/FAQ)

But if IBM's GPFS driver isn't capable of using the kABI
compatibility, DKMS might be an alternative way to at least ensure
that GPFS gets rebuilt for you automatically.
Post by J Martin Rushton
I stopped using the EPEL version of Torque quite a while ago because an
upgrade totally burned me (by switching the PBS_HOME to /var/lib/torque
and creating a new, empty configuration for Torque! :P ) . Personally,
I prefer building the RPMs from the sources and installing those. For
my shop, I've built an RPM repository that has only a few things like
Torque in it to simplify installations (with the added benefit that
*I* control when massive changes like PBS_HOME happen! :) ).
The packagers who work with EPEL are required to strictly adhere to
certain (in some cases) draconian policies regarding FHS compliance,
even for packages whose conventions and standards predate even the FHS
itself.

The community spec file in Git, on the other hand, was explicitly
engineered to balance Adaptive supportability and compliance with the
long-established TORQUE standards against "proper" RPM/specfile
behavior.

And it will never, ever make huge sweeping architecture changes like
moving PBS_HOME without warning or without community discussion.
That's just flat-out wrong.
Post by J Martin Rushton
I feel somewhat betrayed by EPEL. Initially, I was excited to see all
of the additional packages, like Torque, that it provides.
Unfortunately, it feels to me like it is managed in a sort of wild west
kind of fashion (but, that's just my personal opinion). I also think
the Torque upgrade that burned me was done in a really "lazy" kind of
way: The RPM package manager provides all sorts of "good stuff" for
upgrading packages and I feel that upgrade could have been made
completely seamless (at worst, I would have been unhappy to find that
changes to configurations in /var/spool/torque didn't do anything and I
needed to instead modify configs in /var/lib/torque... ).
It is entirely possible to use the functionality in RPM to handle just
about any packaging transition as seamlessly as possible for users. In
fact, many of RPM's more esoteric/obscure features had to be added to
handle specific challenges presented by strange or unexpected
transitions in upstream software.

For example, I recently encountered the need to rename not only my
package, which is easy, but many of the configuration files that come
with it, some of which might have been modified by the user. And I
needed to make the transition as seamless as possible for the users
without overwriting or losing any of the users' changes. So I did.
And it works.

Similarly, it should've been possible to automatically migrate the
PBS_HOME contents to the new location as part of the packaging
directives had the maintainer wished to do so and put in the time and
effort to create and test the needed code. It's incredibly
unfortunate that wasn't done, but it certainly isn't for lack of
functionality in RPM.
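As an illustration of the non-destructive copy such a migration needs, here is a Python sketch. It is not RPM code and not what the EPEL or community spec files do; a real package would run equivalent logic in a scriptlet. The function name and the conflict policy (keep whatever already exists in the new PBS_HOME and list it for manual review) are assumptions.

```python
import os
import shutil


def migrate_tree(old_home, new_home):
    """Copy files from old_home into new_home without overwriting anything.

    Files already present in new_home are kept and reported as conflicts so an
    administrator can reconcile them by hand. Returns (copied, conflicts) as
    paths relative to the PBS_HOME root.
    """
    copied, conflicts = [], []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(old_home):
        rel = os.path.relpath(dirpath, old_home)
        target_dir = os.path.normpath(os.path.join(new_home, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            rel_name = os.path.normpath(os.path.join(rel, name))
            if os.path.lexists(dst):
                conflicts.append(rel_name)  # never clobber the new config
            else:
                # Preserve timestamps/permissions; copy symlinks as symlinks.
                shutil.copy2(src, dst, follow_symlinks=False)
                copied.append(rel_name)
    return copied, conflicts
```

Reporting conflicts instead of resolving them mirrors the complaint in this thread: the worst outcome is silently ending up with an empty or overwritten configuration.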

Michael
--
Michael Jennings (KainX) https://medium.com/@mej0/ <***@eterm.org>
Linux/HPC Systems Engineer Author, Eterm (www.eterm.org)
-----------------------------------------------------------------------
"The trouble with doing something right the first time is that nobody
appreciates how difficult it was." -- Walt West
J Martin Rushton
2017-03-24 10:15:33 UTC
(apologies if this is repeated, the original post seems to have been
lost when I shut down last night)

A few months ago I was hitting problems with Torque appearing to look
for hard-coded paths. I've identified what was causing that: an
incorrect ordering in root's PATH meant it was reading a mixture of 2.x
and 4.x binaries. :-o
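A quick way to catch this kind of PATH problem is to list every copy of each Torque command in PATH search order, much like bash's type -a. A Python sketch; the list of commands checked is illustrative only:

```python
import os


def find_all_on_path(command, path=None):
    """Return every executable named `command` on PATH, in search order.

    The first entry is the one the shell runs; later entries are shadowed
    copies, e.g. a leftover 2.x binary behind (or in front of) a 4.x one.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH element means the current directory.
        candidate = os.path.join(directory or ".", command)
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and candidate not in hits):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Illustrative selection of Torque commands an admin runs as root.
    for cmd in ["pbs_server", "pbs_mom", "qmgr", "qstat", "qsub"]:
        hits = find_all_on_path(cmd)
        if len(hits) > 1:
            print(f"{cmd}: runs {hits[0]}, shadows {', '.join(hits[1:])}")
```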

I'm now hitting a solid wall with 4.2.10 both from EPEL (yum install
pbs_server), and when built directly (download, configure, make, make
install). In either case pbs_server is reporting an undefined symbol:
job_log_mutex. The message appears both with "service pbs_server start"
and running the daemon directly.

"ldd -u pbs_server" actually reports 6 missing symbols: job_log_mutex,
msg_momjoboverwalltimelimit, log_mutex, pbs_server_name,
msg_momjobovercputlimit and msg_momjobovermemlimit.

"ldd -d pbs_server" reports two further errors in addition to the
missing symbols: "symbol 'svr_conn' has different size in shared object,
consider re-linking" and the same message for 'dis_emsg'.
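The sketch below wraps ldd -r (which performs both data and function relocations and reports missing objects and functions) and sorts its output into undefined symbols, size mismatches, and resolved libraries. The regular expressions are assumptions based on the messages quoted above; glibc prints the size warning as "Symbol `name' has different size...", so both quote styles are accepted. Check them against your own output. Printing where libtorque actually resolves from may help, because a size mismatch means the executable and the shared object disagree about a variable's size, which is what you would expect if a binary and a library from different builds were being mixed.

```python
import re
import shutil
import subprocess
import sys

# Patterns assumed from the messages quoted in this thread; verify them
# against your own ldd output before relying on the results.
UNDEFINED = re.compile(r"undefined symbol:\s*(\S+)")
SIZE_MISMATCH = re.compile(r"[Ss]ymbol [`']([^']+)' has different size in shared object")
RESOLVED = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def parse_ldd(text):
    """Split ldd output into (undefined symbols, size mismatches, {lib: path})."""
    undefined, mismatched, libs = [], [], {}
    for line in text.splitlines():
        if m := UNDEFINED.search(line):
            undefined.append(m.group(1))
        elif m := SIZE_MISMATCH.search(line):
            mismatched.append(m.group(1))
        elif m := RESOLVED.match(line):
            libs[m.group(1)] = m.group(2)
    return undefined, mismatched, libs


def check_binary(path):
    # Capture both streams: relocation warnings are not guaranteed to go to stdout.
    proc = subprocess.run(["ldd", "-r", path], capture_output=True, text=True)
    return parse_ldd("\n".join([proc.stdout, proc.stderr]))


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else shutil.which("pbs_server")
    if target is None:
        sys.exit("pbs_server not found on PATH; pass its full path instead")
    undefined, mismatched, libs = check_binary(target)
    print("undefined symbols:", ", ".join(undefined) or "none")
    print("size mismatches:", ", ".join(mismatched) or "none")
    for name, where in libs.items():
        if "torque" in name:
            print(f"{name} => {where}")
```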

I appreciate that 4.2 is getting quite old, but it is the one supplied
with EPEL-6 and that makes building images for diskless compute nodes
easier. Is this a known problem, and more importantly, is there a fix?

Regards,
Martin
