I am using MPICH2 1.5 with Hydra. Now that I think of it, the behaviour is as I expect: the machinefile maps ranks to nodes in order. I have also compiled the code with OpenMPI 1.6.5, but I don't remember the behaviour there, and our OpenMPI is not integrated with Torque.
Thanks for the suggestion, I will try to use PBS_NODEFILE to generate a machinefile on the fly.
Tiago
[hyde@deepgreen PP]$ which mpiexec
/apps/mpich2/1.5/ifort/bin/mpiexec
[hyde@deepgreen PP]$ mpirun -info
HYDRA build details:
Version: 1.5
Release Date: Mon Oct 8 14:00:48 CDT 2012
(...)
-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Gus Correa
Sent: 20 February 2014 19:12
To: Torque Users Mailing List
Subject: Re: [torqueusers] qsub and mpiexec -f machinefile
Hi Tiago
Which MPI and which mpiexec are you using?
I am not familiar with all of them, but the behavior depends primarily on which one you are using.
Most likely, by default you will get the sequential rank-to-node mapping that you mentioned.
Have you tried it?
What result did you get?
You can insert a call to the MPI function MPI_Get_processor_name early in your code, say right after MPI_Init, MPI_Comm_size, and MPI_Comm_rank, and then print out the rank and processor-name pairs (the processor names will probably be your nodes' names).
https://www.open-mpi.org/doc/v1.4/man3/MPI_Get_processor_name.3.php
http://www.mcs.anl.gov/research/projects/mpi/www/www3/MPI_Get_processor_name.html
With OpenMPI there are easier ways (through mpiexec) to report this information.
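A quick way to check the resulting mapping, without touching the source, is to launch something trivial such as hostname through the same mpiexec and machinefile as the real run. This is only a sketch: the option names (Hydra's -prepend-rank, OpenMPI's --tag-output) should be verified against "mpiexec -help" on your installation.
# MPICH2/Hydra: -prepend-rank labels each output line with its MPI rank
mpiexec -f machinefile -np 6 -prepend-rank hostname
# OpenMPI: --tag-output does the same job
mpiexec -hostfile machinefile -np 6 --tag-output hostname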
However, there are ways to change the sequential rank-to-node mapping, if this is your goal, again, depending on which mpiexec you are using.
Anyway, this is more of an MPI question than a Torque question.
I hope this helps,
Gus Correa
Post by Tiago Silva (Cefas)
Thanks, this seems promising. Before I try building with OpenMPI: if I
parse PBS_NODEFILE to produce my own machinefile for mpiexec, for example
n100
n100
n101
n101
n101
n101
won't mpiexec start the MPI processes with ranks 0-1 on n100 and with ranks
2-5 on n101? That is what I think it does when I don't use qsub.
Tiago
-----Original Message-----
From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Gus Correa
Sent: 19 February
Subject: [torqueusers] qsub and mpiexec -f machinefile
Hi Tiago
The Torque/PBS node file is available to your job script through the environment variable $PBS_NODEFILE.
This file has one line listing the node name for each processor/core that you requested.
Just do a "cat $PBS_NODEFILE" inside your job script to see how it looks.
Inside your job script, and before the mpiexec command, you can run a brief auxiliary script to create the machinefile you need from the $PBS_NODEFILE.
You will need to create this auxiliary script, tailored to your application.
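For instance, a minimal sketch of such a step inside the job script could look like the lines below. The resource request and the use of sort are only placeholders; the real reordering logic would encode whatever balancing scheme your application needs.
#PBS -l nodes=1:ppn=2+1:ppn=4
cd $PBS_O_WORKDIR
# Torque writes one line per requested core into $PBS_NODEFILE;
# 'sort' here only stands in for your own reordering/balancing logic
sort $PBS_NODEFILE > machinefile
NP=$(wc -l < machinefile)
mpiexec -f machinefile -np $NP ./bin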
Still, this method won't bind the MPI processes to the appropriate hardware components (cores, sockets, etc.), in case this is also part of your goal.
Having said that, if you are using OpenMPI, it can be built with Torque support (with the --with-tm=/torque/location configuration option).
This would give you a range of options on how to assign different cores, sockets, etc., to different MPI ranks/processes, directly in the mpiexec command or in the OpenMPI runtime configuration files.
This method wouldn't require creating the machinefile from the PBS_NODEFILE.
This second approach has the advantage of allowing you to bind the
processes to cores, sockets, etc.
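For example, with a tm-enabled OpenMPI the launch inside the job script could be reduced to something like the line below. Again, only a sketch: the binding flags are the OpenMPI 1.6-era names and should be checked against "mpiexec --help" on your build.
# no machinefile needed: an OpenMPI built with --with-tm reads the
# Torque allocation directly; --report-bindings shows where ranks land
mpiexec -np 6 --bind-to-core --report-bindings ./bin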
I hope this helps,
Gus Correa
Post by Tiago Silva (Cefas)
Hi,
My MPI code is normally executed across a set of nodes with
mpiexec -f machinefile -np 6 ./bin
where the machinefile contains
n01
n01
n02
n02
n02
n02
Now the issue here is that this list has been optimised to balance the load between nodes and to reduce internode communication. So for instance model domain tiles 0 and 1 will run on n01 while tiles 2 to 5 will run on n02.
Is there a way to integrate this into qsub, since I don't know which nodes will be assigned before submission? Or, in other words, can I control the grouping of processes on one node?
In my example I used 6 processes for simplicity, but normally I parallelise across 4-16 nodes and >100 processes.
Thanks,
tiago