Discussion:
[torqueusers] torque 6.02 and maui 3.3.1 problem
Fedele Stabile
2016-10-22 07:51:59 UTC
If I compile Torque with NUMA support enabled, I run into trouble when I
submit an MPI job on multiple nodes:

it runs on only one node, creating all processes on the first node!

If I recompile Torque without NUMA support, MPI jobs are executed correctly.

Can anyone help me?
David Beer
2016-10-22 18:24:34 UTC
Fedele,

This part of our product is a little confusing: our original NUMA support
was made for large-scale NUMA machines, such as an Altix UV. I believe you
will be best served by using --enable-cgroups. The difference is
documented, but it can be hard to follow.

David

--
David Beer | Torque Architect
Adaptive Computing
Fedele Stabile
2016-10-23 08:01:24 UTC
Actually, I configured Torque with cpuset, NUMA, and cgroups all enabled,
and I had problems with MPI jobs.

Can you explain the difference between enable_numa and enable_cgroups?

I also notice that there are problems with Maui 3.3.1. Is it supported?



Fedele
David Beer
2016-10-24 15:46:38 UTC
enable-numa does not allow jobs to span hosts, and it only pins jobs to
memory by writing the memory nodes to the cpuset. This is only intended for
large-scale machines where you don't want to span hosts, i.e. Altix
machines.

enable-cgroups uses cgroups to pin memory, so it will write the memory node
as well as setting memory.max_usage_in_bytes (it can also set swap). It
doesn't require extra configuration. It should be used in all cases except
perhaps if you are using an Altix UV.
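
The point about jobs spanning hosts can be checked from inside a job. The following is a minimal sketch, not taken from the thread, that counts how many slots each host received in the node file Torque exports as PBS_NODEFILE. The helper names slots_per_host and spans_multiple_hosts are hypothetical, and the check only reports what Torque allocated; it does not prove where the MPI launcher actually started its processes.

```python
import os
from collections import Counter


def slots_per_host(nodefile_text):
    """Count slots per host in a PBS_NODEFILE-style listing (one host per line)."""
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


def spans_multiple_hosts(nodefile_text):
    """Return True if the listing names more than one distinct host."""
    return len(slots_per_host(nodefile_text)) > 1


if __name__ == "__main__":
    # PBS_NODEFILE is set by Torque inside a running job.
    path = os.environ.get("PBS_NODEFILE")
    if path is None:
        print("PBS_NODEFILE is not set; run this inside a Torque job")
    else:
        with open(path) as fh:
            text = fh.read()
        for host, slots in sorted(slots_per_host(text).items()):
            print(f"{host}: {slots} slot(s)")
        print("spans multiple hosts:", spans_multiple_hosts(text))
```

Run from a multi-node job script, a listing with a single host name would be consistent with the behaviour reported at the start of this thread.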

Fedele Stabile
2016-10-24 16:17:04 UTC
Thank you for the answer,
but I want to know whether there are incompatibilities between Torque 6.x
and Maui 3.3.1.
Can anyone help me?
Fedele
David Beer
2016-10-24 20:04:45 UTC
I'm not really a Maui expert, but I think that if you are using the cgroup
features, Maui will not correctly interpret Torque's report of the sockets,
NUMA nodes, cores, and threads. I don't know the extent to which this
affects Maui, or if there's a workaround. Perhaps someone with more
knowledge of Maui can chime in.
