Discussion:
[torqueusers] how to set mom.layout and server_priv/nodes to use MCDRAM of KNL
노승우
2017-03-09 02:41:04 UTC
Dear Torque users,

I'm having trouble using MCDRAM memory on a KNL node under Torque.

My server_priv/nodes is:

knl02.kisti.re.kr np=272 num_node_boards=1

and my mom.layout is:

nodes=0,1 cpus=0-271 mems=0,1

The pbs_mom log is:

03/09/2017 09:41:12.782;02; pbs_mom.272986;Svr;pbs_mom;Torque Mom Version = 6.1.0, loglevel = 0
03/09/2017 09:41:12.813;128; pbs_mom.272986;n/a;add_static;config[1] add name knl02 value .kisti.re.kr headnode
03/09/2017 09:41:12.814;02; pbs_mom.272986;Svr;setpbsserver;knl02.kisti.re.kr
03/09/2017 09:41:12.814;02; pbs_mom.272986;Svr;mom_server_add;server knl02.kisti.re.kr added
03/09/2017 09:41:13.506;02; pbs_mom.272986;Svr;initialize_hwloc_topology;machine topology contains 1 sockets 2 memory nodes, 68 cores 272 cpus
03/09/2017 09:41:13.528;02; pbs_mom.272986;node;read_layout_file;nodeboard 0: 0 NUMA nodes: 0-1
03/09/2017 09:41:13.528;02; pbs_mom.272986;node;read_layout_file;Setting up this mom to function as 1 numa nodes
03/09/2017 09:41:13.528;02; pbs_mom.272986;node;setup_nodeboards;nodeboard 0: 272 cpus (0-271), 1 mems (0)
03/09/2017 09:41:13.529;02; pbs_mom.272986;Svr;init_torque_cpuset;Init cpuset /dev/cpuset/torque
03/09/2017 09:41:13.529;02; pbs_mom.272986;Svr;init_torque_cpuset;setting cpus = 0-271
03/09/2017 09:41:13.529;02; pbs_mom.272986;Svr;init_torque_cpuset;setting mems = 0
03/09/2017 09:41:13.549;02; pbs_mom.272986;n/a;initialize;independent
03/09/2017 09:41:13.549;02; pbs_mom.272986;Svr;dep_initialize;mom is now oom-killer safe
03/09/2017 09:41:13.553;02; pbs_mom.272986;Svr;read_mom_hierarchy;No local mom hierarchy file found, will request from server.
03/09/2017 09:41:13.553;128; pbs_mom.272986;Svr;pbs_mom;before init_abort_jobs
03/09/2017 09:41:13.561;02; pbs_mom.272986;Svr;pbs_mom;Is up
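
One thing that stands out in this log: hwloc reports 2 memory nodes, but setup_nodeboards and init_torque_cpuset end up with mems = 0 only. A minimal sketch for pulling just those lines out of a pbs_mom log follows. The function name mom_mem_lines is hypothetical, and the field positions are assumed from the excerpt here rather than from any documented log format.

```shell
#!/bin/sh
# Print only the pbs_mom log lines that describe memory-node setup.
# Reads a log on stdin. Assumption: fields are semicolon-separated, with the
# reporting function in field 5 and the message in field 6, as in the excerpt.
mom_mem_lines() {
    awk -F';' '$5 == "initialize_hwloc_topology" ||
               $5 == "read_layout_file" ||
               $5 == "setup_nodeboards" ||
               ($5 == "init_torque_cpuset" && $6 ~ /mems/) { print $5 ": " $6 }'
}

# Usage (the log path is an assumption; adjust it to your installation):
#   mom_mem_lines < /var/spool/torque/mom_logs/20170309
```

Run against the excerpt, this keeps the hwloc topology line, the two read_layout_file lines, the setup_nodeboards line, and the "setting mems = 0" line, which makes it easier to compare what was detected with what was configured.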

pbsnodes reports:

[***@knl02 torque]$ pbsnodes
knl02.kisti.re.kr-0
     state = free
     power_state = Running
     np = 272
     ntype = cluster
     status = opsys=linux,uname=Linux knl02.kisti.re.kr 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64,nsessions=0,nusers=0,idletime=68878,totmem=100564804kb,availmem=94743368kb,physmem=100564804kb,ncpus=272,loadave=0.00,gres=knl02:.kisti.re.kr headnode,netload=,state=free,varattr= ,cpuclock=Fixed,macaddr=00:1e:67:f9:ae:c3,version=6.1.0,rectime=1489020718,jobs=
     mom_service_port = 15002
     mom_manager_port = 15003
     total_sockets = 1
     total_numa_nodes = 1
     total_cores = 68
     total_threads = 272
     dedicated_sockets = 0
     dedicated_numa_nodes = 0
     dedicated_cores = 0
     dedicated_threads = 0

My test PBS script is:

#!/bin/sh
#
# This is an example script example.sh
#
# These commands set up the Grid Environment for your job:
#PBS -N ExampleJob
#PBS -l nodes=1,walltime=00:02:00
#PBS -q np_workq
#PBS -M ***@umich.edu
#PBS -m abe

# print the time and date
date

# wait 5 seconds, preferring memory allocations from NUMA node 1
/usr/bin/numactl -p 1 sleep 5

# print the time and date again
date

When I submit the script, the following error occurs:

libnuma: Warning: node argument 1 is out of range
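
This warning would be consistent with the job running inside a cpuset whose allowed memory nodes do not include node 1 (the pbs_mom log shows init_torque_cpuset setting mems = 0), though that is an inference and not a confirmed diagnosis. A minimal sketch for checking it from inside a job follows. The helper names mems_allowed and node_in_list are hypothetical; Mems_allowed_list is the field Linux reports in /proc/<pid>/status.

```shell
#!/bin/sh
# Report which NUMA memory nodes the current process may allocate from.
# The optional argument points at a saved copy of a /proc/<pid>/status file;
# by default the status of the reading process itself is used.
mems_allowed() {
    awk '$1 == "Mems_allowed_list:" { print $2 }' "${1:-/proc/self/status}"
}

# Succeed if node $2 appears in a kernel-style list such as "0-1" or "0,2-3".
node_in_list() {
    echo "$1" | tr ',' '\n' | awk -F'-' -v n="$2" '
        NF == 1 && $1 == n            { found = 1 }
        NF == 2 && n >= $1 && n <= $2 { found = 1 }
        END { exit !found }'
}

# Possible use in the job script, in place of the bare numactl call:
#   allowed=$(mems_allowed)
#   if node_in_list "$allowed" 1; then
#       /usr/bin/numactl -p 1 sleep 5
#   else
#       echo "NUMA node 1 is not in the allowed mems ($allowed)" >&2
#   fi
```

If the job prints an allowed list of just 0, the place to look would be the cpuset that pbs_mom creates rather than numactl itself.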

Any help or ideas would be very welcome; I'm not sure what is wrong.

Thank you,
Seungwoo
=====================================
Seungwoo Rho
National Institute of Supercomputing and Networking,
KISTI,
52-11, Eoeundong, Yuseonggu,
Daejeon, 305-806, Republic of Korea
e-mail : ***@kisti.re.kr
Phone : +82-42-869-1643
Mobile : +82-10-8849-4001
=====================================