노승우
2017-03-09 02:41:04 UTC
Dear Torque users,
I'm having trouble using MCDRAM memory on a KNL node.
My server_priv/nodes file is
knl02.kisti.re.kr np=272 num_node_boards=1
and mom.layout is
nodes=0,1 cpus=0-271 mems=0,1
and pbs_mom log is
03/09/2017 09:41:12.782;02; pbs_mom.272986;Svr;pbs_mom;Torque Mom Version = 6.1.0, loglevel = 0
03/09/2017 09:41:12.813;128; pbs_mom.272986;n/a;add_static;config[1] add name knl02 value .kisti.re.kr headnode
03/09/2017 09:41:12.814;02; pbs_mom.272986;Svr;setpbsserver;knl02.kisti.re.kr
03/09/2017 09:41:12.814;02; pbs_mom.272986;Svr;mom_server_add;server knl02.kisti.re.kr added
03/09/2017 09:41:13.506;02; pbs_mom.272986;Svr;initialize_hwloc_topology;machine topology contains 1 sockets 2 memory nodes, 68 cores 272 cpus
03/09/2017 09:41:13.528;02; pbs_mom.272986;node;read_layout_file;nodeboard 0: 0 NUMA nodes: 0-1
03/09/2017 09:41:13.528;02; pbs_mom.272986;node;read_layout_file;Setting up this mom to function as 1 numa nodes
03/09/2017 09:41:13.528;02; pbs_mom.272986;node;setup_nodeboards;nodeboard 0: 272 cpus (0-271), 1 mems (0)
03/09/2017 09:41:13.529;02; pbs_mom.272986;Svr;init_torque_cpuset;Init cpuset /dev/cpuset/torque
03/09/2017 09:41:13.529;02; pbs_mom.272986;Svr;init_torque_cpuset;setting cpus = 0-271
03/09/2017 09:41:13.529;02; pbs_mom.272986;Svr;init_torque_cpuset;setting mems = 0
03/09/2017 09:41:13.549;02; pbs_mom.272986;n/a;initialize;independent
03/09/2017 09:41:13.549;02; pbs_mom.272986;Svr;dep_initialize;mom is now oom-killer safe
03/09/2017 09:41:13.553;02; pbs_mom.272986;Svr;read_mom_hierarchy;No local mom hierarchy file found, will request from server.
03/09/2017 09:41:13.553;128; pbs_mom.272986;Svr;pbs_mom;before init_abort_jobs
03/09/2017 09:41:13.561;02; pbs_mom.272986;Svr;pbs_mom;Is up
and the pbsnodes output is
[***@knl02 torque]$ pbsnodes
knl02.kisti.re.kr-0
state = free
power_state = Running
np = 272
ntype = cluster
status = opsys=linux,uname=Linux knl02.kisti.re.kr 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC 2016 x86_64,nsessions=0,nusers=0,idletime=68878,totmem=100564804kb,availmem=94743368kb,physmem=100564804kb,ncpus=272,loadave=0.00,gres=knl02:.kisti.re.kr headnode,netload=,state=free,varattr= ,cpuclock=Fixed,macaddr=00:1e:67:f9:ae:c3,version=6.1.0,rectime=1489020718,jobs=
mom_service_port = 15002
mom_manager_port = 15003
total_sockets = 1
total_numa_nodes = 1
total_cores = 68
total_threads = 272
dedicated_sockets = 0
dedicated_numa_nodes = 0
dedicated_cores = 0
dedicated_threads = 0
and my test pbs script is
#!/bin/sh
#
#This is an example script example.sh
#
#These commands set up the Grid Environment for your job:
#PBS -N ExampleJob
#PBS -l nodes=1,walltime=00:02:00
#PBS -q np_workq
#PBS -M ***@umich.edu
#PBS -m abe
#print the time and date
date
#wait 5 seconds
/usr/bin/numactl -p 1 sleep 5
#print the time and date again
date
When I submit the script, the following error occurs:
libnuma: Warning: node argument 1 is out of range
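One thing that may be worth checking from inside the job: the pbs_mom log above reports "setting mems = 0" for /dev/cpuset/torque, so memory node 1 may simply not be allowed in the job's cpuset. The following is only a minimal sketch for verifying that, assuming a Linux kernel that exposes Mems_allowed_list in /proc/self/status; the helper name node_in_list is made up for illustration and is not part of Torque or numactl.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque or numactl):
# return 0 if node $2 appears in a cpuset-style list $1, e.g. "0", "0-1", "0,2-3".
node_in_list() {
    list=$1
    node=$2
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        IFS=$old_ifs
        lo=${range%-*}   # text before the dash (or the whole item if no dash)
        hi=${range#*-}   # text after the dash (or the whole item if no dash)
        if [ "$node" -ge "$lo" ] && [ "$node" -le "$hi" ]; then
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Memory nodes the current process is allowed to allocate from.
allowed=$(awk '/^Mems_allowed_list:/ {print $2}' /proc/self/status 2>/dev/null)
echo "allowed memory nodes: $allowed"

if node_in_list "$allowed" 1; then
    echo "node 1 is allowed for this process"
else
    echo "node 1 is NOT allowed for this process"
fi
```

Running this inside the job script, just before the numactl line, would show whether node 1 is visible to the job at all; running numactl --hardware on the node outside of Torque would show whether the kernel itself exposes two memory nodes.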
Any help or ideas would be very welcome; I'm not sure what is wrong.
Thank you,
seungwoo
=====================================
Seungwoo Rho
National Institute of Supercomputing and Networking,
KISTI,
52-11, Eoeundong, Yuseonggu,
Daejeon, 305-806, Republic of Korea
e-mail : ***@kisti.re.kr
Phone : +82-42-869-1643
Mobile : +82-10-8849-4001
=====================================