Xiaodong Qi
2016-12-27 08:02:39 UTC
Dear Torque users and developers,
I was installing Torque 6.1.0 on a Ubuntu 16.04 Workstation, but the
installation doesn't seem to recognize how much cores and threads the
machine has. The only node I set up showed a status of "state=down" and
any job would trigger an error saying "not enough of the right type
of nodes". In fact, the workstation has 56 threads or 28 physical cores
on 2 processors, and I only want to use 54 threads or 27 physical cores
for the shared computing jobs. I realized that this might be related to the configuration of NUMA starting from Torque V6.0 which I am not if I was doing the right thing. Below are some outputs of current configs. What should I do? Thanks.
$ pbsnodes
node1
state = down
power_state = Running
np = 54
ntype = cluster
mom_service_port = 15002
mom_manager_port = 15003
total_sockets = 0
total_numa_nodes = 0
total_cores = 0
total_threads = 0
dedicated_sockets = 0
dedicated_numa_nodes = 0
dedicated_cores = 0
dedicated_threads = 0
$ lssubsys -am
cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
blkio /sys/fs/cgroup/blkio
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb
pids /sys/fs/cgroup/pids
PS: I have tried to mount cpu and other modules to /var/spool/torque/cgroup directories, but "lssubsys -am" still showed the same information as above. I assume they should have been mounted?
I was installing Torque 6.1.0 on a Ubuntu 16.04 Workstation, but the
installation doesn't seem to recognize how much cores and threads the
machine has. The only node I set up showed a status of "state=down" and
any job would trigger an error saying "not enough of the right type
of nodes". In fact, the workstation has 56 threads or 28 physical cores
on 2 processors, and I only want to use 54 threads or 27 physical cores
for the shared computing jobs. I realized that this might be related to the configuration of NUMA starting from Torque V6.0 which I am not if I was doing the right thing. Below are some outputs of current configs. What should I do? Thanks.
$ pbsnodes
node1
state = down
power_state = Running
np = 54
ntype = cluster
mom_service_port = 15002
mom_manager_port = 15003
total_sockets = 0
total_numa_nodes = 0
total_cores = 0
total_threads = 0
dedicated_sockets = 0
dedicated_numa_nodes = 0
dedicated_cores = 0
dedicated_threads = 0
$ lssubsys -am
cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
blkio /sys/fs/cgroup/blkio
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb
pids /sys/fs/cgroup/pids
PS: I have tried to mount cpu and other modules to /var/spool/torque/cgroup directories, but "lssubsys -am" still showed the same information as above. I assume they should have been mounted?