Discussion:
[torqueusers] Jobs asking for half the cores in a node won't start.
Mike Diehn
2016-10-13 14:39:42 UTC
We've discovered an odd problem here.

We have a cluster of 32 M620 nodes, 16 cores per node. It's managed by
Bright 6, Torque 6.0.0 with Maui 3.3.1, and the distro is CentOS 6.4.

Normally, our users submit jobs that use all the cores on a node. But
occasionally, someone will want to run a job that uses only half the cores on
a set of nodes, or some similar partial-node layout.

The normal full-node jobs, with nodes=32:ppn=16, run as expected and so
does nodes=12:ppn=8. When enough nodes are available, the job starts, give
or take scheduling.

When we submit a job with, for example, nodes=32:ppn=8, nodes=24:ppn=6, or
nodes=24:ppn=7, that job will sit in the queue forever.
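
For reference, here is a minimal sketch of the kinds of requests involved
(the script name and walltime are just placeholders, not our real jobs):

# full-node request: starts as expected
qsub -l nodes=32:ppn=16,walltime=01:00:00 test-job.sh

# partial-node requests: these sit in the queue indefinitely
qsub -l nodes=32:ppn=8,walltime=01:00:00 test-job.sh
qsub -l nodes=24:ppn=7,walltime=01:00:00 test-job.sh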

The checkjob output always says

Holds: Batch (hold reason: NoResources)


But at the time of submission, there was nothing else in the queue and no
reservations had been scheduled that I'd expect to block it.
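
A quick way to double-check that, assuming a stock Maui install, is to list
the reservations Maui currently holds:

showres       # all current reservations, including standing reservations
showres -n    # the same information broken out per node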

I imagine the problem really is that I've configured the cluster wrong, but
I can't see how.

If anyone can help, I'll be grateful.

Here's the output from schedctl -l, and below that is the output from qmgr -c 'print server'.

# Maui version 3.3.1 (PID: 19340)
# global policies

REJECTNEGPRIOJOBS[0] FALSE
ENABLENEGJOBPRIORITY[0] FALSE
ENABLEMULTINODEJOBS[0] TRUE
ENABLEMULTIREQJOBS[0] FALSE
BFPRIORITYPOLICY[0] [NONE]
JOBPRIOACCRUALPOLICY QUEUEPOLICY
NODELOADPOLICY ADJUSTSTATE
USEMACHINESPEEDFORFS FALSE
USEMACHINESPEED FALSE
USESYSTEMQUEUETIME TRUE
USELOCALMACHINEPRIORITY FALSE
NODEUNTRACKEDLOADFACTOR 1.2
JOBNODEMATCHPOLICY[0]

JOBMAXSTARTTIME[0] INFINITY

METAMAXTASKS[0] 0
NODESETPOLICY[0] ONEOF
NODESETATTRIBUTE[0] FEATURE
NODESETLIST[0] chassis01 chassis02
NODESETDELAY[0] 00:00:00
NODESETPRIORITYTYPE[0] MINLOSS
NODESETTOLERANCE[0] 0.00

BACKFILLPOLICY[0] FIRSTFIT
BACKFILLDEPTH[0] 10
BACKFILLPROCFACTOR[0] 0
BACKFILLMAXSCHEDULES[0] 10000
BACKFILLMETRIC[0] PROCS

BFCHUNKDURATION[0] 00:00:00
BFCHUNKSIZE[0] 0
PREEMPTPOLICY[0] REQUEUE
MINADMINSTIME[0] 00:00:00
RESOURCELIMITPOLICY[0]
NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT]
NODEALLOCATIONPOLICY[0] MINRESOURCE
TASKDISTRIBUTIONPOLICY[0] DEFAULT
RESERVATIONPOLICY[0] CURRENTHIGHEST
RESERVATIONRETRYTIME[0] 00:00:00
RESERVATIONTHRESHOLDTYPE[0] NONE
RESERVATIONTHRESHOLDVALUE[0] 0

FSPOLICY [NONE]
FSPOLICY [NONE]
FSINTERVAL 12:00:00
FSDEPTH 8
FSDECAY 1.00



# Priority Weights

SERVICEWEIGHT[0] 1
TARGETWEIGHT[0] 1
CREDWEIGHT[0] 1
ATTRWEIGHT[0] 1
FSWEIGHT[0] 1
RESWEIGHT[0] 1
USAGEWEIGHT[0] 1
QUEUETIMEWEIGHT[0] 1
XFACTORWEIGHT[0] 0
SPVIOLATIONWEIGHT[0] 0
BYPASSWEIGHT[0] 0
TARGETQUEUETIMEWEIGHT[0] 0
TARGETXFACTORWEIGHT[0] 0
USERWEIGHT[0] 0
GROUPWEIGHT[0] 0
ACCOUNTWEIGHT[0] 0
QOSWEIGHT[0] 0
CLASSWEIGHT[0] 0
FSUSERWEIGHT[0] 0
FSGROUPWEIGHT[0] 0
FSACCOUNTWEIGHT[0] 0
FSQOSWEIGHT[0] 0
FSCLASSWEIGHT[0] 0
ATTRATTRWEIGHT[0] 0
ATTRSTATEWEIGHT[0] 0
NODEWEIGHT[0] 0
PROCWEIGHT[0] 0
MEMWEIGHT[0] 0
SWAPWEIGHT[0] 0
DISKWEIGHT[0] 0
PSWEIGHT[0] 0
PEWEIGHT[0] 0
WALLTIMEWEIGHT[0] 0
UPROCWEIGHT[0] 0
UJOBWEIGHT[0] 0
CONSUMEDWEIGHT[0] 0
USAGEEXECUTIONTIMEWEIGHT[0] 0
REMAININGWEIGHT[0] 0
PERCENTWEIGHT[0] 0
XFMINWCLIMIT[0] 00:02:00


# partition DEFAULT policies

REJECTNEGPRIOJOBS[1] FALSE
ENABLENEGJOBPRIORITY[1] FALSE
ENABLEMULTINODEJOBS[1] TRUE
ENABLEMULTIREQJOBS[1] FALSE
BFPRIORITYPOLICY[1] [NONE]
JOBPRIOACCRUALPOLICY QUEUEPOLICY
NODELOADPOLICY ADJUSTSTATE
JOBNODEMATCHPOLICY[1]

JOBMAXSTARTTIME[1] INFINITY

METAMAXTASKS[1] 0
NODESETPOLICY[1] [NONE]
NODESETATTRIBUTE[1] [NONE]
NODESETLIST[1]
NODESETDELAY[1] 00:00:00
NODESETPRIORITYTYPE[1] MINLOSS
NODESETTOLERANCE[1] 0.00

# Priority Weights

XFMINWCLIMIT[1] 00:00:00

SRTASKCOUNT[0] 32
SRTPN[0] 0
SRRESOURCES[0] PROCS=-1;MEM=0;DISK=0;SWAP=0
SRDEPTH[0] 7
SRSTARTTIME[0] 00:00:00
SRENDTIME[0] 8:00:00
SRWSTARTTIME[0] 00:00:00
SRWENDTIME[0] 00:00:00
SRDAYS[0] Mon|Tue|Wed|Thu|Fri
SRHOSTLIST[0]
SRCHARGEACCOUNT[0]
SRCFG[perftest] CLASSLIST=perftest TASKCOUNT=32

RMAUTHTYPE[0] CHECKSUM

CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE]
CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE]
CLASSCFG[batch] DEFAULT.FEATURES=[NONE]
CLASSCFG[perftest] DEFAULT.FEATURES=[NONE]
QOSPRIORITY[0] 0
QOSQTWEIGHT[0] 0
QOSXFWEIGHT[0] 0
QOSTARGETXF[0] 0.00
QOSTARGETQT[0] 00:00:00
QOSFLAGS[0]
QOSPRIORITY[1] 0
QOSQTWEIGHT[1] 0
QOSXFWEIGHT[1] 0
QOSTARGETXF[1] 0.00
QOSTARGETQT[1] 00:00:00
QOSFLAGS[1]
# SERVER MODULES: MX
SERVERMODE NORMAL
SERVERNAME
SERVERHOST master.cm.cluster
SERVERPORT 42559
LOGFILE maui.log
LOGFILEMAXSIZE 250000000
LOGFILEROLLDEPTH 100
LOGLEVEL 2
LOGFACILITY fALL
SERVERHOMEDIR /cm/shared/apps/maui/3.3.1/spool/
TOOLSDIR /cm/shared/apps/maui/3.3.1/spool/tools/
LOGDIR /cm/shared/apps/maui/3.3.1/spool/log/
STATDIR /cm/shared/apps/maui/3.3.1/spool/stats/
LOCKFILE /cm/shared/apps/maui/3.3.1/spool/maui.pid
SERVERCONFIGFILE /cm/shared/apps/maui/3.3.1/spool/maui.cfg
CHECKPOINTFILE /cm/shared/apps/maui/3.3.1/spool/maui.ck
CHECKPOINTINTERVAL 00:05:00
CHECKPOINTEXPIRATIONTIME 3:11:20:00
TRAPJOB
TRAPNODE
TRAPFUNCTION
RESDEPTH 24

RMPOLLINTERVAL 00:00:30
JOBAGGREGATIONTIME 00:00:10
NODEACCESSPOLICY SHARED
ALLOCLOCALITYPOLICY [NONE]
SIMTIMEPOLICY [NONE]
ADMIN1 root mdiehn
ADMINHOSTS ALL
NODEPOLLFREQUENCY 0
DISPLAYFLAGS
DEFAULTDOMAIN
DEFAULTCLASSLIST [DEFAULT:1]
FEATURENODETYPEHEADER
FEATUREPROCSPEEDHEADER
FEATUREPARTITIONHEADER
DEFERTIME 00:05:00
DEFERCOUNT 24
DEFERSTARTCOUNT 1
JOBPURGETIME 0
NODEPURGETIME 2140000000
APIFAILURETHRESHHOLD 6
NODESYNCTIME 600
JOBSYNCTIME 600
JOBMAXOVERRUN 00:10:00
NODEMAXLOAD 0.0

PLOTMINTIME 120
PLOTMAXTIME 245760
PLOTTIMESCALE 11
PLOTMINPROC 1
PLOTMAXPROC 512
PLOTPROCSCALE 9
SCHEDCFG[] MODE=NORMAL SERVER=master.cm.cluster:42559
# RM MODULES: PBS SSS WIKI NATIVE
RMCFG[master.cm.cluster] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS
SIMWORKLOADTRACEFILE workload
SIMRESOURCETRACEFILE resource
SIMAUTOSHUTDOWN OFF
SIMSTARTTIME 0
SIMSCALEJOBRUNTIME FALSE
SIMFLAGS
SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH
SIMINITIALQUEUEDEPTH 16
SIMWCACCURACY 0.00
SIMWCACCURACYCHANGE 0.00
SIMNODECOUNT 0
SIMNODECONFIGURATION NORMAL
SIMWCSCALINGPERCENT 100
SIMCOMRATE 0.10
SIMCOMTYPE ROUNDROBIN
COMINTRAFRAMECOST 0.30
COMINTERFRAMECOST 0.30
SIMSTOPITERATION -1
SIMEXITITERATION -1


Output from qmgr -c 'print server'

#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch Priority = 10
set queue batch acl_host_enable = False
set queue batch acl_hosts = marlin29.cm.cluster
set queue batch acl_hosts += marlin19.cm.cluster
set queue batch acl_hosts += marlin09.cm.cluster
set queue batch acl_hosts += marlin28.cm.cluster
set queue batch acl_hosts += marlin18.cm.cluster
set queue batch acl_hosts += marlin08.cm.cluster
set queue batch acl_hosts += marlin27.cm.cluster
set queue batch acl_hosts += marlin17.cm.cluster
set queue batch acl_hosts += marlin07.cm.cluster
set queue batch acl_hosts += marlin26.cm.cluster
set queue batch acl_hosts += marlin16.cm.cluster
set queue batch acl_hosts += marlin06.cm.cluster
set queue batch acl_hosts += marlin25.cm.cluster
set queue batch acl_hosts += marlin15.cm.cluster
set queue batch acl_hosts += marlin05.cm.cluster
set queue batch acl_hosts += marlin24.cm.cluster
set queue batch acl_hosts += marlin14.cm.cluster
set queue batch acl_hosts += marlin04.cm.cluster
set queue batch acl_hosts += marlin23.cm.cluster
set queue batch acl_hosts += marlin13.cm.cluster
set queue batch acl_hosts += marlin03.cm.cluster
set queue batch acl_hosts += marlin32.cm.cluster
set queue batch acl_hosts += marlin22.cm.cluster
set queue batch acl_hosts += marlin12.cm.cluster
set queue batch acl_hosts += marlin02.cm.cluster
set queue batch acl_hosts += marlin31.cm.cluster
set queue batch acl_hosts += marlin21.cm.cluster
set queue batch acl_hosts += marlin11.cm.cluster
set queue batch acl_hosts += marlin01.cm.cluster
set queue batch acl_hosts += marlin30.cm.cluster
set queue batch acl_hosts += marlin20.cm.cluster
set queue batch acl_hosts += marlin10.cm.cluster
set queue batch resources_max.walltime = 58:00:00
set queue batch resources_min.walltime = 00:00:00
set queue batch resources_default.walltime = 02:00:00
set queue batch enabled = True
set queue batch started = True
#
# Create and define queue perftest
#
create queue perftest
set queue perftest queue_type = Execution
set queue perftest Priority = 100
set queue perftest acl_host_enable = False
set queue perftest acl_hosts = marlin29.cm.cluster
set queue perftest acl_hosts += marlin19.cm.cluster
set queue perftest acl_hosts += marlin09.cm.cluster
set queue perftest acl_hosts += marlin28.cm.cluster
set queue perftest acl_hosts += marlin18.cm.cluster
set queue perftest acl_hosts += marlin08.cm.cluster
set queue perftest acl_hosts += marlin27.cm.cluster
set queue perftest acl_hosts += marlin17.cm.cluster
set queue perftest acl_hosts += marlin07.cm.cluster
set queue perftest acl_hosts += marlin26.cm.cluster
set queue perftest acl_hosts += marlin16.cm.cluster
set queue perftest acl_hosts += marlin06.cm.cluster
set queue perftest acl_hosts += marlin25.cm.cluster
set queue perftest acl_hosts += marlin15.cm.cluster
set queue perftest acl_hosts += marlin05.cm.cluster
set queue perftest acl_hosts += marlin24.cm.cluster
set queue perftest acl_hosts += marlin14.cm.cluster
set queue perftest acl_hosts += marlin04.cm.cluster
set queue perftest acl_hosts += marlin23.cm.cluster
set queue perftest acl_hosts += marlin13.cm.cluster
set queue perftest acl_hosts += marlin03.cm.cluster
set queue perftest acl_hosts += marlin32.cm.cluster
set queue perftest acl_hosts += marlin22.cm.cluster
set queue perftest acl_hosts += marlin12.cm.cluster
set queue perftest acl_hosts += marlin02.cm.cluster
set queue perftest acl_hosts += marlin31.cm.cluster
set queue perftest acl_hosts += marlin21.cm.cluster
set queue perftest acl_hosts += marlin11.cm.cluster
set queue perftest acl_hosts += marlin01.cm.cluster
set queue perftest acl_hosts += marlin30.cm.cluster
set queue perftest acl_hosts += marlin20.cm.cluster
set queue perftest acl_hosts += marlin10.cm.cluster
set queue perftest resources_max.walltime = 10:00:00
set queue perftest resources_min.walltime = 00:00:00
set queue perftest resources_default.walltime = 02:00:00
set queue perftest acl_group_enable = True
set queue perftest acl_groups = perftest
set queue perftest acl_group_sloppy = True
set queue perftest enabled = True
set queue perftest started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = master.cm.cluster
set server managers = ***@marlin.cm.cluster
set server managers += ***@marlinhead.cm.cluster
set server managers += ***@marlin.cm.cluster
set server managers += ***@marlinhead.cm.cluster
set server managers += ***@marlin.cm.cluster
set server managers += ***@marlinhead.cm.cluster
set server operators = ***@marlin.cm.cluster
set server default_queue = batch
set server log_events = 63
set server mail_from = root
set server query_other_jobs = True
set server resources_default.nodect = 1
set server resources_default.nodes = 1
set server scheduler_iteration = 60
set server node_ping_rate = 300
set server node_check_rate = 600
set server tcp_timeout = 300
set server default_node = 1
set server node_pack = True
set server job_stat_rate = 300
set server poll_jobs = True
set server log_level = 3
set server down_on_error = True
set server mom_job_sync = True
set server submit_hosts = marlinhead
set server allow_node_submit = True
set server log_file_max_size = 25000000
set server log_file_roll_depth = 100
set server next_job_number = 31084
set server record_job_info = True
set server record_job_script = True
set server moab_array_compatible = True
set server nppcu = 1
set server timeout_for_job_delete = 120
set server timeout_for_job_requeue = 120
--
Mike Diehn
Enfield, NH
***@diehn.net
Rick McKay
2016-10-13 21:55:25 UTC
Hi Mike,

Maybe someone else here can spot something, but I don't see enough
information to identify the source of the trouble. If nodes=32:ppn=16 runs,
then I'd expect a 32:8 job to run. Is that even the case when the cluster
is completely empty? I'd want to see the full checkjob -v output, instead
of just a snippet. mdiag -n -v and pbsnodes may also help shed some light.
Also, of course, you'd want to revisit anything that recently changed in
the configuration, if that applies. If those don't help, I'd temporarily
crank up the LOGLEVEL setting in Maui and see if anything jumps out in the
logs after submitting a test job.
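
Concretely, the sort of thing I'd start with after submitting a fresh test
job (substitute the real job ID; the LOGLEVEL bump is only a temporary
debugging aid, to be reverted afterwards):

checkjob -v <jobid>   # Maui's full view of the job, including why it won't start
mdiag -n -v           # Maui's per-node view of configured vs. available processors
pbsnodes -a           # Torque's view of node states and np counts

# in maui.cfg, raise the log verbosity, then restart the maui daemon
LOGLEVEL 7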

Rick
-----
Rick McKay | Technical Support Engineer
Post by Mike Diehn
We've discovered an odd problem here.
We have a cluster of 32 M620 nodes, 16 cores per node. It's managed by
Bright 6, Torque 6.0.0 with Maui 3.3.1, and the distro is CentOS 6.4.
Normally, our users submit jobs that use all the cores on a node. But
occasionally, someone will want to run a job that uses only half the cores on
a set of nodes, or some similar partial-node layout.
The normal full-node jobs, with nodes=32:ppn=16, run as expected and so
does nodes=12:ppn=8. When enough nodes are available, the job starts, give
or take scheduling.
When we submit a job with, for example, nodes=32:ppn=8, nodes=24:ppn=6, or
nodes=24:ppn=7, that job will sit in the queue forever.
The checkjob output always says
Holds: Batch (hold reason: NoResources)
But at the time of submission, there was nothing else in the queue and no
reservations had been scheduled that I'd expect to block it.
I imagine the problem really is that I've configured the cluster wrong,
but I can't see how.
If anyone can help, I'll be grateful.
<snip>
Mike Diehn
2016-10-17 13:07:31 UTC
Post by Rick McKay
If nodes=32:ppn=16 runs, then I'd expect a 32:8 job to run. Is that even
the case when the cluster is completely empty?
Yes, in all the trials, the cluster is completely empty.
Post by Rick McKay
I'd want to see the full checkjob -v output, instead of just a snippet.
mdiag -n -v and pbsnodes may also help shed some light. Also, of course,
you'd want to revisit anything that recently changed in the configuration,
if that applies.
No recent changes, but this problem may have existed a long time. I'll
paste in the outputs below.

[***@marlin ~]# mdiag -j -v
Name State Par Proc QOS WCLimit R Min User
Group Account QueuedTime Network Opsys Arch Mem Disk Procs
Class Features

31024 Idle ALL 144 DEF 1:00:00 0 144 andros
wheel - 8:01:27:04 [NONE] [NONE] [NONE] >=0 >=0 NC0
[batch:1] [NONE]



Queue State:


[***@marlin ~]# qstat -a

marlin.cm.cluster:

Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS
TSK Memory Time S Time
----------------------- ----------- -------- ---------------- ------ -----
------ --------- --------- - ---------
31024.marlin.cm.cluste andros batch test -- 24
144 -- 01:00:00 Q --


Checkjob -v output

[***@marlin ~]# checkjob -v 31024

checking job 31024 (RM job '31024.marlin.cm.cluster')

State: Idle
Creds: user:andros group:wheel class:batch qos:DEFAULT
WallTime: 00:00:00 of 1:00:00
SubmitTime: Wed Oct 5 11:03:23
(Time Queued Total: 8:07:24:56 Eligible: 00:00:00)

Total Tasks: 144

Req[0] TaskCount: 144 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
TasksPerNode: 6 NodeCount: 24


IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
SystemQueueTime: Wed Oct 5 17:03:15

Flags: RESTARTABLE

Holds: Batch (hold reason: NoResources)
Messages: cannot create reservation for job '31024' (intital reservation
attempt)

PE: 144.00 StartPriority: 11605
cannot select job 31024 for partition DEFAULT (job hold active)
--
Mike Diehn
Enfield, NH
***@diehn.net
Mike Diehn
2016-10-24 17:30:25 UTC
David Beer:

Do you know if the sort of problem I describe below is more likely caused
by Torque or by Maui?

In short, we have a cluster of 32 nodes with 16 cores per node. Torque
6.0.0 and Maui 3.3.1

Jobs asking for nodes=32:ppn=16 run just fine.
Jobs asking for nodes=24:ppn=8 will not run; they sit in the queue reporting
no resources.
Jobs asking for nodes=24:ppn=7 also won't run.
Jobs asking for nodes=24:ppn=6 will run.
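
If it helps to reproduce this, here's a rough sketch of a quick test that
shows the pattern (the sleep payload and job names are just stand-ins):

# full-node geometry, which starts fine
echo "sleep 60" | qsub -l nodes=32:ppn=16,walltime=00:10:00 -N full16

# the partial-node geometries from the list above
for ppn in 8 7 6; do
    echo "sleep 60" | qsub -l nodes=24:ppn=$ppn,walltime=00:10:00 -N part$ppn
done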

Thanks,
Mike
Post by Mike Diehn
If nodes=32:ppn=16 runs, then I'd expect a 32:8 job to run. Is that even
the case when the cluster is completely empty?
Yes, in all the trials, the cluster is completely empty.
--
Mike Diehn
Enfield, NH
***@diehn.net
David Beer
2016-10-24 20:03:04 UTC
I don't, but you should be able to determine it by starting the job with
qrun instead of letting Maui start it. If you still have the problem, it is
likely Torque's fault; if the problem goes away, it is definitely Maui's fault.
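
For example, as a Torque manager or operator on the server host, using the
job ID from your earlier checkjob output:

# ask pbs_server to start the job directly, bypassing Maui entirely
qrun 31024.marlin.cm.cluster

If that starts the job and places the tasks as requested, the scheduler side
is where I'd dig further.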
Post by Mike Diehn
Do you know if the sort of problem I describe below is more likely caused
by Torque or by Maui?
In short, we have a cluster of 32 nodes with 16 cores per node. Torque
6.0.0 and Maui 3.3.1
Jobs asking for nodes=32:ppn=16 run just fine.
Jobs asking for nodes=24:ppn=8 will not run; they sit in the queue reporting
no resources.
Jobs asking for nodes=24:ppn=7 also won't run.
Jobs asking for nodes=24:ppn=6 will run.
Thanks,
Mike
Post by Mike Diehn
If nodes=32:ppn=16 runs, then I'd expect a 32:8 job to run. Is that even
the case when the cluster is completely empty?
Yes, in all the trials, the cluster is completely empty.
--
Mike Diehn
Enfield, NH
_______________________________________________
torqueusers mailing list
http://www.supercluster.org/mailman/listinfo/torqueusers
--
David Beer | Torque Architect
Adaptive Computing