Henrik Bengtsson
2017-01-20 04:39:57 UTC
Hi. We're running Torque with Moab on a multiuser cluster with
heterogeneous nodes. Specifically, the different nodes have a
different amount of /scratch/ available. Users' jobs need significant
amounts of /scratch/ space. The jobs run for hours up to several
days.
I'm looking for a way to request a specific amount of /scratch/ space per job.
I'm aware of the 'size[fs=/scratch]' option in
/var/spool/torque/mom_priv/config, but that doesn't prevent too many
jobs submitted as:
qsub -l file=2tb ...
from being allocated to the same node with, say, 5 TiB of /scratch/.
The problem with this approach appears to be that a job will be sent
to a node as long as the requested file resource is available **when
the job is launched**.
The only alternative that looks like an option is to use a *generic
consumable resource* that corresponds to, say, the size in GiB of
/scratch/ (regardless of disk usage). Something like adding the
following to /var/spool/torque/mom_priv/config:
scratch 5120
corresponding to 5120 GiB = 5 TiB of /scratch/. This needs to be node
specific since the different nodes have different amounts of
/scratch/.
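To make the node-specific part concrete, I picture each node carrying
its own value in its local /var/spool/torque/mom_priv/config, e.g. a
node with 5 TiB of /scratch/ would have:
scratch 5120
while a node with 10 TiB of /scratch/ would have:
scratch 10240
(these sizes are just made-up examples).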
With the above GRES setup, jobs can request this consumable resource as:
qsub -l gres=scratch:2048 ...
qsub -l gres=scratch:2048 ...
qsub -l gres=scratch:2048 ...
Here the 3rd job will be queued until one of the other two finishes
(assuming there's only one node).
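If we go this route, the per-node value could presumably be generated
on each node at setup time with something like (assuming GNU df; the
path is just our /scratch/ mount):
df -BG --output=size /scratch | tail -n 1 | tr -dc '0-9'
which prints the size of /scratch/ in GiB, ready to paste after the
'scratch' keyword in that node's config.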
Is this a recommended approach? What are others using for this?
Any other suggestions are welcome.
Thank you
Henrik
PS. I'm not a sysadmin, but an advanced user trying to help identify
best practices.