Kazuhiro Fujita
2016-08-30 08:14:01 UTC
Hi all,
I successfully installed TORQUE 6.0.2 on Ubuntu 16.04 LTS (the procedure is
attached below).
It appears to work, but job queuing behaves oddly: submitted jobs are
executed in last-in, first-out (LIFO) order (see below).
When I submit the following command three times in a row, I expect the jobs
to be executed in first-in, first-out (FIFO) order.
$ echo "sleep 10" | qsub -t 1-10
However, before the first job array had completed, TORQUE started executing
jobs from the third array.
(I allocated 6 threads to TORQUE on a desktop test machine.)
$ qstat -t
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
1[1].kaf-Ubuntu           STDIN-1          kaf             00:00:00 C batch
1[2].kaf-Ubuntu           STDIN-2          kaf                    0 Q batch
1[3].kaf-Ubuntu           STDIN-3          kaf                    0 Q batch
1[4].kaf-Ubuntu           STDIN-4          kaf                    0 Q batch
1[5].kaf-Ubuntu           STDIN-5          kaf                    0 Q batch
1[6].kaf-Ubuntu           STDIN-6          kaf             00:00:00 C batch
1[7].kaf-Ubuntu           STDIN-7          kaf             00:00:00 C batch
1[8].kaf-Ubuntu           STDIN-8          kaf             00:00:00 C batch
1[9].kaf-Ubuntu           STDIN-9          kaf             00:00:00 C batch
1[10].kaf-Ubuntu          STDIN-10         kaf             00:00:00 C batch
2[1].kaf-Ubuntu           STDIN-1          kaf                    0 Q batch
2[2].kaf-Ubuntu           STDIN-2          kaf                    0 Q batch
2[3].kaf-Ubuntu           STDIN-3          kaf                    0 Q batch
2[4].kaf-Ubuntu           STDIN-4          kaf                    0 Q batch
2[5].kaf-Ubuntu           STDIN-5          kaf                    0 Q batch
2[6].kaf-Ubuntu           STDIN-6          kaf                    0 Q batch
2[7].kaf-Ubuntu           STDIN-7          kaf                    0 Q batch
2[8].kaf-Ubuntu           STDIN-8          kaf                    0 Q batch
2[9].kaf-Ubuntu           STDIN-9          kaf                    0 Q batch
2[10].kaf-Ubuntu          STDIN-10         kaf                    0 Q batch
3[1].kaf-Ubuntu           STDIN-1          kaf                    0 Q batch
3[2].kaf-Ubuntu           STDIN-2          kaf                    0 Q batch
3[3].kaf-Ubuntu           STDIN-3          kaf                    0 Q batch
3[4].kaf-Ubuntu           STDIN-4          kaf                    0 Q batch
3[5].kaf-Ubuntu           STDIN-5          kaf                    0 R batch
3[6].kaf-Ubuntu           STDIN-6          kaf                    0 R batch
3[7].kaf-Ubuntu           STDIN-7          kaf                    0 R batch
3[8].kaf-Ubuntu           STDIN-8          kaf                    0 R batch
3[9].kaf-Ubuntu           STDIN-9          kaf                    0 R batch
3[10].kaf-Ubuntu          STDIN-10         kaf                    0 R batch
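To make the ordering easier to see, the listing can be condensed into per-array state counts. The following is only a minimal sketch: it assumes the `qstat -t` column layout shown above (state in the second-to-last column), and `summarize_qstat` is a hypothetical helper name, not a TORQUE command.

```shell
# Summarize `qstat -t` output read from stdin:
# one line per (array ID, state) pair with the number of jobs in that state.
summarize_qstat() {
  awk '
    # Only consider array job lines such as "3[5].kaf-Ubuntu ..."
    /^[0-9]+\[[0-9]+\]/ {
      id = $1
      sub(/\[.*/, "", id)     # keep the array ID in front of "["
      state = $(NF - 1)       # S column: Q, R, C, ...
      count[id " " state]++
    }
    END {
      for (key in count) print key, count[key]
    }
  ' | sort -k1,1n -k2,2
}
```

Used as `qstat -t | summarize_qstat`, the listing above reduces to `1 C 6`, `1 Q 4`, `2 Q 10`, `3 Q 4`, `3 R 6`: array 3 is running while arrays 1 and 2 still have queued jobs.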
Do you have any ideas on how to solve this issue?
I used TORQUE 4.2.10 on Ubuntu 14.04 LTS for about a year and never
encountered this kind of behavior before upgrading to Ubuntu 16.04 LTS.
Thanks in advance,
Kaz
Procedure for installing TORQUE 6.0.2 on Ubuntu 16.04 LTS
# install packages
sudo apt-get install lsb-core build-essential libtool openssl libssl-dev \
  libxml2-dev libboost-all-dev automake
# check and edit /etc/hosts to specify host
sudo nano /etc/hosts
# install torque
tar xvzf torque-6.0.2-1469811694_d9a3483.tar.gz
cd torque-6.0.2-1469811694_d9a3483
./configure
make
sudo make install
# confirm the host name is correctly set.
cat /var/spool/torque/server_name
# configure the trqauthd daemon to start automatically at system boot.
sudo cp contrib/systemd/trqauthd.service /etc/systemd/system/
sudo systemctl enable trqauthd.service
sudo sh -c "echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf"
sudo ldconfig
sudo systemctl start trqauthd.service
# set up qmgr
sudo ./torque.setup root
# check qmgr settings
sudo qmgr -c 'p s'
# set compute nodes (np = number of processors listed in /proc/cpuinfo)
echo "$HOSTNAME np=$(grep -c ^processor /proc/cpuinfo)" \
  | sudo tee /var/spool/torque/server_priv/nodes
# adjust np to change the number of threads available to TORQUE
sudo nano /var/spool/torque/server_priv/nodes
# Configure pbs_server to start automatically at system boot, and then
# start the daemon.
sudo qterm
sudo cp contrib/systemd/pbs_server.service /etc/systemd/system/
sudo systemctl enable pbs_server.service
sudo systemctl start pbs_server.service
# Configure pbs_mom to start at system boot, and then start the daemon
sudo cp contrib/systemd/pbs_mom.service /etc/systemd/system/
sudo systemctl enable pbs_mom.service
sudo systemctl start pbs_mom.service
# Configure pbs_sched to start at system boot, and then start the daemon
sudo cp contrib/systemd/pbs_sched.service /etc/systemd/system/
sudo systemctl enable pbs_sched.service
sudo systemctl start pbs_sched.service
# check node settings
pbsnodes -a
# check torque behavior with a small job
echo "sleep 30" | qsub
qstat
echo "sleep 10" | qsub -t 1-10
qstat -t
# pbs_sched did not start at boot, so I have to start it manually after each boot.
sudo systemctl start pbs_sched.service
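Since the scheduler has to be started by hand, it may help to check its state before submitting test jobs. The following is a minimal sketch using the standard `systemctl is-enabled` and `systemctl is-active` subcommands; `check_unit` is a hypothetical helper name, and the check only reports the unit state, it does not explain why the unit fails at boot.

```shell
# Report whether a systemd unit is enabled for boot and currently running.
# Defaults to the pbs_sched unit copied from contrib/systemd above.
check_unit() {
  unit="${1:-pbs_sched.service}"
  # is-enabled prints e.g. "enabled" or "disabled" (non-zero exit if not enabled)
  enabled=$(systemctl is-enabled "$unit" 2>/dev/null) || true
  # is-active prints e.g. "active", "inactive" or "failed" (non-zero exit if not active)
  active=$(systemctl is-active "$unit" 2>/dev/null) || true
  echo "$unit: enabled=${enabled:-unknown} active=${active:-unknown}"
}
```

Calling `check_unit pbs_sched.service` right after a reboot shows whether the unit is enabled but not running. The messages logged by the unit since the current boot can then be read with `journalctl -b -u pbs_sched.service`.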