peteris.rocks

Sun Grid Engine installation on Ubuntu Server

How to install, configure and use Sun Grid Engine (SGE) for HPC

This guide will help you set up and configure Sun Grid Engine (SGE) on Ubuntu Server 14.04 LTS.

Normally, the installation process will require your input several times, but by following this guide you will be able to perform an unattended installation, which means that you can automate the setup of your cluster with a shell script. Alternatively, you can set up SGE manually by copying and pasting the commands in this guide in the order in which they are presented.

SGE is a task or job scheduler. You submit your typically long-running tasks to a queue, and the scheduler will try to run each task on one of the worker hosts when one is available.

Installation

An SGE cluster conceptually consists of a master host and one or several worker hosts. The master host can also function as a worker. Then there are also clients which submit jobs to the cluster.

Master

The commands below will perform an unattended installation. If you copy and paste them into a terminal, keep in mind that apt-get swallows the pasted commands that follow the apt-get install line, so paste the commands after that line separately.

Note that SGE will also install postfix (an SMTP server) which we will disable.

# Configure the master hostname for Grid Engine
echo "gridengine-master       shared/gridenginemaster string  $HOSTNAME" | sudo debconf-set-selections
echo "gridengine-master       shared/gridenginecell   string  default" | sudo debconf-set-selections
echo "gridengine-master       shared/gridengineconfig boolean false" | sudo debconf-set-selections
echo "gridengine-common       shared/gridenginemaster string  $HOSTNAME" | sudo debconf-set-selections
echo "gridengine-common       shared/gridenginecell   string  default" | sudo debconf-set-selections
echo "gridengine-common       shared/gridengineconfig boolean false" | sudo debconf-set-selections
echo "gridengine-client       shared/gridenginemaster string  $HOSTNAME" | sudo debconf-set-selections
echo "gridengine-client       shared/gridenginecell   string  default" | sudo debconf-set-selections
echo "gridengine-client       shared/gridengineconfig boolean false" | sudo debconf-set-selections
# Postfix mail server is also installed as a dependency
echo "postfix postfix/main_mailer_type        select  No configuration" | sudo debconf-set-selections

# Install Grid Engine
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y gridengine-master gridengine-client

# Set up Grid Engine
sudo -u sgeadmin /usr/share/gridengine/scripts/init_cluster /var/lib/gridengine default /var/spool/gridengine/spooldb sgeadmin
sudo service gridengine-master restart

# Disable Postfix
sudo service postfix stop
sudo update-rc.d postfix disable
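The gridengine debconf lines above differ only in the package name, so they can also be generated in a loop. The following is a minimal sketch; sge_debconf_selections is a hypothetical helper name, not part of SGE or debconf. It only prints the selections, so you can inspect them before piping them to sudo debconf-set-selections.

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE): print debconf selections for the
# given master hostname and the listed gridengine packages.
sge_debconf_selections() {
    local master=$1
    shift
    local pkg
    for pkg in "$@"; do
        printf '%s shared/gridenginemaster string %s\n' "$pkg" "$master"
        printf '%s shared/gridenginecell string default\n' "$pkg"
        printf '%s shared/gridengineconfig boolean false\n' "$pkg"
    done
    # Postfix is installed as a dependency; preselect "No configuration"
    printf 'postfix postfix/main_mailer_type select No configuration\n'
}

# Example for the master host (prints only; add the pipe to apply):
# sge_debconf_selections "$HOSTNAME" gridengine-master gridengine-common gridengine-client \
#     | sudo debconf-set-selections
```

On a worker host the same function could be called with $MASTER_HOSTNAME and only the gridengine-common and gridengine-client packages.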

Test that it works by running

$ qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -

If you see an error message like this

vagrant@master:~$ qhost
error: commlib error: access denied (client IP resolved to host name "localhost". This is not identical to clients host name "master")
error: unable to send message to qmaster using port 6444 on host "master": got send error

it means that SGE is expecting 127.0.0.1 to resolve to master, which is our hostname, but in this case master resolves to 127.0.1.1 since that's what Ubuntu tends to put in /etc/hosts:

vagrant@master:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 master master

In this case, I am going to solve this problem with

echo 127.0.0.1 localhost | sudo tee /etc/hosts
echo 192.168.9.10 master | sudo tee -a /etc/hosts
sudo service gridengine-master restart

but what it really means is that you need to make sure that the hostnames and IP addresses you are going to use with SGE resolve correctly.
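To spot this situation before installing anything, you can look for loopback entries that mention your hostname. This is a minimal sketch that only inspects hosts-file text passed on standard input; loopback_entries is a hypothetical helper name, and it ignores DNS and any other name sources.

```shell
#!/bin/bash
# Hypothetical helper: read hosts-file lines on stdin and print every
# loopback address (127.x.x.x) that the given hostname is mapped to.
loopback_entries() {
    awk -v host="$1" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        $1 ~ /^127\./ {
            for (i = 2; i <= NF; i++) {
                if ($i == host) { print $1; next }
            }
        }
    '
}

# Example: any output means the hostname points at a loopback address
# loopback_entries "$HOSTNAME" < /etc/hosts
```

With the /etc/hosts file shown above, running it for master would print 127.0.1.1.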

Worker

We need to know the master hostname before proceeding.

export MASTER_HOSTNAME=master

The following commands will perform an unattended installation on a worker host.

echo "gridengine-common       shared/gridenginemaster string  $MASTER_HOSTNAME" | sudo debconf-set-selections
echo "gridengine-common       shared/gridenginecell   string  default" | sudo debconf-set-selections
echo "gridengine-common       shared/gridengineconfig boolean false" | sudo debconf-set-selections
echo "gridengine-client       shared/gridenginemaster string  $MASTER_HOSTNAME" | sudo debconf-set-selections
echo "gridengine-client       shared/gridenginecell   string  default" | sudo debconf-set-selections
echo "gridengine-client       shared/gridengineconfig boolean false" | sudo debconf-set-selections
echo "postfix postfix/main_mailer_type        select  No configuration" | sudo debconf-set-selections

sudo DEBIAN_FRONTEND=noninteractive apt-get install -y gridengine-exec gridengine-client

sudo service postfix stop
sudo update-rc.d postfix disable

Got errors about /var/lib/gridengine/default/common/act_qmaster?

echo $MASTER_HOSTNAME | sudo tee /var/lib/gridengine/default/common/act_qmaster
sudo service gridengine-exec restart

Test it with

vagrant@worker1:~$ qhost
error: denied: host "worker1" is neither submit nor admin host

which means that the installation was successful.

Otherwise you would see communication errors.

(To get rid of this error, you can run sudo qconf -ah worker1 on the master host to add this worker as an admin host. Read more in the Hosts section below.)

Note that gridengine-exec is the package required to run SGE on a worker host. gridengine-client installs command line utilities like qhost and qstat that can help diagnose problems.

Need to reinstall SGE?

export MASTER_HOSTNAME=master
sudo rm -rf /var/lib/gridengine/
sudo apt-get remove gridengine-exec gridengine-client gridengine-common --purge -y
echo "gridengine-common       shared/gridenginemaster string  $MASTER_HOSTNAME" | sudo debconf-set-selections
echo "gridengine-client       shared/gridenginemaster string  $MASTER_HOSTNAME" | sudo debconf-set-selections
sudo DEBIAN_FRONTEND=noninteractive apt-get install -y gridengine-exec gridengine-client

Configuration

You'll want to run these commands on the master host.

Users

Managers are like root users and are able to change SGE settings. Note that sgeadmin and root are already on the manager list.

# add yourself to the manager list
sudo qconf -am $USER

Operators are less privileged than managers and are able to add/remove workers.

# add yourself to the operator list (will be able to add/remove workers)
sudo qconf -ao $USER

Scheduler

You will probably want to adjust the scheduler configuration.

Here we are using the default settings except for schedule_interval. This setting specifies how often the scheduler checks for new jobs. By default, the value is 15 seconds, which can be too long and cause delays if you submit jobs every second and they finish quickly.

Consult the man pages for more information.

# change scheduler config
cat > ./grid <<EOL
algorithm                         default
schedule_interval                 0:0:1
maxujobs                          0
queue_sort_method                 load
job_load_adjustments              np_load_avg=0.50
load_adjustment_decay_time        0:7:30
load_formula                      np_load_avg
schedd_job_info                   true
flush_submit_sec                  0
flush_finish_sec                  0
params                            none
reprioritize_interval             0:0:0
halftime                          168
usage_weight_list                 cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor               5.000000
weight_user                       0.250000
weight_project                    0.250000
weight_department                 0.250000
weight_job                        0.250000
weight_tickets_functional         0
weight_tickets_share              0
share_override_tickets            TRUE
share_functional_shares           TRUE
max_functional_jobs_to_schedule   200
report_pjob_tickets               TRUE
max_pending_tasks_per_job         50
halflife_decay_list               none
policy_hierarchy                  OFS
weight_ticket                     0.500000
weight_waiting_time               0.278000
weight_deadline                   3600000.000000
weight_urgency                    0.500000
weight_priority                   0.000000
max_reservation                   0
default_duration                  INFINITY
EOL
sudo qconf -Msconf ./grid
rm ./grid
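Since only schedule_interval differs from the defaults, an alternative is to dump the current configuration and change that one line. This is a minimal sketch, assuming that qconf -ssconf prints the configuration in the same name and value layout as the file above; set_schedule_interval is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read a scheduler configuration on stdin and print it
# with schedule_interval replaced by the given value. All other lines are
# passed through unchanged.
set_schedule_interval() {
    awk -v value="$1" '
        $1 == "schedule_interval" { printf "%-33s %s\n", $1, value; next }
        { print }
    '
}

# Example (on the master host):
# qconf -ssconf | set_schedule_interval 0:0:1 > ./grid
# sudo qconf -Msconf ./grid
# rm ./grid
```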

Queues

First, create a host list on which the jobs in the queue will run.

The name of the host list will be allhosts but in SGE configuration it is usually used with the @ as a prefix: @allhosts.

# create a host list
echo -e "group_name @allhosts\nhostlist NONE" > ./grid
sudo qconf -Ahgrp ./grid
rm ./grid
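echo -e is not interpreted the same way by every shell, so the host group file can also be produced with printf. A minimal sketch; hostgroup_config is a hypothetical helper name, and it assumes that hostlist accepts either NONE or a space-separated list of hosts.

```shell
#!/bin/bash
# Hypothetical helper: print a host group definition. The first argument is
# the group name without the @ prefix; any further arguments are hosts.
hostgroup_config() {
    local name=$1
    shift
    local hosts=${*:-NONE}      # no hosts given: start with an empty list
    printf 'group_name @%s\nhostlist %s\n' "$name" "$hosts"
}

# Example:
# hostgroup_config allhosts > ./grid
# sudo qconf -Ahgrp ./grid
# rm ./grid
```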

Finally, create a queue for your jobs. There is a convention to add the .q suffix to your queue name. In this case, we will be creating a queue with the name peteris.q.

All settings have default values except for qname, hostlist and load_thresholds.

# create a queue
cat > ./grid <<EOL
qname                 peteris.q
hostlist              @allhosts
seq_no                0
load_thresholds       NONE
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:00:01
priority              0
min_cpu_interval      00:00:01
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 2
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:01
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
EOL
sudo qconf -Aq ./grid
rm ./grid

Hosts

Allow a host to submit jobs to SGE.

# add the current host to the submit host list (will be able to do qsub)
sudo qconf -as $HOSTNAME

Allow a host to administer SGE, e.g., to see job statuses.

# add to the admin host list so that we can do qstat, etc.
sudo qconf -ah $HOSTNAME

Add a worker

You can use the following bash script to add a worker to a queue.

#!/bin/bash
# usage: sge-worker-add.sh <queue> <hostname> [slots]

QUEUE=$1
HOSTNAME=$2
SLOTS=$3

# add to the execution host list
TMPFILE=/tmp/sge.hostname-$HOSTNAME
echo -e "hostname $HOSTNAME\nload_scaling NONE\ncomplex_values NONE\nuser_lists NONE\nxuser_lists NONE\nprojects NONE\nxprojects NONE\nusage_scaling NONE\nreport_variables NONE" > "$TMPFILE"
qconf -Ae "$TMPFILE"
rm "$TMPFILE"

# add to the all hosts list
qconf -aattr hostgroup hostlist "$HOSTNAME" @allhosts

# enable the host for the queue, in case it was disabled and not removed
qmod -e "$QUEUE@$HOSTNAME"

# optionally set a host-specific slot count for the queue
if [ -n "$SLOTS" ]; then
    qconf -aattr queue slots "[$HOSTNAME=$SLOTS]" "$QUEUE"
fi

Then use it as follows

$ sudo ./sge-worker-add.sh peteris.q worker1 4
root@master added "worker1" to exechost list
root@master modified "@allhosts" in host group list
Queue instance "peteris.q@worker1" is already in the specified state: enabled
root@master modified "peteris.q" in cluster queue list

You should now be able to see worker1 in the output of qhost.

vagrant@master:~$ qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
worker1                 -               -     -       -       -       -       -

But when you run qstat -f you may notice that worker1 load average is N/A and the state is u which stands for unreachable.

vagrant@master:~$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
peteris.q@worker1              BIP   0/0/4          -NA-     -NA-          u

To fix that, restart SGE on the worker host.

vagrant@worker1:~$ sudo service gridengine-exec restart

And the output of qstat -f should look like

vagrant@master:~$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
peteris.q@worker1              BIP   0/0/4          0.02     lx26-amd64
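If you have many workers, you can filter the qstat -f output for queue instances in the u state instead of reading it by eye. This is a minimal sketch that assumes the column layout shown above, where the states column is the sixth and last field and is empty for healthy queue instances; unreachable_queues is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read `qstat -f` output on stdin and print the queue
# instances whose states column contains "u" (unreachable).
unreachable_queues() {
    # Queue instance lines start with queue@host; the states column only
    # exists as a sixth field when at least one state flag is set.
    awk '$1 ~ /@/ && NF >= 6 && $NF ~ /u/ { print $1 }'
}

# Example:
# qstat -f | unreachable_queues
```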

Why do you need to run sge-worker-add.sh with sudo? Because otherwise you'll get permission errors like denied: "vagrant" must be manager for this operation. To make your user a manager, run sudo qconf -am $USER.
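To add several workers at once, you could keep them in a text file with one "hostname slots" pair per line and generate the calls from it. A minimal sketch; worker_add_commands and the file name workers.txt are hypothetical. The function only prints the commands (a dry run), so nothing is changed until you run the output yourself.

```shell
#!/bin/bash
# Hypothetical helper: read "hostname [slots]" lines on stdin and print one
# sge-worker-add.sh invocation per worker for the given queue.
worker_add_commands() {
    local queue=$1 host slots
    while read -r host slots; do
        [ -z "$host" ] && continue          # skip blank lines
        printf './sge-worker-add.sh %s %s%s\n' "$queue" "$host" "${slots:+ $slots}"
    done
}

# Example: review the generated commands first
# worker_add_commands peteris.q < workers.txt
```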

Remove a worker

You can use the following bash script to remove a worker from a queue.

#!/bin/bash
# usage: sge-worker-remove.sh <queue> <hostname>

QUEUE=$1
HOSTNAME=$2

# disable the host so that no new jobs are allocated to it
qmod -d "$QUEUE@$HOSTNAME"

# remove it from the all hosts list
qconf -dattr hostgroup hostlist "$HOSTNAME" @allhosts

# remove it from the execution host list
qconf -de "$HOSTNAME"

# delete the host-specific slot count
qconf -purge queue slots "$QUEUE@$HOSTNAME"

Then use it as follows

vagrant@master:~$ sudo ./sge-worker-remove.sh peteris.q worker1
root@master changed state of "peteris.q@worker1" (disabled)
root@master modified "@allhosts" in host group list
root@master removed "worker1" from execution host list
root@master modified "peteris.q" in cluster queue list

Usage

Submit jobs

You can submit jobs to SGE with qsub which is installed with the gridengine-client package.

Note that you need to be on a host that is allowed to submit jobs to SGE (run sudo qconf -as $HOSTNAME if you are not).

Let's submit a simple job that will execute the hostname program:

$ qsub -b y hostname
Your job 1 ("hostname") has been submitted

It will be executed on one of the workers. In my case, worker1 was chosen:

vagrant@worker1:~$ ls
hostname.e1  hostname.o1
vagrant@worker1:~$ cat hostname.*
worker1

The standard output was written to hostname.o1 and stderr was written to hostname.e1 where hostname is the name of our command and 1 was our job ID.

You can change stdout/stderr filenames like this:

qsub -b y -o out.txt -e err.txt hostname

Both out.txt and err.txt will still be created on the worker host, so you will typically want to write them to a network share or similar.

We can reduce the output of qsub to just the job number with the -terse flag:

$ qsub -b y hostname
Your job 3 ("hostname") has been submitted
$ qsub -terse -b y hostname
4

It will generally be useful to name your jobs with -N so that you can easily identify them in the queue:

$ qsub -b y -N this-job-has-a-name sleep 10
Your job 31 ("this-job-has-a-name") has been submitted
$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
peteris.q@worker1              BIP   0/1/4          0.01     lx26-amd64
     31 0.50000 this-job-h vagrant      r     02/30/2016 12:03:22     1

qsub will by default return immediately. Use qsub -sync y to wait until the job is completed:

$ date && qsub -b y sleep 10 && date
Wed Feb 30 12:07:13 UTC 2016
Your job 35 ("sleep") has been submitted
Wed Feb 30 12:07:13 UTC 2016

$ date && qsub -b y -sync y sleep 10 && date
Wed Feb 30 12:07:13 UTC 2016
Your job 36 ("sleep") has been submitted
Job 36 exited with exit code 0.
Wed Feb 30 12:07:24 UTC 2016

Sometimes you'll want a job to run after another one has completed. In this case, you can use -hold_jid <id>,<id>:

$ qsub -terse -b y -N date1 "date && sleep 10"
39
$ qsub -terse -b y -N date2 -hold_jid 39 date
40
$ cat date1*
Wed Feb 30 12:15:02 UTC 2016
$ cat date2*
Wed Feb 30 12:15:13 UTC 2016
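The -terse and -hold_jid flags combine naturally in a script: capture each job ID and pass it to the next submission. This is a minimal sketch; submit_chain and the QSUB override variable are hypothetical names introduced here so that the function can be tried with a stand-in command. Each argument is deliberately split on whitespace, so it suits simple commands like "sleep 10" only.

```shell
#!/bin/bash
# Command used to submit jobs; overridable for dry runs (an assumption of
# this sketch, not an SGE feature).
QSUB=${QSUB:-qsub}

# Hypothetical helper: submit each argument as a binary job (-b y) so that
# every job waits for the previous one. Prints the job ID of each job.
submit_chain() {
    local prev="" cmd id
    for cmd in "$@"; do
        if [ -n "$prev" ]; then
            # $cmd is intentionally unquoted: "sleep 10" becomes two words
            id=$("$QSUB" -terse -b y -hold_jid "$prev" $cmd) || return 1
        else
            id=$("$QSUB" -terse -b y $cmd) || return 1
        fi
        echo "$id"
        prev=$id
    done
}

# Example:
# submit_chain "sleep 10" "date" "hostname"
```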

List jobs

You can generate lots of jobs with

for i in `seq 1 30`; do qsub -b y hostname; done

qstat -f will show you the currently running jobs:

$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
peteris.q@worker1              BIP   0/2/4          0.01     lx26-amd64
     27 0.50000 hostname   vagrant      r     02/30/2016 11:55:26     1
     28 0.50000 hostname   vagrant      t     02/30/2016 11:55:26     1
---------------------------------------------------------------------------------
peteris.q@worker2              BIP   0/2/2          0.01     lx26-amd64
     26 0.50000 hostname   vagrant      r     02/30/2016 11:55:26     1
     29 0.50000 hostname   vagrant      t     02/30/2016 11:55:26     1

and qstat -f -u \* will also show pending jobs:

$ qstat -f -u \*
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
peteris.q@worker1              BIP   0/4/4          0.01     lx26-amd64
     20 0.50000 hostname   vagrant      r     02/30/2016 11:55:24     1
     23 0.50000 hostname   vagrant      t     02/30/2016 11:55:24     1
     24 0.50000 hostname   vagrant      t     02/30/2016 11:55:24     1
     25 0.50000 hostname   vagrant      t     02/30/2016 11:55:24     1
---------------------------------------------------------------------------------
peteris.q@worker2              BIP   0/2/2          0.01     lx26-amd64
     21 0.50000 hostname   vagrant      r     02/30/2016 11:55:24     1
     22 0.50000 hostname   vagrant      t     02/30/2016 11:55:24     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     26 0.50000 hostname   vagrant      qw    02/30/2016 11:55:23     1
     27 0.50000 hostname   vagrant      qw    02/30/2016 11:55:23     1
     28 0.50000 hostname   vagrant      qw    02/30/2016 11:55:23     1
     29 0.50000 hostname   vagrant      qw    02/30/2016 11:55:23     1

Note that the asterisk * makes qstat show the jobs of all users, but unless you escape it as \*, your shell will replace it with the names of the files in the current directory.
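For a quick summary of a busy queue, you can count the jobs per state. This is a minimal sketch that assumes the job line layout shown above (job ID, priority, name, user, state, ...) and job names without spaces; job_state_counts is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read `qstat -f` output on stdin and print the number
# of jobs per state, one "state count" pair per line.
job_state_counts() {
    # Job lines start with a numeric job ID followed by a numeric priority;
    # the state is the fifth field.
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9.]+$/ { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Example:
# qstat -f -u \* | job_state_counts
```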

To see details of a job that is still in the queue, use qstat -j <id>:

$ qsub -terse -b y sleep 10
30
$ qstat -j 30
==============================================================
job_number:                 30
exec_file:                  job_scripts/30
submission_time:            Wed Feb 30 12:00:00 2016
owner:                      vagrant
uid:                        1000
group:                      vagrant
gid:                        1000
sge_o_home:                 /home/vagrant
sge_o_log_name:             vagrant
sge_o_path:                 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
sge_o_shell:                /bin/bash
sge_o_workdir:              /home/vagrant
sge_o_host:                 master
account:                    sge
mail_list:                  vagrant@master
notify:                     FALSE
job_name:                   sleep
jobshare:                   0
env_list:
job_args:                   10
script_file:                sleep
usage    1:                 cpu=00:00:00, mem=0.00000 GBs, io=0.00000, vmem=N/A, maxvmem=N/A
scheduling info:            There are no messages available

It is also possible to get the output as XML, which makes it easier to process if you use a script to analyze the status of your cluster, for instance, to create a simple dashboard.

$ qstat -f -xml
<?xml version='1.0'?>
<job_info  xmlns:xsd="http://gridengine.sunsource.net/source/browse/*checkout*/gridengine/source/dist/util/resources/schemas/qstat/qstat.xsd?revision=1.11">
  <queue_info>
    <Queue-List>
      <name>peteris.q@worker1</name>
      <qtype>BIP</qtype>
      <slots_used>0</slots_used>
      <slots_resv>0</slots_resv>
      <slots_total>4</slots_total>
      <arch>lx26-amd64</arch>
    </Queue-List>
    <Queue-List>
      <name>peteris.q@worker2</name>
      <qtype>BIP</qtype>
      <slots_used>0</slots_used>
      <slots_resv>0</slots_resv>
      <slots_total>2</slots_total>
      <arch>lx26-amd64</arch>
    </Queue-List>
  </queue_info>
  <job_info>
  </job_info>
</job_info>
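As a starting point for such a script, the slot usage per queue instance can be pulled out of the XML with awk. This is a minimal sketch that assumes each element sits on its own line and that slots_used precedes slots_total, as in the sample above; a real XML parser would be more robust. queue_slots_from_xml is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read `qstat -f -xml` output on stdin and print
# "queue used/total" for every queue instance.
queue_slots_from_xml() {
    # Split on < and > so that $2 is the element name and $3 its text
    awk -F'[<>]' '
        $2 == "name"        { name = $3 }
        $2 == "slots_used"  { used = $3 }
        $2 == "slots_total" { print name, used "/" $3 }
    '
}

# Example:
# qstat -f -xml | queue_slots_from_xml
```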

Canceling jobs

Use qdel.

$ qsub -terse -b y sleep 1000
32
$ qdel 32
vagrant has registered the job 32 for deletion

Restart SGE

If nothing is working, try restarting SGE.

sudo service gridengine-master restart
sudo service gridengine-exec restart

Vagrantfile

You can use the following Vagrantfile that will spin up a master node and two worker nodes for your experiments.

Vagrant.configure("2") do |config|
  # Ubuntu 14.04 LTS x64 official cloud image
  config.vm.box = "ubuntu/trusty64"

  # VirtualBox, common settings
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 256
    vb.cpus = 1
    vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"] # fixes slow dns lookups
  end

  config.vm.define "master" do |srv|
    srv.vm.hostname = "master"
    srv.vm.network :private_network, ip: "192.168.9.10"
    srv.vm.provider "virtualbox" do |vb| vb.name = "SGE-Master"; end
  end

  config.vm.define "worker1" do |srv|
    srv.vm.hostname = "worker1"
    srv.vm.network :private_network, ip: "192.168.9.11"
    srv.vm.provider "virtualbox" do |vb| vb.name = "SGE-Worker1"; end
  end

  config.vm.define "worker2" do |srv|
    srv.vm.hostname = "worker2"
    srv.vm.network :private_network, ip: "192.168.9.12"
    srv.vm.provider "virtualbox" do |vb| vb.name = "SGE-Worker2"; end
  end
end

Then

vagrant up
vagrant ssh master
vagrant ssh worker1
vagrant ssh worker2
vagrant destroy -f

Make sure that you change /etc/hosts to the following on all hosts:

127.0.0.1 localhost
192.168.9.10 master
192.168.9.11 worker1
192.168.9.12 worker2
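Rather than editing the file by hand on every VM, the entries can be generated. A minimal sketch that follows the addressing used in the Vagrantfile above (master on .10, workers from .11 upwards); cluster_hosts is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print hosts-file entries for a master and N workers.
# The first argument is the network prefix, the second the worker count.
cluster_hosts() {
    local prefix=$1 workers=$2 i=1
    printf '127.0.0.1 localhost\n'
    printf '%s.10 master\n' "$prefix"
    while [ "$i" -le "$workers" ]; do
        printf '%s.%d worker%d\n' "$prefix" $((10 + i)) "$i"
        i=$((i + 1))
    done
}

# Example (overwrites /etc/hosts, like the fix in the Master section):
# cluster_hosts 192.168.9 2 | sudo tee /etc/hosts
```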

Final remarks

I hope you find this guide useful as it took me a long while to discover how to automate and debug everything.

The man pages are extensive but they serve as a reference rather than a step-by-step tutorial.