The following steps apply if you are adding new nodes to your cluster and the new nodes will be cloned from your existing node image.

First, edit /act/etc/act_nodes.conf and add your new node definitions below the existing ones. If you do not already have these definitions, ACT support can provide them.

Next, edit /act/etc/act_util.conf:

$ vi /act/etc/act_util.conf

Look for the [node] section:

[node]
type=range
start=1
end=10

The idea is to increase the end value of each range by the number of nodes you are adding. For example, if you had 10 nodes and are adding 8 more, change '10' to '18'.
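Continuing that example, the [node] section after the edit would read (only the end= value changes):

```ini
[node]
type=range
start=1
end=18
```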

The lines to look for are as follows:

end=
dev[eth0]_ipend=
dev[ipmi]_ipend=

If you have InfiniBand:

dev[ib0]_ipend=
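As a sketch, the integer-valued end= bump can be scripted. The example below runs against a throw-away demo copy of the file; the path, node count, and use of awk are illustrative assumptions, and the IP-valued dev[...]_ipend= lines still need to be edited by hand:

```shell
# Demo input mimicking the [node] section of act_util.conf (hypothetical copy)
printf '[node]\ntype=range\nstart=1\nend=10\n' > /tmp/act_util_demo.conf

# Adding 8 nodes: bump any integer-valued end= line by 8, leave all other lines alone
awk -F= 'BEGIN { OFS = "=" }
  $1 == "end" && $2 ~ /^[0-9]+$/ { $2 += 8 }
  { print }' /tmp/act_util_demo.conf > /tmp/act_util_demo.conf.new

cat /tmp/act_util_demo.conf.new
```

Review the result before copying it over the real file.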

Regenerate all the appropriate configuration files by running this command:

$ /act/bin/act_cfgfile --hosts --ssh --cloner --dhcp --prefix=/

Restart DHCP since new hosts were added:

$ service dhcpd restart

Copy the new hosts and known_hosts files to all the nodes:

$ /act/bin/act_cp -a /etc/hosts
$ /act/bin/act_cp -a /etc/ssh/ssh_known_hosts2

Log in to node01 as root and run the following to update your compute node image:

$ /act/cloner/bin/cloner --server=head --image=node

Back on the head node, run the following, replacing node11-node18 with the range of your new nodes:

$ /act/bin/act_netboot -r node11-node18 --set=cloner3

When the new nodes are powered on, they will network boot, install their OS, and reboot when finished. **Once the new nodes are up and accessible, continue to the next steps.**

Synchronize the clocks across the entire cluster:

$ act_exec -a 'service ntpd stop; ntpdate 1.centos.pool.ntp.org; hwclock --systohc ; service ntpd start'

The following commands use the information in act_util.conf to set the IPMI IP address and network settings on the new nodes. Replace node11-node18 with the range of your new nodes:

$ act_exec -r node11-node18 "service ipmi start"
$ act_ipmi_netcfg -r node11-node18
$ act_ipmi_netcfg -a --dump_dhcp > /etc/dhcpd.d/ipmi.conf
$ service dhcpd restart
$ act_exec -r node11-node18 "service ipmi stop"
$ act_ipmi_log -a setdate

If you are using SGE (Sun Grid Engine) for your job scheduler

To add the new compute nodes to the SGE queueing system, run the following commands and follow the directions at each step:

$ qconf -mhgrp @allhosts

- Add an entry for each new host that you are adding.
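For reference, the @allhosts group that qconf -mhgrp opens looks roughly like this; append the new hostnames to the hostlist line (node names here follow the running example):

```text
group_name @allhosts
hostlist node01 node02 node03 node04 node05 node06 node07 node08 node09 node10 \
         node11 node12 node13 node14 node15 node16 node17 node18
```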

$ qconf -ae <hostname>

- Add an exec host entry for each new host that you are adding.
- This opens a file editor.
- Set 'hostname' to the new hostname.
- Set 'complex_values' to 'slots=#', where # is the number of CPU cores in that system.
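In the editor that qconf -ae opens, the two fields called out above would end up looking something like this for node11; the 16-core count is an assumption, and the template's other fields can usually be left at their defaults:

```text
hostname       node11
complex_values slots=16
```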

$ for i in `act_nodenames -r node11-node18`; do qconf -ah $i; done

- Add an administrative host entry for each new host that you are adding.

$ for i in `act_nodenames -r node11-node18`; do qconf -as $i; done

- Add a submit host entry for each new host that you are adding.

Each host also needs its own configuration file. You can create a config file for each of the new nodes from one of the already-configured nodes:

$ qconf -sconf <existing hostname> > <new hostname>

For our example above, we can do the following:

$ mkdir /tmp/sge; cd /tmp/sge
$ for i in `act_nodenames -r node11-node18`; do qconf -sconf node01 > $i; done
$ for i in `act_nodenames -r node11-node18`; do qconf -Aconf $i; done

(Note: this creates a file for each hostname in the current working directory, /tmp/sge.)

If you are using Torque for your job scheduler
To add the new compute nodes to the Torque scheduler, edit the nodes list:

$ vi /var/spool/torque/server_priv/nodes
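Each line in this file names one node; a common form is hostname np=<cores>, where np is the number of job slots on that node (the core counts below are assumptions):

```text
node11 np=16
node12 np=16
```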

- Add an entry line for each new compute node.

Next, restart the pbs_server and pbs_sched services:

$ /etc/init.d/pbs_server restart
$ /etc/init.d/pbs_sched restart

If you are using SLURM for your job scheduler
To add the new compute nodes to SLURM, run the following commands and follow the directions at each step:

For GPU nodes, create the file gres.conf in /act/slurm:

$ cd /act/slurm
$ vi gres.conf

Add a line for each type of GPU node:

NodeName=node[17-18] Name=gpu Type=kepler File=/dev/nvidia0

Then, for the GPU nodes and all other new nodes, add them to slurm.conf:

$ vi /act/slurm/slurm.conf

At the bottom, extend the NodeName= line to include the additional nodes, or add a new line if the new nodes are different:

NodeName=node[01-16] CPUs=16 RealMemory=128000 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 State=UNKNOWN
NodeName=node[17-18] CPUs=16 RealMemory=128000 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 Gres=gpu:kepler:1 State=UNKNOWN
PartitionName=batch Nodes=node[01-16] Default=YES MaxTime=30-0:00:00 State=UP QOS=batch DefMemPerCPU=8000

Then from the head node, restart the services.

CentOS/EL6

$ service slurmdbd restart
$ chkconfig --add slurmdbd
$ service slurmctld restart
$ scontrol reconfigure

CentOS/EL7

$ systemctl restart slurmdbd
$ systemctl restart slurmctld
$ scontrol reconfigure

Enable and start the slurm daemon on the new compute nodes.

CentOS/EL6

$ act_exec -r node11-node18 service slurm start
$ act_exec -r node11-node18 chkconfig slurm on

CentOS/EL7

$ act_exec -r node11-node18 systemctl start slurmd.service
$ act_exec -r node11-node18 systemctl enable slurmd.service