4. Configuring nodes
Configurations in ClusterVisor
ClusterVisor makes it easy to make changes to the cluster by consolidating all of its settings and configurations into one location: the Configuration page. This page organizes the changeable settings into collections, and each collection contains a list of sections that belong to it (e.g., the node sections belong to the Node collection, the cloner image sections belong to the Cloner Image collection, etc.). Throughout ClusterVisor, sections are referred to using the syntax collection.section. For example, a node named "storage01" would be referred to as node.storage01, and a cloner image named "node" would be referred to as cloner_image.node.
To change a section, click the Edit button in the Actions column of that section. This opens a pop-up window showing all of the section's fields; sections with many fields may organize them into tabs. Each field is listed with its name on the left, its value on the right, and a description of the field beneath the value. Changes made while editing are not saved unless the Save button at the bottom of the window is pressed; otherwise they are discarded when the window is closed. To prevent invalid options from being used, changes are validated every time they are saved, before they are actually stored. If any changes are invalid, an error message describing what went wrong is displayed at both the top and the bottom of the window (in case either is not currently visible). At that point, the changes can either be corrected so that they pass validation or be discarded by closing the window.
Some fields reference other fields, so deleting a referenced field will fail validation until all references to it are removed. Additionally, required fields are prefixed with an asterisk (*); leaving a required field blank will fail validation.
When values in any section are changed and saved, those changes are queued so that they can be committed to the affected nodes. Once ready to commit them, click the Commit changes to nodes button at the top of the page (if the button is greyed out, no actionable settings have been changed yet). This lists the nodes that will receive modifications from ClusterVisor along with the types of changes that will be made (e.g., networking, hosts, timezone, etc.). If any committed changes fail once they reach a node, they are queued to be retried later. These queued changes, along with the message the node returned explaining why they could not be applied, can be viewed by clicking the View change queue button. To purge committed changes so that they are not sent to the nodes, use the purge button at the bottom of the pop-up window opened by either the Commit changes to nodes or the View change queue button.
Some of the changes (particularly the network changes) can cause the node to become unresponsive. The fields that can cause these issues are flagged with a hazard icon (i.e., an orange triangle with an exclamation mark) and should only be changed if one is prepared to handle those issues. Additionally, whenever a node is rebooted, by default it will reconfigure itself based on the current settings it has in ClusterVisor. This can be used to fix a node if it gets put into an odd state for whatever reason.
Using the command line
The configurations in ClusterVisor can be changed from the command line using the utility cv-conf. All of the collections can be listed using:
$ cv-conf --collections
All of the sections within a collection can be listed using:
$ cv-conf --sections COLLECTION
or
$ cv-conf -s COLLECTION
All of the collections and their respective sections can be listed using:
$ cv-conf --sections-all
or
$ cv-conf -sa
Sections are added, edited, deleted, and copied using the same collection.section syntax as the web interface, so the --sections-all flag is typically the most useful way to find the needed argument. When creating or editing a section, cv-conf opens the entry in the text editor set by the $EDITOR environment variable. The document contains the existing fields at the top, followed by documentation at the bottom that lists every field that can be added along with its description. The document uses YAML by default, but can also be set to use the formats:
Regardless of the format used, make sure any modifications are valid in that format, or the changes will be rejected. Furthermore, as in the web interface, all changes are validated; on failure, cv-conf displays the errors it found and prompts whether to a) return to the editor so the invalid entries can be corrected, or b) quit and discard the changes.
Once the changes pass validation, they can be committed with the cv-commit utility so that any needed reconfiguration of the nodes is performed. This utility can display which types of changes will be committed, commit those changes, display the status of failed commits that will be re-queued, and clear the queued commits altogether. Additionally, whenever saved changes need to be committed to the nodes, cv-conf prints a reminder that cv-commit needs to be run.
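As a rough sketch of the cv-conf and cv-commit workflow described above, the session below lists the sections, edits one section in a chosen editor, and then commits the result. The section name node.node[01-04] is only an example, passing the section as an argument to --edit is assumed from the collection.section syntax, and cv-commit is shown without arguments because its individual options are not covered here. Quoting the section name keeps the shell from treating the square brackets as a glob pattern.

```shell
$ cv-conf --sections-all
$ EDITOR=vim cv-conf --edit 'node.node[01-04]'
$ cv-commit
```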
Modifying node configurations
Nodes can be modified on the Configuration page under the Node tab, which lists all of the sections of the node collection. Rather than each node being its own section, nodes are grouped so that each "type" of node has its own section. This is because clusters tend to have many nodes that are set up exactly the same as the others of their type, differing only in their names and networking information, and in most cases any change made to one node also needs to be made to the others of the same type. To handle this, each section is named using a syntax that describes the range of nodes belonging to it (the same syntax used to configure nodes in Slurm's configuration file). For example, node01, node02, node03, and node04 would be referred to as node.node[01-04] in ClusterVisor. Note that a section does not need to contain multiple nodes; e.g., single head node clusters will have a section that manages only the head node.
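To make the naming convention concrete, the hypothetical bash helper below (expand_range is not part of ClusterVisor) expands a section name into the node names it covers, keeping the zero padding of the first number. It is a simplified sketch that only handles a single contiguous numeric range such as node[01-04]; the full Slurm-style syntax supports more forms than this.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand a section name such as "node[01-04]"
# into the node names it covers (node01 node02 node03 node04).
# Only a single "prefix[first-last]" range is handled.
expand_range() {
  local section=$1
  local re='^(.*)\[([0-9]+)-([0-9]+)\]$'
  if [[ $section =~ $re ]]; then
    local prefix=${BASH_REMATCH[1]}
    local first=${BASH_REMATCH[2]} last=${BASH_REMATCH[3]}
    local width=${#first} i
    # 10# forces base 10 so that "08" and "09" are not read as octal.
    for (( i = 10#$first; i <= 10#$last; i++ )); do
      printf "%s%0${width}d\n" "$prefix" "$i"
    done
  else
    # A section without a range (e.g. a single head node) is just its name.
    printf '%s\n' "$section"
  fi
}

expand_range 'node[01-04]'
```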
Variant fields and field variables
In a section that handles multiple nodes, a changed setting is applied by default to all of the nodes in that section. To change a field for a specific node or a subset of the nodes, press the "+" button to the left of the field name to add a variant to the field. A variant field presents two input boxes: the left one for the desired value and the right one for the node(s) the value applies to. In practice, variant fields are mainly used for values such as MAC addresses, which differ for every node, but there are other ways to handle fields that need a different value for each node. For example, a node's IP address can be set individually using variant fields, but if the addresses follow a sequential pattern, a field variable can be used instead.
Typically, the IP address of a node ends with the index of the node (e.g., node01 is 10.1.1.1, node02 is 10.1.1.2, and so on), and the hostnames of a node are usually derived from its own name (e.g., node01 has the hostnames node01, node01-ib, and node01-ipmi). To take advantage of these patterns, ClusterVisor provides the field variables _name and _index. A field variable is used by wrapping it in ${ and }, so as values they look like ${_name} and ${_index}, and any other text can appear on either side of them (e.g., ${_name}-ib and 10.1.1.${_index} are both valid).
The _name field variable is replaced with each node's name in the section; e.g., given the section node[01-04], when resolving the values for node01, _name becomes node01. For example, to set the hostname of the InfiniBand interface to the node's name followed by "-ib" (e.g., node01-ib), set the value of the hostname field to ${_name}-ib.
The _index field variable is the position of the node within its section: in node[01-04], _index is 1 for node01 (the first entry), 2 for node02 (the second entry), and so on. However, in another section such as node[05-08], _index is also 1 for node05, 2 for node06, and so on. To handle such offsets, basic math can be applied to _index inside the field variable wrapper. For example, to assign the IP addresses 10.1.1.5 through 10.1.1.8 to an interface on node05 through node08 in the section node[05-08], use the value 10.1.1.${_index+4}, which adds four to _index (so node05 becomes 5, node06 becomes 6, and so on).
Be aware that field variables are merely a convenience; the same result can be achieved by defining each unique value manually with variant fields. Likewise, variant fields and field variables are only needed when the value of a field differs between nodes; otherwise, neither is required.
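The substitution rules above can be emulated with the hypothetical bash function below (render_field is not a ClusterVisor command, and ClusterVisor's own implementation may differ). It replaces ${_name} with the node's name and evaluates any expression containing _index, such as ${_index+4}, as shell integer arithmetic; treating "basic math" as shell arithmetic is an assumption made for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical emulation of ClusterVisor field variables for one node.
# Usage: render_field TEMPLATE NODE_NAME INDEX
render_field() {
  local template=$1 name=$2 index=$3
  local out='' rest=$template expr
  # Split off the text before the first ${...}, the expression, and the rest.
  local re='^([^$]*)\$\{([^}]*)\}(.*)$'
  while [[ $rest =~ $re ]]; do
    out+=${BASH_REMATCH[1]}
    expr=${BASH_REMATCH[2]}
    rest=${BASH_REMATCH[3]}
    if [[ $expr == _name ]]; then
      out+=$name
    else
      # Substitute the index, then evaluate the integer expression.
      out+=$(( ${expr//_index/$index} ))
    fi
  done
  printf '%s\n' "$out$rest"
}

# Section node[05-08]: _index runs from 1 to 4.
index=1
for node in node05 node06 node07 node08; do
  printf '%s %s %s\n' "$node" "$(render_field '${_name}-ib' "$node" "$index")" \
    "$(render_field '10.1.1.${_index+4}' "$node" "$index")"
  index=$(( index + 1 ))
done
```

Each printed line pairs a node with its resolved InfiniBand hostname and IP address, e.g. node05 with node05-ib and 10.1.1.5.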
Using the command line
The web interface provides the "+" button for adding variant fields, but to accomplish this in cv-conf the field name just needs to be appended with a colon followed by the node(s) used in the variant. For example, given the hostname field (in YAML format):
hostname: ${_name}
A variant for node01 and node[02-03] can be added like so:
hostname: ${_name}
hostname:node01: firstnode
hostname:node[02-03]: othernode${_index}
These fields can be added, modified, or removed by editing them with cv-conf using the --edit flag.
The plugins can be edited for the nodes by using the --plugins flag in cv-conf, which will list the same fields that are presented in the web interface and will display the list of all available plugins in the documentation at the bottom of the document. The additional plugin fields are all listed under the _plugins field of the node, but keep in mind that only the fields entered for the enabled plugins will actually be utilized by the plugins.
Node plugins
When changes are made in ClusterVisor, the plugins loaded on the node(s) determine whether those changes modify the node(s). The loaded plugins can be viewed with the Plugins button in the Actions column of the section (rather than from the pop-up window opened by the Edit button). The plugins window follows the same rules described earlier for the edit window; the only difference is that it lists only the node's plugin properties rather than the rest of its fields. The available plugin properties are the active plugins (which plugins the node(s) use), the plugin priority (whether the plugins' actions need to run before or after those of other node sections), and whether the plugins on the section of nodes are disabled. Note that, regardless of which plugins are selected, none of them will be active if the Plugins Disabled field is set to true.
The plugins themselves dictate which features ClusterVisor manages on the node. For instance, with no plugins enabled, nothing is changed on the nodes when configurations are changed; if the networking plugin is enabled, however, changing any networking field for the node(s) causes ClusterVisor to reconfigure the node to match those settings (but only after the changes have been committed). Not all plugins are enabled everywhere because some nodes have different responsibilities than others; e.g., the DHCP/DNS server plugin might be enabled on the head node but not on the compute nodes, since they are clients of the DHCP/DNS server.
Besides determining which features ClusterVisor manages for a section of nodes, some plugins expose additional fields to configure for those nodes. These fields appear in their own tab in the pop-up window opened by the Edit button, marked with a plugin icon in front of the tab name. Plugin fields are used to configure the service(s) offered by the plugin; e.g., the DHCP/DNS server plugin provides additional fields for configuring the DHCP and DNS servers.
Keep in mind that disabling a plugin that exposes additional fields will result in the values of those fields being cleared out, so be sure to copy any values that may be needed before disabling the plugin.
Adding new nodes
Export and import
The simplest way to add new nodes to the cluster, if they are being purchased from Advanced Clustering Technologies, Inc. (a.k.a. ACT), is to:
- Use the Export button on the Configuration page to download the current state of ClusterVisor to a file.
- Send the file to ACT via email (support@advancedclustering.com).
- The ACT support team will then add the new node entries from the purchase order.
Once that is completed and the revised file has been received, it can be added to ClusterVisor by running the following on the node hosting the ClusterVisor server daemon:
$ systemctl stop cv-serverd
$ cv-db-image --auto-detect --import-from <path to file> --overwrite
$ systemctl start cv-serverd
These commands stop the ClusterVisor server daemon (since changing the data underneath the running service can cause unexpected behavior), import the revised file into ClusterVisor, and then start the server daemon back up.
Manually adding new nodes
If the export/import method is not being used, the new nodes will need to be added manually. In this case, either an existing set of nodes is being expanded (i.e., the new hardware matches the other nodes) or a completely new set is being added.
In the case of the former, all that would need to be done is:
- Click on the Edit button from the Actions column of the node section being expanded.
- In the pop-up window, go to the General tab and edit the range of nodes in the Name field to include the new nodes (e.g. if four new nodes are being added to node[01-04] it would become node[01-08] to accommodate them).
- Add any variant fields for any unique information about the nodes.
The necessary variant fields may vary depending on the setup of the cluster, but at minimum this will require the serial number(s) (found under the General tab) and the MAC address(es) (found under the Networking tab under each interface) of each new node to be added.
In the case of the latter where nodes with different hardware are being added and/or servicing a different role than the existing set of nodes, the new nodes can be added either from scratch or by using the existing set of nodes as a template. To copy values from an existing section of nodes for the new nodes, click the Copy button in the Actions column of the node section and a pop-up window will be presented. At minimum, the Name field under the General tab needs to be changed to match the names of the new nodes, but otherwise only change the fields that differ from the copied values.
However, if starting from scratch, the new section can be created by filling out the input box at the top of the Node collection with the name of the new section followed by clicking the Add Node button to the right of the input box. This will display a pop-up window where all of the values for the new nodes can be entered. While not all the fields need to be filled out, at a minimum the serial number(s), MAC address(es), and all required fields (which will be prefixed with an asterisk) need to be completed.
Using the command line
The configurations can be exported from the command line using the utility cv-db-image. To export the ClusterVisor configuration to a file, the following command needs to be run:
$ cv-db-image --uri localhost:27017/clustervisor --export <path to file>
Where <path to file> is replaced with the desired filename of the export file (this file does not need to exist already). This creates a JSON file that can be edited and imported back into ClusterVisor using:
$ systemctl stop cv-serverd
$ cv-db-image --uri localhost:27017/clustervisor --import-from <path to file> --overwrite
$ systemctl start cv-serverd
Where <path to file> would be replaced with the path to the export file from the previous step.
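Because the import above is run with --overwrite, one cautious approach is to keep an unmodified export as a backup and edit a copy of it, using only the cv-db-image options shown above. The file names are illustrative placeholders, and the python3 -m json.tool step is an optional syntax check (assuming Python 3 is installed) that catches JSON mistakes before the server daemon is stopped.

```shell
$ cv-db-image --uri localhost:27017/clustervisor --export cv-backup.json
$ cp cv-backup.json cv-edited.json
$ ${EDITOR:-vi} cv-edited.json
$ python3 -m json.tool cv-edited.json > /dev/null
$ systemctl stop cv-serverd
$ cv-db-image --uri localhost:27017/clustervisor --import-from cv-edited.json --overwrite
$ systemctl start cv-serverd
```

If the edited file causes problems, the same import command can be run against cv-backup.json to restore the previous state.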
Both expanding a node section and copying a node section, as described above, can also be done with the cv-conf utility. To expand a section, use the --edit flag and extend the __name__ field to include the new nodes; to copy a section, use the --copy flag and change the __name__ field to the names of the new nodes. Lastly, a new section can be created from scratch with the --add flag, which starts with a blank node section.
To avoid confusion: although cv-conf has a --dump flag that dumps the configuration, and its output looks similar to the file generated by the export flag of cv-db-image, the output from --dump lacks fields that cv-db-image needs to properly restore the data into ClusterVisor. The --dump flag of cv-conf is intended for viewing all of ClusterVisor's configurations in various formats, not for export/import.
Replacing existing nodes
If an existing node ever needs to be replaced, ClusterVisor can simplify the process. First, make all of the changes needed for the node's configuration to match the replacement node (for more information, see the Modifying node configurations section of this guide). At a minimum, the node's serial number (found under the General tab) and MAC address(es) (found under the Networking tab for each interface) need to be updated to match the replacement node. Once the changes are completed, the replacement node needs to be set to boot into ClusterVisor's cloner software so that it is reconfigured to match its configuration. To do this:
- Navigate to the Boot/Power control page.
- From the drop-down menu select the desired nodes.
- Find the Netboot column and select cloner from the drop-down menu.
Next, assuming that the boot order of the node has network boot set as its first boot option, all that is left is to power on the node (unfortunately, since the node is not yet handled by ClusterVisor this will need to be done manually). Once the node is powered on, if at least one of the MAC addresses was set properly it will network boot from ClusterVisor's cloner image and will begin re-configuring the node to match what was set in ClusterVisor. Once the cloning process has been completed, it will automatically reboot from the OS drive and will be ready to be used.
In case there are any issues during the cloning process, the status can be monitored in ClusterVisor or by connecting a monitor to the node. To monitor from ClusterVisor:
- Navigate to the Log viewer page.
- Select the node from the drop-down menu.
- Click on the Cloner tab.
This updates in real time with the latest log entries from the cloning process. If any errors occur, refer to the Troubleshooting section of this guide.
Using the command line
The steps above for changing the network boot image can also be done with the cv-netboot command line utility, using the --set flag to set it to the cloner image. The cloner log can be viewed in the file /var/log/clustervisor/cloner.log; if the log_directory field in config.global is set to a directory other than /var/log/clustervisor, the cloner.log file is stored under that directory instead.
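To watch the cloning progress live from a shell, the standard tail utility can follow the log file as new entries are appended. This assumes the default log directory on the machine where ClusterVisor writes its logs; substitute the configured log_directory if it differs.

```shell
$ tail -f /var/log/clustervisor/cloner.log
```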
Removing nodes
If nodes are being removed from ClusterVisor, they are either being removed from a section or the section itself is being removed. In the case of the latter, this can be done by clicking the Edit button from the Actions column of the node section and clicking the red Delete button at the bottom of the pop-up window. This will remove all entries of the node from ClusterVisor, and they will no longer be managed by ClusterVisor. For this reason, before removing the node it is advisable to remove all plugins from the node (or to set the Plugins Disabled field on the node to true) so that ClusterVisor can stop any services it was running on behalf of the node (otherwise this will need to be done manually).
If a node needs to be removed from a section, the section will need to be partitioned at the removed node (e.g. if the section is node[01-08] and node04 is being removed, the section will need to be partitioned to node[01-03] and node[05-08]). This can be accomplished by:
- Using the Copy button from the Actions column of the node section to duplicate the section.
- From the original node section, rename the section to the name of the first partition.
- In the duplicate node section, rename the section to the name of the second partition.
This effectively removes the node from ClusterVisor, but because the node section is now partitioned, any future changes to these nodes need to be made in both sections. To avoid this, a node at the end of the node section can instead be reassigned to replace the removed node (e.g., if the section is node[01-08] and node04 is being removed, node08 would be reassigned as node04). This can be done as follows:
- Remove all variant fields of the removed node.
- Reassign the variant fields of the reassigned node to the removed node (e.g., following the previous example, this would mean changing all node08 variant fields to be node04 instead).
- Rename the section so that only the last node is removed (e.g., following the previous example, the section name would change from node[01-08] to node[01-07]).
- Reboot the reassigned node so that it can be reconfigured (e.g., following the previous example, node08 would need to be rebooted so that it will then be configured as node04).
This method would not require any duplicate sections to be created, so there would be no partition to manage like the other method. However, this can potentially lead to confusion since the labels on the node will still have the old node names on them (e.g., following the previous example, although node08 has been reassigned as node04, its labels will still show it is node08).
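When the reassignment is done with cv-conf --edit instead of the web interface, the first two steps amount to text edits on the variant lines. The hypothetical sed-based filter below (reassign_variants is not a ClusterVisor tool) sketches them, assuming single-line variants written as field:node: value, as in the YAML variant example from the earlier command-line section; nested or multi-line values and range variants such as node[02-03] are not handled and would still need to be edited by hand.

```shell
#!/usr/bin/env bash
# Hypothetical filter for the first two reassignment steps, applied to
# YAML text on stdin: drop the removed node's single-line variants, then
# retarget the reassigned node's variants to the removed node's name.
# Usage: reassign_variants REMOVED_NODE REASSIGNED_NODE < section.yaml
reassign_variants() {
  local removed=$1 moved=$2
  sed -e "/^[[:space:]]*[A-Za-z0-9_]*:${removed}:/d" \
      -e "s/^\([[:space:]]*[A-Za-z0-9_]*\):${moved}:/\1:${removed}:/"
}

printf '%s\n' 'hostname: ${_name}' 'hostname:node04: oldvalue' \
  'hostname:node08: newvalue' | reassign_variants node04 node08
# prints:
# hostname: ${_name}
# hostname:node04: newvalue
```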
There is one other edge case for removing a node: when the node being removed is at the beginning or end of its node section (e.g., in the section node[01-08], the beginning and ending nodes are node01 and node08, respectively). In this case, neither of the previous two methods is needed; only the node section needs to be renamed with the beginning or end removed (e.g., using the previous example, the section name would become node[02-08] if node01 is removed or node[01-07] if node08 is removed).
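The renaming rules for both the partition method and this edge case can be summarized with the hypothetical bash helper below (split_section is not part of ClusterVisor). Given a section and the node being removed, it prints the section name(s) that remain. It assumes a single prefix[first-last] range, and naming a one-node partition by the node name alone (e.g., node01) is an additional assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the section name(s) left after removing one
# node from a "prefix[first-last]" section, e.g. node[01-08] minus node04
# gives node[01-03] and node[05-08]; removing node01 gives node[02-08].

# Print the section name for the range FIRST..LAST, or nothing if empty.
print_range() {
  local prefix=$1 width=$2 first=$3 last=$4
  if (( first > last )); then
    return 0
  elif (( first == last )); then
    printf "%s%0${width}d\n" "$prefix" "$first"
  else
    printf "%s[%0${width}d-%0${width}d]\n" "$prefix" "$first" "$last"
  fi
}

split_section() {
  local section=$1 removed=$2
  local re='^(.*)\[([0-9]+)-([0-9]+)\]$' numre='^[0-9]+$'
  if ! [[ $section =~ $re ]]; then
    echo "not a node range: $section" >&2
    return 1
  fi
  local prefix=${BASH_REMATCH[1]} width=${#BASH_REMATCH[2]}
  local first=$(( 10#${BASH_REMATCH[2]} )) last=$(( 10#${BASH_REMATCH[3]} ))
  local num=${removed#"$prefix"}
  # Reject names with another prefix or numbers outside the section.
  if [[ $removed != "$prefix"* || ! $num =~ $numre ]] ||
     (( 10#$num < first || 10#$num > last )); then
    echo "$removed is not in $section" >&2
    return 1
  fi
  num=$(( 10#$num ))
  print_range "$prefix" "$width" "$first" "$(( num - 1 ))"
  print_range "$prefix" "$width" "$(( num + 1 ))" "$last"
}

split_section 'node[01-08]' node04
```

Removing a middle node yields two partitions, while removing the first or last node yields a single renamed section, which is why the edge case needs no copy.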
It is important to note that nodes do not need to be removed from ClusterVisor for the node to no longer be managed by it. Using a variant field to disable the plugins for a node will effectively make it so that ClusterVisor will no longer make modifications to the node. The method of using variant fields to isolate a node from its node section is a way to "softly" remove a node without needing to manage any of the hassles listed above.
Using the command line
Most of what has been described above was already covered in the previous sections on modifying the configuration values of nodes. The one new concept is deleting a section, which can be done with the --delete flag of the cv-conf utility. As mentioned above, it is advisable to first remove or disable the plugins for the nodes being removed, which can be done with the --plugins flag.
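A minimal sketch of that order of operations from the shell is shown below. The section name node.gpu[01-02] is a hypothetical example, and the section argument format is assumed from the collection.section syntax. Running cv-commit between the two cv-conf steps is also an assumption, intended to give ClusterVisor a chance to stop the services it manages on those nodes before the section is deleted.

```shell
$ cv-conf --plugins 'node.gpu[01-02]'   # remove or disable the section's plugins
$ cv-commit
$ cv-conf --delete 'node.gpu[01-02]'
```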