Clusters can develop many software and hardware health issues over extended periods of use. Because clusters are not static systems, constant monitoring is needed to catch wear-and-tear problems early. Fortunately, clusters can be monitored remotely, which makes checking cluster health much more manageable. For IT, remotely monitoring and controlling systems is an essential capability for servers and cluster nodes of all types. The following are examples of creating your own solution for managing and monitoring systems via IPMI.

 

Bright Cluster Manager can also be used to manage and monitor IPMI in a more robust manner.

 

NOTE: The username (-U) and password (-P) parameters used below in this blog post are the ASUS default username/password combination, shown for demonstration purposes. Substitute the correct username and password for the IPMI configuration you're using.

 

IPMItool Raw Commands

 

Monitor Chassis Status

 

ipmitool -I lanplus -H node.address.here -U admin -P admin chassis status

 

Resetting a System

 

ipmitool -I lanplus -H node.address.here -U admin -P admin power reset

 

Reading the system sensors

 

ipmitool -I lanplus -H node.address.here -U admin -P admin sdr list

 

Reading the system event log (SEL)

 

ipmitool -I lanplus -H node.address.here -U admin -P admin sel elist
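The raw commands above can also be combined into a quick health sweep across several nodes. The sketch below assumes the node30N-ipmi hostname format used later in this post and the default admin/admin credentials; `sweep_power_status` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: report chassis power state for a list of node numbers.
# Assumes the node30N-ipmi hostname format and the default admin/admin
# credentials used elsewhere in this post.
sweep_power_status() {
    local n
    for n in "$@"; do
        echo "--- node30$n-ipmi ---"
        ipmitool -I lanplus -H "node30$n-ipmi" -U admin -P admin chassis power status
    done
}
```

Running `sweep_power_status 1 2 3` would query node301-ipmi through node303-ipmi in turn.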

 

BASH Wrapper Script Example

 

Writing a wrapper script can automate some of these tasks and fit your node hostname format. Below is an example.

 

The below example assumes you have a node hostname format of node3NN-ipmi for the IPMI interface. This example script could be modified to be a wrapper for other IPMI functionality.

 

#!/bin/bash

if [ -z "$1" ]
then
    printf "USAGE: getChassisStatus.sh <NodeNum>\nExample: getChassisStatus.sh 1 for node301\n"
    exit 0
fi

NODE="node30$1-ipmi"
echo "node30$1 Chassis Status:"
ipmitool -I lanplus -H "$NODE" -U admin -P admin chassis status
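As noted above, the wrapper idea generalizes to other IPMI functionality. The sketch below (assuming the same node30N-ipmi hostname format and admin/admin defaults; `ipmi_node` is a hypothetical name) passes any ipmitool subcommand through to the node selected by number.

```shell
#!/bin/bash
# Sketch: generalized wrapper that forwards any ipmitool subcommand
# to a node selected by number. Assumes the node30N-ipmi hostname
# format and the default admin/admin credentials from this post.
ipmi_node() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        printf "USAGE: ipmi_node <NodeNum> <ipmitool subcommand>\n"
        printf "Example: ipmi_node 1 chassis status   # targets node301-ipmi\n"
        return 1
    fi
    local node="node30$1-ipmi"
    shift    # the remaining arguments form the ipmitool subcommand
    ipmitool -I lanplus -H "$node" -U admin -P admin "$@"
}
```

With this in place, `ipmi_node 1 sel elist` reads the event log of node301, and `ipmi_node 2 power reset` resets node302.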