Clusters can encounter many software and hardware health issues over extended periods of use. Since clusters are not static systems, constant monitoring is needed to catch wear-and-tear problems early. Fortunately, there are ways to monitor your clusters remotely, which makes checking cluster health much more manageable. For IT, remotely monitoring and controlling systems is an essential capability for servers and cluster nodes of all types. The following are examples of creating your own solution for managing and monitoring systems via IPMI.
Bright Cluster Manager can also be used to manage and monitor IPMI in a more robust manner.
NOTE: The username (-U) and password (-P) parameters used below in this blog post are the ASUS default username/password combination, shown for demonstration purposes. Substitute the correct username and password combination for the IPMI configuration you’re using.
IPMItool Raw Commands
Monitor Chassis Status
|ipmitool -I lanplus -H node.address.here -U admin -P admin chassis status|
Resetting a System
|ipmitool -I lanplus -H node.address.here -U admin -P admin power reset|
Reading the system sensors
|ipmitool -I lanplus -H node.address.here -U admin -P admin sdr list|
Reading the system event log (SEL)
|ipmitool -I lanplus -H node.address.here -U admin -P admin sel elist|
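SEL output can be filtered with standard shell tools. The sketch below greps a made-up sample of `sel elist`-style output for temperature events; on a live system you would pipe the ipmitool command above into grep instead:

```shell
# Made-up sample of `sel elist`-style output, for demonstration only
sample='   1 | 01/01/2024 | 10:00:00 | Temperature CPU_Temp | Upper Critical going high | Asserted
   2 | 01/01/2024 | 10:05:00 | Power Supply PSU1 | Presence detected | Asserted'

# Keep only temperature-related events (case-insensitive)
echo "$sample" | grep -i "temperature"
```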
BASH Wrapper Script Example
A wrapper script can automate some of these tasks and fit your node hostname format. The example below assumes a hostname format of node3NN-ipmi for the IPMI interface. The script could be modified to wrap other ipmitool functionality as well.
#!/bin/bash
if [ -z "$1" ]; then
    printf "USAGE: getChassisStatus.sh <NodeNum>\nExample: getChassisStatus.sh 1 for node301\n"
    exit 1
fi
NODE="node30$1-ipmi"
echo "node30$1 Chassis Status:"
ipmitool -I lanplus -H "$NODE" -U admin -P admin chassis status
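Building on the wrapper script, a simple loop can sweep a range of nodes in one pass. This is a sketch assuming the same node3NN-ipmi hostname scheme and default admin/admin credentials; adjust both for your environment:

```shell
#!/bin/bash
# Sketch: query chassis status for nodes node301-ipmi through node310-ipmi.
# The hostname scheme and admin/admin credentials are assumptions; change as needed.
for n in $(seq -w 1 10); do
    node="node3${n}-ipmi"
    echo "=== ${node} ==="
    # Skip quietly if ipmitool is not installed on this machine
    command -v ipmitool >/dev/null 2>&1 &&
        ipmitool -I lanplus -H "$node" -U admin -P admin chassis status
done
echo "Done."
```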