With passively cooled accelerator cards becoming increasingly common, monitoring temperatures and fan speeds through OpenIPMI and ipmitool is more important than ever. The steps below cover installing the packages, starting the IPMI service, and using ipmitool to read NVIDIA Tesla and Intel Xeon Phi temperatures.

NOTE: These instructions are based on a CentOS 6.x environment.

 

OpenIPMI and ipmitool Package Installation

# Install ipmitool and OpenIPMI

yum install ipmitool OpenIPMI -y

 

Starting the IPMI service

# Start OpenIPMI

[anelson@K80TD ~]# service ipmi start

Starting ipmi drivers:                                     [  OK  ]

 

Monitoring sensors with ipmitool

 

Monitoring Fans

[root@K80TD ~]# ipmitool sensor list | grep -i fan

FAN1             | 4600.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FAN2             | 6300.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FAN3             | 4300.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FAN4             | 4400.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FAN5             | 3600.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FAN6             | 3300.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FANA             | 6400.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FANB             | 4700.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FANC             | 8600.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FAND             | 8800.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000

FAN7             | 5600.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000  | 25400.000 | 25500.000
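The fourth column of each row above is the sensor status. As a minimal sketch, the function below reads sensor rows on standard input and prints the name and status of any row whose status is not "ok". The function name sensors_not_ok is hypothetical, and the column position is assumed from the output shown above.

```shell
# Print "<name> <status>" for every sensor row whose status is not "ok".
# Expects pipe-delimited rows in the layout shown above on stdin.
sensors_not_ok() {
    awk -F'|' '
        NF >= 4 {
            name = $1; status = $4
            # Trim the padding ipmitool adds around each column.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status != "ok")
                printf "%s %s\n", name, status
        }'
}
```

On a live system it could be used as: ipmitool sensor list | grep -i fan | sensors_not_ok. Empty output means every fan reported "ok".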

Monitoring GPU Temperatures

[root@K80TD ~]# ipmitool sensor list | grep GPU

GPU1 Temp        | 63.000     | degrees C  | ok    | -11.000   | -8.000    | -5.000    | 85.000    | 90.000    | 95.000

GPU2 Temp        | 28.000     | degrees C  | ok    | -11.000   | -8.000    | -5.000    | 85.000    | 90.000    | 95.000

GPU3 Temp        | 66.000     | degrees C  | ok    | -11.000   | -8.000    | -5.000    | 85.000    | 90.000    | 95.000

GPU4 Temp        | 68.000     | degrees C  | ok    | -11.000   | -8.000    | -5.000    | 85.000    | 90.000    | 95.000
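To flag cards that are running hot without reading every row, the second column can be compared against a limit. The following is a sketch under stated assumptions: the function name gpu_over_limit is hypothetical, and the default limit of 80 degrees C is an arbitrary illustrative value, not a vendor recommendation.

```shell
# Print "<name> <reading>" for every GPU sensor whose reading exceeds a limit.
# Expects `ipmitool sensor list` style rows on stdin.
# $1 = limit in degrees C (assumed default of 80, chosen only for illustration).
gpu_over_limit() {
    limit="${1:-80}"
    awk -F'|' -v limit="$limit" '
        $1 ~ /GPU/ {
            name = $1; reading = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", reading)
            # Skip rows without a numeric reading (for example "na").
            if (reading ~ /^-?[0-9.]+$/ && reading + 0 > limit + 0)
                printf "%s %s\n", name, reading
        }'
}
```

On a live system it could be used as: ipmitool sensor list | gpu_over_limit 65. With the readings shown above, GPU3 and GPU4 would be reported.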

Using watch to monitor sensors continuously

The watch command reruns a command at a fixed interval and refreshes the display, so it can poll ipmitool for sensor readings continuously. This is useful for keeping an eye on readings interactively; the same ipmitool pipeline can also be run from scripts for automation and logging.

See below for an example.

[root@K80TD ~]# watch -n 60 'ipmitool sensor list | grep GPU'
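Because watch redraws the screen rather than keeping a history, logging is usually easier with a plain loop. The sketch below is one possible approach, not part of the original procedure: the function names stamp_lines and log_gpu_temps, the default log file name, and the 60 second interval are all assumptions chosen for illustration.

```shell
# Prefix each line read on stdin with the current local date and time.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Append timestamped GPU readings to a log file until interrupted.
# $1 = log file (hypothetical default), $2 = polling interval in seconds.
log_gpu_temps() {
    logfile="${1:-gpu_temps.log}"
    interval="${2:-60}"
    while true; do
        ipmitool sensor list | grep GPU | stamp_lines >> "$logfile"
        sleep "$interval"
    done
}
```

Running log_gpu_temps /var/log/gpu_temps.log 60 as root would append one timestamped line per GPU sensor every minute; stop it with Ctrl-C.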