Troubleshooting MPSS (Manycore Platform Software Stack) Initialization Issues
MPSS for the Intel Xeon Phi coprocessor can be confusing because it works differently from other accelerator approaches such as OpenCL and CUDA. If MPSS hangs at any stage of Phi configuration or initialization, the following steps may help you identify the cause:
The MPSS Log
The primary place to look for MPSS-related messages is /var/log/mpssd.
Example of a PCIe slot or specific Phi card issue
The following is an example of a particular card (mic7) failing to boot.
Fri Nov 13 05:28:54 2015: mic7: Waiting for reset to complete
Fri Nov 13 05:28:55 2015: mic7: Waiting for reset to complete
Fri Nov 13 05:28:56 2015: mic7: Waiting for reset to complete
Fri Nov 13 05:28:57 2015: mic7: State resetting -> reset failed
Fri Nov 13 05:28:57 2015: mic7: Waiting for reset to complete
Fri Nov 13 05:28:58 2015: mic7: Current state "reset failed" cannot boot card
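On a system with many cards, the log can be scanned for this pattern instead of being read by eye. The following is a minimal Python sketch, not part of MPSS: the function name `find_failed_resets` and the regular expression are illustrative, and the line layout (timestamp, card name, message, separated by ": ") is assumed solely from the excerpt above.

```python
import re

# Assumed line shape, inferred only from the excerpt above:
#   "<weekday> <month> <day> <HH:MM:SS> <year>: micN: <message>"
LINE_RE = re.compile(r"^(?P<stamp>.+?\d{4}): (?P<card>mic\d+): (?P<msg>.*)$")


def find_failed_resets(log_text):
    """Return {card: timestamp} for the first 'reset failed' message per card."""
    failed = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        if "reset failed" in match["msg"] and match["card"] not in failed:
            failed[match["card"]] = match["stamp"]
    return failed


# Sample lines copied from the log excerpt above.
SAMPLE = """\
Fri Nov 13 05:28:56 2015: mic7: Waiting for reset to complete
Fri Nov 13 05:28:57 2015: mic7: State resetting -> reset failed
Fri Nov 13 05:28:58 2015: mic7: Current state "reset failed" cannot boot card
"""

if __name__ == "__main__":
    print(find_failed_resets(SAMPLE))
```

To run it against a live system, pass the contents of /var/log/mpssd instead of `SAMPLE`, assuming the log is a readable plain-text file in the layout shown.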
Tracing the issue back to the card or PCIe slot
Once you have identified the failing card (or cards) in /var/log/mpssd, you can use the following command to find the PCIe location of each suspect card.
[anelson@hostname ~]# micinfo -listDevices
MicInfo Utility Log
Created Wed Sep 30 13:46:40 2015
List of Available Devices
deviceId | domain | bus# | pciDev# | hardwareId
0 | 0 | 2 | 0 | 22508086
1 | 0 | 3 | 0 | 22508086
2 | 0 | 82 | 0 | 22508086
3 | 0 | 83 | 0 | 22508086
The bus# column corresponds to the system’s PCI bus numbering. Together with the motherboard manufacturer’s reference documentation, it lets you trace the issue back to the physical slot, where you can try re-seating or replacing the card.
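The lookup from card name to bus number can also be scripted. The sketch below is a hypothetical helper, not an MPSS tool: it parses the pipe-separated table exactly as it appears in the listing above (real micinfo output may be spaced differently), and it assumes that card micN corresponds to deviceId N, which this article does not confirm. The bus# value is kept as a string because the listing does not say whether it is decimal or hexadecimal.

```python
def parse_device_list(output):
    """Parse the pipe-separated device table into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 5:
            continue  # banner lines, prompts, and blank lines
        if header is None:
            if cells[0] == "deviceId":
                header = cells
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def bus_for_card(rows, card_name):
    """Return the bus# string for a card such as 'mic2', or None if it is not listed.

    Assumption: micN maps to deviceId N.
    """
    device_id = card_name[len("mic"):]
    for row in rows:
        if row["deviceId"] == device_id:
            return row["bus#"]
    return None


# Sample text copied from the listing above.
SAMPLE_OUTPUT = """\
MicInfo Utility Log
Created Wed Sep 30 13:46:40 2015
List of Available Devices
deviceId | domain | bus# | pciDev# | hardwareId
0 | 0 | 2 | 0 | 22508086
1 | 0 | 3 | 0 | 22508086
2 | 0 | 82 | 0 | 22508086
3 | 0 | 83 | 0 | 22508086
"""

if __name__ == "__main__":
    rows = parse_device_list(SAMPLE_OUTPUT)
    print(bus_for_card(rows, "mic2"))
```

A card that is missing from the listing returns `None`, which is itself worth noting when a card has failed to initialize.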