Intel Skylake-SP Integrated I/O Architecture Implications for GPGPU/PCIe Connectivity

There is a subtle but major change in PCIe I/O connectivity with the introduction of Intel’s newest dual/quad-processor CPU architecture, Skylake-SP. The architecture groups the 48 PCIe lanes per CPU into 3 separate sets of 16 lanes, coming from 3 separate IIO modules. In previous CPU generations, all of the PCIe lanes were associated with one IIO module.

The impact is that a single PCIe root complex per CPU can no longer be driven with as many aggregate lanes as in previous CPU architectures. For example, with the Broadwell architecture, a dual-CPU system could drive one PCIe root complex with 40 PCIe lanes from each CPU. This is not possible with Skylake-SP. Each Skylake-SP CPU has three 16-lane groups: IIO #1 (16 lanes), IIO #2 (16 lanes), and IIO #3 (16 lanes). This means that even if all of the PCIe slots in the system were meant to be part of a single PCIe root complex, the maximum number of lanes for the CPU ↔ PCIe fabric connection is 16.
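The lane arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and function names are made up for this example, and the figure of 10 devices at x16 each is an assumed workload, not a measured configuration.

```python
# Lane counts as stated in the text. All names here are illustrative only.
BROADWELL = {"lanes_per_cpu": 40, "lane_groups_per_cpu": 1}
SKYLAKE_SP = {"lanes_per_cpu": 48, "lane_groups_per_cpu": 3}


def max_uplink_lanes(cpu: dict) -> int:
    """Widest CPU <-> PCIe fabric connection available to one root complex."""
    return cpu["lanes_per_cpu"] // cpu["lane_groups_per_cpu"]


def oversubscription(cpu: dict, devices: int, lanes_per_device: int = 16) -> float:
    """Downstream device lanes divided by the uplink lanes that feed them.

    Assumes every device sits behind the same root complex and is wired x16.
    """
    return devices * lanes_per_device / max_uplink_lanes(cpu)


for name, cpu in (("Broadwell", BROADWELL), ("Skylake-SP", SKYLAKE_SP)):
    print(name, max_uplink_lanes(cpu), oversubscription(cpu, devices=10))
```

Under these assumptions, ten x16 devices behind one root complex share 40 uplink lanes on Broadwell (4:1) but only 16 on Skylake-SP (10:1).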

  1. Each Skylake-SP processor has one IIO (Integrated I/O) module.
  2. Each IIO module consists of 3 individual IOU modules.
  3. Each IOU module consists of 4 individual PCIe x4 links, which can be aggregated to form one PCIe x16 link / PCIe root complex. See the drawing below for a breakout of one of the three IOU modules found on a Skylake-SP processor.
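The three-level hierarchy in the list above can be modeled as a small data structure. The sketch below is a hypothetical model for counting lanes only; the class names `IOU` and `IIO` mirror the terms in the list and do not correspond to any Intel or operating-system API.

```python
from dataclasses import dataclass, field

LANES_PER_X4_LINK = 4  # each x4 link carries 4 PCIe lanes


@dataclass
class IOU:
    """One IOU module: 4 x4 links that can be aggregated into one x16 link."""
    x4_links: int = 4

    @property
    def lanes(self) -> int:
        return self.x4_links * LANES_PER_X4_LINK


@dataclass
class IIO:
    """The per-processor IIO module, made up of 3 IOU modules."""
    ious: list = field(default_factory=lambda: [IOU() for _ in range(3)])

    @property
    def lanes(self) -> int:
        return sum(iou.lanes for iou in self.ious)


iio = IIO()
print([iou.lanes for iou in iio.ious])  # lanes available to each x16 root complex
print(iio.lanes)                        # total PCIe lanes per processor
```

The per-IOU figure (16) is the one that matters for a single root complex; the per-processor total (48) is only reachable by spreading devices across all three IOU modules.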

This blog post may also help in understanding the PCIe root complex and its implications.

Skylake-SP IOU Module (1 of 3 which make up the full IIO module on each Skylake-SP processor)


Full Diagram of the Skylake-SP IIO Module

Skylake-SP Single Root Complex 10 PCIe/GPU Conceptual Example

In this example, you can see that the single IIO is split into 3 separate 16-lane segments. This limits how many PCIe lanes can feed a single-root-complex PCIe topology.


Broadwell Single Root Complex 10 PCIe/GPU Conceptual Example

In this example, you can see that Broadwell has a single IIO that holds all of the aggregate lanes. This leaves more PCIe lanes available to connect a single-root-complex PCIe topology.
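To see how devices are actually distributed on a running Linux system, one option is to group PCI devices by the root bus that appears in their resolved sysfs path. The sketch below assumes the usual Linux layout, where each entry in `/sys/bus/pci/devices` resolves to a path such as `/sys/devices/pci0000:17/...`; treating each `pciDDDD:BB` root bus as a stand-in for one root complex is an assumption of this example, and the helper names are hypothetical.

```python
import os
import re
from collections import defaultdict

# Matches the root-bus component of a resolved sysfs path, e.g. "pci0000:17".
ROOT_BUS = re.compile(r"^pci[0-9a-f]{4}:[0-9a-f]{2}$")


def root_bus_of(resolved_path):
    """Return the root-bus path component, or None if there is none."""
    for part in resolved_path.split("/"):
        if ROOT_BUS.match(part):
            return part
    return None


def group_by_root_bus(resolved_paths):
    """Map each root bus to the PCI addresses of the devices found under it."""
    groups = defaultdict(list)
    for path in resolved_paths:
        groups[root_bus_of(path)].append(path.rsplit("/", 1)[-1])
    return dict(groups)


def pci_device_paths(sysfs_dir="/sys/bus/pci/devices"):
    """Resolve the sysfs symlinks for all PCI devices (Linux only)."""
    return [
        os.path.realpath(os.path.join(sysfs_dir, name))
        for name in sorted(os.listdir(sysfs_dir))
    ]
```

On a Linux host, `group_by_root_bus(pci_device_paths())` returns one entry per root bus. Devices listed under the same key share the same uplink to the CPU, while devices under different keys do not belong to a single-root topology.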